🤖 Expertise

This page contains high-level information about the large systems and domains owned by the Data & Insights team.

As this page grows, we should add onboarding links to each section, as well as any appropriate links to documentation, ADRs, etc.

DataSync

DataSync is a large subsystem in Vetspire that is handled pretty independently from the rest of the codebase and the day-to-day workings of Vetspire.

This subsystem is a framework that can be configured to quickly build ETL pipelines, allowing data to be imported into Vetspire from existing veterinary practices that don't yet use it.

We work with an external partner, Bitwerx, to handle the vast majority of our DataSync imports. They are responsible for exporting data from legacy PIMS (practice information management systems, analogues of Vetspire) and provide an API from which we pull CSVs in a normalized format.

The DataSync engine was originally written as a way to import Bitwerx's normalized CSV format, but was built in a pluggable way for ease of debugging and extensibility.

As a result, the DataSync engine is used on an ad-hoc basis from time to time to handle data imports outside of Bitwerx's normalized format (usually for sheer speed of delivery more than anything else).

Per our Ways of Working, DataSync work is a top business priority and is handled as a perpetual, ongoing stream of work.

Code

DataSync is built as a separate child app under the top-level :vetspire umbrella, and is referenced as :datasync throughout the codebase where appropriate. See the code under apps/datasync/lib/ for more details.

Building DataSync as a separate application allows us to deploy it separately from our standard user-serving application, reducing load on our main API pods. Due to the nature of the work DataSync was designed to do, it has very high memory and CPU requirements: in some cases we're dealing with importing tens to hundreds of gigabytes of historical data, medical records, billing statements, etc.

To cope with this, DataSync is heavily optimized:

  • DataSync entrypoints are all asynchronous and run via Oban Jobs to provide retries, job status updates to the UI, and backpressure (see the sketch after this list).

  • DataSync mappers can be configured to run asynchronously, or can be run in blocking mode if needed, to optimize runtime.

  • DataSync mappers are all built on top of Flow and Streams to avoid reading entire CSVs into RAM and crashing pods.
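
As a rough illustration of the entrypoint pattern, here's a minimal sketch of what such an Oban worker might look like. The module name, queue settings, and Datasync.run_import/1 are hypothetical, not the actual :datasync code:

```elixir
defmodule Datasync.Workers.ImportWorker do
  # Hypothetical worker. Oban gives us retries (max_attempts),
  # backpressure via per-queue concurrency limits, and a job row
  # whose status we can surface in the UI.
  use Oban.Worker, queue: :datasync, max_attempts: 3

  @impl Oban.Worker
  def perform(%Oban.Job{args: %{"import_id" => import_id}}) do
    Datasync.run_import(import_id)
  end
end

# Callers enqueue work instead of blocking their own process:
# %{import_id: 123} |> Datasync.Workers.ImportWorker.new() |> Oban.insert()
```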

The DataSync engine also provides a simple DSL to transform, pre-process, or post-process data; comes with its own testing framework; provides transactionality; and is very easily pluggable and extendable to convert arbitrary data formats into something we can use!
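
For intuition, here is a minimal sketch of the Flow-and-Streams pattern the mappers build on, assuming NimbleCSV for parsing; the module name and row shape are invented for illustration:

```elixir
defmodule Datasync.Mappers.ExampleMapper do
  # Hypothetical mapper. File.stream!/1 and CSV.parse_stream/1 are
  # lazy, so the full CSV is never held in RAM; Flow then fans the
  # per-row transformation out across cores.
  alias NimbleCSV.RFC4180, as: CSV

  def run(path) do
    path
    |> File.stream!()
    |> CSV.parse_stream()
    |> Flow.from_enumerable(max_demand: 500)
    |> Flow.map(&map_row/1)
    |> Enum.to_list()
  end

  defp map_row([name, species, dob]),
    do: %{name: name, species: species, date_of_birth: dob}
end
```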

Ownership

The DataSync project was originally started by Sam Ginn (@samginn), but was formalized into its own application and is maintained by Chris Bailey (@vereis) and Bill Kontos (@william-ko), who should be deemed the current domain experts.

Protocols

The Protocols Engine is another large, discrete subsystem in Vetspire that is handled independently.

This subsystem is responsible for two things: concisely showing veterinarians a patient's up-to-date vaccine/medical history, and sending out all of Vetspire's medical reminders.

Requirements

The "fun" bit about Protocols is the fact that large requirements gap between both use cases, and the Protocols Engine has been extremely optimized to be able to serve both use cases as optimally as possible.

To determine a patient's up-to-date medical history:

  1. The Protocols Engine needs to look at all immunizations and orders made for a given patient.

  2. The products on these immunizations and orders are cross-referenced against any medical protocols configured at the organization.

  3. If any products match a configured protocol, the most recent instance of that immunization/order takes precedence.

  4. For the immunizations/orders that take precedence, we need to determine the date each was given and, based on various factors, the due date for when it is next required.

The above is relatively simple, and for a lot of Vetspire's early life it was done in a single SQL query with a few joins and filters. Each query only has to deal with ~4 indexed tables and perhaps fewer than ~25 rows in each.
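
For a sense of the shape, here is roughly what that query might look like in Ecto. The table names, fields, and Repo module are illustrative rather than Vetspire's actual schema:

```elixir
defmodule Vetspire.PatientHistory do
  # Hypothetical derivation query; every name here is illustrative.
  import Ecto.Query

  def current_history(patient_id, org_id) do
    from(i in "immunizations",
      join: pp in "protocol_products", on: pp.product_id == i.product_id,
      join: p in "protocols", on: p.id == pp.protocol_id and p.org_id == ^org_id,
      where: i.patient_id == ^patient_id,
      # DISTINCT ON (protocol) ordered by given_at DESC: the most
      # recent instance of each protocol takes precedence.
      distinct: p.id,
      order_by: [desc: i.given_at],
      select: %{protocol_id: p.id, given_at: i.given_at, due_at: i.due_at}
    )
    |> Repo.all()
  end
end
```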

To send out medical reminders, however, we have to do the following (sketched in code after this list):

  1. Get every patient's up-to-date medical history

  2. Cross reference each triggered protocol against a list of configured cadences

  3. Determine if a cadence has been triggered for a protocol in the past, and if not

  4. Batch reminders that would be sent on subsequent days, and reminders that would be sent to the same client, and

  5. Send either SMS/Emails/Postcards to said clients.
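
Sketched as one pipeline for intuition only; every helper below is hypothetical, and in reality each step runs as a discrete stage (see Derivation Process):

```elixir
# Pseudocode-level sketch: each hypothetical helper maps onto one
# numbered step above.
def send_reminders(org_id) do
  org_id
  |> list_patients()                     # 1. every patient's up-to-date
  |> Enum.flat_map(&derive_history/1)    #    medical history
  |> Enum.flat_map(&match_cadences/1)    # 2. cross-reference cadences
  |> Enum.reject(&already_triggered?/1)  # 3. skip already-sent reminders
  |> batch_by_client_and_day()           # 4. batch per client / day
  |> Enum.each(&deliver/1)               # 5. SMS / email / postcard
end
```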

This is considerably more difficult, not necessarily due to the logic, but due to the scale of data. Our largest organizations have extremely complex protocols and cadences, hundreds of thousands of patients spanning back decades, and medical histories per patient spanning several years of activity.

This latter requirement has been a constant source of issues for clinics, and has been iterated on and effectively completely rewritten three times!

Code

Protocols is built as a separate child app under the top-level :vetspire umbrella, and is referenced as :protocols throughout the codebase where appropriate. See the code under apps/protocols/lib for more details.

Building Protocols as a separate application allows us to deploy it separately from our standard user-serving application, reducing load on our main API pods. Due to the nature of the work Protocols was designed to do, it has very high memory requirements; due to the requirements above, it also has soft-realtime requirements.

Protocols communicates with our API pods via a simple HTTP-RPC bridge. In future, we may look into solutions such as GraphQL Federation. See VSIntegrations.Protocols and Protocols.API for the details of exactly how this is done.
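
For flavour, a call across the bridge might look something like the sketch below; the route, payload shape, and use of Req are assumptions, so check VSIntegrations.Protocols and Protocols.API for the real contract:

```elixir
# Illustrative only: the endpoint, env var, and payload are hypothetical.
def derive_for_patient(patient_id) do
  url = System.fetch_env!("PROTOCOLS_URL") <> "/rpc/derive"

  Req.post!(url, json: %{scope: %{patient_id: patient_id}}).body
end
```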

A tradeoff we made when building Protocols is that, in order to satisfy two very distinct requirements, it's built in a very specific, inflexible way, and heavily uses metaprogramming and macros to prevent engineers from making mistakes that could bring down the engine.

As a result of this tradeoff, we have very specific semantics and terminology for any protocols derivation, which are described below. Some of the very cool benefits of this tradeoff are:

  • Protocols is linearly scalable for any given location.

    • The Protocols Engine has successfully run for some very large locations for our largest clients.

    • These locations have history spanning decades, and we assume that all future locations will at most reach this scale of data.

    • A "Location" is our base level of sharding, and thus no matter how many locations you throw into the engine, we can be very confident that derivation will succeed.

    • More locations simply equate to more jobs in the queue, which can be scaled further by scaling our pods or database connections (see the sketch after this list).

  • Protocols derivations are entirely stateless, which means that in a worst-case scenario we can reset the state of the engine by simply restarting it or retrying any failed work.

  • Protocols can be regenerated lazily, and a breakage in one requirement does not affect the other: issues breaking the UI won't manifest as breakages in reminder sending, or vice versa.
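
A hedged sketch of what location-level sharding implies operationally: one derivation job per location, so throughput scales with pods and database connections. All module names here are hypothetical:

```elixir
# Hypothetical fan-out: every location becomes one Oban job.
def enqueue_nightly_derivations(org_id) do
  org_id
  |> Locations.list_location_ids()
  |> Enum.map(&Protocols.Workers.DeriveLocation.new(%{location_id: &1}))
  |> Oban.insert_all()
end
```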

Derivation Process

The Protocols Engine can be seen as a MapReduce framework.

Before any derivation is executed, users are expected to give us a Scope, which the Protocols Engine uses to narrow a derivation down to a limited subset of our production data.

To serve the requirement where we need to send reminders to every client at an org, we usually start one derivation process per location, creating a Scope with its location_id set accordingly.

To serve the soft-realtime requirement where we need to view a patient's up-to-date medical history, we create a Scope for a given patient_id.
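
To make the two cases concrete, here is a hypothetical shape for a Scope; the real struct's module name and fields may differ:

```elixir
# Hypothetical struct; the real Scope may carry more fields.
defmodule Protocols.Scope do
  defstruct [:org_id, :location_id, :patient_id]
end

# Reminder sending: one derivation per location at the org.
location_scope = %Protocols.Scope{org_id: 1, location_id: 42}

# Soft-realtime UI: derive for a single patient.
patient_scope = %Protocols.Scope{org_id: 1, patient_id: 9_001}
```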

Given a Scope, we then run several different MapReduce stages on any data matching that scope. The idea is that while Protocols needs to process hundreds of thousands of patients, their orders, their immunizations, etc., only a very small subset of those needs to be considered by later stages of the derivation process.

Each stage in our process is a module that uses Protocols.Stage and is itself an Ecto.Schema. Each stage is designed to run, list, process, and generate data for a specific scope, writing its results to its own dedicated Postgres table. The results of a stage (which by definition should be much smaller than that stage's input) can then be consumed, mapped, and reduced by subsequent stages, until a terminal stage has the side effect of triggering SMSs, emails, or postcards.
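
A hedged sketch of what a stage might look like, spelling out by hand what the `use Protocols.Stage` macro presumably wires up; the schema fields, inputs/0, and run/1 entrypoint are assumptions:

```elixir
defmodule Protocols.Stage.MostRecentImmunizations do
  # Hypothetical stage: an Ecto schema over the stage's own results
  # table, plus declared inputs and a run/1 over a scope.
  use Ecto.Schema

  schema "stage_most_recent_immunizations" do
    field :patient_id, :integer
    field :product_id, :integer
    field :given_at, :utc_datetime
  end

  # Declared inputs drive the engine's dependency graph (see below);
  # a root stage like this one reads production data directly.
  def inputs, do: []

  def run(_scope) do
    # 1. list immunizations matching the scope
    # 2. reduce to the most recent row per product
    # 3. insert the (much smaller) result set into this stage's table
    :ok
  end
end
```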

Dependency Graph

Each stage must explicitly define its inputs: other stages that are consumed by that stage. The Protocols Engine then builds a DAG (Directed Acyclic Graph) of dependencies for all configured stages. Stages with no interdependencies can be run asynchronously and awaited before being consumed by other stages of a derivation.

Once all prerequisite stages for a given requested stage have been computed, that stage's results are returned to the caller.

For the soft-realtime use case, where we need an up-to-date list of a patient's immunization/order history plus due dates, we run our DAG up to the Protocols.Stage.ProductProtocols stage, whose prerequisites (Protocols.Stage.MostRecentOrderItems, Protocols.Stage.MostRecentImmunizations, etc.) can all run in parallel. In this case, the Protocols Engine can be configured to execute via Task.async and Task.await workers, meaning callers of the Protocols Engine feel like they're just calling a synchronous function!
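
A minimal sketch of that synchronous path, assuming a run/1 entrypoint on each stage; the orchestration details are guesses, while the stage names come from the paragraph above:

```elixir
# Hypothetical orchestration: independent prerequisite stages fan out
# as Tasks and are awaited, then the requested stage runs, so the
# caller experiences a single synchronous function call.
def derive_product_protocols(scope) do
  [Protocols.Stage.MostRecentOrderItems, Protocols.Stage.MostRecentImmunizations]
  |> Enum.map(fn stage -> Task.async(fn -> stage.run(scope) end) end)
  |> Task.await_many(:timer.seconds(30))

  Protocols.Stage.ProductProtocols.run(scope)
end
```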

Protocols Engine

For the daily reminder-sending use case, we run our DAG up to the stages that send reminders as a side effect of derivation. This use case is configured to use Oban as its task runner, for added resiliency, retries, message-sending guarantees, backpressure, etc.

Currently, there is only one true terminal stage, but since each stage is extremely isolated outside of its explicitly named dependencies, and the Protocols.Stage pattern is highly parallelizable and generic, other terminal stages could easily be written in future to let Vetspire send appointment reminders, birthday reminders, or practically anything else in response to a date or timestamp.

Ownership

The Protocols Engine was originally started by Chris Bailey (@vereis) and has since been maintained by Bill Kontos (@william-ko). Both should be deemed the current domain experts.
