In this post we’ll discuss different integration patterns for synchronizing information between microservices. One of the advantages of more traditional architectures, where all the application’s data is stored in one database, is that achieving data consistency is extremely simple. In a microservices architecture, where each service owns only a part of all the application’s data, making sure that updates are propagated between services can be challenging. Let’s look at the example below.
Let’s assume that we are responsible for building an application for managing a hospital’s day-to-day operations. Among the many things that our application will have to do, we’ll focus on:
- Managing patients and their medical records
- Managing the morgue’s operations
- Managing patient appointments
Now, let’s say a patient dies. What should our application do? It should certainly update the patient’s record. It should also register the patient in the morgue. It also makes sense to cancel any future appointments that the patient may have had.
In a monolithic architecture that uses a single relational database for the whole application, this would be straightforward. Within the same process and transaction, we can update the patient’s record, mark any existing appointments as canceled and insert the patient’s record into the corresponding morgue table. If an error occurs, we can roll back the transaction.
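Here is a minimal sketch of the monolithic case, using an in-memory SQLite database. The table and column names are illustrative, not from any real hospital schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patients     (id INTEGER PRIMARY KEY, name TEXT, deceased INTEGER DEFAULT 0);
    CREATE TABLE appointments (id INTEGER PRIMARY KEY, patient_id INTEGER, status TEXT);
    CREATE TABLE morgue       (id INTEGER PRIMARY KEY, patient_id INTEGER);
    INSERT INTO patients     VALUES (1, 'John Doe', 0);
    INSERT INTO appointments VALUES (10, 1, 'scheduled');
""")

def register_death(conn, patient_id):
    """All three updates succeed or fail together in one transaction."""
    with conn:  # commits on success, rolls back on any exception
        conn.execute("UPDATE patients SET deceased = 1 WHERE id = ?", (patient_id,))
        conn.execute("UPDATE appointments SET status = 'canceled' WHERE patient_id = ?", (patient_id,))
        conn.execute("INSERT INTO morgue (patient_id) VALUES (?)", (patient_id,))

register_death(conn, 1)
```

Because all three writes live in one transaction, a failure in any of them leaves the database exactly as it was before.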
On the other hand, in a distributed microservices architecture, it can start to get complicated. Let’s assume that we have the following services:
- A patients service, which manages patients and their medical records
- A morgue service, which manages the morgue’s operations
- An appointments service, which manages patient appointments
Remember that each service manages its own data, so that when the Patients Service is updated to mark that a patient has died, it not only has to update its own database, but it also has to make sure that the other relevant services are also notified. This is critical for maintaining consistency in the application. But which is the best way for a service to notify other interested services of events or changes?
Database sharing is one of the easiest and fastest ways of sharing data across multiple services. In our case, we could create a new service whose sole responsibility is to keep the three databases in sync. The service could run a batch job every so often, fetching from the patients database the dead patients that have gone unprocessed since its last run. Then, for each relevant record, it would update the other two services’ databases.
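A hedged sketch of what that batch job might look like, with three in-memory SQLite connections standing in for the three services’ databases (schemas and column names are made up for the example):

```python
import sqlite3

patients_db = sqlite3.connect(":memory:")
patients_db.execute("CREATE TABLE patients (id INTEGER PRIMARY KEY, deceased INTEGER, synced INTEGER DEFAULT 0)")
patients_db.execute("INSERT INTO patients VALUES (1, 1, 0), (2, 0, 0)")

morgue_db = sqlite3.connect(":memory:")
morgue_db.execute("CREATE TABLE morgue (patient_id INTEGER)")

appointments_db = sqlite3.connect(":memory:")
appointments_db.execute("CREATE TABLE appointments (patient_id INTEGER, status TEXT)")
appointments_db.execute("INSERT INTO appointments VALUES (1, 'scheduled')")

def run_sync_batch():
    """Fetch dead patients not yet synced and push them into the other databases."""
    rows = patients_db.execute(
        "SELECT id FROM patients WHERE deceased = 1 AND synced = 0").fetchall()
    for (patient_id,) in rows:
        morgue_db.execute("INSERT INTO morgue VALUES (?)", (patient_id,))
        appointments_db.execute(
            "UPDATE appointments SET status = 'canceled' WHERE patient_id = ?", (patient_id,))
        patients_db.execute("UPDATE patients SET synced = 1 WHERE id = ?", (patient_id,))

run_sync_batch()
```

Notice that the sync job must know all three schemas intimately, which is exactly the coupling problem discussed next.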
You’ll notice that the new sync service has direct access to the three databases. This means that changes to any of the three services’ database schemas will also affect the new sync service. One of the fundamentals of microservices architectures is that each service/team owns its own data. This allows for development teams to make internal changes to their services without affecting any other services as well as enabling every team to deploy their services independently, with minimum coordination from other teams. By allowing other services to access their database, we’re increasing the coupling between services and teams and introducing friction into the release process.
But what would happen to the other services if the shared tables’ schemas change? And who is responsible for those tables now anyway?
These two factors alone are enough to reject this solution: as a general rule, we want to avoid designs in which multiple services access the same database, especially when those services are owned by different teams. A variation you might consider is having the Patients Service update the other services’ databases directly.
This would eliminate the delay caused by updates being performed offline and in batches. However, we would still have the same database coupling problem as before.
Whatever integration strategy we choose for our microservices, we need to make sure that it respects services boundaries.
Probably the most intuitive way of integrating our services would be using REST. Both the Morgue and Appointments Services can define a dedicated REST endpoint that other services can use to notify them of the death of a patient. The Patients Service will then call these two endpoints when it processes a new dead patient.
The advantage of this approach is clear: the implementation details of each service remain private. As long as the REST interface is not changed, each team can make any internal change it needs to its service with minimum coordination with other teams. However, this approach doesn’t come without its own set of problems.
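A sketch of the notification from the Patients Service’s side. The endpoint paths are hypothetical, and the HTTP call is injected as a parameter (in production it might be `requests.post`) so the function can be exercised without a network:

```python
def notify_patient_death(post, patient_id):
    """Synchronously notify the other services via their (assumed) REST endpoints."""
    for url in (
        f"http://morgue-service/patients/{patient_id}/death",
        f"http://appointments-service/patients/{patient_id}/death",
    ):
        response = post(url)         # blocks until the remote service answers
        response.raise_for_status()  # ...and what do we do if this fails?
```

The `raise_for_status()` line is where the trouble starts: the Patients Service now has to decide what a failed notification means for its own transaction.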
As you probably know, REST works synchronously. There are two major downsides to using synchronous communication:
We’ll have a problem if either the Appointments or Morgue Service happens to be unavailable when the Patients Service sends them a request. What should the Patients Service do? Rollback all the changes? Try again? If that doesn’t work, then what?
The more external synchronous calls our service makes, the more dependencies it has and the less resilient it becomes.
Every time a service calls another service synchronously, it blocks until it receives a response. In our case, the Patients Service will have to wait for both the Appointments and Morgue Services to complete their requests before it can return its own response to the calling service or client. This might not seem like much, but this extra latency can add up if you’re not careful.
Also beware of cascading effects: when there is a long chain of services communicating with each other via REST, one slow-responding service can slow down all the other services up the chain.
Another issue with this approach is that the calling service has to be aware of every external service that needs to be notified when an interesting event happens. We’ll need to modify the Patients Service every time a new service wants to be notified of a dead patient.
We can easily decouple the caller and callee services by putting a message broker in the middle. Every time it processes a new dead patient, the Patients Service will publish an event to the message broker. On the other side, the Appointments and Morgue Service will have subscribed to the message broker to receive events about patients updates. Each service will receive and process those events independently and at its own pace.
The key concepts here are:
- The message broker will store the messages safely until they are processed. This means that the Patients Service will be able to process new dead patients even if the Morgue or Appointments Services are unavailable. If they happen to be unavailable, messages will accumulate in the message broker and will be processed once the services are back up
- The Patients Service doesn’t need to know about the Appointments and Morgue Services, as it only communicates with the message broker. What’s more, we can add and remove subscribed services just by reconfiguring the message broker, without affecting the Patients Service. The message broker will receive each published message and deliver it to all subscribed services. This pattern is called fan-out
- Events are being processed asynchronously now, which means that the Patients Service can return a response as soon as it has published the event to the message broker. Other slow-working services won’t affect our service’s response time
- Subscribed services can process events at their own pace. If the Publisher Service sends more events than a subscribed service can process, the message broker will act as a buffer, isolating the two services.
Note that we’re able to use asynchronous communication only because our Patients Service doesn’t need a response from the other two services. As part of the decoupling that we get by using a message broker, the Publisher Service won’t know about the Subscribed Services (and vice-versa). While this brings about several advantages, like we just saw, it also means that the Subscribed Services cannot send a response back to the Publisher Service. It would also be remiss not to mention here that using asynchronous communication makes testing and troubleshooting more difficult.
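The concepts above can be illustrated with a toy in-memory broker; a real system would use something like RabbitMQ, Kafka or SNS/SQS, and all the names here are made up for the example:

```python
from collections import defaultdict, deque

class Broker:
    """Toy message broker: one buffered queue per subscriber, fan-out on publish."""

    def __init__(self):
        self.queues = defaultdict(deque)

    def subscribe(self, subscriber):
        self.queues[subscriber]  # create the subscriber's queue

    def publish(self, event):
        # Fan-out: every subscriber gets its own copy of the event.
        for queue in self.queues.values():
            queue.append(event)

    def poll(self, subscriber):
        """Each subscriber consumes at its own pace; events wait in its queue."""
        q = self.queues[subscriber]
        return q.popleft() if q else None

broker = Broker()
broker.subscribe("morgue")
broker.subscribe("appointments")

# The Patients Service only talks to the broker; it can return immediately
# after publishing, even if the Morgue Service happens to be down right now.
broker.publish({"type": "patient_died", "patient_id": 1})
```

Note how the publisher never names its subscribers: adding a fourth interested service is purely a broker-side change.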
Another possible pattern that we haven’t considered here is using files for sharing data between services. If you have experience with ETL processes, you’ll probably find this approach familiar. Every X amount of time, the Patients Service could write files containing the relevant events to a common directory (FTP, S3, HDFS, etc…). On the other side, the Morgue and Appointments Services can poll the directory for new files to process.
Since data is processed in batches, this approach has a considerably larger delay than the others. In addition, the implementation of this pattern can be quite tricky. If you’re thinking of using file transfer, perhaps because you’ve used it in ETL processes before, consider using a message broker instead.
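For completeness, here is a sketch of the file-based pattern, with a local temporary directory standing in for FTP/S3/HDFS; the file naming and event format are invented for the example:

```python
import json, os, tempfile

drop_dir = tempfile.mkdtemp()  # stand-in for the shared directory

def write_batch(events, batch_id):
    """Producer side: the Patients Service periodically writes a batch file."""
    path = os.path.join(drop_dir, f"deaths-{batch_id}.json")
    with open(path, "w") as f:
        json.dump(events, f)

def poll_new_files(processed):
    """Consumer side: pick up only the files we haven't processed yet."""
    events = []
    for name in sorted(os.listdir(drop_dir)):
        if name not in processed:
            with open(os.path.join(drop_dir, name)) as f:
                events.extend(json.load(f))
            processed.add(name)
    return events

write_batch([{"patient_id": 1}], batch_id=1)
seen = set()
events = poll_new_files(seen)
```

Even this tiny sketch hints at the tricky parts: tracking which files have been processed, handling partially written files, and coping with the inherent batching delay.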
We’ve only discussed a few of the many possible integration patterns for microservices architectures. The key takeaways are:
- Don’t share databases between services. Even if it’s the easiest method to implement, it will end up costing you more in the future
- If your cross-service requests don’t expect a response, it is better to use asynchronous communication
- If you’re using synchronous protocols, like REST or RPC, beware of the latency and coupling that they create. Use patterns like timeouts, circuit-breakers and retries to reduce the damage that a failing external service might cause to your service.
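The retry part of that last point can be sketched in a few lines; a real service would likely reach for a library such as tenacity or a proper circuit breaker, and the parameters here are illustrative:

```python
import time

def call_with_retries(fn, attempts=3, backoff=0.01):
    """Retry a failing synchronous call a few times before giving up."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: let the caller decide what to do
            time.sleep(backoff * (2 ** attempt))  # exponential backoff
```

Combined with a request timeout, this keeps a briefly unavailable downstream service from immediately failing every request that depends on it, while the final re-raise still surfaces persistent outages.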
As always, let me know if you have any comments or questions.
For a more thorough explanation of integration patterns, check out the book Enterprise Integration Patterns by Gregor Hohpe and Bobby Woolf.