Saga distributed transaction pattern

Overview

The Saga design pattern is a way to manage data consistency across microservices in distributed transaction scenarios.

A saga is a sequence of transactions that updates each service and publishes a message or event to trigger the next transaction step. If a step fails, the saga executes compensating transactions that counteract the preceding transactions.

Context and problem

Transactions must be atomic, consistent, isolated, and durable (ACID). Cross-service data consistency requires a cross-service transaction management strategy.

In multi-service architectures:

Atomicity is an indivisible and irreducible set of operations that must all occur or none occur.
Consistency means that a transaction brings the data only from one valid state to another valid state.
Isolation guarantees that concurrent transactions produce the same data state that sequentially executed transactions would have produced.
Durability ensures that committed transactions remain committed even in case of system failure.

Database-per-microservice

A database-per-microservice model provides many benefits for microservices architectures. Encapsulating domain data lets each service use its best data store type and schema, scale its own data store as necessary, and be insulated from other services’ failures. However, ensuring data consistency across service-specific databases poses challenges.

Two-phase commit (2PC)

Distributed transactions like the two-phase commit (2PC) protocol require all participants in a transaction to commit or roll back before the transaction can proceed. However, some participant implementations, such as NoSQL databases and message brokers, don’t support this model.
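To make the all-or-nothing requirement concrete, the following is a minimal in-memory sketch of a two-phase commit coordinator. The Participant class, its can_commit flag, and the two_phase_commit function are hypothetical names introduced for illustration. A real implementation would also need durable logs and timeout handling, which are omitted here.

```python
class Participant:
    """Hypothetical in-memory participant in a two-phase commit."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self):
        # Phase 1: vote yes only if the work can be committed.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled_back"


def two_phase_commit(participants):
    """Commit everywhere only if every participant votes yes."""
    votes = [p.prepare() for p in participants]
    if all(votes):
        # Phase 2 (commit): every participant voted yes.
        for p in participants:
            p.commit()
        return True
    # Phase 2 (abort): a single "no" vote rolls back everyone.
    for p in participants:
        p.rollback()
    return False
```

In this sketch, a single participant that cannot prepare forces every other participant to roll back, which is why a participant that doesn’t support the protocol can’t take part.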

Interprocess communication (IPC)

Operating system-provided IPC allows separate processes to share data but has limitations due to synchronicity and availability. For distributed transactions to commit, all participating services must be available, potentially reducing overall system availability.

Solution

The Saga pattern provides transaction management using a sequence of local transactions.

A local transaction is the atomic work effort performed by a saga participant. Each local transaction updates the database and publishes a message or event to trigger the next local transaction in the saga. If a local transaction fails, the saga executes a series of compensating transactions that undo the changes that were made by the preceding local transactions.
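The sequence described above can be sketched as follows. This is a simplified, single-process illustration: the steps call each other directly instead of publishing messages, and SagaStep, run_saga, and the order-related functions are hypothetical names, not part of any library.

```python
class SagaStep:
    """A local transaction paired with the transaction that undoes it."""

    def __init__(self, name, action, compensation):
        self.name = name
        self.action = action
        self.compensation = compensation


def run_saga(steps, context):
    """Run each local transaction in order; on failure, compensate in reverse."""
    completed = []
    for step in steps:
        try:
            step.action(context)
            completed.append(step)
        except Exception as error:
            context["log"].append(f"{step.name} failed: {error}")
            # Undo the preceding local transactions, most recent first.
            for done in reversed(completed):
                done.compensation(context)
            return False
    return True


def reserve_stock(ctx):
    ctx["stock"] -= 1
    ctx["log"].append("stock reserved")


def release_stock(ctx):
    ctx["stock"] += 1
    ctx["log"].append("stock released")


def charge_payment(ctx):
    if ctx["balance"] < ctx["price"]:
        raise ValueError("insufficient funds")
    ctx["balance"] -= ctx["price"]
    ctx["log"].append("payment charged")


def refund_payment(ctx):
    ctx["balance"] += ctx["price"]
    ctx["log"].append("payment refunded")


ORDER_SAGA = [
    SagaStep("reserve_stock", reserve_stock, release_stock),
    SagaStep("charge_payment", charge_payment, refund_payment),
]
```

Calling run_saga(ORDER_SAGA, ctx) with a balance lower than the price reserves the stock, fails at the payment step, and then releases the stock again, so the context ends in its starting state.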

When to use

Use the Saga pattern when you need to:

Ensure data consistency in a distributed system without tight coupling.
Roll back or compensate if one of the operations in the sequence fails.

The Saga pattern is less suitable for:

Tightly coupled transactions.
Compensating transactions that occur in earlier participants.
Cyclic dependencies.

Issues and considerations

Common concerns

Consider the following points when implementing the Saga pattern:

It requires a new way of thinking about how to coordinate a transaction and maintain data consistency for a business process spanning multiple microservices.
It is particularly hard to debug, and the complexity grows as the number of participants increases.
Data can’t be rolled back, because saga participants commit changes to their local databases.
The implementation must be capable of handling a set of potential transient failures, and provide idempotence for reducing side-effects and ensuring data consistency.
It’s best to implement observability to monitor and track the saga workflow.
The lack of participant data isolation imposes durability challenges. The saga implementation must include countermeasures to reduce anomalies.
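The idempotence requirement in the list above can be illustrated with a participant that deduplicates messages by ID. This is a minimal sketch under the assumption that every message carries a unique identifier. PaymentService, handle_charge, and the in-memory processed dictionary are hypothetical. A real service would persist the processed IDs in the same local transaction as the balance update.

```python
class PaymentService:
    """Hypothetical participant that deduplicates messages by ID."""

    def __init__(self, balance):
        self.balance = balance
        self.processed = {}  # message_id -> result of the first delivery

    def handle_charge(self, message_id, amount):
        # A redelivered message returns the stored result instead of charging again.
        if message_id in self.processed:
            return self.processed[message_id]
        if amount > self.balance:
            result = "rejected"
        else:
            self.balance -= amount
            result = "charged"
        self.processed[message_id] = result
        return result
```

With this approach, a retry after a transient failure can redeliver the same message without charging the customer twice.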

Anomalies

The following anomalies can happen without proper measures:

Lost updates: When one saga writes without reading changes made by another saga.
Dirty reads: When a transaction or a saga reads updates made by a saga that has not yet completed those updates.
Fuzzy/nonrepeatable reads: When different saga steps read different data because a data update occurs between the reads.
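As an illustration of the first anomaly, the following sketch interleaves two hypothetical sagas by hand to show how a lost update can arise. The account dictionary and the amounts are made up for the example.

```python
def lost_update_demo():
    """Interleave two sagas so that the second overwrites the first."""
    account = {"balance": 100}
    # Both sagas read the same starting value.
    saga_a_read = account["balance"]
    saga_b_read = account["balance"]
    # Saga A writes its result (deposit of 50).
    account["balance"] = saga_a_read + 50
    # Saga B writes a value computed from its stale read (withdrawal of 30),
    # so Saga A's deposit is silently overwritten.
    account["balance"] = saga_b_read - 30
    return account["balance"]
```

Running the two sagas one after the other would leave a balance of 120, but this interleaving leaves 70.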

Countermeasures

Suggested countermeasures to reduce or prevent anomalies include:

Semantic lock: An application-level lock where a saga’s compensable transaction uses a semaphore to indicate an update is in progress.
Commutative updates: They can be executed in any order and produce the same result.
Pessimistic view: It’s possible for one saga to read dirty data while another saga is running a compensating transaction to roll back the operation. Pessimistic view reorders the saga so the underlying data updates in a retryable transaction, which eliminates the possibility of a dirty read.
Reread value: Verifies that data is unchanged, and then updates the record. If the record has changed, the steps abort and the saga may restart.
Version file: Records the operations on a record as they arrive, and then executes them in the correct order.
By value: Uses each request’s business risk to dynamically select the concurrency mechanism. Low-risk requests favor sagas, while high-risk requests favor distributed transactions.
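One way to sketch the reread value countermeasure from the list above is a version check before each write. The version counter is an assumption made for this example, since the text only requires verifying that the data is unchanged. Record, read, update_if_unchanged, and StaleRecordError are hypothetical names.

```python
class StaleRecordError(Exception):
    """Raised when a record changed between the read and the update."""


class Record:
    def __init__(self, value):
        self.value = value
        self.version = 0  # incremented on every successful update


def read(record):
    """Return the value together with the version it was read at."""
    return record.value, record.version


def update_if_unchanged(record, expected_version, new_value):
    # Reread: abort when another saga changed the record since our read.
    if record.version != expected_version:
        raise StaleRecordError("record changed since it was read")
    record.value = new_value
    record.version += 1
```

A saga step that catches StaleRecordError can abort and let the saga restart from a fresh read, which prevents a write based on stale data.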

Transaction types

In Saga patterns:

Compensable transactions: Transactions that can potentially be reversed by processing another transaction with the opposite effect.
Pivot transactions: The go/no-go point in a saga. If the pivot transaction commits, the saga runs until completion. A pivot transaction can be a transaction that is neither compensable nor retryable, or it can be the last compensable transaction or the first retryable transaction in the saga.
Retryable transactions: Transactions that follow the pivot transaction and are guaranteed to succeed.
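The three transaction types can be combined in one runner, sketched below. This is an illustration under simplifying assumptions: the pivot is treated as atomic, and retryable steps are retried a bounded number of times (max_attempts), whereas a real system would keep retrying until they succeed. The function run_typed_saga and its parameters are hypothetical.

```python
def run_typed_saga(compensable, pivot, retryable, max_attempts=3):
    """Run compensable steps, then the pivot, then retryable steps.

    compensable: list of (action, compensation) pairs of callables.
    pivot: callable that acts as the go/no-go point.
    retryable: list of callables that are retried on failure.
    """
    done = []
    try:
        for action, compensation in compensable:
            action()
            done.append(compensation)
        pivot()
    except Exception:
        # Failure before or at the pivot: undo compensable work in reverse order.
        for compensation in reversed(done):
            compensation()
        return "compensated"
    # After the pivot commits, the saga only moves forward.
    for action in retryable:
        for attempt in range(max_attempts):
            try:
                action()
                break
            except Exception:
                if attempt == max_attempts - 1:
                    raise
    return "completed"
```

Note that nothing is compensated once the pivot has committed. Failures after that point are handled only by retrying.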

Implementation

There are two common saga implementation approaches, choreography and orchestration. Each one has its own set of challenges and technologies to coordinate the workflow.

Choreography

Choreography is a way to coordinate sagas where participants exchange events without a centralized point of control.

With choreography, each local transaction publishes domain events that trigger local transactions in other services.
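A minimal sketch of choreography follows, using an in-memory event bus as a stand-in for a message broker. EventBus, wire_services, and the event names OrderCreated, StockReserved, and StockRejected are hypothetical and chosen only for illustration.

```python
from collections import defaultdict, deque


class EventBus:
    """Hypothetical in-memory stand-in for a message broker."""

    def __init__(self):
        self.handlers = defaultdict(list)
        self.queue = deque()
        self.history = []

    def subscribe(self, event_type, handler):
        self.handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        self.queue.append((event_type, payload))

    def run(self):
        # Deliver events in publication order until none remain.
        while self.queue:
            event_type, payload = self.queue.popleft()
            self.history.append(event_type)
            for handler in self.handlers[event_type]:
                handler(payload)


def wire_services(bus, stock):
    """Subscribe each service's handlers; no component sees the whole workflow."""

    def on_order_created(order):
        # Inventory service: local transaction, then publish the outcome.
        if stock["items"] >= order["quantity"]:
            stock["items"] -= order["quantity"]
            bus.publish("StockReserved", order)
        else:
            bus.publish("StockRejected", order)

    def on_stock_reserved(order):
        # Order service: final local transaction.
        order["status"] = "confirmed"

    def on_stock_rejected(order):
        # Order service: compensating transaction.
        order["status"] = "cancelled"

    bus.subscribe("OrderCreated", on_order_created)
    bus.subscribe("StockReserved", on_stock_reserved)
    bus.subscribe("StockRejected", on_stock_rejected)
```

The workflow exists only as the set of subscriptions: each handler reacts to one event and publishes the next, and no single component holds the full sequence.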

Choreography - Benefits

Good for simple workflows that require few participants and don’t need coordination logic.
Doesn’t require additional service implementation and maintenance.
Doesn’t introduce a single point of failure, since the responsibilities are distributed across the saga participants.

Choreography - Drawbacks

Workflow can become confusing when adding new steps, as it’s difficult to track which saga participants listen to which commands.
There’s a risk of cyclic dependency between saga participants.
Integration testing is difficult because all services must be running to simulate a transaction.

Orchestration

Orchestration is a way to coordinate sagas where a centralized controller tells the saga participants what local transactions to execute. The saga orchestrator handles all the transactions and tells the participants which operation to perform based on events.

The orchestrator executes saga requests, stores and interprets the states of each task, and handles failure recovery with compensating transactions.
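The orchestrator's responsibilities can be sketched as a small class that runs the steps, records the state of each task, and compensates on failure. The Orchestrator class, its state labels, and the step tuple layout are hypothetical. A production orchestrator would persist its state and talk to participants through messages instead of direct calls.

```python
class Orchestrator:
    """Hypothetical central coordinator that records each task's state."""

    def __init__(self, steps):
        # steps: list of (name, action, compensation) tuples.
        self.steps = steps
        self.state = {name: "pending" for name, _, _ in steps}

    def execute(self, order):
        completed = []
        for name, action, compensation in self.steps:
            try:
                action(order)
                self.state[name] = "succeeded"
                completed.append((name, compensation))
            except Exception:
                self.state[name] = "failed"
                self._compensate(completed, order)
                return False
        return True

    def _compensate(self, completed, order):
        # Failure recovery: undo succeeded tasks, most recent first.
        for name, compensation in reversed(completed):
            compensation(order)
            self.state[name] = "compensated"
```

After a failed run, the state dictionary shows which tasks were compensated, which one failed, and which were never started. Because the orchestrator holds this state centrally, the workflow is easier to inspect than with choreography.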

Orchestration - Benefits

Good for complex workflows involving many participants or new participants added over time.
Suitable when there is control over every participant in the process, and control over the flow of activities.
Doesn’t introduce cyclical dependencies, because the orchestrator unilaterally depends on the saga participants.
Clear separation of concerns simplifies business logic. Saga participants don’t need to know about commands for other participants.

Orchestration - Drawbacks

Additional design complexity, because the coordination logic must be implemented.
There’s an additional point of failure because the orchestrator manages the complete workflow.

Software Architect Handbook