Architecture · 11 min read

Event-Driven Microservices That Don't Fall Apart: CQRS, Saga Patterns, and the Exactly-Once Myth

Event-driven architecture promises loose coupling and scalability. But in production, it introduces failure modes that most teams aren't prepared for. Here's how to build event-driven systems that actually work.

Event-driven microservices are having a moment. Every architecture blog, conference talk, and system design interview revolves around Kafka, event sourcing, CQRS, and saga patterns. The appeal is real: loose coupling between services, natural scalability, and a clean separation of concerns that makes independent deployments possible.

But there's a gap between event-driven architecture as described in blog posts and event-driven architecture as experienced in production at 3 AM when messages are being processed out of order, a consumer is stuck in an infinite retry loop, and your saga has left three services in an inconsistent state.

This post covers the patterns, tradeoffs, and failure modes of event-driven microservices — with a focus on what actually goes wrong and how to prevent it.

Event-Driven vs. Event Sourcing: They're Different

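Before going further, let's clear up a common confusion. Event-driven architecture and event sourcing are related but distinct concepts, and conflating them leads to poor design decisions. Event-driven architecture is about how services communicate: they publish events and react to events from other services. Event sourcing is about how a single service stores its state: as an append-only log of events rather than as the latest snapshot. You can adopt either one without the other.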
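To make the event sourcing half concrete, here is a minimal sketch with no messaging involved at all: current state is never stored directly and is derived by replaying an append-only log. The account example and the event names `Deposited` and `Withdrawn` are illustrative assumptions, not part of any framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str    # hypothetical event names: "Deposited" or "Withdrawn"
    amount: int  # amount in cents


def apply(balance, event):
    """Pure transition function: old state + event -> new state."""
    if event.kind == "Deposited":
        return balance + event.amount
    if event.kind == "Withdrawn":
        return balance - event.amount
    return balance  # unknown event kinds leave the state unchanged


def replay(log):
    """Rebuild the current balance by folding over the whole event log."""
    balance = 0
    for event in log:
        balance = apply(balance, event)
    return balance


# The log is the source of truth; the balance is only a derived value.
log = [Event("Deposited", 100), Event("Withdrawn", 30), Event("Deposited", 5)]
current_balance = replay(log)
```

Nothing here publishes to a broker or notifies another service, which is the point: a service can be event-sourced internally without the wider system being event-driven, and the reverse is equally true.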

CQRS: Separating Reads from Writes

Command Query Responsibility Segregation (CQRS) splits your data model into two: a write model optimized for processing commands (creating orders, updating profiles, processing payments) and a read model optimized for queries (listing orders, searching products, generating reports).

The write model handles business logic and maintains consistency. The read model is denormalized, fast, and optimized for specific query patterns. Events flow from the write side to the read side, keeping the read models eventually consistent.
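A rough in-memory sketch of that split might look like the following. The class names, the `OrderPlaced` event, and the list standing in for a message broker are all assumptions made for illustration; a real system would put a durable log or queue between the two sides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderPlaced:
    order_id: str
    customer_id: str
    total_cents: int


class OrderWriteModel:
    """Write side: enforces business rules and records the resulting events."""

    def __init__(self):
        self.orders = {}
        self.outbox = []  # stand-in for a broker or change feed (assumption)

    def place_order(self, order_id, customer_id, total_cents):
        if total_cents <= 0:
            raise ValueError("order total must be positive")
        if order_id in self.orders:
            raise ValueError("duplicate order id")
        self.orders[order_id] = (customer_id, total_cents)
        event = OrderPlaced(order_id, customer_id, total_cents)
        self.outbox.append(event)
        return event


class CustomerSpendReadModel:
    """Read side: denormalized per-customer totals, shaped for one query."""

    def __init__(self):
        self.totals = {}

    def project(self, event):
        previous = self.totals.get(event.customer_id, 0)
        self.totals[event.customer_id] = previous + event.total_cents


write_side = OrderWriteModel()
read_side = CustomerSpendReadModel()

write_side.place_order("o-1", "c-1", 2500)
write_side.place_order("o-2", "c-1", 1500)

# Until the events are delivered, the read model lags behind the write model.
stale_total = read_side.totals.get("c-1", 0)

for event in write_side.outbox:
    read_side.project(event)

fresh_total = read_side.totals["c-1"]
```

The gap between `stale_total` and `fresh_total` is eventual consistency in miniature: any query that arrives before projection completes sees old data, and the rest of the design has to tolerate that.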

When CQRS Makes Sense

When CQRS Is Overkill

The biggest mistake teams make with CQRS is applying it globally to their entire system. CQRS should be applied to specific bounded contexts where the read/write asymmetry justifies the complexity. Most services in your architecture should remain simple CRUD.

The Saga Pattern: Distributed Transactions That Work

In a monolith, a business operation that spans multiple entities wraps everything in a database transaction. Either all changes commit or all roll back. In microservices this option isn't available: each service owns its database, and distributed transactions such as two-phase commit (2PC) hold locks across services while the coordinator waits for every participant, which hurts throughput and availability and couples the services tightly.

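The Saga pattern replaces a single distributed transaction with a sequence of local transactions, each in its own service. If a step fails, the saga runs compensating transactions for the steps that already completed, typically in reverse order. A compensation is a new action that semantically undoes an earlier one (a refund for a charge, a release for a reservation), not a database rollback, so other services may briefly observe the intermediate state.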

Choreography: Event-Based Coordination

In choreography, each service listens for events and decides autonomously what to do next. There's no central coordinator. For example, the Order Service publishes 'OrderCreated'. The Payment Service hears it, processes the payment, and publishes 'PaymentCompleted'. The Inventory Service hears that and reserves stock.
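The order, payment, and inventory flow described here can be sketched with a toy in-memory event bus. The `EventBus` class is a stand-in for a real broker, and the `StockReserved` event name is an assumption added so the last service has something to publish.

```python
from collections import defaultdict, deque


class EventBus:
    """Toy in-memory broker: delivers each event to all of its subscribers."""

    def __init__(self):
        self.handlers = defaultdict(list)
        self.queue = deque()
        self.history = []  # event types in the order they were delivered

    def subscribe(self, event_type, handler):
        self.handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        self.queue.append((event_type, payload))

    def run(self):
        while self.queue:
            event_type, payload = self.queue.popleft()
            self.history.append(event_type)
            for handler in self.handlers[event_type]:
                handler(payload)


bus = EventBus()
reserved_orders = []


def payment_service(order):
    # Reacts to OrderCreated; knows nothing about who consumes its own event.
    bus.publish("PaymentCompleted", order)


def inventory_service(order):
    # Reacts to PaymentCompleted and reserves stock for the order.
    reserved_orders.append(order["order_id"])
    bus.publish("StockReserved", order)  # hypothetical follow-up event


bus.subscribe("OrderCreated", payment_service)
bus.subscribe("PaymentCompleted", inventory_service)

# The Order Service kicks off the flow by publishing a single event.
bus.publish("OrderCreated", {"order_id": "o-1", "amount_cents": 2500})
bus.run()
```

Notice that the overall flow is not written down anywhere. It only exists as the sum of the subscriptions, which is what makes choreography decoupled and also what makes it hard to follow once the number of services grows.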

Orchestration: Centralized Coordination

In orchestration, a central Saga Orchestrator service explicitly manages the saga flow. It sends commands to each service and handles their responses. The orchestrator knows the full saga definition — which steps to execute, in what order, and what compensating actions to take on failure.
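A minimal orchestrator along these lines could look like the sketch below. `SagaStep`, `run_saga`, and the step names are hypothetical, and the actions are plain in-process callables standing in for commands sent to other services. A production orchestrator would also persist its progress so it can resume after a crash.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SagaStep:
    name: str
    action: Callable[[], None]      # the local transaction in one service
    compensate: Callable[[], None]  # how to semantically undo that transaction


def run_saga(steps, log):
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed = []
    for step in steps:
        try:
            step.action()
            log.append(f"done:{step.name}")
            completed.append(step)
        except Exception:
            log.append(f"failed:{step.name}")
            for done in reversed(completed):
                done.compensate()
                log.append(f"compensated:{done.name}")
            return False
    return True


def reserve_stock():
    raise RuntimeError("out of stock")  # simulated failure in the third step


saga_log = []
steps = [
    SagaStep("create_order", lambda: None, lambda: None),
    SagaStep("charge_payment", lambda: None, lambda: None),
    SagaStep("reserve_stock", reserve_stock, lambda: None),
]
succeeded = run_saga(steps, saga_log)
```

In this run the payment is compensated before the order is cancelled, mirroring the reverse order in which the steps completed. The failed step itself is not compensated, because its local transaction never committed.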

In practice, most production systems use orchestration for complex business flows and choreography for simpler, more decoupled interactions. It's not either/or — you'll use both patterns in the same system.

The Exactly-Once Myth

Every discussion of event-driven systems eventually hits the delivery guarantee question: at-most-once, at-least-once, or exactly-once? Teams naturally want exactly-once delivery because it eliminates the need to think about duplicates. Here's the uncomfortable truth: exactly-once delivery is impossible in distributed systems.

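This isn't a limitation of current technology; it's a fundamental constraint. The Two Generals Problem shows that a sender on an unreliable network can never be certain its message arrived, because the acknowledgement can be lost too. The sender must therefore either retry and risk duplicates (at-least-once) or not retry and risk loss (at-most-once). What systems like Kafka offer as 'exactly-once' is better described as 'effectively-once within the Kafka ecosystem': idempotent producers and transactions deduplicate reads and writes that stay inside Kafka, but they cannot guarantee exactly-once side effects in external systems.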

The Real Solution: Idempotency

Instead of trying to prevent duplicate messages (impossible), design your consumers to handle them safely. An idempotent operation produces the same result whether it's executed once or many times.
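One common way to get there is to give every event a unique ID and have the consumer remember which IDs it has already applied. The sketch below assumes messages carry an `event_id` field and keeps the dedupe record in memory, which is enough to show the idea but not to survive a restart.

```python
class IdempotentPaymentConsumer:
    """Applies each event at most once, no matter how often it is delivered."""

    def __init__(self):
        self.processed_ids = set()  # in production: a durable table or store
        self.balances = {}

    def handle(self, message):
        """Return True if the message was applied, False if it was a duplicate."""
        event_id = message["event_id"]  # assumed to be unique per event
        if event_id in self.processed_ids:
            return False

        account = message["account"]
        self.balances[account] = self.balances.get(account, 0) + message["amount_cents"]

        # In a real system, recording the ID and updating the balance must
        # commit in the same local transaction, or a crash between the two
        # steps brings the duplicate problem straight back.
        self.processed_ids.add(event_id)
        return True


consumer = IdempotentPaymentConsumer()
credit = {"event_id": "evt-1", "account": "acct-9", "amount_cents": 500}

first_delivery = consumer.handle(credit)
redelivery = consumer.handle(credit)  # the broker retried; same event again
```

After both deliveries the account holds 500 cents, not 1000. Where the operation is naturally idempotent (setting a status, upserting by key), the dedupe table is unnecessary and the handler can simply be safe to repeat.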

Accept at-least-once delivery and build idempotent consumers. This is simpler, more reliable, and more honest than chasing the exactly-once illusion.

Dead Letter Queues and Poison Messages

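A poison message is a message that consistently fails to process: maybe it contains invalid data, triggers a bug, or depends on a resource that's permanently unavailable. Without a dead letter queue (DLQ), the consumer retries the poison message forever and every message queued behind it is stuck. A DLQ gives the consumer somewhere to park the message after a bounded number of attempts, so processing can continue and someone can inspect the failure later.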
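As a rough sketch of how a consumer avoids that trap, the loop below retries a failing message a bounded number of times and then moves it to a dead letter list. The `consume` function and the retry budget of 3 are assumptions for illustration; most brokers and client libraries provide their own DLQ mechanism.

```python
from collections import deque

MAX_ATTEMPTS = 3  # assumed retry budget; tune per workload


def consume(messages, handler, max_attempts=MAX_ATTEMPTS):
    """Process messages, parking any that keep failing in a dead letter list."""
    queue = deque((message, 0) for message in messages)  # (message, attempts)
    processed, dead_letters = [], []

    while queue:
        message, attempts = queue.popleft()
        try:
            handler(message)
            processed.append(message)
        except Exception as exc:
            attempts += 1
            if attempts >= max_attempts:
                # Keep the error alongside the message so it can be diagnosed.
                dead_letters.append(
                    {"message": message, "error": str(exc), "attempts": attempts}
                )
            else:
                # Requeue behind other messages. This trades strict ordering
                # for progress, which is not acceptable for every stream.
                queue.append((message, attempts))

    return processed, dead_letters


def handle_payment(message):
    if "amount" not in message:
        raise ValueError("missing amount")


inbox = [{"id": 1, "amount": 10}, {"id": 2}, {"id": 3, "amount": 5}]
processed, dead_letters = consume(inbox, handle_payment)
```

Message 2 can never succeed, yet messages 1 and 3 are still processed. A DLQ only helps if someone watches it, so alerting on its depth is part of the pattern.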

Schema Evolution: The Silent Killer

Event schemas change over time. New fields are added, old fields are deprecated, data types evolve. In a microservices architecture where multiple teams produce and consume events, uncoordinated schema changes are a common source of production incidents, because a producer can deploy a change that silently breaks consumers it doesn't know about.
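One defensive approach on the consumer side is to version events explicitly and upgrade old versions to the current shape before handling them, while ignoring fields the consumer doesn't recognize. The field names, the `schema_version` key, and the USD default below are all assumptions made for the example.

```python
def upcast(event):
    """Upgrade an event of any known version to the latest shape (v2)."""
    data = dict(event)  # never mutate the caller's copy
    version = data.get("schema_version", 1)  # unversioned events count as v1

    if version == 1:
        # Hypothetical change: v1 carried a float "amount" in major units,
        # v2 carries integer cents plus an explicit currency.
        data["amount_cents"] = round(data.pop("amount") * 100)
        data["currency"] = "USD"  # assumed default for legacy events
        version = 2

    data["schema_version"] = version
    return data


def handle_payment(event):
    """Tolerant reader: use only the fields we need, ignore everything else."""
    current = upcast(event)
    return current["amount_cents"], current["currency"]


legacy_result = handle_payment({"amount": 12.5})
current_result = handle_payment(
    {
        "schema_version": 2,
        "amount_cents": 999,
        "currency": "EUR",
        "loyalty_tier": "gold",  # field added by a newer producer; ignored here
    }
)
```

This handles additive changes and one known migration. It does not replace compatibility checks at publish time, which is where a schema registry or contract tests usually come in.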

Observability in Event-Driven Systems

Event-driven systems are inherently harder to observe than synchronous request-response systems. A request doesn't follow a linear path — it triggers a cascade of events across multiple services with no deterministic ordering.
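A common mitigation is to stamp every event with a correlation ID that is copied from the event that caused it, so the whole cascade can be stitched back together from logs. The envelope fields below (`correlation_id`, `causation_id`) follow a widely used convention but are assumptions here, not the API of a specific tracing library.

```python
import uuid


def new_event(event_type, payload, cause=None):
    """Build an event envelope that inherits its correlation ID from its cause."""
    return {
        "event_id": str(uuid.uuid4()),
        "type": event_type,
        # Same ID for every event in one business flow.
        "correlation_id": cause["correlation_id"] if cause else str(uuid.uuid4()),
        # The event that directly triggered this one, if any.
        "causation_id": cause["event_id"] if cause else None,
        "payload": payload,
    }


def trace(events, correlation_id):
    """Return the event types that belong to one flow, in log order."""
    return [e["type"] for e in events if e["correlation_id"] == correlation_id]


order_created = new_event("OrderCreated", {"order_id": "o-1"})
payment_completed = new_event("PaymentCompleted", {"order_id": "o-1"}, cause=order_created)
stock_reserved = new_event("StockReserved", {"order_id": "o-1"}, cause=payment_completed)
unrelated = new_event("OrderCreated", {"order_id": "o-2"})

event_log = [order_created, unrelated, payment_completed, stock_reserved]
flow = trace(event_log, order_created["correlation_id"])
```

The correlation ID answers "which events belong to this order?", while the causation ID answers "what triggered this particular event?". Both questions come up at 3 AM, and neither can be answered after the fact if the IDs were never recorded.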

When NOT to Use Event-Driven Architecture

Event-driven architecture is powerful but not universal. Using it where it doesn't fit creates accidental complexity that makes systems harder to build, operate, and debug.

The best architecture is the simplest one that meets your requirements. Event-driven patterns are tools — use them where they provide genuine value, not as a default because they sound sophisticated.
