Taming Databases in Microservices
tl;dr: Microservices give flexibility and independence, but databases don’t scale as easily as code. To survive, you need sharding, replication, caching, polyglot persistence, and a healthy respect for indexes.
Why Databases Are the Hardest Part
Microservice architectures promise flexibility, scalability, and independent teams. But databases are the stubborn part. They are stateful, not stateless, which makes them hard to scale horizontally. Unlike application servers, you can’t just spin up ten more nodes and call it a day.
In microservices, each service often has its own database, which creates new challenges:
- Scaling relational databases horizontally
- Different services needing different data structures
- Guessing database needs too early
- Synchronizing data changes across services
Let’s look at these problems one by one — and the strategies that actually work.
Problem 1: Scaling Relational Databases
Relational databases (SQL) are stateful. You can throw more CPU and RAM at them (vertical scaling), but that hits limits quickly. Horizontal scaling — splitting across servers — is complex because of consistency.
Solution: Sharding and Federated Databases
Sharding means splitting the data into logical parts, each stored in its own database.
Examples:
- By geography (EU vs US customers)
- By customer ID ranges
- By product category
In SQL Server, the Federated Database approach routes inserts and queries to the right shard using filters and keys. Instead of one massive DB, you have multiple smaller ones, each responsible for part of the data.
It’s messy, but it works — and suddenly, one bottleneck DB becomes many manageable ones.
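As a rough illustration, the routing layer for range-based sharding can be as simple as a lookup from customer ID range to connection string. The shard ranges and DSNs below are made up for the example:

```python
# Hypothetical sketch: route each request to a shard by customer ID range.
# Ranges and connection strings are placeholders, not a real topology.
SHARDS = [
    (0, 999_999, "postgres://db-shard-1/app"),
    (1_000_000, 1_999_999, "postgres://db-shard-2/app"),
    (2_000_000, 2_999_999, "postgres://db-shard-3/app"),
]

def get_shard(customer_id: int) -> str:
    """Return the connection string for the shard that owns this customer."""
    for low, high, dsn in SHARDS:
        if low <= customer_id <= high:
            return dsn
    raise ValueError(f"no shard configured for customer {customer_id}")

print(get_shard(1_500_000))  # falls in the second range, so shard 2
```

In practice this mapping lives in a routing service or library shared by the data access layer, so adding a shard means updating configuration, not application code.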
Problem 2: Different Data Needs
Not every service has the same requirements.
- A payment or account balance service → must be ACID
- An analytics service → needs fast reads on denormalized data
- A recommendation engine → may prefer a graph DB
Trying to force one database type to serve them all is a recipe for pain.
Solution: Polyglot Persistence
Use the right database for the job. This is called polyglot persistence.
- SQL for transactions (payments, balances)
- NoSQL (MongoDB, DynamoDB) for analytics or documents
- Graph DBs (Neo4j) for relationships
- Time-series DBs (Timescale, InfluxDB) for metrics
Yes, it adds complexity (more databases to manage), but it avoids bending one system into something it’s not.
Problem 3: Wrong Requirements Early
At the start of a project, it’s easy to pick a database because it’s “popular” or “easy.” Later, you realize it’s not suited for the load or consistency you actually need.
Solution: Proper Requirements Analysis
Before picking a database for each microservice, check:
- Data model: relational, document, graph?
- Consistency: strong vs eventual?
- Transactions: what isolation level do you need?
- Load: mostly reads, writes, or balanced?
- Growth: how fast will data volume increase?
Asking these questions up front prevents painful migrations later.
Problem 4: Synchronizing Data
When data changes in one service, others often need to know. Without synchronization, systems drift into inconsistent states.
Example: In an e-commerce system, if a customer updates their email in the account service, the notification service also needs to know — otherwise, messages go to the wrong place.
Solution: Event-Driven Architecture & CDC
- Event-driven architecture: publish domain events (like CustomerEmailUpdated) into Kafka or RabbitMQ. Other services subscribe and react.
- Change Data Capture (CDC): track changes at the DB level (via triggers or transaction logs) and send them to other services.
Event-driven is best for workflows. CDC works well for read-heavy systems where you don’t want to add more writes.
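To make the event-driven flow concrete, here is a minimal sketch using an in-process pub/sub map as a stand-in for Kafka or RabbitMQ. The event name `CustomerEmailUpdated` comes from the example above; everything else is illustrative:

```python
from collections import defaultdict

# In-process event bus standing in for Kafka/RabbitMQ (sketch only).
subscribers = defaultdict(list)

def subscribe(event_type, handler):
    subscribers[event_type].append(handler)

def publish(event_type, payload):
    for handler in subscribers[event_type]:
        handler(payload)

# The notification service keeps its own local copy of customer emails.
notification_emails = {}

def on_email_updated(event):
    notification_emails[event["customer_id"]] = event["email"]

subscribe("CustomerEmailUpdated", on_email_updated)

# The account service publishes a domain event after committing its own write.
publish("CustomerEmailUpdated", {"customer_id": 7, "email": "new@example.com"})
print(notification_emails[7])  # new@example.com
```

The key design point is the same with a real broker: each service updates its own database and reacts to events, rather than reaching into another service's tables.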
Beyond the Core Problems: Reliability Matters Too
So far, we’ve focused on scaling and performance problems: how to split databases, use the right type, or keep services in sync. But microservices don’t just need to be fast — they need to be reliable.
A single database outage can take down a whole service (or worse, your entire system). That’s why replication, caching, and tuning are just as important as sharding or polyglot persistence.
Replication and High Availability
Replication means keeping copies of the database on multiple nodes.
Benefits:
- If one node dies, others still serve data.
- You can spread read requests across replicas.
Example: In PostgreSQL, streaming replication ships the Write-Ahead Log (WAL) to standby nodes in real time. If the primary dies, a standby is promoted to take over.
This makes the system more reliable and also helps with scaling reads.
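As a rough sketch of how streaming replication is wired up in PostgreSQL 12+ (host names, user, and values here are placeholders, not a production configuration):

```
# On the primary (postgresql.conf): allow standbys to stream WAL.
wal_level = replica
max_wal_senders = 5

# On each standby: place an empty standby.signal file in the data directory,
# and point it back at the primary (postgresql.auto.conf):
primary_conninfo = 'host=primary.example.internal port=5432 user=replicator'
```

Automatic failover (promoting a standby when the primary dies) is not built into this setup; tools such as Patroni or repmgr are typically layered on top.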
Caching: The Pressure Valve
Sometimes the best optimization is to avoid hitting the DB at all.
With Redis or Memcached, you can cache frequent queries:
- First, check the cache.
- If not found, hit the DB.
- Store the result in cache for next time.
This massively reduces DB load and speeds up responses.
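The three steps above are the classic cache-aside pattern. A minimal sketch, using an in-memory dict with TTLs as a stand-in for Redis (the `query_db` helper is a placeholder for a real query):

```python
import time

# In-memory stand-in for Redis: key -> (value, expiry timestamp). Sketch only.
cache = {}
TTL_SECONDS = 60

def query_db(user_id):
    # Placeholder for a real database query.
    return {"id": user_id, "name": f"user-{user_id}"}

def get_user(user_id):
    """Cache-aside: check the cache first, fall back to the DB, then cache."""
    entry = cache.get(user_id)
    if entry is not None and entry[1] > time.time():
        return entry[0]                    # cache hit
    value = query_db(user_id)              # cache miss: hit the DB
    cache[user_id] = (value, time.time() + TTL_SECONDS)
    return value
```

The TTL matters: cached data is stale by definition, so pick an expiry that matches how fresh each service actually needs the data to be.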
Query and Schema Optimization
Scaling is not always about bigger hardware. Often it’s about smarter queries.
- Schema design: avoid duplicate data, plan relationships
- Indexes: index frequently filtered fields
- Queries: never SELECT *; only fetch what you need
- JOINs: keep them minimal; they get expensive on big tables
- ORM tricks: use eager loading to avoid N+1 queries
These basics often buy you months of breathing space before needing heavier solutions like sharding.
Indexes: The Secret Weapon
Indexes are where you get real performance gains. In microservices, where each service owns its own DB, indexes are often the key to keeping things fast.
Why Indexes Matter
- Cut query response time
- Reduce CPU/memory use
- Allow services to scale without extra infra
- Make inter-service calls faster
Example: Pending Orders
Imagine a SaaS system with millions of rows in an Orders table. You run a query like this:
SELECT *
FROM orders
WHERE shipped_at IS NULL
ORDER BY created_at;
It asks for all unshipped orders, sorted by creation time. As the table grows, this query gets slower and slower.
- The DB does a full table scan (20M rows)
- Filters down to 1M unshipped
- Sorts them in memory/disk
The result: UI delays, background jobs hanging, even blocked connection pools.
Why the DB Chooses Bad Plans
- If ~5%+ of rows match, Postgres prefers a sequential scan.
- Filtering on IS NULL can be indexed, but the planner often skips such an index.
- Sorting millions of rows is expensive and spills to disk.
Fixing It
1. Partial Index
CREATE INDEX idx_orders_unshipped_created
ON orders(created_at)
WHERE shipped_at IS NULL;
This indexes only the rows you care about (unshipped). Smaller, faster, and updates only happen when relevant rows change.
Result: query time dropped from ~3.3s to ~230ms.
2. Combined Index
CREATE INDEX idx_orders_shipped_created
ON orders(shipped_at, created_at);
This organizes rows by shipped_at and then by created_at. It helps, but the index is larger (covers all rows), so it’s less efficient than the partial index.
3. Extended Statistics
Postgres can better optimize if it knows the real data distribution:
ALTER TABLE orders ALTER COLUMN shipped_at SET STATISTICS 1000;
ALTER TABLE orders ALTER COLUMN created_at SET STATISTICS 1000;
ANALYZE orders;
This helps the planner make smarter choices.
Extra Microservices Considerations
- Shared databases: one service’s slow scan can hurt others. Partial indexes reduce the impact.
- Connection pools: slow queries block connections, ripple effect on APIs. Fast queries free them up.
- Materialized views: for very common queries (e.g., “today’s unshipped orders”), materialized views per service can cut load.
Best Practices
From experience, here’s what works best:
- Use partial indexes for frequent states (pending, active, unprocessed).
- Optimize each microservice’s DB individually.
- Dedicate separate DBs for critical services like checkout or billing.
- Monitor execution plans regularly, run ANALYZE often.
- Combine DB optimization with caching strategies.
- For cross-service workflows, use distributed transaction patterns like Saga.
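To sketch the Saga idea from the last point: each step has a compensating action, and on failure the completed steps are undone in reverse order. The services and failure flag below are hypothetical stand-ins for real API or event calls:

```python
# Hypothetical saga sketch: each forward step pairs with a compensation.
# In a real system these would be calls to separate services, not functions.

def reserve_stock(order):  order["stock_reserved"] = True
def release_stock(order):  order["stock_reserved"] = False

def charge_payment(order):
    if order.get("card_declined"):     # simulated failure condition
        raise RuntimeError("payment failed")
    order["paid"] = True

def refund_payment(order): order["paid"] = False

STEPS = [
    (reserve_stock, release_stock),
    (charge_payment, refund_payment),
]

def run_saga(order):
    done = []
    try:
        for action, compensation in STEPS:
            action(order)
            done.append(compensation)
        return True
    except Exception:
        # Undo completed steps in reverse order.
        for compensation in reversed(done):
            compensation(order)
        return False
```

Note that a saga gives eventual consistency, not atomicity: other services can briefly observe the reserved-but-unpaid state before compensation runs.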
Lessons Learned
- Tuning first, scaling second. Always try query/index optimization before going big with sharding.
- Replication + sharding are powerful but add complexity. Use them only when load demands it.
- Polyglot persistence is real. Different services need different DBs — embrace it, but keep it manageable.
- Event-driven sync is the key to avoiding inconsistent data.
- Indexes are gold. The right index can make a 10x–14x difference.
- Databases are never done. It’s ongoing work — like cleaning a kitchen that everyone keeps using.