DynamoDB
DynamoDB bills you for the access pattern you designed, not the query you wish you could run. Get the partition key wrong and it throttles while looking 99% idle.
On this page
DynamoDB gets sold as “infinitely scalable, fully managed NoSQL.” That is true, and it is also the most expensive sentence in the docs, because it hides the one rule that decides whether your table costs $40/month or $40,000/month: you do not design a schema, you design an access pattern, and the partition key is the whole game.
Teams arrive from relational databases expecting to add indexes later and JOIN their way out of trouble. DynamoDB does not work that way. There are no joins, the query planner is your own application code, and a key choice you made on day one quietly becomes a hot-partition incident on day ninety. The engine will not warn you. It will look 99% idle on the dashboard while a single key throttles your most important customer.
This is the operator’s context article for running DynamoDB at scale — what the request actually does between your SDK call and the 200 OK, where the ceilings are, and how the cost model turns design mistakes into recurring invoices. The partitioning fundamentals it rests on live in Sharding & Partitioning; the consistency tradeoffs trace back to CAP Theorem & Tradeoffs and Consistency & Consensus; and where it fits against everything else, the roadmap tracks the broader datastore map. If you want the relational counterpoint as you read, keep PostgreSQL open in another tab — the two systems make opposite bets, and the contrast is the fastest way to internalize either one.
A motivating failure
A ride-hailing company stores trip events in DynamoDB. The partition key is city_id, the sort key is the trip timestamp. It works beautifully in three launch cities. Reads are 4ms, writes never throttle, the bill is small, and the table is forgotten — which in infrastructure is the highest compliment there is.
Eighteen months later they are in two hundred cities, and one of them is hosting a championship final. As the match ends, every fan in that city opens the app at once. All of those writes carry the same city_id, so they hash to the same physical partition. That partition has a hard ceiling of 1000 WCU, and the city blows past it in seconds.
DynamoDB starts returning ProvisionedThroughputExceededException. The mobile clients, written to retry aggressively, hammer the same hot key harder. The table’s total provisioned capacity is barely 6% used, so the CloudWatch dashboard is a wall of green and the on-call engineer wastes twenty minutes looking for a problem that the headline metrics insist does not exist. Meanwhile surge pricing — which reads the same hot key — stops updating, so the busiest market in the company is also the only one with broken pricing.
Nothing here is a bug. The table did exactly what a partitioned key-value store must do: a single partition key lives on a single partition, and a single partition has a fixed ceiling. The outage lived entirely in the key design — choosing city_id, a low-cardinality attribute with a brutally skewed distribution, as the thing that decides where data physically lands. The team chose a key that read like a sensible business dimension and turned out to be a load concentrator. That is the failure this article exists to prevent.
The one-sentence mental model
DynamoDB hashes your partition key to pick one of many physical partitions, stores items inside that partition sorted by the sort key, and caps each partition at roughly 1000 WCU / 3000 RCU and 10 GB — so your throughput and storage are only ever as scalable as your key is evenly distributed.
Every clause is an operational constraint, not a detail:
- Hashes the partition key → you cannot range-scan across partition keys cheaply; “all users created last week” is not a query, it is a full table scan.
- Sorted by sort key → range queries (
begins_with,between,>) are nearly free within one partition key and impossible across them. - Caps each partition → a popular key throttles even when the table as a whole is 1% utilized. The ceiling is local, the dashboard is global, and that mismatch is where incidents hide.
- Only as scalable as the key is distributed → the partition key is the single most consequential decision in the entire system, and it is the hardest one to change after launch.
flowchart LR R[Request\nPK=user#42] --> H[Hash\npartition key] H --> P1[(Partition A\nmax 1000 WCU)] H --> P2[(Partition B\nmax 1000 WCU)] H --> P3[(Partition C\nmax 1000 WCU)] P2 --> I[Items sorted\nby sort key]
A partition key (PK) decides where an item lives. A sort key (SK) decides order within that location and enables range queries. Together they form the primary key, which must be unique. Internalize that the PK is a placement decision before it is a lookup key, and most DynamoDB surprises stop being surprising.
How it actually works
The request path: from SDK call to partition
When you call GetItem or PutItem, the request hits a fleet of stateless request routers. The router takes your partition key, runs it through an internal hash, and consults the partition metadata to find which storage node currently owns that key range. Your item is then replicated across three storage nodes in different Availability Zones, and a write is acknowledged once a quorum (two of three) has it durably.
This is the lineage of the original Dynamo paper showing through: consistent hashing for placement, quorum replication for durability, and a deliberate refusal to do anything that requires coordinating across partitions. The reason there are no joins is not that AWS ran out of time — it is that a join would require touching multiple partitions in one operation, and the entire design exists to make every operation hit exactly one.
flowchart TD
C[Client SDK] --> RR[Request router\nstateless fleet]
RR --> M{Hash PK\nlook up owner}
M --> L[Leader node\nfor key range]
L --> R1[(Replica AZ-1)]
L --> R2[(Replica AZ-2)]
L --> R3[(Replica AZ-3)]
R1 --> Q[Quorum ack\n2 of 3]
R2 --> Q
Q --> C
Strongly consistent reads go to the leader replica; eventually consistent reads (the default, and half the price) can be served by any replica, which is why they can lag a write by milliseconds.
The query model is the key schema
You cannot WHERE on an arbitrary attribute. A Query operation requires the exact partition key and optionally a sort-key condition. Anything else is a Scan, which reads every item in the table and bills you for all of it. If you find yourself scanning in a hot path, the key design is wrong — DynamoDB is working as intended and presenting you the invoice for asking it the wrong question.
This forces a discipline that feels alien coming from SQL: you enumerate every access pattern before you create the table. “Get user by id,” “list a user’s orders newest-first,” “find an order by its id,” “list orders in PENDING status” — each one must map to a Query against a key or an index, or it does not get to exist cheaply. The access-pattern document is the schema. The table definition is just its shadow.
Single-table design
Because there are no joins, the idiomatic pattern is single-table design: heterogeneous item types share one table, distinguished by key prefixes, so a single Query can fetch a parent and its children in one round trip.
PK SK attributes
USER#42 PROFILE name, email, tier
USER#42 ORDER#2024-01-09 total, status
USER#42 ORDER#2024-02-01 total, status
ORDER#9001 METADATA shipping, items[]
One Query on PK = USER#42 with SK begins_with("ORDER#") returns the profile and every order, already sorted, in a single billed read. That is the join you don’t get to write, pre-materialized into the key layout. It is genuinely elegant when the patterns are stable.
The cost is conceptual and it is real: the table is unreadable without the access-pattern doc, every new pattern may require a backfill, and onboarding an engineer means teaching them a bespoke encoding scheme before they can write a single query. Single-table design trades discoverability for round-trip efficiency on purpose. Do it when the round trips matter; do not cargo-cult it onto a low-traffic table where two simple tables would be clearer.
GSIs and LSIs
When you need to query by a different attribute, you add a secondary index.
- A Local Secondary Index (LSI) shares the partition key but uses a different sort key. It must be created at table-creation time, shares the
10 GBper-partition limit, and supports strongly consistent reads. LSIs are rare in practice precisely because they lock you in at creation and inherit the partition-size ceiling. - A Global Secondary Index (GSI) uses a completely different PK and SK, has its own capacity, is eventually consistent only, and can be added or dropped anytime. This is the workhorse.
GSIs are how you serve “find order by order_id” when the base table is keyed by user. The index is an asynchronously maintained copy of the items, re-keyed on the index’s attributes. A write to the base table propagates to the GSI within single-digit milliseconds normally, but that lag is real, and a read-after-write against a GSI can miss the item entirely.
sequenceDiagram participant App participant Base as Base table participant Log as Internal log participant GSI App->>Base: PutItem PK=USER#42 order=9001 Base-->>App: 200 OK Base->>Log: append change Log->>GSI: async project to order_id Note over GSI: window where GSI\nlacks the new item App->>GSI: Query order_id=9001 too soon GSI-->>App: empty result
The mistake everyone makes once: treating a GSI read as authoritative immediately after a write. If your flow is “create order, then redirect to a page that looks it up by order_id via the GSI,” you will intermittently get a 404 on a record you just created. Either read the base table for read-after-write, or design the flow to tolerate the gap.
Capacity units: the unit of everything
Capacity is the currency, and the conversion rates are worth memorizing because they govern both cost and throttling:
1 WCU= one write up to1 KB. A3 KBitem costs3 WCUper write.1 RCU= one strongly consistent read up to4 KB, or two eventually consistent reads of4 KB. So an eventually consistent4 KBread is0.5 RCU.- A transactional read or write costs 2x the normal units.
These rates mean item size is a throughput multiplier, not just a storage line. A table full of 8 KB items needs 8 WCU per write — eight times the capacity of a lean 1 KB design for the identical request rate. Trimming items is a throughput optimization disguised as a storage one.
On-demand vs provisioned
Capacity mode is the lever that controls cost and throttling behavior.
- Provisioned: you set
RCU/WCUand pay for them whether used or not. Cheapest if traffic is predictable. Pair it with auto scaling, which chases a target utilization (default70%) — but auto scaling reacts over minutes, so it does nothing for a sudden spike. - On-demand: pay per request (order of magnitude
~$1.25per million writes,~$0.25per million eventually-consistent reads), no capacity planning, instant scaling up to roughly double your previous peak. Costs roughly5-7xmore per request than well-utilized provisioned capacity.
The practical rule: start on-demand to learn the traffic shape without getting paged for throttles, then move steady high-volume tables to provisioned plus auto scaling once the pattern is known and the bill justifies the attention.
The tradeoffs that bite
| Decision | Looks free | Actually costs |
|---|---|---|
| Low-cardinality partition key | Simple, readable | Hot partition, throttling at scale |
| A GSI per query | Flexible | 2x write cost per GSI, eventual consistency |
| On-demand “set and forget” | No capacity math | 5-7x per-request price vs provisioned |
| Fat items with arrays | One round trip | 400 KB cap; every KB is more WCU/RCU |
| Strongly consistent reads | Always fresh | 2x RCU; impossible on a GSI |
| Single-table design | One query for parent+children | Unreadable without the access-pattern doc |
Two rows deserve emphasis. Every GSI you add doubles the write cost for items it projects, because each base write also writes the index — a table with three GSIs costs roughly 4x WCU per logical write. And strongly consistent reads cost 2x and cannot be served from a GSI at all, so “I’ll just read the index strongly” is not an option you have.
Item size is the quiet killer. An item is capped at 400 KB, and because capacity is metered per KB, fat items with embedded arrays silently inflate both your write and read bills. The fix is the same one S3 was built for: push large blobs to object storage and keep only a pointer in DynamoDB.
Read and write performance
What’s fast in DynamoDB is the single-key access path, and it stays fast at any scale because the engine adds partitions rather than letting any one of them get slower. A GetItem by primary key is single-digit milliseconds at ten items or ten billion. That flatness is the entire value proposition — latency that does not degrade as you grow.
What’s slow, or rather expensive, is anything that fights the model:
Scanreads the whole table and bills the whole table. It is fine for a one-off migration, a crime in a hot path. Use parallel segments only when you genuinely must sweep everything, and expect to consume serious capacity doing it.Querywith a big result set is paginated at1 MBper page; you pay an RCU cost per page and must loop onLastEvaluatedKey. A query that returns ten thousand items is many round trips, not one.- Filter expressions are a trap: they run after the read, so
FilterExpressiondoes not reduce consumed capacity. Filtering out 99% of items still bills you for reading 100%. Move the selectivity into the sort key, not the filter.
The levers that actually move read performance, in order of impact:
- Eventually consistent reads where you can tolerate millisecond staleness — instant
2xread-capacity savings. - DAX (DynamoDB Accelerator), an in-front write-through cache that turns hot reads into microsecond responses and shields the table from read storms. It is a managed cache with its own failure modes; see Caching Strategies for the patterns and pitfalls before you bolt it on.
- Projection expressions to fetch only the attributes you need, shrinking the bytes (though RCU is billed on the item size read, not the projection, for base tables — projections mainly help on GSIs where you choose what to project).
- Right-sized items, because every KB read is metered.
On the write side, the only real levers are key distribution and item size. There is no “slow write” to optimize — a write is fast or it is throttled. The whole performance discipline collapses into: spread the keys, shrink the items, and provision the GSIs as carefully as the base table.
Failure modes
The signature DynamoDB failure is hot-partition throttling, and it is exactly the opening story. The table’s total capacity is fine, the dashboard is green, but traffic concentrated on one partition key hits the per-partition 1000 WCU / 3000 RCU ceiling and clients get ProvisionedThroughputExceededException, surfaced as 400 throttles.
Classic triggers: a partition key like status (a handful of values), a date key that funnels all of today’s writes into one partition, a celebrity row everyone reads, or the city_id from the story. Adaptive capacity mitigates steady imbalance by quietly shifting capacity toward busy partitions and can isolate a single hot key onto its own partition — but it adapts over minutes and does nothing for a sharp burst.
If a single item or key is hot, no amount of total table capacity saves you. The ceiling is per-partition, not per-table. Design the key to spread load, or shard it with a write-sharding suffix (
USER#42#<0-9>) and scatter-gather on read. Total provisioned capacity is the number on the invoice; the number that throttles you is the one per partition that no dashboard shows by default.
The second failure mode is GSI backpressure. If a GSI is under-provisioned, writes to the base table get throttled, because DynamoDB will not let the index fall arbitrarily behind. An under-capacity index silently becomes the bottleneck for your primary write path, and the throttle shows up on the base table, where nobody thinks to look at index capacity. Provision GSIs as carefully as the base table.
The third is silent cost blowup: an unindexed access pattern someone solved with a Scan inside a Lambda that now runs every minute, or a retry storm that multiplies consumed capacity. There is no slow-query log nagging you the way PostgreSQL’s EXPLAIN would. The only signals are the bill and ConsumedReadCapacityUnits, so you alarm on consumed capacity, not just on throttles.
The fourth is the retry storm itself. Throttles are transient, so SDKs retry with backoff — but a misconfigured client without jitter, or application-level retries stacked on top of SDK retries, turns a brief throttle into a sustained self-inflicted attack on the hot key. Confirm exactly one layer of retry-with-jitter owns the backoff. This is the same dynamic as a rate limiting feedback loop, just pointed at your own database.
The general key-distribution theory under all of this is in Sharding & Partitioning, and Cassandra shows how a peer system handles the identical partition-key problem with a different, tunable consistency model.
Scaling it
DynamoDB scales storage and the happy path for you — AWS splits partitions behind the scenes as data and throughput grow, so “the table is too big” is essentially never the wall. The work at 10x and 100x is keeping load even and keeping read-heavy rollups off the base table’s hot keys.
- At 10x, your day-one partition key reveals its true cardinality and skew. Audit
ThrottledRequestsper table and per GSI. Move time-series keys off a raw timestamp to a composite that distributes, e.g.bucket#<hour>plus a suffix, so a single hour of writes spreads across many partitions instead of stacking on one. - At 100x, introduce write sharding for unavoidably hot keys: append
#0..#Nto the partition key on write, then fan reads across theNshards and merge. You are trading read fan-out for write distribution — the same bargain the ride-hailing team should have made forcity_idon day one.
flowchart LR W[Write hot key] --> S[Append suffix\n#0 to #9] S --> P0[(Partition\nkey#0)] S --> P1[(Partition\nkey#1)] S --> P2[(Partition\nkey#9)] P0 --> G[Read: query\nall N shards] P1 --> G P2 --> G G --> M[Merge results\nin app]
- DynamoDB Streams become the scaling backbone. A stream is an ordered,
24h-retained change log of the table. Wire it to Lambda or Kinesis to maintain aggregations, project into Elasticsearch for search, or drive event-driven workflows — so you stop hammering the base table with read-heavy rollups. This is also the clean integration point with Kafka or other message queues when DynamoDB is the system of record but other systems need the change feed. - Global tables add multi-region active-active replication on top of Streams. They resolve conflicts last-writer-wins by wall-clock timestamp; do not use them where two regions can legitimately update the same item and both updates must survive. That is a consistency and consensus problem global tables do not solve for you.
The wall you hit is never “too big.” It is always “one key is too hot” or “one GSI is starved.”
When to reach for it (and when not to)
Reach for DynamoDB when your access patterns are known and key-based, you need single-digit-millisecond reads at any scale, you want zero database operations, and your traffic is spiky enough that on-demand’s elasticity earns its premium. Session stores, shopping carts, IoT and event ingestion, user profiles, feature-flag state, and idempotency-key tables are textbook fits. It pairs naturally behind an API gateway for serverless backends where there is no server to run a database on anyway.
Don’t reach for it when you need ad-hoc queries, analytics, or joins — that is a relational database (PostgreSQL) or a warehouse. Don’t use it when access patterns are still in flux and changing weekly; the key schema is genuinely hard to change after data lands. And don’t pick it for low, steady traffic where a small managed Postgres does the job for $15/month — DynamoDB’s value is scale and operational offload, not being cheap at the bottom. Picking it for a CRUD app with a thousand rows because it is “web scale” is how you end up with a bespoke single-table encoding nobody can query and no scale problem to justify it.
When to consider alternatives
- Ad-hoc queries, joins, transactions across many rows → PostgreSQL or another relational system of record.
- High write throughput with tunable, region-aware consistency → Cassandra, which gives you read/write quorum knobs DynamoDB hides.
- Full-text search and relevance ranking → Elasticsearch, fed from DynamoDB Streams, never as the source of truth.
- Sub-millisecond ephemeral counters, locks, leaderboards → Redis.
- A durable event log / streaming backbone → Kafka or dedicated message queues.
- Large binary blobs → object storage with a pointer in the item.
Operational checklist
- Choose a high-cardinality partition key; reject any key with fewer than a few thousand well-distributed values for a high-write table. Audit
city_id-shaped keys before launch, not after. - Alarm on
ThrottledRequests > 0per table and per GSI — base-table writes can stall on a starved index. - Alarm on
ConsumedReadCapacityUnits/ConsumedWriteCapacityUnitsto catch runawayScanjobs and retry storms before the bill does. - Set provisioned auto-scaling target to
70%utilization, but keep headroom — it reacts in minutes, not seconds; stay on-demand for genuinely spiky tables. - Confirm exactly one layer of retry-with-jitter owns backoff; stacked retries turn a brief throttle into an outage.
- Enable point-in-time recovery (PITR) on any table you can’t afford to lose; it is off by default.
- Keep items well under
400 KB; push large blobs to object storage and store a key reference. Remember every KB is metered capacity. - Use
TransactWriteItemsfor cross-item atomicity, but budget its2xcapacity cost and the100-item /4 MBtransaction limit. - For global tables, confirm last-writer-wins is acceptable for every item type before enabling.
Summary
DynamoDB is a partitioned key-value store that delivers flat, single-digit-millisecond latency at any scale — in exchange for making you do the query planner’s job at design time. The partition key is the load-bearing decision: it determines placement, and placement determines whether you scale smoothly or throttle on a hot key while the dashboard shows green. Enumerate every access pattern before you create the table, choose a high-cardinality key, watch consumed capacity and per-GSI throttles rather than just table-level numbers, push big blobs out to S3, and provision your indexes as carefully as your base table. Do that and DynamoDB will be the boring, elastic backbone it promises to be. Get the key wrong and no amount of total capacity will save you — because the ceiling that throttles you is the one per partition that nobody is looking at.
Appendix: key-value vs relational, in one page
If you are coming from SQL and the model still feels foreign, the core differences:
- Schema time. Relational databases let you defer query design — declare tables, add indexes when a slow query appears,
JOINwhatever you forgot. DynamoDB front-loads all of that: the access patterns are the schema, decided before the first write. - Where the planner lives. In Postgres, a cost-based planner turns your declarative SQL into an execution plan. In DynamoDB, you are the planner — every read must name a key or an index, and there is no optimizer to rescue a bad query.
- How it scales. Relational scaling means vertical first, then read replicas, then painful sharding. DynamoDB shards from item one, transparently, which is why it scales horizontally without ceremony — and why the partition key matters so much that it is the only thing that can really hurt you.
- Consistency default. Postgres reads are strongly consistent by default. DynamoDB reads are eventually consistent by default (and half the price); you opt into strong consistency per request, and never get it on a GSI.
The trade is clean once you see it: relational systems optimize for flexibility of querying, DynamoDB optimizes for predictability of scale. Pick the one whose hard part you can afford to live with.
Further reading
- Related: Sharding & Partitioning — the key-distribution theory under hot partitions.
- Related: Cassandra — the same partition-key problem with tunable consistency · PostgreSQL — the relational counterpoint.
- DynamoDB Developer Guide — best practices for partition key design
- Dynamo: Amazon’s Highly Available Key-value Store (2007 paper)
Incidents & deep-dives
Where this system breaks in production — and how it comes back.
Documenting next
- 🔒 Hot Partition Throttling Under Burstroadmap →