beprodready
System Design Fundamentals

Sharding: Splitting Data Without Splitting Yourself

lesson 3 of 4 · 9 min

When one machine can't hold the data or take the load, you split — by key. Sharding is the standard answer to scale, and the standard source of 2 a.m. incidents, because the split is only as good as the key.

The mechanics

Pick a shard key (user ID, post ID), apply a function, get a shard:

  • Hash shardingshard = hash(key) % N. Spreads keys uniformly; kills range queries ("all posts from June" now touches every shard).
  • Range sharding — keys A–F here, G–M there. Preserves range scans; concentrates load wherever the workload is concentrated — which for time-ordered keys means all new writes hit the last shard.
  • Directory sharding — a lookup service maps keys to shards. Maximum flexibility, plus a new component that can fail.

Uniform load is the promise. The interactive sim below shows the promise being kept — set skew to 0% and watch shards share perfectly as you add them.

The hot key problem

Now drag the skew slider. A celebrity profile, a viral post, a flash-sale SKU: real traffic concentrates on a few keys, and a key lives on exactly one shard. The brutal consequence, which you can verify in the playground: adding shards does nothing for a hot key. Sixteen shards, and shard 1 still melts while fifteen idle — you're paying 16x for 1x of relief.

Real fixes attack the key, not the count:

  1. Cache the hot object. The cheapest and usually right answer — a hot key is by definition wildly cacheable (same key, repeated reads). This is why the caching lesson precedes this one.
  2. Salt the key. Split celebrity_42 into celebrity_42#1..8 spread across shards; readers fan out and merge. Costs read complexity, buys write/read spread.
  3. Special-case hot entities. Detect them (your metrics should make hot keys visible) and route them to dedicated capacity.

A design that answers "hot partition?" with "add more shards" fails the availability axis of any serious review — now you can see why.

Resharding: the bill arrives later

Shard count looks like a free parameter on day one. It isn't — changing it later means moving live data under traffic:

  • hash(key) % N reassigns nearly every key when N changes. Consistent hashing fixes this: only ~1/N of keys move.
  • Plan shard count with growth headroom (e.g., 4x current need) and use logical shards (many small virtual shards mapped onto few physical nodes) so "resharding" becomes "remapping," not "rewriting."

See the difference, key by key — every red square below is live data that has to migrate when you add one shard:

interactive

Adding one shard moves 85/100 keys — hash % N reassigns almost everything. Every red square is live data migrating under traffic.

This is the whole argument for consistent hashing in one picture: with hash % N, adding capacity is a near-total data migration; with a ring, it's a small, bounded handover to the new shard.

What to say in a design review

The sharding paragraph of a strong justification names four things: the shard key and why it distributes well for this workload, the lookup pattern it preserves (point reads? ranges?), the hot-key story, and the resharding plan. Miss the hot-key story and the next viral moment writes it for you.

try it — full simulation

dropped 6.7%

hot shard p99 ∞ (saturated)

sharding guidestep 1/4

Set skew to 0% and push traffic above 40k rps.

waiting for you to try it…

dashed line = shard capacity (5,000 rps)

Check yourself

Q1 · Traffic to one celebrity's profile is melting shard 3. You double the shard count. What happens?

Q2 · Why is range-based sharding (A-F, G-M, ...) risky for time-ordered IDs?

Q3 · What's the hardest operational moment in a sharded system's life?