beprodready
System Design Fundamentals

Caching: Hit Ratios Are Claims, Not Facts

lesson 2 of 4 · 10 min

A cache is the highest-leverage component in system design: one box can absorb 90% of your traffic for a tenth of the cost of scaling what's behind it. It's also where designs lie to themselves, because the load-bearing number — the hit ratio — is a claim about your workload, not a property of the cache.

What a hit ratio really is

When you write "Redis, 90% hit ratio" on a design, you are claiming: nine out of ten reads ask for a key that was already asked for recently enough to still be resident. Whether that's true depends entirely on the workload:

  • Popularity skew. Real-world access often follows Zipf-like distributions: a small set of hot keys gets most of the traffic. The more skewed the distribution, the higher the hit ratio a small cache can achieve.
  • Mutability. Immutable data (a shortened URL, a published photo) can be cached forever; every repeat is a hit. Frequently updated data forces short TTLs, which slash hits.
  • Personalization. A feed that's unique per user per refresh has almost no repeated reads. Caching fragments (the post objects, the friend lists) works; caching the assembled feed mostly doesn't.
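To see how skew drives the achievable hit ratio, here is a small simulation sketch in Python. It assumes an LRU cache (the lesson doesn't name an eviction policy) and a synthetic workload in which the key with popularity rank k is requested with weight 1/k^s; the key count, request count, and cache size are hypothetical. With s = 0 every key is equally popular, so a cache holding 1% of the keys hits roughly 1% of the time. Raising s concentrates traffic on the hot keys, and the same small cache hits far more often.

```python
import random
from collections import OrderedDict


def lru_hit_ratio(requests, capacity):
    """Replay a request stream through an LRU cache and return the hit ratio."""
    cache = OrderedDict()
    hits = 0
    for key in requests:
        if key in cache:
            hits += 1
            cache.move_to_end(key)          # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)   # evict the least recently used key
    return hits / len(requests)


def zipf_requests(n_keys, n_requests, s, seed=0):
    """Sample keys where the key of rank k (1-based) has weight 1 / k**s.

    s = 0 gives uniform popularity; larger s gives more skew.
    """
    rng = random.Random(seed)
    keys = list(range(n_keys))
    weights = [1.0 / (rank ** s) for rank in range(1, n_keys + 1)]
    return rng.choices(keys, weights=weights, k=n_requests)


if __name__ == "__main__":
    # Hypothetical workload: 10,000 distinct keys, a cache holding 1% of them.
    for s in (0.0, 0.8, 1.2):
        reqs = zipf_requests(n_keys=10_000, n_requests=100_000, s=s)
        print(f"skew s={s}: hit ratio {lru_hit_ratio(reqs, capacity=100):.2f}")
```

This models only the skew bullet. Mutability and personalization would show up as entries expiring early and as a larger number of distinct keys, both of which push the ratio back down.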

In beprodready, the hit ratio on cache components is a slider you set — and a claim the PRR grades. Setting 99% on a personalized-feed challenge without an argument is a fast way to lose the justification axis.

Misses are the design

The cache's job is to make the common case cheap. The system's job is to survive the uncommon case:

  • Steady misses. At 90% hits and 10,000 rps, your origin sees 1,000 rps. Size the origin for the miss traffic, not for zero.
  • The cold cache. Deploys, restarts, evictions, a new region: hit ratio is 0% until the cache warms. Your database briefly faces all the traffic. If it can't take even a few seconds of that, you need request coalescing (collapse identical concurrent misses into one origin fetch) or a warmup step — and your PRR retro should say which.
  • The hot key. One viral object can exceed a single cache node's capacity all by itself. Skewed enough traffic turns "the cache layer" into "that one cache shard" — the sharding lesson picks this up.
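Request coalescing is often called "single-flight"; Go's golang.org/x/sync/singleflight package is a well-known implementation. The Python below is a minimal in-process sketch, not a production library: the `RequestCoalescer` class and its `fetch` callback are hypothetical names. In a real read path you would check the cache first and call the coalescer only on a miss, with `fetch` reading the origin and populating the cache.

```python
import threading
import time


class RequestCoalescer:
    """Collapse concurrent misses for the same key into one origin fetch.

    The first caller for a key becomes the leader and runs fetch(); callers
    that arrive while that fetch is in flight wait for the leader's result
    instead of hitting the origin themselves.
    """

    def __init__(self, fetch):
        self._fetch = fetch          # hypothetical origin read: key -> value
        self._lock = threading.Lock()
        self._inflight = {}          # key -> (done Event, result holder dict)

    def get(self, key):
        with self._lock:
            entry = self._inflight.get(key)
            leader = entry is None
            if leader:
                entry = (threading.Event(), {})
                self._inflight[key] = entry
        done, result = entry
        if leader:
            try:
                result["value"] = self._fetch(key)
            except Exception as exc:
                result["error"] = exc        # followers see the same failure
            finally:
                with self._lock:
                    del self._inflight[key]  # later misses start a fresh fetch
                done.set()                   # wake every waiting follower
        else:
            done.wait()
        if "error" in result:
            raise result["error"]
        return result["value"]


if __name__ == "__main__":
    calls = []

    def slow_origin_read(key):
        calls.append(key)
        time.sleep(0.1)                      # simulate a slow database query
        return f"value-for-{key}"

    coalescer = RequestCoalescer(slow_origin_read)
    threads = [threading.Thread(target=coalescer.get, args=("hot",)) for _ in range(50)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(f"{len(threads)} concurrent misses -> {len(calls)} origin fetch(es)")
```

Coalescing here is per key and per process, so N app servers can still send up to N fetches for the same hot key. That residual is one reason a warmup step can still be worth having.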

Staleness is a budget, not a bug

Every caching strategy is an answer to one question: how stale is acceptable?

Staleness budget      Mechanism
Forever (immutable)   Cache on write, never invalidate
Minutes               TTL: simplest, self-healing, slightly stale
Seconds               Short TTL + event-driven invalidation
Zero                  Write-through, or don't cache it

The classic failure is choosing the mechanism before the budget — a TTL on data with a zero-staleness requirement is a correctness bug wearing a performance costume.
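The "Minutes" and "Seconds" rows can be sketched as a TTL cache with an explicit invalidation hook. This is a minimal in-process Python sketch; the `TTLCache` class, its `load` callback, and the injectable `clock` are hypothetical, chosen so the example can fake the passage of time.

```python
import time


class TTLCache:
    """Cache entries for ttl_seconds; optionally invalidate on change events.

    `clock` is injectable so examples and tests can control time; it
    defaults to time.monotonic.
    """

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}   # key -> (value, expires_at)

    def get(self, key, load):
        """Return a fresh cached value, or call load(key) on a miss or expiry."""
        entry = self._entries.get(key)
        now = self.clock()
        if entry is not None and entry[1] > now:
            return entry[0]                      # hit: within the staleness budget
        value = load(key)                        # miss or expired: go to the origin
        self._entries[key] = (value, now + self.ttl)
        return value

    def invalidate(self, key):
        """Drop a key when a change event arrives, instead of waiting out the TTL."""
        self._entries.pop(key, None)


if __name__ == "__main__":
    db = {"price:42": 100}                       # stand-in for the origin
    fake_now = [0.0]
    cache = TTLCache(ttl_seconds=60, clock=lambda: fake_now[0])

    print(cache.get("price:42", db.get))         # miss: loads 100
    db["price:42"] = 120                         # the origin changes
    print(cache.get("price:42", db.get))         # still 100: stale, within the 60 s budget
    cache.invalidate("price:42")                 # event-driven invalidation
    print(cache.get("price:42", db.get))         # 120
```

Invalidation alone has a known race: a reader that loaded the old value just before the invalidation event can write it back afterwards. Keeping a short TTL as a backstop bounds how long that stale entry can survive, which is why the table pairs the two.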

Where the cache sits

Layered caches multiply: a CDN at 70% in front of an application cache at 85% means the database sees (1−0.70) × (1−0.85) = 4.5% of read traffic. That's how a $1,000/month stack serves traffic that would otherwise need $10,000/month of database. Stack the layers yourself and watch the multiplication work:

Interactive (layered caches): 20,000 rps hit the CDN, 6,000 rps miss through to the application cache, and 900 rps miss through to the database, which therefore sees (1−0.70) × (1−0.85) = 0.045, or 4.5%, of read traffic.

Notice what happens when either layer's ratio slips: the database's traffic share multiplies back up. A CDN config change that drops its hit ratio from 70% to 40% doubles the CDN's miss rate (30% to 60%), and with it your database load. That's a production incident that begins life as a one-line CDN settings PR.
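The layered arithmetic is easy to check in code. In this minimal Python sketch, `traffic_per_layer` is a hypothetical helper: each layer sees only the misses of the layer in front of it, so the miss rates multiply. It treats each hit ratio as measured on the traffic that actually reaches that layer. A CDN that absorbs the hottest keys leaves the app cache a less skewed stream, so in practice the app cache's own ratio can shift when the CDN changes.

```python
def traffic_per_layer(incoming_rps, hit_ratios):
    """Return the rps that misses past each cache layer, in order.

    Each layer only sees the previous layer's misses, so miss rates
    multiply; the last value is what reaches the database.
    """
    misses = []
    rps = incoming_rps
    for hit_ratio in hit_ratios:
        rps *= 1 - hit_ratio        # only this layer's misses continue downstream
        misses.append(rps)
    return misses


if __name__ == "__main__":
    for cdn in (0.70, 0.40):
        to_app, to_db = traffic_per_layer(20_000, [cdn, 0.85])
        print(f"CDN {cdn:.0%}: {to_app:,.0f} rps to app cache, {to_db:,.0f} rps to database")
```

With the lesson's numbers it reports 6,000 rps reaching the app cache and 900 rps reaching the database; with the CDN at 40% those become 12,000 and 1,800 rps, double the database load.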

The full simulation below adds the dimension the multiplier hides: time. Drag the hit ratio down and watch the origin light up, then flush the cache and meet the thundering herd.

Simulation (interactive): a cache with a 100,000 rps cap in front of an origin database with a 5,000 rps cap. With the cache fully warm, 20,000 rps of traffic, and a 90% hit ratio, the cache runs at 20% of its capacity and the origin sees 2,000 rps of misses (40% of its capacity), with nothing dropped.

Step 1 of 3: set traffic to 20k rps, then find the lowest hit ratio that keeps the origin healthy (under its 5,000 rps cap).
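The exercise has a closed-form answer: misses are traffic × (1 − hit ratio), so the origin stays under its cap when the hit ratio is at least 1 − cap / traffic. This minimal Python sketch uses hypothetical helper names and a simplifying assumption, loosely modeled on the simulation's "dropped" counter: anything over the origin's cap is simply dropped, whereas a real overloaded database tends to slow down for everyone.

```python
def min_hit_ratio(traffic_rps, origin_cap_rps):
    """Lowest hit ratio that keeps miss traffic within the origin's capacity."""
    return max(0.0, 1 - origin_cap_rps / traffic_rps)


def origin_load(traffic_rps, hit_ratio, origin_cap_rps):
    """Return (rps reaching the origin, rps beyond its capacity).

    Assumption: traffic over the cap is dropped outright (a simplification).
    """
    misses = traffic_rps * (1 - hit_ratio)
    return misses, max(0.0, misses - origin_cap_rps)


if __name__ == "__main__":
    traffic, cap = 20_000, 5_000             # the simulation's numbers
    print(f"minimum hit ratio: {min_hit_ratio(traffic, cap):.0%}")
    for hit in (0.90, 0.75, 0.0):            # 0.0 = cold cache after a restart
        sent, over = origin_load(traffic, hit, cap)
        print(f"hit ratio {hit:.0%}: origin sees {sent:,.0f} rps, {over:,.0f} rps over capacity")
```

At 20,000 rps against a 5,000 rps origin the minimum is 75%. A cold cache (hit ratio 0%) sends all 20,000 rps to the origin, 15,000 rps over its cap, which is the cold-cache case from the misses section.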

Check yourself

Q1 · Your cache hit ratio is 90%. The cache cluster restarts. What does your database see?

Q2 · Which workload supports a defensible 95%+ hit ratio?

Q3 · What's the honest first question about any cache invalidation strategy?