Caching: Hit Ratios Are Claims, Not Facts

A cache is the highest-leverage component in system design: one box can absorb 90% of your traffic for a tenth of the cost of scaling what's behind it. It's also where designs lie to themselves, because the load-bearing number — the hit ratio — is a claim about your workload, not a property of the cache.

What a hit ratio really is

When you write "Redis, 90% hit ratio" on a design, you are claiming: nine out of ten reads ask for a key that was already asked for recently enough to still be resident. Whether that's true depends entirely on the workload:

Popularity skew. Real-world access often follows Zipf-like distributions: a small set of hot keys gets most of the traffic. The more skewed the distribution, the higher the hit ratio a small cache can achieve.
Mutability. Immutable data (a shortened URL, a published photo) can be cached forever; every repeat is a hit. Frequently-updated data forces short TTLs, which slash hits.
Personalization. A feed that's unique per user per refresh has almost no repeated reads. Caching fragments (the post objects, the friend lists) works; caching the assembled feed mostly doesn't.
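To make the skew point concrete, here is a minimal Python sketch that replays synthetic Zipf-like reads against a small LRU cache and reports the hit ratio. Everything in it is an assumption for illustration: the function name `lru_hit_ratio`, the 1/k^s popularity curve, LRU as the eviction policy, and the parameter values. It is a toy, not a model of any real workload.

```python
import random
from collections import OrderedDict
from itertools import accumulate


def lru_hit_ratio(num_keys: int, cache_size: int, skew: float,
                  requests: int = 100_000, seed: int = 0) -> float:
    """Replay Zipf-like reads against an LRU cache and return hits / reads.

    Hypothetical helper: key of popularity rank k gets weight 1 / k**skew,
    so skew=0.0 is uniform traffic and larger skew means hotter hot keys.
    """
    rng = random.Random(seed)
    cum = list(accumulate(1.0 / k ** skew for k in range(1, num_keys + 1)))
    keys = rng.choices(range(num_keys), cum_weights=cum, k=requests)

    cache = OrderedDict()               # key -> None, ordered oldest to newest use
    hits = 0
    for key in keys:
        if key in cache:
            hits += 1
            cache.move_to_end(key)      # refresh: now the most recently used
        else:
            cache[key] = None           # miss: fetch from origin, then cache it
            if len(cache) > cache_size:
                cache.popitem(last=False)   # evict the least recently used key
    return hits / requests


if __name__ == "__main__":
    # Same cache (100 slots for 10,000 keys), different workloads.
    for skew in (0.0, 0.8, 1.2):
        ratio = lru_hit_ratio(num_keys=10_000, cache_size=100, skew=skew)
        print(f"skew={skew}: hit ratio {ratio:.1%}")
```

The cache never changes between runs; only the workload does, and the hit ratio moves with it. That is the sense in which the number is a claim about traffic rather than a property of Redis.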

In beprodready, the hit ratio on cache components is a slider you set — and a claim the PRR grades. Setting 99% on a personalized-feed challenge without an argument is a fast way to lose the justification axis.

Misses are the design

The cache's job is to make the common case cheap. The system's job is to survive the uncommon case:

Steady misses. At 90% hits and 10,000 rps, your origin sees 1,000 rps — size the origin for the miss traffic, not zero.
The cold cache. Deploys, restarts, evictions, a new region: hit ratio is 0% until the cache warms. Your database briefly faces all the traffic. If it can't take even a few seconds of that, you need request coalescing (collapse identical concurrent misses into one origin fetch) or a warmup step — and your PRR retro should say which.
The hot key. One viral object can exceed a single cache node's request capacity all by itself, because every read for that key lands on the node that owns it. Skewed enough traffic turns "the cache layer" into "that one cache shard"; the sharding lesson picks this up.
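Request coalescing can be sketched in a few dozen lines of Python. `SingleFlight` here is a hypothetical helper built on the same idea as Go's `golang.org/x/sync/singleflight` package (it is not a port of it): the first caller to miss on a key runs the origin fetch, and callers that miss on the same key while that fetch is in flight wait and share its result. The thread-based demo, the key name, and the simulated slow fetch are illustrative assumptions.

```python
import threading
import time
from typing import Callable, Dict, List, Optional, Tuple


class _Call:
    """One in-flight origin fetch that concurrent callers can wait on."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.result: object = None
        self.error: Optional[Exception] = None


class SingleFlight:
    """Hypothetical helper: collapse concurrent misses on a key into one fetch."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._inflight: Dict[str, _Call] = {}

    def do(self, key: str, fetch: Callable[[], object]) -> object:
        with self._lock:
            call = self._inflight.get(key)
            leader = call is None
            if leader:
                call = _Call()
                self._inflight[key] = call
        if leader:
            try:
                call.result = fetch()           # the only origin request
            except Exception as exc:
                call.error = exc
            finally:
                with self._lock:
                    del self._inflight[key]     # later misses start a new fetch
                call.done.set()                 # wake every waiter
        else:
            call.done.wait()                    # piggyback on the leader's fetch
        if call.error is not None:
            raise call.error
        return call.result


def demo(n_readers: int = 20) -> Tuple[int, List[object]]:
    """n_readers miss on the same key at once; return (origin_calls, results)."""
    flight = SingleFlight()
    origin_calls = 0
    counter_lock = threading.Lock()
    release = threading.Event()
    results: List[object] = []

    def slow_origin_fetch() -> object:
        nonlocal origin_calls
        with counter_lock:
            origin_calls += 1
        release.wait()                          # simulate a slow database query
        return "row for user:42"

    def reader() -> None:
        results.append(flight.do("user:42", slow_origin_fetch))

    threads = [threading.Thread(target=reader) for _ in range(n_readers)]
    for t in threads:
        t.start()
    time.sleep(0.2)                             # let every reader pile up on the miss
    release.set()
    for t in threads:
        t.join()
    return origin_calls, results


if __name__ == "__main__":
    calls, results = demo()
    print(f"{len(results)} readers, {calls} origin fetch(es)")
```

Without the helper, each of the 20 readers would issue its own query; with it, the origin sees one. Coalescing only collapses identical keys, though: a cold cache facing many distinct keys still sends one fetch per key to the origin, which is where a warmup step comes in.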

Staleness is a budget, not a bug

Every caching strategy is an answer to one question: how stale is acceptable?

Staleness budget	Mechanism
Forever (immutable)	Cache on write, never invalidate
Minutes	TTL — simplest, self-healing, slightly stale
Seconds	Short TTL + event-driven invalidation
Zero	Write-through, or don't cache it

The classic failure is choosing the mechanism before the budget — a TTL on data with a zero-staleness requirement is a correctness bug wearing a performance costume.
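The "short TTL + event-driven invalidation" row can be sketched as a small cache-aside store. Assumptions: the class name `TTLCache`, the injectable `clock` (there so the demo can fast-forward time), and a write path that calls `invalidate()`; none of this is a specific library's API. The TTL is the staleness budget made explicit, and it also bounds how stale a read can get if an invalidation is ever missed.

```python
import time
from typing import Callable, Dict, Hashable, Tuple


class TTLCache:
    """Cache-aside store: entries expire ttl_seconds after they were loaded.

    ttl_seconds is the staleness budget. invalidate() is the event-driven
    half: a write path calls it so the next read refetches instead of
    serving the old value until the TTL runs out.
    """

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock              # injectable so a demo can fast-forward
        self._data: Dict[Hashable, Tuple[float, object]] = {}

    def get(self, key: Hashable, load: Callable[[], object]) -> object:
        now = self.clock()
        entry = self._data.get(key)
        if entry is not None and now < entry[0]:
            return entry[1]             # fresh hit, at most ttl seconds old
        value = load()                  # miss or expired: read the origin
        self._data[key] = (now + self.ttl, value)
        return value

    def invalidate(self, key: Hashable) -> None:
        self._data.pop(key, None)       # next get() goes to the origin


if __name__ == "__main__":
    now = [0.0]
    db = {"price:sku1": 100}            # stand-in for the origin
    cache = TTLCache(ttl_seconds=60, clock=lambda: now[0])

    def load() -> object:
        return db["price:sku1"]

    print(cache.get("price:sku1", load))   # 100, loaded from the origin
    db["price:sku1"] = 120                 # a write nobody told the cache about
    print(cache.get("price:sku1", load))   # 100: stale, but within the 60 s budget
    now[0] = 61.0
    print(cache.get("price:sku1", load))   # 120: TTL expired, refetched
    db["price:sku1"] = 130
    cache.invalidate("price:sku1")         # the write path invalidates
    print(cache.get("price:sku1", load))   # 130: no waiting out the TTL
```

The second read is the table's "slightly stale" in action: correct for a minutes-scale budget, and exactly the correctness bug the paragraph above warns about if the budget is zero.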

Where the cache sits

Layered caches multiply: a CDN at 70% in front of an application cache at 85% means the database sees (1−0.70) × (1−0.85) = 4.5% of read traffic. That's how a $1,000/month stack serves traffic that would otherwise need $10,000/month of database. Here is the multiplication worked through layer by layer:

With a CDN hit ratio of 70%, an app cache hit ratio of 85%, and 20,000 rps of read traffic:

Arriving at the CDN: 20,000 rps
CDN misses, sent on to the app cache: 6,000 rps
App cache misses, sent on to the database: 900 rps

The database sees 900 / 20,000 = 4.5% of read traffic, because layered misses multiply: (1−0.70) × (1−0.85) = 0.045.

Notice what happens when either layer's ratio slips: the database's traffic share multiplies back up. A CDN config change that drops its hit ratio from 70% to 40% doubles your database load — that's a production incident that begins life as a one-line CDN settings PR.
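The layered arithmetic, the steady-miss sizing, and the 70% to 40% regression all fit in one small function. `traffic_per_layer` is a hypothetical name; it assumes each hit ratio is measured on the traffic that actually reaches that layer, so the app cache's 85% applies to the CDN's misses, not to all reads.

```python
from typing import List, Sequence


def traffic_per_layer(read_rps: float, hit_ratios: Sequence[float]) -> List[float]:
    """Return the request rate arriving at each layer, outermost first.

    The final entry is what reaches the origin. Each layer only forwards
    its own misses, so origin = read_rps * product(1 - h) over all layers.
    """
    arriving = [read_rps]
    for h in hit_ratios:
        arriving.append(arriving[-1] * (1.0 - h))  # this layer's misses
    return arriving


if __name__ == "__main__":
    # One layer: 10,000 rps at 90% hits leaves about 1,000 rps for the origin.
    print(round(traffic_per_layer(10_000, [0.90])[-1]))
    # CDN 70% in front of app cache 85%, 20,000 rps of reads.
    print([round(x) for x in traffic_per_layer(20_000, [0.70, 0.85])])  # [20000, 6000, 900]
    # The CDN regression: 70% -> 40% with the app cache unchanged.
    before = traffic_per_layer(20_000, [0.70, 0.85])[-1]
    after = traffic_per_layer(20_000, [0.40, 0.85])[-1]
    print(f"database load x{after / before:.2f}")
```

The last line is the one-line-PR incident in numbers: the CDN's miss rate goes from 30% to 60%, and because the database's share is a product, its load doubles no matter how well the app cache is doing.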

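The multiplier also hides a dimension: time. These ratios describe a warm cache in steady state. Let them slip and origin load climbs with the miss rate. Flush the cache entirely and, until it warms, every read lands on the origin at once; that is the herd from the cold-cache case above.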
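A static ratio cannot show the herd, so here is a rough time-stepped sketch of origin load after a full flush. The model is deliberately crude and every part of it is an assumption: Zipf-like reads, a cache big enough to hold every key, no TTL, no coalescing, and fills that take about a second, so every read for a key that is not yet cached in a given second goes to the origin. The function name `origin_rps_after_flush` is hypothetical.

```python
import random
from itertools import accumulate
from typing import List


def origin_rps_after_flush(read_rps: int, num_keys: int, skew: float,
                           seconds: int, seed: int = 0) -> List[int]:
    """Origin requests in each one-second step after the cache is emptied.

    Illustrative assumptions: Zipf-like popularity, unbounded cache (no
    evictions), no TTL, no request coalescing, and fills that land only at
    the end of the second in which the key first missed.
    """
    rng = random.Random(seed)
    # Cumulative Zipf-like weights: key of rank k has weight 1 / k**skew.
    cum = list(accumulate(1.0 / k ** skew for k in range(1, num_keys + 1)))
    cached = set()                      # empty: the cache was just flushed
    per_second = []
    for _ in range(seconds):
        reads = rng.choices(range(num_keys), cum_weights=cum, k=read_rps)
        misses = 0
        fetched = set()
        for key in reads:
            if key not in cached:
                misses += 1             # uncached: this read hits the origin
                fetched.add(key)
        cached |= fetched               # fills complete at the end of the second
        per_second.append(misses)
    return per_second


if __name__ == "__main__":
    for second, rps in enumerate(origin_rps_after_flush(
            read_rps=5_000, num_keys=10_000, skew=1.0, seconds=6)):
        print(f"t={second}s origin {rps} rps")
```

Under these assumptions the first second sends every read to the origin, and later seconds send only reads for keys that have not been fetched yet, so origin load decays as the cache warms. Coalescing would cut that first second down to one fetch per distinct key.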
