beprodready
System Design Fundamentals

Back-of-Envelope Capacity Math

lesson 1 of 4 · 8 min

Every system design conversation that matters eventually comes down to four numbers: how many requests, how big, how fast, and how much. Engineers who can produce those numbers in 60 seconds drive the room. This lesson is the 60 seconds.

From users to requests per second

Start with the only number product ever gives you, users. Multiply it by requests per user per day to get daily requests, then convert:

  • 100M requests/day ÷ 86,400 seconds ≈ 1,157 rps average. Memorize 86,400, or just use "~100k seconds per day" for mental math: 100M/100k = 1,000 rps. Close enough.
  • Traffic is never flat. Daily peaks run 2–3x the average for consumer products (lunch, evenings), much spikier for events. Plan capacity against the peak, judge cost against the average.
  • Read/write ratio shapes everything. A 100:1 read-heavy system (feeds, catalogs, redirects) is a caching problem. A 2:1 system (chat, checkout, telemetry) is a database problem. Get this ratio before drawing a single box.

Run the conversion yourself — drag the numbers and watch what "plan for peak" actually means:

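Sample readout from the interactive calculator (100M requests/day; the figures are consistent with a 2.5x peak factor and 90% reads):

  • average: 1,157 rps
  • peak (plan for this): 2,894 rps
  • peak reads: 2,604 rps
  • peak writes: 289 rps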

Mental shortcut: a day is ~100k seconds, so 100M/day ≈ 1,000 rps average before the peak factor.
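The conversion can be sketched in a few lines of Python. This is a minimal sketch, not part of the lesson's tooling: `estimate_traffic` and `TrafficEstimate` are hypothetical names, and the 2.5x peak factor and 90% read share are assumptions chosen because they reproduce the figures shown above.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass(frozen=True)
class TrafficEstimate:
    average_rps: float
    peak_rps: float
    peak_read_rps: float
    peak_write_rps: float


def estimate_traffic(requests_per_day: float,
                     peak_factor: float,
                     read_fraction: float) -> TrafficEstimate:
    """Convert a daily request count into average and peak rates."""
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    average = requests_per_day / SECONDS_PER_DAY
    peak = average * peak_factor
    return TrafficEstimate(
        average_rps=average,
        peak_rps=peak,
        peak_read_rps=peak * read_fraction,
        peak_write_rps=peak * (1.0 - read_fraction),
    )


# Assumed inputs: 100M requests/day, 2.5x peak factor, 90% reads.
est = estimate_traffic(100_000_000, peak_factor=2.5, read_fraction=0.9)
print(f"average {est.average_rps:,.0f} rps, peak {est.peak_rps:,.0f} rps")
print(f"peak reads {est.peak_read_rps:,.0f} rps, "
      f"peak writes {est.peak_write_rps:,.0f} rps")
```

Swap in your own peak factor and read share; capacity is sized against the peak fields, cost against the average.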

The queueing cliff

The single most important graph in systems engineering is latency versus utilization, and it is not a line — it's a hockey stick. Waiting time grows roughly like:

latency ≈ base_latency / (1 - utilization)

At 50% utilization you pay 2x base latency. At 80%, 5x. At 95%, 20x. This is why a component "only at 90%" is already an incident in progress: your p99 has left the building long before the load chart hits the ceiling. Trace the curve yourself — find the spot where a 20ms service quietly becomes a 400ms service:

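Sample readout from the interactive curve (utilization axis from 0% to 98%), at a point that corresponds to about 70% utilization under the formula above:

  • waiting multiplier: 3.3× base latency
  • a 20ms service now takes ~67ms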
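The curve is easy to tabulate. The sketch below applies the lesson's approximation directly; the function names are hypothetical, and the formula is a rough model (it matches the mean response time of a single-server M/M/1 queue), not a measurement of any real service.

```python
def latency_multiplier(utilization: float) -> float:
    """Return how many times base latency you pay at a given utilization."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return 1.0 / (1.0 - utilization)


def utilization_for_multiplier(multiplier: float) -> float:
    """Invert the formula: the utilization at which latency hits a multiplier."""
    if multiplier < 1.0:
        raise ValueError("multiplier must be at least 1")
    return 1.0 - 1.0 / multiplier


BASE_LATENCY_MS = 20.0  # the lesson's example service

for u in (0.50, 0.70, 0.80, 0.90, 0.95):
    m = latency_multiplier(u)
    print(f"{u:.0%} utilization: {m:.1f}x base, ~{BASE_LATENCY_MS * m:.0f} ms")

# Where does a 20 ms service become a 400 ms service? That is a 20x multiplier.
cliff = utilization_for_multiplier(400.0 / BASE_LATENCY_MS)
print(f"20 ms becomes 400 ms at about {cliff:.0%} utilization")
```

Under this model the 20ms service reaches 400ms at about 95% utilization, which is why the last few percent of "spare" capacity are not spare at all.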

In beprodready simulations you'll watch components glow amber at 80% — that's the cliff edge, not a comfort zone.

Headroom is not waste

Size every tier so that it survives two things:

  1. The peak, not the average — see above.
  2. The loss of one instance. With 2 instances at 90% each, one failure sends the survivor to 180% — instant collapse. With 3 instances at 60%, one failure means 90% — ugly p99, but alive.

Don't take that on faith. Set up a tier and kill an instance:

Sample starting state in the interactive tier: three instances at 60% utilization each. Kill an instance to see whether the survivors can carry its share.
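The failure arithmetic is short enough to script. This is a sketch under one stated assumption: the load balancer spreads the dead instance's traffic evenly across the survivors. The function name is hypothetical.

```python
def utilization_after_failures(instances: int,
                               utilization: float,
                               failures: int = 1) -> float:
    """Per-instance utilization once `failures` instances are gone.

    Assumes total load stays constant and is spread evenly over survivors.
    """
    survivors = instances - failures
    if survivors <= 0:
        raise ValueError("at least one instance must survive")
    total_load = instances * utilization
    return total_load / survivors


for count, util in ((2, 0.90), (3, 0.60)):
    after = utilization_after_failures(count, util)
    verdict = "collapse" if after >= 1.0 else "alive"
    print(f"{count} instances at {util:.0%}: survivors at {after:.0%} ({verdict})")
```

Anything at or above 100% after a failure cannot serve its load; anything in the high 80s or 90s survives but sits on the queueing cliff.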

The standard answer is ~2x headroom over peak at the tier level. More than that and you're gold-plating — which costs real money every month and is exactly what a Production Readiness Review will dock you for.

Don't forget storage

Requests get the attention; bytes send the invoices. The estimate is one multiplication — records per day × bytes per record × retention — but the order of magnitude of the answer decides your architecture. A few GB fits anywhere; tens of TB means partitioning and lifecycle policies; a PB means object storage and a dedicated team. Run your own workload:

Sample readout from the interactive calculator:

  • per day: 10.0 GB
  • per year: 3.6 TB
  • total at 3yr: 10.9 TB

This fits a single well-provisioned database — don't shard what a disk can hold.
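The multiplication looks like this in Python. It is a sketch only: the lesson does not state the inputs behind the 10 GB/day figure, so the 10M records/day and 1,000 bytes/record below are assumed values that happen to produce it, and the helper names are hypothetical. Units are decimal (1 GB = 10^9 bytes).

```python
def storage_bytes(records_per_day: int,
                  bytes_per_record: int,
                  retention_days: int) -> int:
    """records per day x bytes per record x retention, in bytes."""
    return records_per_day * bytes_per_record * retention_days


def human_size(num_bytes: float) -> str:
    """Format a byte count with decimal units, one digit after the point."""
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if num_bytes < 1000:
            return f"{num_bytes:.1f} {unit}"
        num_bytes /= 1000
    return f"{num_bytes:.1f} PB"


# Assumed workload: 10M records/day at 1,000 bytes each.
RECORDS_PER_DAY = 10_000_000
BYTES_PER_RECORD = 1_000

for label, days in (("per day", 1), ("per year", 365), ("3 years", 3 * 365)):
    total = storage_bytes(RECORDS_PER_DAY, BYTES_PER_RECORD, days)
    print(f"{label}: {human_size(total)}")
```

The raw estimate ignores indexes, replicas, and backups, each of which multiplies the footprint; treat the result as a floor when deciding which order of magnitude you are in.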

Two classic mistakes this catches: sharding a database that holds 40 GB (a disk yawns at that), and promising "we keep everything forever" on a workload that compounds into petabytes by year three.

The cost line

Every box you draw has a monthly bill, and capacity scales it: twice the instances or twice the instance size is twice the cost. Senior engineers carry a rough price list in their heads (a small relational DB ~$600/mo, an app instance ~$120/mo, a cache node ~$200/mo — the orders of magnitude matter more than the digits). When someone proposes 20 app servers "to be safe," the question isn't whether it works — it's whether $2,400/month of safety is buying anything the traffic math justifies.
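A rough bill can be totted up the same way. The prices below are the lesson's order-of-magnitude figures, not quotes from any provider, and the dictionary keys and function name are hypothetical.

```python
from typing import Mapping

# Rough monthly prices from the lesson, in USD. Orders of magnitude only.
MONTHLY_PRICE_USD = {
    "relational_db": 600,
    "app_instance": 120,
    "cache_node": 200,
}


def monthly_cost(counts: Mapping[str, int]) -> int:
    """Sum the monthly bill for a set of component counts."""
    return sum(MONTHLY_PRICE_USD[name] * n for name, n in counts.items())


# The "20 app servers to be safe" proposal.
print(monthly_cost({"app_instance": 20}))

# A small tier for comparison: 2 app instances, 1 cache node, 1 database.
print(monthly_cost({"app_instance": 2, "cache_node": 1, "relational_db": 1}))
```

Putting a proposal and a traffic-justified alternative side by side turns "to be safe" into a dollar figure that someone has to defend.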

Putting it together

A worked example, the kind you should be able to do on a whiteboard:

"We expect 50M feed loads/day, 90% reads."

50M/100k ≈ 500 rps average → ~1,500 rps peak. Reads ≈ 1,350 rps — cacheable, so origin sees maybe 10–30% of that depending on hit ratio. Writes ≈ 150 rps — that's the durable-store load. An app tier of 2 × 2,000-rps instances covers peak with headroom; one cache node yawns at this volume; a single primary DB handles 150 writes/sec easily — the replica is for failure, not load.
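The same whiteboard math, written out as a sketch. The 3x peak factor is inferred from the example's jump from 500 to 1,500 rps, and the 70% and 90% cache hit ratios are the assumptions behind "origin sees 10–30%"; the names are hypothetical.

```python
SHORTCUT_SECONDS_PER_DAY = 100_000  # the lesson's stand-in for 86,400
INSTANCE_CAPACITY_RPS = 2_000       # per app instance, as in the example


def origin_read_rps(read_rps: float, cache_hit_ratio: float) -> float:
    """Reads that miss the cache and reach the origin."""
    return read_rps * (1.0 - cache_hit_ratio)


average_rps = 50_000_000 / SHORTCUT_SECONDS_PER_DAY
peak_rps = average_rps * 3            # assumed 3x peak factor
peak_reads = peak_rps * 0.9           # 90% reads
peak_writes = peak_rps - peak_reads   # durable-store load

origin_best = origin_read_rps(peak_reads, cache_hit_ratio=0.9)
origin_worst = origin_read_rps(peak_reads, cache_hit_ratio=0.7)

# With 2 app instances, losing one leaves a single instance carrying the peak.
survivor_utilization = peak_rps / INSTANCE_CAPACITY_RPS

print(f"average {average_rps:.0f} rps, peak {peak_rps:.0f} rps")
print(f"reads {peak_reads:.0f} rps, writes {peak_writes:.0f} rps")
print(f"origin reads between {origin_best:.0f} and {origin_worst:.0f} rps")
print(f"one app instance lost: survivor at {survivor_utilization:.0%}")
```

Under these assumptions the surviving app instance sits at 75% of its capacity at peak, below the 80% cliff edge, which is what "covers peak with headroom" means for this tier.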

Four numbers, one minute, and you already know which axis the design will be graded on.

Try it live: open any challenge, place an app server, set traffic in your head against its 2,000 rps capacity, and run the simulation. Watch where the math meets the queueing cliff.

Check yourself

Q1 · A service gets 100M requests/day. What's a sensible steady-state RPS to plan around?

Q2 · A component runs at 95% utilization in steady state. What happens to its latency?

Q3 · Why do architects size for ~2x headroom instead of exact capacity?