beprodready
Build Your Own Rate Limiter

Stage 1: Token Bucket

stage 1 of 3 · ~20 min · runs in your browser

Objective: Implement the token bucket algorithm and understand why it allows controlled bursting, the property that makes it the default choice for API rate limiting.

token bucket algorithm · burst allowance vs steady-state rate · time-based refill · rate limiting vs throttling · leaky bucket comparison

A token bucket has two properties:

  • capacity — the maximum number of tokens (= max burst size)
  • rate — tokens added per second (= sustained throughput limit)

Tokens are added continuously at rate per second, up to capacity. The bucket starts full. Each request consumes one token. If there are no tokens, the request is rejected.
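The refill rule can be written as a single expression. The sketch below is illustrative only: the helper name refill and its argument order are assumptions, not part of the exercise API.

```python
def refill(tokens, elapsed, rate, capacity):
    """Return the token count after `elapsed` seconds of refill, capped at capacity."""
    return min(capacity, tokens + elapsed * rate)


# An empty bucket with rate=5 gains 10 tokens in 2 seconds.
print(refill(0.0, 2.0, rate=5, capacity=20))    # 10.0

# A long idle period cannot push the count past capacity.
print(refill(15.0, 60.0, rate=5, capacity=20))  # 20
```

Computing the refill lazily from elapsed time, rather than with a background timer, is what lets "continuous" refill work with no extra threads.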

Why "burst allowance" matters: Take a bucket with rate=10 and capacity=100. If it has been drained and your service is then quiet for 10 seconds, it refills to 100 tokens, and a client can send 100 requests immediately. This is intentional: token bucket distinguishes a sudden legitimate spike (pent-up demand) from sustained overload, which is still held to rate requests per second once the saved-up tokens are spent.
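To make the spike-versus-overload distinction concrete, here is a small replay of request timestamps against a bucket. It is a sketch under stated assumptions: rate=10 and capacity=100 are example values, the bucket starts empty at time 0, the helper name count_accepted is hypothetical, and timestamps are integer milliseconds so the arithmetic stays exact.

```python
def count_accepted(arrivals_ms, rate, capacity, tokens=0.0):
    """Replay ascending request timestamps (in ms) and count accepted requests.

    Assumes the bucket holds `tokens` at time 0; the default models a bucket
    that has just been drained.
    """
    last_ms = 0
    accepted = 0
    for t_ms in arrivals_ms:
        elapsed_ms = t_ms - last_ms
        tokens = min(capacity, tokens + elapsed_ms * rate / 1000)
        last_ms = t_ms
        if tokens >= 1:
            tokens -= 1
            accepted += 1
    return accepted


# Spike: quiet for 10 s, then 150 requests arrive at the same instant.
spike = [10_000] * 150
print(count_accepted(spike, rate=10, capacity=100))     # 100

# Sustained overload: one request every 50 ms (20/s) for 5 s, 100 requests total.
overload = list(range(50, 5001, 50))
print(count_accepted(overload, rate=10, capacity=100))  # 50
```

The spike gets 100 requests through at once because the idle period filled the bucket. The overloaded client never builds up a reserve, so only 50 of its 100 requests pass in 5 seconds, which is exactly rate=10 per second.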

Contrast with leaky bucket: Leaky bucket (in its queue form) enforces a strictly uniform output rate, so a burst is spread out over time instead of passing through. Token bucket lets a burst of up to capacity requests through immediately. This is the main reason API rate limiters commonly use token bucket.
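For comparison, a leaky bucket in its queue form can be sketched as a scheduler that releases at most one request every 1/rate seconds. This is an illustration only: the function name is hypothetical, and the queue is assumed to be unbounded, whereas a real leaky bucket has a finite queue and drops requests that overflow it.

```python
def leaky_bucket_departures(arrivals, rate):
    """Return the time each queued request leaves a leaky bucket draining at `rate` per second."""
    interval = 1.0 / rate
    departures = []
    next_free = 0.0  # earliest time the next request may leave
    for t in arrivals:
        depart = max(t, next_free)
        departures.append(depart)
        next_free = depart + interval
    return departures


# Five requests arriving together at t=0 with rate=2 leave 0.5 s apart.
print(leaky_bucket_departures([0.0] * 5, rate=2))  # [0.0, 0.5, 1.0, 1.5, 2.0]
```

A token bucket with capacity=5 that is full at t=0 would admit all five of those requests at once.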


Implement create_token_bucket(rate, capacity) returning a dict with:

  • consume(n=1) — attempt to consume n tokens; return True if successful, False if insufficient tokens
  • tokens — current token count (float, for testing)

Important constraints:

  • The bucket starts at full capacity
  • Tokens accumulate based on real elapsed time between calls
  • Tokens cap at capacity (bucket can't overfill)
  • consume(n) is atomic — either all n tokens are consumed or none
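
One possible shape for a solution is sketched below; treat it as a starting point, not the reference answer. It makes two assumptions the text does not confirm: the optional clock parameter is a hypothetical test hook (defaulting to time.monotonic), and tokens is stored as a float in the returned dict and refreshed only when consume is called.

```python
import time


def create_token_bucket(rate, capacity, clock=time.monotonic):
    """Sketch of a token bucket. `clock` is a hypothetical hook for testing."""
    bucket = {"tokens": float(capacity)}  # the bucket starts full
    last = clock()

    def refill():
        nonlocal last
        now = clock()
        # Add tokens for the elapsed time, never exceeding capacity.
        bucket["tokens"] = min(float(capacity), bucket["tokens"] + (now - last) * rate)
        last = now

    def consume(n=1):
        refill()
        if bucket["tokens"] >= n:
            bucket["tokens"] -= n
            return True
        return False  # all-or-nothing: a failed call subtracts nothing

    bucket["consume"] = consume
    return bucket


# Usage with a fake clock so the example does not depend on real time.
fake_now = [0.0]
bucket = create_token_bucket(rate=2, capacity=5, clock=lambda: fake_now[0])
print(bucket["consume"](5))  # True: the bucket starts full
print(bucket["consume"]())   # False: no tokens left
fake_now[0] = 1.5
print(bucket["consume"](3))  # True: 1.5 s * 2 tokens/s = 3 tokens
print(bucket["tokens"])      # 0.0
```

"Atomic" here means all-or-nothing within a single call. The sketch is not thread-safe; a multithreaded server would need a lock around consume.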

Why this matters in production

Many production rate limiters, including those at AWS API Gateway, Stripe, and Twilio, are built on token bucket or a close variant. When an API returns an X-RateLimit-Remaining header, the value often reflects the tokens left in such a bucket. The "burst allowance" is what lets your app send a batch of legitimate requests right after a quiet period without getting throttled.
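As a rough illustration of how bucket state might surface in a response, the sketch below builds rate-limit headers from a token count. The helper name and the mapping are assumptions: the X-RateLimit-* names are a widespread convention, not a standard, and each API defines its own semantics. Retry-After is a standard HTTP header. The calculation assumes each request costs one token.

```python
import math


def rate_limit_headers(tokens, rate, capacity, allowed):
    """Build illustrative response headers from token bucket state."""
    headers = {
        "X-RateLimit-Limit": str(capacity),
        "X-RateLimit-Remaining": str(math.floor(tokens)),
    }
    if not allowed:
        # Whole seconds until one full token is available again.
        headers["Retry-After"] = str(math.ceil((1 - tokens) / rate))
    return headers


print(rate_limit_headers(tokens=41.7, rate=10, capacity=100, allowed=True))
# {'X-RateLimit-Limit': '100', 'X-RateLimit-Remaining': '41'}

print(rate_limit_headers(tokens=0.2, rate=0.5, capacity=100, allowed=False))
# {'X-RateLimit-Limit': '100', 'X-RateLimit-Remaining': '0', 'Retry-After': '2'}
```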

tests (7)

  • starts at full capacity
  • consumes tokens on allowed requests
  • rejects when bucket is empty
  • rejects and does not subtract tokens on failure
  • refills over time
  • caps at capacity even after long idle
  • consume(n) bulk deduction
