classic system design
Design a distributed rate limiter
Enforce tenant and user quotas across API gateways without turning Redis into the only thing that matters.
quota modelingRedis countershot-key mitigationidempotency
Prompt
Design a rate limiter for a public API used by thousands of tenants. The limiter must support per-user, per-tenant, and global quotas with predictable behavior during traffic spikes.
Clarifying questions
- Are limits hard rejects or soft throttles with a retry-after hint?
- Do tenants need custom quota windows or a small fixed menu?
- Should internal service traffic share the same global budget?
Functional requirements
- Evaluate allow or deny decisions on every API request.
- Support tenant, user, endpoint, and global policy scopes.
- Return remaining quota and retry-after metadata to callers.
Nonfunctional requirements
- Keep limiter decision latency under 10 ms at the gateway.
- Fail closed for abuse-heavy tenants and fail open for trusted internal health checks.
- Sustain short bursts without losing the long-term quota guarantee.
Scale assumptions
- 50,000 peak requests per second across all tenants.
- Top 1 percent of tenants produce half of traffic.
- Most policies use one-minute and one-hour windows.
API sketch
- POST /v1/limit/check { tenantId, userId, route, requestId } -> { allowed, retryAfterMs, remaining }
- PUT /v1/limit/policies/{tenantId} for policy updates by administrators.
Data model
- Policy table keyed by tenantId and route pattern.
- Counter keys shaped as limit:{scope}:{id}:{windowStart}.
- Decision log keyed by requestId for idempotent gateway retries.
Architecture components
- API gateway calls a local limiter library before forwarding traffic.
- The library uses Redis scripts for atomic increment and expiry.
- Policy changes flow through a config service and local gateway cache.
Bottlenecks
- Hot tenants can concentrate all writes on a small set of counter keys.
- Policy cache misses can put the control-plane database on the request path.
Failure modes
- Redis timeout: apply tenant-specific fallback policy and emit a degraded-mode metric.
- Duplicate gateway retry: reuse the requestId decision record.
- Policy propagation delay: include policy version in decision logs.
Observability
- Limiter latency p50/p95/p99 by gateway and tenant tier.
- Allowed, denied, fallback, and Redis-timeout decision counts.
- Top hot keys and policy cache hit rate.
Security / privacy
- Keep user identifiers hashed in logs.
- Restrict policy mutation to tenant admins and platform operators.
Cost considerations
- Redis memory scales with active scope-window pairs, not with all registered tenants.
- High-cardinality metrics need tenant-tier labels rather than raw tenant ids.
Tradeoffs
- Token bucket handles bursts better than fixed windows but is harder to explain to tenants.
- Approximate local prechecks reduce Redis load but can produce short overages.
Rubric
| Criterion | Weight | Evidence |
|---|---|---|
Separates product behavior from infrastructure assumptions before drawing boxes. clarification | 10 | The answer names users, write paths, read paths, retention, and what is explicitly out of scope. |
Turns traffic and data assumptions into concrete sizing constraints. scale | 15 | Uses RPS, storage growth, hot-key risk, fanout, latency budget, or memory budget where relevant. |
Draws clear service, cache, queue, and storage boundaries with reasons for each split. architecture | 20 | The component diagram has one owner per responsibility and names the synchronous path. |
Defines durable state, indexes, keys, and idempotency records. data | 15 | Tables or collections include primary keys, lookup paths, TTLs, and consistency expectations. |
Names failure modes and the recovery behavior users see. failure | 15 | Covers partial outages, retries, duplicate work, stale reads, overload, and backfill. |
Defines the small set of metrics and traces needed to debug the design. observability | 10 | Includes SLIs, saturation metrics, queue lag, error classes, and an alert tied to user harm. |
Explains what is being sacrificed and why that sacrifice fits the prompt. tradeoffs | 15 | Compares at least two viable designs and names the losing design's advantage. |