classic system design

Design a distributed rate limiter

Enforce tenant and user quotas across API gateways without turning Redis into the only thing that matters.

quota modelingRedis countershot-key mitigationidempotency

Prompt

Design a rate limiter for a public API used by thousands of tenants. The limiter must support per-user, per-tenant, and global quotas with predictable behavior during traffic spikes.

Clarifying questions

Are limits hard rejects or soft throttles with a retry-after hint?
Do tenants need custom quota windows or a small fixed menu?
Should internal service traffic share the same global budget?

Functional requirements

Evaluate allow or deny decisions on every API request.
Support tenant, user, endpoint, and global policy scopes.
Return remaining quota and retry-after metadata to callers.

Nonfunctional requirements

Keep limiter decision latency under 10 ms at the gateway.
Fail closed for abuse-heavy tenants and fail open for trusted internal health checks.
Sustain short bursts without losing the long-term quota guarantee.

Scale assumptions

50,000 peak requests per second across all tenants.
Top 1 percent of tenants produce half of traffic.
Most policies use one-minute and one-hour windows.

API sketch

POST /v1/limit/check { tenantId, userId, route, requestId } -> { allowed, retryAfterMs, remaining }
PUT /v1/limit/policies/{tenantId} for policy updates by administrators.

Data model

Policy table keyed by tenantId and route pattern.
Counter keys shaped as limit:{scope}:{id}:{windowStart}.
Decision log keyed by requestId for idempotent gateway retries.

Architecture components

API gateway calls a local limiter library before forwarding traffic.
The library uses Redis scripts for atomic increment and expiry.
Policy changes flow through a config service and local gateway cache.

Bottlenecks

Hot tenants can concentrate all writes on a small set of counter keys.
Policy cache misses can put the control-plane database on the request path.

Failure modes

Redis timeout: apply tenant-specific fallback policy and emit a degraded-mode metric.
Duplicate gateway retry: reuse the requestId decision record.
Policy propagation delay: include policy version in decision logs.

Observability

Limiter latency p50/p95/p99 by gateway and tenant tier.
Allowed, denied, fallback, and Redis-timeout decision counts.
Top hot keys and policy cache hit rate.

Security / privacy

Keep user identifiers hashed in logs.
Restrict policy mutation to tenant admins and platform operators.

Cost considerations

Redis memory scales with active scope-window pairs, not with all registered tenants.
High-cardinality metrics need tenant-tier labels rather than raw tenant ids.

Tradeoffs

Token bucket handles bursts better than fixed windows but is harder to explain to tenants.
Approximate local prechecks reduce Redis load but can produce short overages.

Rubric

Criterion	Weight	Evidence
Separates product behavior from infrastructure assumptions before drawing boxes. clarification	10	The answer names users, write paths, read paths, retention, and what is explicitly out of scope.
Turns traffic and data assumptions into concrete sizing constraints. scale	15	Uses RPS, storage growth, hot-key risk, fanout, latency budget, or memory budget where relevant.
Draws clear service, cache, queue, and storage boundaries with reasons for each split. architecture	20	The component diagram has one owner per responsibility and names the synchronous path.
Defines durable state, indexes, keys, and idempotency records. data	15	Tables or collections include primary keys, lookup paths, TTLs, and consistency expectations.
Names failure modes and the recovery behavior users see. failure	15	Covers partial outages, retries, duplicate work, stale reads, overload, and backfill.
Defines the small set of metrics and traces needed to debug the design. observability	10	Includes SLIs, saturation metrics, queue lag, error classes, and an alert tied to user harm.
Explains what is being sacrificed and why that sacrifice fits the prompt. tradeoffs	15	Compares at least two viable designs and names the losing design's advantage.