classic system design

Design a distributed rate limiter

Enforce tenant and user quotas across API gateways without turning Redis into the only thing that matters.

quota modelingRedis countershot-key mitigationidempotency

Prompt

Design a rate limiter for a public API used by thousands of tenants. The limiter must support per-user, per-tenant, and global quotas with predictable behavior during traffic spikes.

Clarifying questions

  • Are limits hard rejects or soft throttles with a retry-after hint?
  • Do tenants need custom quota windows or a small fixed menu?
  • Should internal service traffic share the same global budget?

Functional requirements

  • Evaluate allow or deny decisions on every API request.
  • Support tenant, user, endpoint, and global policy scopes.
  • Return remaining quota and retry-after metadata to callers.

Nonfunctional requirements

  • Keep limiter decision latency under 10 ms at the gateway.
  • Fail closed for abuse-heavy tenants and fail open for trusted internal health checks.
  • Sustain short bursts without losing the long-term quota guarantee.

Scale assumptions

  • 50,000 peak requests per second across all tenants.
  • Top 1 percent of tenants produce half of traffic.
  • Most policies use one-minute and one-hour windows.

API sketch

  • POST /v1/limit/check { tenantId, userId, route, requestId } -> { allowed, retryAfterMs, remaining }
  • PUT /v1/limit/policies/{tenantId} for policy updates by administrators.

Data model

  • Policy table keyed by tenantId and route pattern.
  • Counter keys shaped as limit:{scope}:{id}:{windowStart}.
  • Decision log keyed by requestId for idempotent gateway retries.

Architecture components

  • API gateway calls a local limiter library before forwarding traffic.
  • The library uses Redis scripts for atomic increment and expiry.
  • Policy changes flow through a config service and local gateway cache.

Bottlenecks

  • Hot tenants can concentrate all writes on a small set of counter keys.
  • Policy cache misses can put the control-plane database on the request path.

Failure modes

  • Redis timeout: apply tenant-specific fallback policy and emit a degraded-mode metric.
  • Duplicate gateway retry: reuse the requestId decision record.
  • Policy propagation delay: include policy version in decision logs.

Observability

  • Limiter latency p50/p95/p99 by gateway and tenant tier.
  • Allowed, denied, fallback, and Redis-timeout decision counts.
  • Top hot keys and policy cache hit rate.

Security / privacy

  • Keep user identifiers hashed in logs.
  • Restrict policy mutation to tenant admins and platform operators.

Cost considerations

  • Redis memory scales with active scope-window pairs, not with all registered tenants.
  • High-cardinality metrics need tenant-tier labels rather than raw tenant ids.

Tradeoffs

  • Token bucket handles bursts better than fixed windows but is harder to explain to tenants.
  • Approximate local prechecks reduce Redis load but can produce short overages.

Rubric

CriterionWeightEvidence
Separates product behavior from infrastructure assumptions before drawing boxes.
clarification
10The answer names users, write paths, read paths, retention, and what is explicitly out of scope.
Turns traffic and data assumptions into concrete sizing constraints.
scale
15Uses RPS, storage growth, hot-key risk, fanout, latency budget, or memory budget where relevant.
Draws clear service, cache, queue, and storage boundaries with reasons for each split.
architecture
20The component diagram has one owner per responsibility and names the synchronous path.
Defines durable state, indexes, keys, and idempotency records.
data
15Tables or collections include primary keys, lookup paths, TTLs, and consistency expectations.
Names failure modes and the recovery behavior users see.
failure
15Covers partial outages, retries, duplicate work, stale reads, overload, and backfill.
Defines the small set of metrics and traces needed to debug the design.
observability
10Includes SLIs, saturation metrics, queue lag, error classes, and an alert tied to user harm.
Explains what is being sacrificed and why that sacrifice fits the prompt.
tradeoffs
15Compares at least two viable designs and names the losing design's advantage.