classic system design

Design a search indexing system

Keep a searchable index fresh while writes, deletes, and ranking signals arrive from different services.

inverted indexeschange data capturebackfillsfreshness SLIs

Prompt

Design the indexing pipeline for a marketplace search product. Listings change frequently, and search results need both freshness and stable relevance.

Clarifying questions

  • What freshness delay is acceptable for new and deleted listings?
  • Which fields affect ranking versus filtering?
  • Can the search service serve stale results during reindexing?

Functional requirements

  • Ingest listing creates, updates, deletes, and ranking signals.
  • Build and serve indexes for keyword search and filters.
  • Run full reindexes without taking search offline.

Nonfunctional requirements

  • Make deletes visible quickly enough to avoid user harm.
  • Keep query latency stable during index refreshes.
  • Support rollback after a bad ranking or schema change.

Scale assumptions

  • 100 million active listings.
  • Two million listing updates per hour during peak import windows.
  • Search QPS is highly diurnal.

API sketch

  • POST /internal/index-events { listingId, eventType, version }
  • GET /v1/search?q=&filters=&cursor= -> ranked listing ids.

Data model

  • listing_source(listing_id, version, fields, deleted_at).
  • index_documents(listing_id, version, tokens, filters, ranking_features).
  • index_builds(build_id, schema_version, status, started_at, promoted_at).

Architecture components

  • Change data capture writes versioned events to an indexing stream.
  • Index workers transform source records into search documents.
  • Serving layer uses blue-green index builds for schema changes.

Bottlenecks

  • Full reindex jobs compete with incremental updates for worker capacity.
  • Large filter dimensions can bloat postings lists.

Failure modes

  • Bad transform deploy: stop promotion and replay from last good event offset.
  • Event gap: compare source database versions against indexed versions.
  • Partial index build: keep previous index active until validation passes.

Observability

  • Index freshness p95, delete freshness, transform error rate, query p99.
  • Coverage checks comparing source counts to indexed counts.

Security / privacy

  • Remove private fields before indexing.
  • Respect deletion and takedown events ahead of ordinary ranking freshness.

Cost considerations

  • Index replicas trade query latency against storage and memory cost.
  • Backfills need budget caps so they do not starve incremental indexing.

Tradeoffs

  • Near-real-time indexing improves freshness but makes ranking experiments harder to reproduce.
  • Batch builds are reproducible but can leave users waiting for updates.

Rubric

CriterionWeightEvidence
Separates product behavior from infrastructure assumptions before drawing boxes.
clarification
10The answer names users, write paths, read paths, retention, and what is explicitly out of scope.
Turns traffic and data assumptions into concrete sizing constraints.
scale
15Uses RPS, storage growth, hot-key risk, fanout, latency budget, or memory budget where relevant.
Draws clear service, cache, queue, and storage boundaries with reasons for each split.
architecture
20The component diagram has one owner per responsibility and names the synchronous path.
Defines durable state, indexes, keys, and idempotency records.
data
15Tables or collections include primary keys, lookup paths, TTLs, and consistency expectations.
Names failure modes and the recovery behavior users see.
failure
15Covers partial outages, retries, duplicate work, stale reads, overload, and backfill.
Defines the small set of metrics and traces needed to debug the design.
observability
10Includes SLIs, saturation metrics, queue lag, error classes, and an alert tied to user harm.
Explains what is being sacrificed and why that sacrifice fits the prompt.
tradeoffs
15Compares at least two viable designs and names the losing design's advantage.