classic system design
Design a search indexing system
Keep a searchable index fresh while writes, deletes, and ranking signals arrive from different services.
inverted indexeschange data capturebackfillsfreshness SLIs
Prompt
Design the indexing pipeline for a marketplace search product. Listings change frequently, and search results need both freshness and stable relevance.
Clarifying questions
- What freshness delay is acceptable for new and deleted listings?
- Which fields affect ranking versus filtering?
- Can the search service serve stale results during reindexing?
Functional requirements
- Ingest listing creates, updates, deletes, and ranking signals.
- Build and serve indexes for keyword search and filters.
- Run full reindexes without taking search offline.
Nonfunctional requirements
- Make deletes visible quickly enough to avoid user harm.
- Keep query latency stable during index refreshes.
- Support rollback after a bad ranking or schema change.
Scale assumptions
- 100 million active listings.
- Two million listing updates per hour during peak import windows.
- Search QPS is highly diurnal.
API sketch
- POST /internal/index-events { listingId, eventType, version }
- GET /v1/search?q=&filters=&cursor= -> ranked listing ids.
Data model
- listing_source(listing_id, version, fields, deleted_at).
- index_documents(listing_id, version, tokens, filters, ranking_features).
- index_builds(build_id, schema_version, status, started_at, promoted_at).
Architecture components
- Change data capture writes versioned events to an indexing stream.
- Index workers transform source records into search documents.
- Serving layer uses blue-green index builds for schema changes.
Bottlenecks
- Full reindex jobs compete with incremental updates for worker capacity.
- Large filter dimensions can bloat postings lists.
Failure modes
- Bad transform deploy: stop promotion and replay from last good event offset.
- Event gap: compare source database versions against indexed versions.
- Partial index build: keep previous index active until validation passes.
Observability
- Index freshness p95, delete freshness, transform error rate, query p99.
- Coverage checks comparing source counts to indexed counts.
Security / privacy
- Remove private fields before indexing.
- Respect deletion and takedown events ahead of ordinary ranking freshness.
Cost considerations
- Index replicas trade query latency against storage and memory cost.
- Backfills need budget caps so they do not starve incremental indexing.
Tradeoffs
- Near-real-time indexing improves freshness but makes ranking experiments harder to reproduce.
- Batch builds are reproducible but can leave users waiting for updates.
Rubric
| Criterion | Weight | Evidence |
|---|---|---|
Separates product behavior from infrastructure assumptions before drawing boxes. clarification | 10 | The answer names users, write paths, read paths, retention, and what is explicitly out of scope. |
Turns traffic and data assumptions into concrete sizing constraints. scale | 15 | Uses RPS, storage growth, hot-key risk, fanout, latency budget, or memory budget where relevant. |
Draws clear service, cache, queue, and storage boundaries with reasons for each split. architecture | 20 | The component diagram has one owner per responsibility and names the synchronous path. |
Defines durable state, indexes, keys, and idempotency records. data | 15 | Tables or collections include primary keys, lookup paths, TTLs, and consistency expectations. |
Names failure modes and the recovery behavior users see. failure | 15 | Covers partial outages, retries, duplicate work, stale reads, overload, and backfill. |
Defines the small set of metrics and traces needed to debug the design. observability | 10 | Includes SLIs, saturation metrics, queue lag, error classes, and an alert tied to user harm. |
Explains what is being sacrificed and why that sacrifice fits the prompt. tradeoffs | 15 | Compares at least two viable designs and names the losing design's advantage. |