ML system design

Design a vector search service

Serve nearest-neighbor retrieval with filtering, index refresh, recall measurement, and cost controls.

nearest neighbors, metadata filtering, index refresh, recall evaluation

Prompt

Design a vector search service used by search, recommendations, and RAG products. It must support metadata filters and frequent index updates.

Write path stores vector records and schedules index updates.
Serving fleet keeps active index shards in memory with metadata filter side indexes.
Evaluation jobs compare approximate results to exact-search samples.
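The following is a minimal sketch of the serving read path. It assumes exact dot-product scoring inside each shard as a stand-in for a real ANN index, and single-key equality filters. `Shard` and `fan_out_search` are hypothetical names. The sketch shows a metadata side index resolving a filter to the allowed record ids, followed by a scatter-gather merge of the per-shard top-k lists.

```python
from __future__ import annotations

import heapq
from dataclasses import dataclass, field


@dataclass
class Shard:
    """Hypothetical in-memory shard: vectors plus a metadata filter side index."""
    vectors: dict[str, list[float]] = field(default_factory=dict)
    meta: dict[str, dict[str, str]] = field(default_factory=dict)
    # Side index: (metadata key, value) -> ids of records carrying that value.
    side_index: dict[tuple[str, str], set[str]] = field(default_factory=dict)

    def upsert(self, rec_id: str, vec: list[float], meta: dict[str, str]) -> None:
        # Remove old side-index entries first so a metadata change leaves no stale match.
        for key, value in self.meta.get(rec_id, {}).items():
            self.side_index[(key, value)].discard(rec_id)
        self.vectors[rec_id] = vec
        self.meta[rec_id] = dict(meta)
        for key, value in meta.items():
            self.side_index.setdefault((key, value), set()).add(rec_id)

    def search(self, query: list[float], k: int, filt: tuple[str, str] | None):
        # Resolve the filter to allowed ids via the side index, then score only those.
        ids = self.side_index.get(filt, set()) if filt is not None else self.vectors.keys()
        scored = ((sum(q * v for q, v in zip(query, self.vectors[i])), i) for i in ids)
        return heapq.nlargest(k, scored)


def fan_out_search(shards: list[Shard], query: list[float], k: int,
                   filt: tuple[str, str] | None = None) -> list[str]:
    # Each shard returns its local top-k; the global top-k lies within their union.
    partial = [hit for shard in shards for hit in shard.search(query, k, filt)]
    return [rec_id for _, rec_id in heapq.nlargest(k, partial)]
```

Because every shard scores exactly here, the merged list equals a single global top-k. With ANN shards, the merge is only as good as what each shard's index returned.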

Bad index build: keep previous index version active.
Shard unavailable: route to a replica; if only part of the shard set can be served, mark the response as recall-risk.
Embedding version mismatch: reject mixed-version queries unless explicitly configured.
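The sketch below covers two of these failure modes: keeping the previous index version active when a build is bad, and rejecting queries whose embedding version does not match the index. It treats a build as "bad" when its sampled recall against exact search falls below a threshold, which is one possible definition and an assumption here. `IndexVersion`, `IndexAlias`, `EmbeddingVersionMismatch`, and `min_recall` are hypothetical names.

```python
from dataclasses import dataclass


class EmbeddingVersionMismatch(Exception):
    """Raised when a query vector came from a different embedding model version."""


@dataclass(frozen=True)
class IndexVersion:
    name: str                # hypothetical build id
    embedding_version: str   # embedding model version used to build the indexed vectors
    sampled_recall: float    # recall@k against exact search, measured before promotion


class IndexAlias:
    """Hypothetical alias read by the serving fleet; only validated builds are promoted."""

    def __init__(self, initial: IndexVersion, min_recall: float):
        self.active = initial
        self.history = [initial]
        self.min_recall = min_recall  # assumed promotion gate

    def promote(self, candidate: IndexVersion) -> bool:
        # Bad build: refuse promotion so the previous version stays active.
        if candidate.sampled_recall < self.min_recall:
            return False
        self.history.append(candidate)
        self.active = candidate
        return True

    def rollback(self) -> IndexVersion:
        # Revert to the previously active version if a promoted build misbehaves online.
        if len(self.history) > 1:
            self.history.pop()
            self.active = self.history[-1]
        return self.active

    def check_query(self, query_embedding_version: str, allow_mixed: bool = False) -> None:
        # Reject mixed-version queries unless explicitly configured to allow them.
        if query_embedding_version != self.active.embedding_version and not allow_mixed:
            raise EmbeddingVersionMismatch(
                f"query={query_embedding_version} index={self.active.embedding_version}"
            )
```

Serving reads only `alias.active`, so a failed promotion leaves traffic on the last good version without any change on the read path.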

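Pre-filtering (restricting the search to records that pass the filter) ensures that only permitted records are ever candidates. With graph-based ANN indexes, however, a sparse filter can leave too few reachable candidates and hurt recall.
Post-filtering (searching first, then dropping records that fail the filter) is fast for broad filters. For sparse filters it can return fewer than k results, and it can leak restricted candidates if any step sees results before the filter is applied.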
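The next sketch isolates the post-filtering failure mode. Both paths use exact scoring, so the only difference between them is when the filter runs. The sketch does not reproduce the pre-filtering recall loss, which comes from ANN graph traversal over a restricted set. The function names and the `overfetch` parameter are hypothetical.

```python
import heapq


def score(q: list[float], v: list[float]) -> float:
    return sum(a * b for a, b in zip(q, v))


def pre_filter_search(corpus: dict[str, list[float]], allowed: set[str],
                      query: list[float], k: int) -> list[str]:
    # Restrict to permitted ids first; unauthorized records are never scored.
    scored = ((score(query, corpus[i]), i) for i in allowed if i in corpus)
    return [i for _, i in heapq.nlargest(k, scored)]


def post_filter_search(corpus: dict[str, list[float]], allowed: set[str],
                       query: list[float], k: int, overfetch: int = 2) -> list[str]:
    # Fetch k * overfetch unfiltered candidates, then drop disallowed ones
    # before anything is returned, cached, or logged.
    scored = ((score(query, v), i) for i, v in corpus.items())
    candidates = heapq.nlargest(k * overfetch, scored)
    return [i for _, i in candidates if i in allowed][:k]
```

With a sparse filter whose matches score low, `post_filter_search` can come back empty even though matches exist. Raising `overfetch` trades latency for fill rate.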

Embedding version lineage must be recorded so that query vectors and indexed vectors can be verified to come from the same embedding model version.
Recall evaluation needs held-out queries and exact-search baselines.
Retrieval drift can occur after corpus changes even when the model is unchanged.
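Here is a minimal sketch of the offline recall check. It assumes approximate and exact-search results are already materialized as ranked id lists per held-out query. `recall_at_k`, `mean_recall`, `recall_dropped`, and the 0.02 tolerance are hypothetical. If you track mean recall per corpus snapshot, you can catch drift even when the embedding model is unchanged.

```python
def recall_at_k(approx_ids: list[str], exact_ids: list[str], k: int) -> float:
    """Fraction of the exact top-k neighbors that the approximate search returned."""
    truth = set(exact_ids[:k])
    return len(truth & set(approx_ids[:k])) / len(truth)


def mean_recall(approx: dict[str, list[str]], exact: dict[str, list[str]], k: int) -> float:
    # Average over held-out queries; skip queries with no exact neighbors
    # (for example, a filter that matches nothing), since there is nothing to miss.
    per_query = [recall_at_k(approx.get(q, []), ids, k) for q, ids in exact.items() if ids]
    if not per_query:
        raise ValueError("no held-out queries with exact neighbors")
    return sum(per_query) / len(per_query)


def recall_dropped(baseline: float, current: float, tolerance: float = 0.02) -> bool:
    # Compare snapshots: a corpus change can lower recall with the same model.
    return current < baseline - tolerance
```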

Criterion	Description	Weight	Evidence
clarification	Separates product behavior from infrastructure assumptions before drawing boxes.	10	The answer names users, write paths, read paths, retention, and what is explicitly out of scope.
scale	Turns traffic and data assumptions into concrete sizing constraints.	15	Uses RPS, storage growth, hot-key risk, fanout, latency budget, or memory budget where relevant.
architecture	Draws clear service, cache, queue, and storage boundaries with reasons for each split.	20	The component diagram has one owner per responsibility and names the synchronous path.
data	Defines durable state, indexes, keys, and idempotency records.	15	Tables or collections include primary keys, lookup paths, TTLs, and consistency expectations.
failure	Names failure modes and the recovery behavior users see.	15	Covers partial outages, retries, duplicate work, stale reads, overload, and backfill.
observability	Defines the small set of metrics and traces needed to debug the design.	10	Includes SLIs, saturation metrics, queue lag, error classes, and an alert tied to user harm.
tradeoffs	Explains what is being sacrificed and why that sacrifice fits the prompt.	15	Compares at least two viable designs and names the losing design's advantage.
ml-specific	Covers the model, data, evaluation, deployment, and monitoring loop as one system.	20	The answer includes lineage, offline eval, online eval, rollback, freshness, and drift handling.