ML platform prompts where the answer has to cover data, serving, evaluation, lineage, monitoring, rollback, and cost. A diagram alone is not enough.
Build retrieval, generation, citation, and evaluation loops that do not collapse into a demo prompt.
Serve models with versioning, autoscaling, canaries, and rollback paths that operators can trust.
Keep online and offline features aligned enough that model scores mean what training said they meant.
Run offline, online, and regression evaluations with lineage strong enough to stop bad releases.
Serve nearest-neighbor retrieval with filtering, index refresh, recall measurement, and cost controls.
Coordinate data, checkpoints, accelerators, and recovery for large training jobs.