16 The Serving Problem
Prefill versus decode, latency versus throughput, goodput, and the KV cache as the central resource.
Note: Status
Outline. Source: new. See INTEGRATION.md.
16.1 Problem
16.2 Design
16.3 Evolution
16.4 Trade-offs
16.5 Implementation
Tip: Constraint arrow
The KV cache sized in the architecture chapter is the resource this layer schedules.
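To make the constraint concrete, here is a minimal sizing sketch. It assumes a standard attention layout that caches one key vector and one value vector per token, per layer, per KV head. The names `KVConfig`, `kv_bytes_per_token`, `max_cached_tokens`, and `max_concurrent_sequences` are hypothetical, and the model shape and memory budget are illustrative values, not figures from the architecture chapter. The sketch ignores paging, fragmentation, and cache quantization.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KVConfig:
    """Hypothetical model shape; all values are illustrative assumptions."""
    n_layers: int
    n_kv_heads: int
    head_dim: int
    bytes_per_elem: int = 2  # assumes fp16/bf16 cache entries


def kv_bytes_per_token(cfg: KVConfig) -> int:
    # Factor of 2: one key vector and one value vector per layer per KV head.
    return 2 * cfg.n_layers * cfg.n_kv_heads * cfg.head_dim * cfg.bytes_per_elem


def max_cached_tokens(cfg: KVConfig, kv_budget_bytes: int) -> int:
    # Total tokens (across all in-flight sequences) that fit in the budget.
    return kv_budget_bytes // kv_bytes_per_token(cfg)


def max_concurrent_sequences(cfg: KVConfig, kv_budget_bytes: int,
                             tokens_per_seq: int) -> int:
    # Upper bound if every sequence reserves tokens_per_seq tokens of cache.
    return max_cached_tokens(cfg, kv_budget_bytes) // tokens_per_seq


# Illustrative numbers only: 32 layers, 8 KV heads, head_dim 128, 16 GiB for KV.
cfg = KVConfig(n_layers=32, n_kv_heads=8, head_dim=128)
budget = 16 * 1024**3

print(kv_bytes_per_token(cfg))                      # bytes of cache per token
print(max_cached_tokens(cfg, budget))               # tokens that fit in budget
print(max_concurrent_sequences(cfg, budget, 4096))  # sequences at 4096 tokens
```

Under these assumptions the cache budget, not compute, sets the ceiling on how many sequences the scheduler can keep in flight at once, which is why this layer treats it as the resource to schedule.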
16.6 Further reading
Note: TODO
Establish the seminal, frontier, and primary-source anchors for this chapter.