16 The Serving Problem

Prefill versus decode, latency versus throughput, goodput, and the KV cache as the central resource.

Note: Status

Outline. Source: new. See INTEGRATION.md.

16.1 Problem

16.2 Design

16.3 Evolution

16.4 Trade-offs

16.5 Implementation

Tip: Constraint arrow

The KV cache sized in the architecture chapter is the resource this layer schedules.
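
To make the constraint concrete, the following is a minimal sketch of how a KV cache budget bounds what the serving layer can schedule. It assumes a standard attention layout in which each token stores one key vector and one value vector per layer per KV head. The names `ModelShape`, `kv_bytes_per_token`, `max_cached_tokens`, and `max_concurrent_sequences` are hypothetical, and the dimensions in the usage example are illustrative rather than taken from any particular model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelShape:
    """Hypothetical model dimensions, for illustration only."""
    n_layers: int
    n_kv_heads: int
    head_dim: int
    bytes_per_elem: int = 2  # assumption: fp16/bf16 cache entries


def kv_bytes_per_token(m: ModelShape) -> int:
    # One key vector and one value vector per layer per KV head.
    return 2 * m.n_layers * m.n_kv_heads * m.head_dim * m.bytes_per_elem


def max_cached_tokens(m: ModelShape, budget_bytes: int) -> int:
    # Tokens that fit in the memory left over for the cache
    # (after weights and activations; that split is an assumption here).
    return budget_bytes // kv_bytes_per_token(m)


def max_concurrent_sequences(
    m: ModelShape, budget_bytes: int, tokens_per_seq: int
) -> int:
    # Upper bound when every sequence reserves its full context up front.
    return max_cached_tokens(m, budget_bytes) // tokens_per_seq


if __name__ == "__main__":
    shape = ModelShape(n_layers=32, n_kv_heads=8, head_dim=128)
    budget = 16 * 2**30  # 16 GiB set aside for the KV cache
    print(kv_bytes_per_token(shape))                      # 131072 (128 KiB)
    print(max_cached_tokens(shape, budget))               # 131072
    print(max_concurrent_sequences(shape, budget, 4096))  # 32
```

Under these assumed numbers, each token costs 128 KiB of cache, so a 16 GiB budget holds 131072 tokens, or at most 32 sequences of 4096 tokens each. The scheduler's batch size is therefore capped by cache memory rather than by compute alone.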

16.6 Further reading

Note: TODO

Establish the seminal, frontier, and primary-source anchors for this chapter.