17 Memory and Scheduling

Continuous batching, paged attention, prefix caching, and prefill / decode disaggregation.

Note: Status

Outline. Source: new. See INTEGRATION.md.

17.1 Problem

17.2 Design

17.3 Evolution

17.4 Trade-offs

17.5 Implementation

17.6 Further reading

Note: TODO

Establish the seminal, frontier, and primary-source anchors for this chapter.