Summary
We followed one capability up the stack, from the compute that trains it to the governance that contains it. At each layer we asked the same three questions: can it do the task, at what cost, and can you trust it. And at each layer we tried to leave the reader able to say why the piece has its shape, not only how to operate it.
The layers depend on each other in both directions, and the constraint arrows are where that shows. The serving cost of a token decides how far to over-train a model. The size of the key-value cache drives the choice of attention variant. The harness around a model can move its evaluation score as much as its weights do. Read the stack in order and these dependencies stop being surprises and become the design.
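As one worked instance of such a constraint, the sketch below estimates key-value cache size for three attention variants. The model shape (32 layers, 32 query heads, head dimension 128, 8192-token context, 2-byte values) is a hypothetical chosen for round numbers, not a figure from any chapter, and the function name `kv_cache_bytes` is ours.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_value=2):
    """Estimate KV cache size in bytes for a decoder-only transformer.

    The leading factor of 2 counts one key tensor and one value tensor
    per layer. bytes_per_value=2 assumes 16-bit storage.
    """
    return (2 * n_layers * n_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_value)


# Hypothetical model shape, for illustration only.
N_LAYERS, HEAD_DIM, SEQ_LEN = 32, 128, 8192

# The variants differ only in how many key/value heads they store.
kv_heads_by_variant = {
    "multi-head": 32,     # one KV head per query head
    "grouped-query": 8,   # query heads share KV heads in groups of 4
    "multi-query": 1,     # all query heads share a single KV head
}

sizes = {
    name: kv_cache_bytes(N_LAYERS, n_kv, HEAD_DIM, SEQ_LEN)
    for name, n_kv in kv_heads_by_variant.items()
}

for name, size in sizes.items():
    print(f"{name}: {size / 2**30:.3f} GiB per sequence")
```

Under these assumptions the cache shrinks in direct proportion to the number of key/value heads, which is why a memory budget at serving time can push the choice of attention variant upstream into the architecture.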
Where to go next
- Take one system you operate and write its history: the problem it solves, the design it settled on, what it superseded, and the trade-off it encodes.
- Follow one capability from Part I to Part V, from a forward pass to an agent, and watch how each layer builds on the foundations of the one below.
- For any chapter, read the “what’s contested” box first, then the primary sources behind it, and decide where you stand.