AI as an Infrastructure
From Systems to Agents: History, Design Decisions, and Foundations
Preface
This book treats artificial intelligence as an infrastructure: a layer that other software depends on, that must stay up, stay correct, stay affordable, and stay observable. It follows one continuous arc, the lifecycle of a capability, from raw compute to a deployed and governed behavior, and at every step it asks not only how the piece works but why it has the shape it does, what it gave up to get there, and what foundation it rests on.
A reader should finish a chapter able to answer “why is it built this way?”, not only “how do I run it?”. We approach every topic the same way: state the problem and its constraints, give the design that answers it, trace how that design evolved and what it superseded, name the trade-offs, and only then turn to implementation. The authoring conventions are in CONVENTIONS.md.
How this book is organized
The spine is the lifecycle of a capability, read as a stack from compute to deployed behavior.
- Part 0, Orientation. The whole stack in one pass, and how to read the book.
- Part I, Foundations and Pretraining. Scaling, data, tokenization, architecture, and training at scale.
- Part II, Adaptation and Alignment. Fine-tuning, RLHF, preference optimization, and self-improvement.
- Part III, Reasoning and Test-Time Compute. Eliciting and training reasoning, and inference-time scaling.
- Part IV, Inference and Serving. The serving problem, scheduling, faster decoding, quantization, and long context.
- Part V, Orchestration. Agents, memory, the harness, multi-agent systems, retrieval, and context.
- Part VI, Evaluation. Benchmarks, judging, and evaluating agents.
- Part VII, Infrastructure and Systems. Accelerators, networking, and the cluster orchestration around a run.
- Part VIII, Safety, Interpretability, and Governance. Interpretability, oversight, and agent security.
- Part IX, Ecosystem and Economics. The model landscape, tooling, and the cost structure that feeds back into every choice above.
Two motifs run through the book. The three loops (training, inference, and agentic) offer a recurring way to see the same control structure at different layers. The lens of capability, efficiency, and trust closes each chapter: can it do the task, at what cost, and can you trust it? Also watch for the constraint arrows, the places where a lower layer dictates an upper layer’s choice; they are the payoff of reading the stack in order.
Who this is for
Engineers who build and operate AI systems, and researchers who want to see how their theory lands in production. We assume comfort with one programming language, basic probability and linear algebra, and the everyday tooling of running software in production.
This book is a work in progress. While drafting proceeds, chapters carry outlines, source pointers, and open questions marked as TODO. Each chapter ends with a “what’s contested” box that lays out the live debates rather than papering over them.