
Steady State

Six months, a model release most weeks, and the ground never stopped moving. The through-line: production AI stopped being a modeling problem and became a control-systems problem.

Tags: LLM systems, control systems, reliability, manifesto

Take stock of the last six months. A serious model shipped most weeks. ChatGPT’s share of its own category slipped under 50% for the first time. Open weights crossed a majority of routed tokens on neutral routers. Benchmarks were shown to be gameable to near-perfect scores while deployed agents held near a coin flip in production. A whole new attack class, agentjacking, went from disclosure to 2,388 organizations. Two of the biggest labs turned toward profit and public markets. The frontier unbundled itself into a price menu.

If you tried to keep up by tracking which model is best, you spent six months being wrong on a rolling basis. If you tracked something else, the picture was oddly calm.

The thing that didn’t change

Underneath every one of those stories is the same shape, and it is the shape I have been circling in everything I wrote this half-year. The hard, durable, interesting engineering in AI is no longer inside the model. It is in the loop that keeps a system correct while the model, the data, the prices, and the threats all move underneath it.

“Steady state” is a term from control theory: the condition a well-designed system converges to and holds under continuous disturbance. Not a system that never gets pushed. A system that gets pushed constantly and stays within its bounds anyway. That is the honest description of what a production AI system has to be in 2026, because the disturbances are not going to stop. New models are a disturbance. Drifting knowledge is a disturbance. Adversarial inputs are a disturbance. Shifting prices are a disturbance. The question was never how to avoid them. It is what loop holds the system steady while they arrive.
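For readers who have not met the control-theory sense of the term, here is a toy sketch of it. It is not a model of any real AI system: a single number is pushed by random disturbances on every step, and a small proportional-integral controller pulls it back toward a setpoint. The function name, gains, and noise level are all assumptions chosen for illustration.

```python
import random


def simulate(setpoint=1.0, steps=500, kp=0.5, ki=0.1, noise=0.2, seed=0):
    """Toy PI controller holding `value` near `setpoint` under random pushes.

    All parameters are illustrative assumptions, not tuned for anything real.
    """
    rng = random.Random(seed)
    value, integral = 0.0, 0.0
    trace = []
    for _ in range(steps):
        error = setpoint - value                      # how far off we are
        integral += error                             # accumulated error
        control = kp * error + ki * integral          # corrective action
        disturbance = rng.uniform(-noise, noise)      # the world pushing back
        value += control + disturbance
        trace.append(value)
    return trace


if __name__ == "__main__":
    trace = simulate()
    # Skip the start-up transient, then measure the worst deviation.
    worst = max(abs(v - 1.0) for v in trace[50:])
    print(f"worst deviation after warm-up: {worst:.3f}")
```

The disturbance never stops, yet after the start-up transient the value stays in a band around the setpoint. That band, not a flat line, is what steady state means here.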

Four stages, one loop

My research turned into an argument that the loop has four stages, and reading back through the last six months, every essay was really about one of them.

The production-AI loop: ingest fresh knowledge, adapt without forgetting, retrieve under a budget, correct from failures, and feed corrections back into the next cycle. This is the control loop the whole half-year keeps pointing at.

Ingest, under a freshness SLO. The pilots that die in month three die here first, answering from a world five weeks stale. Freshness is not a batch job you run quarterly; it is a service objective with a target and a clock.
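To make "a target and a clock" concrete, here is a minimal sketch of a freshness check. Everything in it is a hypothetical assumption for illustration: the `FreshnessSLO` type, the `freshness_report` function, the source names, and the seven-day and 95% numbers.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class FreshnessSLO:
    max_age: timedelta   # the clock: how stale a source may get
    target: float        # the target: fraction of sources within max_age


def freshness_report(last_ingested, slo, now):
    """Compare each source's last ingestion time against the SLO.

    `last_ingested` maps a (hypothetical) source name to a timezone-aware
    datetime of its most recent successful ingestion.
    """
    fresh = [name for name, ts in last_ingested.items()
             if now - ts <= slo.max_age]
    ratio = len(fresh) / len(last_ingested) if last_ingested else 1.0
    stale = sorted(set(last_ingested) - set(fresh))
    return {"ratio": ratio, "met": ratio >= slo.target, "stale": stale}


if __name__ == "__main__":
    now = datetime(2026, 1, 1, tzinfo=timezone.utc)
    sources = {
        "product_docs": now - timedelta(days=1),
        "pricing_pages": now - timedelta(weeks=5),  # five weeks stale
    }
    slo = FreshnessSLO(max_age=timedelta(days=7), target=0.95)
    print(freshness_report(sources, slo, now))
```

A check like this would run continuously and alert when `met` turns false, like any other service objective, instead of waiting for a quarterly batch to reveal the gap.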

Adapt, without forgetting. New knowledge that erases old competence is not learning; it is trading one failure for another. Learning the new while keeping the old, on a budget, was the point of STAR+FAR. The whole model-is-not-your-moat argument is this stage at the system level: absorb the new without losing what worked.

Retrieve, under a budget. Not “stuff the window,” not “prompt and pray,” but allocate evidence per request against a latency and cost SLO. That is SAGE, and it is why long context didn’t kill retrieval: a bigger window widens the allocation problem, it does not delete it. The menu of model tiers is the same allocation one level up.
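The allocation framing can be sketched with a deliberately simple stand-in. This is not SAGE. It is a plain greedy heuristic that picks passages by relevance per token until a per-request token budget runs out. The `Passage` type, the `allocate` function, and the scores are hypothetical, and the sketch assumes every passage has a positive token count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    id: str
    relevance: float   # assumed score from some upstream retriever
    tokens: int        # assumed > 0


def allocate(passages, token_budget):
    """Greedy allocation: best relevance-per-token first, within the budget."""
    ranked = sorted(passages, key=lambda p: p.relevance / p.tokens, reverse=True)
    chosen, spent = [], 0
    for p in ranked:
        if spent + p.tokens <= token_budget:   # only take what still fits
            chosen.append(p)
            spent += p.tokens
    return chosen, spent


if __name__ == "__main__":
    candidates = [
        Passage("a", 0.9, 300),
        Passage("b", 0.8, 100),
        Passage("c", 0.3, 400),
        Passage("d", 0.5, 100),
    ]
    chosen, spent = allocate(candidates, token_budget=500)
    print([p.id for p in chosen], spent)
```

Raising `token_budget` lets more candidates in, but the ranking decision is still there. That is the sense in which a bigger window widens the allocation problem without removing it.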

Correct, structurally. When failures cluster, fix the pipeline, not the sentence. CAFO was my evidence that structural correction beats re-rolling the answer, and it is the discipline the gamed benchmarks and the 88% pilot graveyard are both missing. The agentjacking defense lives here too: correction includes the boundary that keeps a fooled agent from doing damage.
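As a rough illustration of "when failures cluster," and not a description of CAFO, the sketch below groups logged failures by pipeline stage and cause and surfaces only the groups large enough to justify a structural fix. The record fields, the `structural_fixes` name, and the threshold are assumptions.

```python
from collections import Counter


def structural_fixes(failures, min_cluster=3):
    """Return (stage, cause) groups with at least `min_cluster` failures.

    Each failure is assumed to be a dict with "stage" and "cause" keys.
    One-off failures are left out: they may deserve a retry, not a redesign.
    """
    counts = Counter((f["stage"], f["cause"]) for f in failures)
    return [(key, n) for key, n in counts.most_common() if n >= min_cluster]


if __name__ == "__main__":
    log = [
        {"stage": "retrieve", "cause": "stale_index"},
        {"stage": "retrieve", "cause": "stale_index"},
        {"stage": "generate", "cause": "format_error"},
        {"stage": "retrieve", "cause": "stale_index"},
        {"stage": "ingest", "cause": "parse_failure"},
    ]
    print(structural_fixes(log))
```

Here three failures share one upstream cause, so the useful fix is to the index, not to any of the three individual answers.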

And then the loop closes: corrections feed the next ingestion, better knowledge eases adaptation, sharper adaptation lightens retrieval, cheaper retrieval frees budget for correction. The stages are not a pipeline you run once. They are a cycle you hold in steady state.

Why this is the good news

It would be easy to read six months of relentless releases as exhausting, and the timelines certainly framed it that way. I read it as clarifying. If the model were the product, none of us could keep up; the half-life of a lead is now measured in weeks and no team can outrun twelve labs. But the model is not the product. The loop is the product, and the loop is exactly the thing that does not churn every Tuesday. Evals, routing, freshness, correction, boundaries: these compound. You can actually build a career, and a company, on top of them, precisely because they sit still while the models blur past.

That is also why I named this site after the idea, and why its one animated flourish is a signal trace holding under its ceiling while it gets pushed. It is not decoration. It is the entire thesis in one line: the interesting thing is not how high the signal goes. It is that the system holds it steady while the world does everything it can to knock it loose.

The models will keep coming, most weeks, better each time. Let them. Build the loop that stays steady while they arrive, and you have built the one thing in this field that a new release cannot obsolete: a system that meets its objectives, on purpose, under load, indefinitely. Steady state. That is the whole job now.