· 5 min read

Steady State

Six months, a model release most weeks, and the ground never stopped moving. The through-line: production AI stopped being a modeling problem and became a control-systems problem.

LLM systems, control systems, reliability
· 5 min read

The Frontier Is Now a Menu

GPT-5.6 shipped as three tiers. Claude comes in Fable and Sonnet. The labs unbundled 'the best model' into a price-quality menu, and your new job is per-request capital allocation.

LLM systems, routing, economics
· 5 min read

Agentjacking Was Inevitable

Fake Sentry errors hijacked coding agents at 2,388 orgs with an 85% success rate. No malware, no phishing, every step authorized. This is what happens when data and instructions share a channel.

security, agents, prompt injection
· 4 min read

Long Context Didn't Kill Retrieval

Million-token windows killed lazy retrieval, not retrieval. Context is a budget you allocate under a latency SLO, and 'stuff everything in' is the least defensible allocation there is.

RAG, long context, latency
· 4 min read

A Hundred Agents Is Not a Plan

Swarm architectures multiply an unreliable unit and call it scale. The math of chained success rates says the opposite: fewer agents, tighter loops, structural correction.

agents, multi-agent, reliability
· 4 min read

The Subsidy Era Is Ending

Anthropic is projecting its first operating profit and OpenAI is reportedly prepping an S-1. When your suppliers start caring about margins, your architecture inherits the problem.

economics, industry, strategy
· 4 min read

Assume the Benchmark Is Gamed

Berkeley researchers showed every major agent benchmark can be exploited to near-perfect scores. Production telemetry says deployed agents succeed 56.6% of the time. Measure like an SRE instead.

evals, agents, benchmarks
· 5 min read

Pilots Don't Die in Demos. They Die in Month Three.

88% of enterprise agent pilots never reach production. The autopsy almost never says 'the model was too dumb.' It says nobody built the loop that keeps a working system working.

agents, production, enterprise
· 4 min read

Token Prices Are Collapsing. Your AI Bill Isn't.

Prices per token fall up to 900x a year while agentic tasks burn 5-30x more tokens. The metric that decides your unit economics is cost per successful task, and its biggest lever is reliability.

economics, agents, reliability
· 5 min read

MCP Won. Now Comes the Hard Part.

97 million monthly downloads, 9,400 servers, 30+ CVEs in eight weeks. The protocol standardized the easy 20%. Production is the other 80%, and I've shipped it.

MCP, agents, security
· 4 min read

The Model Is Not Your Moat

A dozen frontier releases in 28 days means a lead now has a half-life of weeks. The durable asset is everything around the model: evals, routing, data, rollback.

LLM systems, strategy, evals
· 4 min read

Designing SLO-Aware RAG

Why production retrieval should treat latency and cost as constraints to control, not numbers to hope for, and how difficulty-adaptive routing gets there.

RAG, LLM systems, latency
· 5 min read

A Year After R1

DeepSeek-R1 was a pricing event disguised as a research event. One year later, the weights are the weapon and the price floor is the wound.

open weights, economics, LLM systems