The Frontier Is Now a Menu
GPT-5.6 shipped as three tiers. Claude comes in Fable and Sonnet. The labs unbundled “the best model” into a price-quality menu, and your new job is per-request capital allocation.
Look at what shipped in the last several weeks. OpenAI previewed GPT-5.6 not as a model but as three: Sol for hard problems, Terra for the everyday middle, Luna for high-volume cheap work. Anthropic put out Claude Fable 5, a frontier-tier model at roughly twice the price of its Opus line, and then Sonnet 5 a few weeks later at the workhorse tier. GLM-5.2 sits just under the frontier on coding benchmarks with the weights downloadable. The pattern is not subtle once you see it: nobody sells “the best model” anymore. They sell a menu.
This is a bigger shift than any single release, and most teams are still ordering like the menu has one item.
Unbundling was the only move
Why did every major lab converge on tiers in the same quarter? Because they are caught in the vise I keep coming back to: an open-weight price floor below them and, for the ones eyeing public markets, investors who want margin above them. A single flagship price cannot satisfy both a Fortune 500 buyer who needs the hardest reasoning under an SLA and a startup metering a million cheap classifications. So the flagship split. Sol and Fable capture the buyers who can justify frontier prices; Luna and the Flash and Lite tiers defend the floor against open weights. One lineup, every price point, margin preserved by segmentation instead of by a number that only goes down.
For them it is pricing strategy. For us it is a new job description.
Your job is capital allocation now
When there was one best model, model choice was a purchasing decision you made once a quarter. With a menu spanning maybe 30x in price between tiers, model choice becomes a per-request decision, and making it well is an optimization problem you run continuously. Which is to say: the interesting work moved from “which model is best” to “which model for this request,” and that is capital allocation, not procurement.
The failure mode is defaulting to the top-right corner: route everything to the flagship because it is the safe choice. That is the same mistake as retrieving twenty passages for every query or sending every agent step to the largest model. It works, and it quietly overpays on the large majority of requests that a middle or low tier would have cleared. In a menu world, “always order the most expensive thing” is not caution. It is an unmanaged budget.
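To put rough numbers on that overpayment, here is a minimal sketch of the arithmetic. The tier names, the relative prices (chosen to span the roughly 30x spread mentioned above), and the traffic mix are all illustrative assumptions, not measured figures; substitute your own.

```python
# Illustrative relative prices per request. Assumption: cheapest tier = 1 unit,
# flagship = 30 units (the rough 30x spread); the middle price is made up.
RELATIVE_PRICE = {"low": 1.0, "mid": 6.0, "high": 30.0}

# Hypothetical share of traffic for which each tier is good enough.
TRAFFIC_MIX = {"low": 0.70, "mid": 0.20, "high": 0.10}


def blended_cost(mix: dict[str, float], price: dict[str, float]) -> float:
    """Average cost per request when each request goes to the tier it needs."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("traffic shares must sum to 1")
    return sum(share * price[tier] for tier, share in mix.items())


routed = blended_cost(TRAFFIC_MIX, RELATIVE_PRICE)
flagship_only = RELATIVE_PRICE["high"]

print(f"routed:        {routed:.1f} units per request")
print(f"flagship only: {flagship_only:.1f} units per request")
print(f"overpayment:   {flagship_only / routed:.1f}x")
```

Under these made-up numbers, routed traffic costs about 4.9 units per request against 30 for flagship-only, roughly a sixfold difference. The real factor depends entirely on your own difficulty distribution.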
Routing is the new fine-tuning
There is a satisfying symmetry here. A year ago, the way you specialized a system to your workload was fine-tuning: take one model, bend it toward your task. In a menu world, the way you specialize is routing: take your task, decompose it by difficulty, and send each piece to the tier that fits. Fine-tuning optimized a model to a workload. Routing optimizes a workload across models. The second is cheaper, more flexible, and survives the next release, because when the menu changes you re-point the router instead of re-training anything.
I have built this router twice now, once for retrieval depth in SAGE and once, in spirit, for model selection, and the recipe is the same both times. Estimate difficulty from cheap signals before committing the expensive resource. Set a quality bar per request. Choose the cheapest option that clears the bar. Keep a fallback warm for when it does not. The unit of the decision changed from “passages” to “model tier,” but the control loop is identical, and so is the payoff: most requests are easy, and paying frontier prices for easy requests is the most common expensive mistake in production AI.
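The four steps of that recipe can be sketched in a few dozen lines. This is a toy under stated assumptions, not the SAGE implementation: the tier names, costs, capability scores, keyword hints, and the quality model are all hypothetical placeholders, and call_model and accept stand in for whatever client and response check you already have.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Tier:
    name: str
    relative_cost: float
    capability: float  # 0..1, hypothetical score you would derive from your own evals


# Hypothetical menu, cheapest first.
TIERS = [Tier("low", 1.0, 0.3), Tier("mid", 6.0, 0.7), Tier("high", 30.0, 1.0)]

# Assumed keyword hints; in practice, learn these from your outcome logs.
HARD_HINTS = ("prove", "refactor", "debug", "step by step")


def estimate_difficulty(prompt: str) -> float:
    """Step 1: estimate difficulty (0..1) from cheap signals only."""
    text = prompt.lower()
    length_signal = min(len(text.split()) / 200.0, 1.0)
    hint_signal = min(sum(hint in text for hint in HARD_HINTS) / 2.0, 1.0)
    return 0.5 * length_signal + 0.5 * hint_signal


def predicted_quality(tier: Tier, difficulty: float) -> float:
    """Toy model: quality drops once difficulty exceeds the tier's capability."""
    return 1.0 - max(0.0, difficulty - tier.capability)


def route(difficulty: float, quality_bar: float) -> list[Tier]:
    """Steps 2-3: the cheapest tier that clears the bar, then pricier fallbacks."""
    ordered = sorted(TIERS, key=lambda t: t.relative_cost)
    for i, tier in enumerate(ordered):
        if predicted_quality(tier, difficulty) >= quality_bar:
            return ordered[i:]
    return ordered[-1:]  # nothing clears the bar: use the most capable tier


def answer(
    prompt: str,
    quality_bar: float,
    call_model: Callable[[str, str], str],
    accept: Callable[[str], bool],
) -> tuple[str, str]:
    """Step 4: walk the plan, escalating when a reply fails the acceptance check."""
    plan = route(estimate_difficulty(prompt), quality_bar)
    reply = ""
    for tier in plan:
        reply = call_model(tier.name, prompt)
        if accept(reply):
            return tier.name, reply
    return plan[-1].name, reply  # best effort from the top of the plan


if __name__ == "__main__":
    def stub(tier_name: str, prompt: str) -> str:
        # Stand-in for a real API client.
        return f"[{tier_name}] reply"

    for prompt in ("Classify this ticket as billing or technical",
                   "Debug this race condition and prove the fix step by step"):
        print(answer(prompt, 0.9, stub, accept=lambda r: True)[0], "<-", prompt)
```

The design choice worth noting is that route returns an escalation plan rather than a single tier, so the fallback is part of the routing decision instead of an afterthought bolted on at the call site.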
What to build this quarter
Concretely, three things, none of which require picking a winner among the labs.
A capability router. An interface in front of the menu that picks a tier per request from difficulty signals, not a hardcoded default. This is the single highest-leverage component you can add to an LLM product right now.
Difficulty estimation on your traffic. You cannot route without it, and it is cheaper to build than you expect: shallow signals plus your own outcome logs get you most of the way. Your traffic’s difficulty distribution is also the business case for the router, sitting in your logs already.
Tier-boundary evals. For each pair of adjacent tiers, know the class of requests where the cheaper one is good enough. That boundary is your routing policy, and it is the artifact that keeps paying off every time a new tier appears on the menu.
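As one way to make the tier-boundary item concrete, here is a small sketch that turns eval or outcome records into a routing policy. The record shape (request class, tier, passed), the tier names, and the thresholds are assumptions for illustration; how you define "passed" is the real work and is left to your own evals.

```python
from collections import defaultdict
from typing import Iterable

TIER_ORDER = ["low", "mid", "high"]  # hypothetical tier names, cheapest first


def tier_boundaries(
    records: Iterable[tuple[str, str, bool]],
    min_pass_rate: float = 0.95,
    min_samples: int = 3,
) -> dict[str, str]:
    """Map each request class to the cheapest tier that is good enough for it.

    records: (request_class, tier, passed) triples from evals or outcome logs.
    A tier qualifies only with enough samples and a high enough pass rate;
    classes with no qualifying tier default to the most expensive one.
    """
    counts = defaultdict(lambda: [0, 0])  # (class, tier) -> [passes, total]
    for request_class, tier, passed in records:
        stats = counts[(request_class, tier)]
        stats[0] += int(passed)
        stats[1] += 1

    policy = {}
    for request_class in sorted({cls for cls, _ in counts}):
        policy[request_class] = TIER_ORDER[-1]
        for tier in TIER_ORDER:
            passes, total = counts.get((request_class, tier), (0, 0))
            if total >= min_samples and passes / total >= min_pass_rate:
                policy[request_class] = tier
                break
    return policy


# Made-up eval results, purely to exercise the function.
sample = (
    [("classification", "low", True)] * 4
    + [("extraction", "low", True)] * 2 + [("extraction", "low", False)] * 2
    + [("extraction", "mid", True)] * 4
    + [("proof", "low", False)] * 3
    + [("proof", "mid", True)] + [("proof", "mid", False)] * 2
    + [("proof", "high", True)] * 3
)
print(tier_boundaries(sample))
```

When a new tier appears on the menu, the same records plus a fresh eval run for that tier are enough to recompute the policy; nothing else in the function has to change beyond adding the name to TIER_ORDER.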
The labs did the hard, expensive thing: they built a genuine spectrum of capability and priced it. They handed the allocation problem to us. Treating that as a burden is a mistake; it is the most tractable optimization in the whole stack, and unlike model quality, it is entirely in your control. The frontier is a menu now. Learn to order.