The Model Is Not Your Moat

Count February’s releases with me: Gemini 3.1 Pro. Claude Opus 4.6 and Sonnet 4.6. GPT-5.3 Codex. Grok 4.20. GLM-5. MiniMax M2.5. Mercury 2, a diffusion language model, because apparently even the architecture is negotiable now. ByteDance shipped two Seed 2.0 variants. That is roughly a dozen serious models in 28 days, and it is not an unusual month anymore. It is the new cadence.

Here is the question that matters if you build on these things: what, exactly, did you own on January 31st that you still own today?

Leads now have a half-life

For most of 2023 and 2024, “which model is best” was a stable fact you could build around. You picked the leader, tuned your prompts to its quirks, and the decision held for a couple of quarters. That world is gone. The gap between labs at the top is now weeks wide, and it changes hands so often that any system whose value depends on which model is on top is depreciating at the same rate the leaderboard churns.

I want to be precise about what depreciates and what does not, because the popular version of this take (“models are commodities”) is wrong. Models are not commodities; the frontier is genuinely better than the floor, and picking well still matters. What depreciates is the coupling: every prompt hack tuned to one model’s quirks, every workflow that assumes one vendor’s tool-calling format, every quality bar that exists only as a vibe in the head of whoever demoed it last.

The leaderboard changes hands in weeks; the harness compounds. Schematic, not measured data.

What actually compounds

Four things get more valuable every time the leaderboard flips, precisely because the leaderboard flips.

Evals you trust. When a new model drops, the teams that win are the ones who can answer “is it better for us” by Friday. That answer comes from a regression suite built on your real traffic and your real failure modes, not from a benchmark screenshot. Your eval set is the contract between you and every future model. It is also, conveniently, the one thing no lab can ship for you.

A routing layer. If “swap the model” is a config change, release weeks are arbitrage opportunities. If it is a migration project, they are threats. The engineering is not exotic: an interface boundary, capability flags, and the discipline to never let business logic leak into prompt strings.

Your data flywheel. Interaction logs, failure taxonomies, difficulty labels, domain corpora. Every model consumes these; none of them replaces them.

Rollback discipline. The willingness to say “the new hotness regressed our p95 and we are going back” within an hour, because you kept the old path warm.

I learned this the way you learn most systems lessons: by being saved by it. The automation system I shipped at Zscaler last summer went through model swaps during development, and the reason they were boring is that the system’s logic never lived in the model. It lived in the orchestration, the state, and the checks around the model. Same story in my research: SAGE routes retrieval using signals that never touch the generator, which is exactly why one trained policy survived four different model families with zero retraining. Control logic that lives outside the model outlives the model.

The uncomfortable inversion

Here is the part I find genuinely interesting, not just tactically useful. We spent two years asking “what is your moat against the model getting better?” That question is backwards now. The model getting better is the tide, and it lifts everyone identically, which means it advantages no one. The moat question in 2026 is: what do you own that gets more valuable when a better model appears?

Run your own stack through that filter. Prompts: less valuable (the new model wants different ones). Fine-tunes: often stranded (they are married to a base). Vector indexes: neutral. Evals, routing, telemetry, domain data, correction loops: more valuable, every single time.

That filter is also, quietly, a hiring filter. The scarce skill this year is not prompting and it is not training. It is the ability to build the harness: the measurement, the budgets, the fallbacks, the loop that catches a regression before your users do. Model capability is rented. The harness is owned. Build like the lease expires monthly, because as of this month, it does.