Token Prices Are Collapsing. Your AI Bill Isn't.

Epoch’s price data says LLM inference costs are falling faster than nearly any computing commodity in history: depending on the capability level you track, between 9x and 900x per year. Meanwhile Gartner’s analysis this month says agentic workloads consume 5 to 30 times more tokens per task than a chatbot exchange, and inference now eats about 85% of enterprise AI budgets.

Both of those are the same story told from opposite ends, and if you only hear the first half you will budget wrong for the rest of the decade. Cheaper tokens do not mean cheaper systems. Cheaper tokens mean we immediately spend the savings on more ambitious loops. Jevons noticed this about coal in 1865; agents are the most Jevons technology ever shipped.

The unit that actually matters

Nobody buys tokens because they want tokens. You buy completed tasks. So the number that governs your unit economics is not price per million tokens, it is:

$$\text{cost per successful task} \;=\; \frac{p \cdot t}{s}$$

where $p$ is price per token, $t$ is tokens consumed per attempt, and $s$ is the rate at which attempts actually succeed. The industry celebrates $p$ falling. It politely ignores that $t$ is exploding by design (deeper reasoning, more tool calls, longer contexts, multi-step plans) and that $s$, for autonomous work, is embarrassing.

That denominator is savage. At $2 per task-attempt and a 95% success rate, a completed task costs you $2.11. At the same price and a 50% success rate, it costs $4 if you give up after one try, and much more in practice, because failed agent runs are rarely detected instantly. They fail after the loop, after the retries, after burning the most tokens a run can burn. Failure is not just lost output. Failure is your most expensive token profile.
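The arithmetic above can be checked in a few lines. This is a minimal sketch, not a billing model: the function names are made up for illustration, the $2 attempt is expressed as a hypothetical 1,000,000 tokens at $2 per million, and the second function assumes you retry until the task succeeds and that every failed attempt costs a fixed amount.

```python
def cost_per_successful_task(price_per_token, tokens_per_attempt, success_rate):
    """Direct translation of p * t / s."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return price_per_token * tokens_per_attempt / success_rate


def cost_with_expensive_failures(cost_success, cost_failure, success_rate):
    """Expected spend per completed task when failed attempts have their own cost.

    Assumption (not from the article): attempts are independent and you retry
    until one succeeds, so the expected number of failures per success is
    (1 - s) / s.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    expected_failures = (1 - success_rate) / success_rate
    return cost_success + expected_failures * cost_failure


if __name__ == "__main__":
    p, t = 2e-6, 1_000_000  # hypothetical: $2 per million tokens, 1M tokens per attempt
    for s in (0.95, 0.75, 0.50):
        flat = cost_per_successful_task(p, t, s)
        # Illustrative only: a failed run that burns 3x the tokens of a good one.
        heavy = cost_with_expensive_failures(2.0, 6.0, s)
        print(f"s={s:.2f}  p*t/s=${flat:.2f}  with 3x-cost failures=${heavy:.2f}")
```

When failures cost the same as successes, the second function reduces to the first. The gap between the two columns is the price of failures that run long before anyone notices.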

[Figure: cost per successful task (y-axis, ticks at $2, $4, $6) against task success rate (x-axis, 95% down to 55%), with two curves: "give up after 1 try" and "with one retry".]

Cost per successful task as reliability degrades, for a task whose single attempt costs $2, with one automatic retry on failure. Computed from the formula above; the curve is the point.

Reliability is a pricing strategy

This is the reframe I keep pushing on teams: stop treating reliability as a quality attribute you buy after the economics work. Reliability is how you make the economics work. Every point of success rate you gain does double duty: it raises $s$, the denominator every attempt is divided by, and it removes failed runs, which are the attempts that burn the most tokens. Nothing on the price sheet compounds like that.

The same logic runs all the way down the stack, and it is where my own research lives. In SAGE we cut retrieval cost 51% not by finding cheaper tokens but by refusing to spend maximum effort on queries that did not need it, while holding the latency SLO that keeps answers usable. Effort allocated by difficulty, under a budget. The task level works the same way: spend on the attempts that can succeed, kill the runs that are already doomed, and measure everything in completed work.

Three habits fall out of taking the formula seriously:

Meter tasks, not tokens. Your dashboard’s top-line number should be cost per completed unit of work, with tokens as a diagnostic underneath. If you cannot define “completed,” that is not a metering problem, that is a product problem wearing a metering costume.

Make failure cheap. Doomed runs should die early. Progress checks, step budgets, and a controller willing to abort are worth more than a discount. An agent that detects its own dead end at step 3 instead of step 30 just gave you roughly a 10x price cut on failure.

Route by difficulty. The 5-30x token multiplier is an average hiding a distribution. Most steps in most tasks are easy. Sending everything to the largest model at maximum reasoning depth is the agent-era version of retrieving twenty passages for “what year did X happen.”
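The second and third habits can be sketched in code. Everything here is hypothetical: `run_with_budget`, `pick_tier`, the progress score, the patience rule, and the tier thresholds are placeholders for whatever signal and model lineup your system actually has, not a description of any particular framework.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RunResult:
    succeeded: bool
    steps_used: int
    aborted: bool


def run_with_budget(
    step_fn: Callable[[int], float],
    max_steps: int = 30,
    patience: int = 2,
    target: float = 1.0,
) -> RunResult:
    """Run an agent loop under a step budget with an early-abort progress check.

    step_fn(i) executes step i and returns a progress score (assumed to be
    comparable across steps). The run is aborted after `patience` consecutive
    steps that fail to beat the best score seen so far.
    """
    best = float("-inf")
    stale = 0
    for i in range(1, max_steps + 1):
        progress = step_fn(i)
        if progress >= target:
            return RunResult(succeeded=True, steps_used=i, aborted=False)
        if progress > best:
            best, stale = progress, 0
        else:
            stale += 1
            if stale >= patience:
                return RunResult(succeeded=False, steps_used=i, aborted=True)
    return RunResult(succeeded=False, steps_used=max_steps, aborted=False)


def pick_tier(difficulty: float) -> str:
    """Map an estimated difficulty in [0, 1] to a model tier.

    Thresholds and tier names are illustrative; how difficulty is estimated
    is left open.
    """
    if difficulty < 0.3:
        return "small"
    if difficulty < 0.7:
        return "medium"
    return "large"


if __name__ == "__main__":
    stuck = lambda i: 0.1  # a run that never makes progress
    print(run_with_budget(stuck, patience=2))   # stops early
    print(run_with_budget(stuck, patience=30))  # burns the whole budget
    print(pick_tier(0.1), pick_tier(0.9))
```

The design choice worth arguing about is the progress signal. A controller is only as good as its ability to tell a slow run from a doomed one, and a signal that is too eager to abort lowers $s$ instead of lowering the cost of failure.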

The bill that never falls

Here is my prediction, and unlike most AI predictions it is cheerfully falsifiable: enterprise inference spend goes up every year for the rest of the decade, right through every price cut, and the teams that look brilliant will not be the ones with the cheapest tokens. They will be the ones whose cost per successful task falls while everyone else’s holds flat, because they treated $s$ as an engineering target instead of a fact of nature.

Token prices are a tide. Reliability is a boat. Budget accordingly.