Faizan · Writing

Notes on real-time LLM systems, retrieval, and agent safety.
https://faizanraza.dev/

Steady State
Six months, a model release most weeks, and the ground never stopped moving. The through-line: production AI stopped being a modeling problem and became a control-systems problem.
Sat, 04 Jul 2026 · LLM systems, control systems, reliability, manifesto
https://faizanraza.dev/writing/steady-state/

The Frontier Is Now a Menu
GPT-5.6 shipped as three tiers. Claude comes in Fable and Sonnet. The labs unbundled 'the best model' into a price-quality menu, and your new job is per-request capital allocation.
Wed, 01 Jul 2026 · LLM systems, routing, economics, strategy
https://faizanraza.dev/writing/the-frontier-is-a-menu/

Agentjacking Was Inevitable
Fake Sentry errors hijacked coding agents at 2,388 orgs with an 85% success rate. No malware, no phishing, every step authorized. This is what happens when data and instructions share a channel.
Thu, 18 Jun 2026 · security, agents, prompt injection, production
https://faizanraza.dev/writing/agentjacking-was-inevitable/

Long Context Didn't Kill Retrieval
Million-token windows killed lazy retrieval, not retrieval. Context is a budget you allocate under a latency SLO, and 'stuff everything in' is the least defensible allocation there is.
Thu, 04 Jun 2026 · RAG, long context, latency, LLM systems
https://faizanraza.dev/writing/long-context-didnt-kill-retrieval/

A Hundred Agents Is Not a Plan
Swarm architectures multiply an unreliable unit and call it scale. The math of chained success rates says the opposite: fewer agents, tighter loops, structural correction.
Tue, 26 May 2026 · agents, multi-agent, reliability, architecture
https://faizanraza.dev/writing/a-hundred-agents-is-not-a-plan/

The Subsidy Era Is Ending
Anthropic is projecting its first operating profit and OpenAI is reportedly prepping an S-1. When your suppliers start caring about margins, your architecture inherits the problem.
Tue, 12 May 2026 · economics, industry, strategy, LLM systems
https://faizanraza.dev/writing/the-subsidy-era-is-ending/

Assume the Benchmark Is Gamed
Berkeley researchers showed every major agent benchmark can be exploited to near-perfect scores. Production telemetry says deployed agents succeed 56.6% of the time. Measure like an SRE instead.
Thu, 30 Apr 2026 · evals, agents, benchmarks, reliability
https://faizanraza.dev/writing/the-benchmark-is-gamed/

Pilots Don't Die in Demos. They Die in Month Three.
88% of enterprise agent pilots never reach production. The autopsy almost never says 'the model was too dumb.' It says nobody built the loop that keeps a working system working.
Thu, 16 Apr 2026 · agents, production, enterprise, reliability
https://faizanraza.dev/writing/pilots-die-in-month-three/

Token Prices Are Collapsing. Your AI Bill Isn't.
Per-token prices are falling up to 900x a year while agentic tasks burn 5-30x more tokens. The metric that decides your unit economics is cost per successful task, and its biggest lever is reliability.
Tue, 24 Mar 2026 · economics, agents, reliability, LLM systems
https://faizanraza.dev/writing/cost-per-successful-task/

MCP Won. Now Comes the Hard Part.
97 million monthly downloads, 9,400 servers, 30+ CVEs in eight weeks. The protocol standardized the easy 20%. Production is the other 80%, and I've shipped it.
Tue, 10 Mar 2026 · MCP, agents, security, production, LLM systems
https://faizanraza.dev/writing/mcp-won-now-the-hard-part/

The Model Is Not Your Moat
A dozen frontier releases in 28 days means a lead now has a half-life of weeks. The durable asset is everything around the model: evals, routing, data, rollback.
Thu, 26 Feb 2026 · LLM systems, strategy, evals, industry
https://faizanraza.dev/writing/the-model-is-not-your-moat/

Designing SLO-Aware RAG
Why production retrieval should treat latency and cost as constraints to control, not numbers to hope for, and how difficulty-adaptive routing gets there.
Tue, 10 Feb 2026 · RAG, LLM systems, latency, SLOs
https://faizanraza.dev/writing/designing-slo-aware-rag/

A Year After R1
DeepSeek-R1 was a pricing event disguised as a research event. One year later, the weights are the weapon and the price floor is the wound.
Tue, 27 Jan 2026 · open weights, economics, LLM systems, industry
https://faizanraza.dev/writing/a-year-after-r1/