Foundation Models Plateau, Applications Take Off
Cursor's parent company Anysphere has about 150 employees. In November 2025, its ARR crossed $1 billion. OpenAI, as of early 2026, has 4,500 employees. Its 2025 revenue was $13.1 billion, but according to Fortune, it lost roughly $9 billion and doesn't expect to turn profitable until 2028.
An application company that trains zero models is outproducing, per capita, the company that trains them. This is the most telling signal in AI for 2025.
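The per-capita gap behind that claim is easy to check with the numbers already cited (headcounts are approximate):

```python
# Back-of-the-envelope revenue per employee, using the figures cited above.
companies = {
    "Anysphere (Cursor)": {"revenue_usd": 1_000_000_000, "employees": 150},
    "OpenAI":             {"revenue_usd": 13_100_000_000, "employees": 4_500},
}

for name, c in companies.items():
    per_head = c["revenue_usd"] / c["employees"]
    print(f"{name}: ${per_head / 1e6:.1f}M per employee")
```

Roughly $6.7M in revenue per employee for Cursor versus $2.9M for OpenAI, and that is before OpenAI's $9 billion loss is counted.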
Models Are Getting Harder to Improve
The GPT-5 story begins with the failure that preceded it. According to The Information (Inside OpenAI's Rocky Path to GPT-5), OpenAI spent the second half of 2024 on a project called Orion, originally intended to be GPT-5. Orion failed to produce a meaningfully better model and was downgraded to GPT-4.5, released in February 2025.
The actual GPT-5 didn't ship until August 2025. MIT Technology Review's same-day assessment was blunt: "GPT-5 is, above all else, a refined product...it falls far short of the transformative AI future that Altman has spent much of the past year hyping." Users were harsher still: many on Reddit said GPT-4o had actually felt better. The auto-switcher broke on launch day, and Altman had to come out the next day to explain that "GPT-5 will seem smarter starting today."
Remember the leap from GPT-3.5 to GPT-4 — that moment of "wait, this thing actually works now." Then look at GPT-4 to GPT-5. Nearly every reviewer reached the same conclusion: better, but not by nearly as much. The Atlantic's take was the most honest: "At this stage of the AI boom, when every major chatbot is legitimately helpful in numerous ways, benchmarks, science, and rigor feel almost insignificant. What matters is how the chatbot feels."
Benchmarks keep going up, but the felt improvement keeps shrinking. The playbook of throwing more data, more compute, and more parameters at model quality hasn't stopped working — it's just yielding visibly diminishing returns.
Model Companies Are Voting with Their Feet
If outside reviews aren't convincing enough, look at what the model companies themselves are doing.
OpenAI's product line expanded rapidly over the past year: Codex (a coding agent), ChatGPT Agent (a general-purpose task agent), Atlas (an AI browser), Health (health consultation), and Shopping (product recommendations). More telling was the org chart change: they hired former Instacart CEO Fidji Simo as CEO of Applications. A company founded on model research created a separate CEO role for its application division.
Anthropic made equally clear moves. Claude Code went GA in May 2025 with VS Code and JetBrains integrations. In December they acquired Bun, a JavaScript runtime, to boost Claude Code's performance. In January 2026 they launched a Labs division led by Instagram co-founder Mike Krieger. The product lineup grew to include Cowork (a collaboration tool). They even bought a Super Bowl ad mocking OpenAI for putting ads in ChatGPT.
Put these moves together and this isn't "building apps on the side." It's a strategic shift. When one model company gives its application division its own CEO, and another runs a consumer brand ad during the Super Bowl, they're telling the market the same thing: selling models alone isn't enough.
The Application Layer's Efficiency Edge
Back to Cursor. Anysphere was founded in 2022 and raised an $8 million seed round led by OpenAI in 2023. What followed was a near-vertical growth curve: ARR hit $100 million in January 2025, $500 million in June, and $1 billion in November. A 10x increase in 10 months. Valuation went from $2.5 billion in November 2024 to $29.3 billion by November 2025. It trains no models — it calls APIs from Anthropic, OpenAI, and others.
Perplexity is another case. According to Sacra's February 2026 analysis, its ARR is around $200 million with a valuation of roughly $20 billion. Its search engine Sonar runs on Meta's open-source Llama model. In August 2025 it even bid $34.5 billion to acquire Chrome. A company built on open-source models, bidding for the world's largest browser.
Compare this with the model companies' situation. OpenAI brought in $13.1 billion but lost $9 billion. Anthropic is valued at $380 billion with 2,500 employees, but profitability remains undisclosed. Building models is fundamentally a capital-intensive business: training a frontier model costs billions in compute, then you slowly earn it back through API calls. The application layer is different. Cursor hit $1 billion in annual revenue with 150 people, because someone else trained the models — they just need to use them well.
DeepSeek Broke the Model Pricing Power
If GPT-5 is the story of models getting harder to improve at the top, DeepSeek is the story of model costs collapsing at the bottom.
DeepSeek has 160 employees. Training its V3 model cost about $6 million; for comparison, GPT-4's training cost was reportedly around $100 million. DeepSeek-R1 matched OpenAI's o1 on multiple reasoning benchmarks, trained on weaker GPUs constrained by export controls.
The day the news broke, Nvidia's market cap dropped $600 billion — the largest single-day loss in U.S. stock market history. The reaction itself is revealing: the market's confidence in model pricing power was shaken in a single day.
DeepSeek's significance goes beyond "China's AI rises." It proved something more fundamental: frontier model capabilities are becoming a commodity. When a 160-person team can approach state-of-the-art performance at one-tenth the cost and one-tenth the compute, "we have the best model" stops being a moat. And DeepSeek-R1 is fully open-source — anyone can use it.
Models Are Becoming Infrastructure
These stories all point to the same conclusion: models are becoming infrastructure.
That doesn't mean models don't matter, just as no one says electricity doesn't matter. But the margins and valuation multiples of power companies are in a different league from the consumer brands that use their electricity. When model quality is good enough, costs are low enough, and there are enough suppliers, the center of value creation naturally moves up the stack.
Cursor doesn't care whether the underlying model is Claude or GPT — it cares about making code completion so good that developers can't live without it. Perplexity doesn't care whether it's running Llama or a proprietary model — it cares about whether its search experience can beat Google. That's the logic of infrastructure: downstream loyalty to upstream providers is low. Whoever is cheaper and better wins.
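A minimal sketch of what that low upstream loyalty looks like in code (provider names, quality scores, and prices are hypothetical, not real price sheets): the application scores suppliers on its own internal evals and routes each call to whichever model currently clears the quality bar at the lowest price.

```python
from dataclasses import dataclass

@dataclass
class Provider:
    name: str
    quality: float       # score on the application's own evals, 0-1 (illustrative)
    usd_per_mtok: float  # blended price per million tokens (illustrative)

def pick_provider(providers, min_quality=0.8):
    """Route to the cheapest provider that clears the quality bar."""
    eligible = [p for p in providers if p.quality >= min_quality]
    if not eligible:
        raise ValueError("no provider clears the quality bar")
    return min(eligible, key=lambda p: p.usd_per_mtok)

providers = [
    Provider("model-a", quality=0.92, usd_per_mtok=15.0),
    Provider("model-b", quality=0.88, usd_per_mtok=3.0),
    Provider("model-c", quality=0.70, usd_per_mtok=0.5),
]

print(pick_provider(providers).name)  # → model-b
```

Swapping suppliers is one line of configuration, not a migration. That is why "whoever is cheaper and better wins" at this layer: when a new model undercuts the incumbent, the router simply starts choosing it.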
Three Predictions to Check Later
I'm writing this in April 2026. Here are three predictions across different time horizons — something to revisit in a year or two.
Talent shifts, verifiable within one year. The most sought-after hires in AI are shifting from researchers to product managers and application engineers. Fidji Simo runs OpenAI's application division. Mike Krieger moved from Anthropic's Chief Product Officer to lead the newly formed Labs division. By 2027, product backgrounds will outnumber research backgrounds in the C-suites of top AI companies. Model training becomes a procurement item; product sense becomes the core competency.
Model companies will be forced into applications, verifiable in two to three years. The pure model business doesn't hold up. OpenAI is already on this path, expanding from chatbot to agent, search, Health, and Shopping. Anthropic too. The end state looks more like Apple (chips, OS, and apps all in-house) than Intel (selling chips to everyone else). Pure API-selling model companies will struggle as DeepSeek-like competitors keep pushing prices down and open-source models keep closing the gap.
Agent OS disrupts application entry points, a three-to-five-year projection. Today's standalone AI applications are themselves transitional — not because existing products will absorb them, but because an Agent OS will upend them. OpenClaw hitting 350,000 GitHub stars in four months reveals the trend: users don't want to open a dozen AI apps; they want a unified AI layer that takes over all interaction entry points. Cursor's ultimate threat isn't another AI editor — it's an agent that writes your code directly, no editor needed. Perplexity's ultimate threat isn't Google getting smarter — it's the act of searching itself disappearing, with agents preparing information before you even ask. The endgame is apps dissolving as entry points, with AI agents becoming the new OS layer.
Three Years to Lay the Fiber
From models to applications to Agent OS — it's the same story in three chapters: technology moving from "who can build it" to "who can use it well" to "users don't even need to use it themselves." Each power shift comes with the incumbents' reluctance and the newcomers' startling efficiency.
This time the pace is entirely different. Past infrastructure eras took a decade to mature, then another decade for applications to bloom. Now everyone is using AI to build AI, using agents to build agents, using Claude Code to develop plugins for the next generation of Claude Code. The entire industry has entered a bootstrapping loop where each layer's output directly accelerates the next layer's development. In AI, it took three years to go from laying the fiber (training the frontier models) to the application explosion. From applications to Agent OS might take just one. The scary thing about an exponential curve isn't how fast it is — it's that when you think "surely it can't get faster," it's still accelerating.