Baseten is set to raise $1.5B at $13B as inference demand soars

AI inference is becoming its own infrastructure category, and the money is following. Baseten is close to finalizing a $1.5 billion funding round at a $13 billion valuation, per WSJ reporting surfaced by TechCrunch on June 18, just five months after the company closed a $300 million Series E at a $5 billion valuation. The deal is reportedly split-priced and co-led by Spark Capital, Sands Capital, Altimeter Capital, and Wellington Management.

For an enterprise AI buyer, the headline number matters less than what Baseten actually sells. The company runs a multi-model inference platform that sits between the application and the foundation model. When a customer sends a prompt, Baseten decides which model should handle it, routes the request to the right GPU cluster, and returns the response as fast as possible. The pitch to enterprise customers is straightforward: cheaper inference than calling OpenAI or Anthropic directly, faster response times at scale, and the freedom to swap between closed and open source models without rewriting the application layer. For the investor side of the table, the raise is a bet that the inference layer will look a lot like the cloud compute layer did in the 2010s, with a handful of well-capitalized vendors capturing most of the spend.
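The "swap models without rewriting the application layer" claim is easier to picture with a small sketch. The Python below is illustrative only and does not represent Baseten's API: `InferenceClient`, `open_model_stub`, and `closed_model_stub` are hypothetical names, and the backends are stand-in functions rather than real model calls. The point is that the application talks to one interface while the model behind it changes.

```python
from typing import Callable, Dict

# A backend is any callable that maps a prompt to a completion.
# (Hypothetical abstraction; real platforms expose richer request types.)
Backend = Callable[[str], str]


class InferenceClient:
    """Single entry point the application calls, whatever model sits behind it."""

    def __init__(self, backends: Dict[str, Backend], default: str) -> None:
        if default not in backends:
            raise KeyError(f"unknown backend: {default}")
        self._backends = dict(backends)
        self._active = default

    def use(self, name: str) -> None:
        """Switch the active backend without touching application code."""
        if name not in self._backends:
            raise KeyError(f"unknown backend: {name}")
        self._active = name

    def complete(self, prompt: str) -> str:
        """Send the prompt to whichever backend is currently active."""
        return self._backends[self._active](prompt)


# Stand-in backends: stubs, not real model calls.
def open_model_stub(prompt: str) -> str:
    return f"[open-model] {prompt}"


def closed_model_stub(prompt: str) -> str:
    return f"[closed-model] {prompt}"


client = InferenceClient(
    {"open": open_model_stub, "closed": closed_model_stub},
    default="closed",
)
first = client.complete("Summarize this ticket")

# The application code above stays the same; only the backend changes.
client.use("open")
second = client.complete("Summarize this ticket")
```

In this sketch the call site (`client.complete(...)`) is identical before and after the switch, which is the property the pitch depends on.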

The split-priced raise and the inference thesis

That the round is split-priced says something about how the market is moving. A split-priced round lets a startup close at a headline valuation that flatters the company's story while letting existing investors participate at a more conservative price. WSJ reported that new investors are coming in at $13 billion, while existing investors are coming in at $11 billion. The $2 billion gap, roughly 15% below the headline price, is modest but large enough to acknowledge that the 160% valuation jump in five months is not what the existing cap table would have supported on its own. The technique has become common in late-stage AI rounds over the past year as the gap between private valuations and public market comps has widened.
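The arithmetic behind those figures is simple enough to check. The sketch below uses the reported valuations ($5 billion, $11 billion, $13 billion) and round size ($1.5 billion). The split of dollars between the two price tiers has not been reported, so the `tranches` allocation is a purely hypothetical assumption, and the blended figure ignores pre-money versus post-money distinctions.

```python
from typing import List, Tuple


def pct_change(old: float, new: float) -> float:
    """Percentage change from old to new."""
    return (new - old) / old * 100.0


def blended_valuation(tranches: List[Tuple[float, float]]) -> float:
    """Dollar-weighted blended valuation for (amount, valuation) tranches.

    Each tranche buys roughly amount / valuation of the company, so the
    blended price is total dollars divided by total ownership sold.
    """
    total_amount = sum(amount for amount, _ in tranches)
    total_ownership = sum(amount / valuation for amount, valuation in tranches)
    return total_amount / total_ownership


# Reported figures, in billions of dollars.
SERIES_E_VALUATION = 5.0
NEW_INVESTOR_VALUATION = 13.0
EXISTING_INVESTOR_VALUATION = 11.0

headline_jump = pct_change(SERIES_E_VALUATION, NEW_INVESTOR_VALUATION)
existing_jump = pct_change(SERIES_E_VALUATION, EXISTING_INVESTOR_VALUATION)
tier_discount = -pct_change(NEW_INVESTOR_VALUATION, EXISTING_INVESTOR_VALUATION)

# HYPOTHETICAL allocation: $1.0B at the higher price, $0.5B at the lower one.
tranches = [(1.0, NEW_INVESTOR_VALUATION), (0.5, EXISTING_INVESTOR_VALUATION)]
blended = blended_valuation(tranches)

print(f"Headline step-up: {headline_jump:.0f}%")
print(f"Step-up at the lower tier: {existing_jump:.0f}%")
print(f"Lower tier discount to headline: {tier_discount:.1f}%")
print(f"Blended valuation under the assumed split: ${blended:.2f}B")
```

Whatever the real allocation turns out to be, the blended price has to land between the two tiers, which is why the headline number overstates what the round as a whole paid.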

The bigger question is whether the inference thesis holds. The framing in VC circles right now is that every dollar spent on GPUs needs a dollar of inference software on top, because raw GPU access without a routing layer is too expensive to use well. Enterprises have started to see this in their own bills. Recent telemetry from more than 23,000 Kubernetes clusters put average enterprise GPU utilization at just 5%, and most companies are paying for capacity they never touch. The pitch from inference providers like Baseten, Together AI, and Fireworks is that they can take a workload, run it against a smaller open source model most of the time, and only escalate to a flagship model when the prompt actually needs it. The customer pays a fraction of the cost and the provider makes money on the spread.
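A rough cost model makes both halves of that argument concrete. In the sketch below, the 5% utilization figure comes from the telemetry cited above. Every price, the escalation rate, and the customer price are hypothetical placeholders chosen for round numbers, not quotes from any vendor. The cascade assumes every request hits the small model first and a fraction is then re-run on the flagship.

```python
def effective_cost_per_useful_hour(price_per_gpu_hour: float, utilization: float) -> float:
    """Cost of one hour of GPU work actually used, given average utilization."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return price_per_gpu_hour / utilization


def cascade_cost_per_million(small_price: float, flagship_price: float,
                             escalation_rate: float) -> float:
    """Expected cost per million tokens when a share of traffic escalates.

    All traffic pays the small-model price; escalated traffic pays the
    flagship price on top of that.
    """
    if not 0.0 <= escalation_rate <= 1.0:
        raise ValueError("escalation_rate must be in [0, 1]")
    return small_price + escalation_rate * flagship_price


UTILIZATION = 0.05            # from the telemetry cited in the article
GPU_PRICE_PER_HOUR = 2.00     # HYPOTHETICAL rental price, dollars

useful_hour_cost = effective_cost_per_useful_hour(GPU_PRICE_PER_HOUR, UTILIZATION)

SMALL_MODEL_PRICE = 0.50      # HYPOTHETICAL, dollars per million tokens
FLAGSHIP_PRICE = 10.00        # HYPOTHETICAL, dollars per million tokens
ESCALATION_RATE = 0.10        # HYPOTHETICAL share of prompts needing the flagship
CUSTOMER_PRICE = 4.00         # HYPOTHETICAL price the provider charges

provider_cost = cascade_cost_per_million(SMALL_MODEL_PRICE, FLAGSHIP_PRICE, ESCALATION_RATE)
customer_savings = 1.0 - CUSTOMER_PRICE / FLAGSHIP_PRICE
provider_spread = CUSTOMER_PRICE - provider_cost

print(f"Cost per useful GPU hour: ${useful_hour_cost:.2f}")
print(f"Provider cost per million tokens: ${provider_cost:.2f}")
print(f"Customer savings vs. flagship-only: {customer_savings:.0%}")
print(f"Provider spread per million tokens: ${provider_spread:.2f}")
```

The model is sensitive to the escalation rate: under these assumed prices, once more than about a third of prompts need the flagship, the provider's cost exceeds the assumed customer price and the spread disappears.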

If the model works, the addressable market is enormous. Every enterprise application that calls a foundation model goes through some kind of inference layer, and most of them are calling OpenAI or Anthropic directly today. The bet is that, three to five years from now, a meaningful share of those calls will be routed through a third-party inference platform that owns the model selection, the GPU scheduling, and the cost optimization. Baseten's $1.5B raise is a vote of confidence that the inference layer will be a real category, not a feature that gets absorbed into the model labs.

A second supporting trend is the rise of open-weight models. When a model is open and the weights are downloadable, a third party can host it, optimize it, and price it however the market will bear. That is what made Together AI and Fireworks work in the first place, and it is what gives Baseten the optionality to keep scaling even if a closed model lab decides to push the market toward direct API usage. The strongest open models still trail the closed flagships on most benchmarks, but the gap is closing fast on coding, math, and structured reasoning. For production inference at scale, the gap is already narrow enough that routing decisions can be made on price rather than raw capability.

Where Baseten sits next to the rest of the field

The competitive picture is more crowded than the round's headline might suggest. Together AI raised $305M at a $3.3B valuation in early 2025, and Fireworks AI has raised at a similar level. Modular, Anyscale, and Replicate are all chasing the same buyer with similar pitches. The differentiators are real-time latency, model coverage, the depth of the routing logic, and the cost per million tokens. Baseten has been public about its focus on production workloads, and its existing customers include the kinds of teams that need to serve millions of inference calls per day at predictable cost. The new capital gives the company room to extend the platform, add proprietary model serving capacity, and lock in multi-year GPU contracts before the next inference crunch.

The other thing to watch is how the major model labs respond. OpenAI and Anthropic both sell inference directly, and both have begun to offer batch discounts and committed-use pricing to defend the relationship. If the model labs start to undercut third-party inference on price, the economics of the category get harder. The most likely outcome is that the model labs keep the highest-margin workloads and let the inference platforms fight over the long tail of traffic that does not need a flagship model. Baseten's bet is that there is enough long-tail traffic to build a $13B company on, and that the model labs are content to let third parties handle the rest of the market.

A final signal to watch is whether the new round is the last private round before an IPO. Baseten reportedly hired bankers earlier this year, and the company is generating real revenue from inference contracts with enterprise customers. A 2027 listing would put the inference category in front of public market investors for the first time, and the comp would be set by the eventual trajectory of Together AI, Fireworks, and the hyperscaler inference services that Google, Microsoft, and AWS are now offering. If the category holds together, the public market will end up valuing the inference layer as a separate software line, not as a feature of the GPU vendors or the model labs.

For more on the cost side of inference economics, see our inference infrastructure cost and latency explainer and our look at the GPU utilization gap in enterprise AI.
