Sail Research: $80M to build inference for long-horizon agents

Sail Research raised $80M across Seed and Series A at a $450M valuation, with Kleiner Perkins leading the Series A and Sequoia leading the Seed. The company is building inference infrastructure for long-horizon AI agents, the kind that run for hours or days rather than for prompt-and-response. The pitch: current inference was designed for short, latency-bounded calls, and the next layer of agent work is going to break it. (Pulse 2.0)

The company said current platforms were not designed for the long-running workloads that agents produce. Agents are constrained by compute, context, rate limits, and cost ceilings across providers, and most of those constraints become binding only when an agent runs for thousands of steps across hours. Sail's platform is rebuilt around that constraint, the company said, and it claims an inference efficiency that puts it ahead of the field on the workloads that matter most for agentic systems.

The funding round included Redpoint Ventures, Theory Ventures, Vine Ventures, CRV, A*, and Abstract Ventures, with angel checks from John Hennessy, chairman of Alphabet; Lip-Bu Tan, CEO of Intel; and Tri Dao, chief scientist at Together AI. The angel list is a clean signal of where the company thinks the relevant expertise lives: chip strategy at Intel, system architecture at Alphabet, and inference research at Together.

The two pieces of Sail's stack

Sail ships two core components. The first is an inference stack rebuilt around throughput and efficiency for agents that may spend billions of tokens on a single task. The company said its efficiency advantage comes from proprietary infrastructure optimizations, deep customization of open-source inference engines, intelligent workload distribution across providers, and the use of underutilized compute. The second is Sailboxes, a sandbox environment designed to run for hours or days while charging only for time agents are actively doing work. The two pieces are complementary: the inference stack is the substrate, and the sandbox is the runtime that lets an agent live in the substrate without being killed by cost ceilings or timeout policies.

The economics matter. Most production agents run in a sandbox that bills by wall-clock time, which forces the operator to choose between letting an agent think for a long time and keeping the bill survivable. Sail's per-active-time pricing is the bet that customers will pay for the distinction. If the agent is idle waiting for a tool call, the customer does not pay. If the agent is actually doing work, the customer does. That pricing model is a strong fit for browser agents, research agents, and any long-running process that has bursts of activity between idle waits.

In a recent benchmark, Sail said its inference topped BrowseComp-Plus, achieving 90.72% accuracy at up to 10 times lower cost than leading alternatives. BrowseComp-Plus is the harder version of OpenAI's BrowseComp, designed to test how well a model can find and reason over a long sequence of web pages. Topping it at a tenth of the cost is a strong claim, and the agents-vertical press will be the first to test whether the number holds on independent workloads.

Who is shipping on Sail today

The company's API is already supporting AI-driven workflows at Parallel Web Systems, Jack and Jill, and Detail.dev. Parallel is the most strategic name on the list, since it is a web research and context platform that ships its own agent APIs, and Detail.dev runs long-running agent workflows in the devtools space. The OpenAI API compatibility is the other quiet differentiator: existing customers can swap the base URL and keep their existing code, and Sail supports leading open-source models including DeepSeek, Gemma, GLM, Kimi, and Nemotron. The open-model support is a deliberate choice, since the customers building long-horizon agents tend to be the ones most willing to mix providers to keep cost under control.

Sail Research was co-founded by CEO Neil Movva and CTO Samir Menon. Movva previously worked at NVIDIA, Apple, and Together AI, and Menon previously built large-scale systems at Apple. The combination is the usual one for this layer of the stack: a CEO with a model and inference background, and a CTO with infrastructure chops. Aditya Naganath, partner at Kleiner Perkins, framed the round in the way most agentic-infrastructure rounds are framed right now: "The infrastructure layer for the agent era is one of the most important bets in AI right now."

The investor quote that cuts the deepest comes from Travers Nisbet, co-founder of Parallel, who framed Sail as the inference half of a stack that Parallel is building the context half of. "We and Sail share a belief that background agents are about to do far more useful work," Nisbet said. "Getting there takes efficient, scalable inference paired with the highest-quality context, including from the web. Sail is building the inference side of that, and we're glad to be aligned on where this is going." That framing is the load-bearing one for the rest of 2026. The agent era is not a single product, it is a stack, and Sail is now the most clearly funded piece of the inference side of that stack.

The wider inference-platform race

The $80M raise puts Sail into direct comparison with the recently funded inference layer: Baseten is set to raise $1.5B at $13B as inference demand soars, Baseten's $1.5B round at a $13B valuation, Fireworks, Together, Anyscale, Replicate, and Groq. The capital intensity of the inference layer is climbing, and the reason is that the customer profile is shifting from "developer making a single API call" to "company running an agent for hours at a time." For a deeper look at the per-token and per-hour economics underneath these providers, the AI inference cost and latency resource page walks through what actually drives the bill. The first customer profile tolerates per-token pricing that looks reasonable, and the second one does not. Sail is the bet that the second profile becomes the dominant one inside eighteen months.

The other side of the bet is that the per-active-time sandbox model becomes a real category. There are a handful of players in that space, mostly focused on browser agents, and the pricing model is not yet a standard. If Sail's per-active-time model becomes the default for long-horizon agent sandboxes, the company will own a layer that sits between the inference provider and the customer. That is the position Replicate carved out for short-running inference in 2023, and the long-horizon equivalent would be considerably larger.

The cost math is the third leg of the bet. If Sail can sustain 10x cost advantage on the workloads that matter, the company's gross margins have room to expand as agent workloads scale. If the 10x figure does not hold on independent benchmarks, the long-horizon pitch collapses to a pricing experiment, and the company will have to compete on the same per-token axis as the rest of the inference layer. The benchmarks over the next two quarters will settle that question.

For now, the round is the cleanest signal of where the inference layer is heading. A purpose-built inference platform for long-horizon agents, with $80M and a $450M valuation, shipping to real customers, and benchmarked on the hardest agent task in the field, is the most direct read of the market's current bet. The next twelve months will tell whether the bet pays off.

Sail Research: $80M to build inference for long-horizon agents

The two pieces of Sail's stack

Who is shipping on Sail today

The wider inference-platform race

Get a weekly summary of our most popular articles

Comments

Related articles

Claude Code can be tricked into a reverse shell by a clean GitHub repo

Omada: C-suite and practitioners disagree on agent identity controls

HCLTech ships Gemini Enterprise agents on ServiceNow for field service, factory, IT