Coval raises $28M to test AI voice agents before they reach customers

AIntelligenceHub

Coval raised $28M led by Norwest with Base10, Twilio Ventures, and Y Combinator, bringing total funding to $31M. The platform simulates and monitors AI voice agents so enterprises can ship them without silent failures.

Coval, the evaluation platform for AI voice and chat agents, has raised a $28 million Series A led by Norwest with Base10 Partners, Twilio Ventures, and Y Combinator, bringing total funding to $31 million. The pitch is that the next hard problem in voice AI is not the model but the testing layer that decides whether an autonomous agent is safe to put in front of a customer.

The bet reflects a shift inside enterprise voice AI. Two years ago, the question was whether large language models could carry a phone call. Today, models from OpenAI, Anthropic, Google, and a long tail of specialists can handle routine support calls well enough that enterprises are now pushing them into production at scale. The failure mode has changed with it. The remaining errors are quieter: a wrong hold-music transfer, a confused identity check, a billing flow that loops. Those failures are not captured by a unit test on the prompt, which is the gap Coval is trying to close.

The simulation-first approach behind Coval

Coval sits between the voice agent and the production phone system. The platform runs millions of simulated conversations against a customer's agent, scores the transcripts against enterprise-defined criteria, monitors live calls after deployment, and feeds the labeled data back into the prompt and the model. The core idea is borrowed from the self-driving industry: you do not ship a perception system without a simulation harness, and you should not ship a voice agent without one either.

Founder Brooke Hopkins spent years leading evaluation infrastructure at Waymo before starting Coval. That background shows up in the product. Customers define a set of evaluation criteria, then Coval spins up parallel simulations against the agent, scores each interaction across voice, reasoning, and audio quality, and surfaces the transcripts that failed. The same harness runs in production for live monitoring. When a deployed agent drifts, the system flags the call before a human notices.
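The loop described above (define criteria, run simulations in parallel, score each transcript, surface the failures) can be sketched in a few lines of Python. This is an illustration only, not Coval's API: every name here (`simulate`, `run_suite`, `toy_agent`, the criteria) is hypothetical, and the sketch works on text transcripts, leaving out the audio-quality scoring the real platform is said to perform.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable

# Hypothetical types: an agent maps a caller utterance to a reply,
# and a criterion inspects the whole transcript and returns True on pass.
Agent = Callable[[str], str]
Transcript = list[tuple[str, str]]
Criterion = Callable[[Transcript], bool]


@dataclass
class Result:
    scenario: str
    transcript: Transcript
    failed: list[str]  # names of the criteria that did not pass


def simulate(agent: Agent, scenario: str, caller_turns: list[str],
             criteria: dict[str, Criterion]) -> Result:
    """Play scripted caller turns against the agent, then score the transcript."""
    transcript = [(turn, agent(turn)) for turn in caller_turns]
    failed = [name for name, check in criteria.items() if not check(transcript)]
    return Result(scenario, transcript, failed)


def run_suite(agent: Agent, scenarios: dict[str, list[str]],
              criteria: dict[str, Criterion], workers: int = 8) -> list[Result]:
    """Run every scenario in parallel and return only the failing results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(
            lambda item: simulate(agent, item[0], item[1], criteria),
            scenarios.items()))
    return [r for r in results if r.failed]


def toy_agent(utterance: str) -> str:
    """Stand-in for a real voice agent; it never verifies identity."""
    if "balance" in utterance:
        return "Your balance is $42."
    return "How can I help?"


def verifies_identity_before_balance(t: Transcript) -> bool:
    disclosed = any("balance is" in reply for _, reply in t)
    verified = any("verify" in reply.lower() for _, reply in t)
    return (not disclosed) or verified


criteria = {
    "verifies_identity_before_balance": verifies_identity_before_balance,
    "no_empty_replies": lambda t: all(reply.strip() for _, reply in t),
}

scenarios = {
    "balance_inquiry": ["hi", "what is my balance"],
    "greeting_only": ["hello"],
}

failures = run_suite(toy_agent, scenarios, criteria)
for f in failures:
    print(f.scenario, f.failed)
```

The toy agent answers every turn, so a per-reply check would pass it. The identity criterion fails only because it looks at the whole conversation, which is the kind of quiet failure a unit test on the prompt would miss.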

In the round announcement, Norwest partner Scott Beechuk tied the investment to that autonomous-systems background. "Voice is going to be the number one interface for how humans interact with AI, and that shift creates an entirely new infrastructure layer for enterprises," Beechuk said. "With her deep experience building evaluation systems for autonomous technologies at Waymo, Brooke is uniquely positioned to lead Coval in defining how companies deploy and scale voice agents reliably."

The numbers from the announcement and the early customers suggest traction. Coval says enterprises using the platform report manual QA reductions of up to 30x and voice agent deployment time improvements of up to 10x. More than 60 organizations run on the platform today, including Zoom and Deepgram. The customers tend to be teams that have already committed to a specific voice stack and need a way to keep that stack honest as they scale from a pilot to thousands of live calls per day.

The product is not a competitor to the model providers. Coval sits on top of OpenAI Realtime, Anthropic's voice mode, Google's Gemini voice stack, Deepgram's transcription, and Twilio's telephony. The platform's value is the abstraction layer that lets an enterprise standardize on evaluation criteria across whichever models and vendors it picks today, and swap them out later without losing the test corpus. For a deeper look at how the broader agent stack is shaping up around this kind of infrastructure, the Enterprise AI in 2026 reference page tracks the categories that are now being treated as core enterprise plumbing rather than experimental pilots.
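One way to picture that abstraction layer is a vendor-neutral interface that the test corpus is written against, so the corpus outlives any single provider. The sketch below is an assumption about the general pattern, not a description of Coval's implementation. `AgentAdapter`, `VendorAAgent`, `VendorBAgent`, and `TEST_CORPUS` are placeholder names, and the canned replies stand in for calls to a real model or telephony stack.

```python
from typing import Protocol


class AgentAdapter(Protocol):
    """Hypothetical vendor-neutral interface the test corpus targets."""
    name: str

    def respond(self, utterance: str) -> str: ...


class VendorAAgent:
    name = "vendor_a"  # placeholder, not a real provider

    def respond(self, utterance: str) -> str:
        return "I can help with billing. Please verify your identity first."


class VendorBAgent:
    name = "vendor_b"  # placeholder, not a real provider

    def respond(self, utterance: str) -> str:
        return "Sure, here is your bill."


# The corpus lives outside every adapter, so it survives a vendor swap.
# Each entry: (caller utterance, phrase the reply must contain).
TEST_CORPUS = [
    ("I want to see my bill", "verify"),
]


def pass_rate(agent: AgentAdapter) -> float:
    """Fraction of corpus entries whose required phrase appears in the reply."""
    passed = sum(1 for utterance, required in TEST_CORPUS
                 if required in agent.respond(utterance).lower())
    return passed / len(TEST_CORPUS)


report = {agent.name: pass_rate(agent)
          for agent in (VendorAAgent(), VendorBAgent())}
print(report)
```

Swapping providers then means writing one new adapter class. The criteria and the accumulated test cases stay untouched.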

Voice AI testing is now an enterprise line item

The voice AI market has spent the last 18 months graduating from demo to deployment. Twilio's voice division, a Coval investor through Twilio Ventures, is one of the channels that exposed the gap. Twilio Field CTO Andy O'Dower framed the Coval investment in the announcement as a bet on the testing layer becoming a permanent piece of enterprise voice infrastructure. "Trust is critical to scaling these experiences, and our investment in Coval reflects our conviction that full evaluation and testing tools, combined with a strong observability and reliability layer, are foundational to maintaining momentum in today's voice AI renaissance," O'Dower said.

Deepgram, a Coval customer and a transcription provider to most of the voice agent stack, made the same point from the buyer side. "Brooke has built Coval into a core part of the modern enterprise's evaluation stack by improving reliability before scaled deployment," Deepgram COO Anoop Dawar said. "For any serious enterprise deployment, this is no longer a nice-to-have. At Deepgram, we power the voice AI infrastructure teams build on, but thanks to our partnership with Coval, enterprises can rest assured it's working properly."

That is a useful way to read the round. The voice model providers are racing on latency, accent coverage, and price per minute. The telephony providers are racing on carrier coverage and SIP reliability. The evaluation layer is where enterprise procurement teams need help, and where tooling has been least mature. Coval's bet is that evaluation becomes a sticky budget line item, the same way application performance monitoring did in the 2010s.

There is a risk to that bet. The model providers are also building their own evaluation tooling, and at least one has shipped a testing harness that overlaps with parts of what Coval does. Coval's response is the same as Datadog's response to cloud provider native observability: be the vendor-agnostic layer that holds the system of record across the whole stack. The Series A gives the company runway to do that while the category is still being defined.

The signals to track over the next 12 months

The most concrete signal of traction in the next 12 months will be the depth of integration with the largest voice agent platforms. Coval is already live with Twilio, Deepgram, and a handful of LLM providers. If the company lands deep partnerships with OpenAI's Realtime API team, Anthropic's voice mode team, and Google's Gemini enterprise channel, the platform becomes hard to displace. If those integrations stay shallow, model providers will eventually own the testing layer for their own models and Coval will have to fight for the cross-vendor slice.

The other signal is pricing. Coval has not published list pricing, and enterprise procurement teams in the agent space are still learning how to budget for an evaluation line item. The category will harden in 2026 as more enterprises move from pilot to multi-thousand-call-per-day production. Coval's $31 million in total funding gives it an estimated 18 to 24 months of runway at current burn, which is enough to ride the first wave of procurement cycles.
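The runway estimate can be turned around to see what monthly burn it implies. The quick calculation below assumes, purely for illustration, that the full $31 million is still available to spend. Coval has not disclosed its cash position or burn rate, so these figures are an upper bound derived from the article's own numbers, not reported data.

```python
TOTAL_FUNDING_USD = 31_000_000  # total raised, per the article; assumed fully unspent


def implied_monthly_burn(cash_usd: float, runway_months: int) -> float:
    """Monthly spend that would exhaust cash_usd in runway_months."""
    return cash_usd / runway_months


high = implied_monthly_burn(TOTAL_FUNDING_USD, 18)  # shorter runway, higher burn
low = implied_monthly_burn(TOTAL_FUNDING_USD, 24)   # longer runway, lower burn

print(f"${low / 1e6:.2f}M to ${high / 1e6:.2f}M per month")
```

Under that assumption the implied burn is roughly $1.3 million to $1.7 million per month. Any cash already spent from earlier rounds would push the real figure lower.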

The competitive picture is thin but not empty. Smaller startups are also chasing the voice agent evaluation category, and a handful of enterprise contact center vendors are building adjacent capabilities. Coval's advantage is the Waymo simulation pedigree, the early enterprise customers, and the fact that the company shipped a working product before most of the model providers had voice agents worth testing. That lead is real, and the Series A is the round that decides whether Coval can hold it. The full announcement is in Coval's press release on PR Newswire.
