Kalshi built an AI agent called Harrison to review prediction market contracts

AIntelligenceHub

Kalshi's internal AI agent 'Harrison,' built on Anthropic's Claude, reviews and stress-tests the wording of event contracts before they go live. The tool has vetted over 500 market templates.

Prediction market operator Kalshi has built an internal AI agent called Harrison that reviews and stress-tests the wording of its event contracts before they go live to traders. Built on Anthropic's Claude model, the agent has already been used to vet more than 500 market templates. It is the most concrete example yet of a regulated US exchange turning an LLM into a frontline compliance reviewer.

Kalshi is a CFTC-regulated exchange where traders buy and sell contracts on the outcomes of real-world events, including elections, sports, Supreme Court rulings, and economic releases. Contracts trade between 1 cent and 99 cents and settle at either $1 or nothing, and the wording of each contract determines the conditions under which it pays out. A small ambiguity in a single phrase can mean the difference between a market settling as expected and a costly dispute. That is why Kalshi has historically required two people to fill in and review the rules for every newly listed contract, with a one-to-two-hour window for identifying problems. Harrison adds a third pass, one that runs before human reviewers touch the file.

The system reviews the wording and evidence sources of contracts, flags weaknesses that could lead to disputes, and compares its own analysis against human judgments at settlement, with extra scrutiny on complex events such as Supreme Court rulings. Beyond contract review, Harrison also aggregates news, analyzes competitor contracts on rival platforms like Polymarket, and suggests new markets and liquidity incentives. The product framing inside Kalshi is less about replacing the human reviewers and more about giving them a faster, more uniform first read on the most ambiguous questions the company is asked to list.

Prediction market contracts are an unusual fit for AI review

Prediction market contracts are a hard case for language models for a reason that does not show up in standard chatbot benchmarks. Each contract is a single short paragraph that has to resolve unambiguously to a binary outcome when a real-world event happens, and the wording has to anticipate the exact edge cases the news cycle will throw at it months later. A Supreme Court ruling that splits 6-3 with a partial concurrence is the kind of event that punishes loose phrasing. Kalshi learned this the hard way with past contracts that produced disputes over resolution criteria, and the company's approach with Harrison is to make the first pass through every template a structured, model-driven review.

The most useful framing is to compare Harrison to legal contract review software. Tools like Kira and Luminance have been doing this for law firms and enterprise legal teams for years, using a combination of classical NLP and machine learning to flag risky clauses. Kalshi is essentially running the same play in a different vertical, with two important twists. First, the corpus is small enough (a few hundred active templates) that Kalshi can stress-test the model against historical disputes. Second, the financial stakes are high enough that the company can put a human in the loop at settlement and measure how often the model and the human agree. The Bloomberg report notes that Kalshi compares Harrison's analysis against human judgments at settlement, with extra attention to complex cases such as Supreme Court rulings. That comparison is exactly the kind of feedback loop that turns a generic LLM call into a calibrated risk tool over time.
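To make the feedback loop concrete, here is a minimal sketch of how model-versus-human agreement could be measured. It is illustrative only: Kalshi has not published how it scores Harrison, and the `Review` record, its field names, and the choice of Cohen's kappa as the chance-corrected metric are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Review:
    """Hypothetical record pairing a model flag with a human judgment."""
    template_id: str
    model_flagged: bool  # assumed: the model judged the wording ambiguous
    human_flagged: bool  # assumed: a human reviewer reached that judgment


def agreement_rate(reviews: list[Review]) -> float:
    """Fraction of templates where model and human gave the same verdict."""
    if not reviews:
        raise ValueError("need at least one review")
    matches = sum(r.model_flagged == r.human_flagged for r in reviews)
    return matches / len(reviews)


def cohens_kappa(reviews: list[Review]) -> float:
    """Agreement corrected for the agreement expected by chance alone."""
    n = len(reviews)
    observed = agreement_rate(reviews)
    p_model = sum(r.model_flagged for r in reviews) / n
    p_human = sum(r.human_flagged for r in reviews) / n
    expected = p_model * p_human + (1 - p_model) * (1 - p_human)
    if expected == 1.0:
        # Both raters always give the same single verdict; kappa is undefined,
        # so report perfect agreement by convention in this sketch.
        return 1.0
    return (observed - expected) / (1 - expected)


sample = [
    Review("t1", True, True),
    Review("t2", False, False),
    Review("t3", True, False),
    Review("t4", False, False),
]
print(agreement_rate(sample), cohens_kappa(sample))
```

Raw agreement can look high simply because most templates are unproblematic, which is why a chance-corrected figure is often reported alongside it.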

There is also a strategic angle. Prediction market competitors like Polymarket, Kalshi's largest crypto-based rival, have been moving fast on the same problem with their own AI tooling, and exchanges that move faster on contract vetting can list more markets, which in turn drives trader engagement and revenue. Kalshi is signaling to the market that it intends to be the platform with the most thorough pre-launch review, and the press positioning is meant to make the case that the more rigorous product wins. That is a different kind of moat than Polymarket's crypto-native user base, and it is the kind of moat that compounds over the years as the dispute ledger grows.

What the broader agent rollout looks like

Kalshi's launch fits a pattern the agentic AI industry has been building toward for the last year. After Salesforce's Agentforce rollout and Yardi's multifamily housing agent fleet, the headline case for enterprise agents is no longer copilots that summarize documents. It is agents that take a high-stakes decision, run it through a structured review, and hand the messy cases to a human. Kalshi's Harrison sits firmly in that second camp, and the company's willingness to publish the 500-template figure is a signal that the system is being used at scale, not as a one-off pilot.

The choice of Claude as the base model is also worth noting. Anthropic has been the partner of choice for financial and legal language tasks throughout 2026, partly because Claude's instruction-following tends to be conservative on edge cases, and partly because the company's safety posture fits the regulatory expectations of a CFTC-regulated entity. Kalshi did not have to use Claude specifically, and the same architecture could be ported to GPT, Gemini, or an open-weights model. The fact that the company went with Anthropic says something about the trust calculation for high-stakes, regulated deployments. For a broader look at how vertical software vendors are picking model providers for compliance-heavy work, our Enterprise AI Use Cases for Finance and Operations page walks through the patterns that are working today.

There are open questions. Kalshi has not disclosed the false positive rate for Harrison's contract flags, the average time saved per listing, or how the system handles markets that turn out to be politically sensitive in ways the model did not anticipate. The 500-template figure is suggestive but not a productivity number, and the press coverage so far is light on the operational metrics that enterprise AI buyers usually want. The clearest sign that Harrison is working will be a measurable drop in resolution disputes and faster time-to-list on the kinds of markets that historically took Kalshi the longest to vet.
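For readers unfamiliar with the term, a false positive rate here would be the share of trouble-free contracts that the agent flagged anyway. The sketch below shows the arithmetic under stated assumptions: the function name, the input lists, and the idea of using a later resolution dispute as ground truth are hypothetical and not drawn from anything Kalshi has disclosed.

```python
def false_positive_rate(flagged: list[bool], disputed: list[bool]) -> float:
    """FPR = FP / (FP + TN), computed over contracts that were never disputed.

    flagged[i]  -- assumed: the agent flagged contract i before launch
    disputed[i] -- assumed: contract i later produced a resolution dispute
    """
    if len(flagged) != len(disputed):
        raise ValueError("inputs must have the same length")
    false_pos = sum(f and not d for f, d in zip(flagged, disputed))
    true_neg = sum((not f) and (not d) for f, d in zip(flagged, disputed))
    if false_pos + true_neg == 0:
        raise ValueError("no undisputed contracts to measure against")
    return false_pos / (false_pos + true_neg)


# Five hypothetical contracts: three were never disputed, one of those was flagged.
flags = [True, True, False, False, False]
disputes = [True, False, False, False, True]
print(false_positive_rate(flags, disputes))
```

A low rate matters operationally because every unnecessary flag costs reviewer time on a contract that would have settled cleanly.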

Kalshi is the largest US-regulated prediction market by volume, and the company's product roadmap is closely watched by both the CFTC and traditional finance groups that see event contracts as a future asset class. A successful agent rollout here would put pressure on Polymarket, on the smaller regulated exchanges, and on the sportsbooks that have been partnering with Kalshi to embed prediction market products inside their apps. The full report on Harrison is in Crypto Briefing's coverage of Kalshi's stress-test agent. For teams building their own agent deployments in regulated industries, the Kalshi example is the clearest signal yet that the agent-first enterprise playbook is moving from financial services pilots into the regulated consumer markets, and that the winners will be the companies that publish the operational numbers, not just the demos.
