A New Benchmark Says Top AI Agents Still Lose Money Over a Full Season
General Reasoning’s new KellyBench puts AI agents through a full football betting season instead of a short task list. Every frontier model tested lost money, making the benchmark a sharp reality check for long-running agents.