
Google's New AI Video Model Leaked Ahead of Google I/O 2026

AIntelligenceHub · 6 min read

Real clips from Google's unreleased Gemini Omni video model are circulating publicly, showing in-chat video editing and strong prompt adherence just days before Google I/O 2026 opens on May 19.

Six days before Google I/O 2026, Google's new AI video model is already running in the hands of real users, and the clips circulating publicly are more revealing than a polished press release would be.

On May 11, select Gemini Pro users opened their Gemini interface and found something unexpected: a pop-up reading "Create with Gemini Omni." Real generated clips followed. By May 12, screenshots and sample videos had spread across Reddit and AI communities. One of Google's most anticipated I/O announcements had partially escaped before the keynote.

What the leak reveals isn't just a new video model. It's a different philosophy about what AI video should do, and that difference is worth understanding before the official announcement on May 19.

The Leaked Clips and How Gemini Omni Works

Two generated clips have circulated publicly, and they tell a story about where Gemini Omni sits competitively.

The first was a dinner scene: two men at an upscale seaside restaurant, spaghetti on the table. The environmental rendering impressed early viewers, with some calling it "incredibly realistic" by current AI video standards. The close-up animation problems that have plagued AI video for years were still there: pasta appearing from nowhere, chewing animations that didn't quite sync with the characters' movements. The setting looked good. The physics didn't.

The second clip was more technically ambitious: a professor works through trigonometric identities on a chalkboard while explaining the material. Legible text rendering and synchronized physical motion are genuinely hard problems for video AI systems. Gemini Omni handled the logical flow of the proof coherently, with equations appearing in the right sequence. The tells were still visible: chalk vanishing between frames, hand movements that didn't always correspond to what appeared on the board. But the underlying problem the model was solving wasn't trivial.

One detail stands out: generating those two short clips consumed 86% of a user's daily AI Pro plan allocation. Whatever is running underneath Gemini Omni, it's computationally expensive. The leak was first picked up by Android Authority after a Gemini Pro user received an early-access pop-up. TestingCatalog independently confirmed the interface strings and capability descriptions that followed.

The leaked UI text is precise: "Create with Gemini Omni: meet our new video model, remix your videos, edit directly in chat, try templates, and more." That phrase "edit directly in chat" is the significant one. Every major AI video model currently in production, including Google's own Veo 3.1, is a generation-only system. You write a prompt, the model produces a clip, and if you want to change anything you either regenerate from scratch or move the clip to a separate video editing application. Generation and editing live in completely different tools.

Gemini Omni collapses that separation. In-chat editing means you describe a change and apply it to existing footage. Video remixing suggests working with source material you bring to the conversation, not just generating from a blank text prompt. Watermark removal has also appeared in early settings documentation. These features don't just expand AI video generation. They change what the product is actually for.
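
To make that distinction concrete, here's a minimal sketch of the two workflows. Nothing in it comes from the leak: the client, the method names, and the Clip type are invented for illustration, and the real interface could look entirely different.

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins: the leak documents UI strings, not an SDK.
@dataclass
class Clip:
    prompt: str
    edits: list[str] = field(default_factory=list)

class VideoClient:
    def generate(self, prompt: str) -> Clip:
        # Stand-in for a full (expensive) generation pass.
        return Clip(prompt=prompt)

    def edit(self, clip: Clip, instruction: str) -> Clip:
        # Stand-in for modifying existing footage in place.
        clip.edits.append(instruction)
        return clip

client = VideoClient()

# Generation-only pattern (Veo 3.1 and every current production model):
# any change means a full re-roll that discards the previous clip.
take_one = client.generate("two men at a seaside restaurant, spaghetti on the table")
take_two = client.generate("same scene, but at sunset")  # starts over from scratch

# The pattern the leaked "edit directly in chat" string implies:
# the clip persists and each instruction refines it.
clip = client.generate("two men at a seaside restaurant, spaghetti on the table")
clip = client.edit(clip, "shift the lighting to sunset")
clip = client.edit(clip, "slow the camera push-in slightly")
print(clip.edits)  # both changes applied to the same footage
```

The difference matters for cost as much as for workflow: in the first pattern every iteration pays the full generation price, while in the second, at least in principle, you only pay for the change.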

Metadata in the leaked interface indicates Omni is an evolution of the Veo foundation rather than a clean-break architecture. That's consistent with how Google builds out Gemini systems: iteration on existing infrastructure rather than starting fresh. The tiered release structure is already apparent from leaked documentation. References to Flash and Pro variants appear in early settings. The model is also expected to function as an agent through AI Studio and to be available via API, which matters considerably for developers building video capabilities into their own products.

How Gemini Omni Compares to the Competition

Raw frame-by-frame photorealism isn't where Gemini Omni leads right now. ByteDance's Seedance 2 is currently the strongest model on pure cinematic fidelity. Reviewers who've compared early Omni clips against Seedance 2 outputs consistently note that Google's model trails on visual polish. Character consistency across long clips, complex motion physics, per-frame detail: those are Seedance 2's current advantages.

OpenAI's Sora is the other natural comparison. Sora debuted to significant momentum in early 2024 but has received comparatively little investment since. OpenAI's resources are concentrated on reasoning model work, multi-agent systems, and the GPT-5 series. The video generation track hasn't kept pace with those areas, and that gap gives Google an opening.

Google's apparent bet is that editing workflow beats raw generation quality at the margin. If a model can take existing footage and let users modify it through natural chat instructions, the output doesn't need to be perfect on the first generation. That's a fundamentally different value proposition than competing head-to-head on photorealism, and it fits much better with how professional video workflows actually operate. Source material typically already exists. The work is refinement, not creation from nothing.

The compute cost is the critical friction point. Two test clips at 86% of a daily Pro plan limit means Omni isn't a casual-use tool in its current form. A Flash tier with lower resource requirements would change that calculation significantly. Without one, Omni stays in the high-end creative workflow category regardless of how capable the editing features are. Early-access versions of Gemini features have historically carried higher costs that get revised down at general availability, so the leaked figures may not represent final pricing.
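
The back-of-envelope math makes the constraint concrete. Assuming the leaked 86% figure is typical and that the daily allocation scales linearly with clip count, both of which are assumptions, the budget works out like this:

```python
# Back-of-envelope only: assumes the leaked 86% figure is typical
# and that the daily allocation scales linearly with clip count.
clips_generated = 2
budget_used = 0.86  # fraction of the daily AI Pro allocation consumed

cost_per_clip = budget_used / clips_generated  # 0.43 of the daily budget
clips_per_day = 1.0 / cost_per_clip            # ~2.3 clips before the cap

print(f"~{cost_per_clip:.0%} of the daily budget per clip")
print(f"~{clips_per_day:.1f} clips per day at the Pro limit")

# Purely illustrative: if a Flash tier cost a quarter as much per clip,
# the same budget would cover ~9 clips a day. Actual pricing is unknown.
print(f"~{1.0 / (cost_per_clip / 4):.0f} clips per day at a hypothetical Flash rate")
```

Roughly two clips a day is prototyping territory, not production volume, which is why the Flash tier question matters so much.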

Google I/O 2026, Developer Access, and What Comes Next

Google I/O 2026 opens on May 19 with the main developer keynote at 10 a.m. Pacific. The pre-show already ran on May 13 with the Android Show I/O Edition, which covered Android 17's AI overhaul, the new Googlebook laptop line with integrated Gemini AI, and expanded on-device capabilities rolling out through Play Services rather than core OS updates.

This follows a broader pattern. Google has been pushing Gemini aggressively across its product lineup, and I/O typically serves as the moment where those threads get pulled together into a unified platform story. Based on confirmed agenda items and the pre-show momentum, Gemini Omni looks set to be among the headlining announcements on May 19. Gemini 4 is also widely expected, and if Omni is the video surface of a broader Gemini 4 architecture, the implications will be considerably more significant than what the leaked clips alone suggest.

For developers, Gemini Omni's API access is the most important detail from this leak. Current video generation APIs are expensive, output-only, and require significant post-processing. An API that exposes editing and remixing through the same interface as generation changes what production pipelines can actually do, particularly for teams that already have libraries of source footage. The AI Studio integration is also practical for iteration: if Omni is available there as an agent, experimentation cycles get considerably shorter than the current Vertex AI workflow allows.
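
Here's what that could look like in practice: a sketch of a batch pipeline over an existing footage library, under the assumption that editing and generation really do share one interface. Google has published no Omni API, so OmniClient, upload, and remix are placeholder names, not real endpoints.

```python
from pathlib import Path

# Placeholder client: every name here is invented for illustration.
class OmniClient:
    def upload(self, path: Path) -> str:
        """Register existing source footage; returns an opaque handle."""
        return f"handle:{path.name}"

    def remix(self, handle: str, instruction: str) -> bytes:
        """Apply a natural-language edit and return the new footage."""
        return b""  # placeholder bytes for the sketch

def refresh_library(client: OmniClient, library: Path, instruction: str) -> None:
    # The workflow an edit-capable API would enable: batch refinement of
    # footage that already exists, with no round-trip through a separate
    # editing application or post-processing stage.
    for source in library.glob("*.mp4"):
        handle = client.upload(source)
        edited = client.remix(handle, instruction)
        out = source.with_name(source.stem + "_edited.mp4")
        out.write_bytes(edited)

# e.g. refresh_library(OmniClient(), Path("footage/"),
#                      "regrade every clip to a warmer palette")
```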

The chalkboard clip showed Gemini Omni handling simultaneous text rendering and synchronized physical motion without the results becoming incoherent. That's a persistent hard problem that competitors have struggled with. For anyone building educational content tools, presentation automation, or explainer video pipelines, that specific capability matters more than raw cinematic quality. The open question is still compute economics. If the Flash tier launches with meaningfully lower resource costs, Omni becomes viable for moderate video volume. If Flash inherits the same resource profile as the leaked Pro-tier clips, it remains high-end creative tool territory.

The pace of pre-I/O leakage has been unusual this year. Confirmed Gemini Omni output circulated less than two weeks before the official announcement, and the Android Show pre-show covered enough ground that the main keynote will need substantial new material to drive developer attention. That could be deliberate: real user demos generate more authentic technical discussion than controlled stage presentations. Or it could be genuinely accidental, a staging environment reaching a broader group than intended.

Either way, the model is real, it's close to launch, and it's different enough from existing video tools to matter. Whether the compute costs come down before general availability is the question that determines how widely it gets adopted. The official announcement comes May 19.

For broader context on how Gemini models compare to Claude, GPT, and other leading AI systems, the LLM Comparison resource covers the current model landscape in detail.
