Google on May 19 introduced Gemini Omni, a new family of multimodal models for video generation and editing, at its I/O 2026 developer conference. The first model, Gemini Omni Flash, accepts text, image, video and limited audio inputs and allows conversational, multi-turn editing that builds on prior instructions. The launch brings video creation into the Gemini chatbot, Google Flow and YouTube Shorts, signaling a bet that the technology can become a mainstream consumer tool.
The announcement shifts video generation away from specialist applications and into everyday conversational interfaces, a product-ecosystem play that leverages Google’s distribution reach. Yet the company did not claim a technical victory over rivals; independent performance benchmarks are absent, and the model’s abilities are defined largely by Google’s own demos and statements.
Gemini Omni Flash produces videos of up to 10 seconds with native audio. In a briefing, Google said the 10-second cap is a deployment choice, not a permanent limit, and that longer durations are planned. Audio input at launch is limited to voice references; Google said broader audio editing is still under review ahead of a responsible release. The model card published alongside the launch notes that complete consistency across edits, complex motion and accurate text rendering remain challenges.
Gemini Omni Flash began reaching Google AI Plus, Pro and Ultra subscribers this week via the Gemini app and Google Flow. It is also available at no cost in YouTube Shorts Remix and YouTube Create. The global rollout varies by geography and plan, and users must be 18 or older.
Google did not disclose per-generation quotas by subscription tier. Developer and enterprise API access is planned for the coming weeks; the company said performance evaluations will be published when the API becomes available.
On YouTube, the new Shorts Remix feature lets users apply Omni-powered edits to existing Shorts. The resulting remixes carry digital watermarks, metadata and links back to the original video.
All content created or edited with Omni in Gemini, Flow or YouTube carries SynthID digital watermarks and C2PA Content Credentials. In a parallel transparency blog post, Google said it is expanding AI-generated media identification tools across Search, Chrome, Pixel and Cloud.
Gemini Omni replaces Veo in the Gemini app experience, but Veo remains a separate video model family under Google DeepMind.
Google claims Omni Flash offers improved understanding of physics (including gravity, kinetic energy and fluid dynamics) and better consistency across edits. These assertions have not been verified by third parties, and the model card lacks comparative benchmark results against Veo 3.1, OpenAI's Sora or other tools. Google staff told reporters in a briefing that editing prompts must be highly specific, or the model can over-edit or alter unintended elements.
Even with those caveats, the launch moves Google ahead of big-tech peers in folding conversational video editing into a widely used AI assistant and a short-video platform. Competitors such as OpenAI’s Sora, Runway and ByteDance’s Seedance have focused on stand-alone creative tools, leaving Google to test whether casual creation workflows inside Gemini and YouTube can attract mass adoption.
The company said the Omni family will later support image and audio generation, but the initial release generates only video.
For strategists and investors, the move signals a bet that video generation can become a mass-consumer habit woven into Google’s core ecosystem. The verdict on whether Gemini Omni can deliver on that ambition will depend on independent tests that are yet to come.