Google launched a new multimodal AI model, Google Gemini Omni, at its annual I/O developer conference. The model can generate videos from text, audio, and video inputs. The company called Omni a major step toward AI systems that can ‘create anything from anything.’
The first version of the model, Gemini Omni Flash, generates AI videos up to 10 seconds long. Longer videos are planned for future updates.
Unlike conventional text-to-video models, Gemini Omni can handle multiple types of media at once. Users can combine images, sound clips, videos, and text prompts in one request, and the model reasons over all of them to produce a single output.
Google DeepMind CTO Koray Kavukcuoglu says Omni can take in images, audio, video, and text to create high-quality video ‘based on real-world knowledge from Gemini.’
Google presented several demos at the event. In one example from Kavukcuoglu, Omni was given the simple prompt ‘a claymation explainer of protein folding’ and quickly rendered a stop-motion explainer video with a voice-over that said, ‘Proteins start as chains of amino acids. They fold into patterns like the alpha helix and flat sections called beta sheets, forming a perfect three-dimensional shape.’
Google is positioning Omni Flash as an easy-to-use consumer product built around simple conversational editing rather than as a replacement for professional editing software. In this respect it resembles Google's earlier Nano Banana image editing model, which lets users edit content with natural language commands.
According to DeepMind director Nicole Brichtova, “We definitely did focus on making this easy to use for consumers.”
The system also allows users to create digital avatars of people. Google says the onboarding process includes identity verification using recorded prompts, which is intended to limit deepfake misuse.
Videos produced by Gemini Omni will carry Google's SynthID watermark, which imperceptibly labels AI-generated content so that it can be identified as such.
Google plans to launch Omni APIs in the coming weeks. They may be used for applications including advertising, filmmaking, social media, and content creation.
The launch comes amid growing rivalry in multimodal AI among Google, OpenAI, Runway, and AI startups, all of which are developing tools that can create sophisticated media from a short prompt.
Concerns about misinformation, synthetic media, and deepfakes remain at the heart of the debate over such tools. Google said the model was tested and reviewed both internally and by external agencies before its release.