Google is developing generative AI soundtracks and dialogue for videos

Sound has always played a crucial role in movies and videos; even before the talkies, silent films were accompanied by music. Google’s artificial intelligence lab, DeepMind, has been developing “video-to-audio” (V2A) technology, which aims to generate audio, including soundtracks and dialogue, that synchronizes with AI-generated videos.

Google competes with companies such as OpenAI, which has its own AI video generator, Sora, and uses GPT-4o for AI voice responses. While some companies have explored AI-generated audio and music, combining generated audio with video is relatively new. Unlike other tools, Google DeepMind’s V2A does not require text prompts: it can interpret raw pixels to generate audio that matches the tone and context of a video.

V2A can be paired with AI video tools like Google Veo, or applied to existing stock footage and silent films, to create soundtracks, sound effects, and dialogue. It uses a diffusion model, trained on visual inputs, natural language cues, and video annotations, that starts from random noise and iteratively refines it into audio. The model can also be instructed to generate audio with a positive or negative tone.

DeepMind has released demo videos showcasing V2A’s capabilities, including horror music for a dark hallway, a harmonica melody for a cowboy at dusk, and dialogue for an animated figure talking about dinner. Audio generated with V2A will carry Google’s SynthID watermark to protect against misuse, and the feature is still being tested ahead of a public release.
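To give a rough feel for what "refining random noise" means, here is a toy Python sketch. It is not DeepMind's model or code: `toy_denoiser`, `refine`, and the sine-wave `target` are hypothetical stand-ins. A real diffusion model uses a trained neural network conditioned on video features, whereas this denoiser simply returns a known target signal.

```python
import math
import random


def toy_denoiser(noisy, conditioning):
    """Hypothetical stand-in for a learned network.

    A real model would estimate the clean audio from the noisy input
    and the video features. This toy ignores `noisy` and just returns
    the conditioning signal.
    """
    return list(conditioning)


def refine(conditioning, steps=50, step_size=0.2, seed=0):
    """Start from Gaussian noise and nudge it toward the prediction."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in conditioning]
    for _ in range(steps):
        predicted = toy_denoiser(x, conditioning)
        # Move each sample a fraction of the way toward the prediction.
        x = [xi + step_size * (pi - xi) for xi, pi in zip(x, predicted)]
    return x


def mean_abs_error(a, b):
    """Average absolute difference between two equal-length signals."""
    return sum(abs(ai - bi) for ai, bi in zip(a, b)) / len(a)


# Assumed "audio the video calls for": one cycle of a sine wave.
target = [math.sin(2 * math.pi * i / 16) for i in range(16)]

if __name__ == "__main__":
    for steps in (0, 5, 50):
        error = mean_abs_error(refine(target, steps=steps), target)
        print(f"steps={steps:2d}  mean abs error={error:.6f}")
```

Running it prints an error that shrinks as the number of refinement steps grows, which is the general pattern the article describes: noise in, progressively cleaner audio out.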

Article Source
https://mashable.com/article/google-deepmind-working-on-generative-ai-soundtracks-dialogue-videos