Google has updated its DeepMind AI with video-to-audio (V2A) technology that generates music to accompany videos, creating complete soundtracks. V2A combines video pixels with natural language text prompts to generate a soundscape that matches the video. The technology can create dramatic scores, sound effects, and dialogue that align with the characters and tone of a video.
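In developing the V2A technology, DeepMind experimented with auto-regressive and diffusion approaches to find the most scalable AI architecture, and reported that the diffusion-based approach synchronized video and audio information most effectively. The model encodes the video input into a compressed representation, then a diffusion model iteratively refines the audio from random noise, guided by the visual information and natural language prompts. The result is decoded into an audio waveform and combined with the video.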
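The encode-then-refine flow described above can be illustrated with a toy loop. This is only a sketch under loud assumptions: the function names (`encode_video`, `denoise_step`, `generate_audio`), the `prompt_bias` parameter, and the simple "move toward a target" update are all hypothetical stand-ins for a learned encoder and a learned denoising network, and none of it reflects DeepMind's actual implementation. It shows only the overall shape: start from random noise, as diffusion models typically do, and refine step by step under conditioning.

```python
import random


def encode_video(frames):
    """Hypothetical encoder: compress each frame (a list of pixel values) to its mean."""
    return [sum(frame) / len(frame) for frame in frames]


def denoise_step(audio, video_code, prompt_bias, strength):
    """Toy stand-in for a learned denoiser.

    Moves each audio sample a fraction of the way toward a target derived
    from the video code plus a scalar that stands in for the text prompt.
    """
    return [a + strength * ((v + prompt_bias) - a) for a, v in zip(audio, video_code)]


def generate_audio(frames, prompt_bias=0.0, steps=50, strength=0.2, seed=0):
    """Start from random noise and refine it iteratively under conditioning."""
    rng = random.Random(seed)
    video_code = encode_video(frames)
    audio = [rng.gauss(0.0, 1.0) for _ in video_code]  # pure noise, one sample per frame
    for _ in range(steps):
        audio = denoise_step(audio, video_code, prompt_bias, strength)
    return audio


if __name__ == "__main__":
    frames = [[0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
    print(generate_audio(frames, prompt_bias=0.25))
```

In a real system the denoiser would be a trained neural network and the output would be decoded into a waveform; here each "audio sample" is just one number per frame so the loop stays readable.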
The process gives audio engineers greater creative control over the generated soundtracks through positive and negative prompts that influence the music’s feel. A positive prompt guides the model toward desired sounds, while a negative prompt steers it away from undesirable ones. The technology can generate an unlimited number of soundtracks for any video input, including traditional footage such as stock footage and silent films.
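The source does not say how V2A combines positive and negative guidance internally. As an illustration only, the sketch below uses the classifier-free-guidance-style formula that many open diffusion systems use for negative prompts: the model's prediction under the negative prompt is the baseline, and the result is pushed away from it toward the prediction under the positive prompt. The function name `guided_prediction` and the `guidance_scale` parameter are assumptions made for this example.

```python
def guided_prediction(pos_pred, neg_pred, guidance_scale=3.0):
    """Combine two model predictions, element by element.

    pos_pred: prediction conditioned on the positive prompt
    neg_pred: prediction conditioned on the negative prompt
    guidance_scale: 1.0 returns pos_pred unchanged; larger values push
    the result further away from neg_pred.
    """
    return [n + guidance_scale * (p - n) for p, n in zip(pos_pred, neg_pred)]


if __name__ == "__main__":
    pos = [1.0, 2.0]  # hypothetical prediction for a desired sound
    neg = [0.0, 1.0]  # hypothetical prediction for an unwanted sound
    print(guided_prediction(pos, neg, guidance_scale=2.0))  # [2.0, 3.0]
```

With a scale of 1.0 the negative prompt has no effect; raising the scale trades fidelity to the positive prediction for stronger avoidance of the negative one.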
Training the model on video, audio, and additional annotations allows it to associate specific audio events with visual scenes and respond to information in the annotations or transcripts. The annotations, which include detailed sound descriptions and transcripts of spoken dialogue, improve audio quality and help the model create lifelike audio that closely aligns with the video’s content and instructions.
However, the model relies heavily on high-quality video input: artifacts or distortions in the video can degrade the generated audio. Google is also working on lip-sync for videos with speaking characters, but mismatches can still occur, producing strange results in which characters are heard speaking while their lips do not move to match.
Overall, Google’s DeepMind AI update enhances video-to-audio generation and gives audio engineers more creative control over soundtracks. Its ability to synchronize video and audio and to generate lifelike sound opens up new possibilities for creating compelling audio experiences for videos.
Article Source
https://www.digitalmusicnews.com/2024/06/18/google-deepmind-ai-music-for-video/