Google DeepMind’s ‘V2A’ AI technology generates video soundtracks from pixels and text prompts




Google’s DeepMind team has developed a video-to-audio (V2A) technology that can create soundtracks for videos using video pixels and text prompts. This innovation allows for the generation of music, sound effects, and speech from various types of visual content.

One interesting feature of this technology is the ability to input text prompts that guide the audio generation process. Users can enter ‘positive prompts’ to steer the audio in a desired direction, as well as ‘negative prompts’ to avoid certain elements. This flexibility enables the creation of an endless variety of soundtracks for any given video.

The V2A system can work with both newly created videos and existing footage, including archival material and silent films. Because it conditions on video pixels alone, users can also generate audio without any text prompt at all. The technology has some current limitations, such as audio quality that depends on the quality of the input video and imperfect lip-syncing in speech generation, but Google DeepMind is actively researching solutions to improve these areas.
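DeepMind has not released a public API for V2A, so the interface below is purely hypothetical; the function name and parameters are illustrative assumptions. It only sketches the two usage modes described above: pixels alone are sufficient, while optional positive and negative prompts steer the generated soundtrack toward or away from certain elements.

```python
# Purely hypothetical sketch: DeepMind has not published a V2A API.
# generate_audio() and its parameters are illustrative, not a real interface.

def generate_audio(video_pixels, positive_prompt=None, negative_prompt=None):
    """Describe the soundtrack a V2A-style system would aim to produce."""
    if not video_pixels:
        raise ValueError("V2A conditions on video pixels, so they are required")
    description = ["audio conditioned on video pixels"]
    if positive_prompt:
        description.append(f"steered toward: {positive_prompt}")
    if negative_prompt:
        description.append(f"steered away from: {negative_prompt}")
    return "; ".join(description)

# Pixels-only mode: no text prompt needed.
print(generate_audio([[0.1, 0.2]]))
# Prompt-guided mode: positive and negative prompts shape the output.
print(generate_audio([[0.1, 0.2]],
                     positive_prompt="tense horror score",
                     negative_prompt="dialogue"))
```

The key point the sketch illustrates is that the text prompts are optional steering signals layered on top of the pixel conditioning, not a required input.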

The development of V2A could potentially disrupt the traditional process of soundtrack composition, as it offers a fast and automated way to create audio for videos. Soundtrack composers may need to adapt to this new technology, which has the capability to revolutionize the audio production industry.

To learn more about Google DeepMind’s V2A technology, visit their website for additional information and examples of its capabilities.

Article Source
https://www.musicradar.com/news/google-deepmind-video-to-audio