Google DeepMind is developing artificial intelligence-based technology to create video soundtracks.
Google's AI research unit, like other organizations, has already built video-generation models, but those models cannot produce sound for the clips they create. To fill this gap, DeepMind is developing V2A (video-to-audio) technology.
"Video generation models are advancing at an incredible pace, but many modern systems do not create an audio track. One of the next important steps toward movie generation is adding soundtracks to these silent videos," DeepMind said in a statement.

DeepMind's V2A technology combines text prompts with video to create music, sound effects and dialogue. For example: "Jellyfish pulsating underwater, marine life, ocean." The underlying V2A diffusion model is trained on sounds, dialogue transcripts and video clips.
https://www.youtube.com/watch?v=b6Elcke3JMc&t=9s

The following prompts were used to create the sound for this video: cinematography, thriller, horror film, music, tension, atmosphere, footsteps on concrete.
DeepMind notes that the technology is not yet perfect, and the generated audio is not always high-quality or convincing. Further improvements and testing are required before a full launch of V2A.
Recall that in February, OpenAI introduced Sora, a new generative AI model that converts text into video.
In June, scientists from Harvard and DeepMind created a virtual rat whose brain is powered by artificial intelligence.
Previously, a Google subsidiary introduced the Genie generative AI model for creating games.