AI’s New Wonder Creates Music and Natural-Sounding Speech

Artificial intelligence research has now produced a new system: AudioLM

A group of Google researchers has developed a new AI system that can create natural-sounding speech and music after being prompted with only a few seconds of audio. The framework, called AudioLM, is built for high-quality audio generation with long-term consistency and produces true-to-life sounds without any human intervention.

What makes AudioLM so remarkable is that it generates very realistic audio matching the style of a relatively short audio prompt, including complex sounds like piano music or a person speaking. More importantly, the output is almost indistinguishable from an original recording. The technique also looks promising as a way to expedite the tedious process of training AI to generate audio.

AI-generated audio is, however, nothing new; it is widely used in home assistants like Alexa, whose voices rely on natural language processing. Similarly, AI music systems like OpenAI's Jukebox, built on a neural net, have generated impressive results, including rudimentary singing, as raw audio in a variety of genres and artistic styles. But most existing techniques need people to prepare transcriptions and label text-based training data, which takes considerable time and human labor. Jukebox, for example, uses text-based data to generate song lyrics.

AudioLM is very different and requires no transcription or labeling. Instead, sound databases are fed into the program, and machine learning is used to compress the audio files into sound snippets, called semantic and acoustic "tokens," without losing too much information. This tokenized training data is then fed into a machine-learning model that maps the input audio to a sequence of discrete tokens and uses natural language processing to learn sound patterns.
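
To make the tokenization idea concrete, here is a minimal Python sketch, not Google's code: a toy codebook and random stand-in data (CODEBOOK_SIZE, FRAME_DIM, and the codebook itself are invented values) illustrate how each audio frame can be mapped to its nearest codebook vector, turning continuous audio into a text-like sequence of discrete ids.

```python
import torch

torch.manual_seed(0)

CODEBOOK_SIZE = 256   # number of distinct acoustic tokens (assumed value)
FRAME_DIM = 64        # features per audio frame (assumed value)

# Hypothetical "learned" codebook; in a real system this would come from
# training a neural audio codec, not from random initialization.
codebook = torch.randn(CODEBOOK_SIZE, FRAME_DIM)

def tokenize(frames: torch.Tensor) -> torch.Tensor:
    """Map each audio frame to the index of its nearest codebook vector."""
    distances = torch.cdist(frames, codebook)   # (num_frames, CODEBOOK_SIZE)
    return distances.argmin(dim=1)              # one integer token per frame

# Stand-in "audio": 100 random feature frames in place of a real recording.
audio_frames = torch.randn(100, FRAME_DIM)
tokens = tokenize(audio_frames)
print(tokens[:10])   # a short, text-like sequence of discrete token ids
```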

To generate new audio, only a few seconds of sound need to be fed into AudioLM, which then predicts what comes next. The process closely resembles how autoregressive language models such as Generative Pre-trained Transformer 3 (GPT-3), which use deep learning to produce human-like text, predict which words and sentences typically follow one another.
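
The prediction step can be sketched the same way. In the toy loop below, an untrained embedding and linear head stand in for AudioLM's actual Transformer (all names, sizes, and the prompt tokens are assumptions for illustration); the model repeatedly scores every possible next token given the sequence so far, samples one, and appends it, the same loop a text model like GPT-3 runs over words.

```python
import torch

torch.manual_seed(0)
CODEBOOK_SIZE = 256   # must match the tokenizer's vocabulary (assumed value)

# Untrained stand-ins for a real Transformer: an embedding plus a linear head.
embed = torch.nn.Embedding(CODEBOOK_SIZE, 32)
head = torch.nn.Linear(32, CODEBOOK_SIZE)

def next_token_logits(context: torch.Tensor) -> torch.Tensor:
    # A real model attends over the whole context; this sketch just pools it.
    hidden = embed(context).mean(dim=0)
    return head(hidden)   # one score per possible next token

prompt = [17, 203, 5, 99]          # a few seconds of audio, as token ids
generated = list(prompt)
for _ in range(20):                # extend the sequence one token at a time
    logits = next_token_logits(torch.tensor(generated))
    probs = torch.softmax(logits, dim=-1)
    generated.append(torch.multinomial(probs, 1).item())
print(generated)   # the prompt followed by 20 predicted continuation tokens
```

In a trained system, the predicted tokens are then decoded back into a waveform, so the sampled sequence becomes audio that continues the prompt.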

The result is that audio produced by AudioLM sounds very natural. The piano music it generates is notably more realistic and fluid than music produced by earlier AI techniques, which often sounds chaotic. AudioLM already offers much better sound quality than previous music-generation programs, and it is surprisingly good at recreating some of the repeating patterns inherent in human-made music, generating convincing continuations that are coherent with the short prompt in terms of melody, harmony, tone, and rhythm.

AudioLM learns the inherent structure of audio at multiple levels and can create realistic piano music by capturing the subtle vibrations of each note as the keys are played, as well as the rhythms and harmonies. It was able to generate coherent piano continuations despite being trained without any symbolic representation of music.
