- By Vikas Yadav
- Sat, 17 Jun 2023 10:36 PM (IST)
- Source:JND
JE Technology Desk: Meta has announced Voicebox, an AI model for speech-generation tasks, including "editing, sampling and stylizing" audio clips. It can produce high-quality audio and edit pre-recorded clips to remove unwanted elements while preserving the overall style and content of the clip. The AI tool is multilingual and can generate speech in six languages.
In the future, AI tools like Voicebox could allow visually impaired people to hear written messages read aloud by AI in the sender's own voice, Meta said. It may also lend realistic voices to virtual assistants and help creators edit and produce audio tracks for online content. Here are a few more tasks that the new AI tool can accomplish:
Text-to-speech generation in the same style:
The model can analyse an audio clip as short as two seconds and produce text-to-speech output in a similar style.
Eraser for audio editing:
It can filter out noise or replace misspoken words by regenerating that section of the audio clip, eliminating the need to re-record the speech. "For example, you can identify a segment of a speech that's interrupted by a dog barking, crop it, and instruct Voicebox to re-generate that segment – like an eraser for audio editing," Meta notes in the Newsroom post.
Cross-lingual speech generation:
Given a sample of a user's speech and a passage of text in English, French, German, Spanish, Polish or Portuguese, Voicebox can prepare "a reading of the text" in that language in the same style. Put simply, it can allow people to generate audio output in a foreign language in their own voice.
Diverse speech sampling:
Because the tool is trained on massive amounts of data, Voicebox generates speech that is "more representative" of the way people talk in real life in the six languages listed above.
It is worth noting that the AI in Voicebox learns from the raw audio it receives as input alongside the transcription. Because of the potential risks of misuse, Voicebox is not available for general use, and the code has not been made public as of now.