Meta Unveils Voicebox: Cutting-Edge AI Model Revolutionizes Speech Generation and Audio Editing

Technology
Jun 17 2023 03:41 PM
ANIKET DIXIT

USA: Meta has unveiled Voicebox, an AI model for speech generation tasks such as editing, sampling, and stylizing. It can generate high-quality audio clips and edit pre-recorded audio, for example by removing unwanted noise while preserving the style of the original recording. Voicebox is multilingual and can produce speech in six languages.

Like generative systems for images and text, Voicebox produces outputs in a wide range of styles. Instead of pictures or written passages, however, it produces audio clips. It can generate a clip from scratch or modify an existing sample provided to it.

Voicebox can handle a variety of speech tasks, including speech synthesis, audio editing, noise removal, diverse sample generation, and style conversion. What sets it apart is its approach to learning, which relies solely on raw audio and the accompanying transcriptions. It is built on a technique called Flow Matching, which Meta says improves on diffusion models.

According to Meta, Voicebox outperforms other models such as VALL-E and YourTTS. In zero-shot text-to-speech, it beats VALL-E, the current state-of-the-art English model, on both intelligibility (a word error rate of 1.9% versus 5.9% for VALL-E) and audio similarity (0.681 versus 0.580), while running up to 20 times faster.

Additionally, Voicebox surpasses YourTTS for cross-lingual style transfer, reducing the average word error rate from 10.9% to 5.2% and improving audio similarity from 0.335 to 0.481.
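For readers unfamiliar with the metric, word error rate (WER) is conventionally computed as the number of word-level substitutions, insertions, and deletions needed to turn a transcript of the generated speech into the reference text, divided by the number of reference words. The Python sketch below illustrates that standard definition and the relative improvement implied by the figures quoted above. It is not Meta's evaluation code, and the function names `word_error_rate` and `relative_reduction` are hypothetical, chosen only for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,          # reference word missing from hypothesis
                curr[j - 1] + 1,      # extra word in hypothesis
                prev[j - 1] + cost,   # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # One dropped word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
    # Figures quoted in the article (percent WER).
    print(relative_reduction(5.9, 1.9))    # VALL-E -> Voicebox
    print(relative_reduction(10.9, 5.2))   # YourTTS -> Voicebox
```

On the quoted figures, this works out to a relative WER reduction of roughly 68% against VALL-E and roughly 52% against YourTTS.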

Voicebox is capable of synthesizing speech in six languages. Meta trained the model using over 50,000 hours of pre-recorded speech and transcripts from public-domain audiobooks in English, French, Spanish, German, Polish, and Portuguese. It can predict speech segments given the surrounding speech and the corresponding transcript.

One notable feature of Voicebox is its ability to infill speech from context, enabling it to generate segments within an audio recording without recreating the entire input. It can also replicate the style of a given audio sample for text-to-speech generation.

The applications of Voicebox are numerous and promising. This multipurpose generative AI model could provide natural-sounding voices for future virtual assistants or non-player characters in the Metaverse. It has the potential to simplify audio track editing for content creators, allow individuals to speak foreign languages using their own voice, and enable visually impaired people to have written messages read aloud in the voices of their friends through AI technology.

Despite these possibilities, Voicebox is not currently available to the general public. Meta has shared only audio samples and a research paper describing the methodology and results. The company attributes this caution to the risk of misuse that would come with releasing the model or its code.
