Researchers have finally cracked real-time voice cloning, and deepfake conversations are about to become a reality.
ByteDance, the parent company of TikTok, has created powerful new software that lets users instantly change their voice into another person’s using generative AI.
The tool, called StreamVoice, is not yet available to the public, but it is already sparking fraud concerns among experts.
Clone A Voice With A Single Utterance Of Speech
The instant voice cloning software was created by researchers at Northwestern Polytechnical University in China, working with researchers at ByteDance. This is the same university that China accused the US government of launching cyberattacks against. The school is known for doing military research.
It’s a powerful piece of AI that could change everything we know about voice cloning, which until now has been limited to offline use.
Researchers say that the StreamVoice software can convert anyone’s voice into the cloned voice in real time using a “single utterance of speech from the voice they are imitating.”
The solution is fast and nearly undetectable. According to the experts, the voice conversion occurs at the speed of live streaming, with a delay of only 124 milliseconds (about one-eighth of a second).
Zero-Shot Voice Conversion
The AI model boasts zero-shot voice conversion, meaning that it can convert speech into the voice of a speaker it never encountered during training, given only a short sample of that speaker’s voice.
Researchers trained the tool on Mandarin speech and on one multilingual set that included English, Finnish, and German.
They used a voice dataset called Aishell3, which contains roughly 85 hours of emotion-neutral recordings, 88,035 utterances in total, spoken by 218 native Mandarin speakers. They also used Meta’s Llama AI platform to build and test the AI voice cloning model.
The researchers express some concerns about how the technology could be used negatively.
In the paper, the researchers caution readers, “Since StreamVoice can convert source speech to desired speakers, it may carry potential risks of misuse for various purposes, such as spreading fake information or phone fraud.”
What Will TikTok Do With This AI Cloning Capability?
Will ByteDance unleash this real-time voice cloning capability into the wild? This certainly would be a hit on TikTok. Imagine doing a video in the voice of a celebrity or politician.
It would be a significant enhancement over current voice cloning, which until now has been available only offline and not in real time.