Artificial intelligence is having a moment right now thanks to the likes of Dall-E and ChatGPT, but AI tools that mimic human voices may have gone a step too far.
Earlier this month, UK-based ElevenLabs announced its Prime Voice AI, which lets users generate the ‘most realistic’ spoken audio in any voice and style.
However, within just a few weeks, troublemakers had taken to using the tool to upload deepfaked voices of celebrities from Joe Rogan to Robin Williams, as reported by Motherboard.
The company’s ‘Voice Lab’ feature let users clone voices from short audio samples.
A number of users on 4Chan, the anonymous imageboard, reportedly posted a clip of Emma Watson reading a passage from Mein Kampf.
Other 4Chan users had posted clips of the AI spewing intense misogyny or transphobia using voices of characters or narrators from various anime or video games.
On Monday, ElevenLabs took to Twitter to acknowledge that they had seen an ‘increasing number of voice cloning misuse cases’.
The company said that while they could trace any generated audio back to the user, they were choosing to address the problem by implementing additional safeguards.
Potential safeguards included additional account verification to enable voice cloning, ‘such as payment info or even full ID verification’, and verifying copyright to a voice by submitting a sample with prompted text.
The company even said that it would consider dropping Voice Lab altogether and manually verifying each cloning request.
The company had touted the ‘uncanny quality’ of its AI-generated voices and received US$2 million in pre-seed funding last week.
The tool was intended to automate audio for news articles, generate audio for video games and even narrate audiobooks, but it looks like the company will have to address the safety implications first.
In January, Microsoft announced its artificial intelligence VALL-E, which can closely mimic a human voice from just three seconds of sample audio.
What are Deepfakes?
Deepfakes are videos and images that use deep-learning AI to fabricate something that never actually happened. They are best known for their use in pornographic videos, fake news and hoaxes.
Such disinformation can be used to make events that never occurred appear real, place people in situations they were never in, or depict people saying things they never said.
However, the tech giant had also noted in an ethics statement that because VALL-E can synthesize speech that maintains a speaker’s identity, it carries potential risks of misuse, such as spoofing voice identification or impersonating a specific speaker.
‘If the model is generalized to unseen speakers in the real world, it should include a protocol to ensure that the speaker approves the use of their voice and a synthesized speech detection model,’ said the statement.