How AI could take the words right out of voice artists’ mouths

These professionals’ voices are their livelihoods – but advances in generative artificial intelligence threaten to take that away as another industry comes under pressure from the technology. Pranshu Verma reports

Sunday 30 April 2023 10:30 BST
Bev Standing at her home studio in Frankford, Ontario (Jennifer Roberts/Washington Post)

Companies clamour to use Remie Michelle Clarke’s voice. An award-winning vocal artist, her smooth Irish accent backs ads for Mazda and Mastercard and is the sound of Microsoft’s search engine, Bing, in Ireland.

But in January, her sound engineer told Michelle Clarke he’d found a voice that sounded uncannily like hers someplace unexpected: on revoicer.com, credited to a woman named “Olivia”. For a modest monthly fee, Revoicer customers can access hundreds of different voices and, through an artificial intelligence-backed tool, morph them to say anything – to voice commercials, recite corporate trainings or narrate books.

Revoicer advertised Olivia with a photo of a grey-haired woman, who appeared to be of Asian descent, and a blurb: “A deep, calm and kind voice. Excelent [sic] for audio books.”

A 38-year-old brunette, Michelle Clarke looked nothing like Olivia. But when she hit play, she was greeted with the jarring sound of what could only be her own voice: “Hello my dear ones, my name is Olivia,” it said. “I have a soft and caring voice.”

“It’s completely bizarre,” Michelle Clarke says. “When you see your voice has been shifted and tampered with ... there’s something so invasive about it.”

But Michelle Clarke isn’t the only one who has found her voice seized from her control. Advances in generative artificial intelligence – technology that forms texts, images or sounds based on data it is fed – have allowed software to recreate people’s voices with eerie precision. Such software can quickly spot patterns, comparing a small sample to a database of millions of voices, allowing users to brandish simple text-to-speech tools to modify a voice to say whatever they type.

The technology burst into the public eye this month, when a music producer claimed to use AI versions of Drake and the Weeknd’s voices to build a new track, “Heart on My Sleeve”, which spread rapidly on TikTok. A number of celebrities have experienced these verbal deepfakes, including Emma Watson, whose cloned voice recited passages of Adolf Hitler’s “Mein Kampf”, and President Biden, who was artificially made to say he preferred low-quality marijuana.

But the technology puts voice actors, the often-nameless professionals who narrate audiobooks, video games and commercials, in a particularly precarious position. While their voices are often known, they rarely command the star power necessary to wield control of their voice. The law offers little refuge, since copyright provisions haven’t grappled with artificial intelligence’s ability to recreate humanlike speech, text and photos. And experts say contracts more frequently contain fine-print provisions allowing a company to use an actor’s voice in endless permutations, even selling it to other parties.

Neal Throdes, a developer at revoicer.com, said the company used the voice through a licensing agreement with Microsoft, which allows them unrestricted access to Michelle Clarke’s sample. Hours after The Post contacted revoicer.com, the company pledged to remove the voice from their site. “We have taken responsibility,” Throdes said in an email, adding: “Revoicer.com is not responsible for the situation [Michelle Clarke] is in.”

Several voice actors have said they may abandon their careers, seeing a cataclysmic future where people can obtain a voice without hiring an individual. Michelle Clarke wonders why a company would pay the $2,000 she can command for a 30-second recording when they can instead pay $27 a month for a realistic clone.

“How many other companies ... are using my voice and my work and my livelihood without ever factoring me in?” Michelle Clarke asks.

Voice-generating software is benefiting from a boom in generative AI, which backs chatbots like ChatGPT and text-to-image makers like DALL-E and has rapidly increased in sophistication in the last year.

While AI has long helped companies successfully mimic speech, it churned out robotic, unrealistic voices, says Zohaib Ahmed, chief executive of Resemble AI, a company that uses artificial intelligence to generate voices.

But improvements in the underlying architecture and computing power of this software upgraded its abilities. Now it can analyse millions of voices quickly to spot patterns between the elemental units of speech, called phonemes. This software compares an original voice sample to troves of similar ones in its library, finding unique characteristics to produce a realistic-sounding clone.

Before this advanced pattern recognition was possible, voice-generating software needed thousands of sentences to duplicate a voice, Ahmed says. Now, these tools work with just a few minutes of recorded speech.

“You don’t need an hour or 20 hours anymore,” Ahmed says. “You just need like a few minutes, a few seconds to basically get something that sounds 90 per cent [accurate].”

This advancement has been a boon to some: people with degenerative illnesses, like ALS, can bank their voices using artificial intelligence. Voice cloning software allowed Val Kilmer, who lost his voice after surgery for throat cancer, to speak for his role in Top Gun: Maverick.

But it’s also given rise to predatory industries. People have reported the voices of their loved ones being recreated to perpetrate scams. Start-ups have emerged that scrape the internet for high-quality speech samples, bundle hundreds of voices into libraries, and sell them to companies for their commercials, in-house trainings, videogame demos and audiobooks, charging less than $150 per month.

Tim Friedlander, the president of the National Association of Voice Actors, an advocacy organisation, says these “middlemen” start-ups provide companies a lucrative proposition: lifelike voices that can say what’s needed without having to deal with the higher costs associated with human professionals.

Friedlander adds that generative AI’s impact on his industry has only just started, and it’s likely to disrupt it greatly. “It’s scary,” he says. “Voice actors, unknowingly, have been training their replacements.”

Bev Standing was at home one afternoon when her children sent a flurry of texts asking the same thing: Mom, are you the voice of TikTok?

Standing was confused. The Canadian voice actor had done work for many clients, but TikTok hadn’t hired her to narrate anything and she certainly wasn’t getting paid by its parent company, ByteDance.

But on the app she found herself everywhere – as the voice behind TikTok’s text-to-speech feature she was narrating cat videos, critiquing shoddy boyfriends, touting McDonald’s hamburgers and pitching investment tools she’d never heard of.

She wasn’t immediately angry. “For about three days, it was fun,” Standing says. “But as soon as my business brain kicked in, it wasn’t.”

Standing took a job in 2018 for a client on behalf of the Chinese Institute of Acoustics and recorded her voice for a translation app. She read in the monotone style emblematic of TikTok’s narration feature, but she says there weren’t any provisions in the contract allowing them to sell her voice to other companies.

She sued ByteDance in 2021 and settled out of court for an undisclosed sum. Shortly after, TikTok removed her voice from the app. Kat Callaghan, a Canadian disc jockey, is now the voice.

While the software that cloned Standing’s voice is probably less sophisticated than current technology, Standing says she does not appreciate having her voice copied without her permission.

“That’s my voice,” she says. “You can’t just take it without paying me.”

Despite revoicer.com pledging to take down Olivia’s voice, Michelle Clarke says her livelihood is still at risk. Other third-party sites could be reselling her voice. Her friends have passed along Instagram ads that she appears to be narrating, even if she hasn’t heard of the company. “The problem is not solved for me,” she says.

But as a mother of a one-year-old boy, she thinks she may quit doing voiceover work. “There’s no right time to feel like your future is at stake,” she says. “But it’s absolutely the worst time for me now.”

‘It’s nothing like as good as me’: Mike Cooper in his voice studio in Asheville (Jacob Biba/Washington Post)

Little recourse is available to voice actors. Until recently, artificial intelligence didn’t pose much of a threat to their professions, and many say they didn’t parse through contracts in detail, searching for provisions allowing a company to use their audio beyond an individual job.

Copyright law has also not matured to decide what happens when a person’s voice is mimicked for profit, leading to patchwork enforcement where celebrities can access more protections than lesser-known professionals. (For example, Drake’s AI-generated song was quickly taken off YouTube and Spotify last week after Universal Music Group raised concerns.)

Daniel J Gervais, an intellectual property expert and professor at Vanderbilt University Law School, says US law doesn’t offer much refuge for people who’ve had their voices taken.

Federal copyright law does not protect a person’s voice, and local laws vary by state, he says. Even in California, which because of its prominence in the entertainment industry has some of the stronger voice protections, it’s difficult to assert who’s covered. The state’s law says a voice must be considered distinct – meaning identifiable – and from a well-known person, making it hard for the average voice actor to be protected, Gervais says.

Friedlander says his colleagues must be vigilant in how their voices are being used on the internet and pay close attention to the details of their contracts.

Many voice actors are not unionised, and Friedlander’s advocacy organisation is urging actors to scan for provisions that ask for the rights to their voice in perpetuity. The organisation has crafted template contracts for actors that give them control over how their voice is used.

In Europe, it’s easier to get a sound recording copyrighted, and commercial scraping of such content requires permission from the recording’s owner, Gervais says. The EU has also charted a stronger stance against artificial intelligence by proposing laws that would classify the risks of an AI system.

“There’s a huge fork in the road between Europe and the United States,” he says. “It is much more aggressive.”

In late January, Mike Cooper received an email from a company advertising a library of voice-overs for sale. Intrigued, he scrolled through the page and quickly found his own voice among the samples in the library.

“It was a very surreal moment when I clicked ‘play’ on that, and heard my own voice coming back to me,” he says.

Cooper, who lives in Asheville, North Carolina, says he was angry at first. But then he remembered why this happened. The company now selling his voice had probably gotten it after acquiring a firm Cooper did a few minutes of voiceover work for in 2016.

Cooper remembers a provision in his contract saying his voice could be used elsewhere. But he recalls thinking it was harmless. He was only giving the company a few minutes of his voice, he says.

“I viewed the risk as extremely small,” he says. “I was absolutely wrong.”

But Cooper says synthetically generated voices made without his input can’t offer what he can – a deep understanding of what a project needs, and a performance with emotion and intention: “It’s nothing like as good as me.”

© The Washington Post