AI text generator too dangerous to release, say creators

Developers cite concerns over fake news proliferation and risk of online impersonation

New AI software can write endless reams of invented, coherent prose (Getty)

Groundbreaking new artificial intelligence text generation software built by a company backed by Elon Musk is too dangerous to make public, its creators say.

OpenAI, a nonprofit artificial intelligence research group, said their GPT-2 software is so good they are worried it could be misused.

The software generates coherent text, and can be prompted to write on certain subjects or in a certain style by feeding it paragraphs of source material.

The algorithm was trained on eight million web pages and the results are far better than any previous attempt at computer text-generation, where odd syntax changes and rambling nonsense have been difficult to iron out.

The success of the software has seen it dubbed “deepfakes for text”, and the core concerns are that it could be used to generate unstoppable quantities of fabricated news or to impersonate people online.

In a blog post on the results, OpenAI provided examples of the prose the software generated.

Here is the human-written source text prompt they fed it: “In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.”

The software then carried on writing the piece, including its own invented quotes.

It wrote: “The scientist named the population, after their distinctive horn, Ovid’s Unicorn. These four-horned, silver-white unicorns were previously unknown to science.

“Now, after almost two centuries, the mystery of what sparked this odd phenomenon is finally solved.

“Dr. Jorge Pérez, an evolutionary biologist from the University of La Paz, and several companions, were exploring the Andes Mountains when they found a small valley, with no other animals or humans. Pérez noticed that the valley had what appeared to be a natural fountain, surrounded by two peaks of rock and silver snow.”

The software reportedly took 10 attempts to produce this coherent example.

OpenAI said: “Overall, we find that it takes a few tries to get a good sample, with the number of tries depending on how familiar the model is with the context. When prompted with topics that are highly represented in the data (Brexit, Miley Cyrus, Lord of the Rings, and so on), it seems to be capable of generating reasonable samples about 50 per cent of the time.”

Because of worries over how the product could be used, the company has so far released only a smaller version of the software.

“Due to concerns about large language models being used to generate deceptive, biased, or abusive language at scale, we are only releasing a much smaller version,” the company said. “We are not releasing the dataset, training code, or GPT-2 model weights.”

OpenAI also suggested government policy could be required to address some of the issues, and thereby allow further progression in the field.

“Governments should consider expanding or commencing initiatives to more systematically monitor the societal impact and diffusion of AI technologies, and to measure the progression in the capabilities of such systems,” they said.
