Google hopes to prevent robot uprising with new AI training technique
Designed to discourage machines from cheating
Google is developing a new system designed to prevent artificial intelligence from going rogue and clashing with humans.
It’s a scenario that has been explored by countless sci-fi films, and one that has grown into a genuine fear for many people.
Google is now hoping to tackle the issue by training machines to pursue the goals their human trainers actually intend.
The company’s DeepMind division, which built the AI that recently defeated Ke Jie, the world’s number one Go player, has teamed up with OpenAI, a research group part-funded by Elon Musk.
They’ve released a paper explaining how human feedback can be used to ensure machine-learning systems solve tasks the way their trainers intend.
A technique called reinforcement learning, which is popular in AI research, challenges software to complete tasks, and rewards it for doing so.
However, the software has been known to cheat by finding shortcuts or loopholes that maximise its reward.
In one instance, reported by Wired, an agent playing the racing game CoastRunners drove a boat around in circles instead of completing the course, because looping still earned it a reward.
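To make that failure mode concrete, here is a minimal, hypothetical sketch of how a hand-written “proxy” reward can pay for the wrong thing. The scoring function below is an invented stand-in rather than CoastRunners’ actual code: it rewards hitting respawning pick-ups far more than forward progress, so an agent that loops through the same pick-ups outscores one that simply finishes the race.

```python
# Hypothetical proxy reward, illustrating the reward-hacking failure above.
# "lap_progress_delta" and "pickups_hit" are invented features, not
# CoastRunners' real scoring code.
def proxy_reward(lap_progress_delta, pickups_hit):
    # Progress barely matters; pick-ups dominate the score.
    return 0.1 * lap_progress_delta + 10 * pickups_hit

# Honest racer: steady forward progress, no pick-ups.
racer_score = sum(proxy_reward(1.0, 0) for _ in range(100))

# "Cheating" agent: no net progress, but loops through respawning pick-ups.
looper_score = sum(proxy_reward(0.0, 1) for _ in range(100))

print(racer_score, looper_score)  # 10.0 vs 1000.0: looping wins
```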
DeepMind and OpenAI are trying to solve the problem by using human input to recognise when AI systems complete tasks in the “correct” way, and then reward them for doing so.
“In the long run it would be desirable to make learning a task from human preferences no more difficult than learning it from a programmatic reward signal, ensuring that powerful RL systems can be applied in the service of complex human values rather than low-complexity goals,” reads the report.
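The mechanism behind that quote can be sketched in a few lines. In the paper, human raters compare pairs of short behaviour clips, and a reward model is fitted so that its predicted rewards match those preferences via a Bradley-Terry-style probability. In the minimal sketch below, the tiny linear reward model, the synthetic comparisons and the crude finite-difference training loop are all illustrative assumptions, not the authors’ implementation.

```python
import math

def predicted_reward(weights, features):
    # Tiny linear reward model over hand-picked state features (an assumption;
    # the paper uses a deep network over raw observations).
    return sum(w * f for w, f in zip(weights, features))

def preference_prob(weights, clip_a, clip_b):
    # Bradley-Terry-style model: the probability a human prefers clip A is a
    # softmax over the two clips' summed predicted rewards.
    r_a = sum(predicted_reward(weights, s) for s in clip_a)
    r_b = sum(predicted_reward(weights, s) for s in clip_b)
    return math.exp(r_a) / (math.exp(r_a) + math.exp(r_b))

def loss(weights, comparisons):
    # Cross-entropy against the human labels (True = clip A preferred).
    total = 0.0
    for clip_a, clip_b, prefers_a in comparisons:
        p = preference_prob(weights, clip_a, clip_b)
        total += -math.log(p if prefers_a else 1.0 - p)
    return total

# Synthetic comparisons (invented data): each clip is a list of states, each
# state a feature pair (course_progress, pickups). The "human" here prefers
# clips with more actual course progress.
comparisons = [
    ([(1.0, 0.0)], [(0.0, 1.0)], True),
    ([(0.8, 0.2)], [(0.1, 0.9)], True),
    ([(0.0, 1.0)], [(0.9, 0.0)], False),
]

# Crude finite-difference gradient descent, just to show the reward model
# learning to pay for the behaviour the human preferred.
weights = [0.0, 0.0]
for _ in range(200):
    base = loss(weights, comparisons)
    grad = []
    for i in range(len(weights)):
        bumped = list(weights)
        bumped[i] += 1e-4
        grad.append((loss(bumped, comparisons) - base) / 1e-4)
    weights = [w - 0.5 * g for w, g in zip(weights, grad)]

print(weights)  # progress weight ends up positive, pick-up weight negative
```

Once fitted, the learned reward replaces the programmatic one inside the usual reinforcement-learning loop, so the agent ends up optimising for what the human raters actually preferred.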
Unfortunately, this human-feedback approach is still too time-consuming to be practical, but it offers a glimpse of how increasingly advanced machines and robots could be kept under control in the future.