Facial recognition systems trained on millions of photos of people without their consent

'This is the dirty little secret of AI training sets,' a legal expert warns

Anthony Cuthbertson
Wednesday 13 March 2019 14:36 GMT
Artificial intelligence algorithms have typically struggled to identify women and people with darker skin through facial recognition (Getty/iStock)


Facial recognition algorithms are being trained using photos of people who have not given their consent, legal experts have warned.

Companies like IBM are scraping millions of publicly available images from Flickr and other sites in order to improve the technology, though the people in the photos have no idea this is happening.

Civil rights activists warn that this technology could one day be used to track and spy on the same people whose faces have been used to train it.

"This is the dirty little secret of AI training sets. Researchers often just grab whatever images are available in the wild," NYU School of Law professor Jason Schultz told NBC, who first reported on the issue.

Yahoo's YFCC100M dataset makes around 100 million Creative Commons-licensed images available for artificial intelligence researchers to draw on when training facial recognition systems.

IBM used around one million images from the dataset in its 'Diversity in Faces' research, which aimed to address AI's historical difficulty in identifying women and people with darker skin.

"We are harnessing the power of science to create AI systems that are more fair and accurate," IBM researcher John Smith wrote in a blog that detailed the research.

"The AI systems learn what they're taught, and if they are not taught with robust and diverse datasets, accuracy and fairness could be at risk. For that reason, IBM, along with AI developers and the research community, need to be thoughtful about what data we use for training."

The researcher claims publicly available images are the best way of ensuring training data is large and diverse enough to reflect the distribution of face types around the world.
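The underlying claim is straightforward to demonstrate. Below is a minimal, hypothetical sketch (not IBM's methodology or code) using synthetic "face embeddings" to show how a recogniser fitted on data skewed towards one group can score well on that group while performing little better than chance on an under-represented one:

```python
# Hypothetical illustration only: why skewed training data produces
# uneven accuracy across demographic groups. All data here is synthetic.
import numpy as np

rng = np.random.default_rng(0)

def make_faces(n, centre):
    """Generate synthetic 2-D 'face embeddings': two classes clustered
    around a group-specific centre, offset by 1.5 per class label."""
    labels = rng.integers(0, 2, n)
    points = centre + labels[:, None] * 1.5 + rng.normal(0.0, 1.0, (n, 2))
    return points, labels

# Group A dominates the training set; group B is under-represented.
Xa, ya = make_faces(1000, centre=np.array([0.0, 0.0]))
Xb, yb = make_faces(50, centre=np.array([5.0, 5.0]))
X_train = np.vstack([Xa, Xb])
y_train = np.concatenate([ya, yb])

# A toy nearest-centroid 'recogniser' fitted on the combined, skewed data:
# the class centroids are pulled almost entirely towards group A.
centroids = np.array([X_train[y_train == c].mean(axis=0) for c in (0, 1)])

def predict(X):
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

# Evaluate on balanced held-out sets, one per group.
for name, centre in [("group A", np.array([0.0, 0.0])),
                     ("group B", np.array([5.0, 5.0]))]:
    X_test, y_test = make_faces(500, centre)
    accuracy = (predict(X_test) == y_test).mean()
    print(f"{name}: accuracy = {accuracy:.2f}")
```

In this toy setup, the model scores well on the dominant group but near chance on the under-represented one, and an overall average would hide the gap because group A dominates it. That is precisely the failure mode a larger, more representative dataset is meant to correct.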

People who have since discovered that their pictures are in the dataset used by IBM have taken to Twitter to question the ethics of using such images.

"IBM is using 14 of my photos," said Flickr co-founder Caterina Fake. "IBM says people can opt out, but is making it impossible to do so."

The Independent has reached out to IBM for comment.
