Data gathered from Facebook users likely spread to other databases and dark web, say experts

'With a relatively small amount of data points, you can infer an incredible amount of very personal information about people'

Drew Harwell, Elizabeth Dwoskin
Friday 23 March 2018 17:25 GMT

The data on millions of Facebook users that a firm wrongfully swiped from the social network has likely spread to other groups, databases and the dark web, experts said, making the company’s pledge to safeguard its users’ privacy hard to enforce.

Chief executive Mark Zuckerberg said earlier this week that Facebook would notify users whose data might have been taken by Cambridge Analytica, a political marketing firm that worked for the Trump campaign. Cambridge Analytica obtained the data of an estimated 50 million users in 2014 and 2015 under false pretences, breaking Facebook’s rules. Mr Zuckerberg said that the world’s biggest social network has taken steps to ensure data on millions of its users does not get into the wrong hands.

But Paul-Olivier Dehaye, a privacy expert and co-founder of PersonalData.IO, said he suspects the data has already proliferated far beyond Cambridge Analytica’s reach. “It is the whole nature of this ecosystem,” Mr Dehaye said. “This data travels. And once it has spread there is no way to get it back.”

Mr Zuckerberg added that Facebook will investigate and audit thousands of third-party developers. Third-party apps could access data on Facebook users and their friends until 2015, when Facebook changed its rules. Experts question whether the network’s push to investigate and audit those developers will yield any real results. Mr Dehaye questioned how Facebook would define which apps merit investigation and what would constitute “suspicious activity”.

Facebook said that it conducts manual and automated checks to verify that developers comply with its policies. It also plans to expand its bug bounty programme to cover reports of data misuse.

Mr Zuckerberg said in interviews that the company is investigating reports that independent researchers and dark web data brokers are trading user data grabbed by the firm Cambridge Analytica.

Frank Pasquale, a University of Maryland professor who specialises in algorithms and tech ethics, called this “the runaway data problem” and said there is no way to return the genie to the bottle when it comes to securing data that’s already been released. Location and demographic information, like the data taken from Facebook, can often be used to tie someone to other data points where the identity was previously unclear.

“The larger data sets you get about individuals, the easier it is to use those to reidentify them in data sets where they think they’re anonymous,” Professor Pasquale said. “With a relatively small amount of data points, you can infer an incredible amount of very personal information about people.”

Facebook does not know whether other companies have shared or mishandled user data and a forensic audit is ongoing, Mr Zuckerberg told Wired magazine. Asked by Wired how confident he was that Facebook data had not gotten into the hands of Russian operatives or other groups, Mr Zuckerberg said, “I can’t really say that. I hope that we will know that more certainly after we do an audit.”

For many of Facebook’s prime growth years, the company gave outside developers access to virtually everything that a user who authorised an app, or her friends, had posted on the social network: her home town, current city, events and location check-ins; her interests, groups and all the pages she’d liked; her relationship statuses with romantic partners, friends and family; her birthday, activities, work history and political and religious affiliations; and her photos, notes and videos.

Facebook changed its rules in 2015 amid concerns over how the data was being used. But for years, other developers had the power to construct the same kinds of massive micro-targeted databases that had helped make Facebook so prominent. It’s unclear how many other services used that power or what they have done with the data pulled.

Mr Zuckerberg said the company will “investigate all apps that had access to large amounts of information” before the rule change, a number he said would likely be in the thousands. The company, he added, “will conduct a full audit of any app with suspicious activity” and said the company would likely need to hire more workers to complete the audits. “We want to make sure that there aren’t other Cambridge Analyticas out there,” he told Wired.

The data shared with Cambridge Analytica was taken via a personality quiz, called ThisIsYourDigitalLife, that was initially approved by Facebook for research purposes.

It’s unclear how Facebook would know how to find or recover users’ data. The data taken by the researcher Dr Aleksandr Kogan, who provided it to Cambridge Analytica, “wasn’t watermarked in any way,” Mr Zuckerberg told Wired. “And if he passed along data to Cambridge Analytica that was some kind of derivative data based on personality scores or something, we wouldn’t have known that or ever seen that data.”

In the same year that Facebook severed ties with him, Dr Kogan also started his own San Francisco-based survey data firm, Philometrics, raising questions about whether he took the Facebook data with him and used it for commercial purposes. (Dr Kogan did not reply to repeated requests for comment.)

Apps and start-ups that grabbed user data over a number of years, Mr Dehaye said, often hand over their data if they’re acquired by another company or sell their data sets if they close or liquidate.

Facebook opened the door to developers in 2007 in hopes of expanding Facebook’s reach across the web by making it easier for other sites to connect with the sprawling connective maps Facebook uses to link people by relationships and tastes, known as its “social graph” and “interest graph”.

Marketing firms have spent tens of millions of dollars to learn similar information – including compiling consumer surveys and purchasing massive consumer files from data brokers such as Experian and Acxiom – all of which came from different sources and had varying ages, precision and usefulness. Facebook’s wealth of data, on the other hand, was packed with detailed information volunteered by users themselves and offered completely free until the rule change took effect in 2015.

Facebook, Mr Zuckerberg said, will now restrict the data that third-party developers can access to names, profile photos and email addresses and will require developers to sign a contract before being allowed to ask users for rights to their posts.

Facebook said it will ban developers who misuse its data.

The sheer size of the data pulled from Facebook, experts say, is powerful on its own – and could prove valuable for marketers, political campaigns or other groups seeking to target users en masse.

“Getting good data on 50 million people from a relatively neutral, nonpartisan source that is diversely spread and not just clustered in one tiny segment of the social graph – that’s a big deal,” said Matthew Hindman, a George Washington University associate professor who researches online campaigning and internet politics. “If you can see that many people’s activity on Facebook, you can guess pretty accurately what their partisanship might be, no matter how good your model is.”

The Washington Post
