Yell cries foul over the cyberspace copycats

Clayton Hirst on the growing trend for taking details off commercial websites for anything from marketing to crime

Sunday 06 March 2005 01:00 GMT

In just three weeks some 7.5 million pages of information were downloaded from the Yell.com website to a computer system near Birmingham. At the peak of this extraordinary operation, which took place early last year and was masterminded by only one person, around 33,000 pages of the online directory were copied each hour, through the day and night.

This was not an isolated incident. Over the past year Yell has been targeted in a plethora of attempts to copy all the names and telephone numbers on its database, part of a growing phenomenon known as data scraping.

Using specialist software, unscrupulous individuals are copying vast amounts of data from commercial websites to be used for anything from market research to organised crime.

No one knows exactly how widespread data scraping is, but companies such as Yell have noticed a sharp rise. Over the past 12 months the FTSE 100 company has served at least three High Court injunctions on individuals involved in the practice, and it has recently stepped up monitoring on its database for illegal activity.

"We are very protective of our intellectual property and won't hesitate to take strong action to safeguard it," says a Yell spokesman. "[Data scraping] has the potential to undermine our business and the good relationships with our advertisers."

The reason companies like Yell are so worried is the potential damage that data scraping can do to their brands if the information falls into the wrong hands. "The problem is widespread," says Simon Moores, managing director of Zentelligence, a technology research organisation.

"In the extreme, data scraping could be used by a crime gang to steal information from a site. They could then use the data to spoof the internet domain and then sell advertising."

You don't have to be a tech expert to data scrape. It can be done from a home computer and the necessary software is available on the internet for immediate download for as little as £50.

For more sophisticated data-scraping exercises, there are dozens of companies in the US and the UK that will tailor software for a fee.

Scrape Goat is one of the better-known US firms specialising in this area. It claims it "reserves the right" to refuse work from anyone wanting to use its software in an illegal way. But the company says on its website that "it is up to you to determine the legality of the way you plan to use our scrapers".

The practice isn't outside the law per se. John Lever runs Systems Services, a Hertfordshire company that specialises in data scraping. He says: "I'll use whatever means are required to obtain the data, so long as it is legal." Mr Lever won't, say, gather mass data from sites where collection is specifically prohibited (such as on Yell.com), or use computer programs to bypass password protection. "I wouldn't be able to sleep at night if I was doing this sort of thing."

Mr Lever says he has turned down work that he suspected might have been on the edges of the law. "I once had a request from someone in America who wanted a list of the names of all the prisoners in US penitentiaries. I couldn't do that one."

While data scraping is only now beginning to be recognised as a phenomenon, its roots lie in the late 1990s and the growth of the internet.

During the hi-tech boom years, most dot-com business plans were based on advertisers spending vast amounts of money on websites that offered free content.

This idea has since been discredited. Most internet companies now realise that data is actually their biggest asset and that they must charge for its use.

The problem is, the attitude of many internet users hasn't changed from the boom years.

One data scraper - an IT expert who last year was served with an injunction for copying data from a commercial website - says: "These companies are using bully-boy tactics. You can't fight them unless you have half a million pounds."

The man adds: "What is the point in having all this information available on the web if you can't use it?"

Given trenchant views like this, and the relative ease with which people can download vast amounts of internet data, data scraping appears to be here to stay.
