THE INTERNET IS WATCHING YOU
Test Center | 29 November 2010
Everything you do online is being used to track you and guess what you’ll want to do next. Should you be scared yet?
By DOMINIK HOFERER and ANDREAS HENTSCHEL
The owner of the corner shop has known you for a long time. He knows what you eat, that you like to drink Italian wine, and that you usually watch action movies on Sundays. That’s how he can offer you things that you need, like a new crime thriller, the perfect bottle for your next party, and reserved bags of your favorite snacks when you forget to order them.
What sounds like a pleasant community store from the past occurs every day on the Internet. Our “corner shop owner” is not behind the counter, but instead runs a successful online business that offers exactly what his customers need. He might have had to know you personally 30 years ago, but today the business’s computers simply have to analyze your online visiting habits.
Now imagine that man in the store is following you around, reminding you to buy a gift for the party you said you’d be attending on Facebook last week. Or imagine that he’s seen your status updates about starting a diet, and starts telling you about the store’s low-fat foods section. This is pretty much what’s happening online these days. CHIP shows how online shops today use advanced Deep Packet Inspection to screen customers so that they can offer exactly what the customer wants. We also give you the lowdown on how behavior-based advertisements work with behavioral targeting.
Online shops collect data en masse
Online shops such as Amazon swear by one rule: get to know everything about your customers. The more information a shop has, the more specific its user profiles will be, and the more effective its advertisements. Thus, products that one has viewed on Amazon influence the display of others. For instance, if someone buys a Wii game console, he will be offered accessories for it in the future.
Many find this invasive and Amazon has had to face criticism from individuals, activists, and even the media. German TV host Günther Jauch famously called out the store after he once received a package with something he called “erotic”, which had not been meant for him. Since then he has constantly received pornographic recommendations. Though Jauch’s surprise has given rise to plenty of jokes about his supposed gifting ideas, it also exposes the weaknesses of this system. Amazon does not know that the erotic product was not supposed to match with Jauch’s profile.
Amazon also often displays products that are not of interest to the customer, a waste of advertising space. In one such example, Amazon displayed two different types of refill packs for a coffee machine it was selling, in the advertising module titled “Customers who bought this product also bought…” The packs did not work with this machine at all! This is annoying for those who see an opportunity and quickly buy what looks like a good product, assuming it matches.
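To see how such a mismatch can arise, consider a deliberately naive sketch of an “also bought” module. It is not Amazon’s actual method, which is not public; the order data and the function name `also_bought` are invented for illustration. The code simply counts which products share a basket with the one being viewed, so it has no idea whether a refill pack actually fits the machine.

```python
from collections import Counter

def also_bought(orders, product, top_n=2):
    """Return the products most often found in the same basket as `product`.

    Naive co-occurrence counting: compatibility between products is never checked.
    """
    counts = Counter()
    for basket in orders:
        if product in basket:
            # Count every other distinct item in this basket once.
            counts.update(item for item in set(basket) if item != product)
    # Highest count first; ties broken alphabetically so the result is stable.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:top_n]]

# Hypothetical order history: each inner list is one customer's basket.
orders = [
    ["coffee machine", "refill pack A"],
    ["coffee machine", "refill pack A", "refill pack B"],
    ["coffee machine", "refill pack B"],
    ["mug", "tea"],
]

print(also_bought(orders, "coffee machine"))
```

Both refill packs end up recommended purely because other customers bought them together with the machine, whether or not that purchase was a mistake.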
Analyzing surfing habits
Behavioral Targeting techniques are an evolution of this idea, which many marketing professionals consider a wonder weapon. Behavior-based advertisement displays take into account where the user comes from, which websites he has visited previously, and what he has clicked on.
For a long time, Google’s AdWords service has been displaying advertisements after detecting keywords on a web page. However, since March 2009, the search giant has also been offering behavioral targeting and can display specific advertisements to groups of people. For instance, if a user has been browsing through a sportswear website for a football shirt in August, he might be shown ads for another website with Christmas offers on similar products in December. Google itself describes its technique as using cookies which save tracking information on users’ computers.
The possibilities available to a shop through behavior-based display are as endless as the creativity of search and marketing providers. If a customer only clicks on special offers, the online shop can even discourage him by directing him to a slow server in the future and spoiling the fun of bargain shopping. In addition, the dealer puts the customer at a disadvantage by not displaying advertisements related to special offers. These will be shown only to customers they want to reward!
Online shops also apply marketing tips from the real world. For instance, if a retailer wants to attract only well-to-do customers, leaflets with attractive offers are only put in mailboxes in upmarket areas with well-situated residents. Similarly, one can use geolocation information to analyze the place of origin of a surfer and recommend specific offers to him or her. The coordinates obtained through IP address identification on the Internet are not very fine-grained, but modern cellphones and certain desktop browsers now supply precise GPS locations, which can even be used to guess the financial behavior of any surfer.
In-depth analysis divulges too much information
Deep Packet Inspection, or DPI for short, is a technological continuation of this personalized advertisement strategy. While theoretically a surfer can avoid behavioral targeting by not allowing any cookies, DPI traces a user’s activities on the Internet as if he or she is under surveillance. In theory, every website that is called up can be recorded and every mail can be scanned in real time. With the help of keywords found in these, an individual profile can be created through which advertisers can send users specific offers. For instance, if an advertiser detects a number of messages to a car dealer from a customer inquiring about certain accessories, advertisements for those very products can be inserted in advertisement spaces as he or she browses the Web. However, online shops cannot use DPI by themselves; they need Internet service providers to offer it, and the providers seem to be cautious of violating user privacy agreements. Governments will soon be forced to formulate policies to regulate this practice.
The technology is not new. It is already being used for things like filtering viruses and spam. ISPs normally look at the IP headers of data packets in transit (in which the sender and the recipient IP addresses are mentioned), which means they can easily use the same techniques to search through an entire packet. This way the provider gets an insight into the actual data that is being sent and received.
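The difference between looking at the header and looking at the whole packet can be sketched in a few lines. This is a simplified illustration, not a real DPI system: the packet is a hand-built IPv4 packet without options, the payload is assumed to follow the 20-byte header directly (a real packet would carry a TCP header in between), and the function names, addresses and keywords are invented for the example.

```python
import socket
import struct

def build_packet(src: str, dst: str, payload: bytes) -> bytes:
    """Assemble a simplified IPv4 packet: 20-byte header (checksum left at 0) plus payload."""
    version_ihl = (4 << 4) | 5  # IP version 4, header length 5 x 4 = 20 bytes
    header = struct.pack(
        "!BBHHHBBH4s4s",
        version_ihl, 0, 20 + len(payload),  # version/IHL, type of service, total length
        0, 0,                               # identification, flags/fragment offset
        64, 6, 0,                           # TTL, protocol (6 = TCP), checksum
        socket.inet_aton(src), socket.inet_aton(dst),
    )
    return header + payload

def inspect_header(packet: bytes) -> tuple[str, str]:
    """Shallow inspection: read only the sender and recipient addresses."""
    src, dst = struct.unpack("!4s4s", packet[12:20])
    return socket.inet_ntoa(src), socket.inet_ntoa(dst)

def inspect_payload(packet: bytes, keywords: list[bytes]) -> list[bytes]:
    """Deep inspection: skip the header and search the transported data itself."""
    header_len = (packet[0] & 0x0F) * 4
    payload = packet[header_len:].lower()
    return [word for word in keywords if word in payload]

# Hypothetical mail fragment sent between two documentation-range addresses.
packet = build_packet("192.0.2.10", "198.51.100.7",
                      b"Subject: price for roof rack and tow bar")
print(inspect_header(packet))
print(inspect_payload(packet, [b"roof rack", b"insurance"]))
```

The first function is all a provider needs to deliver the packet; the second reads the same bytes a little further and already knows what the customer is asking the car dealer about.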
DPI can be misused, but there are no cases that could be cause for any alarm at present. It is possible for providers to analyze data traffic, and manipulate it as well, just as cybercriminals do when they attempt to send malicious code to a victim.
Advertisers do not always need ingenious techniques to get information about surfers from the Internet. Users voluntarily give away plenty of information too. For instance, Amazon users can create wishlists in which they save products they desire but do not own yet. Friends can have a look, to order the products and send them to the creator of the list as gifts. What many do not know is that if one isn’t careful with his or her Amazon settings, the wishlists become public and the whole world can access them through search engines.
Web 2.0 follows specific identities
Social networks are also ideal data sources for marketing professionals. Data collectors have been known to make the most of Facebook with its open API. One can program applications that convince users to grant them access to personal information, including details about their other friends. Other less ethical means include persuading people to add a fake profile as a “friend”, thereby granting it access to more of your user profile, which most people leave totally visible to their friends. Through the Facebook API, programmers can access information about members, including details such as their employers, religious affiliations, and sexual orientation. According to the Facebook developer Wiki, applications can access over 50 sets of user information—which is interesting for marketers and hackers alike.
Students at MIT in the US succeeded in programming a “radar” system for Facebook, which can analyze the information stored in a user’s friends’ profiles to draw conclusions about that person, even if his own settings made all information private. This should be a warning sign for users of a community not to publish too much of their real lives online. Most importantly, people need to be cautious about the kinds of applications, games and quizzes they click on, since doing so grants all of them access to one’s personal information.
While Facebook is a superb example, all of this also applies to other services that identify individuals, such as OpenID and Google Accounts. These let users log into dozens of websites with a single username and password. For example, with a valid Facebook account, members can use the Facebook Connect system to log in to the video sharing portal Vimeo, which also lets you publish your “likes” on your wall. This is easy for users and, provided companies behave ethically, opens up new ways for them to court customers. Online shops are experimenting with ways to display products that friends have bought or looked at often (although this famously spoiled many people’s Christmas shopping surprises when Facebook demonstrated the capability with its highly criticized and short-lived Beacon advertising program in late 2007).
Another example is a promotional online trailer for the videogame Prototype, which came out in mid 2009. Those who used Facebook Connect suddenly found themselves becoming part of the trailer! It accessed users’ names, photos and professional backgrounds through their Facebook profiles and integrated this information into scenes in the trailer.
Users become advertising figures
People are more receptive to recommendations from friends than from strangers, so companies try reaching customers personally by creating so-called fansites. Any user can, for instance, become a fan of products, people, companies, and even designs. With Facebook’s Open Graph tool, companies even have the opportunity to put advertisements on external websites to receive testimonials from members of the fansite, and gain advertising exposure through the profile picture.
People who recommend products of their own accord are particularly of interest to online shops. Economists have conducted research on filtering out these opinion makers in the populations of online networks through community analysis. There are plenty of scenarios for such identification services to thrive in, when advertisers start linking information from the digital world with the real.
Data and the Google juggernaut
Of course no discussion of privacy online is complete without analyzing Google’s data-mining habits. The search giant is in a position to use its multiple online properties to gather amazing amounts of information, and possibly even link these profiles to individuals in the real world. The company’s motto has long been “Don’t be evil”, but it’s difficult to ascertain what exactly the company considers to be within this limit and what is too much. Incidents of anti-Google dissent are growing more common, from strangers being able to follow you on Google Wave, to protests in the publishing industry against the mass digitization of books, to rumblings of antitrust cases because of the company’s dominance in online advertising. Jeff Jarvis, blogger and author of the book “What Would Google Do?”, sharply criticizes the company for a product called Sidewiki, which collects user comments about websites and saves them on Google servers. The site operators themselves, and the furious Jeff Jarvis, have no control over it. Google copies entire libraries, and has detailed photographs of the entire planet, covering all countries and cities, many streets and houses, the oceans, the Moon and Mars. Google offers an operating system for mobile phones, and soon there will also be one for netbooks. Google says: “It is our mission to organize the information of the world and to make it accessible and usable worldwide.”
The information of the world also includes health data. Google has for example invested in the start-up 23andMe, run by Sergey Brin's wife Anne Wojcicki, which offers genetic analysis for anyone. Will we one day be able to run a search to find out which illnesses we are predisposed to?
One can also see it as part of a strategy to be omnipresent on the Web. In order to submit and read comments in Sidewiki, the Google toolbar must be installed. This piece of software doesn't have a very good reputation, and continuously provides Google with user information, linking information on sites you surf to Google services, such as addresses in Google Maps. On the sidelines of the 2009 Frankfurt Book Fair, Google announced its entry into the digital book business. Google soon intends to digitize every book in the world!
Even the Chrome browser doesn’t have a clean record when it comes to privacy—it identifies each user with a unique ID. As of version 4.1, the ID is purged when a user first downloads an update, but it should not be there at all.
Brilliant ideas underlie most Google services. They are easy to use, technically solid, and best of all, they’re nearly all free. Google does a lot of good as a company by investing in alternative energy production and giving employees an allowance if they buy a hybrid car. But Google is also greedy for data. It commands the largest Web index available, and has insight into every website, photo and video.
“We are building a mirror world,” said Marissa Mayer, head of Google Search, a few years ago at the Digital Life Design Conference in Munich. The company is setting up a digital copy of our world. It records down to the last detail, how we move in it.
Google tracks 80 percent of all websites
One can hardly elude Google today. It does not help if you stop using Google Search, YouTube, Picasa or even the services requiring registration like Gmail, Docs and Calendar. With its astoundingly wide network, Google is present on 80 percent of all websites, often invisibly to lay persons. After its acquisition of advertising network DoubleClick, around half of all ad banners on the Web originate from Google servers. The less conspicuous but even more widespread text ads come from Google AdWords as well. The Google Analytics service works completely out of sight, allowing website operators to analyze the click-paths of their visitors. Whenever a surfer lands on a site that uses this service, Google sets a cookie with a unique ID and records his or her IP address. Thanks to its extremely dense network, Google can then see exactly who moves how on the Internet. Every click or search query generates a log entry with an IP address and unique cookie ID as well as a time stamp. By mid-2008, the YouTube log file alone was over 12 terabytes in size.
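A rough sketch shows why log entries of this kind are so revealing. The three fields named above (IP address, cookie ID, time stamp) plus the page visited are enough to rebuild a visitor’s click-path across unrelated websites. The record layout, the function name `click_paths` and all sample values are assumptions made for illustration and say nothing about how Google actually stores its logs.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class LogEntry:
    timestamp: str  # ISO 8601 text, so sorting as text is sorting by time
    ip: str
    cookie_id: str
    url: str

def click_paths(entries):
    """Group log entries by cookie ID and order each group by time stamp."""
    paths = defaultdict(list)
    for entry in sorted(entries, key=lambda e: e.timestamp):
        paths[entry.cookie_id].append(entry.url)
    return dict(paths)

# Hypothetical entries collected from three unrelated sites that embed the same tracker.
entries = [
    LogEntry("2010-11-01T10:00:00", "203.0.113.5", "abc", "shop.example/shirts"),
    LogEntry("2010-11-01T09:00:00", "203.0.113.5", "abc", "news.example/sports"),
    LogEntry("2010-11-01T09:30:00", "198.51.100.9", "xyz", "video.example/clip"),
]

for cookie_id, path in click_paths(entries).items():
    print(cookie_id, path)
```

No name appears anywhere in the data, yet the cookie ID ties the sports news and the shirt shop to the same browser.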
For database security as well as privacy concerns, the different databases for each Google service are not necessarily tied to each other, but it is technically possible and Google certainly has the know-how. Even when there are no actual names, the records have enough parts to piece together a picture of the person who is sitting at a PC, where he lives, what interests he has, and how much money he spends. Google justifies its passion for collection by being able to improve its services with the data. Only in this way can it know how to show personalized relevant search results, or ads that users are more likely to click on.
How much is too much?
One can pick up interesting tidbits from the official company blogs, such as the fact that some employees are excited about the idea of building a 3D model of every building ever built on the planet. Google Building Maker already makes the required tools available. On an academic level, most of those working at high levels in the company are IT pros, mathematicians and statisticians, most of them top graduates of prestigious universities. For them the masses of data collected are like toys with which they can run riot. They work on them as if possessed, to write algorithms which recognize patterns and structures in what seems like random chaos. There are no limits. Suggestions for projects which might seem outlandish are particularly welcome at Google. Lars Reppesgaard quotes a Google software engineer in his book The Google Empire: “One day, someone suggests some wild endeavor for which he needs a few thousand computers, and you say ‘OK, you’ve got it’.” Usually it takes new employees a couple of months to get so far, but the moment can come anytime.
Deep Packet Inspection
China uses it for Internet monitoring, as do Tunisia and Iran. Deep Packet Inspection (DPI) has become an explosive topic since it started being used not only for Internet security, but also for on-the-fly manipulation of websites, be it to silence political dissidents or display personalized advertisements.
Personal privacy becomes a concern when companies start matching individuals to the profiles they generate online. Serving advertisements by harnessing this knowledge is a questionable practice in terms of data protection regulation, and most countries’ legal systems see this as a gray area. However, certain cases have come up in courts of law. The European Commission has initiated proceedings against Great Britain since it failed to prevent British Telecom from violating its user privacy guidelines. The fact that British Telecom managed to display advertisements through Phorm without users’ consent implies that the UK’s own laws do not have this kind of protection mechanism in place. Most countries’ laws have only limited control over what people do with the data floating out there, since they can hardly keep pace with the development of new technology.
However, at least some countries, for example Germany, are becoming aware of the problem and have begun to enact laws precluding general-purpose monitoring of citizens through DPI.
What it means for users
Privacy sometimes takes a backseat when it could slow down innovative thinking. In the midst of protests about Google parsing its users’ email to show related ads, founders Larry Page and Sergey Brin answered: “That is automated. No one watches, so we don’t believe that personal privacy is affected.”
Data that doesn’t include specific private information can still be enough to personally identify you. One does not need to read crude conspiracy theories to imagine how interesting such data could be for the world’s governments, which are already overzealous about protecting their national security. Some agencies already monitor the eating preferences of airline passengers to filter cultural influences. What if new laws compel Amazon and Google to disclose their log files to prosecutors and intelligence agencies? Each innocent-looking mouse click would gain even greater importance, way beyond individual privacy concerns.