Interesting People mailing list archives

'Data is a fingerprint': why you aren't as anonymous as you think online


From: "Dave Farber" <farber () gmail com>
Date: Sun, 15 Jul 2018 06:44:21 +0900




Begin forwarded message:

From: Dewayne Hendricks <dewayne () warpspeed com>
Date: July 15, 2018 at 6:27:54 AM GMT+9
To: Multiple recipients of Dewayne-Net <dewayne-net () warpspeed com>
Subject: [Dewayne-Net] 'Data is a fingerprint': why you aren't as anonymous as you think online
Reply-To: dewayne-net () warpspeed com

'Data is a fingerprint': why you aren't as anonymous as you think online
So-called ‘anonymous’ data can be easily used to identify everything from our medical records to purchase histories
By Olivia Solon
Jul 13 2018
<https://www.theguardian.com/world/2018/jul/13/anonymous-browsing-data-medical-records-identity-privacy>

In August 2016, the Australian government released an “anonymised” data set comprising the medical billing records, 
including every prescription and surgery, of 2.9 million people.

Names and other identifying features were removed from the records in an effort to protect individuals’ privacy, but 
a research team from the University of Melbourne soon discovered that it was simple to re-identify people, and learn 
about their entire medical history without their consent, by comparing the dataset to other publicly available 
information, such as reports of celebrities having babies or athletes having surgeries.

The government pulled the data from its website, but not before it had been downloaded 1,500 times.

This privacy nightmare is one of many examples of seemingly innocuous, “de-identified” pieces of information being 
reverse-engineered to expose people’s identities. And it’s only getting worse as people spend more of their lives 
online, sprinkling digital breadcrumbs that can be traced back to them to violate their privacy in ways they never 
expected.

Nameless New York taxi logs were compared with paparazzi shots at locations around the city to reveal that Bradley 
Cooper and Jessica Alba were bad tippers. In 2017, German researchers were able to identify people based on their
“anonymous” web browsing patterns. This week, University College London researchers showed how they could identify an
individual Twitter user based on the metadata associated with their tweets, while the fitness tracking app Polar 
revealed the homes and in some cases names of soldiers and spies.

“It’s convenient to pretend it’s hard to re-identify people, but it’s easy. The kinds of things we did are the kinds 
of things that any first-year data science student could do,” said Vanessa Teague, one of the University of Melbourne 
researchers who revealed the flaws in the open health data.

One of the earliest examples of this type of privacy violation occurred in 1996 when the Massachusetts Group 
Insurance Commission released “anonymised” data showing the hospital visits of state employees. As with the 
Australian data, the state removed obvious identifiers like name, address and social security number. Then the 
governor, William Weld, assured the public that patients’ privacy was protected.

Latanya Sweeney, a computer science graduate student who later became the chief technology officer at the Federal Trade
Commission, showed how wrong Weld was by finding his medical records in the data set. Sweeney used Weld’s zip code 
and birth date, taken from voter rolls, and the knowledge that he had visited the hospital on a particular day after 
collapsing during a public ceremony, to track him down. She sent his medical records to his office.

In later work, Sweeney showed that 87% of the population of the United States could be uniquely identified by their 
date of birth, gender and five-digit zip code.
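The attack on the Weld records works like a database join on fields that both tables share (often called
quasi-identifiers: zip code, birth date, sex). Below is a minimal Python sketch, assuming two small in-memory tables of
invented toy records; the names `unique_fraction` and `link` and all data values are hypothetical, and this is not
Sweeney's actual method or data. It measures what share of records are unique on those fields and then re-attaches
names wherever a combination is unique in both tables.

```python
from collections import Counter

# Fields shared by the "anonymised" table and a public, named table.
QUASI_IDS = ("zip", "dob", "sex")

# Hypothetical toy data: every value below is invented for illustration.
health = [  # names removed, quasi-identifiers kept
    {"zip": "02138", "dob": "1950-03-14", "sex": "M", "diagnosis": "heart condition"},
    {"zip": "02138", "dob": "1962-11-02", "sex": "F", "diagnosis": "asthma"},
    {"zip": "02139", "dob": "1971-06-20", "sex": "F", "diagnosis": "fracture"},
    {"zip": "02139", "dob": "1971-06-20", "sex": "F", "diagnosis": "flu"},
]
voters = [  # a public list with names, e.g. a voter roll
    {"name": "Voter A", "zip": "02138", "dob": "1950-03-14", "sex": "M"},
    {"name": "Voter B", "zip": "02138", "dob": "1962-11-02", "sex": "F"},
    {"name": "Voter C", "zip": "02139", "dob": "1971-06-20", "sex": "F"},
]


def key(record):
    """The quasi-identifier tuple used to match records across tables."""
    return tuple(record[field] for field in QUASI_IDS)


def unique_fraction(records):
    """Share of records whose quasi-identifier combination occurs exactly once."""
    counts = Counter(key(r) for r in records)
    return sum(1 for r in records if counts[key(r)] == 1) / len(records)


def link(deidentified, named):
    """Naive linkage attack: name a de-identified record when its quasi-identifiers
    are unique in its own table and match exactly one named person."""
    own_counts = Counter(key(r) for r in deidentified)
    index = {}
    for person in named:
        index.setdefault(key(person), []).append(person["name"])
    matches = []
    for record in deidentified:
        names = index.get(key(record), [])
        if own_counts[key(record)] == 1 and len(names) == 1:
            matches.append((names[0], record["diagnosis"]))
    return matches


print(unique_fraction(health))  # 0.5
print(link(health, voters))     # [('Voter A', 'heart condition'), ('Voter B', 'asthma')]
```

In this toy example the two records sharing a zip code and birth date stay ambiguous, while the other two are
re-identified; the 87% figure above says that, for the US population, unique combinations are the norm rather than
the exception.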

“The point is that data that may look anonymous is not necessarily anonymous,” she said in testimony to a Department 
of Homeland Security privacy committee.

More recently, Yves-Alexandre de Montjoye, a computational privacy researcher, showed that the vast majority of the
population can be identified from the behavioural patterns revealed by location data from mobile phones. Analysing a
mobile phone database of the approximate locations (based on the nearest cell tower) of 1.5 million people over 15
months (with no other identifying information), his team found it was possible to uniquely identify 95% of the people
with just four data points of places and times. About 50% could be identified from just two points.

The four points could come from information that is publicly available, including a person’s home address, work 
address and geo-tagged Twitter posts.
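The idea behind the four-point result can be sketched as follows: take a few (place, time) points from one person's
trace and check whether anyone else's trace also contains all of them. Below is a minimal Python sketch, assuming
traces are stored as sets of (cell tower, hour) pairs; the toy data, the helper names `matching_users` and `unicity`,
and the random-sampling scheme are illustrative assumptions, not the researchers' code or dataset.

```python
import random

# Hypothetical toy traces: user -> set of (cell_tower_id, hour) observations.
# All identifiers and values are invented for illustration.
traces = {
    "u1": {("tower_A", 8), ("tower_B", 12), ("tower_A", 19)},
    "u2": {("tower_A", 8), ("tower_C", 12), ("tower_A", 19)},
    "u3": {("tower_D", 8), ("tower_B", 12), ("tower_D", 19)},
}


def matching_users(traces, known_points):
    """Users whose trace contains every known (place, time) point."""
    return [user for user, trace in traces.items() if known_points <= trace]


def unicity(traces, p, rng):
    """Estimate the share of users singled out by p points drawn at random
    from their own trace; users with fewer than p points are skipped."""
    eligible = [user for user, trace in traces.items() if len(trace) >= p]
    if not eligible:
        return 0.0
    unique = 0
    for user in eligible:
        # sorted() gives a stable sequence for rng.sample to draw from.
        known = set(rng.sample(sorted(traces[user]), p))
        if matching_users(traces, known) == [user]:
            unique += 1
    return unique / len(eligible)


print(matching_users(traces, {("tower_A", 8)}))                   # ['u1', 'u2']
print(matching_users(traces, {("tower_A", 8), ("tower_B", 12)}))  # ['u1']
print(unicity(traces, 2, random.Random(42)))
```

One point here is shared by two users, but adding a second point singles out one of them; on a real dataset the
estimate would be averaged over many random draws.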

[snip]

Dewayne-Net RSS Feed: http://dewaynenet.wordpress.com/feed/
Twitter: https://twitter.com/wa8dzp
