De-anonymizing social networks
, 2009
Cited by 57 (2 self)
Operators of online social networks are increasingly sharing potentially sensitive information about users and their relationships with advertisers, application developers, and data-mining researchers. Privacy is typically protected by anonymization, i.e., removing names, addresses, etc. We present a framework for analyzing privacy and anonymity in social networks and develop a new re-identification algorithm targeting anonymized social-network graphs. To demonstrate its effectiveness on real-world networks, we show that a third of the users who can be verified to have accounts on both Twitter, a popular microblogging service, and Flickr, an online photo-sharing site, can be re-identified in the anonymous Twitter graph with only a 12% error rate. Our de-anonymization algorithm is based purely on the network topology, does not require creation of a large number of dummy “sybil” nodes, is robust to noise and all existing defenses, and works even when the overlap between the target network and the adversary’s auxiliary information is small.
Shmatikov. How To Break Anonymity of the Netflix Prize Dataset. arXiv cs/0610105
, 2006
Cited by 21 (0 self)
largest online movie rental service, publicly released a dataset containing movie ratings of 500,000 Netflix subscribers. The dataset is intended to be anonymous, and all personally identifying information has been removed. We demonstrate that an attacker who knows only a little bit about an individual subscriber can easily identify this subscriber’s record if it is present in the dataset, or, at the very least, identify a small set of records which include the subscriber’s record. This knowledge need not be precise, e.g., the dates may only be known to the attacker with a 14-day error, the ratings may be known only approximately, and some of the ratings may even be completely wrong. Using the Internet Movie Database (IMDb) as our source of auxiliary information, we successfully identified Netflix records of non-anonymous IMDb users, uncovering information, such as their apparent political preferences, that could not be determined from their public IMDb ratings. We also discuss the implications that a successful de-anonymization of the Netflix dataset may have for the Netflix Prize competition.
Robust De-anonymization of Large Datasets (How to Break Anonymity of the Netflix Prize Dataset)
, 2008
Cited by 1 (0 self)
We present a new class of statistical de-anonymization attacks against high-dimensional micro-data, such as individual preferences, recommendations, transaction records and so on. Our techniques are robust to perturbation in the data and tolerate some mistakes in the adversary’s background knowledge. We apply our de-anonymization methodology to the Netflix Prize dataset, which contains anonymous movie ratings of 500,000 subscribers of Netflix, the world’s largest online movie rental service. We demonstrate that an adversary who knows only a little bit about an individual subscriber can easily identify this subscriber’s record in the dataset. Using the Internet Movie Database as the source of background knowledge, we successfully identified the Netflix records of known users, uncovering their apparent political preferences and other potentially sensitive information.

