Results 11 - 20 of 83
A Brief Survey on Anonymization Techniques for Privacy Preserving Publishing of Social Network Data
- In SIGKDD Explorations
Cited by 17 (2 self)
Nowadays, partly driven by many Web 2.0 applications, more and more social network data has been made publicly available and analyzed in one way or another. Privacy preserving publishing of social network data is becoming an increasingly important concern. In this paper, we present a brief yet systematic review of the existing anonymization techniques for privacy preserving publishing of social network data. We identify the new challenges in privacy preserving publishing of social network data compared with the extensively studied relational case, and examine the possible problem formulation in three important dimensions: privacy, background knowledge, and data utility. We survey the anonymization methods for privacy preservation in two categories: clustering-based approaches and graph modification approaches.
“I Know What You Did Last Summer”: Query Logs and User Privacy
Cited by 16 (1 self)
We investigate the subtle cues to user identity that may be exploited in attacks on the privacy of users in web search query logs. We study the application of simple classifiers to map a sequence of queries into the gender, age, and location of the user issuing the queries. We then show how these classifiers may be carefully combined at multiple granularities to map a sequence of queries into a set of candidate users that is 300-600 times smaller than random chance would allow. We show that this approach remains surprisingly accurate even after removing personally identifiable information such as names/numbers or limiting the size of the query log. We also present a new attack in which a real-world acquaintance of a user attempts to identify that user in a large query log, using personal information. We show that combinations of small pieces of information about terms a user would probably search for can be highly effective in identifying the sessions of that user. We conclude that known schemes to release even heavily scrubbed query logs that contain session information have significant privacy risks.
Inferring private information using social network data
, 2008
Cited by 16 (1 self)
Online social networks, such as Facebook, are increasingly utilized by many users. These networks allow people to publish details about themselves and connect to their friends. Some of the information revealed inside these networks is private and it is possible that corporations could use learning algorithms on the released data to predict undisclosed private information. In this paper, we propose an effective, scalable inference attack for released social networking data to infer undisclosed private information about individuals. We then explore the effectiveness of possible sanitization techniques that can be used to combat such an inference attack.
Measurement-calibrated Graph Models for Social Network Experiments
Cited by 15 (1 self)
Access to realistic, complex graph datasets is critical to research on social networking systems and applications. Simulations on graph data provide critical evaluation of new systems and applications ranging from community detection to spam filtering and social web search. Due to the high time and resource costs of gathering real graph datasets through direct measurements, researchers are anonymizing and sharing a small number of valuable datasets with the community. However, performing experiments using shared real datasets faces three key disadvantages: concerns that graphs can be de-anonymized to reveal private information, increasing costs of distributing large datasets, and that a small number of available social graphs limits the statistical confidence in the results. The use of measurement-calibrated graph models is an attractive alternative to sharing datasets. Researchers can “fit” a graph model
privacy in social networks
- ICDE 2008 Poster
Cited by 14 (2 self)
We consider a privacy threat to a social network in which the goal of an attacker is to obtain knowledge of a significant fraction of the links in the network. We formalize the typical social network interface and the information about links that it provides to its users in terms of lookahead. We consider a particular threat where an attacker subverts user accounts to get information about local neighborhoods in the network and pieces them together in order to get a global picture. We analyze, both experimentally and theoretically, the number of user accounts an attacker would need to subvert for a successful attack, as a function of his strategy for choosing users whose accounts to subvert and a function of lookahead provided by the network. We conclude that such an attack is feasible in practice, and thus any social network that wishes to protect the link privacy of its users should take great care in choosing the lookahead of its interface, limiting it to 1 or 2, whenever possible.
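The attack described in this abstract can be illustrated with a minimal Python sketch. It is not the paper's model or experiment: the toy graph, the function names, the reading of lookahead (a subverted account reveals every link incident to a node within `lookahead` hops of it), and the highest-degree-first strategy are all assumptions made for illustration.

```python
from collections import deque

# Hypothetical toy friendship graph (undirected, stored as adjacency sets).
TOY_GRAPH = {
    "a": {"b", "c", "d"},
    "b": {"a", "e"},
    "c": {"a"},
    "d": {"a", "f"},
    "e": {"b"},
    "f": {"d", "g"},
    "g": {"f"},
}


def revealed_edges(graph, account, lookahead):
    """Links visible from one subverted account.

    Assumption: the interface shows every link incident to a node that is
    at most `lookahead` hops away from the account.
    """
    dist = {account: 0}
    queue = deque([account])
    while queue:
        node = queue.popleft()
        if dist[node] == lookahead:
            continue  # do not expand beyond the lookahead radius
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return {frozenset((u, v)) for u in dist for v in graph[u]}


def accounts_needed(graph, lookahead, target_fraction):
    """Subvert accounts in decreasing order of degree until the attacker
    knows at least `target_fraction` of all links; return how many it took."""
    all_edges = {frozenset((u, v)) for u in graph for v in graph[u]}
    known = set()
    order = sorted(graph, key=lambda n: (-len(graph[n]), n))
    for count, account in enumerate(order, start=1):
        known |= revealed_edges(graph, account, lookahead)
        if len(known) >= target_fraction * len(all_edges):
            return count
    return len(order)


if __name__ == "__main__":
    for ell in (0, 1, 2):
        print(ell, accounts_needed(TOY_GRAPH, ell, 1.0))
```

On this toy graph, the number of accounts needed to learn every link falls as the lookahead grows, which is the qualitative effect the abstract reports; the actual numbers for real networks come from the paper's own analysis, not from this sketch.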
Prying Data out of a Social Network
- In: First International Conference on Advances in Social Networks Analysis and Mining (2009)
Cited by 12 (7 self)
Preventing adversaries from compiling significant amounts of user data is a major challenge for social network operators. We examine the difficulty of collecting profile and graph information from the popular social networking website Facebook and report two major findings. First, we describe several novel ways in which data can be extracted by third parties. Second, we demonstrate the efficiency of these methods on crawled data. Our findings highlight how the current protection of personal data is inconsistent with users’ expectations of privacy. Keywords: social networks; web crawling; privacy.
K-Automorphism: A general framework for privacy preserving network publication
- In VLDB
, 2009
Cited by 12 (1 self)
The growing popularity of social networks has generated interesting data management and data mining problems. An important concern in the release of these data for study is their privacy, since social networks usually contain personal information. Simply removing all identifiable personal information (such as names and social security numbers) before releasing the data is insufficient. It is easy for an attacker to identify the target by performing different structural queries. In this paper we propose k-automorphism to protect against multiple structural attacks and develop an algorithm (called KM) that ensures k-automorphism. We also discuss an extension of KM to handle “dynamic” releases of the data. Extensive experiments show that the algorithm performs well in terms of the protection it provides.
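To make the notion of a structural query concrete, here is a minimal Python sketch of the simplest one, a degree query against a graph whose names have been replaced by numeric ids. It does not implement k-automorphism or the KM algorithm; the toy graphs and function names are hypothetical, and the degree-based check shown is a much weaker condition than the one the paper proposes.

```python
from collections import Counter

# Hypothetical "naively anonymized" graph: names replaced by numeric ids.
NAIVE_GRAPH = {
    1: {2, 3, 4},
    2: {1, 3},
    3: {1, 2},
    4: {1, 5},
    5: {4},
}

# Hypothetical 4-cycle: every node looks structurally the same by degree.
CYCLE_GRAPH = {
    1: {2, 4},
    2: {1, 3},
    3: {2, 4},
    4: {3, 1},
}


def degree_candidates(graph, known_degree):
    """Nodes matching an attacker's background knowledge of the target's degree."""
    return sorted(n for n, nbrs in graph.items() if len(nbrs) == known_degree)


def degree_anonymity_level(graph):
    """Size of the smallest group of nodes sharing a degree.

    A value of 1 means some node is uniquely identified by the degree query
    alone. This covers one structural query only, not the multiple
    structural attacks that k-automorphism is meant to resist.
    """
    counts = Counter(len(nbrs) for nbrs in graph.values())
    return min(counts.values())


if __name__ == "__main__":
    print(degree_candidates(NAIVE_GRAPH, 3))
    print(degree_anonymity_level(NAIVE_GRAPH))
    print(degree_anonymity_level(CYCLE_GRAPH))
```

In the first toy graph, an attacker who knows the target has three friends narrows the candidates to a single node even though no names were published, which is why removing identifiers alone is insufficient.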
xBook: Redesigning Privacy Control in Social Networking Platforms
Cited by 10 (0 self)
Social networking websites have recently evolved from being service providers to platforms for running third party applications. Users have typically trusted the social networking sites with personal data, and assume that their privacy preferences are correctly enforced. However, they are now being asked to trust each third-party application they use in a similar manner. This has left the users’ private information vulnerable to accidental or malicious leaks by these applications. In this work, we present a novel framework for building privacy-preserving social networking applications that retains the functionality offered by the current social networks. We use information flow models to control what untrusted applications can do with the information they receive. We show the viability of our design by means of a platform prototype. The usability of the platform is further evaluated by developing sample applications using the platform APIs. We also discuss both security and non-security challenges in designing and implementing such a framework.
Managing uncertainty in social networks
- IEEE DATA ENGINEERING BULLETIN
, 2007
Cited by 9 (0 self)
Social network analysis (SNA) has become a mature scientific field over the last 50 years and is now an area with massive commercial appeal and renewed research interest. In this paper, we argue that new methods for collecting social network structure, and the shift in scale of these networks, introduce a greater degree of imprecision that requires rethinking how SNA techniques can be applied. We discuss a new area in data management, probabilistic databases, whose main research goal is to provide tools to manage and manipulate imprecise or uncertain data. We outline the application building blocks necessary to build a large scale social networking application and the extent to which current research in probabilistic databases addresses these challenges.
An ad omnia approach to defining and achieving private data analysis
- In Proceedings of the 1st ACM SIGKDD international conference on Privacy, security, and trust in KDD
, 2007
Cited by 8 (2 self)
We briefly survey several privacy compromises in published datasets, some historical and some on paper. An inspection of these suggests that the problem lies with the nature of the privacy-motivated promises in question. These are typically syntactic, rather than semantic. They are also ad hoc, with insufficient argument that fulfilling these syntactic and ad hoc conditions yields anything like what most people would regard as privacy. We examine two comprehensive, or ad omnia, guarantees for privacy in statistical databases discussed in the literature, note that one is unachievable, and describe implementations of the other. In this note we survey a body of work, developed over the past five years, addressing the problem known variously as statistical disclosure control, inference control, privacy-preserving datamining, and private data analysis. Our principal motivating scenario is a statistical database. A statistic is a quantity computed from a sample. Suppose a trusted and trustworthy curator gathers sensitive information from a large number of respondents (the sample), with the goal of learning (and releasing to the public) statistical facts about the underlying population. The problem is to release statistical information without compromising the privacy of the individual respondents. There are two settings: in the noninteractive setting the curator computes and publishes some statistics, and the data are not used further. Privacy concerns may affect the precise answers released by the curator, or even the set of statistics released. Note that since the data will never be used again the curator can destroy the data (and himself) once the statistics have been published. In the interactive setting the curator sits between the users and the database. Queries posed by the users, and/or the responses to these queries, may be modified by the curator in order to protect the privacy of the respondents. The data cannot be destroyed, and the curator must remain present throughout the lifetime of the database. There is a rich literature on this problem, principally from the statistics community

