Anonymizing Social Networks
- VLDB, 2008
Cited by 31 (3 self)
Advances in technology have made it possible to collect data about individuals and the connections between them, such as email correspondence and friendships. Agencies and researchers who have collected such social network data often have a compelling interest in allowing others to analyze the data. However, in many cases the data describes relationships that are private (e.g., email correspondence) and sharing the data in full can result in unacceptable disclosures. In this paper, we present a framework for assessing the privacy risk of sharing anonymized network data. This includes a model of adversary knowledge, for which we consider several variants and make connections to known graph theoretical results. On several real-world social networks, we show that simple anonymization techniques are inadequate, resulting in substantial breaches of privacy for even modestly informed adversaries. We propose a novel anonymization technique based on perturbing the network and demonstrate empirically that it leads to substantial reduction of the privacy threat. We also analyze the effect that anonymizing the network has on the utility of the data for social network analysis.
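The abstract does not say how the network is perturbed. As a minimal sketch, assuming the perturbation deletes m randomly chosen edges and then inserts m randomly chosen non-edges of an undirected simple graph, the idea could look like this. The function name `perturb_graph` and its parameters are hypothetical.

```python
import random
from itertools import combinations


def perturb_graph(nodes, edges, m, seed=None):
    """Delete m random edges, then insert m random non-edges.

    Assumed scheme for illustration only; the graph is undirected and simple,
    and node labels must be mutually comparable (e.g. all ints).
    """
    rng = random.Random(seed)
    edge_set = {frozenset(e) for e in edges}
    # Candidate insertions are pairs that are not edges of the ORIGINAL graph.
    non_edges = [frozenset(p) for p in combinations(nodes, 2)
                 if frozenset(p) not in edge_set]
    if m > len(edge_set) or m > len(non_edges):
        raise ValueError("m is too large for this graph")
    # Sort before sampling so results are reproducible for a given seed.
    removed = set(rng.sample(sorted(edge_set, key=sorted), m))
    added = set(rng.sample(non_edges, m))
    return (edge_set - removed) | added
```

The edge count stays the same, while a larger m trades analysis utility for more uncertainty about which edges are real.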
Relationship privacy: Output perturbation for queries with joins
- In ACM Symposium on Principles of Database Systems, 2009
Cited by 25 (7 self)
We study privacy-preserving query answering over data containing relationships. A social network is a prime example of such data, where the nodes represent individuals and edges represent relationships. Nearly all interesting queries over social networks involve joins, and for such queries, existing output perturbation algorithms severely distort query answers. We propose an algorithm that significantly improves utility over competing techniques, typically reducing the error bound from polynomial in the number of nodes to polylogarithmic. The algorithm is, to the best of our knowledge, the first to answer such queries with acceptable accuracy, even for worst-case inputs. The improved utility is achieved by relaxing the privacy condition. Instead of ensuring strict differential privacy, we guarantee a weaker (but still quite practical) condition based on adversarial privacy. To explain precisely the nature of our relaxation in privacy, we provide a new result that characterizes the relationship between ε-indistinguishability (a variant of the differential privacy definition) and adversarial privacy, which is of independent interest: an algorithm is ε-indistinguishable iff it is private for a particular class of adversaries (defined precisely herein). Our perturbation algorithm guarantees privacy against adversaries in this class whose prior distribution is numerically bounded.
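For orientation, the indistinguishability condition is usually written as below. This is the common textbook formulation, not a quotation from the paper; the symbols (a randomized algorithm A, neighboring databases D and D' that differ in one tuple, and an output set S) are chosen here, and the paper's exact notion of neighboring databases may differ.

```latex
\Pr[\mathcal{A}(D) \in S] \;\le\; e^{\epsilon} \cdot \Pr[\mathcal{A}(D') \in S]
\qquad \text{for all neighboring } D, D' \text{ and all output sets } S .
```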
Attacks on privacy and de Finetti’s theorem
- In SIGMOD, 2009
Cited by 24 (2 self)
In this paper we present a method for reasoning about privacy using the concepts of exchangeability and de Finetti’s theorem. We illustrate the usefulness of this technique by using it to attack a popular data sanitization scheme known as Anatomy. We stress that Anatomy is not the only sanitization scheme that is vulnerable to this attack. In fact, any scheme that uses the random worlds model, i.i.d. model, or tuple-independent model needs to be re-evaluated. The difference between the attack presented here and others that have been proposed in the past is that we do not need extensive background knowledge. An attacker only needs to know the nonsensitive attributes of one individual in the data, and can carry out this attack just by building a machine learning model over the sanitized data. The reason this attack is successful is that it exploits a subtle flaw in the way prior work computed the probability of disclosure of a sensitive attribute. We demonstrate this theoretically, empirically, and with intuitive examples. We also discuss how this generalizes to many other privacy schemes.
Boosting the accuracy of differentially-private histograms through consistency
- In Proceedings of the VLDB, 2010
Cited by 19 (10 self)
Recent differentially private query mechanisms offer strong privacy guarantees by adding noise to the query answer. For a single counting query, the technique is simple, accurate, and provides optimal utility. However, analysts typically wish to ask multiple queries. In this case, the optimal strategy is not apparent, and alternative query strategies can involve difficult trade-offs in accuracy, and may produce inconsistent answers. In this work we show that it is possible to significantly improve accuracy for a general class of histogram queries. Our approach carefully chooses a set of queries to evaluate, and then exploits consistency constraints that should hold over the noisy output. In a post-processing phase, we compute the consistent input most likely to have produced the noisy output. The final output is both private and consistent, but in addition, it is often much more accurate. We apply our techniques to real datasets and show they can be used for estimating the degree sequence of a graph with extreme precision, and for computing a histogram that can support arbitrary range queries accurately.
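The two phases described above (noisy answers, then a consistency-restoring post-process) can be sketched for one specific constraint: a sequence of counts that is known to be sorted. The sketch assumes Laplace noise with scale sensitivity/ε and uses a least-squares nondecreasing fit (pool-adjacent-violators) as a stand-in for the paper's inference step, which the abstract does not spell out. All function names, and the caller-supplied `sensitivity` value, are assumptions.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_sorted_counts(counts, epsilon, sensitivity, rng):
    """Sort the counts ascending, then add independent Laplace noise to each.

    `sensitivity` is assumed to be supplied correctly by the caller.
    """
    scale = sensitivity / epsilon
    return [c + laplace_noise(scale, rng) for c in sorted(counts)]


def isotonic_fit(values):
    """Least-squares nondecreasing fit via pool-adjacent-violators."""
    blocks = []  # each block is [sum, count]
    for v in values:
        blocks.append([v, 1])
        # Merge while the previous block's mean exceeds the last block's mean.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted
```

Because the fit only post-processes the noisy output, it does not consume additional privacy budget; it simply pulls out-of-order noisy values back to a sequence that could have come from a sorted input.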
Privacy wizards for social networking sites
- In WWW ’10: Proceedings of the 19th International World Wide Web Conference, 2010
Cited by 19 (0 self)
Privacy is an enormous problem in online social networking sites. While sites such as Facebook allow users fine-grained control over who can see their profiles, it is difficult for average users to specify this kind of detailed policy. In this paper, we propose a template for the design of a social networking privacy wizard. The intuition for the design comes from the observation that real users conceive their privacy preferences (which friends should be able to see which information) based on an implicit set of rules. Thus, with a limited amount of user input, it is usually possible to build a machine learning model that concisely describes a particular user’s preferences, and then use this model to configure the user’s privacy settings automatically. As an instance of this general framework, we have built a wizard based on an active learning paradigm called uncertainty sampling. The wizard iteratively asks the user to assign privacy “labels” to selected (“informative”) friends, and it uses this input to construct a classifier, which can in turn be used to automatically assign privileges to the rest of the user’s (unlabeled) friends. To evaluate our approach, we collected detailed privacy preference data from 45 real Facebook users. Our study revealed two important things. First, real users tend to conceive their privacy preferences in terms of communities, which can easily be extracted from a social network graph using existing techniques. Second, our active learning wizard, using communities as features, is able to recommend high-accuracy privacy settings using less user input than existing policy-specification tools.
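The uncertainty-sampling loop can be illustrated with a toy model. This is a sketch under strong assumptions, not the wizard described above: each friend has a single community feature, the "classifier" is a smoothed per-community fraction of "allow" labels, and uncertainty is measured by binary entropy. The names `run_wizard`, `ask_user`, and `budget` are hypothetical.

```python
import math


def allow_probability(community, labeled, friends):
    """Smoothed fraction of labeled friends in `community` marked 'allow'."""
    votes = [label for f, label in labeled.items() if friends[f] == community]
    return (sum(votes) + 1) / (len(votes) + 2)


def entropy(p):
    # Binary entropy in bits; p is always strictly between 0 and 1 here.
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def most_uncertain(friends, labeled):
    """Pick the unlabeled friend whose prediction is least certain."""
    unlabeled = [f for f in friends if f not in labeled]
    return max(unlabeled,
               key=lambda f: entropy(allow_probability(friends[f], labeled, friends)))


def run_wizard(friends, ask_user, budget):
    """friends maps name -> community; ask_user(name) returns True to allow."""
    labeled = {}
    for _ in range(min(budget, len(friends))):
        f = most_uncertain(friends, labeled)
        labeled[f] = ask_user(f)
    # User-provided labels are kept; everyone else gets the model's prediction.
    predicted = {
        f: labeled[f] if f in labeled
        else allow_probability(c, labeled, friends) >= 0.5
        for f, c in friends.items()
    }
    return labeled, predicted
```

With two communities and a budget of two questions, the loop tends to spend one question per community, which is the intuition behind using communities as features.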
Accurate Estimation of the Degree Distribution of Private Networks
Cited by 14 (6 self)
We describe an efficient algorithm for releasing a provably private estimate of the degree distribution of a network. The algorithm satisfies a rigorous property of differential privacy, and is also extremely efficient, running on networks of 100 million nodes in a few seconds. Theoretical analysis shows that the error scales linearly with the number of unique degrees, whereas the error of conventional techniques scales linearly with the number of nodes. We complement the theoretical analysis with a thorough empirical analysis on real and synthetic graphs, showing that the algorithm’s variance and bias are low, that the error diminishes as the size of the input graph increases, and that common analyses like fitting a power-law can be carried out very accurately. Keywords: privacy; social networks; privacy-preserving data mining; differential privacy.
K-Automorphism: A general framework for privacy preserving network publication
- In VLDB, 2009
Cited by 12 (1 self)
The growing popularity of social networks has generated interesting data management and data mining problems. An important concern in the release of these data for study is their privacy, since social networks usually contain personal information. Simply removing all identifiable personal information (such as names and social security numbers) before releasing the data is insufficient. It is easy for an attacker to identify the target by performing different structural queries. In this paper we propose k-automorphism to protect against multiple structural attacks and develop an algorithm (called KM) that ensures k-automorphism. We also discuss an extension of KM to handle “dynamic” releases of the data. Extensive experiments show that the algorithm performs well in terms of the protection it provides.
A Framework for Computing the Privacy Scores of Users in Online Social Networks
Cited by 7 (1 self)
A large body of work has been devoted to addressing corporate-scale privacy concerns related to social networks. The main focus was on how to share social networks owned by organizations without revealing the identities or sensitive relationships of the users involved. Not much attention has been given to the privacy risk of users posed by their information-sharing activities. In this paper, we approach the privacy concerns arising in online social networks from the individual users’ viewpoint: we propose a framework to compute a privacy score of a user, which indicates the potential privacy risk caused by his participation in the network. Our definition of privacy score satisfies the following intuitive properties: the more sensitive the information revealed by a user, the higher his privacy risk. Also, the more visible the disclosed information becomes in the network, the higher the privacy risk. We develop mathematical models to estimate both sensitivity and visibility of the information. We apply our methods to synthetic and real-world data and demonstrate their efficacy and practical utility. Keywords: privacy score; social network; item response theory.
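The two monotonicity properties can be made concrete with a deliberately naive score. The sketch below assumes a 0/1 disclosure matrix R (one row per user, one column per profile item), estimates an item's sensitivity as the fraction of users who do not disclose it, and uses the 0/1 entry itself as visibility. This is an assumed simplification for illustration; it does not reproduce the item-response-theory models the keywords refer to, and the function names are hypothetical.

```python
def sensitivities(R):
    """Sensitivity of each item: the fraction of users who do NOT disclose it."""
    n_users = len(R)
    n_items = len(R[0])
    return [(n_users - sum(row[i] for row in R)) / n_users
            for i in range(n_items)]


def privacy_scores(R):
    """Score of each user: sum over items of sensitivity times visibility."""
    beta = sensitivities(R)
    return [sum(b * v for b, v in zip(beta, row)) for row in R]


# Item 0 is disclosed by everyone (not sensitive); item 1 by one user only.
R = [
    [1, 1],
    [1, 0],
    [1, 0],
    [1, 0],
]
scores = privacy_scores(R)
```

Here only the first user discloses the rarely shared item, so only that user receives a nonzero score; disclosing an item that everyone shares adds nothing.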
Graph generation with prescribed feature constraints
- In Proc. of the 9th SIAM Conference on Data Mining, 2009
Cited by 6 (4 self)
In this paper, we study the problem of how to generate synthetic graphs matching various properties of a real social network with two applications, privacy preserving social network publishing and significance testing of network analysis results. We present a simple switching-based graph generation approach to generate graphs preserving features of a real graph. We then investigate potential disclosures of sensitive links due to the preserved features. Our algorithms on graph generation with feature range and feature distribution constraints are based on Metropolis-Hastings sampling. This is of importance for significance testing of network analysis results.
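A common form of switching picks two edges (a, b) and (c, d) and rewires them to (a, d) and (c, b), which leaves every node's degree unchanged. The sketch below assumes that form on an undirected simple graph and rejects switches that would create self-loops or duplicate edges. It does not include the Metropolis-Hastings acceptance step for feature constraints, and `switch_edges` with its parameters is a hypothetical name.

```python
import random


def switch_edges(edges, n_switches, seed=None, max_tries=10000):
    """Apply up to n_switches degree-preserving edge switches.

    Needs at least two edges and mutually comparable node labels.
    """
    rng = random.Random(seed)
    edge_list = [tuple(sorted(e)) for e in edges]
    edge_set = set(edge_list)
    done = tries = 0
    while done < n_switches and tries < max_tries:
        tries += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c  # consider both orientations of the second edge
        if len({a, b, c, d}) < 4:
            continue  # shared endpoint: the switch would create a self-loop
        new1, new2 = tuple(sorted((a, d))), tuple(sorted((c, b)))
        if new1 in edge_set or new2 in edge_set:
            continue  # the switch would create a duplicate edge
        edge_set -= {edge_list[i], edge_list[j]}
        edge_set |= {new1, new2}
        edge_list[i], edge_list[j] = new1, new2
        done += 1
    return edge_list
```

Each accepted switch removes one edge at each of the four endpoints and adds one back, so the degree sequence of the output matches the input exactly.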
On link privacy in randomizing social networks
- In PAKDD, 2009
Cited by 6 (4 self)
Many applications of social networks require relationship anonymity due to the sensitive, stigmatizing, or confidential nature of the relationships. Recent work showed that the simple technique of anonymizing graphs by replacing the identifying information of the nodes with random ids does not guarantee privacy since the identification of the nodes can be seriously jeopardized by applying subgraph queries. In this paper, we investigate how well an edge based graph randomization approach can protect sensitive links. We show via theoretical studies and empirical evaluations that various similarity measures can be exploited by attackers to significantly improve their confidence and accuracy of predicted sensitive links between nodes with high similarity values.
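The abstract does not name the similarity measures. As one illustrative choice, the sketch below assumes the attacker scores node pairs in the released graph by their number of common neighbors and ranks the pairs, treating high-scoring pairs as more likely to share a true link. The function names are hypothetical, and the step that turns a similarity value into a confidence estimate is not shown.

```python
from itertools import combinations


def build_adjacency(edges):
    """Adjacency sets for an undirected graph given as (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbors(adj, u, v):
    return len(adj[u] & adj[v])


def rank_pairs(edges):
    """Return (score, u, v) for all node pairs, highest similarity first."""
    adj = build_adjacency(edges)
    scored = [(common_neighbors(adj, u, v), u, v)
              for u, v in combinations(sorted(adj), 2)]
    # Sort by descending score, breaking ties by node labels.
    return sorted(scored, key=lambda t: (-t[0], t[1], t[2]))
```

In a released graph with edges (0, 2), (1, 2), (0, 3), (1, 3), (3, 4), the pairs (0, 1) and (2, 3) each share two neighbors and come out on top of the ranking.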

