Resisting Structural Re-identification in Anonymized Social Networks
, 2008
Cited by 38 (7 self)
We identify privacy risks associated with releasing network data sets and provide an algorithm that mitigates those risks. A network consists of entities connected by links representing relations such as friendship, communication, or shared activity. Maintaining privacy when publishing networked data is uniquely challenging because an individual’s network context can be used to identify them even if other identifying information is removed. In this paper, we quantify the privacy risks associated with three classes of attacks on the privacy of individuals in networks, based on the knowledge used by the adversary. We show that the risks of these attacks vary greatly based on network structure and size. We propose a novel approach to anonymizing network data that models aggregate network structure and then allows samples to be drawn from that model. The approach guarantees anonymity for network entities while preserving the ability to estimate a wide variety of network measures with relatively little bias.
Exponentiated gradient algorithms for conditional random fields and max-margin Markov networks
, 2008
Cited by 35 (1 self)
Log-linear and maximum-margin models are two commonly used methods in supervised machine learning, and are frequently used in structured prediction problems. Efficient learning of parameters in these models is therefore an important problem, and becomes a key factor when learning from very large data sets. This paper describes exponentiated gradient (EG) algorithms for training such models, where EG updates are applied to the convex dual of either the log-linear or max-margin objective function; the dual in both the log-linear and max-margin cases corresponds to minimizing a convex function with simplex constraints. We study both batch and online variants of the algorithm, and provide rates of convergence for both cases. In the max-margin case, O(1/ε) EG updates are required to reach a given accuracy ε in the dual; in contrast, for log-linear models only O(log(1/ε)) updates are required. For both the max-margin and log-linear cases, our bounds suggest that the online EG algorithm requires a factor of n less computation to reach a desired accuracy than the batch EG algorithm, where n is the number of training examples. Our experiments confirm that the online algorithms are much faster than the batch algorithms in practice. We describe how the EG updates factor in a convenient way for structured prediction problems, allowing the algorithms to be
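As a rough illustration of the kind of update the abstract refers to, the sketch below applies a generic exponentiated gradient step (multiply each coordinate by exp(-eta * gradient), then renormalize) to a toy convex objective over the probability simplex. The objective, the target vector, the step size, and all function names are assumptions chosen for illustration; this is not the paper's dual objective or its batch/online algorithms.

```python
import math

# Hypothetical toy problem: minimize 0.5 * ||u - TARGET||^2 over the simplex.
TARGET = [0.7, 0.2, 0.1]


def toy_objective(u):
    """Squared distance to TARGET (an illustrative stand-in objective)."""
    return 0.5 * sum((ui - ti) ** 2 for ui, ti in zip(u, TARGET))


def toy_gradient(u):
    """Gradient of toy_objective with respect to u."""
    return [ui - ti for ui, ti in zip(u, TARGET)]


def eg_step(u, grad, eta):
    """One EG update: multiplicative reweighting followed by renormalization.

    The result stays on the simplex as long as u starts there.
    """
    weights = [ui * math.exp(-eta * gi) for ui, gi in zip(u, grad)]
    total = sum(weights)
    return [w / total for w in weights]


def minimize_on_simplex(grad_fn, dim, eta=0.5, steps=500):
    """Run repeated EG steps starting from the uniform distribution."""
    u = [1.0 / dim] * dim
    for _ in range(steps):
        u = eg_step(u, grad_fn(u), eta)
    return u


if __name__ == "__main__":
    u = minimize_on_simplex(toy_gradient, dim=3)
    print([round(x, 4) for x in u], toy_objective(u))
```

Because the update is multiplicative and then normalized, the iterate never leaves the simplex, which is why this style of update suits simplex-constrained problems like the duals described above.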
Using Rank Propagation and Probabilistic Counting for Link-Based Spam Detection
- In Proceedings of the Workshop on Web Mining and Web Usage Analysis (WebKDD
, 2006
Cited by 26 (12 self)
This paper describes a technique for automating the detection of Web link spam, that is, groups of pages that are linked together with the sole purpose of obtaining an undeservedly high score in search engines. The problem of Web spam is widespread and difficult to solve, mostly because the large size of web collections makes many algorithms infeasible in practice.
Beyond Bloom Filters: From Approximate Membership Checks to Approximate State Machines
- SIGCOMM '06
, 2006
Cited by 24 (4 self)
Many networking applications require fast state lookups in a concurrent state machine, which tracks the state of a large number of flows simultaneously. We consider the question of how to compactly represent such concurrent state machines. To achieve compactness, we consider data structures for Approximate Concurrent State Machines (ACSMs) that can return false positives, false negatives, or a “don’t know” response. We describe three techniques based on Bloom filters and hashing, and evaluate them using both theoretical analysis and simulation. Our analysis leads us to an extremely efficient hashing-based scheme with several parameters that can be chosen to trade off space, computation, and the impact of errors. Our hashing approach also yields a simple alternative structure with the same functionality as a counting Bloom filter that uses much less space. We show how ACSMs can be used for video congestion control. Using an ACSM, a router can implement sophisticated Active Queue Management (AQM) techniques for video traffic (without the need for standards changes to mark packets or change video formats), with a factor of four reduction in memory compared to full-state schemes and with very little error. We also show that ACSMs show promise for real-time detection of P2P traffic.
Less hashing, same performance: Building a better bloom filter
- In Proc. the 14th Annual European Symposium on Algorithms (ESA 2006
, 2006
Cited by 20 (3 self)
A standard technique from the hashing literature is to use two hash functions h1(x) and h2(x) to simulate additional hash functions of the form gi(x) = h1(x) + ih2(x). We demonstrate that this technique can be usefully applied to Bloom filters and related data structures. Specifically, only two hash functions are necessary to effectively implement a Bloom filter without any loss in the asymptotic false positive probability. This leads to less computation and potentially less need for
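The construction gi(x) = h1(x) + i*h2(x) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the class name, the choice of SHA-256 as the source of h1 and h2, the reduction modulo the table size, and the one-byte-per-bit storage are all assumptions made for readability.

```python
import hashlib


class DoubleHashBloomFilter:
    """Bloom filter whose probe positions are derived from two base hashes."""

    def __init__(self, num_bits, num_probes):
        self.num_bits = num_bits
        self.num_probes = num_probes
        # One byte per bit keeps the sketch simple; a real filter would pack bits.
        self.bits = bytearray(num_bits)

    def _base_hashes(self, item):
        """Derive h1 and h2 from a single SHA-256 digest (an assumed choice)."""
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big")
        return h1, h2

    def _positions(self, item):
        """Compute g_i(x) = h1(x) + i * h2(x), reduced modulo the table size."""
        h1, h2 = self._base_hashes(item)
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_probes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # May return a false positive, never a false negative.
        return all(self.bits[pos] for pos in self._positions(item))


if __name__ == "__main__":
    bf = DoubleHashBloomFilter(num_bits=1024, num_probes=4)
    bf.add("alice")
    print("alice" in bf)
```

Only two hash values are computed per item regardless of the number of probes, which is where the computational saving comes from. One caveat of this simplified version: if h2 happens to be a multiple of the table size, all probes collapse onto a single position.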
Data Persistence in Large-scale Sensor Networks with Decentralized Fountain Codes
Cited by 17 (1 self)
It may not be feasible for sensor networks monitoring nature and inaccessible geographical regions to include powered sinks with Internet connections. We consider the scenario where sinks are not present in large-scale sensor networks, and unreliable sensors have to collectively resort to storing sensed data over time on themselves. At a time of convenience, such cached data from a small subset of live sensors may be collected by a centralized (possibly mobile) collector. In this paper, we propose a decentralized algorithm using fountain codes to guarantee the persistence and reliability of cached data on unreliable sensors. With fountain codes, the collector is able to recover all data as long as a sufficient number of sensors are alive. We use random walks to disseminate data from a sensor to a random subset of sensors in the network. Our algorithms take advantage of the low decoding complexity of fountain codes, as well as the scalability of the dissemination process via random walks. We propose two algorithms based on random walks. Our theoretical analysis and simulation-based studies show that the first algorithm maintains the same level of fault tolerance as the original centralized fountain code, while introducing lower overhead than a naive random-walk-based implementation of the dissemination process. Our second algorithm has a lower level of fault tolerance than the original centralized fountain code, but incurs a much lower dissemination cost.
An Improved Construction for Counting Bloom Filters
- 14th Annual European Symposium on Algorithms, LNCS 4168
, 2006
Cited by 14 (3 self)
A counting Bloom filter (CBF) generalizes a Bloom filter data structure so as to allow membership queries on a set that can be changing dynamically via insertions and deletions. As with a Bloom filter, a CBF obtains space savings by allowing false positives. We provide a simple hashing-based alternative based on d-left hashing called a d-left CBF (dlCBF). The dlCBF offers the same functionality as a CBF, but uses less space, generally saving a factor of two or more. We describe the construction of dlCBFs, provide an analysis, and demonstrate their effectiveness experimentally.
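For readers unfamiliar with the baseline structure, the following is a minimal sketch of a standard counting Bloom filter, the structure the dlCBF is compared against. It is not the d-left construction from the paper. The class name, the salted SHA-256 hashing, and the unbounded Python integer counters are assumptions made for illustration; practical CBFs typically use small fixed-width counters.

```python
import hashlib


class CountingBloomFilter:
    """Standard counting Bloom filter: counters instead of bits, so deletes work."""

    def __init__(self, num_counters, num_hashes):
        self.num_counters = num_counters
        self.num_hashes = num_hashes
        self.counters = [0] * num_counters

    def _positions(self, item):
        """One position per hash function, using a salted SHA-256 (assumed choice)."""
        positions = []
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            positions.append(int.from_bytes(digest[:8], "big") % self.num_counters)
        return positions

    def insert(self, item):
        for pos in self._positions(item):
            self.counters[pos] += 1

    def delete(self, item):
        """Remove one occurrence; only safe for items that were actually inserted."""
        positions = self._positions(item)
        if not all(self.counters[pos] > 0 for pos in positions):
            raise KeyError(item)
        for pos in positions:
            self.counters[pos] -= 1

    def __contains__(self, item):
        # May return a false positive, as with an ordinary Bloom filter.
        return all(self.counters[pos] > 0 for pos in self._positions(item))


if __name__ == "__main__":
    cbf = CountingBloomFilter(num_counters=64, num_hashes=3)
    cbf.insert("alice")
    cbf.delete("alice")
    print("alice" in cbf)
```

Deleting an item that was never inserted but happens to test positive would corrupt other items' counters, which is a known limitation of this baseline design.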
Submodular Function Minimization under Covering Constraints
, 2009
Cited by 13 (0 self)
This paper addresses the problems of minimizing nonnegative submodular functions under covering constraints, which generalize the vertex cover, edge cover, and set cover problems. We give approximation algorithms for these problems exploiting the discrete convexity of submodular functions. We first present a rounding 2-approximation algorithm for the submodular vertex cover problem based on the half-integrality of the continuous relaxation problem, and show that the rounding algorithm can be performed by one application of submodular function minimization on a ring family. We also show that a rounding algorithm and a primal-dual algorithm for the submodular cost set cover problem are both constant factor approximation algorithms if the maximum frequency is fixed. In addition, we give an essentially tight lower bound on the approximability of the submodular edge cover problem.
Efficient Broadcasting using Network Coding
, 2008
Cited by 13 (2 self)
We consider the problem of broadcasting in an ad hoc wireless network, where all nodes of the network are sources that want to transmit information to all other nodes. Our figure of merit is energy efficiency, a critical design parameter for wireless networks since it directly affects battery life and thus network lifetime. We prove that applying ideas from network coding yields significant benefits in terms of energy efficiency for the problem of broadcasting, and propose very simple algorithms that realize these benefits in practice. In particular, our theoretical analysis shows that network coding improves performance by a constant factor in fixed networks. We calculate this factor exactly for some canonical configurations. We then show that in networks where the topology dynamically changes, for example due to mobility, and where operations are restricted to simple distributed algorithms, network coding can offer improvements of a factor of log n, where n is the number of nodes in the network. We use the insights gained from the theoretical analysis to propose low-complexity distributed algorithms for realistic wireless ad hoc scenarios, discuss a number of practical considerations, and evaluate our algorithms through packet-level simulation.

