Robust De-anonymization of Large Sparse Datasets, 2008
Cited by 81 (5 self)
We present a new class of statistical deanonymization attacks against high-dimensional micro-data, such as individual preferences, recommendations, transaction records and so on. Our techniques are robust to perturbation in the data and tolerate some mistakes in the adversary’s background knowledge. We apply our de-anonymization methodology to the Netflix Prize dataset, which contains anonymous movie ratings of 500,000 subscribers of Netflix, the world’s largest online movie rental service. We demonstrate that an adversary who knows only a little bit about an individual subscriber can easily identify this subscriber’s record in the dataset. Using the Internet Movie Database as the source of background knowledge, we successfully identified the Netflix records of known users, uncovering their apparent political preferences and other potentially sensitive information.
Modeling Inverse Covariance Matrices by Basis Expansion, 2003
Cited by 31 (9 self)
This paper proposes a new covariance modeling technique for Gaussian Mixture Models. Specifically, the inverse covariance (precision) matrix of each Gaussian is expanded in a rank-1 basis, i.e., $\Sigma_j^{-1} = P_j = \sum_{k=1}^{D} \lambda_k^j a_k a_k^T$, with $\lambda_k^j \in \mathbb{R}$ and $a_k \in \mathbb{R}^d$. A generalized EM algorithm is proposed to obtain maximum likelihood parameter estimates for the basis set $\{a_k\}_{k=1}^{D}$ and the expansion coefficients $\{\lambda_k^j\}$. This model, called the Extended Maximum Likelihood Linear Transform (EMLLT) model, is extremely flexible: by varying the number of basis elements from $D = d$ to $D = d(d+1)/2$ one gradually moves from a Maximum Likelihood Linear Transform (MLLT) model to a full-covariance model. Experimental results on two speech recognition tasks show that the EMLLT model can give relative gains of up to 35% in the word error rate over a standard diagonal covariance model, and 30% over a standard MLLT model.
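The rank-1 expansion of the precision matrix described in this abstract can be illustrated with a small sketch. It only shows how a precision matrix is assembled from a shared basis and per-Gaussian coefficients; it does not implement the generalized EM training. The function name `precision_from_basis` and the toy values are assumptions made for illustration, not taken from the paper.

```python
def precision_from_basis(lambdas, basis):
    """Return P = sum_k lambdas[k] * a_k a_k^T as a list-of-lists matrix.

    lambdas: expansion coefficients of one Gaussian (length D).
    basis:   D basis vectors a_k, each of dimension d, shared by all Gaussians.
    """
    d = len(basis[0])
    P = [[0.0] * d for _ in range(d)]
    for lam, a in zip(lambdas, basis):
        for r in range(d):
            for c in range(d):
                # Each basis vector contributes the rank-1 term lam * a a^T.
                P[r][c] += lam * a[r] * a[c]
    return P


# Hypothetical toy case: d = 2, so D = d(d+1)/2 = 3 basis vectors are
# enough to express any symmetric 2x2 precision matrix.
basis = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
lambdas = [2.0, 3.0, 0.5]
P = precision_from_basis(lambdas, basis)
print(P)
```

With only the first two basis vectors (D = d) the result would be diagonal in this basis, which corresponds to the MLLT end of the range; the third vector adds the off-diagonal term.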
Discriminative, Generative and Imitative Learning, 2002
Cited by 21 (1 self)
I propose a common framework that combines three different paradigms in machine learning: generative, discriminative and imitative learning. A generative probabilistic distribution is a principled way to model many machine learning and machine perception problems. Therein, one provides domain specific knowledge in terms of structure and parameter priors over the joint space of variables. Bayesian networks and Bayesian statistics provide a rich and flexible language for specifying this knowledge and subsequently refining it with data and observations. The final result is a distribution that is a good generator of novel exemplars.
Energy Efficient Wireless Packet Scheduling and Fair Queuing - ACM Trans. Embedded Computing Systems, 2004
Cited by 7 (0 self)
In this paper, we present techniques for energy efficient packet scheduling and fair queuing in wireless communication systems. Our techniques are based on an extensive slack management approach that dynamically adapts the output rate of the system in accordance with the input packet arrival rate. We use a recently proposed radio power management technique, dynamic modulation scaling (DMS), as a control knob to enable energy-latency trade-offs during wireless packet transmission. We first analyze a single input stream scenario, and describe a rate adaptation technique that results in significantly lower energy consumption (reductions of up to 10×), while still bounding the resulting packet delays. By appropriately setting the various parameters of our algorithm, the system can be made to traverse the energy-latency-fidelity trade-off space. We extend our techniques to a multiple input stream scenario, and present E²WFQ, an energy efficient version of the weighted fair queuing (WFQ) algorithm for fair packet scheduling. Simulation results show that large energy savings can be obtained through the use of E²WFQ, with only a small, bounded increase in worst case packet latency. Further, our results demonstrate that E²WFQ does not adversely affect the throughput allocation (and hence, fairness) of WFQ.
Micro power management of active 802.11 interfaces
- in Proc. ACM/USENIX Int. Conf. Mobile Systems, Applications and Services (MobiSys
Cited by 6 (2 self)
Wireless interfaces are major power consumers on mobile systems. Considerable research has improved the energy efficiency of elongated idle periods or created more elongated idle periods in wireless interfaces, often requiring cooperation from applications or the network infrastructure. With increasing wireless mobile data, it has become critical to improve the energy efficiency of active wireless interfaces. In this work, we present micro power management (µPM), a solution inspired by the mismatch between the high performance of state-of-the-art 802.11 interfaces and the modest data rate requirements of many popular network applications. µPM enables an 802.11 interface to enter unreachable power-saving modes even between MAC frames, without noticeable impact on the traffic flow. To control data loss, µPM leverages the retransmission mechanism in 802.11 and controls frame delay to adapt to demanded network throughput with minimal cooperation from the access point. Based on a theoretical framework, we employ simulation to systematically investigate an effective and efficient implementation of µPM. We have built a prototype µPM on an open-access wireless hardware platform. Measurements show that more than 30% power reduction for the wireless transceiver can be achieved with µPM for various applications without perceptible quality degradation.
Toward a usable theory of Chernoff Bounds for heterogeneous and partially dependent random variables, 1992
Cited by 6 (1 self)
Let X be a sum of real valued random variables and have a bounded mean E[X]. The generic Chernoff-Hoeffding estimate for large deviations of X is: $\Pr\{X - E[X] \ge a\} \le \min_{\lambda \ge 0} e^{-\lambda(a + E[X])} E[e^{\lambda X}]$, which applies with $a \ge 0$ to random variables with very small tails. At issue is how to use this method to attain sharp and useful estimates. We present a number of Chernoff-Hoeffding bounds for sums of random variables that may have a variety of dependent relationships and that may be heterogeneously distributed. AMS classifications: 60F10, Large deviations; 68Q25, Analysis of algorithms; 62E17, Approximations to distributions (nonasymptotic); 60E15, Inequalities. Key words: Hoeffding bounds, Chernoff bounds, dependent random variables, Bernoulli trials. This research was supported, in part, by grants NSF-CCR-8902221, NSF-CCR-8906949, and NSF-CCR-9204202. 1 Summary In the analysis of probabilistic algorithms, some of the following problems may arise, possibly in complex combinations....
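As a rough numerical illustration of the generic Chernoff-Hoeffding estimate quoted in this abstract, the sketch below evaluates the bound for the simplest setting only: a sum of n independent, identically distributed Bernoulli(p) trials, where E[e^{λX}] = (1 - p + p e^λ)^n. The heterogeneous and dependent cases that the paper addresses are not covered. The function names, the grid search over λ, and the parameter values are assumptions chosen for illustration.

```python
import math


def chernoff_bound(n, p, a, steps=5000, lam_max=5.0):
    """Grid-search min over lambda >= 0 of exp(-lambda*(a + E[X])) * E[exp(lambda*X)]
    for X ~ Binomial(n, p), i.e. a sum of n i.i.d. Bernoulli(p) trials."""
    mean = n * p
    best = 1.0  # lambda = 0 gives the trivial bound 1
    for i in range(1, steps + 1):
        lam = lam_max * i / steps
        # log of the moment generating function of the binomial sum
        log_mgf = n * math.log(1.0 - p + p * math.exp(lam))
        best = min(best, math.exp(-lam * (a + mean) + log_mgf))
    return best


def exact_tail(n, p, a):
    """Exact Pr{X - E[X] >= a} for X ~ Binomial(n, p), for comparison."""
    k_min = math.ceil(n * p + a)
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(k_min, n + 1))


bound = chernoff_bound(n=100, p=0.5, a=10)
tail = exact_tail(n=100, p=0.5, a=10)
print(f"Chernoff bound: {bound:.4f}, exact tail: {tail:.4f}")
```

The grid search is a deliberately crude stand-in for the closed-form minimization; it is enough to see that the bound sits above the exact tail probability.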
Nonextensive Information Theoretic Kernels on Measures, 2009
Cited by 6 (3 self)
Positive definite kernels on probability measures have been recently applied to classification problems involving text, images, and other types of structured data. Some of these kernels are related to classic information theoretic quantities, such as (Shannon’s) mutual information and the Jensen-Shannon (JS) divergence. Meanwhile, there have been recent advances in nonextensive generalizations of Shannon’s information theory. This paper bridges these two trends by introducing nonextensive information theoretic kernels on probability measures, based on new JS-type divergences. These new divergences result from extending the two building blocks of the classical JS divergence: convexity and Shannon’s entropy. The notion of convexity is extended to the wider concept of q-convexity, for which we prove a Jensen q-inequality. Based on this inequality, we introduce Jensen-Tsallis (JT) q-differences, a nonextensive generalization of the JS divergence, and define a k-th order JT q-difference between stochastic processes. We then define a new family of nonextensive mutual information kernels, which allow weights to be assigned to their arguments, and which includes the Boolean, JS, and linear kernels as particular cases. Nonextensive string kernels are also defined that generalize the p-spectrum kernel. We illustrate the performance of
Worst Case Reliability Prediction Based on a Prior Estimate of Residual Defects - Thirteenth International Symposium on Software Reliability Engineering (ISSRE '02), 2002
Cited by 5 (2 self)
In this paper we extend an earlier worst case bound reliability theory to derive a worst case reliability function R(t), which gives the worst case probability of surviving a further time t given an estimate of residual defects in the software N and a prior test time T.
Refinement Criteria Based on f-Divergences - Eurographics Symposium on Rendering 2003, Per Christensen and Daniel Cohen-Or (Editors), 2003
Cited by 4 (0 self)
In several domains a refinement criterion is often needed to decide whether to go on or to stop sampling a signal. When the sampled values are homogeneous enough, we assume that they represent the signal fairly well and we do not need further refinement; otherwise, more samples are required, possibly with adaptive subdivision of the domain. For this purpose, a criterion which is very sensitive to variability is necessary. In this paper we present a family of discrimination measures, the f-divergences, meeting this requirement. These functions have been well studied and successfully applied to image processing and several areas of engineering. Two applications to global illumination are shown: oracles for hierarchical radiosity and criteria for adaptive refinement in ray-tracing. We obtain significantly better results than with classic criteria, showing that f-divergences are worth further investigation in computer graphics.
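One way to picture a refinement criterion of the kind this abstract describes is to compare the normalized sample values against a uniform distribution with an f-divergence, D_f(p||q) = Σ q_i f(p_i/q_i), and to request more samples when the divergence is large. The sketch below is an illustrative assumption, not the paper's actual oracle: the function names, the choice of the uniform reference distribution, and the threshold value are all hypothetical.

```python
import math


def f_divergence(p, q, f):
    """D_f(p || q) = sum_i q_i * f(p_i / q_i) for discrete distributions."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q) if qi > 0)


def kl_f(u):
    """Generator of the Kullback-Leibler divergence: f(u) = u log u."""
    return u * math.log(u) if u > 0 else 0.0


def chi2_f(u):
    """Generator of the chi-square divergence: f(u) = (u - 1)^2."""
    return (u - 1.0) ** 2


def needs_refinement(samples, threshold, f=kl_f):
    """Return True when non-negative samples are too heterogeneous.

    The samples are normalized to a distribution p and compared with the
    uniform distribution q; homogeneous samples give a divergence of 0.
    """
    total = sum(samples)
    if total == 0:
        return False  # all-zero samples are treated as homogeneous here
    p = [s / total for s in samples]
    q = [1.0 / len(samples)] * len(samples)
    return f_divergence(p, q, f) > threshold


# Hypothetical threshold of 0.1; a real application would tune it.
print(needs_refinement([1.0, 1.0, 1.0, 1.0], threshold=0.1))  # homogeneous
print(needs_refinement([0.1, 0.1, 0.1, 3.7], threshold=0.1))  # heterogeneous
```

Passing `f=chi2_f` swaps in a different member of the family without changing the rest of the criterion, which is the main appeal of phrasing the test in terms of a generic f.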

