Results 1 - 10 of 17
On exchangeable random variables and the statistics of large graphs and hypergraphs, 2008
Bayesian Hierarchical Modeling, 2000
Cited by 11 (3 self)
This tutorial provides a very brief introduction to the formulation, fitting, and checking of hierarchical or multilevel models from the Bayesian point of view. Hierarchical models (HMs) arise frequently in five main kinds of applications: (1) HMs are common in fields such as health and education, in which data (both outcomes and predictors) are often gathered in a nested or hierarchical fashion, e.g., patients within hospitals, or students within classrooms within schools. HMs are thus also ideally suited to the wide range of applications in government and business in which single- or multi-stage cluster samples are routinely drawn, and offer a unified approach to the analysis of random-effects (variance-components) and mixed models. (2) A different kind
The Mondrian Process
Cited by 9 (4 self)
We describe a novel class of distributions, called Mondrian processes, which can be interpreted as probability distributions over kd-tree data structures. Mondrian processes are multidimensional generalizations of Poisson processes, and this connection allows us to construct multidimensional generalizations of the stick-breaking process described by Sethuraman (1994), recovering the Dirichlet process in one dimension. After introducing the Aldous-Hoover representation for jointly and separately exchangeable arrays, we show how the process can be used as a nonparametric prior distribution in Bayesian models of relational data.
Data Publishing against Realistic Adversaries
Cited by 4 (1 self)
Privacy in data publishing has received much attention recently. The key to defining privacy is to model the knowledge of the attacker: if the attacker is assumed to know too little, the published data can be easily attacked; if the attacker is assumed to know too much, the published data has little utility. Previous work considered either quite ignorant adversaries or nearly omniscient adversaries. In this paper, we introduce a new class of adversaries that we call realistic adversaries, who live in the unexplored space in between. Realistic adversaries have knowledge from external sources with an associated stubbornness indicating the strength of their knowledge. We then introduce a novel privacy framework called epsilon-privacy that allows us to guard against realistic adversaries. We also show that prior privacy definitions are instantiations of our framework. In a thorough experimental study with real census data, we show that epsilon-privacy allows us to publish data with high utility while defending against strong adversaries.
Hierarchical Probabilistic Models for Group Anomaly Detection
Cited by 1 (1 self)
Statistical anomaly detection typically focuses on finding individual point anomalies. Often the most interesting or unusual things in a data set are not odd individual points, but rather larger-scale phenomena that only become apparent when groups of points are considered. In this paper, we propose generative models for detecting such group anomalies. We evaluate our methods on synthetic data as well as astronomical data from the Sloan Digital Sky Survey. The empirical results show that the proposed models are effective in detecting group anomalies.
The origin of infinitely divisible distributions: from de Finetti's problem to Lévy-Khintchine formula, 2006
- Mathematical Methods in Economics and Finance
"... Lévy-Khintchine formula ..."
Prediction Intervals for Class Probabilities, 2007
Prediction intervals for class probabilities are of interest in machine learning because they can quantify the uncertainty about the class probability estimate for a test instance. The idea is that all likely class probability values of the test instance are included, with a pre-specified confidence level, in the calculated prediction interval. This thesis proposes a probabilistic model for calculating such prediction intervals. Given the unobservability of class probabilities, a Bayesian approach is employed to derive a complete distribution of the class probability of a test instance based on a set of class observations of training instances in the neighbourhood of the test instance. A random decision tree ensemble learning algorithm is also proposed, whose prediction output constitutes the neighbourhood that is used by the Bayesian model to produce a prediction interval for the test instance. The Bayesian model, which is used in conjunction with the ensemble learning algorithm and the standard nearest-neighbour classifier, is evaluated on artificial datasets and modified real datasets.

