Results 1  10
of
939
Bayesian Data Analysis
, 1995
"... I actually own a copy of Harold Jeffreys’s Theory of Probability but have only read small bits of it, most recently over a decade ago to confirm that, indeed, Jeffreys was not too proud to use a classical chisquared pvalue when he wanted to check the misfit of a model to data (Gelman, Meng and Ste ..."
Abstract

Cited by 1230 (47 self)
 Add to MetaCart
I actually own a copy of Harold Jeffreys’s Theory of Probability but have only read small bits of it, most recently over a decade ago to confirm that, indeed, Jeffreys was not too proud to use a classical chisquared pvalue when he wanted to check the misfit of a model to data (Gelman, Meng and Stern, 2006). I do, however, feel that it is important to understand where our probability models come from, and I welcome the opportunity to use the present article by Robert, Chopin and Rousseau as a platform for further discussion of foundational issues. 2 In this brief discussion I will argue the following: (1) in thinking about prior distributions, we should go beyond Jeffreys’s principles and move toward weakly informative priors; (2) it is natural for those of us who work in social and computational sciences to favor complex models, contra Jeffreys’s preference for simplicity; and (3) a key generalization of Jeffreys’s ideas is to explicitly include model checking in the process of data analysis.
Powerlaw distributions in empirical data
 ISSN 00361445. doi: 10.1137/ 070710111. URL http://dx.doi.org/10.1137/070710111
, 2009
"... Powerlaw distributions occur in many situations of scientific interest and have significant consequences for our understanding of natural and manmade phenomena. Unfortunately, the empirical detection and characterization of power laws is made difficult by the large fluctuations that occur in the t ..."
Abstract

Cited by 199 (3 self)
 Add to MetaCart
Powerlaw distributions occur in many situations of scientific interest and have significant consequences for our understanding of natural and manmade phenomena. Unfortunately, the empirical detection and characterization of power laws is made difficult by the large fluctuations that occur in the tail of the distribution. In particular, standard methods such as leastsquares fitting are known to produce systematically biased estimates of parameters for powerlaw distributions and should not be used in most circumstances. Here we describe statistical techniques for making accurate parameter estimates for powerlaw data, based on maximum likelihood methods and the KolmogorovSmirnov statistic. We also show how to tell whether the data follow a powerlaw distribution at all, defining quantitative measures that indicate when the power law is a reasonable fit to the data and when it is not. We demonstrate these methods by applying them to twentyfour realworld data sets from a range of different disciplines. Each of the data sets has been conjectured previously to follow a powerlaw distribution. In some cases we find these conjectures to be consistent with the data while in others the power law is ruled out.
Nearoptimal sensor placements in gaussian processes
 In ICML
, 2005
"... When monitoring spatial phenomena, which can often be modeled as Gaussian processes (GPs), choosing sensor locations is a fundamental task. There are several common strategies to address this task, for example, geometry or disk models, placing sensors at the points of highest entropy (variance) in t ..."
Abstract

Cited by 174 (27 self)
 Add to MetaCart
When monitoring spatial phenomena, which can often be modeled as Gaussian processes (GPs), choosing sensor locations is a fundamental task. There are several common strategies to address this task, for example, geometry or disk models, placing sensors at the points of highest entropy (variance) in the GP model, and A, D, or Eoptimal design. In this paper, we tackle the combinatorial optimization problem of maximizing the mutual information between the chosen locations and the locations which are not selected. We prove that the problem of finding the configuration that maximizes mutual information is NPcomplete. To address this issue, we describe a polynomialtime approximation that is within (1 − 1/e) of the optimum by exploiting the submodularity of mutual information. We also show how submodularity can be used to obtain online bounds, and design branch and bound search procedures. We then extend our algorithm to exploit lazy evaluations and local structure in the GP, yielding significant speedups. We also extend our approach to find placements which are robust against node failures and uncertainties in the model. These extensions are again associated with rigorous theoretical approximation guarantees, exploiting the submodularity of the objective function. We demonstrate the advantages of our approach towards optimizing mutual information in a very extensive empirical study on two realworld data sets.
Practical privacy: the sulq framework
 In PODS ’05: Proceedings of the twentyfourth ACM SIGMODSIGACTSIGART symposium on Principles of database systems
, 2005
"... We consider a statistical database in which a trusted administrator introduces noise to the query responses with the goal of maintaining privacy of individual database entries. In such a database, a query consists of a pair (S, f) where S is a set of rows in the database and f is a function mapping ..."
Abstract

Cited by 158 (34 self)
 Add to MetaCart
We consider a statistical database in which a trusted administrator introduces noise to the query responses with the goal of maintaining privacy of individual database entries. In such a database, a query consists of a pair (S, f) where S is a set of rows in the database and f is a function mapping database rows to {0, 1}. The true answer is P i∈S f(di), and a noisy version is released as the response to the query. Results of Dinur, Dwork, and Nissim show that a strong form of privacy can be maintained using a surprisingly small amount of noise – much less than the sampling error – provided the total number of queries is sublinear in the number of database rows. We call this query and (slightly) noisy reply the SuLQ (SubLinear Queries) primitive. The assumption of sublinearity becomes reasonable as databases grow increasingly large. We extend this work in two ways. First, we modify the privacy analysis to realvalued functions f and arbitrary row types, as a consequence greatly improving the bounds on noise required for privacy. Second, we examine the computational power of the SuLQ primitive. We show that it is very powerful indeed, in that slightly noisy versions of the following computations can be carried out with very few invocations of the primitive: principal component analysis, k means clustering, the Perceptron Algorithm, the ID3 algorithm, and (apparently!) all algorithms that operate in the in the statistical query learning model [11].
Towards highly reliable enterprise network services via inference of multilevel dependencies
 IN SIGCOMM
, 2007
"... Localizing the sources of performance problems in large enterprise networks is extremely challenging. Dependencies are numerous, complex and inherently multilevel, spanning hardware and software components across the network and the computing infrastructure. To exploit these dependencies for fast, ..."
Abstract

Cited by 112 (10 self)
 Add to MetaCart
Localizing the sources of performance problems in large enterprise networks is extremely challenging. Dependencies are numerous, complex and inherently multilevel, spanning hardware and software components across the network and the computing infrastructure. To exploit these dependencies for fast, accurate problem localization, we introduce an Inference Graph model, which is welladapted to userperceptible problems rooted in conditions giving rise to both partial service degradation and hard faults. Further, we introduce the Sherlock system to discover Inference Graphs in the operational enterprise, infer critical attributes, and then leverage the result to automatically detect and localize problems. To illuminate strengths and limitations of the approach, we provide results from a prototype deployment in a large enterprise network, as well as from testbed emulations and simulations. In particular, we find that taking into account multilevel structure leads to a 30 % improvement in fault localization, as compared to twolevel approaches.
Gaussian process dynamical models for human motion
 IEEE Trans. Pattern Anal. Machine Intell
, 2007
"... Abstract—We introduce Gaussian process dynamical models (GPDMs) for nonlinear time series analysis, with applications to learning models of human pose and motion from highdimensional motion capture data. A GPDM is a latent variable model. It comprises a lowdimensional latent space with associated d ..."
Abstract

Cited by 86 (4 self)
 Add to MetaCart
Abstract—We introduce Gaussian process dynamical models (GPDMs) for nonlinear time series analysis, with applications to learning models of human pose and motion from highdimensional motion capture data. A GPDM is a latent variable model. It comprises a lowdimensional latent space with associated dynamics, as well as a map from the latent space to an observation space. We marginalize out the model parameters in closed form by using Gaussian process priors for both the dynamical and the observation mappings. This results in a nonparametric model for dynamical systems that accounts for uncertainty in the model. We demonstrate the approach and compare four learning algorithms on human motion capture data, in which each pose is 50dimensional. Despite the use of small data sets, the GPDM learns an effective representation of the nonlinear dynamics in these spaces. Index Terms—Machine learning, motion, tracking, animation, stochastic processes, time series analysis. 1
Gaussian process dynamical models
 In NIPS
, 2006
"... This paper introduces Gaussian Process Dynamical Models (GPDM) for nonlinear time series analysis. A GPDM comprises a lowdimensional latent space with associated dynamics, and a map from the latent space to an observation space. We marginalize out the model parameters in closedform, using Gaussian ..."
Abstract

Cited by 84 (7 self)
 Add to MetaCart
This paper introduces Gaussian Process Dynamical Models (GPDM) for nonlinear time series analysis. A GPDM comprises a lowdimensional latent space with associated dynamics, and a map from the latent space to an observation space. We marginalize out the model parameters in closedform, using Gaussian Process (GP) priors for both the dynamics and the observation mappings. This results in a nonparametric model for dynamical systems that accounts for uncertainty in the model. We demonstrate the approach on human motion capture data in which each pose is 62dimensional. Despite the use of small data sets, the GPDM learns an effective representation of the nonlinear dynamics in these spaces. Webpage:
Comparing Dynamic Causal Models
 NEUROIMAGE
, 2004
"... This article describes the use of Bayes factors for comparing Dynamic Causal Models (DCMs). DCMs are used to make inferences about effective connectivity from functional Magnetic Resonance Imaging (fMRI) data. These inferences, however, are contingent upon assumptions about model structure, that is, ..."
Abstract

Cited by 81 (34 self)
 Add to MetaCart
This article describes the use of Bayes factors for comparing Dynamic Causal Models (DCMs). DCMs are used to make inferences about effective connectivity from functional Magnetic Resonance Imaging (fMRI) data. These inferences, however, are contingent upon assumptions about model structure, that is, the connectivity pattern between the regions included in the model. Given the current lack of detailed knowledge on anatomical connectivity in the human brain, there are often considerable degrees of freedom when defining the connectional structure of DCMs. In addition, many plausible scientific hypotheses may exist about which connections are changed by experimental manipulation, and a formal procedure for directly comparing these competing hypotheses is highly desirable. In this article, we show how Bayes factors can be used to guide choices about model structure, both with regard to the intrinsic connectivity pattern and the contextual modulation of individual connections. The combined use of Bayes factors and DCM thus allows one to evaluate competing scientific theories about the architecture of largescale neural networks and the neuronal interactions that mediate perception and cognition.
Banburismus and the brain: Decoding the relationship between sensory stimuli, decisions, and reward,” Neuron vol 36
, 2002
"... named after the town of Banbury where they printed special sheets of paper that allowed them to carry out the computations that were central to this process. In the second section, we apply this framework to simple, twoalternative decisions on responsetime perceptual tasks. We focus on how the cri ..."
Abstract

Cited by 79 (1 self)
 Add to MetaCart
named after the town of Banbury where they printed special sheets of paper that allowed them to carry out the computations that were central to this process. In the second section, we apply this framework to simple, twoalternative decisions on responsetime perceptual tasks. We focus on how the critical components of the theory might be implemented in the brain. Central to this neural implementation is the relationship between the rate of reward and the decision rule that determines how and when to read out the sensory evidence. In the third section, we summarize recent experimental findings that support this model. This article relates a theoretical framework developed by British codebreakers in World War II to the neural Background: Banburismus at Bletchley Park computations thought to be responsible for forming To illustrate the framework developed by Turing, concategorical
A maximum entropy model of phonotactics and phonotactic learning
, 2006
"... The study of phonotactics (e.g., the ability of English speakers to distinguish possible words like blick from impossible words like *bnick) is a central topic in phonology. We propose a theory of phonotactic grammars and a learning algorithm that constructs such grammars from positive evidence. Our ..."
Abstract

Cited by 76 (13 self)
 Add to MetaCart
The study of phonotactics (e.g., the ability of English speakers to distinguish possible words like blick from impossible words like *bnick) is a central topic in phonology. We propose a theory of phonotactic grammars and a learning algorithm that constructs such grammars from positive evidence. Our grammars consist of constraints that are assigned numerical weights according to the principle of maximum entropy. Possible words are assessed by these grammars based on the weighted sum of their constraint violations. The learning algorithm yields grammars that can capture both categorical and gradient phonotactic patterns. The algorithm is not provided with any constraints in advance, but uses its own resources to form constraints and weight them. A baseline model, in which Universal Grammar is reduced to a feature set and an SPEstyle constraint format, suffices to learn many phonotactic phenomena. In order to learn nonlocal phenomena such as stress and vowel harmony, it is necessary to augment the model with autosegmental tiers and metrical grids. Our results thus offer novel, learningtheoretic support for such representations. We apply the model to English syllable onsets, Shona vowel harmony, quantityinsensitive stress typology, and the full phonotactics of Wargamay, showing that the learned grammars capture the distributional generalizations of these languages and accurately predict the findings of a phonotactic experiment.