Tweedie’s Formula and Selection Bias
Abstract

Cited by 14 (0 self)
We suppose that the statistician observes some large number of estimates zi, each with its own unobserved expectation parameter µi. The largest few of the zi’s are likely to substantially overestimate their corresponding µi’s, this being an example of selection bias, or regression to the mean. Tweedie’s formula, first reported by Robbins in 1956, offers a simple empirical Bayes approach for correcting selection bias. This paper investigates its merits and limitations. In addition to the methodology, Tweedie’s formula raises more general questions concerning empirical Bayes theory, discussed here as “relevance” and “empirical Bayes information.”
Keywords: Bayesian relevance, empirical Bayes information, James–Stein, false discovery rates, regret, winner’s curse
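The correction the abstract describes has a compact form: for z | µ ~ N(µ, σ²) with marginal density f, Tweedie's formula gives E[µ | z] = z + σ² d/dz log f(z). A minimal empirical Bayes sketch (the simulation setup and kernel density estimator are assumptions for illustration, not the paper's method):

```python
import numpy as np
from scipy.stats import gaussian_kde

# Simulated setup (my assumption): z_i | mu_i ~ N(mu_i, 1).
rng = np.random.default_rng(0)
n = 5000
mu = rng.normal(0.0, 2.0, n)        # unobserved expectation parameters
z = mu + rng.normal(0.0, 1.0, n)    # observed estimates, sigma = 1

# Estimate the marginal density f(z), then the score d/dz log f(z)
# by central finite differences.
kde = gaussian_kde(z)
eps = 1e-3
score = (np.log(kde(z + eps)) - np.log(kde(z - eps))) / (2 * eps)

# Tweedie's formula: E[mu | z] = z + sigma^2 * d/dz log f(z), sigma = 1 here.
mu_hat = z + score

# The largest z's sit in the right tail, where the score is negative,
# so they are shrunk back toward the bulk, countering selection bias.
top = np.argsort(z)[-10:]
```

On this simulation the corrected estimates have lower mean squared error than the raw z's and the top-ranked values are pulled downward; with σ unknown it would have to be estimated separately.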
Receptive field inference with localized priors
 PLoS Comput Biol
Abstract

Cited by 13 (6 self)
The linear receptive field describes a mapping from sensory stimuli to a one-dimensional variable governing a neuron’s spike response. However, traditional receptive field estimators such as the spike-triggered average converge slowly and often require large amounts of data. Bayesian methods seek to overcome this problem by biasing estimates towards solutions that are more likely a priori, typically those with small, smooth, or sparse coefficients. Here we introduce a novel Bayesian receptive field estimator designed to incorporate locality, a powerful form of prior information about receptive field structure. The key to our approach is a hierarchical receptive field model that flexibly adapts to localized structure in both space-time and spatiotemporal frequency, using an inference method known as empirical Bayes. We refer to our method as automatic locality determination (ALD), and show that it can accurately recover various types of smooth, sparse, and localized receptive fields. We apply ALD to neural data from retinal ganglion cells and V1 simple cells, and find it achieves error rates several times lower than standard estimators. Thus, estimates of comparable accuracy can be achieved with substantially less data. Finally, we introduce a computationally efficient Markov Chain Monte Carlo (MCMC) algorithm
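ALD's localized prior is richer than anything sketched here, but its core empirical Bayes step — choosing prior hyperparameters by maximizing the marginal likelihood (evidence), then taking the posterior mean as the receptive field estimate — can be illustrated with a plain isotropic Gaussian prior (the simulated stimuli, filter, and all names below are assumptions for illustration):

```python
import numpy as np

# Toy data (my assumption): white-noise stimuli X, a localized true filter.
rng = np.random.default_rng(3)
n, p = 100, 50
X = rng.normal(size=(n, p))
w_true = np.zeros(p)
w_true[20:30] = 1.0                      # "local" support
y = X @ w_true + rng.normal(0.0, 1.0, n)

sigma2 = 1.0  # assumed known noise variance

def log_evidence(alpha):
    """Log marginal likelihood of y under the prior w ~ N(0, I / alpha)."""
    C = sigma2 * np.eye(n) + (1.0 / alpha) * (X @ X.T)
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * (y @ np.linalg.solve(C, y) + logdet + n * np.log(2 * np.pi))

# Empirical Bayes step: pick the prior precision that maximizes the evidence.
alphas = np.logspace(-3, 3, 25)
alpha = alphas[int(np.argmax([log_evidence(a) for a in alphas]))]

# Posterior mean = ridge estimate with penalty sigma2 * alpha.
w_hat = np.linalg.solve(X.T @ X + sigma2 * alpha * np.eye(p), X.T @ y)
```

ALD replaces the isotropic prior with one whose covariance encodes locality in space-time and frequency, but the evidence-maximization machinery is the same shape.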
Bayesian multitask inverse reinforcement learning
Abstract

Cited by 11 (3 self)
Abstract. We generalise the problem of inverse reinforcement learning to multiple tasks, from a set of demonstrations. Each demonstration may represent one expert trying to solve a different task. Alternatively, one may see each demonstration as given by a different expert trying to solve the same task. Our main technical contribution is to solve the problem by formalising it as statistical preference elicitation, via a number of structured priors, whose form captures our biases about the relatedness of different tasks or expert policies. We show that our methodology allows us not only to learn efficiently from multiple experts but also to effectively differentiate between the goals of each. Possible applications include analysing the intrinsic motivations of subjects in behavioural experiments and imitation learning from multiple teachers.
Context-specific independence mixture modeling for positional weight matrices
 BIOINFORMATICS, VOL. 22 NO. 14 2006, PAGES E166–E173
, 2006
Learning Large-Scale Graphical Gaussian Models from Genomic Data
 In Science of Complex Networks: From Biology to the Internet and WWW
, 2005
Abstract

Cited by 10 (0 self)
The inference and modeling of network-like structures in genomic data is of prime importance in systems biology. Complex stochastic associations and interdependencies can very generally be described as a graphical model. However, the paucity of available samples in current high-throughput experiments renders learning graphical models from genomic data, such as microarray expression profiles, a challenging problem. Here we review several recently developed approaches to small-sample inference in graphical Gaussian models and discuss strategies to cope with the high dimensionality of functional genomics data. Particular emphasis is put on regularization methods and an empirical Bayes network inference procedure.
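The small-sample difficulty is concrete: with fewer samples than variables the sample covariance is singular, so regularization is needed before the partial correlations that define a graphical Gaussian model can be read off the precision matrix. A minimal sketch of one such strategy, using Ledoit-Wolf shrinkage (the data, cutoff, and choice of shrinkage estimator are my assumptions, standing in for the specific methods the review covers):

```python
import numpy as np
from sklearn.covariance import LedoitWolf

# "Small n, large p" toy data (my assumption, mimicking the genomic setting).
rng = np.random.default_rng(1)
n, p = 20, 40
X = rng.normal(size=(n, p))

# Shrinkage covariance: well conditioned even though the plain sample
# covariance is singular here (n < p).
cov = LedoitWolf().fit(X).covariance_
prec = np.linalg.inv(cov)

# Partial correlations, read off the precision matrix, parameterize the
# graphical Gaussian model: zero partial correlation = missing edge.
d = np.sqrt(np.diag(prec))
pcor = -prec / np.outer(d, d)
np.fill_diagonal(pcor, 1.0)

# Candidate edges: pairs whose partial correlation exceeds an arbitrary
# cutoff (the reviewed methods use principled thresholds instead).
edges = np.argwhere(np.triu(np.abs(pcor) > 0.2, k=1))
```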
Bayesian adaptive inference and adaptive training
 IEEE Transactions Speech and Audio Processing
, 2007
Abstract

Cited by 9 (7 self)
Abstract—Large-vocabulary speech recognition systems are often built using found data, such as broadcast news. In contrast to carefully collected data, found data normally contains multiple acoustic conditions, such as speaker or environmental noise. Adaptive training is a powerful approach to building systems on such data. Here, transforms are used to represent the different acoustic conditions, and a canonical model is then trained given this set of transforms. This paper describes a Bayesian framework for adaptive training and inference. This framework addresses some limitations of standard maximum-likelihood approaches. In contrast to the standard approach, the adaptively trained system can be used directly in unsupervised inference, rather than having to rely on initial hypotheses being present. In addition, robust recognition performance can be obtained with limited adaptation data. The limited-data problem often occurs in testing, as there is no control over the amount of adaptation data available. In contrast, for adaptive training, it is possible to control the system complexity to reflect the available data, so the standard point estimates may be used. As the integral associated with Bayesian adaptive inference is intractable, various marginalization approximations are described, including a variational Bayes approximation. Both batch and incremental modes of adaptive inference are discussed. These approaches are applied to adaptive training of maximum-likelihood linear regression and evaluated on a large-vocabulary speech recognition task. Bayesian adaptive inference is shown to significantly outperform standard approaches. Index Terms—Adaptive training, Bayesian adaptation, Bayesian inference, incremental, variational Bayes.
Skellam shrinkage: Wavelet-based intensity estimation for inhomogeneous Poisson data
Abstract

Cited by 9 (7 self)
The ubiquity of integrating detectors in imaging and other applications implies that a variety of real-world data are well modeled as Poisson random variables whose means are in turn proportional to an underlying vector-valued signal of interest. In this article, we first show how the so-called Skellam distribution arises from the fact that Haar wavelet and filterbank transform coefficients corresponding to measurements of this type are distributed as sums and differences of Poisson counts. We then provide two main theorems on Skellam shrinkage, one showing the near-optimality of shrinkage in the Bayesian setting and the other providing for unbiased risk estimation in a frequentist context. These results serve to yield new estimators in the Haar transform domain, including an unbiased risk estimate for shrinkage of Haar–Fisz variance-stabilized data, along with accompanying low-complexity algorithms for inference. We conclude with a simulation study demonstrating the efficacy of our Skellam
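The abstract's starting observation can be checked directly: a Haar detail coefficient of Poisson data is a difference of independent Poisson counts, and such a difference is Skellam-distributed. A small sketch (the rate values are my choice for illustration):

```python
import numpy as np
from scipy.stats import skellam

# Two adjacent Poisson measurements (rates chosen for illustration).
rng = np.random.default_rng(2)
lam1, lam2 = 7.0, 3.0
x1 = rng.poisson(lam1, 200_000)
x2 = rng.poisson(lam2, 200_000)

# Unnormalized Haar detail coefficient: the difference of the two counts.
d = x1 - x2

# Its exact distribution is Skellam(lam1, lam2); compare pmfs at k = 2.
emp = np.mean(d == 2)
exact = skellam.pmf(2, lam1, lam2)
```

The empirical frequency matches the Skellam pmf to within sampling error; the paper's shrinkage rules then operate on coefficients with exactly this distribution.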
Modeling for Optimal Probability Prediction
 In Proceedings of the Nineteenth International Conference on Machine Learning
, 2002
Abstract

Cited by 9 (0 self)
We present a general modeling method for optimal probability prediction over future observations, in which model dimensionality is determined as a natural byproduct. This new method yields several estimators, and we establish theoretically that they are optimal (either overall or under stated restrictions) when the number of free parameters is infinite.
Adaptive Training for Large Vocabulary Continuous Speech Recognition
, 2006
Abstract

Cited by 8 (2 self)
Summary: In recent years, there has been a trend towards training large vocabulary continuous speech recognition (LVCSR) systems on a large amount of found data. Found data is recorded from spontaneous speech without careful control of the recording acoustic conditions, for example, conversational telephone speech. Hence, it typically has greater variability in terms of speaker and acoustic conditions than specially collected data. Thus, in addition to the desired speech variability required to discriminate between words, it also includes various non-speech variabilities, for example, changes of speaker or acoustic environment. The standard approach to handling this type of data is to train hidden Markov models (HMMs) on the whole data set as if all data came from a single acoustic condition. This is referred to as multi-style training, for example speaker-independent training. Effectively, the non-speech variabilities are ignored. Though good performance has been obtained with multi-style systems, they model all variabilities together. Improvement may be obtained if the two types of variability in the found data are modelled separately. Adaptive training has been proposed for this purpose. In contrast to multi-style training, a set of transforms is used to represent the non-speech variabilities. A canonical