Results 1  10
of
93
Estimation of probabilities from sparse data for the language model component of a speech recognizer
 IEEE Transactions on Acoustics, Speech and Signal Processing
, 1987
"... AbstractThe description of a novel type of rngram language model is given. The model offers, via a nonlinear recursive procedure, a computation and space efficient solution to the problem of estimating probabilities from sparse data. This solution compares favorably to other proposed methods. Wh ..."
Abstract

Cited by 701 (2 self)
 Add to MetaCart
(Show Context)
AbstractThe description of a novel type of rngram language model is given. The model offers, via a nonlinear recursive procedure, a computation and space efficient solution to the problem of estimating probabilities from sparse data. This solution compares favorably to other proposed methods. While the method has been developed for and successfully implemented in the IBM Real Time Speech Recognizers, its generality makes it applicable in other areas where the problem of estimating probabilities from sparse data arises. Sparseness of data is an inherent property of any real text, and it is a problem that one always encounters while collecting frequency statistics on words and word sequences (mgrams) from a text of finite size. This means that even for a very large data collection, the maximum likelihood estimation method does not allow Turing’s estimate PT for a probability of a word (mgram) which occurred in the sample r times is r* PT = where r We call a procedure of replacing a count r with a modified count r ’ “discounting ” and a ratio rt/r a discount coefficient dr. When r ’ = r*, we have Turing’s discounting. Let us denote the mgram wl, *.., w, as wy and the number of times it occurred in the sample text as c(wT). Then the maximum likelihood estimate is
Stock Return Predictability and Model Uncertainty
, 2002
"... We use Bayesian model averaging to analyze the sample evidence on return predictability in the presence of model uncertainty. The analysis reveals insample and outofsample predictability, and shows that the outofsample performance of the Bayesian approach is superior to that of model selecti ..."
Abstract

Cited by 122 (4 self)
 Add to MetaCart
We use Bayesian model averaging to analyze the sample evidence on return predictability in the presence of model uncertainty. The analysis reveals insample and outofsample predictability, and shows that the outofsample performance of the Bayesian approach is superior to that of model selection criteria. We find that term and market premia are robust predictors. Moreover, smallcap value stocks appear more predictable than largecap growth stocks. We also investigate the implications of model uncertainty from investment management perspectives. We show that model uncertainty is more important than estimation risk, and investors who discard model uncertainty face large utility losses.
The interplay of bayesian and frequentist analysis
 Statist. Sci
, 2004
"... Statistics has struggled for nearly a century over the issue of whether the Bayesian or frequentist paradigm is superior. This debate is far from over and, indeed, should continue, since there are fundamental philosophical and pedagogical issues at stake. At the methodological level, however, the fi ..."
Abstract

Cited by 30 (0 self)
 Add to MetaCart
Statistics has struggled for nearly a century over the issue of whether the Bayesian or frequentist paradigm is superior. This debate is far from over and, indeed, should continue, since there are fundamental philosophical and pedagogical issues at stake. At the methodological level, however, the fight has become considerably muted, with the recognition that each approach has a great deal to contribute to statistical practice and each is actually essential for full development of the other approach. In this article, we embark upon a rather idiosyncratic walk through some of these issues. Key words and phrases: Admissibility; Bayesian model checking; conditional frequentist; confidence intervals; consistency; coverage; design; hierarchical models; nonparametric
Bayesian adaptive learning of the parameters of hidden Markov model for speech recognition
 Transactions On Speech and Audio Processing
, 1995
"... ..."
Learning to be Bayesian without supervision
 in Adv. Neural Information Processing Systems (NIPS*06
, 2007
"... Bayesian estimators are defined in terms of the posterior distribution. Typically, this is written as the product of the likelihood function and a prior probability density, both of which are assumed to be known. But in many situations, the prior density is not known, and is difficult to learn from ..."
Abstract

Cited by 19 (8 self)
 Add to MetaCart
(Show Context)
Bayesian estimators are defined in terms of the posterior distribution. Typically, this is written as the product of the likelihood function and a prior probability density, both of which are assumed to be known. But in many situations, the prior density is not known, and is difficult to learn from data since one does not have access to uncorrupted samples of the variable being estimated. We show that for a wide variety of observation models, the Bayes least squares (BLS) estimator may be formulated without explicit reference to the prior. Specifically, we derive a direct expression for the estimator, and a related expression for the mean squared estimation error, both in terms of the density of the observed measurements. Each of these priorfree formulations allows us to approximate the estimator given a sufficient amount of observed data. We use the first form to develop practical nonparametric approximations of BLS estimators for several different observation processes, and the second form to develop a parametric family of estimators for use in the additive Gaussian noise case. We examine the empirical performance of these estimators as a function of the amount of observed data. 1
General empirical Bayes wavelet methods and exactly adaptive minimax estimation

, 2005
"... In many statistical problems, stochastic signals can be represented as a sequence of noisy wavelet coefficients. In this paper, we develop general empirical Bayes methods for the estimation of true signal. Our estimators approximate certain oracle separable rules and achieve adaptation to ideal risk ..."
Abstract

Cited by 19 (1 self)
 Add to MetaCart
In many statistical problems, stochastic signals can be represented as a sequence of noisy wavelet coefficients. In this paper, we develop general empirical Bayes methods for the estimation of true signal. Our estimators approximate certain oracle separable rules and achieve adaptation to ideal risks and exact minimax risks in broad collections of classes of signals. In particular, our estimators are uniformly adaptive to the minimum risk of separable estimators and the exact minimax risks simultaneously in Besov balls of all smoothness and shape indices, and they are uniformly superefficient in convergence rates in all compact sets in Besov spaces with a finite secondary shape parameter. Furthermore, in classes nested between Besov balls of the same smoothness index, our estimators dominate threshold and James–Stein estimators within an infinitesimal fraction of the minimax risks. More general block empirical Bayes estimators are developed. Both white noise with drift and nonparametric regression are considered.
An empirical bayes approach to contextual region classification
 In CVPR
, 2009
"... This paper presents a nonparametric approach to labeling of local image regions that is inspired by recent developments in informationtheoretic denoising. The chief novelty of this approach rests in its ability to derive an unsupervised contextual prior over image classes from unlabeled test data. ..."
Abstract

Cited by 15 (0 self)
 Add to MetaCart
(Show Context)
This paper presents a nonparametric approach to labeling of local image regions that is inspired by recent developments in informationtheoretic denoising. The chief novelty of this approach rests in its ability to derive an unsupervised contextual prior over image classes from unlabeled test data. Labeled training data is needed only to learn a local appearance model for image patches (although additional supervisory information can optionally be incorporated when it is available). Instead of assuming a parametric prior such as a Markov random field for the class labels, the proposed approach uses the empirical Bayes technique of statistical inversion to recover a contextual model directly from the test data, either as a spatially varying or as a globally constant prior distribution over the classes in the image. Results on two challenging datasets convincingly demonstrate that useful contextual information can indeed be learned from unlabeled data. 1.
When did Bayesian inference become “Bayesian"?
 BAYESIAN ANALYSIS
, 2006
"... While Bayes’ theorem has a 250year history, and the method of inverse probability that flowed from it dominated statistical thinking into the twentieth century, the adjective “Bayesian” was not part of the statistical lexicon until relatively recently. This paper provides an overview of key Bayesi ..."
Abstract

Cited by 14 (1 self)
 Add to MetaCart
While Bayes’ theorem has a 250year history, and the method of inverse probability that flowed from it dominated statistical thinking into the twentieth century, the adjective “Bayesian” was not part of the statistical lexicon until relatively recently. This paper provides an overview of key Bayesian developments, beginning with Bayes’ posthumously published 1763 paper and continuing up through approximately 1970, including the period of time when “Bayesian” emerged as the label of choice for those who advocated Bayesian methods.
Bayesian adaptive inference and adaptive training
 IEEE Transactions Speech and Audio Processing
, 2007
"... Abstract—Largevocabulary speech recognition systems are often built using found data, such as broadcast news. In contrast to carefully collected data, found data normally contains multiple acoustic conditions, such as speaker or environmental noise. Adaptive training is a powerful approach to build ..."
Abstract

Cited by 9 (7 self)
 Add to MetaCart
Abstract—Largevocabulary speech recognition systems are often built using found data, such as broadcast news. In contrast to carefully collected data, found data normally contains multiple acoustic conditions, such as speaker or environmental noise. Adaptive training is a powerful approach to build systems on such data. Here, transforms are used to represent the different acoustic conditions, and then a canonical model is trained given this set of transforms. This paper describes a Bayesian framework for adaptive training and inference. This framework addresses some limitations of standard maximumlikelihood approaches. In contrast to the standard approach, the adaptively trained system can be directly used in unsupervised inference, rather than having to rely on initial hypotheses being present. In addition, for limited adaptation data, robust recognition performance can be obtained. The limited data problem often occurs in testing as there is no control over the amount of the adaptation data available. In contrast, for adaptive training, it is possible to control the system complexity to reflect the available data. Thus, the standard point estimates may be used. As the integral associated with Bayesian adaptive inference is intractable, various marginalization approximations are described, including a variational Bayes approximation. Both batch and incremental modes of adaptive inference are discussed. These approaches are applied to adaptive training of maximumlikelihood linear regression and evaluated on a largevocabulary speech recognition task. Bayesian adaptive inference is shown to significantly outperform standard approaches. Index Terms—Adaptive training, Bayesian adaptation, Bayesian inference, incremental, variational Bayes.