Results 1  10
of
136
Estimation of probabilities from sparse data for the language model component of a speech recognizer
 IEEE Transactions on Acoustics, Speech and Signal Processing
, 1987
"... AbstractThe description of a novel type of rngram language model is given. The model offers, via a nonlinear recursive procedure, a computation and space efficient solution to the problem of estimating probabilities from sparse data. This solution compares favorably to other proposed methods. Wh ..."
Abstract

Cited by 790 (2 self)
 Add to MetaCart
(Show Context)
AbstractThe description of a novel type of rngram language model is given. The model offers, via a nonlinear recursive procedure, a computation and space efficient solution to the problem of estimating probabilities from sparse data. This solution compares favorably to other proposed methods. While the method has been developed for and successfully implemented in the IBM Real Time Speech Recognizers, its generality makes it applicable in other areas where the problem of estimating probabilities from sparse data arises. Sparseness of data is an inherent property of any real text, and it is a problem that one always encounters while collecting frequency statistics on words and word sequences (mgrams) from a text of finite size. This means that even for a very large data collection, the maximum likelihood estimation method does not allow Turing’s estimate PT for a probability of a word (mgram) which occurred in the sample r times is r* PT = where r We call a procedure of replacing a count r with a modified count r ’ “discounting ” and a ratio rt/r a discount coefficient dr. When r ’ = r*, we have Turing’s discounting. Let us denote the mgram wl, *.., w, as wy and the number of times it occurred in the sample text as c(wT). Then the maximum likelihood estimate is
Stock Return Predictability and Model Uncertainty
, 2002
"... We use Bayesian model averaging to analyze the sample evidence on return predictability in the presence of model uncertainty. The analysis reveals insample and outofsample predictability, and shows that the outofsample performance of the Bayesian approach is superior to that of model selecti ..."
Abstract

Cited by 152 (5 self)
 Add to MetaCart
We use Bayesian model averaging to analyze the sample evidence on return predictability in the presence of model uncertainty. The analysis reveals insample and outofsample predictability, and shows that the outofsample performance of the Bayesian approach is superior to that of model selection criteria. We find that term and market premia are robust predictors. Moreover, smallcap value stocks appear more predictable than largecap growth stocks. We also investigate the implications of model uncertainty from investment management perspectives. We show that model uncertainty is more important than estimation risk, and investors who discard model uncertainty face large utility losses.
The interplay of bayesian and frequentist analysis
 Statist. Sci
, 2004
"... Statistics has struggled for nearly a century over the issue of whether the Bayesian or frequentist paradigm is superior. This debate is far from over and, indeed, should continue, since there are fundamental philosophical and pedagogical issues at stake. At the methodological level, however, the fi ..."
Abstract

Cited by 48 (0 self)
 Add to MetaCart
Statistics has struggled for nearly a century over the issue of whether the Bayesian or frequentist paradigm is superior. This debate is far from over and, indeed, should continue, since there are fundamental philosophical and pedagogical issues at stake. At the methodological level, however, the fight has become considerably muted, with the recognition that each approach has a great deal to contribute to statistical practice and each is actually essential for full development of the other approach. In this article, we embark upon a rather idiosyncratic walk through some of these issues. Key words and phrases: Admissibility; Bayesian model checking; conditional frequentist; confidence intervals; consistency; coverage; design; hierarchical models; nonparametric
Bayesian adaptive learning of the parameters of hidden Markov model for speech recognition
 IEEE Trans. on SAP
"... AbstractIn this paper, a theoretical framework for Bayesian adaptive training of the parameters of discrete hidden Markov model (DHMM) and of semicontinuous HMM (SCHMM) with Gaussian mixture state observation densities is presented. In addition to formulating the forwardbackward MAP (maximum a po ..."
Abstract

Cited by 34 (8 self)
 Add to MetaCart
(Show Context)
AbstractIn this paper, a theoretical framework for Bayesian adaptive training of the parameters of discrete hidden Markov model (DHMM) and of semicontinuous HMM (SCHMM) with Gaussian mixture state observation densities is presented. In addition to formulating the forwardbackward MAP (maximum a posterion’) and the segmental MAP algorithms for estimating the above HMM parameters, a computationally efficient segmental quasiBayes algorithm for estimating the statespecific mixture coefficients in SCHMM is developed. For estimating the parameters of the prior densities, a new empirical Bayes method based on the moment estimates is also proposed. The MAP algorithms and the prior parameter specification are directly applicable to training speaker adaptive HMM’s. Practical issues related to the use of the proposed techniques for HMMbased speaker adaptation are studied. The proposed MAP algorithms are shown to be effective especially in the cases in which the training or adaptation data are limited. I.
GENERAL MAXIMUM LIKELIHOOD EMPIRICAL BAYES ESTIMATION OF NORMAL MEANS
, 908
"... We propose a general maximum likelihood empirical Bayes (GMLEB) method for the estimation of a mean vector based on observations with i.i.d. normal errors. We prove that under mild moment conditions on the unknown means, the average mean squared error (MSE) of the GMLEB is within an infinitesimal f ..."
Abstract

Cited by 26 (1 self)
 Add to MetaCart
(Show Context)
We propose a general maximum likelihood empirical Bayes (GMLEB) method for the estimation of a mean vector based on observations with i.i.d. normal errors. We prove that under mild moment conditions on the unknown means, the average mean squared error (MSE) of the GMLEB is within an infinitesimal fraction of the minimum average MSE among all separable estimators which use a single deterministic estimating function on individual observations, provided that the risk is of greater order than (log n) 5 /n. We also prove that the GMLEB is uniformly approximately minimax in regular and weak ℓp balls when the order of the lengthnormalized norm of the unknown means is between (log n) κ1 /n
When did Bayesian inference become “Bayesian"?
 BAYESIAN ANALYSIS
, 2006
"... While Bayes’ theorem has a 250year history, and the method of inverse probability that flowed from it dominated statistical thinking into the twentieth century, the adjective “Bayesian” was not part of the statistical lexicon until relatively recently. This paper provides an overview of key Bayesi ..."
Abstract

Cited by 26 (1 self)
 Add to MetaCart
While Bayes’ theorem has a 250year history, and the method of inverse probability that flowed from it dominated statistical thinking into the twentieth century, the adjective “Bayesian” was not part of the statistical lexicon until relatively recently. This paper provides an overview of key Bayesian developments, beginning with Bayes’ posthumously published 1763 paper and continuing up through approximately 1970, including the period of time when “Bayesian” emerged as the label of choice for those who advocated Bayesian methods.
General empirical Bayes wavelet methods and exactly adaptive minimax estimation

, 2005
"... In many statistical problems, stochastic signals can be represented as a sequence of noisy wavelet coefficients. In this paper, we develop general empirical Bayes methods for the estimation of true signal. Our estimators approximate certain oracle separable rules and achieve adaptation to ideal risk ..."
Abstract

Cited by 25 (3 self)
 Add to MetaCart
In many statistical problems, stochastic signals can be represented as a sequence of noisy wavelet coefficients. In this paper, we develop general empirical Bayes methods for the estimation of true signal. Our estimators approximate certain oracle separable rules and achieve adaptation to ideal risks and exact minimax risks in broad collections of classes of signals. In particular, our estimators are uniformly adaptive to the minimum risk of separable estimators and the exact minimax risks simultaneously in Besov balls of all smoothness and shape indices, and they are uniformly superefficient in convergence rates in all compact sets in Besov spaces with a finite secondary shape parameter. Furthermore, in classes nested between Besov balls of the same smoothness index, our estimators dominate threshold and James–Stein estimators within an infinitesimal fraction of the minimax risks. More general block empirical Bayes estimators are developed. Both white noise with drift and nonparametric regression are considered.
Learning to be Bayesian without supervision
 in Adv. Neural Information Processing Systems (NIPS*06
, 2007
"... Bayesian estimators are defined in terms of the posterior distribution. Typically, this is written as the product of the likelihood function and a prior probability density, both of which are assumed to be known. But in many situations, the prior density is not known, and is difficult to learn from ..."
Abstract

Cited by 25 (6 self)
 Add to MetaCart
(Show Context)
Bayesian estimators are defined in terms of the posterior distribution. Typically, this is written as the product of the likelihood function and a prior probability density, both of which are assumed to be known. But in many situations, the prior density is not known, and is difficult to learn from data since one does not have access to uncorrupted samples of the variable being estimated. We show that for a wide variety of observation models, the Bayes least squares (BLS) estimator may be formulated without explicit reference to the prior. Specifically, we derive a direct expression for the estimator, and a related expression for the mean squared estimation error, both in terms of the density of the observed measurements. Each of these priorfree formulations allows us to approximate the estimator given a sufficient amount of observed data. We use the first form to develop practical nonparametric approximations of BLS estimators for several different observation processes, and the second form to develop a parametric family of estimators for use in the additive Gaussian noise case. We examine the empirical performance of these estimators as a function of the amount of observed data. 1
An empirical bayes approach to contextual region classification
 In CVPR
, 2009
"... This paper presents a nonparametric approach to labeling of local image regions that is inspired by recent developments in informationtheoretic denoising. The chief novelty of this approach rests in its ability to derive an unsupervised contextual prior over image classes from unlabeled test data. ..."
Abstract

Cited by 18 (0 self)
 Add to MetaCart
(Show Context)
This paper presents a nonparametric approach to labeling of local image regions that is inspired by recent developments in informationtheoretic denoising. The chief novelty of this approach rests in its ability to derive an unsupervised contextual prior over image classes from unlabeled test data. Labeled training data is needed only to learn a local appearance model for image patches (although additional supervisory information can optionally be incorporated when it is available). Instead of assuming a parametric prior such as a Markov random field for the class labels, the proposed approach uses the empirical Bayes technique of statistical inversion to recover a contextual model directly from the test data, either as a spatially varying or as a globally constant prior distribution over the classes in the image. Results on two challenging datasets convincingly demonstrate that useful contextual information can indeed be learned from unlabeled data. 1.
Prediction in multilevel generalized linear models
 Journal of the Royal Statistical Society. Series A (Statistics in Society
"... Summary. We discuss prediction of random effects and of expected responses in multilevel generalized linear models. Prediction of random effects is useful for instance in small area estimation and disease mapping, effectiveness studies and model diagnostics. Prediction of expected responses is usefu ..."
Abstract

Cited by 15 (2 self)
 Add to MetaCart
Summary. We discuss prediction of random effects and of expected responses in multilevel generalized linear models. Prediction of random effects is useful for instance in small area estimation and disease mapping, effectiveness studies and model diagnostics. Prediction of expected responses is useful for planning, model interpretation and diagnostics. For prediction of random effects, we concentrate on empirical Bayes prediction and discuss three different kinds of standard errors; the posterior standard deviation and the marginal prediction error standard deviation (comparative standard errors) and the marginal sampling standard deviation (diagnostic standard error). Analytical expressions are available only for linear models and are provided in an appendix. For other multilevel generalized linear models we present approximations and suggest using parametric bootstrapping to obtain standard errors. We also discuss prediction of expectations of responses or probabilities for a new unit in a hypothetical cluster, or in a new (randomly sampled) cluster or in an existing cluster. The methods are implemented in gllamm and illustrated by applying them to survey data on reading proficiency of children nested in schools. Simulations are used to assess the performance of various predictions and associated standard errors for logistic randomintercept models under a range of conditions.