Results 1 - 10
of
41
Estimation of probabilities from sparse data for the language model component of a speech recognizer
- IEEE Transactions on Acoustics, Speech and Signal Processing
, 1987
"... Abstract-The description of a novel type of rn-gram language model is given. The model offers, via a nonlinear recursive procedure, a com-putation and space efficient solution to the problem of estimating prob-abilities from sparse data. This solution compares favorably to other proposed methods. Wh ..."
Abstract
-
Cited by 577 (1 self)
- Add to MetaCart
Abstract-The description of a novel type of rn-gram language model is given. The model offers, via a nonlinear recursive procedure, a com-putation and space efficient solution to the problem of estimating prob-abilities from sparse data. This solution compares favorably to other proposed methods. While the method has been developed for and suc-cessfully implemented in the IBM Real Time Speech Recognizers, its generality makes it applicable in other areas where the problem of es-timating probabilities from sparse data arises. Sparseness of data is an inherent property of any real text, and it is a problem that one always encounters while collecting fre-quency statistics on words and word sequences (m-grams) from a text of finite size. This means that even for a very large data col-lection, the maximum likelihood estimation method does not allow Turing’s estimate PT for a probability of a word (m-gram) which occurred in the sample r times is r* PT = where r We call a procedure of replacing a count r with a modified count r ’ “discounting ” and a ratio rt/r a discount coefficient dr. When r ’ = r*, we have Turing’s discounting. Let us denote the m-gram wl, *.., w, as wy and the number of times it occurred in the sample text as c(wT). Then the maxi-mum likelihood estimate is
Stock Return Predictability and Model Uncertainty
, 2002
"... We use Bayesian model averaging to analyze the sample evidence on return predictability in the presence of model uncertainty. The analysis reveals in-sample and out-of-sample predictability, and shows that the out-of-sample performance of the Bayesian approach is superior to that of model selecti ..."
Abstract
-
Cited by 53 (2 self)
- Add to MetaCart
We use Bayesian model averaging to analyze the sample evidence on return predictability in the presence of model uncertainty. The analysis reveals in-sample and out-of-sample predictability, and shows that the out-of-sample performance of the Bayesian approach is superior to that of model selection criteria. We find that term and market premia are robust predictors. Moreover, small-cap value stocks appear more predictable than large-cap growth stocks. We also investigate the implications of model uncertainty from investment management perspectives. We show that model uncertainty is more important than estimation risk, and investors who discard model uncertainty face large utility losses.
Bayesian Adaptive Learning of the Parameters of Hidden Markov Model for Speech Recognition
"... In this paper a theoretical framework for Bayesian adaptive learning of discrete HMM and semi-continuous one with Gaussian mixture state observation densities is presented. Corresponding to the well-known Baum-Welch and segmental k-means algorithms respectively for HMM training, formulations of MAP ..."
Abstract
-
Cited by 21 (3 self)
- Add to MetaCart
In this paper a theoretical framework for Bayesian adaptive learning of discrete HMM and semi-continuous one with Gaussian mixture state observation densities is presented. Corresponding to the well-known Baum-Welch and segmental k-means algorithms respectively for HMM training, formulations of MAP (maximum aposteriori) and segmental MAP estimation of HMM parameters are developed. Furthermore, a computationally efficient method of the segmental quasi-Bayes estimation for semi-continuous HMM is also presented. The important issue of prior density estimation is discussed and a simplified method of moment estimate is given. The method proposed in this paper will be applicable to some problems in HMM training for speech recognition such as sequential or batch training, model adaptation, and parameter smoothing, etc.
Learning to be Bayesian without supervision
- in Adv. Neural Information Processing Systems (NIPS*06
, 2007
"... Bayesian estimators are defined in terms of the posterior distribution. Typically, this is written as the product of the likelihood function and a prior probability density, both of which are assumed to be known. But in many situations, the prior density is not known, and is difficult to learn from ..."
Abstract
-
Cited by 13 (6 self)
- Add to MetaCart
Bayesian estimators are defined in terms of the posterior distribution. Typically, this is written as the product of the likelihood function and a prior probability density, both of which are assumed to be known. But in many situations, the prior density is not known, and is difficult to learn from data since one does not have access to uncorrupted samples of the variable being estimated. We show that for a wide variety of observation models, the Bayes least squares (BLS) estimator may be formulated without explicit reference to the prior. Specifically, we derive a direct expression for the estimator, and a related expression for the mean squared estimation error, both in terms of the density of the observed measurements. Each of these prior-free formulations allows us to approximate the estimator given a sufficient amount of observed data. We use the first form to develop practical nonparametric approximations of BLS estimators for several different observation processes, and the second form to develop a parametric family of estimators for use in the additive Gaussian noise case. We examine the empirical performance of these estimators as a function of the amount of observed data. 1
General empirical Bayes wavelet methods and exactly adaptive minimax estimation
-
, 2005
"... In many statistical problems, stochastic signals can be represented as a sequence of noisy wavelet coefficients. In this paper, we develop general empirical Bayes methods for the estimation of true signal. Our estimators approximate certain oracle separable rules and achieve adaptation to ideal risk ..."
Abstract
-
Cited by 10 (1 self)
- Add to MetaCart
In many statistical problems, stochastic signals can be represented as a sequence of noisy wavelet coefficients. In this paper, we develop general empirical Bayes methods for the estimation of true signal. Our estimators approximate certain oracle separable rules and achieve adaptation to ideal risks and exact minimax risks in broad collections of classes of signals. In particular, our estimators are uniformly adaptive to the minimum risk of separable estimators and the exact minimax risks simultaneously in Besov balls of all smoothness and shape indices, and they are uniformly superefficient in convergence rates in all compact sets in Besov spaces with a finite secondary shape parameter. Furthermore, in classes nested between Besov balls of the same smoothness index, our estimators dominate threshold and James–Stein estimators within an infinitesimal fraction of the minimax risks. More general block empirical Bayes estimators are developed. Both white noise with drift and nonparametric regression are considered.
Bayesian adaptive inference and adaptive training
- IEEE Transactions Speech and Audio Processing
, 2007
"... Abstract—Large-vocabulary speech recognition systems are often built using found data, such as broadcast news. In contrast to carefully collected data, found data normally contains multiple acoustic conditions, such as speaker or environmental noise. Adaptive training is a powerful approach to build ..."
Abstract
-
Cited by 7 (5 self)
- Add to MetaCart
Abstract—Large-vocabulary speech recognition systems are often built using found data, such as broadcast news. In contrast to carefully collected data, found data normally contains multiple acoustic conditions, such as speaker or environmental noise. Adaptive training is a powerful approach to build systems on such data. Here, transforms are used to represent the different acoustic conditions, and then a canonical model is trained given this set of transforms. This paper describes a Bayesian framework for adaptive training and inference. This framework addresses some limitations of standard maximum-likelihood approaches. In contrast to the standard approach, the adaptively trained system can be directly used in unsupervised inference, rather than having to rely on initial hypotheses being present. In addition, for limited adaptation data, robust recognition performance can be obtained. The limited data problem often occurs in testing as there is no control over the amount of the adaptation data available. In contrast, for adaptive training, it is possible to control the system complexity to reflect the available data. Thus, the standard point estimates may be used. As the integral associated with Bayesian adaptive inference is intractable, various marginalization approximations are described, including a variational Bayes approximation. Both batch and incremental modes of adaptive inference are discussed. These approaches are applied to adaptive training of maximum-likelihood linear regression and evaluated on a large-vocabulary speech recognition task. Bayesian adaptive inference is shown to significantly outperform standard approaches. Index Terms—Adaptive training, Bayesian adaptation, Bayesian inference, incremental, variational Bayes.
An empirical bayes approach to contextual region classification
- In CVPR
, 2009
"... This paper presents a nonparametric approach to labeling of local image regions that is inspired by recent developments in information-theoretic denoising. The chief novelty of this approach rests in its ability to derive an unsupervised contextual prior over image classes from unlabeled test data. ..."
Abstract
-
Cited by 7 (0 self)
- Add to MetaCart
This paper presents a nonparametric approach to labeling of local image regions that is inspired by recent developments in information-theoretic denoising. The chief novelty of this approach rests in its ability to derive an unsupervised contextual prior over image classes from unlabeled test data. Labeled training data is needed only to learn a local appearance model for image patches (although additional supervisory information can optionally be incorporated when it is available). Instead of assuming a parametric prior such as a Markov random field for the class labels, the proposed approach uses the empirical Bayes technique of statistical inversion to recover a contextual model directly from the test data, either as a spatially varying or as a globally constant prior distribution over the classes in the image. Results on two challenging datasets convincingly demonstrate that useful contextual information can indeed be learned from unlabeled data. 1.
Adaptive Training for Large Vocabulary Continuous Speech Recognition
, 2006
"... Summary In recent years, there has been a trend towards training large vocabulary continuous speech recognition (LVCSR) systems on a large amount of found data. Found data is recorded from spontaneous speech without careful control of the recording acoustic conditions, for example, conversational te ..."
Abstract
-
Cited by 6 (2 self)
- Add to MetaCart
Summary In recent years, there has been a trend towards training large vocabulary continuous speech recognition (LVCSR) systems on a large amount of found data. Found data is recorded from spontaneous speech without careful control of the recording acoustic conditions, for example, conversational telephone speech. Hence, it typically has greater variability in terms of speaker and acoustic conditions than specially collected data. Thus, in addition to the desired speech variability required to discriminate between words, it also includes various non-speech variabil-ities, for example, the change of speakers or acoustic environments. The standard approach to handle this type of data is to train hidden Markov models (HMMs) on the whole data set as if all data comes from a single acoustic condition. This is referred to as multi-style training, for exam-ple speaker-independent training. Effectively, the non-speech variabilities are ignored. Though good performance has been obtained with multi-style systems, these systems account for all variabilities. Improvement may be obtained if the two types of variabilities in the found data are modelled separately. Adaptive training has been proposed for this purpose. In contrast to multi-style training, a set of transforms is used to represent the non-speech variabilities. A canonical
Empirical Bayes and compound estimation of normal means
- Statistica Sinica
, 1997
"... Abstract: This article concerns the canonical empirical Bayes problem of estimating normal means under squared-error loss. General empirical estimators are derived which are asymptotically minimax and optimal. Uniform convergence and the speed of convergence are considered. The general empirical Bay ..."
Abstract
-
Cited by 6 (3 self)
- Add to MetaCart
Abstract: This article concerns the canonical empirical Bayes problem of estimating normal means under squared-error loss. General empirical estimators are derived which are asymptotically minimax and optimal. Uniform convergence and the speed of convergence are considered. The general empirical Bayes estimators are compared with the shrinkage estimators of Stein (1956) and James and Stein (1961). Estimation of the mixture density and its derivatives are also discussed.

