Results 1-10 of 27
An Inequality for Rational Functions with Applications to Some Statistical Estimation Problems
 IEEE Trans. on Information Theory
, 1991
Abstract

Cited by 114 (4 self)
The well-known Baum-Eagon inequality [3] provides an effective iterative scheme for finding a local maximum of homogeneous polynomials with positive coefficients over a domain of probability values. However, in many applications we are interested in maximizing a general rational function. We extend the Baum-Eagon inequality to rational functions and briefly describe some applications of this inequality to statistical estimation problems. Index Terms: nonlinear optimization, statistical estimation, hidden Markov models, speech recognition.
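The iterative scheme the abstract refers to can be made concrete in a few lines. This is a minimal sketch of the classic Baum-Eagon growth-transform update for a polynomial with positive coefficients over a probability simplex; the example polynomial and function names are illustrative assumptions, not taken from the paper.

```python
def baum_eagon_step(p, grad):
    """One Baum-Eagon growth-transform step on a probability vector p.

    grad(p) returns the partial derivatives of a polynomial P with
    positive coefficients at p.  The update
        p_i <- p_i * dP/dp_i / sum_j p_j * dP/dp_j
    never decreases P and keeps p on the probability simplex.
    """
    g = grad(p)
    num = [pi * gi for pi, gi in zip(p, g)]
    s = sum(num)
    return [x / s for x in num]

# Toy example: maximize P(p) = p0^2 * p1 over the 2-simplex.
# The maximum is at p = (2/3, 1/3).
grad = lambda p: [2 * p[0] * p[1], p[0] ** 2]
p = [0.5, 0.5]
for _ in range(50):
    p = baum_eagon_step(p, grad)
```

For this particular polynomial the iteration reaches the fixed point immediately; in general the scheme climbs monotonically toward a local maximum.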
Markovian Models for Sequential Data
, 1996
Abstract

Cited by 110 (2 self)
Hidden Markov Models (HMMs) are statistical models of sequential data that have been used successfully in many machine learning applications, especially for speech recognition. Furthermore, in the last few years, many new and promising probabilistic models related to HMMs have been proposed. We first summarize the basics of HMMs, and then review several recent related learning algorithms and extensions of HMMs, including in particular hybrids of HMMs with artificial neural networks, Input-Output HMMs (which are conditional HMMs using neural networks to compute probabilities), weighted transducers, variable-length Markov models, and Markov-switching state-space models. Finally, we discuss some of the challenges of future research in this very active area.
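As a refresher on the HMM basics the survey summarizes, here is a minimal forward-algorithm sketch for computing the likelihood of an observation sequence; the two-state model and its numbers are hypothetical.

```python
def forward(obs, pi, A, B):
    """Forward algorithm: P(obs) under a discrete HMM.

    pi[i]   : initial probability of state i
    A[i][j] : transition probability from state i to state j
    B[i][o] : probability of emitting symbol o in state i
    """
    n = len(pi)
    # Initialization with the first observation.
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    # Recursion: propagate forward probabilities through time.
    for o in obs[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
            for j in range(n)
        ]
    # Termination: total probability of the observation sequence.
    return sum(alpha)

# Tiny two-state example (hypothetical parameters).
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.5], [0.1, 0.9]]
likelihood = forward([0, 1, 1], pi, A, B)
```

The same alpha recursion underlies both likelihood evaluation and, via its backward counterpart, the Baum-Welch training used throughout the models the survey reviews.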
Utilizing Soft Information in Decoding of Variable Length Codes
, 1999
Abstract

Cited by 27 (3 self)
We present a method for utilizing soft information in decoding of variable length codes (VLCs). When compared with traditional VLC decoding, which is performed using "hard" input bits and a state machine, soft-input VLC decoding offers improved performance in terms of packet and symbol error rates. Soft-input VLC decoding is free from the risk, encountered in hard-decision VLC decoders in noisy environments, of terminating the decoding in an unsynchronized state, and it offers the possibility to exploit a priori knowledge, if available, of the number of symbols contained in the packet. In most applications of VLCs, decoding is performed bit by bit, with the input to the entropy decoder assumed to be a sequence of "hard" bits about which no soft information is available. However, in noisy environments, soft information can be associated with each information bit, either by direct use of channel observations in the case of uncoded transmission ...
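For contrast with the soft-input method, the traditional "hard" state-machine decoding mentioned above can be sketched as a walk down a prefix-code tree; the three-symbol code below is a hypothetical example, not one from the paper.

```python
# Hypothetical prefix-free VLC: no codeword is a prefix of another.
CODE = {"a": "0", "b": "10", "c": "11"}
DECODE = {bits: sym for sym, bits in CODE.items()}

def decode_hard(bits):
    """Hard-decision VLC decoding: accumulate bits in a state string
    until it matches a codeword, emit the symbol, and reset."""
    symbols, state = [], ""
    for b in bits:
        state += b
        if state in DECODE:   # leaf of the code tree reached
            symbols.append(DECODE[state])
            state = ""
    if state:                 # leftover bits: decoder ended unsynchronized
        raise ValueError("decoder ended in an unsynchronized state")
    return symbols

decoded = decode_hard("0101100")
```

A single flipped input bit can shift every subsequent codeword boundary; that loss of synchronization is exactly the failure mode the soft-input decoder is designed to avoid.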
Discriminative Training of Hidden Markov Models
, 1998
Abstract

Cited by 24 (0 self)
Contents: Abbreviations; Notation; 1 Introduction; 2 Hidden Markov Models (2.1 Definition; 2.2 HMM Modelling Assumptions; 2.3 HMM Topology; 2.4 Finding the Best Transcription; 2.5 Setting the Parameters; 2.6 Summary); 3 Objective Functions (3.1 Properties of Maximum Likelihood Estimators; 3.2 Maximum Likelihood; 3.3 Maximum Mutual Information; 3.4 Frame Discrimination; ...)
Predicting Daily Probability Distributions of S&P500 Returns
, 1998
Abstract

Cited by 19 (0 self)
Most approaches in forecasting merely try to predict the next value of the time series. In contrast, this paper presents a framework to predict the full probability distribution. It is expressed as a mixture model: the dynamics of the individual states is modeled with so-called "experts" (potentially nonlinear neural networks), and the dynamics between the states is modeled using a hidden Markov approach. The full density predictions are obtained by a weighted superposition of the individual densities of each expert. This model class is called "hidden Markov experts". Results are presented for daily S&P500 data. While the predictive accuracy of the mean does not improve over simpler models, evaluating the prediction of the full density shows a clear out-of-sample improvement both over a simple GARCH(1,1) model (which assumes Gaussian distributed returns) and over a "gated experts" model (which expresses the weighting for each state non-recursively as a function of external inputs). ...
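The "weighted superposition of the individual densities" can be made concrete with a tiny sketch. Gaussian experts and the two-regime numbers below are illustrative assumptions, since the paper's experts are potentially nonlinear networks.

```python
import math

def gauss_pdf(y, mu, sigma):
    """Density of a univariate Gaussian at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def mixture_density(y, weights, experts):
    """Full-density prediction as a weighted superposition:
        p(y) = sum_k w_k * p_k(y)
    weights : state probabilities supplied by the hidden Markov layer
    experts : (mu, sigma) per expert (Gaussian here for illustration)
    """
    return sum(w * gauss_pdf(y, mu, s) for w, (mu, s) in zip(weights, experts))

# Hypothetical two-regime example: a calm and a volatile return state.
weights = [0.8, 0.2]
experts = [(0.0005, 0.006), (-0.002, 0.02)]
density = mixture_density(0.0, weights, experts)
```

Evaluating this density at the realized return, rather than comparing point forecasts, is what allows the out-of-sample comparison against GARCH(1,1) described above.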
Maximum Expected BLEU Training of Phrase and Lexicon Translation Models
 In ACL
Abstract

Cited by 19 (5 self)
This paper proposes a new discriminative training method for constructing phrase and lexicon translation models. In order to reliably learn a myriad of parameters in these models, we propose an expected-BLEU-score-based utility function with KL regularization as the objective, and train the models on a large parallel dataset. For training, we derive growth transformations for phrase and lexicon translation probabilities to iteratively improve the objective. The proposed method, evaluated on the Europarl German-to-English dataset, leads to a 1.1 BLEU point improvement over a state-of-the-art baseline translation system. In the IWSLT 2011 Benchmark, our system using the proposed method achieves the best Chinese-to-English translation result on the task of translating TED talks.
Discriminative Training For Continuous Speech Recognition
 Proc. 1995 Europ. Conf. on Speech Communication and Technology
, 1995
Abstract

Cited by 14 (0 self)
Discriminative training techniques for Hidden Markov Models were recently proposed and successfully applied to automatic speech recognition. In this paper a discussion of the Minimum Classification Error and the Maximum Mutual Information objectives is presented. An extended re-estimation formula is used for the HMM parameter update for both objective functions. The discriminative training methods were utilized in speaker-independent phoneme recognition experiments and improved the phoneme recognition rates for both discriminative training techniques. Discriminative training provides better performance compared to Maximum Likelihood Estimation (MLE), since the training concentrates on the estimation of class boundaries rather than on the parameters of assumed model distributions [1,12]. Although MLE and discriminative training are theoretically equivalent (if su...
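The "extended re-estimation formula" is, in its common discrete form, a growth-transform update that combines occupancy counts from a numerator (correct-transcription) pass and a denominator (all-competitors) pass. The sketch below uses hypothetical counts and is a generic form of this update, not necessarily the paper's exact formulation.

```python
def extended_baum_welch(p, gamma_num, gamma_den, D):
    """Extended Baum-Welch update for a discrete probability vector:
        p_i <- (num_i - den_i + D * p_i) / sum_j (num_j - den_j + D * p_j)
    gamma_num : occupancies from the correct-transcription alignment
    gamma_den : occupancies from the competing-hypothesis lattice
    D         : smoothing constant, large enough to keep updates positive
    """
    new = [n - d + D * pi for n, d, pi in zip(gamma_num, gamma_den, p)]
    if min(new) <= 0:
        raise ValueError("increase D: update left the probability simplex")
    s = sum(new)
    return [x / s for x in new]

# Hypothetical counts for a two-valued parameter.
p_new = extended_baum_welch([0.5, 0.5], [3.0, 1.0], [2.0, 2.0], D=4.0)
```

Subtracting the denominator occupancies is what pushes probability mass toward class boundaries rather than toward the plain ML estimate, matching the intuition stated in the abstract.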
Language Modeling for Efficient Beam-Search
 Computer Speech and Language
, 1995
Abstract

Cited by 5 (4 self)
This paper considers the problems of estimating bigram language models and of efficiently representing them by a finite-state network, which can be employed by a hidden-Markov-model-based, beam-search, continuous speech recognizer.
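A bigram model of the kind the paper estimates can be sketched as maximum-likelihood counts over a corpus; the toy corpus below is hypothetical, and a real recognizer would add the smoothing and finite-state compilation the paper is concerned with.

```python
from collections import Counter

def bigram_mle(tokens):
    """Maximum-likelihood bigram estimates:
        p(w2 | w1) = c(w1, w2) / c(w1)
    Unseen pairs get no mass here; smoothing/backoff is needed in practice."""
    unigrams = Counter(tokens[:-1])            # history counts c(w1)
    bigrams = Counter(zip(tokens, tokens[1:])) # pair counts c(w1, w2)
    return {(w1, w2): c / unigrams[w1] for (w1, w2), c in bigrams.items()}

# Tiny example corpus with sentence-boundary markers.
lm = bigram_mle(["<s>", "a", "b", "a", "b", "</s>"])
```

Each entry p(w2 | w1) then labels one arc of the finite-state network (state w1, arc to state w2), which is the representation a beam-search decoder traverses.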
Techniques For Robust Recognition In Restricted Domains
 In Proceedings of the European Conference on Speech Communication and Technology
, 1993
Abstract

Cited by 4 (3 self)
This paper describes an Automatic Speech Understanding (ASU) system used in a human-robot interface for the remote control of a mobile robot. The intended application is that of an operator issuing telecontrol commands to one or more robots from a remote workstation. ASU is supposed to be performed on spontaneous continuous speech under quasi-real-time conditions. Training and testing of the system were based on speech data collected by means of Wizard-of-Oz simulations. Two kinds of robustness factors are introduced: the first is a recognition-error-tolerant approach to semantic interpretation; the second is based on a technique for evaluating the reliability of the ASU system output with respect to the input utterance. Preliminary results are 90.9% correct semantic interpretations, and 89.1% correct detection of out-of-domain sentences at the cost of rejecting 16.4% of correct in-domain sentences.
Statistical Machine Translation and Automatic Speech Recognition under Uncertainty
 Johns Hopkins University
, 2007
Abstract

Cited by 3 (0 self)
Statistical modeling techniques have been applied successfully to natural language processing tasks such as automatic speech recognition (ASR) and statistical machine translation (SMT). Since most statistical approaches rely heavily on the availability of data and on the underlying model assumptions, reduction of uncertainty is critical to their optimal performance. In speech translation, the uncertainty is due to the speech input to the SMT system, whose elements are represented as distributions over sequences. A novel approach to statistical phrase-based speech translation is proposed. This approach is based on a generative, source-channel model of translation, similar in spirit to the modeling approaches that underlie hidden Markov model (HMM) based ASR systems: in fact, our model of speech-to-text translation contains the acoustic models of a large-vocabulary ASR system as one of its components. This model of speech-to-text translation is developed as a direct extension of the phrase-based models used in text translation systems. Speech is translated by mapping ASR word lattices to lattices of phrase sequences, which are then translated using operations developed for ...