Results 1–10 of 103
Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition
 IEEE Transactions on Audio, Speech, and Language Processing
, 2012
Cited by 76 (35 self)
Abstract—We propose a novel context-dependent (CD) model for large-vocabulary speech recognition (LVSR) that leverages recent advances in using deep belief networks for phone recognition. We describe a pre-trained deep neural network hidden Markov model (DNN-HMM) hybrid architecture that trains the DNN to produce a distribution over senones (tied triphone states) as its output. The deep belief network pre-training algorithm is a robust and often helpful way to initialize deep neural networks generatively; it can aid optimization and reduce generalization error. We illustrate the key components of our model, describe the procedure for applying CD-DNN-HMMs to LVSR, and analyze the effects of various modeling choices on performance. Experiments on a challenging business search dataset demonstrate that CD-DNN-HMMs can significantly outperform conventional context-dependent Gaussian mixture model (GMM)-HMMs, with an absolute sentence accuracy improvement of 5.8% and 9.2% (or relative error reduction of 16.0% and 23.2%) over CD-GMM-HMMs trained using the minimum phone error rate (MPE) and maximum likelihood (ML) criteria, respectively. Index Terms—Speech recognition, deep belief network, context-dependent phone, LVSR, DNN-HMM, ANN-HMM
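In a hybrid DNN-HMM system of the kind this abstract describes, the DNN's senone posteriors are typically converted to scaled acoustic likelihoods by dividing out the senone priors before HMM decoding. A minimal illustrative sketch; the function name and toy numbers below are assumptions for demonstration, not the paper's code:

```python
import numpy as np

def scaled_log_likelihoods(logits, log_priors):
    """Convert DNN senone scores into scaled acoustic log-likelihoods
    log p(x|s) + const by subtracting the senone log-priors from the
    log-posteriors, as is standard in hybrid DNN-HMM decoding."""
    # log-softmax over senone logits -> log p(s|x)
    log_post = logits - np.logaddexp.reduce(logits, axis=-1, keepdims=True)
    # log p(x|s) = log p(s|x) - log p(s) + log p(x); the constant
    # log p(x) is shared across senones and dropped here.
    return log_post - log_priors

# toy example: 3 senones, one acoustic frame
logits = np.array([2.0, 0.5, -1.0])
log_priors = np.log(np.array([0.5, 0.3, 0.2]))
scores = scaled_log_likelihoods(logits, log_priors)
```

The scaled scores replace GMM likelihoods in the usual Viterbi search, which is why the rest of the HMM machinery carries over unchanged.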
Dynamic Bayesian Multinets
, 2000
Cited by 59 (18 self)
In this work, dynamic Bayesian multinets are introduced, where a Markov chain state at time t determines conditional independence patterns between random variables lying within a local time window surrounding t. It is shown how information-theoretic criterion functions can be used to induce sparse, discriminative, and class-conditional network structures that yield an optimal approximation to the class posterior probability, and are therefore useful for the classification task. Using a new structure-learning heuristic, the resulting models are tested on a medium-vocabulary isolated-word speech recognition task. It is demonstrated that these discriminatively structured dynamic Bayesian multinets, when trained in a maximum likelihood setting using EM, can outperform both HMMs and other dynamic Bayesian networks with a similar number of parameters. While Markov chains are sometimes a useful model for sequences, such simple independence assumptions can lead...
Linear discriminant model for information retrieval
 In Proceedings of SIGIR 2005
, 2005
Cited by 51 (16 self)
This paper presents a new discriminative model for information retrieval (IR), referred to as the linear discriminant model (LDM), which provides a flexible framework for incorporating arbitrary features. LDM differs from most existing models in that it takes into account a variety of linguistic features derived from the component models of the HMM widely used in language modeling approaches to IR. LDM is therefore a means of melding discriminative and generative models for IR. We present two parameter-learning algorithms for LDM. One directly optimizes average precision (AP) using an iterative procedure. The other is a perceptron-based algorithm that minimizes the number of discordant document pairs in a ranked list. The effectiveness of our approach has been evaluated on the task of ad hoc retrieval using six English and Chinese TREC test sets. Results show that (1) on most test sets, LDM significantly outperforms state-of-the-art language modeling approaches and the classical probabilistic retrieval model; (2) it is more appropriate to train LDM using an AP-based measure rather than likelihood if the IR system is graded on AP; and (3) linguistic features (e.g., phrases and dependencies) are effective for IR if they are incorporated properly.
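The second learner mentioned above, a perceptron minimizing discordant document pairs, can be sketched roughly as follows. The feature vectors, learning rate, and update rule here are illustrative assumptions, not the paper's exact algorithm:

```python
import numpy as np

def pairwise_perceptron(pairs, dim, epochs=10, lr=0.1):
    """Perceptron-style ranker: for each (relevant, non-relevant)
    feature-vector pair, update the weights whenever the pair is
    discordant, i.e. the non-relevant document scores at least as
    high as the relevant one."""
    w = np.zeros(dim)
    for _ in range(epochs):
        for f_rel, f_non in pairs:
            if w @ f_rel <= w @ f_non:      # discordant pair
                w += lr * (f_rel - f_non)   # push relevant above non-relevant
    return w

# hypothetical 2-feature documents: feature 0 correlates with relevance
pairs = [(np.array([1.0, 0.2]), np.array([0.1, 0.9])),
         (np.array([0.8, 0.1]), np.array([0.2, 0.8]))]
w = pairwise_perceptron(pairs, dim=2)
```

After training, every pair in this toy set is concordant: the relevant document of each pair outscores its non-relevant counterpart.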
A tutorial on energy-based learning
 Predicting Structured Data
, 2006
Cited by 42 (6 self)
Energy-Based Models (EBMs) capture dependencies between variables by associating a scalar energy to each configuration of the variables. Inference consists in clamping the value of observed variables and finding configurations of the remaining variables that minimize the energy. Learning consists in finding an energy function in which observed configurations of the variables are given lower energies than unobserved ones. The EBM approach provides a common theoretical framework for many learning models, including traditional discriminative and generative approaches, as well as graph-transformer networks, conditional random fields, maximum-margin Markov networks, and several manifold learning methods. Probabilistic models must be properly normalized, which sometimes requires evaluating intractable integrals over the space of all possible variable configurations. Since EBMs have no requirement for proper normalization, this problem is naturally circumvented. EBMs can be viewed as a form of non-probabilistic factor graphs, and they provide considerably more flexibility in the design of architectures and training criteria than probabilistic approaches.
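EBM inference as described above, clamping the observed variable and minimizing energy over the remaining one, can be illustrated with a toy example. The quadratic energy function and candidate set below are assumptions chosen for demonstration:

```python
def ebm_infer(energy, y_candidates, x):
    """Energy-based inference: clamp the observed variable x and
    return the candidate y with minimum energy E(x, y)."""
    return min(y_candidates, key=lambda y: energy(x, y))

def energy(x, y):
    # toy quadratic energy: lowest when y = 2x
    return (y - 2.0 * x) ** 2

y_star = ebm_infer(energy, y_candidates=[-2.0, 0.0, 2.0, 4.0], x=1.0)
# with x = 1.0, the energy (y - 2)^2 is minimized at y = 2.0
```

No normalization over configurations is needed at any point, which is exactly the flexibility the abstract emphasizes.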
Maximum Likelihood and Minimum Classification Error Factor Analysis for Automatic Speech Recognition
 IEEE Transactions on Speech and Audio Processing
, 1997
Cited by 36 (3 self)
Hidden Markov models (HMMs) for automatic speech recognition rely on high-dimensional feature vectors to summarize the short-time properties of speech. Correlations between features can arise when the speech signal is non-stationary or corrupted by noise. We investigate how to model these correlations using factor analysis, a statistical method for dimensionality reduction. Factor analysis uses a small number of parameters to model the covariance structure of high-dimensional data. These parameters can be chosen in two ways: (i) to maximize the likelihood of observed speech signals, or (ii) to minimize the number of classification errors. We derive an Expectation-Maximization (EM) algorithm for maximum likelihood estimation and a gradient descent algorithm for improved class discrimination. Speech recognizers are evaluated on two tasks, one with a small vocabulary (connected alphadigits) and one with a medium vocabulary (New Jersey town names). We find that modeling feature correlations...
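The parameter saving behind factor analysis comes from its covariance model, Sigma = Lambda Lambda^T + Psi, with a d x k loading matrix Lambda (k << d) and diagonal noise Psi. A small sketch under assumed toy values, not the paper's experimental setup:

```python
import numpy as np

def fa_covariance(loadings, psi_diag):
    """Factor-analysis covariance: Sigma = Lambda Lambda^T + Psi.
    This spends d*k + d parameters instead of the d*(d+1)/2 needed
    for a full covariance matrix, while still capturing correlations
    between features through the shared factors."""
    return loadings @ loadings.T + np.diag(psi_diag)

# d = 4 features modeled with k = 1 factor
loadings = np.array([[1.0], [0.5], [0.0], [-0.5]])
psi = np.array([0.1, 0.1, 0.1, 0.1])
sigma = fa_covariance(loadings, psi)
```

Either likelihood-based (EM) or discriminative (gradient) training then reduces to choosing Lambda and Psi under this structure.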
Uncertainty decoding for noise robust speech recognition
 in Proc. Interspeech
, 2004
Cited by 36 (12 self)
This dissertation is the result of my own work and includes nothing which is the outcome of work done in collaboration. It has not been submitted in whole or in part for a degree at any other university. Some of the work has been published previously in conference proceedings
What HMMs can do
, 2002
Cited by 30 (4 self)
Since their inception over thirty years ago, hidden Markov models (HMMs) have become the predominant methodology for automatic speech recognition (ASR) systems — today, most state-of-the-art speech systems are HMM-based. There have been a number of ways to explain HMMs and to list their capabilities, each with both advantages and disadvantages. In an effort to better understand what HMMs can do, this tutorial analyzes HMMs by exploring a novel way in which an HMM can be defined, namely in terms of random variables and conditional independence assumptions. We prefer this definition as it allows us to reason more thoroughly about the capabilities of HMMs. In particular, it is possible to deduce that there are, in theory at least, no limitations to the class of probability distributions representable by HMMs. This paper concludes that, in searching for a model to supersede the HMM for ASR, rather than trying to correct for HMM limitations in the general case, new models should be sought based on their potential for better parsimony, computational requirements, and noise insensitivity.
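Defining an HMM through its two conditional independence assumptions leads directly to the forward recursion for sequence likelihoods. A minimal sketch with assumed toy parameters (not taken from the tutorial):

```python
import numpy as np

def forward_log_prob(pi, A, B, obs):
    """Log-likelihood of an observation sequence under a discrete HMM,
    using only the two assumptions that define it:
      p(q_t | q_1..t-1) = p(q_t | q_t-1)   (Markov chain on states)
      p(x_t | rest)     = p(x_t | q_t)     (output independence)."""
    alpha = pi * B[:, obs[0]]               # p(x_1, q_1)
    for x in obs[1:]:
        alpha = (alpha @ A) * B[:, x]       # forward recursion
    return float(np.log(alpha.sum()))

pi = np.array([0.6, 0.4])                   # initial state distribution
A = np.array([[0.7, 0.3], [0.4, 0.6]])      # state transition matrix
B = np.array([[0.9, 0.1], [0.2, 0.8]])      # emission probabilities
ll = forward_log_prob(pi, A, B, obs=[0, 1, 0])
```

Everything else about HMM inference (Viterbi, Baum-Welch) follows from the same two assumptions, which is why the random-variable definition is a useful lens.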
On adaptive decision rules and decision parameter adaptation for automatic speech recognition
 Proc. IEEE
, 2000
Cited by 27 (4 self)
Recent advances in automatic speech recognition are accomplished by designing a plug-in maximum a posteriori decision rule such that the forms of the acoustic and language model distributions are specified and the parameters of the assumed distributions are estimated from a collection of speech and language training corpora. Maximum-likelihood point estimation is by far the most prevalent training method. However, due to the problems of unknown speech distributions, sparse training data, high spectral and temporal variability in speech, and possible mismatch between training and testing conditions, a dynamic training strategy is needed. To cope with changing speakers and speaking conditions in real operational conditions for high-performance speech recognition, such paradigms incorporate a small amount of speaker- and environment-specific adaptation data into the training process. Bayesian adaptive learning is an optimal way to combine
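Bayesian adaptive learning of the kind referenced above is commonly illustrated by MAP adaptation of a Gaussian mean, interpolating a speaker-independent prior mean with the sample mean of a small amount of adaptation data. This is a generic sketch of the idea, not this survey's specific formulation:

```python
import numpy as np

def map_adapt_mean(prior_mean, tau, adapt_data):
    """MAP adaptation of a Gaussian mean: a convex combination of the
    prior mean and the adaptation-data sample mean, with the prior
    count tau controlling how strongly the prior is trusted."""
    n = len(adapt_data)
    sample_mean = np.mean(adapt_data, axis=0)
    return (tau * prior_mean + n * sample_mean) / (tau + n)

prior = np.array([0.0, 0.0])                    # speaker-independent mean
data = np.array([[2.0, 2.0], [2.0, 2.0]])       # hypothetical speaker data
adapted = map_adapt_mean(prior, tau=2.0, adapt_data=data)
# with tau = n = 2, the adapted mean lies halfway between prior and data
```

With little adaptation data the estimate stays near the prior, and it approaches the pure sample mean as data accumulates, which is the behavior a dynamic training strategy needs.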
Large-margin minimum classification error training for large-scale speech recognition tasks
 in Proc. Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP)
, 2007
Cited by 27 (10 self)
Recently, we have developed a novel discriminative training method named large-margin minimum classification error (LM-MCE) training, which incorporates the idea of a discriminative margin into the conventional minimum classification error (MCE) training method. In our previous work, this approach was formulated specifically for MCE training using the sigmoid loss function, and its effectiveness was demonstrated on the TIDIGITS task alone. In this paper, two additional contributions are made. First, we formulate LM-MCE as a Bayes risk minimization problem whose loss function includes not only empirical error rates but also a margin-bound risk. This new formulation allows us to extend the same technique to a wide variety of MCE-based training. Second, we have successfully applied the LM-MCE training approach to the Microsoft internal large-vocabulary telephony speech recognition task (with 2000 hours of training data and a 120K-word vocabulary) and achieved significant recognition accuracy improvements across the board. To the best of our knowledge, this is the first time the large-margin approach has been demonstrated to be successful in large-scale speech recognition tasks. Index Terms—minimum classification error training, discriminative training, large-margin learning
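The margin idea can be illustrated on the sigmoid MCE loss the abstract mentions: shifting the misclassification measure by a positive margin makes samples that are classified correctly but only narrowly still contribute loss. The scores and parameter values below are hypothetical, chosen only to show the effect:

```python
import math

def lm_mce_loss(score_correct, score_competitor, alpha=1.0, margin=0.0):
    """Sigmoid MCE loss on the misclassification measure
    d = score_competitor - score_correct. The large-margin variant
    shifts d by a positive margin, so correct classifications inside
    the margin are still penalized. Illustrative sketch only."""
    d = score_competitor - score_correct
    return 1.0 / (1.0 + math.exp(-alpha * (d + margin)))

# correctly classified by 0.5, but inside a margin of 1.0:
plain = lm_mce_loss(2.0, 1.5, alpha=2.0, margin=0.0)
margined = lm_mce_loss(2.0, 1.5, alpha=2.0, margin=1.0)
```

The plain loss is below 0.5 (the sample looks fine), while the margined loss exceeds 0.5, pushing training to enlarge the score gap.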
Integration of Continuous Speech Recognition and Information Retrieval for Mutually Optimal Performance
 Computer Science Department, Carnegie Mellon University. http://www.cs.cmu.edu/~msiegler/publish/phd/thesis.ps.gz
, 1999
Cited by 19 (1 self)
Traditionally, indexing and searching of speech content in multimedia databases have been achieved through a combination of separately constructed speech recognition and information retrieval engines. Although each technology has a legacy of research, only recently have efforts been made to study the potential suboptimality of this strategy, and none of these efforts specifically addresses the presence of uncertainty in automatically generated transcriptions. This research develops a refinement of the most common information retrieval relevance formula, TF-IDF, to incorporate uncertainty as a retrieval feature, along with a set of techniques to acquire this uncertainty from the multiple hypotheses produced by existing speech recognition data structures. In the process, a greater amount of evidence is extracted than is available in the most likely transcription hypothesis, and overall retrieval precision and recall are improved. The term weighting scheme known as the inverse document frequenc...
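The refinement described, folding recognition uncertainty into TF-IDF, can be sketched by replacing hard term counts with summed word posteriors from the recognizer's hypothesis space. The function and numbers below are illustrative assumptions, not the thesis's exact weighting formula:

```python
import math

def uncertain_tfidf(term_posteriors, df, n_docs):
    """TF-IDF score where the term-frequency count is replaced by the
    sum of recognition posteriors over the term's hypothesized
    occurrences in a spoken document, so uncertain hypotheses
    contribute fractionally rather than all-or-nothing."""
    expected_tf = sum(term_posteriors)      # soft count from hypotheses
    idf = math.log(n_docs / df)             # standard inverse doc frequency
    return expected_tf * idf

# a term hypothesized 3 times with varying confidence; df = 10 of 1000 docs
score = uncertain_tfidf([0.9, 0.6, 0.2], df=10, n_docs=1000)
```

A one-best transcription would count this term either 0 or 3 times; the soft count of 1.7 keeps evidence from low-confidence hypotheses without letting them dominate.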