Results 1 - 10 of 12
A Tree-Trellis Based Fast Search for Finding the N Best Sentence Hypotheses in Continuous Speech Recognition
Cited by 37 (2 self)
In this paper a new, tree-trellis based fast search for finding the N best sentence hypotheses in continuous speech recognition is proposed. The search consists of two parts: a forward, time-synchronous trellis search and a backward, time-asynchronous tree search. In the first module the well-known Viterbi algorithm is used to find the best hypothesis and to prepare a map of all partial path scores time-synchronously. In the second module a tree search is used to grow partial paths backward and time-asynchronously. Each partial path in the backward tree search is rank-ordered in a stack by the corresponding full path score, which is computed by adding the partial path score to the best possible score of the remaining path obtained from the trellis path map. In each path-growing cycle, the current best partial path, which is at the top of the stack, is extended by one arc (word). The new tree-trellis search differs from the traditional time-synchronous Viterbi search in its ability to find not just the best but the N best paths of different word content. The new search also differs from the A* algorithm, or the stack algorithm, in its capability to provide an exact full path score estimate of any given partial (i.e., incomplete) path before its completion. Compared with the best-candidate Viterbi search, the search complexity for finding the N best strings is rather low, i.e., only a fraction more computation is needed.
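The two-pass idea in the abstract above can be illustrated on a toy word lattice. The following is a minimal sketch, not the paper's implementation: it assumes a lattice given as arcs `(src, dst, word, log_score)` with integer node ids in topological order, and it does not merge paths that share the same word sequence. The function name `n_best_paths` and the data layout are hypothetical. The forward pass stores the best score from the start node to every node; the backward pass pops the partial path with the best full-path score from a stack (here a heap) and extends it by one arc.

```python
import heapq
from collections import defaultdict


def n_best_paths(arcs, start, end, n):
    """Return up to n (score, words) pairs, best first.

    Assumption for this sketch: node ids are integers in topological order
    (every arc goes from a smaller id to a larger one), scores are log
    scores where higher is better, and the start node has no incoming arcs.
    """
    incoming = defaultdict(list)
    nodes = {start, end}
    for src, dst, word, score in arcs:
        incoming[dst].append((src, word, score))
        nodes.update((src, dst))

    # Forward pass (Viterbi-style): best score from start to every node.
    best = {node: float("-inf") for node in nodes}
    best[start] = 0.0
    for node in sorted(nodes):
        for src, _word, score in incoming[node]:
            best[node] = max(best[node], best[src] + score)

    # Backward pass: entries are (-full_score, tie_breaker, node,
    # backward_score, words). The full score is exact because the forward
    # pass already knows the best completion of every partial path.
    stack = [(-best[end], 0, end, 0.0, ())]
    tie_breaker = 1
    results = []
    while stack and len(results) < n:
        neg_full, _, node, backward, words = heapq.heappop(stack)
        if node == start:
            results.append((-neg_full, list(words)))
            continue
        for src, word, score in incoming[node]:
            if best[src] == float("-inf"):
                continue  # src is unreachable from the start node
            new_backward = backward + score
            full = new_backward + best[src]
            heapq.heappush(
                stack, (-full, tie_breaker, src, new_backward, (word,) + words)
            )
            tie_breaker += 1
    return results


# Toy lattice with two word choices at each of two positions.
toy_arcs = [
    (0, 1, "the", -1.0),
    (0, 1, "a", -1.5),
    (1, 2, "cat", -0.5),
    (1, 2, "cap", -2.0),
]
top3 = n_best_paths(toy_arcs, start=0, end=2, n=3)
```

Because the ranking key is the exact full-path score, the first complete path popped is the Viterbi best path, and each later one is the next best.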
Using Self-Organizing Maps and Learning Vector Quantization for Mixture Density Hidden Markov Models
, 1997
Cited by 19 (8 self)
This work presents experiments to recognize pattern sequences using hidden Markov models (HMMs). The pattern sequences in the experiments are computed from speech signals and the recognition task is to decode the corresponding phoneme sequences. The training of the HMMs of the phonemes using the collected speech samples is a difficult task because of the natural variation in the speech. Two neural computing paradigms, the Self-Organizing Map (SOM) and the Learning Vector Quantization (LVQ), are used in the experiments to improve the recognition performance of the models. An HMM consists of sequential states which are trained to model the feature changes in the signal produced during the modeled process. The output densities applied in this work are mixtures of Gaussian density functions. SOMs are applied to initialize and train the mixtures to give a smooth and faithful presentation of the feature vector space defined by the corresponding training samples. The SOM maps similar feature vect...
Integration of Continuous Speech Recognition and Information Retrieval for Mutually Optimal Performance
- COMPUTER SCIENCE DEPARTMENT, CARNEGIE MELLON UNIVERSITY. HTTP://WWW.CS.CMU.EDU/~MSIEGLER/PUBLISH/PHD/THESIS.PS.GZ SINGHAL
, 1999
Cited by 15 (1 self)
Traditionally, indexing and searching of speech content in multimedia databases have been achieved through a combination of separately constructed speech recognition and information retrieval engines. Although each technology has a legacy of research, only recently have efforts been made to study the potential suboptimality of this strategy, and none of these efforts specifically addresses the presence of uncertainty in automatically generated transcriptions. This research develops a refinement of the most common information retrieval relevance formula, TFIDF, to incorporate uncertainty as a retrieval feature, along with a set of techniques to acquire this uncertainty from multiple hypotheses produced by existing speech recognition data structures. In the process a greater amount of evidence is extracted than is available in the most likely transcription hypothesis, and overall retrieval precision and recall are improved. The term weighting scheme known as the inverse document frequenc...
Improved Hidden Markov Modeling for Speaker-Independent Continuous Speech Recognition
- Proc. DARPA Speech and Natural Language Workshop
, 1990
Cited by 10 (1 self)
This paper reports recent efforts to further improve the performance of the Sphinx system for speaker-independent continuous speech recognition. The recognition error rate is significantly reduced with incorporation of additional dynamic features, semi-continuous hidden Markov models, and speaker clustering. For the June 1990 (RM2) evaluation test set, the error rates of our current system are 4.3% and 19.9% for word-pair grammar and no grammar respectively.
Corrective Tuning By Applying LVQ For Continuous Density And Semi-Continuous Markov Models
- In Proceedings of International Symposium on Speech, Image Processing and Neural Networks
Cited by 4 (4 self)
In this work the objective is to increase the accuracy of speaker-dependent phonetic transcription of spoken utterances using continuous density and semi-continuous HMMs. Experiments with LVQ-based corrective tuning indicate that the average recognition error rate can be made to decrease by about 5% to 10%. Experiments are also made to increase the efficiency of the Viterbi decoding by a discriminative approximation of the output probabilities of the states in the Markov models. Using only a few nearest components of the mixture density functions instead of every component decreases both the recognition error rate (5% to 10% for CDHMMs) and the execution time (about 50% for SCHMMs). The lowest average error rates achieved were about 5.6%. 1 INTRODUCTION Several suggestions have been recently published, describing training methods for HMMs using the minimization of the number of misclassifications directly as a training criterion. A formal way to realize this criterion is to define a cont...
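The abstract above mentions evaluating only a few nearest mixture components instead of every component. The sketch below shows one way such an approximation could look; it is an illustration under stated assumptions, not the authors' method. It assumes diagonal-covariance Gaussians and takes "nearest" to mean smallest squared Euclidean distance between the observation and the component mean. The function names and the toy parameters are hypothetical.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def mixture_log_density(x, weights, means, variances, k=None):
    """Log of the mixture density, optionally using only the k nearest components.

    Assumption for this sketch: "nearest" means smallest squared Euclidean
    distance from x to the component mean. With k=None all components are used.
    """
    indices = list(range(len(weights)))
    if k is not None:
        indices.sort(
            key=lambda i: sum((xi - m) ** 2 for xi, m in zip(x, means[i]))
        )
        indices = indices[:k]
    terms = [
        math.log(weights[i]) + log_gauss_diag(x, means[i], variances[i])
        for i in indices
    ]
    # Log-sum-exp for numerical stability.
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


# Toy three-component mixture in two dimensions.
weights = [0.5, 0.3, 0.2]
means = [[0.0, 0.0], [5.0, 5.0], [-5.0, 5.0]]
variances = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
x = [0.1, -0.2]

full = mixture_log_density(x, weights, means, variances)
top1 = mixture_log_density(x, weights, means, variances, k=1)
```

Dropping components can only lower the computed density, so the truncated value is a lower bound on the full one; when the discarded components are far from the observation, the gap is negligible while the number of Gaussian evaluations shrinks.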
Training Mixture Density HMMs with SOM and LVQ
, 1997
Cited by 4 (2 self)
The objective of this paper is to present experiments and discussions of how some neural network algorithms can help the phoneme recognition with mixture density hidden Markov models (MDHMMs). In MDHMMs the modeling of the stochastic observation processes associated with the states is based on the estimation of the probability density function of the short-time observations in each state as a mixture of Gaussian densities. The Learning Vector Quantization (LVQ) is used to increase the discrimination between different phoneme models both during the initialization of the Gaussian codebooks and during the actual MDHMM training. The Self-Organizing Map (SOM) is applied to provide a suitably smoothed mapping of the training vectors to accelerate the convergence of the actual training. The obtained codebook topology can also be exploited in the recognition phase to speed up the calculations to approximate the observation probabilities. The experiments with LVQ and SOMs show reductions both...
Boosting Word Error Rates
Cited by 3 (2 self)
We apply boosting techniques to the problem of word error rate minimisation in speech recognition. This is achieved through a new definition of sample error for boosting and a training procedure for hidden Markov models. For this purpose we define a sample error for sentence examples related to the word error rate. Furthermore, for each sentence example we define a probability distribution in time that represents our belief that an error has been made at that particular frame. This is used to weigh the frames of each sentence in the boosting framework. We present preliminary results on the well-known Numbers 95 database that indicate the importance of this temporal probability distribution.
Minimum-risk training of approximate CRF-based NLP systems
- In Proceedings of NAACL
, 2012
Cited by 2 (1 self)
Conditional Random Fields (CRFs) are a popular formalism for structured prediction in NLP. It is well known how to train CRFs with certain topologies that admit exact inference, such as linear-chain CRFs. Some NLP phenomena, however, suggest CRFs with more complex topologies. Should such models be used, considering that they make exact inference intractable? Stoyanov et al. (2011) recently argued for training parameters to minimize the task-specific loss of whatever approximate inference and decoding methods will be used at test time. We apply their method to three NLP problems, showing that (i) using more complex CRFs leads to improved performance, and that (ii) minimum-risk training learns more accurate models.
unknown title
We apply boosting techniques to the problem of word error rate minimisation in speech recognition. This is achieved through a new definition of sample error for boosting and a training procedure for hidden Markov models. For this purpose we define a sample error for sentence examples related to the word error rate. Furthermore, for each sentence example we define a probability distribution in time that represents our belief that an error has been made at that particular frame. This is used to weigh the frames of each sentence in the boosting framework. We present preliminary results on the well-known Numbers 95 database that indicate the importance of this temporal probability distribution.
ISADORA - a Speech Modelling Network Based on Hidden Markov Models
- Computer Speech & Language
, 1993
In this paper we present the ISADORA system, which provides highly flexible speech recognition based on HMM technology together with a hierarchical representation of speech units. Markov model topologies, subword unit inventories, regular grammars expressed in finite-state or phrase structure style, and even the analysis tasks themselves are explicitly represented by the nodes of a large speech unit network. Thus, nothing that can be "said in the language of Markov models" needs to be hard-wired in the program code. In contrast to traditional compiled network recognizers, units, grammars, and tasks may be created or modified at analysis time, and the outcome of the decoding process is a structured symbolic description of the sensory input. Our architecture has proven extremely useful in prototyping new kinds of subword units. Besides generalized triphones and context-freezing units, a new subword speech unit for automatic speech recognition has been implemented. The so-called polyphone...

