Results 1 - 9 of 9
From HMM's to Segment Models: A Unified View of Stochastic Modeling for Speech Recognition
, 1996
Graphical models and automatic speech recognition
- Mathematical Foundations of Speech and Language Processing
, 2003
Cited by 49 (10 self)
Graphical models provide a promising paradigm to study both existing and novel techniques for automatic speech recognition. This paper first provides a brief overview of graphical models and their uses as statistical models. It is then shown that the statistical assumptions behind many pattern recognition techniques commonly used as part of a speech recognition system can be described by a graph; this includes Gaussian distributions, mixture models, decision trees, factor analysis, principal component analysis, linear discriminant analysis, and hidden Markov models. Moreover, this paper shows that many advanced models for speech recognition and language processing can also be simply described by a graph, including many at the acoustic-, pronunciation-, and language-modeling levels. A number of speech recognition techniques born directly out of the graphical-models paradigm are also surveyed. Additionally, this paper includes a novel graphical analysis regarding why derivative (or delta) features improve hidden Markov model-based speech recognition by improving structural discriminability. It also includes an example where a graph can be used to represent language model smoothing constraints. As will be seen, the space of models describable by a graph is quite large. A thorough exploration of this space should yield techniques that ultimately will supersede the hidden Markov model.
Production Models As A Structural Basis For Automatic Speech Recognition
, 1996
Cited by 21 (1 self)
We postulate in this paper that highly structured speech production models will have much to contribute to the ultimate success of speech recognition in view of the weaknesses of the theoretical foundation underpinning current technology. These weaknesses are analyzed in terms of phonological modeling and of phonetic-interface modeling. We conclude by suggesting that many of the advantages to be gained from interaction between speech production and speech recognition communities will develop from integrating models from the production community with the probabilistic analysis-by-synthesis strategy currently used by the technology community.
What HMMs can do
, 2002
Cited by 21 (3 self)
Since their inception over thirty years ago, hidden Markov models (HMMs) have become the predominant methodology for automatic speech recognition (ASR) systems; today, most state-of-the-art speech systems are HMM-based. There have been a number of ways to explain HMMs and to list their capabilities, each of these ways having both advantages and disadvantages. In an effort to better understand what HMMs can do, this tutorial analyzes HMMs by exploring a novel way in which an HMM can be defined, namely in terms of random variables and conditional independence assumptions. We prefer this definition as it allows us to reason more thoroughly about the capabilities of HMMs. In particular, it is possible to deduce that there are, in theory at least, no limitations to the class of probability distributions representable by HMMs. This paper concludes that, in the search for a model to supersede the HMM for ASR, rather than trying to correct for HMM limitations in the general case, we should seek new models based on their potential for better parsimony, computational requirements, and noise insensitivity.
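As a reading aid, the definition "in terms of random variables and conditional independence assumptions" mentioned in this abstract is commonly written as below. The notation is an assumption made here and does not come from the listing: $Q_t$ is the hidden state at time $t$, $X_t$ the observation at time $t$, and $T$ the sequence length.

```latex
% Two conditional independence statements commonly used to define an HMM
% (notation assumed here, not taken from the listed paper):
\begin{align}
  Q_t &\perp \{Q_{1:t-2},\, X_{1:t-1}\} \mid Q_{t-1}, \\
  X_t &\perp \{Q_{\neg t},\, X_{\neg t}\} \mid Q_t .
\end{align}
% Together they imply the familiar factorization of the joint distribution:
\begin{equation}
  p(x_{1:T}, q_{1:T}) \;=\; p(q_1)\,\prod_{t=2}^{T} p(q_t \mid q_{t-1})\,
  \prod_{t=1}^{T} p(x_t \mid q_t).
\end{equation}
```

Here $\neg t$ denotes all time indices other than $t$. The observation sequence alone is obtained by summing this joint over all state sequences $q_{1:T}$.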
Probabilistic-trajectory Segmental HMMs
- Computer Speech and Language
, 1999
Cited by 21 (0 self)
“Segmental hidden Markov models” (SHMMs) are intended to overcome important speech-modelling limitations of the conventional-HMM approach by representing sequences (or segments) of features and incorporating the concept of trajectories to describe how features change over time. A novel feature of the approach presented in this paper is that extra-segmental variability between different examples of a sub-phonemic speech segment is modelled separately from intra-segmental variability within any one example. The extra-segmental component of the model is represented in terms of variability in the trajectory parameters, and these models are therefore referred to as “probabilistic-trajectory segmental HMMs” (PTSHMMs). This paper presents the theory of PTSHMMs using a linear trajectory description characterized by slope and mid-point parameters, and presents theoretical and experimental comparisons between different types of PTSHMMs, simpler SHMMs and conventional HMMs. Experiments have demonstrated that, for any given feature set, a linear PTSHMM can substantially reduce the error rate in comparison with a conventional HMM, both for a connected-digit recognition task and for a phonetic classification task. Performance benefits have been demonstrated from incorporating a linear trajectory description and additionally from modelling variability in the mid-point parameter. © 1999 British Crown Copyright/DERA
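To make the "linear trajectory description characterized by slope and mid-point parameters" concrete, here is a minimal sketch of that parameterization for a single feature over one segment. It is not the PTSHMM from the paper, which additionally models probability distributions over these parameters; it only shows a least-squares fit of a straight line whose time axis is centred on the segment mid-point. The function names `fit_linear_trajectory` and `trajectory` are hypothetical and chosen for illustration.

```python
from typing import List, Tuple


def fit_linear_trajectory(frames: List[float]) -> Tuple[float, float]:
    """Least-squares (slope, mid-point) of one feature over one segment.

    Illustrative assumption: frames are equally spaced in time and the
    trajectory is a straight line through the segment's temporal centre.
    """
    n = len(frames)
    if n == 0:
        raise ValueError("segment must contain at least one frame")
    # Centre the time axis on the segment mid-point so the offsets sum to zero.
    offsets = [t - (n - 1) / 2.0 for t in range(n)]
    # With centred offsets, the least-squares intercept is just the mean.
    midpoint = sum(frames) / n
    denom = sum(o * o for o in offsets)
    # A single-frame segment has no defined slope; use 0.0 in that case.
    slope = sum(o * y for o, y in zip(offsets, frames)) / denom if denom else 0.0
    return slope, midpoint


def trajectory(slope: float, midpoint: float, n: int) -> List[float]:
    """Evaluate the linear trajectory at each of the n frame positions."""
    return [midpoint + slope * (t - (n - 1) / 2.0) for t in range(n)]


if __name__ == "__main__":
    s, m = fit_linear_trajectory([1.0, 2.0, 3.0, 4.0])
    print(s, m, trajectory(s, m, 4))
```

Centring time on the mid-point decouples the two parameters in the fit, which is one reason a mid-point (rather than a start-point) parameterization is convenient.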
Structured speech modeling
- IEEE Transactions on Audio, Speech and Language Processing (Special Issue on Rich Transcription)
, 2006
Cited by 19 (11 self)
Modeling dynamic structure of speech is a novel paradigm in speech recognition research within the generative modeling framework, and it offers a potential to overcome limitations of the current hidden Markov modeling approach. Analogous to structured language models where syntactic structure is exploited to represent long-distance relationships among words [5], the structured speech model described in this paper makes use of the dynamic structure in the hidden vocal tract resonance space to characterize long-span contextual influence among phonetic units. A general overview is provided first on hierarchically classified types of dynamic speech models in the literature. A detailed account is then given for a specific model type called the hidden trajectory model, and we describe detailed steps of model construction and the parameter estimation algorithms. We show how the use of resonance target parameters and their temporal filtering enables joint modeling of long-span coarticulation and phonetic reduction effects. Experiments on phonetic recognition evaluation demonstrate superior recognizer performance over a modern hidden Markov model-based system. Error analysis shows that the greatest performance gain occurs within the sonorant speech class. Index Terms: hidden dynamics, hidden trajectory, long span modeling, maximum-likelihood, nonlinear prediction, parameter learning, structured modeling, vocal tract resonance.
A Maximum-entropy Solution to the Frame-dependency Problem in Speech Recognition
, 2001
Cited by 5 (0 self)
The HMM assumption of conditional independence of observations causes a variety of problems for speech-recognition applications. Previous attempts to construct acoustic models that remove this assumption have suffered from a significant increase in the number of parameters to train. Another weakness of current acoustic models is that they do not account for the origin of derived features (estimated derivatives). We show how to both remove the independence assumption and properly account for derived features, with little or no increase in the number of parameters to train, by applying the principle of maximum entropy. We also show that ignoring the origins of derived features in training HMM acoustic models can lead to severe distortions of the effective language model. Evaluation of our maxent model on a simple problem cuts an already-low error rate in half compared to an equivalent HMM with the same number of parameters.
The Performance of Automated Speech Recognition Systems Under Adverse Conditions of Human Exertion
- International Journal of HCI
, 2003
Cited by 2 (0 self)
Research was conducted to determine if a relation exists between human exertion and the ability of speech recognition software to correctly recognize human speech. Participants were asked to use voice recognition technology to input a short newspaper article in three portions. One portion of the selected article was read while the participants were rested, another portion while they were lightly exerted, and the final portion while they were experiencing hard exertion. Recognition percentages were computed and compared for the rested, lightly exerted, and moderately hard exerted states. The results identified a negative linear relation between physical exertion and recognition accuracy; the higher the level of exertion, the lower the accuracy rate.
Extending the standard Hidden Markov Models to include local time correlation in speech data - A summary of literature review
, 1999
In this report, various techniques employed in extending the standard HMMs to explicitly model the local time correlation in speech data are reviewed. The aim of this review is to compare and contrast various techniques by relating each technique to the rest in a proper framework. Critical evaluations which highlight the strengths and weaknesses of each technique are also reported. This study will give a clearer picture and better understanding of work done in this area and finally help to identify potential areas for future research.

