From HMM's to Segment Models: A Unified View of Stochastic Modeling for Speech Recognition, 1996
Speaker Adaptation Using Constrained Estimation of Gaussian Mixtures. IEEE Transactions on Speech and Audio Processing, 1995
Cited by 65 (2 self)
A recent trend in automatic speech recognition systems is the use of continuous mixture-density hidden Markov models (HMMs). Despite the good recognition performance that these systems achieve on average in large vocabulary applications, there is a large variability in performance across speakers. Performance degrades dramatically when the user is radically different from the training population. A popular technique that can improve the performance and robustness of a speech recognition system is adapting speech models to the speaker, and more generally to the channel and the task. In continuous mixture-density HMMs the number of component densities is typically very large, and it may not be feasible to acquire a sufficient amount of adaptation data for robust maximum-likelihood estimates. To solve this problem, we propose a constrained estimation technique for Gaussian mixture densities. The algorithm is evaluated on the large-vocabulary Wall Street Journal corpus for both ...
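The abstract does not state the form of the constraint. The following is a minimal sketch, assuming (as an illustration, not as the paper's confirmed formulation) that every Gaussian in a tied group is adapted by one shared affine transform x -> A*x + b. Each adapted mean is then A*mu + b and each adapted covariance is A*Sigma*A^T. Under that assumption only A and b have to be estimated from the adaptation data, however many Gaussians share them. The helper names and all numeric values are hypothetical.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    """Return the transpose of a matrix stored as lists of rows."""
    return [list(row) for row in zip(*a)]


def adapt_gaussian(mean: Vector, cov: Matrix, a: Matrix, b: Vector):
    """Apply a shared affine transform to one speaker-independent Gaussian.

    Returns (A*mean + b, A*cov*A^T): the mean and covariance of A*x + b
    when x is distributed as N(mean, cov).
    """
    new_mean = [sum(a[i][k] * mean[k] for k in range(len(mean))) + b[i]
                for i in range(len(a))]
    new_cov = matmul(matmul(a, cov), transpose(a))
    return new_mean, new_cov


# Hypothetical tied group: two Gaussians that share a single (A, b).
A = [[1.1, 0.0], [0.0, 0.9]]
b = [0.5, -0.2]
GROUP = [([0.0, 1.0], [[1.0, 0.0], [0.0, 2.0]]),
         ([2.0, -1.0], [[0.5, 0.1], [0.1, 0.5]])]

adapted = [adapt_gaussian(m, c, A, b) for m, c in GROUP]
for new_mean, new_cov in adapted:
    print(new_mean, new_cov)
```

Estimating A and b themselves (for example by maximum likelihood over the adaptation data) is not shown. The point of the sketch is that the number of free parameters scales with the number of transforms, not with the number of mixture components.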
The Applications of Genetic Algorithms in Cryptanalysis, 1996
Cited by 2 (0 self)
This thesis describes a method of deciphering messages encrypted with rotor machines, utilising a Genetic Algorithm to search the keyspace. A fitness measure based on the phi test for non-randomness of text is described, and the results show that an unknown three-rotor machine can generally be cryptanalysed with about 4000 letters of ciphertext. The results are compared with those obtained using a previously published technique and found to be superior.
Acknowledgements: I would like to thank my supervisors, Vic Rayward-Smith and Geoff McKeown, for their help and encouragement.
Contents
1 Introduction
2 Statistical Inference
  2.1 Introduction
  2.2 Uncertainty
    2.2.1 Rules of Probability
    2.2.2 Frequency Probability
    2.2.3 Subjective Probability
  2.3 Modelling...
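A fitness measure of the kind described can be based on Friedman's phi statistic, the sum over letters of f*(f - 1), where f is a letter's count. For N uniformly random letters its expected value is N*(N - 1)/26. The following is a minimal sketch, assuming the fitness is the ratio of observed to random-expected phi. The thesis's exact normalisation is not given in the abstract, and the function names are hypothetical.

```python
from collections import Counter

ALPHABET_SIZE = 26  # assumes text over the 26 letters A-Z


def letters(text: str) -> str:
    """Keep only the letters A-Z, upper-cased."""
    return "".join(c for c in text.upper() if "A" <= c <= "Z")


def phi_observed(text: str) -> int:
    """Friedman's phi statistic: sum over letters of f * (f - 1)."""
    return sum(f * (f - 1) for f in Counter(letters(text)).values())


def phi_fitness(text: str) -> float:
    """Observed phi divided by its expected value for uniformly random letters.

    Random-looking text scores near 1.0; English plaintext scores roughly 1.7,
    so a higher score suggests a candidate decryption is less random.
    """
    n = len(letters(text))
    if n < 2:
        return 0.0
    expected_random = n * (n - 1) / ALPHABET_SIZE
    return phi_observed(text) / expected_random
```

In a genetic algorithm, each individual would encode a candidate rotor setting, and its fitness would be phi_fitness applied to the ciphertext decrypted under that setting. The rotor-machine decryption itself is not shown here.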
Acoustic Model Clustering Based on Syllable Structure, 2002
Cited by 2 (0 self)
Current speech recognition systems perform poorly on conversational speech compared with read speech, arguably because of the large acoustic variability inherent in conversational speech. Our hypothesis is that there are systematic effects in local context, associated with syllabic structure, that are not captured by current acoustic models. Such variation may be modeled using a broader definition of context than in traditional systems, which restrict context to the neighboring phonemes. In this paper, we study the use of word- and syllable-level context conditioning in recognizing conversational speech. We describe a method that extends standard tree-based clustering to incorporate a large number of features, and we report results on the Switchboard task indicating that syllable-structure conditioning outperforms pentaphones and incurs less computational cost. It has been hypothesized that previous work using syllable models for English recognition was limited because it ignored re-syllabification (the change of syllable structure at word boundaries), but our analysis shows that accounting for re-syllabification does not affect recognition performance.
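The abstract does not give the splitting criterion used in the extended tree-based clustering. The following is a minimal sketch, assuming the common likelihood-gain criterion for phonetic decision trees. Each node's data are modeled by one maximum-likelihood Gaussian, and a yes/no question about the context is scored by how much it increases the total log-likelihood. One-dimensional features, the feature names (`left`, `syl_pos`) and the toy data are all hypothetical. A syllable-position question is simply one more entry in the question set.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical training item: (context features of one phone instance, 1-D acoustic value).
Sample = Tuple[Dict[str, str], float]
Question = Callable[[Dict[str, str]], bool]


def gaussian_log_likelihood(values: List[float], var_floor: float = 1e-3) -> float:
    """Total log-likelihood of `values` under a 1-D Gaussian fitted to them by ML."""
    n = len(values)
    if n == 0:
        return 0.0
    mean = sum(values) / n
    sum_sq = sum((v - mean) ** 2 for v in values)
    var = max(sum_sq / n, var_floor)  # floor guards against zero variance
    return -0.5 * n * math.log(2 * math.pi * var) - sum_sq / (2 * var)


def split_gain(samples: List[Sample], question: Question) -> float:
    """Log-likelihood increase from splitting one pooled node into yes/no children."""
    yes = [x for feats, x in samples if question(feats)]
    no = [x for feats, x in samples if not question(feats)]
    parent = gaussian_log_likelihood([x for _, x in samples])
    return gaussian_log_likelihood(yes) + gaussian_log_likelihood(no) - parent


def best_question(samples: List[Sample], questions: Dict[str, Question],
                  min_count: int = 2) -> Tuple[Optional[str], float]:
    """Pick the question with the largest gain, skipping splits that leave a child too small."""
    best_name: Optional[str] = None
    best_gain = 0.0
    for name, question in questions.items():
        n_yes = sum(1 for feats, _ in samples if question(feats))
        if n_yes < min_count or len(samples) - n_yes < min_count:
            continue
        gain = split_gain(samples, question)
        if gain > best_gain:
            best_name, best_gain = name, gain
    return best_name, best_gain


VOWELS = {"a", "e", "i", "o", "u"}
SAMPLES: List[Sample] = [
    ({"left": "a", "syl_pos": "onset"}, 1.0),
    ({"left": "t", "syl_pos": "onset"}, 1.2),
    ({"left": "a", "syl_pos": "coda"}, 3.0),
    ({"left": "t", "syl_pos": "coda"}, 3.1),
    ({"left": "a", "syl_pos": "onset"}, 0.9),
    ({"left": "t", "syl_pos": "coda"}, 2.9),
]
QUESTIONS: Dict[str, Question] = {
    "left_is_vowel": lambda f: f["left"] in VOWELS,  # neighboring-phoneme question
    "in_onset": lambda f: f["syl_pos"] == "onset",   # syllable-structure question
}
name, gain = best_question(SAMPLES, QUESTIONS)
print(name, gain)
```

In this toy data the acoustic value depends on syllable position rather than on the left neighbor, so the syllable question wins the split.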
The State Based Mixture of Expert HMM with Applications to the Recognition of Spontaneous Speech, 2001
Cited by 1 (0 self)
Dissertation submitted to the University of Cambridge for the degree of Doctor of Philosophy. Although the performance of speech recognition systems has increased substantially over the last decades, a number of tasks still pose considerable problems for current state-of-the-art techniques. One of these tasks is the recognition of spontaneous speech, which differs from read or planned speech in that its underlying dynamics change frequently over time. The negative effect of changes in acoustic background conditions on recognition performance can also be observed in other situations, for instance when speech is corrupted by non-stationary noise. This thesis is concerned with the development of an acoustic model for speech recognition that automatically detects changes in the background condition of a signal and compensates for the model-data mismatch by combining the information of several expert models. These experts are specialised in the different acoustic conditions under consideration, and their influence on the recognition process is determined by how well their associated condition matches ...
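The combination of experts can be illustrated with a simple weighting scheme. The following is a minimal sketch, assuming that each expert is a single 1-D Gaussian trained on one acoustic condition. Each expert's weight is assumed to be the posterior probability of its condition given a window of recent frames, and the frame likelihood is the weighted sum of the experts' likelihoods. This is an illustration of the general mixture-of-experts idea, not the thesis's state-based model. The expert parameters, the priors and the window are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical expert: a 1-D Gaussian (mean, variance) trained on one acoustic condition.
Expert = Tuple[float, float]


def log_gauss(x: float, mean: float, var: float) -> float:
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def expert_weights(window: Sequence[float], experts: Sequence[Expert],
                   priors: Sequence[float]) -> List[float]:
    """Posterior probability of each expert's condition given a window of recent frames."""
    scores = [math.log(p) + sum(log_gauss(x, m, v) for x in window)
              for (m, v), p in zip(experts, priors)]
    top = max(scores)  # subtract the maximum before exponentiating to avoid underflow
    unnorm = [math.exp(s - top) for s in scores]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def combined_likelihood(x: float, experts: Sequence[Expert], weights: Sequence[float]) -> float:
    """Weighted sum of the experts' likelihoods for a single frame."""
    return sum(w * math.exp(log_gauss(x, m, v)) for (m, v), w in zip(experts, weights))


EXPERTS: List[Expert] = [(0.0, 1.0), (5.0, 1.0)]  # e.g. "clean" and "noisy" (hypothetical)
PRIORS = [0.5, 0.5]
weights = expert_weights([4.8, 5.1, 5.3], EXPERTS, PRIORS)
score = combined_likelihood(5.0, EXPERTS, weights)
print(weights, score)
```

Because the recent frames match the second condition, almost all of the weight shifts to the second expert. If the background changed, the weights would shift back without retraining either expert.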
PUBLISHED AS, 2003
Cited by 1 (0 self)
... for his love and his continuous support in good and bad times throughout this thesis. To Laura Lou, for her smiles and the energy they gave me when I needed it most. To my parents, for their perspective on the relative importance of a thesis and other things in life.
State-of-the-art automatic speech recognition (ASR) techniques are typically based on hidden Markov models (HMMs) for modeling temporal sequences of feature vectors extracted from the speech ...
The Effect of Rare Events on the Evaluation and Decoding of Hidden non-Markovian Models
Cited by 1 (0 self)
Hidden non-Markovian Models (HnMM) are a modeling paradigm based on Hidden Markov Models. They extend Hidden Markov Models by replacing the hidden model, a memoryless discrete-time Markov chain, with a more flexible discrete stochastic model involving time-dependent transition rates. HnMM enable the analysis of partially observable real systems based solely on their interaction with the environment. Possible application areas include the analysis of machine behavior from protocol data and diagnosis support for a disease based on a patient's symptoms, both of which are hard or impossible to address with existing modeling paradigms. Some of the questions that HnMM can answer concern failures or unusual system behavior, both of which may involve rare events. This paper investigates the effect of rare events on HnMM and on the performance of the analysis methods, in order to test the applicability of HnMM to problems involving rare events.
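The difference between a memoryless chain and time-dependent transition rates can be shown with one state. The following is a minimal sketch, assuming (as a hypothetical choice, not taken from the paper) that the time spent in that state follows a Weibull distribution. The sketch computes the probability of leaving the state in each time step, given that the system has not left yet. With shape 1 (the exponential case) this probability is constant, which is exactly the memoryless behavior of a Markov chain. With any other shape it depends on the time already spent in the state.

```python
import math
from typing import List


def weibull_survival(t: float, shape: float, scale: float) -> float:
    """S(t) = exp(-(t / scale) ** shape): probability the state has not been left by time t."""
    return math.exp(-((t / scale) ** shape))


def step_transition_probs(shape: float, scale: float, dt: float, steps: int) -> List[float]:
    """P(leave the state in [t, t + dt) | still in it at t) for t = 0, dt, 2*dt, ..."""
    probs = []
    for i in range(steps):
        t = i * dt
        survive_step = weibull_survival(t + dt, shape, scale) / weibull_survival(t, shape, scale)
        probs.append(1.0 - survive_step)
    return probs


probs_exp = step_transition_probs(shape=1.0, scale=10.0, dt=1.0, steps=5)      # memoryless
probs_weibull = step_transition_probs(shape=2.0, scale=10.0, dt=1.0, steps=5)  # ageing
print(probs_exp)
print(probs_weibull)
```

Replacing the constant per-step probabilities of an HMM's hidden chain with age-dependent ones like these is the kind of extension the abstract describes. Very small values of these probabilities are one way rare events can enter the model.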
Clustering Wide-Contexts and HMM Topologies for Spontaneous Speech Recognition, 2001
In most speech recognition systems today, all the acoustic variation associated with a phoneme is characterized in terms of the identity of its neighboring phonemes. The neighbors influence only the state observation density of a fixed Hidden Markov Model. Other sources of variation are captured implicitly by using Gaussian mixture models for the state observations. Consequently, these models can be very broad, particularly for casual spontaneous speech. In this thesis, we explore conditioning phonemes on higher-level linguistic structure, specifically syllable- and word-level structure, to learn phoneme models that are more specific to the context, and we report experimental results on a large-vocabulary (35k-word) conversational speech task (Switchboard). In particular, this thesis makes three main contributions related to wide-context conditioning. First, we demonstrate that syllable- and word-level structure can be incorporated into current acoustic models to improve recognition accuracy over triphones. For a fixed number of parameters, these models are computationally more efficient than pentaphones, in both training and testing. In addition, the use of syllable and word features leads to a small but significant improvement in performance. The wide contexts used in our acoustic model can implicitly capture re-syllabification effects to a certain extent. However, we find that explicitly modeling re-syllabification does not improve recognition further, because only a small number of phones exhibit acoustic differences after re-syllabification. The second contribution addresses the difficulties that arise when a large number of additional conditioning features are used. As the number of conditioning features increases, the training cost can increase exponentially.
Moreover, a large fraction of the training labels tends to have too few examples for reliable statistics, which could cause the decision trees to learn poor clusters. A new method has been developed for clustering in multiple stages, where each stage clusters a different subset of features and can optionally use the partitions learned in previous stages. Apart from reducing the risk of unreliable statistics, it is designed to ameliorate the data fragmentation problem and is computationally less expensive. This method was successfully demonstrated with pentaphones, giving equivalent performance at a lower cost. Finally, a new algorithm is described for designing context-specific HMMs. The idea is to model phone reduction in certain contexts and to learn a more constrained topology. Using contextual information, the algorithm clusters HMM paths, where each path has a different number of states. An HMM distance measure has been formulated to prune paths that are similar. During decoding, paths are allocated dynamically to each sub-word unit according to its context. We applied this algorithm to model phone topologies, finding improved characterization of speech given known word sequences but no significant improvement in word error rate.
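Widening the context fragments the training labels. With, say, 45 phones (a hypothetical inventory size), the triphone label space has 45^3 = 91,125 entries and the pentaphone space has 45^5 = 184,528,125. The following is a minimal sketch that builds context labels with a given number of neighbors on each side and reports the fraction of distinct labels seen fewer than a minimum number of times. The toy phone string, the "#" boundary symbol and the threshold are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def context_labels(phones: List[str], width: int) -> List[Tuple[str, ...]]:
    """Labels with `width` neighbors on each side (1 = triphone, 2 = pentaphone).

    Positions near the ends are padded with a hypothetical boundary symbol "#".
    """
    padded = ["#"] * width + phones + ["#"] * width
    return [tuple(padded[i:i + 2 * width + 1]) for i in range(len(phones))]


def sparse_fraction(phones: List[str], width: int, min_count: int) -> float:
    """Fraction of distinct labels that occur fewer than `min_count` times."""
    counts = Counter(context_labels(phones, width))
    sparse = sum(1 for c in counts.values() if c < min_count)
    return sparse / len(counts)


PHONES = ["k", "ae", "t", "k", "ae", "t", "k", "ae", "p"]
for width in (1, 2):
    print(width, len(set(context_labels(PHONES, width))), sparse_fraction(PHONES, width, 2))
```

Even in this nine-phone string, half of the distinct triphones but seven of the eight distinct pentaphones occur only once. This is the kind of sparsity that the multi-stage clustering described above is meant to mitigate.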
A Complete Procedure For Estimating Hidden Markov Models with Application in Locating Structural Breaks, 2008
Testing for structural breaks and identifying their location is essential for econometric modeling. In this paper, a Hidden Markov Model (HMM) approach is used to perform these tasks. Breaks are defined as the data points where the underlying Markov chain switches from one state to another. The HMM is estimated using a variant of the Iterative Conditional Expectation-Generalized Mixture (ICE-GEMI) algorithm proposed by Delignon et al. (1997), which permits analysis of the conditional distributions of economic data and allows for different functional forms across regimes. The locations of the breaks are then obtained by assigning states to data points with the Maximum Posterior Mode (MPM) algorithm. The Integrated Classification Likelihood-Bayesian Information Criterion (ICL-BIC) determines the number of regimes while taking into account the classification of the data points into their corresponding regimes. The performance of the overall procedure, denoted IMI after the initials of the component algorithms, is validated with two sets of simulations: one in which only the parameters may differ across regimes, and one that also permits differences in functional form. The IMI method performs well in both sets. Moreover, compared with the Bai and Perron (1998) method, which is appropriate for the first set of simulations, IMI is superior in assessing the number of breaks and their respective locations.
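The state-assignment and break-location steps can be illustrated on a toy series. The following is a minimal sketch, assuming a two-regime HMM with Gaussian emissions whose parameters are already known. In the paper these parameters are estimated with ICE-GEMI and the number of regimes is chosen with ICL-BIC, and neither step is reproduced here. MPM assigns each data point to the state with the highest posterior marginal probability, computed here with a scaled forward-backward pass. A break is reported wherever the assigned state changes. All numbers are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def normal_pdf(x: float, mean: float, sd: float) -> float:
    """Density of N(mean, sd^2) at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def mpm_states(obs: Sequence[float], init: Sequence[float],
               trans: Sequence[Sequence[float]], params: Sequence[Tuple[float, float]]) -> List[int]:
    """Marginal posterior mode: argmax_k P(state_t = k | all observations) for each t."""
    n, k = len(obs), len(init)
    emis = [[normal_pdf(x, m, s) for (m, s) in params] for x in obs]
    # Forward pass, normalised at every step to avoid underflow.
    alpha = [[0.0] * k for _ in range(n)]
    scale = [0.0] * n
    alpha[0] = [init[j] * emis[0][j] for j in range(k)]
    scale[0] = sum(alpha[0])
    alpha[0] = [a / scale[0] for a in alpha[0]]
    for t in range(1, n):
        alpha[t] = [emis[t][j] * sum(alpha[t - 1][i] * trans[i][j] for i in range(k))
                    for j in range(k)]
        scale[t] = sum(alpha[t])
        alpha[t] = [a / scale[t] for a in alpha[t]]
    # Backward pass, using the same scale factors.
    beta = [[1.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        beta[t] = [sum(trans[i][j] * emis[t + 1][j] * beta[t + 1][j] for j in range(k)) / scale[t + 1]
                   for i in range(k)]
    # The posterior marginal is proportional to alpha * beta; take its argmax.
    states = []
    for t in range(n):
        post = [alpha[t][j] * beta[t][j] for j in range(k)]
        states.append(max(range(k), key=lambda j: post[j]))
    return states


def break_points(states: Sequence[int]) -> List[int]:
    """Indices where the assigned regime differs from the previous data point's."""
    return [t for t in range(1, len(states)) if states[t] != states[t - 1]]


OBS = [0.1, -0.2, 0.3, 0.0, 5.1, 4.9, 5.2, 5.0]
states = mpm_states(OBS, init=[0.5, 0.5], trans=[[0.95, 0.05], [0.05, 0.95]],
                    params=[(0.0, 1.0), (5.0, 1.0)])
breaks = break_points(states)
print(states, breaks)
```

In this sketch the two regimes differ only in their means. The paper's procedure also allows the functional form of the conditional distribution to differ across regimes.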

