From HMM's to Segment Models: A Unified View of Stochastic Modeling for Speech Recognition
1996
Connectionist Probability Estimation in HMM Speech Recognition
IEEE Transactions on Speech and Audio Processing, 1992
Cited by 45 (9 self)
This report is concerned with integrating connectionist networks into a hidden Markov model (HMM) speech recognition system. This is achieved through a statistical understanding of connectionist networks as probability estimators, first elucidated by Herve Bourlard. We review the basis of HMM speech recognition, and point out the possible benefits of incorporating connectionist networks. We discuss some issues necessary to the construction of a connectionist HMM recognition system, and describe the performance of such a system, including evaluations on the DARPA database, in collaboration with Mike Cohen and Horacio Franco of SRI International. In conclusion, we show that a connectionist component improves a state-of-the-art HMM system. Part I: Introduction. Over the past few years, connectionist models have been widely proposed as a potentially powerful approach to speech recognition (e.g. Makino et al. (1983), Huang et al. (1988) and Waibel et al. (1989)). However, whilst connec...
Speech Recognition using Neural Networks
1995
Cited by 21 (0 self)
This thesis examines how artificial neural networks can benefit a large-vocabulary, speaker-independent, continuous speech recognition system. Currently, most speech recognition systems are based on hidden Markov models (HMMs), a statistical framework that supports both acoustic and temporal modeling. Despite their state-of-the-art performance, HMMs make a number of suboptimal modeling assumptions that limit their potential effectiveness. Neural networks avoid many of these assumptions, while they can also learn complex functions, generalize effectively, tolerate noise, and support parallelism. While neural networks can readily be applied to acoustic modeling, it is not yet clear how they can be used for temporal modeling. Therefore, we explore a class of systems called NN-HMM hybrids, in which neural networks perform acoustic modeling, and HMMs perform temporal modeling. We argue that an NN-HMM hybrid has several theoretical advantages over a pure HMM system, including better acoustic ...
Speech Processing with Linear and Neural Network Models
1996
Cited by 4 (0 self)
ion, for imposing continuity between models of adjacent speech segments, and learning rate adaptation, for improving back-propagation training, are discussed. For synthesising real speech utterances, an audio tape demonstrates that ARX models produce the highest quality synthetic speech and that the quality is maintained when pitch modifications are applied. The second part of the dissertation studies the operation of recurrent neural networks in classifying patterns of correlated feature vectors. Such patterns are typical of speech classification tasks. The operation of a hidden node with a recurrent connection is explained in terms of a decision boundary which changes position in feature space. The feedback is shown to delay switching from one class to another and to smooth output decisions for sequences of feature vectors from the same class. For networks trained with constant class targets, a sequence of feature vectors from the same class tends to drive the operation of hidden nod
AN SVM FRONT-END LANDMARK SPEECH RECOGNITION SYSTEM
2008
Cited by 3 (1 self)
Support vector machines (SVMs) can be trained to detect manner transitions between phones and to identify the manner and place of articulation of any given phone. The SVMs can perform these tasks with high accuracy using a variety of acoustic representations. The SVMs generalize well to unseen test data if these data were created under identical conditions to the training corpus. Unseen acoustic data from different corpora present a problem for the SVM, even if these acoustic data were generated under similar conditions. The discriminant outputs of these SVMs are used to create both a hybrid SVM/HMM (hidden Markov model) phone recognition system and a hybrid SVM/HMM word recognition system. There is a significant improvement in both phone and word recognition accuracy when these SVM discriminant features are used instead of mel frequency cepstral coefficients (MFCCs).
Multi-State Predictive Neural Networks for Text-Independent Speaker Recognition
1995
Cited by 2 (0 self)
Both Hidden Markov Models and Neural Networks have already been used as production systems for speaker identification or verification. Recently, [9] showed that ergodic multi-state hidden Markov Models do not outperform one-state "hidden" Markov Models, i.e. Gaussian Mixture Models, for speaker recognition. That work showed that the important characteristic of these models is the total number of mixtures and not the number of states. These HMMs are thus unable to make use of temporal information for performing speaker recognition. On the other hand, recent experiments have shown that, for neural predictive systems, modeling non-stationarity significantly improves performance [6]. We are interested here in the development of such models, which will be referred to as multi-state predictive neural networks (MSPNNs). We study the ability of these systems to perform speaker identification and discuss the superiority of multi-state over one-state models. We provide results...
Predictive Systems for Speaker Identification: Heuristics for Model Selection.
1995
Introduction. Speaker recognition methods rely either on the direct classification of speech characteristics or on the modeling of speech utterances. Classification-based techniques extract discriminant information from the signal and may offer high performance even for large populations. However, since they cannot easily accommodate new speakers, they are not adequate when the population may change frequently and is not limited to a fixed set. This is not a limitation for modeling approaches, which attempt to identify the speech production system independently for each speaker. Several approaches from the second group have been proposed over the years: Vector Quantization has been the most popular technique, and more recently Hidden Markov Model (HMM) and Predictive Neural Network (PNN) systems have been developed. Although they may behave differently, HMMs and PNNs share many similarities. The main limitation of the modeling approach lies in the
Neural Networks In Speech Recognition
We review some of the Artificial Neural Network (ANN) approaches used in speech recognition. Some basic principles of neural networks are briefly described, as well as their current applications and performance in speech recognition. Strengths and weaknesses of pure connectionist networks in the particular context of the speech signal are then discussed. The emphasis is put on the capabilities of connectionist methods to improve the performance of the Hidden Markov Model (HMM) approach. Some of the principles that govern the so-called hybrid HMM-ANN approach are then briefly explained. Some recent combinations of stochastic models and ANNs known as the Hidden Control Neural Networks are also presented. 1. Introduction. It has been a long-standing technical challenge to let machines acquire and expand certain human abilities. Maybe the most difficult tasks, apart from the higher level functions of the brain, are speech and image recognition. Modern computers outperform the human bra...
Recognition of Consonant-Vowel (CV) Utterances Using Modular Neural Network Models
2000
Development of suitable models for recognition of subword units is important for realization of vocabulary independent speech recognition systems. Syllable-like subword units, such as diphones and triphones, have been used for recognition of speech in English. In Indian languages, the Consonant-Vowel (CV) type syllable units occur frequently. These are also the basic units of speech production. Therefore CVs are useful as subword units for Indian languages. Pronunciation variation can also be modeled for CV type syllable units. The focus of this thesis work is on development of models for recognition of CV units of speech in Indian languages. Recognition of Consonant-Vowel (CV) utterances in Indian languages is a challenging task because of the large number of classes and the high confusability among several classes. Approaches for CV recognition should be based on the classification models capable of good discrimination among several classes. Multilayer perceptron models have been commonly used for complex pattern classification tasks. These models are suitable for classification of fixed-length patterns. The patterns are extracted from varying duration segments of CV utterances using Vowel Onset Points (VOPs) as anchor points. Performance of a neural network based approach for detection of VOPs is studied for different categories of consonants. Any deviation in the VOP from the actual VOP will result in extraction of a misaligned pattern and affect the CV classification performance. Effect of deviation in detection of VOPs on CV classification is also studied. A modular
Predictive DP Matching for On-Line Character Recognition
For on-line character recognition, predictive DP matching is proposed, in which two physically different features, coordinate features and directional features, are handled in a unified manner. For this unification, the distance of the directional features is converted into a distance of the coordinate features by a feature prediction technique. An experimental result showed that predictive DP matching could attain a recognition rate comparable to that of conventional DP matching, which requires costly optimization of the weight that balances the two features.

