Results 1 - 8 of 8
Combining ANNs To Improve Phone Recognition
- ICASSP, 1997
Abstract - Cited by 4 (0 self)
In applying neural networks to speech recognition, one often finds that slightly different training configurations lead to significantly different networks. Thus different training sessions using different setups will likely end up in "mixed" network configurations representing different solutions in different regions of the data space. This sensitivity to the initial weights, the training parameters, and the training data can be used to enhance performance by means of a committee of neural networks. In this paper, we study various ways to combine context-dependent (CD) and context-independent (CI) neural network phone estimators to improve phone recognition. As a result, we obtain 6.3% and 2.2% increases in phone recognition accuracy using monophones and biphones, respectively.
1. INTRODUCTION
In the past decade, a number of connectionist approaches have enabled a new computing paradigm for speech recognition with some success [1, 6, 12, 15]. In these ANN-based speech recognizer...
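The abstract does not say which combination rules the paper studies, so the following is only a minimal sketch of two common committee rules, assuming every network emits a posterior vector over the same phone set for a given frame. The function names, the equal default weights, and the toy posteriors are illustrative assumptions, not the authors' method.

```python
import math

def combine_linear(posteriors, weights=None):
    """Weighted arithmetic mean of per-network phone posteriors (assumed rule)."""
    n = len(posteriors)
    weights = weights or [1.0 / n] * n
    num_phones = len(posteriors[0])
    return [sum(w * p[k] for w, p in zip(weights, posteriors))
            for k in range(num_phones)]

def combine_log_linear(posteriors, weights=None, floor=1e-10):
    """Weighted geometric mean of posteriors, renormalized to sum to one."""
    n = len(posteriors)
    weights = weights or [1.0 / n] * n
    num_phones = len(posteriors[0])
    scores = [
        math.exp(sum(w * math.log(max(p[k], floor))
                     for w, p in zip(weights, posteriors)))
        for k in range(num_phones)
    ]
    total = sum(scores)
    return [s / total for s in scores]

# Hypothetical frame-level posteriors over three phones.
ci_net = [0.6, 0.3, 0.1]  # context-independent estimator
cd_net = [0.2, 0.7, 0.1]  # context-dependent estimator, mapped to the same phones

linear = combine_linear([ci_net, cd_net])
log_linear = combine_log_linear([ci_net, cd_net])
```

In this toy frame the two networks disagree on the top phone, and both rules pick the second phone once the estimates are pooled.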
Accent Clustering in Swedish Using the Bhattacharyya Distance
- Proc. International Congress of Phonetic Sciences (ICPhS), 2003
Abstract - Cited by 3 (1 self)
In an attempt to improve automatic speech recognition (ASR) models for Swedish, accent variations were considered. These have proved to be important variables in the statistical distribution of the acoustic features usually employed in ASR. The analysis of feature variability has revealed phenomena that are consistent with what is known from phonetic investigations, suggesting that a consistent part of the information about accents could be derived from those features. A graphical interface has been developed to simplify the visualization of the geographical distributions of these phenomena.
Towards A Compact Speech Recognizer: Subspace Distribution Clustering Hidden Markov Model
- 1998
Abstract - Cited by 2 (1 self)
1 Introduction
1.1 The Problem: Too Many Parameters
1.2 Proposed Solution: It Is Time to Share More!
1.3 Thesis Summary and Outline
2 Review of Acoustic Modeling Using Hidden Markov Model
2.1 Speech Characteristics
2.2 Selection of Input Speech Space and Speech Model
2.2.1 Cepstral Input
2.2.2 Hidden Markov Model
2.2.3 Our Choice of HMM for Acoustic Modeling
2.3 Speech Unit to Model ...
Stream Derivation And Clustering Scheme For Subspace Distribution Clustering Hidden Markov Model
- Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 1997
Abstract - Cited by 2 (2 self)
In [1], our novel subspace distribution clustering hidden Markov model (SDCHMM) made its debut as an approximation to the continuous density HMM (CDHMM). Deriving SDCHMMs from CDHMMs requires a definition of multiple streams and a Gaussian clustering scheme. Previously we tried 4 and 13 streams, which are common but ad hoc choices. Here we present a simple and coherent definition for streams of any dimension: the streams comprise the most correlated features. The new definition is shown to give better performance in two recognition tasks. The clustering scheme in [1] is an O(n^2) algorithm, which can be slow when the number of Gaussians in the original CDHMMs is large. We have now devised a modified k-means clustering scheme that uses the Bhattacharyya distance as the distance measure between Gaussian clusters. Not only is the new clustering scheme faster; when combined with the new stream definitions, it also yields SDCHMMs which perform at least as well as the original CDHMMs (with bet...
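For reference, the Bhattacharyya distance between two Gaussians has a closed form. The sketch below assumes diagonal covariances, and adds the nearest-centroid assignment that any k-means variant needs. The paper's specific modifications to k-means are not described in the abstract and are not reproduced here; the function names are hypothetical.

```python
import math

def bhattacharyya_diag(mean1, var1, mean2, var2):
    """Bhattacharyya distance between two diagonal-covariance Gaussians."""
    dist = 0.0
    for m1, v1, m2, v2 in zip(mean1, var1, mean2, var2):
        v = 0.5 * (v1 + v2)                              # averaged variance
        dist += 0.125 * (m1 - m2) ** 2 / v               # mean-separation term
        dist += 0.5 * math.log(v / math.sqrt(v1 * v2))   # variance-mismatch term
    return dist

def nearest_centroid(gaussian, centroids):
    """Index of the centroid Gaussian closest to `gaussian` (assignment step)."""
    mean, var = gaussian
    dists = [bhattacharyya_diag(mean, var, c_mean, c_var)
             for c_mean, c_var in centroids]
    return dists.index(min(dists))

# Two unit-variance 1-D Gaussians whose means are two units apart.
d = bhattacharyya_diag([0.0], [1.0], [2.0], [1.0])

# Assign a Gaussian to the closer of two hypothetical cluster centroids.
idx = nearest_centroid(([0.1], [1.0]), [([0.0], [1.0]), ([5.0], [1.0])])
```

Each assignment pass costs one distance evaluation per Gaussian per centroid, which is the usual reason a k-means-style scheme scales better than comparing all pairs of Gaussians.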
Training acoustic models with speech data from different languages
- ISCA Tutorial and Research Workshop (ITRW) on Multilingual Speech and Language Processing, 2005
Abstract - Cited by 2 (1 self)
We present a technique to train acoustic models for a target language using speech data from distinct source languages. In this approach, no native training data from the target language is required. The acoustic model candidates for each target-language phoneme are automatically selected from a group of existing source languages by means of a combined phonetic-phonological (CPP) metric, developed by incorporating statistically-derived phonetic and phonological distance information (Liu and Melnar, Interspeech 2005). The method assumes availability of sufficient native training data for the source languages and pronunciation lexica for both the target and source languages. Once the model candidates are determined for each target-language phoneme, the target HMMs are trained with the speech data from the source languages by means of a “silkie-hen-on-duck-eggs” strategy, in which the target phoneme model training is embedded in the source phoneme model training. The recognition performance of the resultant models is comparable to that of our previously-reported CPP-derived models built through multimixture construction, while the size of the current models is only a fraction of that of the previous models, depending on the number of HMM candidates used for each target phoneme. Utilizing the CPP metric, both versions of the models reach the performance of models generated by a data-driven acoustic-distance mapping approach, far above the general phoneme symbol-based cross-language transfer strategies.
Using Accent Information in ASR Models for Swedish
- In Proceedings of European Conference on Speech Communication and Technology (Eurospeech), 2003
Abstract - Cited by 1 (1 self)
In this study, accent information is used in an attempt to improve acoustic models for automatic speech recognition (ASR). First, accent-dependent Gaussian models were trained independently. The Bhattacharyya distance was then used in conjunction with agglomerative hierarchical clustering to define optimal strategies for merging those models. The resulting allophonic classes were analyzed and compared with the phonetic literature. Finally, accent-"aware" models were built, in which the parametric complexity for each phoneme corresponds to the degree of variability across accent areas and to the amount of training data available for it. The models were compared to models with the same, but evenly spread, overall complexity, showing in some cases a slight improvement in recognition accuracy.
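The clustering step described above can be sketched as follows, assuming one-dimensional Gaussians and average linkage. The abstract does not state the linkage criterion or the model dimensionality, so both are assumptions here, and the accent labels and parameter values are invented for illustration.

```python
import math

def bhattacharyya_1d(m1, v1, m2, v2):
    """Bhattacharyya distance between two univariate Gaussians."""
    v = 0.5 * (v1 + v2)
    return 0.125 * (m1 - m2) ** 2 / v + 0.5 * math.log(v / math.sqrt(v1 * v2))

def agglomerate(models, num_clusters):
    """Bottom-up merging of models (name -> (mean, variance)), average linkage."""
    clusters = [[name] for name in models]

    def linkage(a, b):
        # Average pairwise distance between the members of two clusters.
        pairs = [(x, y) for x in a for y in b]
        return sum(bhattacharyya_1d(*models[x], *models[y])
                   for x, y in pairs) / len(pairs)

    while len(clusters) > num_clusters:
        # Find the closest pair of clusters and merge them.
        i, j = min(
            ((i, j) for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical accent-dependent models of one phoneme: (mean, variance).
models = {"a": (0.0, 1.0), "b": (0.2, 1.0), "c": (3.0, 1.0), "d": (3.1, 1.2)}
result = agglomerate(models, 2)
```

Stopping the merge at different cluster counts gives one candidate grouping of accent areas per phoneme at each level of the hierarchy.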
Accent dependent modelling for Swedish speech recognition
- 2003
Abstract
This master's thesis studies accent-dependent training of phone models for Swedish and evaluates applications of accent dependence when building monophone HMM-based speech recognizers. Accent dependence has been applied by altering the distribution of a fixed number of Gaussian components over the phonetic models, based on a clustering algorithm that uses distances evaluated between accent-dependent phonetic models. The Gaussian components of the accent-dependent phonetic models have then been used to build a new speech model.
A Combined Phonetic-Phonological Approach to Estimating Cross-Language Phoneme Similarity in an ASR Environment
Abstract
This paper presents a fully automated linguistic approach to measuring distance between phonemes across languages. In this approach, a phoneme is represented by a feature matrix where feature categories are fixed, hierarchically related and binary-valued; feature categorization explicitly addresses allophonic variation, and feature values are weighted based on their relative prominence derived from lexical frequency measurements. The relative weight of feature values is factored into the phonetic distance calculation. Two phonological distances are statistically derived from lexical frequency measurements. The phonetic distance is combined with the phonological distances to produce a single metric that quantifies cross-language phoneme distance. The performances of target-language phoneme HMMs constructed solely with source-language HMMs, first selected by the combined phonetic and phonological metric and then by a data-driven, acoustic-distance-based method, are compared in context-independent automatic speech recognition (ASR) experiments. Results show that this approach consistently performs equivalently to the acoustics-based approach, confirming its effectiveness in estimating cross-language similarity between phonemes in an ASR environment.
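As a rough illustration of a weighted binary-feature distance, the sketch below uses a flat feature set and a fixed linear mix of one phonetic and two phonological distances. The abstract gives neither the feature hierarchy nor the combination rule, so the feature names, weights, and mixing coefficients are all hypothetical.

```python
def phonetic_distance(feat_a, feat_b, weights):
    """Weighted share of binary features on which two phonemes disagree."""
    total = sum(weights.values())
    mismatch = sum(w for name, w in weights.items()
                   if feat_a[name] != feat_b[name])
    return mismatch / total

def combined_distance(phonetic, phonological, mix=(0.5, 0.25, 0.25)):
    """Assumed linear mix of one phonetic and two phonological distances."""
    d1, d2 = phonological
    return mix[0] * phonetic + mix[1] * d1 + mix[2] * d2

# Hypothetical binary feature values for three phonemes.
p = {"voiced": 0, "nasal": 0, "labial": 1}
b = {"voiced": 1, "nasal": 0, "labial": 1}
m = {"voiced": 1, "nasal": 1, "labial": 1}

# Hypothetical prominence weights (the paper derives its weights from lexical frequency).
weights = {"voiced": 1.0, "nasal": 2.0, "labial": 1.0}

d_pb = phonetic_distance(p, b, weights)  # differ in voicing only
d_pm = phonetic_distance(p, m, weights)  # differ in voicing and nasality
score = combined_distance(d_pb, (0.4, 0.2))
```

With these toy weights, /p/ comes out closer to /b/ than to /m/, because the nasality mismatch carries twice the weight of the voicing mismatch.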

