Automatic generation of detailed pronunciation lexicons (1996)

by Michael Riley, Andrej Ljolje

Results 1 - 10 of 26

Learning String Edit Distance

by Eric Sven Ristad, Peter N. Yianilos, 1997
Cited by 158 (2 self)
In many applications, it is necessary to determine the similarity of two strings. A widely-used notion of string similarity is the edit distance: the minimum number of insertions, deletions, and substitutions required to transform one string into the other. In this report, we provide a stochastic model for string edit distance. Our stochastic model allows us to learn a string edit distance function from a corpus of examples. We illustrate the utility of our approach by applying it to the difficult problem of learning the pronunciation of words in conversational speech. In this application, we learn a string edit distance with nearly one fifth the error rate of the untrained Levenshtein distance. Our approach is applicable to any string classification problem that may be solved using a similarity function against a database of labeled prototypes.
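The edit distance defined in this abstract can be computed with a standard dynamic program. The following is a minimal Python sketch of the untrained, unit-cost Levenshtein distance only; it is not the stochastic model the authors learn from data. The function name `levenshtein` and the example inputs are illustrative assumptions.

```python
def levenshtein(source, target):
    """Minimum number of insertions, deletions, and substitutions
    needed to turn `source` into `target` (unit cost per edit)."""
    # previous[j] holds the distance between the source prefix seen so far
    # and the first j symbols of target.
    previous = list(range(len(target) + 1))
    for i, s in enumerate(source, start=1):
        current = [i]  # turning i source symbols into nothing costs i deletions
        for j, t in enumerate(target, start=1):
            substitution_cost = 0 if s == t else 1
            current.append(min(
                previous[j] + 1,                      # delete s
                current[j - 1] + 1,                   # insert t
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))  # 3
    # Works on any sequences, e.g. hypothetical phone strings.
    print(levenshtein(["p", "r", "aa", "b", "l", "iy"],
                      ["p", "r", "aa", "l", "iy"]))  # 1
```

Because the function only indexes and compares elements, it accepts lists of phone symbols as readily as character strings, which is the setting the abstract applies it to.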

Speaking In Shorthand -- A Syllable-Centric Perspective For Understanding Pronunciation Variation

by Steven Greenberg, 1998
Cited by 93 (12 self)
Current-generation automatic speech recognition (ASR) systems model spoken discourse as a linear sequence of words and phones. Because it is unusual for every phone within a word to be pronounced in a standard ("canonical") way, ASR systems often depend on a multi-pronunciation lexicon to match an acoustic sequence with a lexical unit. Since there are, in practice, many different ways for a word to be pronounced, this standard approach adds a layer of complexity and ambiguity to the decoding process which, if modified, could potentially improve recognition performance. Systematic analysis of pronunciation variation in a corpus of spontaneous English discourse (Switchboard) demonstrates that the variation observed is systematic at the level of the syllable. Syllabic onsets are realized in canonical form far more frequently than either coda or nuclear constituents. Prosodic stress also plays an important role in pronunciation. The governing mechanism is likely to involve the informationa...

Stochastic pronunciation modelling from hand-labelled phonetic corpora

by M. Riley, W. Byrne, M. Finke, S. Khudanpur, A. Ljolje, J. McDonough, H. Nock, M. Saraclar, C. Wooters, G. Zavaliagkos - Speech Communication, 1999
Cited by 45 (2 self)
In the early ’90s, the availability of the TIMIT read-speech phonetically transcribed corpus led to work at AT&T on the automatic inference of pronunciation variation. This work, briefly summarized here, used stochastic decision trees trained on phonetic and linguistic features, and was applied to the DARPA North American Business News read-speech ASR task. More recently, the ICSI spontaneous-speech phonetically transcribed corpus was collected at the behest of the 1996 and 1997 LVCSR Summer Workshops held at Johns Hopkins University. A 1997 workshop (WS97) group focused on pronunciation inference from this corpus for application to the DoD Switchboard spontaneous telephone speech ASR task. We describe several approaches taken there. These include one analogous to the AT&T approach; one, inspired by work at WS96 and CMU, that involved adding pronunciation variants of a sequence of one or more words (‘multiwords’) in the corpus (with corpus-derived probabilities) into the ASR lexicon; and a hybrid approach in which a decision-tree model was used to automatically phonetically transcribe a much larger speech corpus than ICSI and then the multiword approach was used to construct an ASR recognition pronunciation lexicon.

Pronunciation Modeling By Sharing Gaussian Densities Across Phonetic Models

by Murat Saraclar, Harriet Nock, Sanjeev Khudanpur - Computer Speech and Language, 1999
Cited by 42 (2 self)
Conversational speech exhibits considerable pronunciation variability, which has been shown to have a detrimental effect on the accuracy of automatic speech recognition. There have been many attempts to model pronunciation variation, including the use of decision-trees to generate alternate word pronunciations from phonemic baseforms. Use of such pronunciation models during recognition is known to improve accuracy. This paper describes the use of such pronunciation models during acoustic model training. Subtle difficulties in the straightforward use of alternatives to canonical pronunciations are first illustrated: it is shown that simply improving the accuracy of the phonetic transcription used for acoustic model training is of little benefit. Analysis of this paradox leads to a new method of accommodating nonstandard pronunciations: rather than allowing a phoneme in the canonical pronunciation to be realized as one of a few distinct alternate phones predicted by the pronunciation model, the HMM states of the phoneme's model are instead allowed to share Gaussian mixture components with the HMM states of the model of the alternate realization. Qualitatively, this amounts to making a soft decision about which surface-form is realized. Quantitative experiments on the Switchboard corpus show that this method improves accuracy by 1.7% (absolute).

A conditional random field for discriminatively-trained finite-state string edit distance

by Andrew McCallum, Kedar Bellare - In Conference on Uncertainty in AI (UAI), 2005
Cited by 33 (5 self)
The need to measure sequence similarity arises in information extraction, object identity, data mining, biological sequence analysis, and other domains. This paper presents discriminative string-edit CRFs, a finite-state conditional random field model for edit sequences between strings. Conditional random fields have advantages over generative approaches to this problem, such as pair HMMs or the work of Ristad and Yianilos, because as conditionally-trained methods, they enable the use of complex, arbitrary actions and features of the input strings. As in generative models, the training data does not have to specify the edit sequences between the given string pairs. Unlike generative models, however, our model is trained on both positive and negative instances of string pairs. We present positive experimental results on several data sets.

Hidden feature models for speech recognition using dynamic bayesian networks

by Karen Livescu, James Glass, Jeff Bilmes - in Proc. Eurospeech, 2003
Cited by 29 (4 self)
In this paper, we investigate the use of dynamic Bayesian networks (DBNs) to explicitly represent models of hidden features, such as articulatory or other phonological features, for automatic speech recognition. In previous work using the idea of hidden features, the representation has typically been implicit, relying on a single hidden state to represent a combination of features. We present a class of DBN-based hidden feature models, and show that such a representation can be not only more expressive but also more parsimonious. We also describe a way of representing the acoustic observation model with fewer distributions using a product of models, each corresponding to a subset of the features. Finally, we describe our recent experiments using hidden feature models on the Aurora 2.0 corpus.

Lexical Modeling Of Non-Native Speech For Automatic Speech Recognition

by Karen Livescu, James Glass, 2000
Cited by 24 (1 self)
This paper examines the recognition of non-native speech in JUPITER, a speaker-independent, spontaneous-speech conversational system. Because the non-native speech in this domain is limited and varied, speaker- and accent-specific methods are impractical. We therefore chose to model all of the non-native data with a single model. In particular, this paper describes an attempt to better model non-native lexical patterns. These patterns are incorporated by applying context-independent phonetic confusion rules, whose probabilities are estimated from training data. Using this approach, the word error rate on a non-native test set is reduced from 20.9% to 18.8%. 1. INTRODUCTION Speech recognition accuracy has been observed to be drastically lower for non-native speakers of the target language than for native speakers [3, 13, 14]. Research on both non-native accent modeling and dialect-specific modeling shows that large gains in performance can be achieved when the acoustics [1, 9, 14] and ...
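The idea of applying context-independent phonetic confusion rules to a lexicon, as this abstract describes, might be sketched as follows. This is a simplified Python illustration under stated assumptions, not the authors' implementation: it handles substitutions only, treats each phone independently, and the function name `expand_pronunciation`, the phone labels, and the rule probabilities are all hypothetical.

```python
from itertools import product


def expand_pronunciation(phones, confusions):
    """Return (variant, probability) pairs for one canonical pronunciation.

    `confusions` maps a canonical phone to a dict of surface phones and
    their probabilities. Rules are context-independent: each phone is
    rewritten without looking at its neighbours. Phones with no rule are
    kept unchanged with probability 1.0.
    """
    options = [list(confusions.get(p, {p: 1.0}).items()) for p in phones]
    variants = []
    for combination in product(*options):
        probability = 1.0
        for _, rule_probability in combination:
            probability *= rule_probability
        variants.append((tuple(phone for phone, _ in combination), probability))
    # Most likely variants first.
    variants.sort(key=lambda item: item[1], reverse=True)
    return variants


if __name__ == "__main__":
    # Hypothetical rule: "th" is realized as "t" 30% of the time.
    rules = {"th": {"th": 0.7, "t": 0.3}}
    for variant, probability in expand_pronunciation(["th", "ih", "ng"], rules):
        print(" ".join(variant), probability)
```

In a real system the rule probabilities would be estimated from training data, as the abstract states, and low-probability variants would likely be pruned before being added to the lexicon.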

Feature-based Pronunciation Modeling with Trainable Asynchrony Probabilities

by Karen Livescu, James Glass - in ICSLP, 2004
Cited by 11 (6 self)
We report on ongoing work on a pronunciation model based on explicit representation of the evolution of multiple linguistic feature streams. In this type of model, most pronunciation variation is viewed as the result of asynchrony between features and changes in feature values. We have implemented such a model using dynamic Bayesian networks. In this paper, we extend our previous work with a mechanism for learning feature asynchrony probabilities from data. We present experimental results on a word classification task using phonetic transcriptions of utterances from the Switchboard corpus.

Accent Modeling Based On Pronunciation Dictionary Adaptation For Large Vocabulary Mandarin Speech Recognition

by Chao Huang, Eric Chang, Jianlai Zhou, Kai-Fu Lee - In ICSLP 2000, 2000
Cited by 9 (0 self)
A method of accent modeling through Pronunciation Dictionary Adaptation (PDA) is presented. We derive the pronunciation variation between canonical speaker groups and accent groups and add an encoding of the differences to a canonical dictionary to create a new, adapted dictionary that reflects the accent characteristics. The pronunciation variation information is then integrated with acoustic and language models into a one-pass search framework. It is assumed that acoustic deviation and pronunciation variation are independent but complementary phenomena that cause poor performance among accented speakers. Therefore, MLLR, an efficient model adaptation technique, is also presented both alone and in combination with PDA. It is shown that when PDA, MLLR and PDA+MLLR are used, error rate reductions of 13.9%, 24.1% and 28.4% respectively are achieved.

Pronunciation modelling for conversational speech recognition - A status report from WS97

by B. Byrne, M. Finke, S. Khudanpur, J. McDonough, H. Nock, M. Riley, M. Saraclar, C. Wooters, G. Zavaliagkos - in Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding, 1997
Cited by 9 (1 self)
Accurately modelling pronunciation variability in conversational speech is an important component for automatic speech recognition. We describe some of the projects undertaken in this direction at WS97, the Fifth LVCSR Summer