AUTOMATIC SPEECH RECOGNITION AND INTRINSIC SPEECH VARIATION
This paper briefly reviews the state of the art related to the topic of speech variability sources in automatic speech recognition (ASR) systems. It focuses on some variations within the speech signal that make the ASR task difficult. The variations detailed in the paper are intrinsic to the speech and affect the different levels of the ASR processing chain. For different sources of speech variation, the paper summarizes the current knowledge and highlights specific feature extraction or modeling weaknesses and current trends.
Pronunciation Change in Conversational Speech and
, 2003
Pronunciations in spontaneous speech differ significantly from citation form, and pronunciation modeling for automatic speech recognition has received considerable attention in the last few years. Most methods describe alternate pronunciations of a word using multiple entries in a dictionary or using a network of phones, assuming implicitly that a deviation from the canonical pronunciation results in a "complete" change as described by the alternate pronunciation. We investigate this implicit assumption about pronunciation change in conversational speech and demonstrate here that in most cases, the change is only partial; a phone is not completely deleted or substituted by another phone but is modified only partially. Evidence supporting this conclusion comes from the three-way analysis of features extracted from the acoustic signal for use in a speech recognition system, canonical pronunciations from a dictionary, and careful phonetic transcriptions produced by human labelers. Most often, when a deviation from the canonical pronunciation is marked, neither the canonical nor the manually labeled phones represent the actual acoustics adequately. Further analysis of the manual phonetic transcription reveals a significant number (>20%) of instances where even human labelers disagree on the identity of the surface form. In light of this evidence, two methods are suggested for accommodating such partial pronunciation change in the automatic recognition of spontaneous speech, and experimental results are presented for each method.
Lexical Coverage Issues for Speech Recognition in Indian Languages
This report investigates issues of lexical coverage in Indian languages. More specifically, a parallel analysis of Out-of-Vocabulary words is made in Telugu and Tamil. Although generic, this study is focussed on understanding the morphological aspects in these languages as necessary for speech recognition. The observations reveal that morphological analysis and preprocessing can increase the lexical coverage by over 50%, thereby bringing them closer to the numbers in English.
Modeling Pronunciation Variation for Bi-Lingual Mandarin/Taiwanese Speech Recognition
In this paper, a bi-lingual large vocabulary speech recognition experiment based on the idea of modeling pronunciation variations is described. The two languages under study are Mandarin Chinese and Taiwanese (Min-nan). These two languages are basically mutually unintelligible, and they have many words with the same Chinese characters and the same meanings, although they are pronounced differently. Observing the bi-lingual corpus, we found five types of pronunciation variations for Chinese characters. A one-pass, three-layer recognizer was developed that includes a combination of bi-lingual acoustic models, an integrated pronunciation model, and a tree-structure based searching net. The recognizer’s performance was evaluated under three different pronunciation models. The results showed that the character error rate with integrated pronunciation models was better than that with pronunciation models using either the knowledge-based or the data-driven approach. The relative frequency ratio was also used as a measure to choose the best number of pronunciation variations for each Chinese character. Finally, the best character error rates in Mandarin and Taiwanese testing sets were found to be 16.2% and 15.0%, respectively, when the average number of pronunciations for one Chinese character was 3.9.
Keywords: Bi-lingual, One-pass ASR, Pronunciation Modeling
Pronunciation Variation Modeling in Automatic Speech Recognition
Robust speech recognition is a critical research topic – systems must be able to handle a wide variation in types of speech to make speech technology more user-friendly. One major source of variation in speech is different speaking styles; handling this variation in user input is difficult for current state-of-the-art recognizers. Modeling pronunciation variation within the system can ameliorate the difficulties to some degree. Pronunciation variation can be modeled in different parts of the recognizer; in this presentation we focus on lexical adaptation (other articles in this issue of Telektronikk cover other types of robust modeling).

