Automatic Diacritization of Arabic for Acoustic Modeling in Speech Recognition
In COLING 2004 Computational Approaches to Arabic Script-based Languages, 2004
Cited by 11 (0 self)
Automatic recognition of Arabic dialectal speech is a challenging task because Arabic dialects are essentially spoken varieties. Only a few dialectal resources are available to date; moreover, most available acoustic data collections are transcribed without diacritics. Such a transcription omits essential pronunciation information about a word, such as short vowels. In this paper, we investigate various procedures that enable us to use such training data by automatically inserting the missing diacritics into the transcription. These procedures use acoustic information in combination with different levels of morphological and contextual constraints. We evaluate their performance against manually diacritized transcriptions. In addition, we demonstrate the effect of their accuracy on the recognition performance of acoustic models trained on automatically diacritized training data.
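One way to picture the acoustic side of these procedures: a morphological analyzer proposes the possible diacritized forms of each undiacritized word, and the form that scores best against the audio (for example, by forced alignment) is kept. The sketch below assumes that setup. The candidate lists, the Buckwalter-style transliterations, and the `acoustic_score` and `context_score` callables are hypothetical stand-ins; the optional context term only gestures at the contextual constraints the abstract mentions and is not the paper's model.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical scorer signature: (word position, candidate form) -> log score.
Scorer = Callable[[int, str], float]


def diacritize(words: List[str],
               candidates: Dict[str, List[str]],
               acoustic_score: Scorer,
               context_score: Optional[Scorer] = None,
               context_weight: float = 1.0) -> List[str]:
    """Choose, for each undiacritized word, the best-scoring diacritized candidate."""
    output = []
    for pos, word in enumerate(words):
        # Morphological constraint: only forms proposed by the analyzer are
        # considered; a word with no analysis is left as it is.
        options = candidates.get(word, [word])

        def total(form: str) -> float:
            score = acoustic_score(pos, form)
            if context_score is not None:
                score += context_weight * context_score(pos, form)
            return score

        output.append(max(options, key=total))
    return output


# Toy example with made-up alignment log-likelihoods for the word "ktb".
toy_scores = {(0, "kataba"): -12.0, (0, "kutiba"): -15.5, (0, "kutub"): -14.1}
result = diacritize(["ktb"], {"ktb": ["kataba", "kutiba", "kutub"]},
                    lambda pos, form: toy_scores[(pos, form)])
print(result)  # ['kataba']
```

Keeping the analyzer's candidate list as a hard constraint means acoustics only has to discriminate among a handful of plausible forms rather than among all possible vowelings.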
Robust Speech Recognition for Multiple Topological Scenarios of the GSM Mobile Phone System
1998
Cited by 8 (0 self)
This paper deals with robust speech recognition in the GSM mobile environment. Our focus is on voice degradation caused by the losses in the GSM coding scheme. We initially propose an experimental framework of network topologies consisting of various coding-decoding systems placed in tandem. After measuring the recognition performance for each of these network scenarios, we try to increase recognition accuracy by using feature compensation and model adaptation algorithms. We first compare the different methods for all the network topologies, assuming the topology is known. We then investigate the more realistic case, in which the network topology the voice has passed through is unknown. The results show that robustness can be achieved even in this case.
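The abstract does not name the feature compensation methods it uses. As a generic illustration only, cepstral mean normalization (CMN) is a common compensation step: a fixed linear channel multiplies the spectrum, which turns into a constant offset in the cepstral domain, so subtracting the per-utterance mean removes it. Codec losses are not purely linear, so CMN would at best be a partial fix for tandem-coding degradation; the function below is a sketch, not the paper's method.

```python
from typing import List


def cepstral_mean_normalization(frames: List[List[float]]) -> List[List[float]]:
    """Subtract the per-utterance mean of each cepstral coefficient from every frame."""
    if not frames:
        return []
    n_frames = len(frames)
    dim = len(frames[0])
    means = [sum(frame[d] for frame in frames) / n_frames for d in range(dim)]
    return [[frame[d] - means[d] for d in range(dim)] for frame in frames]


# A constant cepstral offset (a stand-in for a stationary channel) is removed entirely.
clean = [[1.0, 2.0], [3.0, 4.0], [2.0, 0.0]]
channel = [0.5, -1.0]
distorted = [[c + o for c, o in zip(frame, channel)] for frame in clean]
print(cepstral_mean_normalization(clean))
print(cepstral_mean_normalization(distorted))
```

Both calls print the same normalized frames, which is the property that makes mean subtraction useful when the channel is unknown, as in the paper's realistic scenario.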
Prosodic Features for Automatic Text-Independent Evaluation of Nativeness for Language Learners
2000
Cited by 5 (1 self)
Predicting the degree of nativeness of a student utterance is an important issue in computer-aided language learning. This task has been addressed by many studies focusing on the segmental assessment of the speech signal. To achieve improved correlations between human and automatic nativeness scores, other aspects of speech, such as prosody, should also be considered. The goal of this study is to evaluate the use of prosodic information to help predict the degree of nativeness of pronunciation, independent of the text. A supervised strategy based on human grades is used in an attempt to select promising features for this task. Preliminary results show improvements in the correlation between human and automatic scores.
Evaluation Of Speaker's Degree Of Nativeness Using Text-Independent Prosodic Features
In Proc. of the Workshop on Multilingual Speech and Language Processing, 2001
Cited by 5 (0 self)
Giving feedback on the degree of nativeness of a student's speech is an important aspect of computer-aided language learning. This task has been addressed by many studies focusing on the segmental assessment of the speech signal. To better model human nativeness scores, other aspects of speech, such as prosody, should also be considered. This study examines the use of prosodic information to evaluate the degree of nativeness of student pronunciation, independent of the text. Supervised strategies based on human grades are used in an attempt to select promising features for this task. Previous results obtained with non-native speakers showed improvements in the correlation between human and automatic scores. New strategies were evaluated with tests including native and non-native speakers. Specific duration-based features, in particular those for intra-sentence pauses, showed potential for further improvements.
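The abstract singles out intra-sentence pause durations and measures success by the correlation between human and automatic scores. A minimal sketch of both pieces follows, assuming word-level (start, end) times in seconds from a forced alignment. The 0.1 s pause threshold and the exact feature definition are illustrative guesses, not the paper's.

```python
import math
from typing import List, Tuple


def mean_intra_sentence_pause(words: List[Tuple[float, float]],
                              min_gap: float = 0.1) -> float:
    """Mean length (s) of gaps between consecutive words that are at least min_gap long."""
    gaps = [next_start - prev_end
            for (_, prev_end), (next_start, _) in zip(words, words[1:])]
    pauses = [g for g in gaps if g >= min_gap]
    return sum(pauses) / len(pauses) if pauses else 0.0


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation, e.g. between automatic and human nativeness scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


# One utterance: the 0.4 s gap counts as a pause, the 0.05 s gap does not.
utterance = [(0.0, 0.5), (0.9, 1.4), (1.45, 2.0)]
print(mean_intra_sentence_pause(utterance))
```

In a feature-selection setting, each candidate feature would be computed per speaker and kept if `pearson(feature_values, human_grades)` is high in magnitude on held-out data.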
The SRI Telephone-based ATIS System
1995
Cited by 5 (0 self)
The telephone-based ATIS system developed at SRI International is composed of the DECIPHER speech recognition system, the Gemini natural language understanding system, and Entropic's TrueTalk text-to-speech system. DECIPHER's acoustic models were trained on data collected over the telephone, and the system was configured to run in real time. Gemini was augmented to generate responses appropriate for reading over the telephone. The response generation process has two goals: paraphrase and dialogue control. The paraphrase component converts the logical form representation of the speaker's utterance into a sentence that confirms to the speaker that the utterance was correctly understood, while the dialogue control component decides whether a follow-up question is appropriate and, if so, which question is most appropriate.
1. INTRODUCTION
SRI's pilot telephone-based ATIS spoken language system (SLS) is an over-the-telephone modification of SRI's existing ATIS3 system. The changes r...
Automatic Detection Of Phone-Level Mispronunciation For Language Learning
In Proc. of Eurospeech 99, 1999
Cited by 5 (2 self)
We are interested in automatically detecting specific phone segments that have been mispronounced by a nonnative student of a foreign language. The phone-level information allows a language instruction system to provide the student with feedback about specific pronunciation mistakes. Two approaches were evaluated. In the first approach, log-posterior probability-based scores [1] are computed for each phone segment. These probabilities are based on acoustic models of native speech. The second approach uses a phonetically labeled nonnative speech database to train two different acoustic models for each phone: one model is trained with the acceptable, or correct, native-like pronunciations, while the other model is trained with the incorrect, strongly nonnative pronunciations. For each phone segment, a log-likelihood ratio score is computed using the incorrect and correct pronunciation models. Either type of score is compared with a phone-dependent threshold to detect a mispronunciation. P...
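The two scoring schemes and the phone-dependent threshold test can be sketched as follows. Assumptions: per-frame log-likelihoods for the segment are already available from the acoustic models, phone priors are taken as equal when forming the posterior, and both scores are averaged over the segment's frames; the phone labels and threshold values are made up. Note that the decision directions differ: a low log-posterior score, or a high log-likelihood ratio of the incorrect model over the correct one, suggests a mispronunciation.

```python
import math
from typing import Dict, List, Sequence


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def log_posterior_score(frame_logliks: List[Dict[str, float]], phone: str) -> float:
    """Mean over frames of log P(phone | x_t), assuming equal priors over all phones."""
    total = sum(ll[phone] - logsumexp(list(ll.values())) for ll in frame_logliks)
    return total / len(frame_logliks)


def llr_score(incorrect: List[float], correct: List[float]) -> float:
    """Mean per-frame log p(x | incorrect model) - log p(x | correct model)."""
    return sum(i - c for i, c in zip(incorrect, correct)) / len(correct)


def flag_posterior(score: float, phone: str, thresholds: Dict[str, float]) -> bool:
    # A low posterior means the native models fit this segment poorly.
    return score < thresholds[phone]


def flag_llr(score: float, phone: str, thresholds: Dict[str, float]) -> bool:
    # A high ratio means the "strongly nonnative" model fits better.
    return score > thresholds[phone]


frames = [{"ae": -5.0, "eh": -4.0, "ih": -6.0}, {"ae": -5.5, "eh": -4.2, "ih": -6.1}]
post = log_posterior_score(frames, "ae")
llr = llr_score([-4.0, -4.1], [-5.0, -5.5])
print(flag_posterior(post, "ae", {"ae": -1.0}), flag_llr(llr, "ae", {"ae": 0.5}))
```

Averaging over frames keeps long and short segments on a comparable scale, which is what makes a single threshold per phone workable.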
Connectionist Speaker Normalization And Adaptation
In Eurospeech, 1995
Cited by 4 (0 self)
In a speaker-independent, large-vocabulary continuous speech recognition system, recognition accuracy varies considerably from speaker to speaker, and performance may be significantly degraded for outlier speakers such as nonnative talkers. In this paper, we explore supervised speaker adaptation and normalization in the MLP component of a hybrid hidden Markov model/multilayer perceptron (HMM/MLP) version of SRI's DECIPHER™ speech recognition system. Normalization is implemented through an additional transformation network that preprocesses the cepstral input to the MLP. Adaptation is accomplished through incremental retraining of the MLP weights on adaptation data. Our approach combines adaptation and normalization in a single, consistent manner, works with limited adaptation data, and is text-independent. We show significant improvement in recognition accuracy.
1. INTRODUCTION
In a speaker-independent (SI), large-vocabulary continuous speech recognition system, recognition accuracy ...
Training Mixture Density HMMs with SOM and LVQ
1997
Cited by 4 (2 self)
The objective of this paper is to present experiments and discussions of how some neural network algorithms can help phoneme recognition with mixture density hidden Markov models (MDHMMs). In MDHMMs, the stochastic observation process associated with each state is modeled by estimating the probability density function of the short-time observations in that state as a mixture of Gaussian densities. Learning Vector Quantization (LVQ) is used to increase the discrimination between different phoneme models, both during the initialization of the Gaussian codebooks and during the actual MDHMM training. The Self-Organizing Map (SOM) is applied to provide a suitably smoothed mapping of the training vectors, which accelerates the convergence of the actual training. The resulting codebook topology can also be exploited in the recognition phase to speed up the calculations that approximate the observation probabilities. The experiments with LVQ and SOMs show reductions both...
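In the paper, LVQ adjusts the Gaussian codebook (mixture mean) vectors so that competing phoneme models become more discriminative. The sketch below shows the textbook LVQ1 rule on a flat labelled codebook: the nearest codebook vector moves toward a training vector of its own class and away from one of a different class. The flat codebook, the squared Euclidean distance, and the fixed learning rate are simplifying assumptions; the paper's exact LVQ variant and schedule may differ.

```python
from typing import List, Tuple

# A labelled codebook: (vector, phoneme label) pairs. The flat layout is a simplification.
Codebook = List[Tuple[List[float], str]]


def squared_distance(a: List[float], b: List[float]) -> float:
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def lvq1_step(codebook: Codebook, x: List[float], label: str, alpha: float) -> int:
    """Apply one LVQ1 update in place and return the index of the winning vector."""
    winner = min(range(len(codebook)),
                 key=lambda i: squared_distance(codebook[i][0], x))
    vector, vector_label = codebook[winner]
    # Attract the winner if the labels agree, repel it otherwise.
    sign = 1.0 if vector_label == label else -1.0
    codebook[winner] = ([v + sign * alpha * (xi - v) for v, xi in zip(vector, x)],
                        vector_label)
    return winner


codebook: Codebook = [([0.0, 0.0], "a"), ([1.0, 1.0], "b")]
lvq1_step(codebook, [0.2, 0.0], "a", alpha=0.5)  # winner 0, same label: moves toward x
lvq1_step(codebook, [0.4, 0.0], "b", alpha=0.5)  # winner 0, wrong label: moves away
print(codebook)
```

In practice the learning rate `alpha` is usually decreased over training passes so the codebook settles rather than oscillating.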
Calibration of machine scores for pronunciation grading
In Proc. Int'l Conf. on Spoken Language Processing, 1998
Cited by 4 (2 self)
Our proposed paradigm for automatic assessment of pronunciation quality uses hidden Markov models (HMMs) to generate phonetic segmentations of the student’s speech. From these segmentations, we use the HMMs to obtain spectral match and duration scores. In this work, we focus on the problem of calibrating different machine scores to obtain an accurate prediction of the grades that a human expert would assign to the pronunciation. We discuss the application of different approaches based on minimum mean square error (MMSE) estimation and Bayesian classification. We investigate the characteristics of the different mappings, as well as the effects of the prior distribution of grades in the calibration database. Finally, we suggest a simple method to extrapolate mappings from one language to another.
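The simplest MMSE-style calibration is a linear one: choose a slope and intercept that minimize the mean squared error between mapped machine scores and human grades, which is ordinary least squares. The sketch assumes a single machine score per utterance and a hypothetical 1 to 5 grade scale with clipping; the paper compares several MMSE-based and Bayesian approaches, and this covers only the linear case.

```python
from typing import List, Tuple


def fit_linear_mmse(machine: List[float], human: List[float]) -> Tuple[float, float]:
    """Least-squares fit of human grade as slope * machine score + intercept."""
    n = len(machine)
    mean_m = sum(machine) / n
    mean_h = sum(human) / n
    var_m = sum((m - mean_m) ** 2 for m in machine)
    if var_m == 0:
        raise ValueError("machine scores must not all be equal")
    cov = sum((m - mean_m) * (h - mean_h) for m, h in zip(machine, human))
    slope = cov / var_m
    return slope, mean_h - slope * mean_m


def calibrate(score: float, slope: float, intercept: float,
              lowest: float = 1.0, highest: float = 5.0) -> float:
    """Map a machine score to a predicted grade, clipped to the assumed grade range."""
    return min(highest, max(lowest, slope * score + intercept))


slope, intercept = fit_linear_mmse([0.0, 1.0, 2.0], [1.0, 3.0, 5.0])
print(calibrate(1.5, slope, intercept))  # 4.0
```

Because the least-squares fit depends on the calibration data's mix of grades, a database dominated by one grade pulls the mapping toward it, which is one reason the prior distribution of grades matters.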
WebGrader: A Multilingual Pronunciation Practice Tool
1998
Cited by 3 (1 self)
WebGrader™ is a pronunciation grading tool designed for practicing pronunciation in a second language. The system uses SRI's speech recognition [1] and pronunciation scoring [2][3][4] technologies. The application client was implemented on the Java platform to facilitate deployment and updates of software and content over the World Wide Web. We present the overall system architecture, user-interface design, scoring algorithms, and a preliminary user study.
1. Introduction
Most foreign language instruction courses focus on teaching reading, writing, and listening comprehension. Much less effort is dedicated to teaching speech production, either because it is sometimes considered less critical for communicating in a foreign language or simply because of a lack of resources such as private tutors who are native or near-native speakers of the target language. We believe that an interactive system capable of grading pronunciation can facilitate the pronunciation learning process....

