Padmanabhan M., Data-driven Approach to Designing Compound Words for Continuous Speech Recognition, ASRU’99 (1999)

by G Saon

Results 1 - 10 of 15

RECENT IMPROVEMENTS IN THE CU SONIC ASR SYSTEM FOR NOISY SPEECH: THE SPINE TASK

by Bryan Pellom, Kadri Hacioglu
Cited by 15 (5 self)
In this paper we report on recent improvements in the University of Colorado system for the DARPA/NRL Speech in Noisy Environments (SPINE) task. In particular, we describe our efforts on improving acoustic and language modeling for the task and investigate methods for unsupervised speaker and environment adaptation from limited data. We show that the MAPLR adaptation method outperforms single and multiple regression class MLLR on the SPINE task. Our current SPINE system uses the Sonic speech recognition engine that was recently developed at the University of Colorado. This system is shown to have a word error rate of 31.5% on the SPINE-2 evaluation data. These improvements amount to a 16% relative reduction in word error rate compared to our previous SPINE-2 system fielded in the Nov. 2001 DARPA/NRL evaluation.

Automatic Summarization of Voicemail Messages Using Lexical and Prosodic Features

by Konstantinos Koumpis, Steve Renals, 2005
Cited by 14 (2 self)
This paper presents trainable methods for extracting principal content words from voicemail messages. The short text summaries generated are suitable for mobile messaging applications. The system uses a set of classifiers to identify the summary words, with each word being identified by a vector of lexical and prosodic features. We use an ROC-based algorithm, Parcel, to select input features (and classifiers). We have performed a series of objective and subjective evaluations using unseen data from two different speech recognition systems, as well as human transcriptions of voicemail speech.

On Lexicon Creation for Turkish LVCSR

by Kadri Hacioglu, Bryan Pellom, et al., 2003
Cited by 13 (9 self)
In this paper, we address the lexicon design problem in Turkish large vocabulary speech recognition. Although we focus only on Turkish, the methods described here are general enough that they can be considered for other agglutinative languages such as Finnish and Korean. In an agglutinative language, several words can be created from a single root word using a rich collection of morphological rules. So, a virtually infinite lexicon is required to cover the language if words are used as the basic units. The standard approach to this problem is to discover a number of primitive units so that a large set of words can be created by compounding those units. Two broad classes of methods are available for splitting words into their sub-units: morphology-based and data-driven methods. Although word splitting significantly reduces the out-of-vocabulary rate, it shrinks the context and increases acoustic confusability. We have used two methods to address the latter. In one method, we use word counts to avoid splitting high-frequency lexical units, and in the other method, we recompound splits according to a probabilistic measure. We present experimental results that show the methods are very effective at lowering the word error rate at the expense of lexicon size.

The Role of Prosody in a Voicemail Summarization System

by Konstantinos Koumpis, Steve Renals - In Proc. ISCA Workshop on Prosody in Speech Recognition and Understanding, 2001
Cited by 9 (6 self)
When a speaker leaves a voicemail message there are prosodic cues that emphasize the important points in the message, in addition to lexical content. In this paper we compare and visualize the relative contribution of these two types of features within a voicemail summarization system. We describe the system's ability to generate summaries of two test sets, having trained and validated using 700 messages from the IBM Voicemail corpus. Results measuring the quality of summary artifacts show that combined lexical and prosodic features are at least as robust as combined lexical features alone across all operating conditions.

Evaluation of Extractive Voicemail Summarization

by Konstantinos Koumpis, Steve Renals - In Proc. ISCA Workshop on Multilingual Spoken Document Retrieval, Hong Kong, 2003
Cited by 2 (1 self)
This paper is about the evaluation of a system that generates short text summaries of voicemail messages, suitable for transmission as text messages. Our approach to summarization is based on a speech-recognized transcript of the voicemail message, from which a set of summary words is extracted. The system uses a classifier to identify the summary words, with each word being identified by a vector of lexical and prosodic features. The features are selected using Parcel, an ROC-based algorithm. Our evaluations of the system, using a slot error rate metric, have compared manual and automatic summarization, and manual and automatic recognition (using two different recognizers). We also report on two subjective evaluations using mean opinion score of summaries, and a set of comprehension tests. The main results from these experiments were that the perceived difference in quality of summarization was affected more by errors resulting from automatic transcription, than by the automatic summarization process.

From generic to task-oriented speech recognition : French experience in the NESPOLE! European project

by Dominique Vaufreydaz, Laurent Besacier, Carole Bergamini, Richard Lamy - In Proc. ITRW Workshop on Adaptation Methods for Speech Recognition, Sophia Antipolis, 2001
Cited by 2 (1 self)
This paper presents CLIPS laboratory activities in speech recognition related to language model adaptation and acoustic model adaptation in the NESPOLE! European project. The ASR system needed to be adapted in two ways. The language model had to deal with task-specific vocabulary and the acoustic model had to be robust to VoIP (Voice over IP) speech. It was shown that the Internet, as a very large source of text, can be a very interesting database for spoken language modelling adaptation. The influence of different VoIP codecs on the performance of our speech recognition engine was investigated and a new strategy was proposed to cope with degradation due to low bitrate coding. The acoustic models of the speech recognition system were trained with transcoded speech. Results have shown that this strategy allows acceptable performance to be recovered in the NESPOLE! project context.

Evolution of the performance of automatic speech recognition algorithms in transcribing conversational telephone speech

by M. Padmanabhan, G. Saon, G. Zweig, J. Huang, B. Kingsbury, L. Mangu - IEEE Instrumentation and Measurement Technology Conf. (IMTC), IEEE, 2001
Cited by 1 (0 self)
Research in the speech recognition (speech-to-text conversion) area has been underway for a couple of decades, and a great deal of progress has been made in reducing the word error rate (WER). In this paper, we attempt to summarize the state of the art in speech recognition algorithms. The algorithms we describe span the areas of lexicon design, feature extraction, classifier design, combination of hypotheses, and speaker adaptation of acoustic models. We will benchmark the algorithms on two main sources of speech, the first being Voicemail (conversational telephone speech from a single speaker) and the second being Switchboard (conversational telephone speech between two speakers). We also present the results of some cross-domain experiments which highlight the "brittleness" of speech recognition systems today and illustrate the need to focus research effort on improving cross-domain performance.

Speech Recognition For Darpa Communicator

by A Aaron, S Chen, Aaron Chen Cohen, S Dharanipragada, E Eide, M Franz, J-m Leroux, X Luo, B Maison, L Mangu, T Mathes, M Novak, P Olsen, M Picheny, H Printz, B Ramabhadran, A Sakrajda, G Saon, B Tydlitat, K Visweswariah, D Yuk - In ICASSP, 2001
Cited by 1 (0 self)
We report the results of investigations in acoustic modeling, language modeling and decoding techniques, for DARPA Communicator, a speaker-independent, telephone-based dialog system. By a combination of methods, including enlarging the acoustic model, augmenting the recognizer vocabulary, conditioning the language model upon dialog state, and applying a post-processing decoding method, we lowered the overall word error rate from 21.9% to 15.0%, a gain of 6.9% absolute and 31.5% relative.
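The absolute and relative gains quoted in this abstract can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name `wer_reduction` is hypothetical and not from the paper, and it assumes the usual convention that the relative reduction is the absolute reduction divided by the baseline word error rate.

```python
from typing import Tuple


def wer_reduction(baseline: float, improved: float) -> Tuple[float, float]:
    """Return (absolute, relative) WER reduction, both in percent.

    Hypothetical helper: relative reduction is assumed to be measured
    against the baseline WER.
    """
    if baseline <= 0:
        raise ValueError("baseline WER must be positive")
    absolute = baseline - improved           # percentage points
    relative = 100.0 * absolute / baseline   # percent of the baseline
    return absolute, relative


# Figures quoted in the abstract: 21.9% baseline, 15.0% after improvements.
absolute, relative = wer_reduction(21.9, 15.0)
print(f"{absolute:.1f}% absolute, {relative:.1f}% relative")
# prints "6.9% absolute, 31.5% relative"
```

Under that convention the result matches the 6.9% absolute and 31.5% relative figures reported above.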

DATA-DRIVEN LEXICON EXPANSION FOR MANDARIN BROADCAST NEWS AND CONVERSATION SPEECH RECOGNITION

by Xin Lei, Wen Wang, Andreas Stolcke
Cited by 1 (1 self)
We present a data-driven framework for expanding the lexicon to improve Mandarin broadcast news and conversation speech recognition. The lexicon expansion includes the generation of pronunciation variants for frequent words and vocabulary augmentation with new words and phrases derived from the training data. To learn multiple pronunciations, we first generate all possible pronunciation candidates for a word from its character pronunciation network. The top pronunciation variants are then selected from forced alignment statistics. To augment the acoustic vocabulary, we propose an efficient algorithm that derives new words based on N-gram statistics. Experiments show that a dictionary expanded in this manner yields significant improvements on a Mandarin broadcast speech recognition task. Index Terms — Pronunciation learning, vocabulary expansion, Mandarin speech recognition.

IEEE Instrumentation and Measurement

by Technology Conference Budapest, Balázs Vödrös, István Kollár
Controllers of industrial furnaces operate differently in different temperature ranges. The controller has different parameter sets for each of these ranges. The operation of controllers is switched according to the temperature. It is desirable to change the parameters continuously following the temperature. The continuous change of parameters instead of mode switching may decrease the switching transients and lead to more accurate temperature control. A furnace identification scheme is investigated in this paper. Measurement problems and possible corrections that result in more accurate models after processing the collected data are shown. A possible "interpolation" technique of frequency domain models is also shown here.