Results 1 -
3 of
3
A Comparison Of Word Graph And N-Best List Based Confidence Measures
- in Proc. EUROSPEECH
, 1999
"... In this paper we present and compare several confidence measures for large vocabulary continuous speech recognition. We show that posterior word probabilities computed on word graphs and N-best lists clearly outperform non-probabilistic confidence measures, e.g. the acoustic stability and the hypoth ..."
Abstract
-
Cited by 18 (4 self)
- Add to MetaCart
In this paper we present and compare several confidence measures for large vocabulary continuous speech recognition. We show that posterior word probabilities computed on word graphs and N-best lists clearly outperform non-probabilistic confidence measures, e.g. the acoustic stability and the hypothesis density. In addition, we prove that the estimation of posterior word probabilities on word graphs yields better results than their estimation on N-best lists and discuss both methods in detail. We present experimental results on three different corpora, the English NAB '94 20k development corpus, the German VERBMOBIL '96 evaluation corpus and a Dutch corpus, which has been recorded with a train timetable information system in the ARISE project. 1. INTRODUCTION In previous studies, the combination of several confidence features was investigated. These features were collected during the acoustic decoding process, e.g. [1] or were extracted from Nbest lists and word graphs, e.g. [2, 5]. ...
Transcribing Multilingual Broadcast News Using Hypothesis Driven Lexicon Adaptation
- BNTUW-98 Proceedings of the 1998 DARPA Broadcast News Transcription and Understanding Workshop, Landsdowne VA
, 1998
"... This paper describes first results of our DARPA-sponsored efforts toward recognizing and browsing foreign language, more specifically, Serbo-Croatian broadcast news. For Serbo-Croatian as well as many other than the most common well studied languages, the problems of broadcast quality recognition ar ..."
Abstract
-
Cited by 9 (4 self)
- Add to MetaCart
This paper describes first results of our DARPA-sponsored efforts toward recognizing and browsing foreign language, more specifically, Serbo-Croatian broadcast news. For Serbo-Croatian as well as many other than the most common well studied languages, the problems of broadcast quality recognition are complicated by 1.) the lack of available acoustic and language data, and 2.) the excessive vocabulary growth in heavily inflected languages that lead to unacceptable OOV-rates. We present a Serbo-Croatian large vocabulary system that achieves a 74 % recognition rate, despite limited training data. Our system achieves this rate by a multipass strategy that dynamically adapts the recognition dictionary to the speech segment to be recognized by generating morphological variations (Hypothesis Driven Lexical Adaptation). We will outline the bootstrapping and training process of the Janus Recognition Toolkit (JanusRTk) based broadcast news recognition engine: data collection, segmentation and labeling of the data according to different acoustic conditions, dictionary design, language modeling and training. The Hypothesis Driven Lexical Adaptation (HDLA) approach has been tested both on Serbo-Croatian and German news data and has achieved considerable recognition improvements. OOV-rates were reduced by 35-45%; on the Serbo-Croatian broadcast news data from 8.7 % to 4.8 % thereby also decreasing word error rate from 29.5 % to 26%. 1.
Transcribing Multilingual Broadcast News Using Hypothesis Driven Lexical Adaptation
- BNTUW-98 Proceedings of the 1998 DARPA Broadcast News Transcription and Understanding Workshop, Landsdowne VA
, 1998
"... This paper describes first results of our DARPA-sponsored efforts toward recognizing and browsing foreign language, more specifically, Serbo-Croatian broadcast news. For Serbo-Croatian as well as many other than the most common well studied languages, the problems of broadcast quality recognition ar ..."
Abstract
- Add to MetaCart
This paper describes first results of our DARPA-sponsored efforts toward recognizing and browsing foreign language, more specifically, Serbo-Croatian broadcast news. For Serbo-Croatian as well as many other than the most common well studied languages, the problems of broadcast quality recognition are complicated by 1.) the lack of available acoustic and language data, and 2.) the excessive vocabulary growth in heavily inflected languages that lead to unacceptable OOV-rates. We present a Serbo-Croatian large vocabulary system that achieves a 74% recognition rate, despite limited training data. Our system achieves this rate by a multipass strategy that dynamically adapts the recognition dictionary to the speech segment to be recognized by generating morphological variations (Hypothesis Driven Lexical Adaptation). We will outline the bootstrapping and training process of the Janus Recognition Toolkit (JanusRTk) based broadcast news recognition engine: data collection, segmentation and la...

