A Confidence Measure Based On Agreement Among Multiple LVCSR Models - Correlation Between Pair Of Acoustic Models And Confidence
in Proc. 7th ICSLP, 2002
Cited by 2 (2 self)
For many practical applications of speech recognition systems, it is desirable to have a confidence estimate for each hypothesized word. Unlike previous work on confidence measures, this paper studies features for confidence measures that are extracted from the outputs of more than one LVCSR model. More specifically, this paper experimentally evaluates whether the agreement among the outputs of multiple Japanese LVCSR models is effective as a confidence estimate for each hypothesized word. The experimental results show that the agreement between the outputs of two LVCSR models with different decoders and acoustic models yields quite reliable confidence estimates. Furthermore, among the various features of acoustic models based on Gaussian mixture HMMs, features such as the presence or absence of short pause models and the choice of HMM unit (e.g., triphone or syllable model) are the most effective in achieving highly reliable confidence.
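The agreement idea in this abstract can be illustrated with a minimal sketch. It is not the paper's procedure: the word format (label, start, end), the time-overlap test, and the toy romanized hypotheses are all assumptions made for illustration. A word from one recognizer is flagged as confident if the other recognizer outputs the same label in an overlapping time span.

```python
from typing import List, Tuple

# Hypothetical word format: (label, start_sec, end_sec).
Word = Tuple[str, float, float]


def overlaps(a: Word, b: Word) -> bool:
    """True if the two words share a non-empty time interval."""
    return a[1] < b[2] and b[1] < a[2]


def agreement_flags(hyp_a: List[Word], hyp_b: List[Word]) -> List[bool]:
    """For each word in hyp_a, report whether hyp_b contains the same
    label at an overlapping time position (a binary confidence flag)."""
    return [
        any(w[0] == v[0] and overlaps(w, v) for v in hyp_b)
        for w in hyp_a
    ]


# Toy outputs of two recognizers for the same utterance (invented data).
hyp_a = [("kyou", 0.0, 0.4), ("wa", 0.4, 0.5), ("ame", 0.5, 0.9)]
hyp_b = [("kyou", 0.0, 0.38), ("ga", 0.38, 0.5), ("ame", 0.5, 0.92)]

flags = agreement_flags(hyp_a, hyp_b)
for word, ok in zip(hyp_a, flags):
    print(word[0], "agreed" if ok else "not agreed")
```

Here the second word is the only one on which the two toy hypotheses disagree, so it is the only one left unflagged.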
Articulatory-feature-based confidence measures
Cited by 1 (0 self)
Confidence measures are computed to estimate the certainty that target acoustic units are spoken in specific speech segments. They are applied in tasks such as keyword verification or utterance verification. Because many confidence measures use the same set of models and features as in recognition, the resulting scores may not provide an independent measure of reliability. In this paper, we propose two articulatory feature (AF) based phoneme confidence measures that estimate acoustic reliability based on the match in AF properties. While acoustic-based features, such as Mel-frequency cepstral coefficients (MFCC), are widely used in speech processing, some recent works have focused on linguistically based features, such as articulatory features, which relate directly to the human articulatory process and may better capture speech characteristics. The articulatory features can either replace or complement the acoustic-based features in speech processing. The proposed AF-based measures were evaluated, in comparison and in combination with the HMM-based scores, on phoneme and keyword verification tasks using children's speech collected for a computer-based English pronunciation learning project. To fully evaluate their usefulness, the proposed measures and combinations were evaluated on both native and non-native data, and under field test conditions that are mismatched with the training conditions. The experimental results show that under the different environments, combinations of the AF scores with the HMM-based
Cross-language bootstrapping based on completely unsupervised training using multilingual A-stabil
in International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2011
Cited by 1 (1 self)
This paper presents our work on rapid language adaptation of acoustic models based on multilingual cross-language bootstrapping and unsupervised training. We used Automatic Speech Recognition (ASR) systems in English, French, German, and Spanish to build a Czech ASR system from scratch. System building was performed without using any transcribed audio data by applying three consecutive steps, i.e. cross-language transfer, unsupervised training based on the “multilingual A-stabil” confidence score [1], and bootstrapping. Based on the confidence score we selected 72% (16.6 hours) of the available audio data with a transcription WER of less than 14.5%. The cross-language bootstrap achieves a word error rate of 23.3% on the Czech development set and 22.4% on the evaluation set. These results are very promising as the performance compares favorably to the Czech ASR system which was trained on 23 hours of manually transcribed data (21.8% on the development set and 21.3% on the evaluation set). Index Terms: rapid language adaptation of ASR, unsupervised training, multilingual A-Stabil
Speech Recognition Using Context Conditional Word Posterior Probabilities
2000
In this paper, two new scoring schemes for large vocabulary continuous speech recognition are compared. Instead of using the joint probability of a word sequence and a sequence of acoustic observations, we determine the best path through a word graph using posterior word probabilities with or without word context. The exact calculation of the posterior probability for a word sequence implies a sum over all possible word boundaries, which is approximated by a maximum operation in the standard scoring approach. The new scoring scheme using word posterior probabilities could be expected to lead to improved recognition performance, because it involves partial summation over word boundaries. We present experimental results on five different corpora: the Dutch Arise corpus, the German Verbmobil '98 corpus, the English North American Business '94 20k and 64k development corpora, and the English Broadcast News '96 corpus. It is shown that the Viterbi approximation within words has no effect on standard and word posterior based recognition. Using word posterior probabilities with and without word context, the relative reduction in word error rate is comparable and ranges between 1.5% and 5%. A reason why the additional consideration of word context does not further improve the recognition performance might be that the increase in word context information is traded against a decrease in the number of word sequences that contribute to a particular word posterior probability.
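The difference between summing and maximizing over word boundaries can be shown with a toy sketch. This is an illustration under stated assumptions, not the paper's scoring scheme: the three paths and their probabilities are invented, the word graph is enumerated path by path instead of using a forward-backward pass, and grouping by word label alone is valid only because each word occupies a single position in this tiny example.

```python
import math
from collections import defaultdict

# Toy word graph, enumerated as complete paths (invented values).
# Each path: (log joint score, [(word, start_frame, end_frame), ...])
paths = [
    (math.log(0.4), [("the", 0, 10), ("cat", 10, 30)]),
    (math.log(0.3), [("the", 0, 12), ("cat", 12, 30)]),
    (math.log(0.3), [("a", 0, 10), ("cat", 10, 30)]),
]


def edge_posteriors(paths):
    """Posterior of each (word, start, end) edge: the normalized sum of
    the probabilities of all paths that pass through that edge."""
    total = sum(math.exp(lp) for lp, _ in paths)
    post = defaultdict(float)
    for lp, edges in paths:
        for edge in edges:
            post[edge] += math.exp(lp) / total
    return dict(post)


def combine_over_boundaries(post, use_sum):
    """Merge edges that carry the same word but different boundaries,
    either by summation or by taking the maximum."""
    by_word = defaultdict(float)
    for (word, _start, _end), p in post.items():
        by_word[word] = by_word[word] + p if use_sum else max(by_word[word], p)
    return dict(by_word)


post = edge_posteriors(paths)
summed = combine_over_boundaries(post, use_sum=True)
maxed = combine_over_boundaries(post, use_sum=False)
print("sum:", summed)
print("max:", maxed)
```

In this toy case the word "the" receives a posterior of about 0.7 under summation but only about 0.4 under maximization, since the maximum discards the probability mass of the alternative boundary.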
Rapid building of an ASR system for Under-Resourced Languages based on Multilingual Unsupervised Training
This paper presents our work on rapid language adaptation of acoustic models based on multilingual cross-language bootstrapping and unsupervised training. We used Automatic Speech Recognition (ASR) systems in the six source languages English, French, German, Spanish, Bulgarian and Polish to build from scratch an ASR system for Vietnamese, an under-resourced language. System building was performed without using any transcribed audio data by applying three consecutive steps, i.e. cross-language transfer, unsupervised training based on the “multilingual A-stabil” confidence score [1], and bootstrapping. We investigated the correlation between the performance of “multilingual A-stabil” and the number of source languages, and improved the performance of “multilingual A-stabil” by applying it at the syllable level. Furthermore, we showed that increasing the number of source language ASR systems in the multilingual framework results in better performance of the final ASR system in the target language Vietnamese. The final Vietnamese recognition system has a Syllable Error Rate (SyllER) of 16.8% on the development set and 16.1% on the evaluation set. Index Terms: rapid language adaptation of ASR, unsupervised training, multilingual A-Stabil
Confidence Measures for Speech Recognition and Utterance Verification
2000
Despite the significant advances made in speech and language technologies, automatic speech recognition systems are far from perfect. A significant amount of uncertainty still persists. Confidence measures are an objective means of evaluating the degree of uncertainty inherent in recognition results. The purpose of such measures is to express, as reliably as possible, the level of correspondence between the original utterance from the speaker and the results of the recognizer. This research

