Results 1 - 10 of 12
Recognition confidence scoring and its use in speech understanding systems
- Computer Speech and Language, 2002
Cited by 42 (4 self)
In this paper we present an approach to recognition confidence scoring and a method for integrating confidence scores into the understanding and dialogue components of a speech understanding system. The system uses a multi-tiered approach where confidence scores are computed at the phonetic, word, and utterance levels. The scores are produced by extracting confidence features from the computation of the recognition hypotheses and processing these features using an accept/reject classifier for word and utterance hypotheses. The output of the confidence classifiers can then be incorporated into the parsing mechanism of the language understanding component. To evaluate the system, experiments were conducted using the JUPITER weather information system. Evaluation was performed at the understanding level using key-value pair concept error rate as the evaluation metric. When confidence scores were integrated into the understanding component of the system, the concept error rate was reduced by over 35%.
Connectionist speech recognition of Broadcast News
2002
Cited by 28 (10 self)
This paper describes connectionist techniques for recognition of Broadcast News. The fundamental difference between connectionist systems and more conventional mixture-of-Gaussian systems is that connectionist models directly estimate posterior probabilities as opposed to likelihoods. Access to posterior probabilities has enabled us to develop a number of novel approaches to confidence estimation, pronunciation modelling and search. In addition we have investigated a new feature extraction technique based on the modulation-filtered spectrogram (MSG), and methods for combining multiple information sources. We have incorporated all of these techniques into a system for the transcription
Transcription And Summarization Of Voicemail Speech
- Proc. ICSLP, 2000
Cited by 16 (6 self)
This paper describes the development of a system to transcribe and summarize voicemail messages. The results of the research presented in this paper are two-fold. First, a hybrid connectionist approach to the Voicemail transcription task shows that competitive performance can be achieved using a context-independent system with fewer parameters than those based on mixtures of Gaussian likelihoods. Second, an effective and robust combination of statistical with prior knowledge sources for term weighting is used to extract information from the decoder's output in order to deliver summaries to the message recipients via a GSM Short Message Service (SMS) gateway.
1. INTRODUCTION As the emphasis in cellular networks changes from voice-only communication to a rich combination of content-based applications and services, speech recognition can provide access to several types of information through a number of portable solutions, including mobile phones and personal digital assistants. This pa...
Automatic Summarization of Voicemail Messages Using Lexical and Prosodic Features
2005
Cited by 14 (2 self)
This paper presents trainable methods for extracting principal content words from voicemail messages. The short text summaries generated are suitable for mobile messaging applications. The system uses a set of classifiers to identify the summary words, with each word being identified by a vector of lexical and prosodic features. We use an ROC-based algorithm, Parcel, to select input features (and classifiers). We have performed a series of objective and subjective evaluations using unseen data from two different speech recognition systems, as well as human transcriptions of voicemail speech.
The Role of Prosody in a Voicemail Summarization System
- In Proc. ISCA Workshop on Prosody in Speech Recognition and Understanding, 2001
Cited by 9 (6 self)
When a speaker leaves a voicemail message there are prosodic cues that emphasize the important points in the message, in addition to lexical content. In this paper we compare and visualize the relative contribution of these two types of features within a voicemail summarization system. We describe the system's ability to generate summaries of two test sets, having trained and validated using 700 messages from the IBM Voicemail corpus. Results measuring the quality of summary artifacts show that combined lexical and prosodic features are at least as robust as combined lexical features alone across all operating conditions.
Confidence Measures for an Address Reading System
- In 7th Int. Conf. on Document Analysis and Recognition, 2003
Cited by 5 (0 self)
In this paper the performance of different confidence measures used for an address recognition system is evaluated. The recognition system for cursive handwritten German address words is based on Hidden Markov Models (HMMs). It is essential that the structure of the address (name, street, city, country) is known, so that a specific small but complete dictionary can be selected. When a wrong dictionary is chosen (OOV: out-of-vocabulary) or the word is misrecognized, the recognition result should be rejected by means of the confidence measure. This paper points out two aspects: the comparison of four confidence measures for single words -- based on the likelihood, a garbage-model, a two-best recognition or a character decoding -- and the comparison of using complete or wrong dictionaries. It is shown that the best confidence measure -- the two-best distance -- has a quite different behavior using OOV.
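The abstract above names a two-best distance but does not define it. As a rough illustration only, the sketch below assumes one common form: the gap between the best and second-best log-likelihoods, normalised by the observation length, compared against a rejection threshold. The function names, the normalisation, the threshold and the example scores are all hypothetical and are not taken from the paper.

```python
def two_best_distance(scores, length):
    """Length-normalised gap between the two highest log-likelihoods.

    `scores` maps each dictionary word to its log-likelihood (assumed
    form of the measure; the paper's exact definition is not given).
    """
    if len(scores) < 2:
        raise ValueError("need at least two candidate words")
    if length <= 0:
        raise ValueError("length must be positive")
    ranked = sorted(scores.values(), reverse=True)
    return (ranked[0] - ranked[1]) / length


def accept(scores, length, threshold):
    """Accept the top hypothesis only if it clearly beats the runner-up."""
    return two_best_distance(scores, length) >= threshold


# Hypothetical scores for a 60-frame word image.
clear = {"Berlin": -120.0, "Bernau": -150.0, "Bremen": -180.0}
ambiguous = {"Berlin": -120.0, "Bernau": -123.0}

clear_ok = accept(clear, length=60, threshold=0.2)
ambiguous_ok = accept(ambiguous, length=60, threshold=0.2)
```

Under this assumed definition the measure only compares candidates inside the selected dictionary. If the true word is missing from that dictionary, two wrong candidates may still be well separated, which is one plausible reason a measure of this kind could behave differently in the OOV case.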
Dynamic behaviour of connectionist speech recognition with strong latency constraints
- Speech Comm
Cited by 4 (3 self)
This paper describes the use of connectionist techniques in phonetic speech recognition with strong latency constraints. The constraints are imposed by the task of deriving the lip movements of a synthetic face in real time from the speech signal, by feeding the phonetic string into an articulatory synthesiser. Particular attention has been paid to analysing the interaction between the time evolution model learnt by the multi-layer perceptrons and the transition model imposed by the Viterbi decoder, in different latency conditions. Two experiments were conducted in which the time dependencies in the language model (LM) were controlled by a parameter. The results show a strong interaction between the three factors involved, namely the neural network topology, the length of time dependencies in the LM and the decoder latency.
Key words: speech recognition, neural network, low latency, non-linear dynamics
An Acoustic Model Based on Kullback-Leibler Divergence for Posterior Features
2007
Cited by 3 (1 self)
This paper investigates the use of features based on posterior probabilities of subword units such as phonemes. These features are typically transformed when used as inputs for a hidden Markov model with mixture of Gaussians as emission distribution (HMM/GMM). In this work, we introduce a novel acoustic model that avoids the Gaussian assumption and directly uses posterior features without any transformation. This model is described by a finite state machine where each state is characterized by a target distribution and the cost function associated with each state is given by the Kullback-Leibler (KL) divergence between its target distribution and the posterior features. Furthermore, the hybrid HMM/ANN system can be seen as a particular case of this KL-based model where state target distributions are predefined. A training method is also presented that minimizes the KL divergence between the state target distributions and the posterior features.
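The per-state cost described in the abstract above can be illustrated with a minimal sketch. It assumes the divergence is taken as KL(target || posterior); the abstract does not state the direction. The function names, the smoothing constant `eps` and the example distributions are hypothetical.

```python
import math


def kl_divergence(target, posterior, eps=1e-12):
    """KL(target || posterior) for two discrete distributions.

    The direction of the divergence is an assumption; `eps` guards
    against division by a zero posterior entry.
    """
    if len(target) != len(posterior):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for y, z in zip(target, posterior):
        if y > 0.0:  # terms with zero target mass contribute nothing
            total += y * math.log(y / max(z, eps))
    return total


def state_costs(state_targets, posterior):
    """Local cost of one posterior feature vector under every state."""
    return [kl_divergence(t, posterior) for t in state_targets]


# Hypothetical example: three phoneme classes, two states.
state_targets = [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]
frame_posterior = [0.7, 0.2, 0.1]

costs = state_costs(state_targets, frame_posterior)
best_state = min(range(len(costs)), key=costs.__getitem__)

# With a one-hot target, the cost reduces to the negative log posterior
# of that class.
one_hot_cost = kl_divergence([1.0, 0.0, 0.0], frame_posterior)
```

The one-hot case at the end is consistent with reading the hybrid HMM/ANN system as a special case with predefined targets, although the paper's exact formulation may differ from this sketch.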
Articulatory-feature-based confidence measures
Cited by 1 (0 self)
Confidence measures are computed to estimate the certainty that target acoustic units are spoken in specific speech segments. They are applied in tasks such as keyword verification or utterance verification. Because many of the confidence measures use the same set of models and features as in recognition, the resulting scores may not provide an independent measure of reliability. In this paper, we propose two articulatory feature (AF) based phoneme confidence measures that estimate the acoustic reliability based on the match in AF properties. While acoustic-based features, such as Mel-frequency cepstral coefficients (MFCC), are widely used in speech processing, some recent works have focused on linguistically based features, such as the articulatory features that relate directly to the human articulatory process, which may better capture speech characteristics. The articulatory features can either replace or complement the acoustic-based features in speech processing. The proposed AF-based measures in this paper were evaluated, in comparison and in combination, with the HMM-based scores on phoneme and keyword verification tasks using children's speech collected for a computer-based English pronunciation learning project. To fully evaluate their usefulness, the proposed measures and combinations were evaluated on both native and non-native data, and under field test conditions that are mismatched with the training condition. The experimental results show that under the different environments, combinations of the AF scores with the HMM-based
An Advanced Integrated Architecture for Wireless Voicemail Data Retrieval
- Proc. ICOIN, 2001
This paper describes an alternative architecture for voicemail data retrieval on the move. It comprises three distinct components: a speech recognizer, a text summarizer and a WAP Push Service initiator, enabling mobile users to receive text summaries of their voicemail in real time without an explicit request. Our approach overcomes the cost and usability limitations of the conventional voicemail retrieval paradigm, which requires a connection to be established in order to listen to spoken messages. We report performance results on all components of the system, which has been trained and tested on a database comprising 1843 North American English messages, as well as on the duration of the corresponding data path. The proposed architecture can be further customized to meet the requirements of a complete voicemail value-added service.
Keywords: voicemail data retrieval, automatic speech recognition, text summarization, Wireless Application Protocol, Short Message Service.

