Lexical Modeling in a Speaker Independent Speech Understanding System, 1993
Cited by 39 (8 self)
Over the past 40 years, significant progress has been made in the fields of speech recognition and speech understanding. Current state-of-the-art speech recognition systems are capable of achieving word-level accuracies of 90% to 95% on continuous speech recognition tasks using 5000 words. Even larger systems, capable of recognizing 20,000 words, are just now being developed. Speech understanding systems have recently been developed that perform fairly well within a restricted domain. While the size and performance of modern speech recognition and understanding systems are impressive, it is evident to anyone who has used these systems that the technology is primitive compared to our own human ability to understand speech. Some of the difficulties hampering progress in the fields of speech recognition and understanding stem from the many sources of variation that occur during human communication. One of these sources of variation is the different ways that words can be pronounced. There are many causes of pronunciation variation, such as: the phonetic environment in which the word occurs, the dialect of the speaker, ...
LVCSR log-likelihood ratio scoring for keyword spotting, in Proc. ICASSP, 129–132, 1995
Cited by 24 (1 self)
A new scoring algorithm has been developed for generating wordspotting hypotheses and their associated scores. This technique uses a large-vocabulary continuous speech recognition (LVCSR) system to generate the N-best answers along with their Viterbi alignments. The score for a putative hit is computed by summing the likelihoods of all hypotheses that contain the keyword and normalizing by the sum of all hypothesis likelihoods in the N-best list. Using a test set of conversational speech from Switchboard Credit Card conversations, we achieved an 81% figure of merit (FOM). Our word recognition error rate on this same test set is 54.7%.
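The normalized score described in this abstract can be illustrated with a minimal sketch. It assumes each N-best hypothesis is available as a word sequence paired with a log-likelihood, and it ignores the Viterbi time alignments that the paper uses to locate a putative hit. The names `log_sum_exp` and `keyword_score` are hypothetical and do not come from the paper.

```python
import math
from typing import List, Sequence, Tuple

# Assumed representation: (word sequence, log-likelihood) for one hypothesis.
Hypothesis = Tuple[Sequence[str], float]


def log_sum_exp(values: List[float]) -> float:
    """Return log(sum(exp(v))) computed stably; -inf for an empty list."""
    if not values:
        return float("-inf")
    peak = max(values)
    if peak == float("-inf"):
        return peak
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def keyword_score(nbest: List[Hypothesis], keyword: str) -> float:
    """Likelihood mass of hypotheses containing `keyword`, divided by the
    likelihood mass of the whole N-best list (a value in [0, 1])."""
    all_scores = [loglik for _, loglik in nbest]
    hit_scores = [loglik for words, loglik in nbest if keyword in words]
    if not hit_scores:
        return 0.0  # keyword absent from every hypothesis (or empty list)
    return math.exp(log_sum_exp(hit_scores) - log_sum_exp(all_scores))


# Illustrative (made-up) N-best list with likelihoods 0.5, 0.3 and 0.2.
nbest_example: List[Hypothesis] = [
    (["pay", "my", "credit", "card"], math.log(0.5)),
    (["pay", "my", "credit", "cart"], math.log(0.3)),
    (["play", "my", "credit", "card"], math.log(0.2)),
]
score_card = keyword_score(nbest_example, "card")
```

Working in the log domain avoids underflow, since the likelihoods of long utterances are typically far too small to exponentiate directly. A fuller version would also use the alignments to check that the keyword occurrences in different hypotheses overlap in time.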
Environmental Adaptation for Robust Speech Recognition, 1994
Cited by 17 (0 self)
Acknowledgments
Chapter 1 Introduction
1.1. Approaches to Overcoming Environmental Variability
1.1.1. Re-Training
1.1.2. Multi-Style Training
1.1.3. Environmental Compensation Using Dynamic Adaptation
1.2. Towards Environment-Independent Recognition
1.2.1. Sources of Environmental Variability
1.2.2. Performance Evaluation
1.3. Dissertation Outline
Chapter 2 Overview of Environmental Robustness in Speech Recognition
2.1. Sources of Degradation...
Environment normalization for robust speech recognition using direct cepstral comparison, Proc. ICASSP-94, 1994
Cited by 14 (3 self)
In this paper we describe and evaluate a series of new algorithms that compensate for the effects of unknown acoustical environments (or changes in environment) through the use of compensation vectors that are added to the cepstral representations of speech that is input to a speech recognition system. These compensation vectors are obtained from direct frame-by-frame comparisons of the cepstral representations of speech that is simultaneously recorded in the training environment and various testing environments, but the algorithms do not make use of such “stereo” speech data in analyzing speech from an unknown environment. In the proposed paper we will compare the improvement in recognition accuracy provided by the algorithms using common standard ARPA speech recognition corpora. For example, the normalization algorithm known as MFCDCN provided a 22% reduction in word error rate when compared to results obtained using cepstral mean normalization on the 1992 ARPA WSJ/CSR corpus, and a 56.6% reduction in error rate compared to baseline processing. A family of new algorithms, PDCN, which accomplish the environment normalization inside the decoder, is described and evaluated on the same corpus. A substantial word error rate reduction, 66.8%, can be achieved by combining MFCDCN and PDCN in a system with cepstral mean normalization, compared to the baseline system.
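The basic idea of an additive cepstral compensation vector learned from stereo data can be illustrated with a minimal sketch. This is not MFCDCN or PDCN, whose details the abstract does not give. It assumes one global correction vector, estimated as the mean frame-by-frame difference between training-environment and testing-environment cepstra, and it shows cepstral mean normalization alongside for comparison. All function names are hypothetical.

```python
from typing import List

Frame = List[float]  # one cepstral vector per analysis frame


def estimate_compensation(train_env: List[Frame], test_env: List[Frame]) -> Frame:
    """Mean frame-by-frame difference (train_env - test_env) over
    simultaneously recorded "stereo" frames of equal dimension."""
    if not train_env or len(train_env) != len(test_env):
        raise ValueError("stereo data must be non-empty and frame-aligned")
    dim = len(train_env[0])
    total = [0.0] * dim
    for clean, noisy in zip(train_env, test_env):
        for i in range(dim):
            total[i] += clean[i] - noisy[i]
    return [t / len(train_env) for t in total]


def apply_compensation(frames: List[Frame], vector: Frame) -> List[Frame]:
    """Add the compensation vector to every incoming cepstral frame."""
    return [[x + r for x, r in zip(frame, vector)] for frame in frames]


def cepstral_mean_normalization(frames: List[Frame]) -> List[Frame]:
    """Subtract the per-utterance mean from each cepstral coefficient."""
    dim = len(frames[0])
    mean = [sum(f[i] for f in frames) / len(frames) for i in range(dim)]
    return [[x - m for x, m in zip(frame, mean)] for frame in frames]


# Made-up two-dimensional "cepstra" with a constant offset between channels.
clean_frames = [[1.0, 2.0], [3.0, 4.0]]
noisy_frames = [[0.5, 1.0], [2.5, 3.0]]
correction = estimate_compensation(clean_frames, noisy_frames)
restored = apply_compensation(noisy_frames, correction)
normalized = cepstral_mean_normalization(clean_frames)
```

A single global vector only removes a constant cepstral offset. The abstract notes that stereo data is used to obtain the compensation vectors but not when processing speech from an unknown environment, so a real system needs some way to choose a vector for new input, which this sketch leaves out.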

