Dictionary Learning For Spontaneous Speech Recognition
1996
Cited by 32 (0 self)
Spontaneous speech adds a variety of phenomena to a speech recognition task: false starts, human and nonhuman noises, new words, and alternative pronunciations. All of these phenomena have to be tackled when adapting a speech recognition system for spontaneous speech. In this paper we focus on how to automatically expand and adapt phonetic dictionaries for spontaneous speech recognition. Especially for spontaneous speech, it is important to choose the pronunciations of a word according to the frequency with which they appear in the database rather than the "correct" pronunciation as might be found in a lexicon. We therefore proposed a data-driven approach [1] that adds new pronunciations to a given phonetic dictionary so that they model the observed occurrences of words in the database. We show how this algorithm can be extended to produce alternative pronunciations for word tuples and frequently misrecognized words. We also discuss how further knowledge can be incorporated into the phoneme recognizer so that it learns to generalize from pronunciations found previously. The experiments were performed on the German Spontaneous Scheduling Task (GSST), using the speech recognition engine of JANUS 2, the spontaneous speech-to-speech translation system of the Interactive Systems Laboratories at Carnegie Mellon and Karlsruhe University [2, 3].
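The idea of choosing pronunciations by observed frequency rather than by a canonical lexicon entry can be illustrated with a minimal sketch. It is not the algorithm from the paper: the function name `expand_dictionary`, the thresholds `min_count` and `min_share`, and the phoneme symbols are all hypothetical, and the observations are assumed to come from some phoneme recognizer run over the training database.

```python
from collections import Counter, defaultdict

def expand_dictionary(dictionary, observations, min_count=2, min_share=0.2):
    """Add frequently observed pronunciation variants to a phonetic dictionary.

    dictionary:   word -> list of pronunciations (tuples of phoneme symbols)
    observations: iterable of (word, phoneme sequence) pairs seen in the data
    A variant is added if it was seen at least `min_count` times and makes up
    at least `min_share` of all observations of that word (assumed thresholds).
    """
    counts = defaultdict(Counter)
    for word, phones in observations:
        counts[word][tuple(phones)] += 1

    # Copy the input so the original dictionary is left untouched.
    expanded = {word: list(prons) for word, prons in dictionary.items()}
    for word, variants in counts.items():
        total = sum(variants.values())
        for phones, n in variants.most_common():
            if n >= min_count and n / total >= min_share:
                entry = expanded.setdefault(word, [])
                if phones not in entry:
                    entry.append(phones)
    return expanded

# Illustrative data only: the phoneme symbols are made up for this example.
canonical = ("h", "a:", "b", "@", "n")
reduced = ("h", "a:", "m")
dictionary = {"haben": [canonical]}
observations = [("haben", reduced)] * 3 + [("haben", canonical),
                                           ("haben", ("h", "a", "b", "n"))]
expanded = expand_dictionary(dictionary, observations)
```

Here the reduced form is seen in three of five occurrences and is added next to the canonical entry, while the variant seen only once is ignored.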
The Bucket Box Intersection (BBI) Algorithm For Fast Approximative Evaluation Of Diagonal Mixture Gaussians
In Proc. ICASSP, 1996
Cited by 21 (3 self)
Today, most of the state-of-the-art speech recognizers are based on Hidden Markov modeling. Using semi-continuous or continuous density Hidden Markov Models, the computation of emission probabilities requires the evaluation of mixture Gaussian probability density functions. Since it is very expensive to evaluate all the Gaussians of the mixture density codebook, many recognizers only compute the M most significant Gaussians (M = 1, ..., 8). This paper presents an alternative approach to approximate mixture Gaussians with diagonal covariance matrices, based on a binary feature space partitioning tree. The proposed algorithm is experimentally evaluated in the context of large vocabulary, speaker independent, spontaneous speech recognition using the JANUS-2 speech recognizer. In the case of mixtures with 50 Gaussians, we achieve a speedup of 2-5 in the computation of HMM emission probabilities, without affecting the accuracy of the system. 1. INTRODUCTION To approximate the log probab...
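A rough sketch of the general idea, evaluating only the Gaussians that are relevant for the region of feature space a vector falls into, is given below. It is a simplification under stated assumptions and not the BBI algorithm as published: each Gaussian is given an axis-aligned box of `radius` standard deviations around its mean, the tree splits at the mean of the box centers while cycling through dimensions, and an empty bucket falls back to evaluating all Gaussians. All function names are hypothetical.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def log_mixture(x, weights, means, variances, subset):
    """Log of the weighted mixture sum over the Gaussians in `subset`."""
    terms = [math.log(weights[k]) + log_gauss_diag(x, means[k], variances[k])
             for k in subset]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

def gaussian_box(mean, var, radius):
    """Axis-aligned box of +/- radius standard deviations (an assumption)."""
    return [(m - radius * math.sqrt(v), m + radius * math.sqrt(v))
            for m, v in zip(mean, var)]

def build_tree(indices, boxes, depth, max_depth):
    """Binary partitioning tree; each leaf lists the boxes that reach it."""
    if depth == max_depth or len(indices) <= 1:
        return {"bucket": indices}
    dim = depth % len(boxes[0])
    split = sum((boxes[k][dim][0] + boxes[k][dim][1]) / 2
                for k in indices) / len(indices)
    # A box can land on both sides if it straddles the split plane.
    left = [k for k in indices if boxes[k][dim][0] <= split]
    right = [k for k in indices if boxes[k][dim][1] >= split]
    return {"dim": dim, "split": split,
            "left": build_tree(left, boxes, depth + 1, max_depth),
            "right": build_tree(right, boxes, depth + 1, max_depth)}

def find_bucket(tree, x):
    """Descend to the leaf whose region contains x."""
    while "bucket" not in tree:
        tree = tree["left"] if x[tree["dim"]] < tree["split"] else tree["right"]
    return tree["bucket"]

def approx_log_mixture(tree, x, weights, means, variances):
    """Evaluate only the Gaussians in x's bucket (all of them if it is empty)."""
    bucket = find_bucket(tree, x) or range(len(weights))
    return log_mixture(x, weights, means, variances, bucket)

means = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
variances = [(1.0, 1.0)] * 4
weights = [0.25] * 4
boxes = [gaussian_box(m, v, 3.0) for m, v in zip(means, variances)]
tree = build_tree(list(range(4)), boxes, 0, 2)
x = (0.5, -0.2)
exact = log_mixture(x, weights, means, variances, range(4))
approx = approx_log_mixture(tree, x, weights, means, variances)
```

In this toy setup the four Gaussians are well separated, so the bucket for `x` holds a single Gaussian and the approximate score is practically identical to the exact one. With overlapping Gaussians the bucket lists grow, and the trade-off between speed and accuracy depends on the box size and tree depth.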
Dictionary Learning: Performance through Consistency
1995
Cited by 21 (1 self)
We present first results from our efforts in automatically increasing and adapting phonetic dictionaries for spontaneous speech recognition. Spontaneous speech adds a variety of phenomena to a speech recognition task: false starts [1], human and nonhuman noises [2], new words [3] and alternative pronunciations. All of these phenomena have to be tackled when adapting a speech recognition system for spontaneous speech. For phonetic dictionaries (especially for spontaneous speech) it is important to choose the pronunciations of a word according to the frequency with which they appear in the database rather than the "correct" pronunciation as it might be found in a lexicon. Additionally, modifications of the dictionary should not lead to a higher phoneme confusability. Therefore, we propose a data-driven approach to add new pronunciations to a given phonetic dictionary so that they model the given occurrences of words in the database. We show how even a simple approach can lead to signi...
Language Models For A Spelled Letter Recognizer
In Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, 1995
Cited by 6 (3 self)
In some speech recognition applications, it is reasonable to constrain the search space of a speech recognizer to a large but finite set of sentences. We demonstrate the problem on a spelling task, where the recognition of continuously spelled last names is constrained to 110,000 entries (= 43,000 unique names) of a telephone book. Several techniques to address this problem are compared: recognition without any language model, bigrams, functions to map a hypothesis onto a legal string, n-best lists, and finally a newly developed method which integrates all constraints directly into the search process within reasonable memory and time bounds. The baseline result of 56% string accuracy is improved to 62, 85, 88, and 92%, respectively. To appear in: Proc. IEEE International Conf. on Acoustics, Speech, and Signal Processing, Detroit, USA, May 1995. 1. INTRODUCTION Spelled letter recognition is an essential subtask of many speech recognition systems. Applications include spelling of arbit...
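One of the compared techniques, mapping a recognizer hypothesis onto a legal string, can be sketched as a nearest-neighbour lookup. The abstract does not say which mapping function was used, so the choice of Levenshtein edit distance here is an assumption, and the names `edit_distance` and `map_to_legal` as well as the example name list are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two letter sequences."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]

def map_to_legal(hypothesis, legal_names):
    """Return the legal name closest to the hypothesis.

    Ties are broken alphabetically so the result is deterministic.
    """
    return min(legal_names,
               key=lambda name: (edit_distance(hypothesis, name), name))

legal_names = ["MILLER", "MEIER", "MUELLER"]
best = map_to_legal("MILER", legal_names)
```

A linear scan like this is only practical for small lists. For a directory on the scale described in the abstract, the cost of post-hoc mapping is one motivation for integrating the constraints into the search itself.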
Janus-II - Translation Of Spontaneous Conversational Speech
1996
Cited by 2 (2 self)
JANUS-II is a research system to design and test components of speech-to-speech translation systems as well as a research prototype for such a system. We will focus on two aspects of the system: 1) new features of the speech recognition component JANUS-SR, 2) the end-to-end performance of JANUS-II, including a comparison of two machine translation strategies used for JANUS-MT (PHOENIX and GLR*). 1. INTRODUCTION Currently JANUS-II components for English, German, Korean, Japanese, and Spanish speech input and translation are under development; though not all language pairs can always be kept at the same performance level, multilinguality is required to ensure generality in the recognition and translation approaches. A number of smaller and larger scale research projects contribute to the JANUS-II system [1], including language identification [2], robust speech recognition [3], recognition speed [4], noise modeling [5], new word modeling [6], portability to new languages [7], language mo...

