Environmental Adaptation for Robust Speech Recognition
, 1994
Cited by 17 (0 self)
Acknowledgments
Chapter 1 Introduction
1.1. Approaches to Overcoming Environmental Variability
1.1.1. Re-Training
1.1.2. Multi-Style Training
1.1.3. Environmental Compensation Using Dynamic Adaptation
1.2. Towards Environment-Independent Recognition
1.2.1. Sources of Environmental Variability
1.2.2. Performance Evaluation
1.3. Dissertation Outline
Chapter 2 Overview of Environmental Robustness in Speech Recognition
2.1. Sources of Degradation...
Polyphone Decision Tree Specialization For Language Adaptation
- in Proceedings of the ICASSP, Istanbul
, 2000
Cited by 17 (8 self)
With the distribution of speech technology products all over the world, the fast and efficient portability to new target languages becomes a practical concern. In this paper we explore the relative effectiveness of adapting multilingual LVCSR systems to a new target language with limited adaptation data. For this purpose we introduce a polyphone decision tree specialization method. Several recognition results are presented based on mono- and multilingual recognizers. These recognizers are developed in the framework of the project GlobalPhone. In this project we investigate speech recognition in the 15 languages Arabic, Mandarin and Shanghai Chinese, Croatian, English, French, German, Japanese, Korean, Portuguese, Russian, Spanish, Swedish, Tamil, and Turkish. 1. INTRODUCTION With the distribution of speech technology products all over the world, the fast and efficient portability to new target languages becomes a practical concern. One of the major time and costs factor for developin...
Near-Miss Modeling: A Segment-Based Approach to Speech Recognition
, 1998
Cited by 15 (0 self)
Currently, most approaches to speech recognition are frame-based in that they represent speech as a temporal sequence of feature vectors. Although these approaches have been successful, they cannot easily incorporate complex modeling strategies that may further improve speech recognition performance. In contrast, segment-based approaches represent speech as a temporal graph of feature vectors and facilitate the incorporation of a wide range of modeling strategies. However, difficulties in segment-based recognition have impeded the realization of potential advantages in modeling. This thesis
Hierarchical search for large vocabulary conversational speech recognition
- IEEE Signal Processing Magazine
, 1999
Cited by 15 (5 self)
Speaker-independent speech recognition technology has made significant progress from the days of isolated word recognition. Today, state-of-the-art systems are capable of performing large vocabulary continuous speech recognition (LVCSR) on audio streams derived from complex information sources such as broadcast news and two-way telephone dialogs. A significant contribution to this advancement in technology is the development of search techniques that find suboptimal but accurate solutions in problems involving large search spaces and extremely complex statistical models. Moreover, these search strategies are capable of dynamically integrating information from a number of diverse knowledge sources to determine the correct word hypothesis, and limit the scope of the search by using a hierarchical search strategy. We refer to this problem as the decoding or search problem. This paper describes the complexity associated with decoding using hierarchical representations for linguistic and acoustic knowledge sources. An extensible object-oriented decoder, available in the public domain, that leverages current state-of-the-art technology is described to illustrate these concepts. This decoder supports efficient handling of acoustic models for cross-word context-dependent phones, multiple pronunciations of words using lexical trees, and rescoring of word graphs based on N-gram language models in a single pass. It employs a state-of-the-art Viterbi-style dynamic programming algorithm, and is equipped with several heuristic pruning criteria to minimize the consumption of computational resources while maintaining good accuracy.
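To make the "Viterbi-style dynamic programming" and "heuristic pruning" mentioned above concrete, here is a minimal sketch of a flat Viterbi search with beam pruning over a small HMM. It is not the decoder described in the paper: the function name `viterbi_beam`, its parameters, and the single beam criterion are assumptions chosen for illustration, and there is no lexical tree, language model, or hierarchy.

```python
import math

def viterbi_beam(log_init, log_trans, log_emit, beam=10.0):
    """Illustrative Viterbi search with beam pruning (hypothetical helper).

    log_init[s]      : log probability of starting in state s
    log_trans[s][s2] : log probability of moving from s to s2
    log_emit[t][s]   : log likelihood of frame t under state s
    Returns (best log score, best state sequence).
    """
    n_states = len(log_init)
    scores = {s: log_init[s] + log_emit[0][s] for s in range(n_states)}
    backptrs = []
    for t in range(1, len(log_emit)):
        best = max(scores.values())
        if best == -math.inf:
            raise ValueError("all hypotheses have zero probability")
        # Beam pruning: keep only hypotheses within `beam` of the current best.
        active = {s: v for s, v in scores.items() if v >= best - beam}
        new_scores, bp = {}, {}
        for s2 in range(n_states):
            # Best surviving predecessor for state s2.
            score, prev = max((v + log_trans[s][s2], s) for s, v in active.items())
            new_scores[s2] = score + log_emit[t][s2]
            bp[s2] = prev
        scores = new_scores
        backptrs.append(bp)
    # Trace the best final state back to the first frame.
    last = max(scores, key=scores.get)
    path = [last]
    for bp in reversed(backptrs):
        path.append(bp[path[-1]])
    path.reverse()
    return scores[last], path
```

A narrow beam reduces the number of predecessors examined per frame, at the risk of discarding the hypothesis that would have won; this is why such criteria yield suboptimal but usually accurate solutions.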
Coarse-to-Fine Dynamic Programming
- IEEE Transactions on Pattern Analysis and Machine Intelligence
, 2001
Cited by 15 (0 self)
We introduce an extension of dynamic programming (DP) we call "Coarse-to-Fine Dynamic Programming" (CFDP), ideally suited to DP problems with large state space. CFDP uses dynamic programming to solve a sequence of coarse approximations which are lower bounds to the original DP problem. These approximations are developed by merging states in the original graph into "superstates" in a coarser graph which uses an optimistic arc cost between superstates. The approximations are designed so that when CFDP terminates the optimal path through the original state graph has been found. CFDP leads to significant decreases in the amount of computation necessary to solve many DP problems and can, in some instances, make otherwise infeasible computations possible. CFDP generalizes to DP problems with continuous state space and we offer a convergence result for this extension. The computation of the approximations requires that we bound the arc cost over all possible arcs associated with an adjacent pair of superstates; thus the feasibility of our proposed method requires the identification of such a lower bound. We demonstrate applications of this technique to optimization of functions and boundary estimation in mine recognition.
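The coarse-to-fine loop described above can be sketched on a small trellis. This is a minimal illustration under stated assumptions, not the authors' implementation: superstates are contiguous intervals of states at each stage, refinement splits an interval in half, and the optimistic arc cost is found by brute-force minimisation (which gives a valid lower bound but none of the computational savings a cheaply computed bound would). The names `cfdp` and `lower_bound` are hypothetical.

```python
def lower_bound(cost, t, a, b):
    """Optimistic arc cost between superstates a (stage t) and b (stage t+1).

    Brute force for clarity; a practical method needs a cheap analytic bound.
    """
    return min(cost(t, i, j) for i in range(*a) for j in range(*b))

def cfdp(cost, n_stages, n_states):
    """Return (optimal cost, optimal state path) for a trellis with arc cost
    cost(t, i, j) from state i at stage t to state j at stage t + 1."""
    # Start with one superstate per stage covering every state: (lo, hi) half-open.
    partition = [[(0, n_states)] for _ in range(n_stages)]
    while True:
        # Ordinary DP, but over the coarse graph of superstates.
        best = [{ss: (0.0, None) for ss in partition[0]}]
        for t in range(n_stages - 1):
            layer = {}
            for b in partition[t + 1]:
                layer[b] = min(
                    ((best[t][a][0] + lower_bound(cost, t, a, b), a)
                     for a in partition[t]),
                    key=lambda x: x[0],
                )
            best.append(layer)
        # Backtrack the optimal coarse path.
        end = min(best[-1], key=lambda ss: best[-1][ss][0])
        total = best[-1][end][0]
        path = [end]
        for t in range(n_stages - 1, 0, -1):
            path.append(best[t][path[-1]][1])
        path.reverse()
        # If every superstate on the path is a single state, the lower bound
        # is exact along the path, so the path is optimal in the original graph.
        if all(hi - lo == 1 for lo, hi in path):
            return total, [lo for lo, _ in path]
        # Otherwise refine only the superstates the optimal coarse path uses.
        for t, (lo, hi) in enumerate(path):
            if hi - lo > 1:
                mid = (lo + hi) // 2
                partition[t].remove((lo, hi))
                partition[t] += [(lo, mid), (mid, hi)]
```

Termination with the true optimum follows from the lower-bound property: every coarse solution costs no more than the best fine path, so a coarse optimum made entirely of single states cannot be beaten.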
Environmental Robustness in Speech Recognition using Physiologically-Motivated Signal Processing
, 1993
Cited by 11 (1 self)
Acknowledgments
Chapter 1 Introduction
Chapter 2 The SPHINX Speech Recognition System
2.1. Front-End Signal Processing
2.2. Vector Quantization
2.3. Discrete Hidden Markov Model
Chapter 3 Signal Processing Issues in Environmental Robustness
3.1. Sources of Degradation
3.1.1 Additive Noise
3.1.2 Linear Filtering
3.1.3 Other Effects
3.2. Solutions to the Environmental Robus...
HMM Continuous Speech Recognition Using Predictive LR Parsing
- In IEEE International Conference on Acoustics, Speech and Signal Processing
, 1989
Cited by 11 (0 self)
This paper proposes a new continuous speech recognition method using an efficient parsing mechanism, an LR parser, driving HMM modules directly without any intervening structures such as a phoneme lattice. Accurate and efficient speech parsing is achieved by combining HMM and LR parsing. This method is tested in Japanese phrase recognition experiments. Two grammars are prepared, a general Japanese grammar and a task-specific grammar. The phrase recognition rate with the general grammar is 72% for top candidates and 95% for the five best candidates. With the task-specific grammar, recognition rate is 80% and 99%, respectively. 1 INTRODUCTION There have been many speech recognition systems which use syntactic information to improve recognition accuracy. For example, statistical language modeling such as a bigram or a trigram [1, 2, 3], finite state grammars [4, 5] and context-free grammars [6, 7]. This paper proposes a new method for parsing speech data directly without any intervening structur...
Discriminative keyword spotting
- In Proc. of Workshop on Non-Linear Speech Processing
, 2007
Cited by 8 (6 self)
This paper proposes a new approach for keyword spotting, which is not based on HMMs. The proposed method employs a new discriminative learning procedure, in which the learning phase aims at maximizing the area under the ROC curve,
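The quantity being maximized can be illustrated directly. The area under the ROC curve equals the fraction of (keyword, non-keyword) pairs in which the keyword example receives the higher score (the Wilcoxon-Mann-Whitney statistic). The sketch below only computes that statistic; the paper's learning procedure is not given in the truncated abstract, and the function name `pairwise_auc` and the convention of counting ties as half are assumptions.

```python
from itertools import product

def pairwise_auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    pos_scores: detector scores for utterances that contain the keyword
    neg_scores: detector scores for utterances that do not
    Ties count as half a correct pair (an assumed convention).
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

Because the statistic depends only on pairwise orderings, a learner that targets it is rewarded for ranking keyword utterances above non-keyword ones rather than for fitting any particular decision threshold.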
Wide Context Acoustic Modeling In Read Vs. Spontaneous Speech
, 1997
Cited by 7 (2 self)
Context-dependent acoustic models have been applied in speech recognition research for many years, and have been shown to increase the recognition accuracy significantly. The most common approach is to use triphones. Recently, several speech recognition groups have started investigating the use of larger phonetic context windows when building acoustic models. In this paper we discuss some of the computational problems arising from wide context modeling (polyphonic modeling) and present methods to cope with these problems. A two stage decision tree based polyphonic clustering approach is described which implements a more flexible parameter tying scheme. The new clustering approach gave us significant improvement across all tasks - WSJ, SWB, and Spontaneous Scheduling Task - and across all languages involved (German, Spanish, English). We report recognition results based on the JANUS speech recognition toolkit [2, 8] on two tasks comparing acoustic context phenomena in English read versu...
Integrating Dynamic Speech Modalities Into Context Decision Trees
- Proc. ICASSP 2000
, 2000
Cited by 6 (0 self)
Context decision trees are widely used in the speech recognition community. Besides questions about phonetic classes of a phone's context, questions about their position within a word [Lee88] and questions about the gender of the current speaker [RC99] have been used so far. In this paper we additionally incorporate questions about current modalities of the spoken utterance like the speaker's dialect, the speaking rate, the signal to noise ratio, the latter two of which may change while speaking one utterance. We present a framework that treats all these modalities in a uniform way. Experiments with the Janus speech recognizer have produced error rate reductions of up to 10% when compared to systems that do not use modality questions. 1. INTRODUCTION 1.1. Context Decision Trees in Janus As described in [FR97] and [Rog97], Janus uses decision trees [Ode92] to assign acoustic models to polyphone segments. The base algorithm of the decoder is described in [Wos98]. Like many other decoder...

