Results 1 - 7 of 7
Acoustical and Environmental Robustness in Automatic Speech Recognition
, 1990
Cited by 145 (8 self)
This dissertation describes a number of algorithms developed to increase the robustness of automatic speech recognition systems with respect to changes in the environment. These algorithms attempt to improve the recognition accuracy of speech recognition systems when they are trained and tested in different acoustical environments, and when a desk-top microphone (rather than a close-talking microphone) is used for speech input. Without such processing, mismatches between training and testing conditions produce an unacceptable degradation in recognition accuracy. Two kinds of
The SPHINX-II Speech Recognition System: An Overview
- Computer, Speech and Language
, 1992
Cited by 137 (7 self)
In order for speech recognizers to deal with increased task perplexity, speaker variation, and environment variation, improved speech recognition is critical. Steady progress has been made along these three dimensions at Carnegie Mellon. In this paper, we review the SPHINX-II speech recognition system and summarize our recent efforts on improved speech recognition. This research was sponsored by the Defense Advanced Research Projects Agency and monitored by the Space and Naval Warfare Systems Command under Contract N00039-91-C-0158, ARPA Order No. 7239. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. Government. Keywords: Speech recognition, hidden Markov models, SPHINX-II 1. INTRODUCTION At Carnegie Mellon, we have made significant progress in large-vocabulary speaker-independent continuous speech recognition during the past years [1, 2, 3]. SP...
Markovian Models for Sequential Data
, 1996
Cited by 69 (2 self)
Hidden Markov Models (HMMs) are statistical models of sequential data that have been used successfully in many machine learning applications, especially for speech recognition. Furthermore, in the last few years, many new and promising probabilistic models related to HMMs have been proposed. We first summarize the basics of HMMs, and then review several recent related learning algorithms and extensions of HMMs, including in particular hybrids of HMMs with artificial neural networks, Input-Output HMMs (which are conditional HMMs using neural networks to compute probabilities), weighted transducers, variable-length Markov models and Markov switching state-space models. Finally, we discuss some of the challenges of future research in this very active area. 1 Introduction Hidden Markov Models (HMMs) are statistical models of sequential data that have been used successfully in many applications in artificial intelligence, pattern recognition, speech recognition, and modeling of biological ...
State-Based Gaussian Selection In Large Vocabulary Continuous Speech Recognition Using HMMs
, 1998
Cited by 19 (1 self)
This paper investigates the use of Gaussian Selection (GS) to increase the speed of a large vocabulary speech recognition system. Typically 30-70% of the computational time of a continuous density HMM-based speech recogniser is spent calculating probabilities. The aim of GS is to reduce this load by selecting the subset of Gaussian component likelihoods that should be computed given a particular input vector. This paper examines new techniques for obtaining "good" Gaussian subsets or "shortlists". All the new schemes make use of state information, specifically which state each of the Gaussian components belongs to. In this way a maximum number of Gaussian components per state may be specified, hence reducing the size of the shortlist. The first technique introduced is a simple extension of the standard GS method, which uses this state information. Then, more complex schemes based on maximising the likelihood of the training data are proposed. These new approaches are compared with the standard GS scheme on a large vocabulary speech recognition task. On this task, the use of state information reduced the percentage of Gaussians computed to 10-15%, compared with 20-30% for the standard GS scheme, with little degradation in performance. 1 M.J.F. Gales is now at the IBM T.J. Watson Research Center, Yorktown Heights, NY 10598, USA. 2 K.M. Knill is now at Nuance Communications, 1380 Willow Rd, Menlo Park, CA 94025, USA. List of Tables: 1 Change in the average forced alignment likelihood of the ARPA 1994 H1 development data for SGS and SBGS systems, compared to the standard no GS system. 2 Recognition performance of the standard no GS, SGS and SBGS systems on the ARPA 1994 H...
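The idea of a per-state cap on the shortlist can be illustrated with a small sketch. This is not the paper's SGS or SBGS scheme: it assumes a hypothetical vector-quantization codebook (`codewords`), squared Euclidean distance between codewords and Gaussian means, and a fixed distance `threshold`, none of which are stated in the abstract. The function names are likewise illustrative.

```python
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_shortlists(codewords, gaussians, threshold, max_per_state):
    """Build one Gaussian shortlist per codeword (done once, offline).

    gaussians is a list of (state_id, mean) pairs. A Gaussian is a
    candidate for a codeword if its mean lies within `threshold`
    (an assumed criterion); at most `max_per_state` candidates are
    kept for each state, nearest first.
    """
    shortlists = []
    for cw in codewords:
        per_state = defaultdict(list)
        for idx, (state, mean) in enumerate(gaussians):
            d = sq_dist(cw, mean)
            if d <= threshold:
                per_state[state].append((d, idx))
        shortlist = []
        for candidates in per_state.values():
            candidates.sort()  # nearest Gaussians of this state first
            shortlist.extend(idx for _, idx in candidates[:max_per_state])
        shortlists.append(sorted(shortlist))
    return shortlists


def select_gaussians(x, codewords, shortlists):
    """Quantize input vector x and return the indices worth evaluating."""
    nearest = min(range(len(codewords)), key=lambda i: sq_dist(x, codewords[i]))
    return shortlists[nearest]
```

At recognition time only the Gaussians returned by `select_gaussians` would have their likelihoods computed exactly; how the remaining components are approximated is left out of this sketch.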
Histogram Equalization of the Speech Representation for Robust Speech Recognition
, 2001
Cited by 12 (2 self)
Noise degrades the performance of Automatic Speech Recognition systems mainly because of the mismatch it introduces between the training and recognition conditions. It distorts the feature space, usually in a non-linear way. To reduce this mismatch, methods proposed for robust speech recognition try to compensate for the effect of noise either by estimating the clean speech or by adapting the recognizer's acoustic models so that they model the noisy speech properly. In this paper we propose a method that compensates for the effect of noise on the speech representation. The method is based on the histogram equalization technique frequently applied in digital image processing, adapted here to the speech representation. For each component of the feature vectors representing the speech signal, the histogram is estimated and the transformation that converts it into a reference histogram is calculated. Such transformations tend to compensate for the distortion that noise produces in the different components of the feature vector and improve the performance of recognition systems under noisy conditions. We describe how the histogram equalization method can be adapted to robust speech recognition and present recognition experiments that evaluate the proposed method.
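The per-component mapping described in this abstract can be sketched as follows. This is a minimal illustration under assumptions the abstract does not state: the reference histogram is taken to be a standard normal distribution, the histogram is estimated as an empirical CDF over the frames of one utterance, and the function names are hypothetical.

```python
from bisect import bisect_right
from statistics import NormalDist


def equalize_component(values, reference=NormalDist(0.0, 1.0)):
    """Map one feature component onto the (assumed) reference distribution."""
    n = len(values)
    ordered = sorted(values)
    out = []
    for v in values:
        # Empirical CDF of v, shifted by 0.5 so it stays strictly in (0, 1).
        rank = bisect_right(ordered, v)
        cdf = (rank - 0.5) / n
        # The inverse reference CDF gives the equalized value.
        out.append(reference.inv_cdf(cdf))
    return out


def equalize_features(frames):
    """Equalize every component of a list of equal-length feature vectors."""
    columns = list(zip(*frames))
    equalized = [equalize_component(list(col)) for col in columns]
    return [list(frame) for frame in zip(*equalized)]
```

Because the mapping is monotonic, the ordering of values within each component is preserved; only their distribution is reshaped toward the reference.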
Improved Hidden Markov Modeling for Speaker-Independent Continuous Speech Recognition
- Proc. DARPA Speech and Natural Language Workshop
, 1990
Cited by 10 (1 self)
This paper reports recent efforts to further improve the performance of the Sphinx system for speaker-independent continuous speech recognition. The recognition error rate is significantly reduced with incorporation of additional dynamic features, semi-continuous hidden Markov models, and speaker clustering. For the June 1990 (RM2) evaluation test set, the error rates of our current system are 4.3% and 19.9% for word-pair grammar and no grammar respectively.
An Optimally Randomized Minimax Algorithm (manuscript submitted for publication and subject to change)
This short paper proposes a simple extension of the celebrated MINIMAX algorithm used in zero-sum two-player games, called Rminimax. The Rminimax algorithm allows controlling the strength of an artificial rival by randomizing its strategy in an optimal way. In particular, the randomized shortest-path framework (Saerens et al., 2009) is applied for biasing the AI adversary towards worse or better solutions, therefore controlling its strength. This framework takes into account all possible strategies by computing an optimal trade-off between exploration (quantified by the spread entropy in the tree) and exploitation (quantified by the expected cost to an end game) of the game tree. As opposed to other tree-exploration techniques, this new algorithm considers complete paths of a tree (strategies) where a given entropy is spread. The optimal randomized strategy is efficiently computed by means of a simple recurrence relation while keeping the same complexity as the original MINIMAX. As a result, the Rminimax implements a non-deterministic, strength-adapted, AI opponent for board games in a principled way, thus avoiding the assumption of complete rationality. Simulations on two common games show that Rminimax behaves as expected.
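For orientation, the sketch below shows plain minimax on a small game tree, followed by a simple way to weaken a player by sampling moves from a Boltzmann distribution over minimax values. This is explicitly not the Rminimax recurrence, which spreads entropy over complete paths of the tree; it is a simpler stand-in technique, and the nested-list tree encoding and the parameter name `theta` are assumptions made for illustration.

```python
import math


def minimax(node, maximizing):
    """Minimax value of a tree given as nested lists with numeric leaves."""
    if not isinstance(node, list):
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


def move_distribution(node, maximizing, theta):
    """Boltzmann distribution over the root's moves (a stand-in, not Rminimax).

    theta = 0 gives a uniform random player; a large theta approaches the
    deterministic minimax choice.
    """
    sign = 1.0 if maximizing else -1.0
    scores = [sign * theta * minimax(child, not maximizing) for child in node]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]
```

For example, with `tree = [[3, 5], [2, 9], [0, 7]]` and the maximizing player to move, the first move has the best minimax value, so its probability grows from one third toward one as `theta` increases.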

