Results 1 - 10
of
27
Acoustical and Environmental Robustness in Automatic Speech Recognition
, 1990
"... This dissertation describes a number of algorithms developed to increase the robustness of automatic speech recognition systems with respect to changes in the environment. These algorithms attempt to improve the recognition accuracy of speech recognition systems when they are trained and tested in d ..."
Abstract
-
Cited by 145 (8 self)
- Add to MetaCart
This dissertation describes a number of algorithms developed to increase the robustness of automatic speech recognition systems with respect to changes in the environment. These algorithms attempt to improve the recognition accuracy of speech recognition systems when they are trained and tested in different acoustical environments, and when a desk-top microphone (rather than a close-talking microphone) is used for speech input. Without such processing, mismatches between training and testing conditions produce an unacceptable degradation in recognition accuracy. Two kinds of
Speech recognition by machines and humans
, 1997
"... This paper reviews past work comparing modern speech recognition systems and humans to determine how far recent dramatic advances in technology have progressed towards the goal of human-like performance. Comparisons use six modern speech corpora with vocabularies ranging from 10 to more than 65,000 ..."
Abstract
-
Cited by 101 (0 self)
- Add to MetaCart
This paper reviews past work comparing modern speech recognition systems and humans to determine how far recent dramatic advances in technology have progressed towards the goal of human-like performance. Comparisons use six modern speech corpora with vocabularies ranging from 10 to more than 65,000 words and content ranging from read isolated words to spontaneous conversations. Error rates of machines are often more than an order of magnitude greater than those of humans for quiet, wideband, read speech. Machine performance degrades further below that of humans in noise, with channel variability, and for spontaneous speech. Humans can also recognize quiet, clearly spoken nonsense syllables and nonsense sentences with little high-level grammatical information. These comparisons suggest that the human–machine performance gap can be reduced by basic research on improving low-level acoustic-phonetic modeling, on improving robustness with noise and channel variability, and on more accurately modeling spontaneous speech.
Survey of the State of the Art in Human Language Technology
, 1995
"... Contents 1 Spoken Language Input 1 Ron Cole & Victor Zue, chapter editors 1.1 Overview : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 1 Victor Zue & Ron Cole 1.2 Speech Recognition : : : : : : : : : : : : : : : : : : : : : : : : : : : 4 Victor Zue, Ron Cole, & Wayne Ward 1.3 Sig ..."
Abstract
-
Cited by 47 (0 self)
- Add to MetaCart
Contents 1 Spoken Language Input 1 Ron Cole & Victor Zue, chapter editors 1.1 Overview : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 1 Victor Zue & Ron Cole 1.2 Speech Recognition : : : : : : : : : : : : : : : : : : : : : : : : : : : 4 Victor Zue, Ron Cole, & Wayne Ward 1.3 Signal Representation : : : : : : : : : : : : : : : : : : : : : : : : : : 11 Melvyn J. Hunt 1.4 Robust Speech Recognition : : : : : : : : : : : : : : : : : : : : : : 17 Richard M. Stern 1.5 HMM Methods in Speech Recognition : : : : : : : : : : : : : : : 24 Renato De Mori & Fabio Brugnara 1.6 Language Representation : : : : : : : : : : : : : : : : : : : : : : : : 35 Salim Roukos 1.7 Speaker Recognition : : : : : : : : : : : : : : : : : : : : : : : : : : :<F35.37
Automatic Transcription of General Audio Data: Preliminary Analyses
"... The task of automatically transcribing general audio data is very different from the transcription task typically required of current automatic speech recognition systems. The general goal of this work is to quantify the difficult issues posed by such data, thus leading to an understanding of how a ..."
Abstract
-
Cited by 26 (2 self)
- Add to MetaCart
The task of automatically transcribing general audio data is very different from the transcription task typically required of current automatic speech recognition systems. The general goal of this work is to quantify the difficult issues posed by such data, thus leading to an understanding of how a speech recognition system may have to be altered to accommodate the added complexities. Specifically, we describe some preliminary analyses and experiments we have conducted on data collected from a radio news program. We found that using relatively straightforward acoustic measurements and classification techniques, we were able to achieve better than 80 % classification accuracy for seven salient sound classes present in the data, and nearly 94 % classification accuracy for a speech/non-speech decision. In addition, lexical analysis revealed that while the vocabulary size of a single broadcast is moderate, it grows exponentially as more shows are added.
Uncertainty decoding for noise robust speech recognition
- in Proc. Interspeech
, 2004
"... This dissertation is the result of my own work and includes nothing which is the outcome of work done in collaboration. It has not been submitted in whole or in part for a degree at any other university. Some of the work has been published previously in conference proceedings ..."
Abstract
-
Cited by 26 (8 self)
- Add to MetaCart
This dissertation is the result of my own work and includes nothing which is the outcome of work done in collaboration. It has not been submitted in whole or in part for a degree at any other university. Some of the work has been published previously in conference proceedings
Joint uncertainty decoding for robust large vocabulary speech recognition
, 2006
"... Standard techniques to increase automatic speech recognition noise robustness typically assume recognition models are clean trained. This “clean ” training data may in fact not be clean at all, but may contain channel variations, varying noise conditions, as well as different speakers. Hence rather ..."
Abstract
-
Cited by 23 (20 self)
- Add to MetaCart
Standard techniques to increase automatic speech recognition noise robustness typically assume recognition models are clean trained. This “clean ” training data may in fact not be clean at all, but may contain channel variations, varying noise conditions, as well as different speakers. Hence rather than considering noise robustness techniques as compensating clean acoustic models for environmental noise, they may be thought of as reducing the acoustic mismatch between training and test conditions. This report examines the application of VTS model compensation or model-based Joint uncertainty decoding to clean and multistyle trained systems. An EM-based noise estimation procedure is also presented to produce ML VTS or Joint noise models depending on the form of compensation used. Alternatively, compared to multistyle training, adaptive training with Joint uncertainty transforms, also referred to as JAT in this work, provides a better method for handling heterogeneous data. With JAT, the uncertainty bias added to the model variances de-weights observations proportional to the noise level. In this way, Joint transforms normalise the noise from the data allowing the canonical model to solely represent the underlying “clean ” acoustic signal. This
Environmental Adaptation for Robust Speech Recognition
, 1994
"... . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 Chapter 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 1.1. Approaches to Overcoming Environmental Variability . . . . . . ..."
Abstract
-
Cited by 17 (0 self)
- Add to MetaCart
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 Chapter 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 1.1. Approaches to Overcoming Environmental Variability . . . . . . . . . . . . . . 6 1.1.1. Re-Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 1.1.2. Multi-Style Training . . . . . . . . . . . . . . . . . . . . . . . . . 7 1.1.3. Environmental Compensation Using Dynamic Adaptation . . . . . . . . . . 8 1.2. Towards Environment-Independent Recognition . . . . . . . . . . . . . . . . 8 1.2.1. Sources of Environmental Variability . . . . . . . . . . . . . . . . . . 9 1.2.2. Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . . 9 1.3. Dissertation Outline . . . . . . . . . . . . . . . . . . . . . . . . . . 10 Chapter 2 Overview of Environmental Robustness in Speech Recognition . . . . . . 12 2.1. Sources of Degradation...
Effect of speaking style on LVCSR performance
- In ICSLP
, 1996
"... SRI collected a corpus to study how spontaneous speech differs from other types of speech. The corpus was collected in two parts: (1) a spontaneous Switchboard-style conversation on an assigned topic, and (2) a reading session in which participants read transcripts of their conversation from part 1. ..."
Abstract
-
Cited by 17 (0 self)
- Add to MetaCart
SRI collected a corpus to study how spontaneous speech differs from other types of speech. The corpus was collected in two parts: (1) a spontaneous Switchboard-style conversation on an assigned topic, and (2) a reading session in which participants read transcripts of their conversation from part 1. Experiments were conducted on sentences with identical transcripts that varied in speaking style. The word-error rates varied from 29% (careful dictation) to 53 % (spontaneous conversation) depending on the speaking style. These experiments show that speaking style is a dominant factor in determining the performance of large-vocabulary conversational speech recognition (LVCSR) systems. 1.
A Novel Framework and Training Algorithm for VariableParameter Hidden Markov Models
- IEEE trans. on Audio, Speech, and Language Processing
"... Abstract—We propose a new framework and the associated maximum-likelihood and discriminative training algorithms for the variable-parameter hidden Markov model (VPHMM) whose mean and variance parameters vary as functions of additional environment-dependent conditioning parameters. Our framework diff ..."
Abstract
-
Cited by 8 (6 self)
- Add to MetaCart
Abstract—We propose a new framework and the associated maximum-likelihood and discriminative training algorithms for the variable-parameter hidden Markov model (VPHMM) whose mean and variance parameters vary as functions of additional environment-dependent conditioning parameters. Our framework differs from the VPHMM proposed by Cui and Gong (2007) in that piecewise spline interpolation instead of global polynomial regression is used to represent the dependency of the HMM parameters on the conditioning parameters, and a more effective functional form is used to model the variances. Our framework unifies and extends the conventional discrete VPHMM. It no longer requires quantization in estimating the model parameters and can support both parameter sharing and instantaneous conditioning parameters naturally. We investigate the strengths and weaknesses of the model on the Aurora-3 corpus. We show that under the well-matched condition the proposed discriminatively trained VPHMM outperforms the conventional HMM trained in the same way with relative word error rate (WER) reduction of 19 % and 15%, respectively, when only mean is updated and when both mean and variances are updated. Index Terms—Discriminative training, growth transformation, parameter clustering, speech recognition, spline interpolation, variable-parameter hidden Markov model (VPHMM). I.

