Signal modeling techniques in speech recognition
- PROCEEDINGS OF THE IEEE
, 1993
Cited by 99 (5 self)
We have seen three important trends develop in the last five years in speech recognition. First, heterogeneous parameter sets that mix absolute spectral information with dynamic, or time-derivative, spectral information, have become common. Second, similarity transform techniques, often used to normalize and decorrelate parameters in some computationally inexpensive way, have become popular. Third, the signal parameter estimation problem has merged with the speech recognition process so that more sophisticated statistical models of the signal’s spectrum can be estimated in a closed-loop manner. In this paper, we review the signal processing components of these algorithms. These algorithms are presented as part of a unified view of the signal parameterization problem in which there are three major tasks: measurement, transformation, and statistical modeling. This paper is by no means a comprehensive survey of all possible techniques of signal modeling in speech recognition. There are far too many algorithms in use today to make an exhaustive survey feasible (and cohesive). Instead, this paper is meant to serve as a tutorial on signal processing in state-of-the-art speech recognition systems and to review those techniques most commonly used. In keeping with this goal, a complete mathematical description of each algorithm has been included in the paper.
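As a rough illustration of the first trend (mixing absolute spectral parameters with their time derivatives), the sketch below appends a simple central-difference "delta" to each frame. The function names `delta` and `append_deltas`, the edge handling, and the two-frame difference are assumptions made for this example; the paper's own formulations may differ.

```python
def delta(frames):
    """Central-difference time derivative of each frame (edges are replicated)."""
    n = len(frames)
    out = []
    for t in range(n):
        prev = frames[max(t - 1, 0)]      # previous frame, clamped at the start
        nxt = frames[min(t + 1, n - 1)]   # next frame, clamped at the end
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out


def append_deltas(frames):
    """Build a heterogeneous parameter set: static values followed by deltas."""
    return [f + d for f, d in zip(frames, delta(frames))]


# Hypothetical one-dimensional "spectral" track over three frames.
features = append_deltas([[0.0], [2.0], [4.0]])
```

Each output frame holds the original value followed by its estimated rate of change.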
Markovian Models for Sequential Data
, 1996
Cited by 69 (2 self)
Hidden Markov Models (HMMs) are statistical models of sequential data that have been used successfully in many machine learning applications, especially for speech recognition. Furthermore, in the last few years, many new and promising probabilistic models related to HMMs have been proposed. We first summarize the basics of HMMs, and then review several recent related learning algorithms and extensions of HMMs, including in particular hybrids of HMMs with artificial neural networks, Input-Output HMMs (which are conditional HMMs using neural networks to compute probabilities), weighted transducers, variable-length Markov models and Markov switching state-space models. Finally, we discuss some of the challenges of future research in this very active area.
1 Introduction
Hidden Markov Models (HMMs) are statistical models of sequential data that have been used successfully in many applications in artificial intelligence, pattern recognition, speech recognition, and modeling of biological ...
Support vector machines for speech recognition
- Proceedings of the International Conference on Spoken Language Processing
, 1998
Cited by 47 (2 self)
Statistical techniques based on hidden Markov Models (HMMs) with Gaussian emission densities have dominated signal processing and pattern recognition literature for the past 20 years. However, HMMs trained using maximum likelihood techniques suffer from an inability to learn discriminative information and are prone to overfitting and over-parameterization. Recent work in machine learning has focused on models, such as the support vector machine (SVM), that automatically control generalization and parameterization as part of the overall optimization process. In this paper, we show that SVMs provide a significant improvement in performance on a static pattern classification task based on the Deterding vowel data. We also describe an application of SVMs to large vocabulary speech recognition, and demonstrate an improvement in error rate on a continuous alphadigit task (OGI Alphadigits) and a large vocabulary conversational speech task (Switchboard). Issues related to the development and optimization of an SVM/HMM hybrid system are discussed.
Multi-Digit Recognition Using A Space Displacement Neural Network
- Neural Information Processing Systems
, 1992
Cited by 33 (7 self)
We present a feed-forward network architecture for recognizing an unconstrained handwritten multi-digit string. This is an extension of previous work on recognizing isolated digits. In this architecture a single digit recognizer is replicated over the input. The output layer of the network is coupled to a Viterbi alignment module that chooses the best interpretation of the input. Training errors are propagated through the Viterbi module. The novelty in this procedure is that segmentation is done on the feature maps developed in the Space Displacement Neural Network (SDNN) rather than the input (pixel) space.
1 Introduction
In previous work (Le Cun et al., 1990) we have demonstrated a feed-forward backpropagation network that recognizes isolated handwritten digits at state-of-the-art performance levels. The natural extension of this work is towards recognition of unconstrained strings of handwritten digits. The most straightforward solution is to divide the process into two: segmentati...
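To give a feel for how a Viterbi module can choose the best interpretation from per-position recognizer outputs, here is a minimal generic Viterbi decoder. It is not the paper's alignment module (which also propagates training errors); the `scores` and `trans` tables, their log-score convention, and the two-state example are assumptions made for illustration.

```python
def viterbi(scores, trans):
    """Return the highest-scoring state path.

    scores[t][s]: log-score of state s at position t (e.g. a recognizer output).
    trans[p][s]:  log-score of moving from state p to state s.
    """
    n_states = len(scores[0])
    best = list(scores[0])   # best path score ending in each state so far
    back = []                # back-pointers, one list per position after the first
    for t in range(1, len(scores)):
        ptr, new = [], []
        for s in range(n_states):
            # Best predecessor for state s at position t.
            p = max(range(n_states), key=lambda q: best[q] + trans[q][s])
            ptr.append(p)
            new.append(best[p] + trans[p][s] + scores[t][s])
        best = new
        back.append(ptr)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: best[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


# Hypothetical outputs: state 0 is likely at position 0, state 1 afterwards.
path = viterbi([[0.0, -5.0], [-5.0, 0.0], [-5.0, 0.0]],
               [[0.0, 0.0], [0.0, 0.0]])
```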
A tutorial on energy-based learning
- Predicting Structured Data
, 2006
Cited by 27 (6 self)
Energy-Based Models (EBMs) capture dependencies between variables by associating a scalar energy to each configuration of the variables. Inference consists in clamping the value of observed variables and finding configurations of the remaining variables that minimize the energy. Learning consists in finding an energy function in which observed configurations of the variables are given lower energies than unobserved ones. The EBM approach provides a common theoretical framework for many learning models, including traditional discriminative and generative approaches, as well as graph-transformer networks, conditional random fields, maximum margin Markov networks, and several manifold learning methods. Probabilistic models must be properly normalized, which sometimes requires evaluating intractable integrals over the space of all possible variable configurations. Since EBMs have no requirement for proper normalization, this problem is naturally circumvented. EBMs can be viewed as a form of non-probabilistic factor graphs, and they provide considerably more flexibility in the design of architectures and training criteria than probabilistic approaches.
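The inference step described in this abstract (clamp the observed variables, then search the remaining ones for the lowest energy) can be sketched in a few lines. The quadratic `energy` function and the finite candidate set below are hypothetical choices for illustration only, not a model taken from the tutorial.

```python
def energy(x, y):
    """Hypothetical energy: low when y is close to 2 * x."""
    return (y - 2.0 * x) ** 2


def infer(x, candidates):
    """Clamp the observed x and return the candidate y with the lowest energy."""
    return min(candidates, key=lambda y: energy(x, y))


# With x clamped to 1.5, the energies of the candidates are 9, 4, 1, 0, 1.
best = infer(1.5, [0.0, 1.0, 2.0, 3.0, 4.0])
```

No normalization over the candidates is needed: only the relative ordering of the energies matters for inference.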
Multimodal Interfaces
- Artificial Intelligence Review Journal, special issue
, 1994
Cited by 23 (3 self)
In this paper, we present an overview of research in our laboratories on Multimodal Human Computer Interfaces. The goal for such interfaces is to free human computer interaction from the limitations and acceptance barriers due to rigid operating commands and keyboards as only/main I/O-device. Instead we move to involve all available human communication modalities. These human modalities include Speech, Gesture and Pointing,
Connected Letter Recognition with a Multi-State Time Delay Neural Network
- In 3rd European Conference on Speech, Communication and Technology (EUROSPEECH) 93
, 1993
Cited by 22 (13 self)
The Multi-State Time Delay Neural Network (MS-TDNN) integrates a nonlinear time alignment procedure (DTW) and the high-accuracy phoneme spotting capabilities of a TDNN into a connectionist speech recognition system with word-level classification and error backpropagation. We present an MS-TDNN for recognizing continuously spelled letters, a task characterized by a small but highly confusable vocabulary. Our MS-TDNN achieves 98.5/92.0% word accuracy on speaker dependent/independent tasks, outperforming previously reported results on the same databases. We propose training techniques aimed at improving sentence level performance, including free alignment across word boundaries, word duration modeling and error backpropagation on the sentence rather than the word level. Architectures integrating submodules specialized on a subset of speakers achieved further improvements.
1 INTRODUCTION
The recognition of spelled strings of letters is essential for all applications involving proper names,...
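For readers unfamiliar with the nonlinear time alignment (DTW) mentioned here, the following is a minimal textbook dynamic time warping distance over scalar sequences. It is only a generic sketch: the MS-TDNN aligns network activations against word models, whereas the absolute-difference local cost and the function name `dtw_distance` are assumptions made for this example.

```python
def dtw_distance(a, b):
    """Cumulative cost of the best monotonic alignment between sequences a and b."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j]: best cumulative cost aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])          # local mismatch
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


# A sequence and a time-stretched copy of it align with zero cost.
stretched = dtw_distance([1, 2, 3], [1, 2, 2, 3])
shifted = dtw_distance([0, 0], [1, 1])
```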
Prototype-Based Minimum Classification Error / Generalized Probabilistic Descent Training for Various Speech Units
- Computer Speech and Language
, 1994
Cited by 16 (1 self)
In previous work we reported high classification rates for Learning Vector Quantization (LVQ) networks trained to classify phoneme tokens shifted in time. It has since been shown that the framework of Minimum Classification Error (MCE) and Generalized Probabilistic Descent (GPD) can treat LVQ as a special case of a general method for gradient descent on a rigorously defined classification loss measure that closely reflects the misclassification rate. This framework allows us to extend LVQ into a prototype-based minimum error classifier (PBMEC) appropriate for the classification of various speech units which the original LVQ was unable to treat. Speech categories are represented using a prototype-based multi-state architecture incorporating a Dynamic Time Warping procedure. We present results for the difficult E-set task, as well as for isolated word recognition for a vocabulary of 5240 words, that reveal clear gains in performance as a result of using PBMEC. In addition, we discuss the...
A Connectionist Recognizer For On-Line Cursive Handwriting Recognition
- Proc. ICASSP'94
Cited by 13 (2 self)
In this paper we show how the Multi-State Time Delay Neural Network (MS-TDNN), which is already used successfully in continuous speech recognition tasks, can be applied both to online single character and cursive (continuous) handwriting recognition. The MS-TDNN integrates the high accuracy single character recognition capabilities of a TDNN with a non-linear time alignment procedure (dynamic time warping algorithm) for finding stroke and character boundaries in isolated, handwritten characters and words. In this approach each character is modelled by up to 3 different states and words are represented as a sequence of these characters. We describe the basic MS-TDNN architecture and the input features used in this paper, and present results (up to 97.7% word recognition rate) both on writer dependent/independent, single character recognition tasks and writer dependent, cursive handwriting tasks with varying vocabulary sizes up to 20000 words.
1. INTRODUCTION
This paper describes a con...
Recognition Of Spelled Names Over The Telephone
- Proceedings of ICSLP '96
, 1996
Cited by 9 (1 self)
Recognition of spelled names over the telephone line is essential for applications such as telephone directory assistance, or automatic mail ordering. We present recognition results on the spelling section of the OGI Spelled and Spoken Word Telephone Corpus, using a Multi-State Time Delay Neural Network (MS-TDNN). Many applications allow for strong language modeling constraints. In our experiments we examined the beneficial effects of reducing the search space to a list of last names, ranging from about 1000 to 14 million entries. We compare tree search methods and show that significant improvements can be achieved by enriching the search trees with probabilities.

