Support vector machines for speech recognition
Proceedings of the International Conference on Spoken Language Processing, 1998
Cited by 47 (2 self)
Statistical techniques based on hidden Markov Models (HMMs) with Gaussian emission densities have dominated signal processing and pattern recognition literature for the past 20 years. However, HMMs trained using maximum likelihood techniques suffer from an inability to learn discriminative information and are prone to overfitting and over-parameterization. Recent work in machine learning has focused on models, such as the support vector machine (SVM), that automatically control generalization and parameterization as part of the overall optimization process. In this paper, we show that SVMs provide a significant improvement in performance on a static pattern classification task based on the Deterding vowel data. We also describe an application of SVMs to large vocabulary speech recognition, and demonstrate an improvement in error rate on a continuous alphadigit task (OGI Alphadigits) and a large vocabulary conversational speech task (Switchboard). Issues related to the development and optimization of an SVM/HMM hybrid system are discussed.
Dynamic Programming Search for Continuous Speech Recognition
1999
Cited by 30 (0 self)
Initially introduced in the late 1960s and early 1970s, dynamic programming algorithms have become increasingly popular in automatic speech recognition. There are two reasons why this has occurred: First, the dynamic programming strategy can be combined with a very efficient and practical pruning strategy so that very large search spaces can be handled. Second, the dynamic programming strategy has turned out to be extremely flexible in adapting to new requirements. Examples of such requirements are the lexical tree organization of the pronunciation lexicon and the generation of a word graph instead of the single best sentence. In this paper, we attempt to systematically review the use of dynamic programming search strategies for small-vocabulary and large-vocabulary continuous speech recognition. The following methods are described in detail: search using a linear lexicon, search using a lexical tree, language-model look-ahead and word graph generation.
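The combination of dynamic programming with pruning that this abstract mentions can be illustrated with a minimal sketch of time-synchronous Viterbi search with beam pruning. This is not the paper's implementation: the function name `viterbi_beam`, the dense list-of-lists model layout, and the fixed log-score beam width are all assumptions made for illustration, and no lexical tree or language model is involved.

```python
import math

def viterbi_beam(obs_logprobs, trans_logprobs, init_logprobs, beam=10.0):
    """Time-synchronous Viterbi search with beam pruning (illustrative sketch).

    obs_logprobs[t][s]   : log p(observation at frame t | state s)
    trans_logprobs[p][s] : log p(state s | previous state p), -inf if forbidden
    init_logprobs[s]     : log p(state s at frame 0)
    beam                 : hypotheses scoring more than `beam` below the best
                           hypothesis of the same frame are discarded
    Returns (best_log_score, best_state_sequence).
    Assumes at least one hypothesis survives at every frame.
    """
    n_states = len(init_logprobs)
    # active maps each live state to the best log score of a path ending there
    active = {s: init_logprobs[s] + obs_logprobs[0][s]
              for s in range(n_states)
              if init_logprobs[s] > -math.inf}
    backpointers = []
    for t in range(1, len(obs_logprobs)):
        # Pruning step: keep only hypotheses close to the current best score
        best = max(active.values())
        survivors = {s: v for s, v in active.items() if v >= best - beam}
        new_active, bp = {}, {}
        # Dynamic programming step: extend every surviving hypothesis
        for prev, score in survivors.items():
            for s in range(n_states):
                step = trans_logprobs[prev][s]
                if step == -math.inf:
                    continue
                cand = score + step + obs_logprobs[t][s]
                if cand > new_active.get(s, -math.inf):
                    new_active[s] = cand
                    bp[s] = prev
        backpointers.append(bp)
        active = new_active
    # Trace back from the best final state
    state = max(active, key=active.get)
    best_score = active[state]
    path = [state]
    for bp in reversed(backpointers):
        state = bp[state]
        path.append(state)
    path.reverse()
    return best_score, path
```

A narrow beam reduces the number of hypotheses extended per frame at the risk of discarding the path that would have won; a wide beam approaches exhaustive Viterbi search.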
EWAVES: an efficient decoding algorithm for lexical tree based speech recognition
In Proc. of ICSLP
Cited by 6 (5 self)
We present an optimized implementation of the Viterbi algorithm suitable for small to large vocabulary, and isolated or continuous speech recognition. The Viterbi algorithm is certainly the most popular dynamic programming algorithm used in speech recognition. In this paper we propose a new algorithm that outperforms the Viterbi algorithm in terms of complexity and memory requirements. It is based on the assumption of strictly left-to-right models and explores the lexical tree in an optimal way, such that book-keeping computation is minimized. The tree is encoded such that children of a node are placed contiguously and in increasing order of memory heap so that the proposed algorithm also optimizes cache usage. Even though the algorithm is asymptotically two times faster than the conventional Viterbi algorithm, in our experiments
A Maximum-entropy Solution to the Frame-dependency Problem in Speech Recognition
2001
Cited by 5 (0 self)
The HMM assumption of conditional independence of observations causes a variety of problems for speech-recognition applications. Previous attempts to construct acoustic models that remove this assumption have suffered from a significant increase in the number of parameters to train. Another weakness of current acoustic models is that they do not account for the origin of derived features (estimated derivatives). We show how to both remove the independence assumption and properly account for derived features, with little or no increase in the number of parameters to train, by applying the principle of maximum entropy. We also show that ignoring the origins of derived features in training HMM acoustic models can lead to severe distortions of the effective language model. Evaluation of our maxent model on a simple problem cuts an already-low error rate in half compared to an equivalent HMM with the same number of parameters.
Generalized Hierarchical Search in the ISIP ASR System
It has long been a goal of speech researchers to incorporate higher-level knowledge sources such as discourse, part of speech, and understanding constraints into the speech recognition problem. However, current speech recognition systems are highly tuned to N-gram, triphone-based recognition. Thus, researchers have been unable to exploit this knowledge without extensive modifications to the most complex portion of an ASR system: the decoder. In this paper, we describe a publicly available, state-of-the-art decoder that employs a flexible and configurable multi-level search strategy capable of incorporating hierarchical knowledge sources with no changes to source code.
Dr. Vikesh Kumar Director
In an effort to provide a more efficient representation of the speech signal, the application of wavelet analysis is considered. This research presents an effective and robust method for extracting features for speech processing. Based on the time-frequency multi-resolution property of the wavelet transform, the input speech signal is decomposed into various frequency channels. The major issues concerning the design of this wavelet-based speech recognition system are choosing optimal wavelets for speech signals, choosing the decomposition level in the DWT, and selecting the feature vectors from the wavelet coefficients. More specifically, automatic classification of various speech signals using the DWT is described and compared using different wavelets. Finally, a wavelet-based feature extraction system and its performance on an isolated word recognition problem are investigated. For the classification of the words, a three-layer feed-forward network is used.
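The decomposition into frequency channels described in this abstract can be sketched as a multi-level DWT followed by one feature per sub-band. The sketch below makes several assumptions the abstract does not confirm: it uses the Haar wavelet (the abstract says only that different wavelets are compared), it uses sub-band energy as the feature, and the function names `haar_dwt_step` and `subband_energy_features` are hypothetical.

```python
import math

def haar_dwt_step(signal):
    """One level of the orthonormal Haar DWT.

    Returns (approximation, detail), each half the length of the input.
    """
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    scale = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * scale
              for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * scale
              for i in range(0, len(signal), 2)]
    return approx, detail

def subband_energy_features(frame, levels=3):
    """Decompose a frame and return one energy value per sub-band.

    Output order: [detail level 1, ..., detail level `levels`,
    final approximation]. Level 1 holds the highest frequencies.
    """
    if len(frame) % (2 ** levels):
        raise ValueError("frame length must be divisible by 2**levels")
    features = []
    approx = list(frame)
    for _ in range(levels):
        # Split the current low-pass band again; keep the detail energy
        approx, detail = haar_dwt_step(approx)
        features.append(sum(d * d for d in detail))
    features.append(sum(a * a for a in approx))
    return features
```

Because the Haar transform used here is orthonormal, the sub-band energies sum to the energy of the input frame. A feature vector of this kind is the sort of input a feed-forward classifier could consume; the decomposition level is the `levels` parameter.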

