From HMM's to Segment Models: A Unified View of Stochastic Modeling for Speech Recognition
, 1996
Support vector machines for speech recognition
- Proceedings of the International Conference on Spoken Language Processing
, 1998
Cited by 47 (2 self)
Statistical techniques based on hidden Markov Models (HMMs) with Gaussian emission densities have dominated signal processing and pattern recognition literature for the past 20 years. However, HMMs trained using maximum likelihood techniques suffer from an inability to learn discriminative information and are prone to overfitting and over-parameterization. Recent work in machine learning has focused on models, such as the support vector machine (SVM), that automatically control generalization and parameterization as part of the overall optimization process. In this paper, we show that SVMs provide a significant improvement in performance on a static pattern classification task based on the Deterding vowel data. We also describe an application of SVMs to large vocabulary speech recognition, and demonstrate an improvement in error rate on a continuous alphadigit task (OGI Alphadigits) and a large vocabulary conversational speech task (Switchboard). Issues related to the development and optimization of an SVM/HMM hybrid system are discussed.
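For readers unfamiliar with how an SVM controls generalization inside its optimization, the standard soft-margin formulation is reproduced here as background. The abstract does not say which variant the paper uses, so the symbols are assumptions: training pairs $(x_i, y_i)$ with $y_i \in \{-1, +1\}$, weight vector $w$, bias $b$, slack variables $\xi_i$, and a trade-off constant $C$.

```latex
\min_{w,\,b,\,\xi}\ \tfrac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_i
\quad\text{subject to}\quad
y_i\,(w^{\top}x_i + b) \ge 1 - \xi_i,\qquad \xi_i \ge 0,\quad i = 1,\dots,n
```

The first term favors a wide margin (lower model complexity) and the second penalizes training errors, so the balance between fit and generalization is part of a single objective.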
Hierarchical search for large vocabulary conversational speech recognition
- IEEE Signal Processing Magazine
, 1999
Cited by 15 (5 self)
Speaker-independent speech recognition technology has made significant progress from the days of isolated word recognition. Today, state-of-the-art systems are capable of performing large vocabulary continuous speech recognition (LVCSR) on audio streams derived from complex information sources such as broadcast news and two-way telephone dialogs. A significant contribution to this advancement in technology is the development of search techniques that find suboptimal but accurate solutions in problems involving large search spaces and extremely complex statistical models. Moreover, these search strategies are capable of dynamically integrating information from a number of diverse knowledge sources to determine the correct word hypothesis, and limit the scope of the search by using a hierarchical search strategy. We refer to this problem as the decoding or search problem. This paper describes the complexity associated with decoding using hierarchical representations for linguistic and acoustic knowledge sources. An extensible object-oriented decoder, available in the public domain, that leverages current state-of-the-art technology is described to illustrate these concepts. This decoder supports efficient handling of acoustic models for cross-word context-dependent phones, multiple pronunciations of words using lexical trees, and rescoring of word graphs based on N-gram language models in a single pass. It employs a state-of-the-art Viterbi-style dynamic programming algorithm, and is equipped with several heuristic pruning criteria to minimize the consumption of computational resources while maintaining good accuracy.
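The combination of Viterbi-style dynamic programming and heuristic pruning mentioned in this abstract can be illustrated with a minimal sketch. This is not the decoder described in the paper: the function name `viterbi_beam`, the two-state toy HMM, and the beam widths are all hypothetical choices made for illustration.

```python
import math

def viterbi_beam(obs, start, trans, emit, beam):
    """Time-synchronous Viterbi search with beam pruning, in the log domain.

    start[s], trans[p][s] and emit[s][o] are plain probabilities (toy inputs).
    Returns (log_score, state_path) for the best surviving hypothesis.
    """
    def lg(p):
        return math.log(p) if p > 0 else -math.inf

    # One hypothesis per state: state -> (log score, path so far).
    active = {s: (lg(p) + lg(emit[s].get(obs[0], 0.0)), [s])
              for s, p in start.items()}
    for o in obs[1:]:
        # Beam pruning: drop states scoring more than `beam` below the best.
        best = max(score for score, _ in active.values())
        active = {s: h for s, h in active.items() if h[0] >= best - beam}
        nxt = {}
        for prev, (score, path) in active.items():
            for s, p in trans[prev].items():
                cand = score + lg(p) + lg(emit[s].get(o, 0.0))
                # Viterbi recombination: keep only the best path into each state.
                if s not in nxt or cand > nxt[s][0]:
                    nxt[s] = (cand, path + [s])
        active = nxt
    return max(active.values(), key=lambda h: h[0])

# Hypothetical toy model, not taken from the paper.
START = {"Rainy": 0.6, "Sunny": 0.4}
TRANS = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
         "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
EMIT = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
        "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}
OBS = ["walk", "shop", "clean"]

wide = viterbi_beam(OBS, START, TRANS, EMIT, beam=10.0)
narrow = viterbi_beam(OBS, START, TRANS, EMIT, beam=0.1)
print(wide[1], math.exp(wide[0]))
print(narrow[1], math.exp(narrow[0]))
```

On this toy model the wide beam keeps every state alive and returns the exact Viterbi path, while the very narrow beam discards the state that the best path passes through and returns a lower-scoring path. That is the accuracy-for-cost trade-off that pruning introduces.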
Speech Recognition System Design Based on Automatically Derived Units
, 1999
Cited by 10 (0 self)
In most speech recognition systems today, acoustic modeling and lexical modeling are viewed as separable problems. Currently the most popular approach is to manually define canonical word pronunciations in terms of phonetic units and let the acoustic models capture differences between actual spoken and canonical pronunciations implicitly with Gaussian mixture models. As a result, these models can be very broad, particularly for casual spontaneous speech. An alternative approach, explored in this thesis, is to learn a unit inventory and pronunciation dictionary from training data using a maximum likelihood objective function. In particular,
Lattice-Based Search Strategies For Large Vocabulary Speech Recognition
, 1995
Cited by 9 (1 self)
The design of search algorithms is an important issue in recognition, particularly for very large vocabulary, continuous speech. It is an especially crucial problem when computationally expensive knowledge sources are used in the system, as is necessary to achieve high accuracy. Recently, multi-pass search strategies have been used as a means of applying inexpensive knowledge sources early on to prune the search space for subsequent passes using more expensive knowledge sources. Three multi-pass search algorithms are investigated in this thesis work: the N-best search algorithm, a lattice dynamic programming search algorithm and a lattice local search algorithm. Both the lattice dynamic programming and lattice local search algorithms are shown to achieve comparable performance to the N-best search algorithm while running as much as 10 times faster on a 20,000 word vocabulary task. The lattice local search algorithm is also shown to have the additional advantage over the lattice dynamic programming search algorithm of allowing sentence-level knowledge sources to be incorporated into the search.
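As a rough illustration of what extracting N-best hypotheses from a word lattice involves, the sketch below enumerates complete paths through a small lattice in order of increasing cost using a priority queue. It is not one of the three algorithms investigated in the thesis. The function `n_best`, the lattice layout, and the costs (read as non-negative negative log scores) are hypothetical.

```python
import heapq

def n_best(lattice, start, end, n):
    """Return up to n lowest-cost (cost, words) paths from start to end.

    lattice maps node -> list of (next_node, word, cost). Costs must be
    non-negative so that complete paths are popped in increasing cost order.
    """
    # Heap entries: (cost so far, tie-breaker, current node, words so far).
    heap = [(0.0, 0, start, ())]
    counter = 1
    results = []
    while heap and len(results) < n:
        cost, _, node, words = heapq.heappop(heap)
        if node == end:
            results.append((cost, list(words)))
            continue
        for nxt, word, c in lattice.get(node, []):
            heapq.heappush(heap, (cost + c, counter, nxt, words + (word,)))
            counter += 1
    return results

# Hypothetical toy lattice: two competing words on each of two arcs.
LATTICE = {
    0: [(1, "recognize", 1.0), (1, "wreck a nice", 1.75)],
    1: [(2, "speech", 0.5), (2, "beach", 1.0)],
}

top3 = n_best(LATTICE, start=0, end=2, n=3)
for cost, words in top3:
    print(cost, " ".join(words))
```

Because every partial path is kept separately, this exhaustive enumeration grows quickly on real lattices. It is meant only to make the notion of an N-best list concrete, not to reflect the efficiency considerations the abstract discusses.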
Efficient 2-Pass N-Best Decoder
- DARPA Speech Recognition Workshop
, 1997
Cited by 8 (5 self)
In this paper, we describe the new BBN BYBLOS efficient 2-Pass N-Best decoder used for the 1996 Hub-4 Benchmark Tests. The decoder uses a quick fastmatch to determine the likely word endings. Then in the second pass, it performs a time-synchronous beam search using a detailed continuous-density HMM and a trigram language model to decide the word starting positions. From these word starts, the decoder, without looking at the input speech, constructs a trigram word lattice, and generates the top N likely hypotheses. This new 2-pass N-Best decoder maintains recognition performance comparable to the old 4-pass N-Best decoder, while its search strategy is simpler and much more efficient. 1. INTRODUCTION As previously described in [2], the old BBN BYBLOS decoder used a multi-pass search strategy consisting of 4 passes to generate the top N most likely hypotheses, which were then rescored using more detailed, but expensive knowledge sources. These N best hypotheses were then reordered and th...
N-Best Breadth Search For Large Vocabulary Continuous Speech Recognition Using A Long Span Language Model
, 1998
Cited by 3 (3 self)
In large vocabulary continuous speech recognition, high level linguistic knowledge can enhance performance. However, integration of high level linguistic knowledge and complex acoustic models under an efficient search scheme is still an open question. In this paper, we propose the n-best breadth search algorithm under the framework of a state space search. The n-best breadth search is a combination of the best first search and the breadth first search, and it efficiently accommodates the long span language models and complex acoustic models. Our pilot experiment shows that the proposed algorithm decreases execution time with little effect on performance. 136th Meeting of Acoustical Society of America. Contents: 1 INTRODUCTION; 2 REVIEW OF DECODING ALGORITHMS; 3 N-BEST BREADTH SEARCH; 4 IMPLEMENTATION ISSUES; 5 EXPERIMENTAL RESULTS; 6 CONCLUSIONS; 7 ACKNOWLEDGMENT. 1 INTRODUCTION In the statistical approach, speech recognition ...
Unification-Based Glossing
- In Proceedings of the International Joint Conference on Artificial Intelligence
, 1995
Cited by 3 (1 self)
We present an approach to syntax-based machine translation that combines unification-style interpretation with statistical processing. This approach enables us to translate any Japanese newspaper article into English, with quality far better than a word-for-word translation. Novel ideas include the use of feature structures to encode word lattices and the use of unification to compose and manipulate lattices. Unification also allows us to specify abstract features that delay target-language synthesis until enough source-language information is assembled. Our statistical component enables us to search efficiently among competing translations and locate those with high English fluency. 1 Background JAPANGLOSS [Knight et al., 1994; 1995] is a project whose goals are to scale up knowledge-based machine translation (KBMT) techniques to handle Japanese-English newspaper MT, to achieve higher quality output than is currently available, and to develop techniques for rapidly constructing MT ...
Second Thoughts on an Artificial Intelligence Approach to Speech Understanding
- In 14th Spoken Language and Discourse Workshop Notes (SIGSLUD-14)
, 1996
Cited by 1 (1 self)
A few years ago I undertook a new speech understanding research project, aiming to explore innovative techniques rather than pursue short-term results. My method was to build on the classic 1970s AI approaches to speech, as an alternative to the current mainstream speech understanding research methods. This led to a system that was competitive, both in elegance and performance, with other recent AI-inspired speech understanding systems. However, evaluation of results and prospects led to the realization that the system had no future. This paper analyzes the roots of this failure as a case study in AI methodology gone awry. In particular, it explains why my original, classically AI goals --- namely, be optimal in principle, be well integrated, iteratively refine the interpretation, deal directly with noisy inputs, be linguistically interesting, be tunable by hand, work with clear hypotheses, be architecturally innovative, and relate to general issues in AI --- are less important than they...
Artificial Intelligence And Other Approaches . . .
- Journal of Experimental and Theoretical Artificial Intelligence
, 1998
This paper characterizes the methodology of Artificial Intelligence by looking at research in speech understanding, a field where AI approaches contrast starkly with the alternatives, particularly engineering approaches. Four values of AI stand out as influential: ambitious goals, introspective plausibility, computational elegance, and wide significance. The paper also discusses the utility and larger significance of these values.

