The Use of Context in Large Vocabulary Speech Recognition, 1995
Cited by 93 (0 self)
decide which contexts are similar and can share parameters. A key feature of this approach is that it allows the construction of models which are dependent upon contextual effects occurring across word boundaries. The use of cross word context dependent models presents problems for conventional decoders. The second part of the thesis therefore presents a new decoder design which is capable of using these models efficiently. The decoder is suitable for use with very large vocabularies and long span language models. It is also capable of generating a lattice of word hypotheses with little computational overhead. These lattices can be used to constrain further decoding, allowing efficient use of complex acoustic and language models. The effectiveness of these techniques has been assessed on a variety of large vocabulary continuous speech recognition tasks and results are presented which analyse performance in terms of computational complexity and recognition accuracy. The experiments dem
Support vector machines for speech recognition - Proceedings of the International Conference on Spoken Language Processing, 1998
Cited by 47 (2 self)
Statistical techniques based on hidden Markov models (HMMs) with Gaussian emission densities have dominated signal processing and pattern recognition literature for the past 20 years. However, HMMs trained using maximum likelihood techniques suffer from an inability to learn discriminative information and are prone to overfitting and over-parameterization. Recent work in machine learning has focused on models, such as the support vector machine (SVM), that automatically control generalization and parameterization as part of the overall optimization process. In this paper, we show that SVMs provide a significant improvement in performance on a static pattern classification task based on the Deterding vowel data. We also describe an application of SVMs to large vocabulary speech recognition, and demonstrate an improvement in error rate on a continuous alphadigit task (OGI Alphadigits) and a large vocabulary conversational speech task (Switchboard). Issues related to the development and optimization of an SVM/HMM hybrid system are discussed.
LVCSR log-likelihood ratio scoring for keyword spotting - in Proc. ICASSP, 129–132, 1995
Cited by 24 (1 self)
A new scoring algorithm has been developed for generating wordspotting hypotheses and their associated scores. This technique uses a large-vocabulary continuous speech recognition (LVCSR) system to generate the N-best answers along with their Viterbi alignments. The score for a putative hit is computed by summing the likelihoods for all hypotheses that contain the keyword, normalized by dividing by the sum of all hypothesis likelihoods in the N-best list. Using a test set of conversational speech from Switchboard Credit Card conversations, we achieved an 81% figure of merit (FOM). Our word recognition error rate on this same test set is 54.7%.
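The normalized score described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `keyword_score`, the `(words, log_likelihood)` input layout, and the toy likelihoods are assumptions made for illustration, and the sketch ignores the Viterbi time alignments that the paper uses to localize each putative hit.

```python
import math

def keyword_score(nbest, keyword):
    """Fraction of N-best likelihood mass carried by hypotheses containing `keyword`.

    nbest: list of (words, log_likelihood) pairs (hypothetical layout).
    Returns a value in [0, 1]; 0.0 for an empty list.
    """
    if not nbest:
        return 0.0
    # Subtract the maximum log-likelihood before exponentiating so that
    # very negative log-likelihoods do not underflow to zero.
    max_ll = max(ll for _, ll in nbest)
    total = sum(math.exp(ll - max_ll) for _, ll in nbest)
    hit = sum(math.exp(ll - max_ll) for words, ll in nbest if keyword in words)
    return hit / total

# Toy 3-best list with made-up likelihoods 0.5, 0.3, 0.2.
toy_nbest = [
    (["pay", "credit", "card"], math.log(0.5)),
    (["pay", "credit", "cart"], math.log(0.3)),
    (["play", "credit", "card"], math.log(0.2)),
]
card_score = keyword_score(toy_nbest, "card")
```

In this toy list, "card" appears in the first and third hypotheses, so its score is (0.5 + 0.2) / (0.5 + 0.3 + 0.2).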
Degraded Text Recognition Using Visual And Linguistic Context, 1995
Cited by 20 (2 self)
Recognition of degraded text is a challenging problem. To improve the performance of an OCR system on degraded images of text, postprocessing techniques are critical. The objective of postprocessing is to correct errors or to resolve ambiguities in OCR results by using contextual information. Depending on the extent of context used, there are different levels of postprocessing. In current commercial OCR systems, word-level postprocessing methods, such as dictionary-lookup, have been applied successfully. However, many OCR errors cannot be corrected by word-level postprocessing. To overcome this limitation, passage-level postprocessing, in which global contextual information is utilized, is necessary. In most current studies on passage-level postprocessing, linguistic context is the major resource to be exploited. This thesis addresses problems in degraded text recognition and discusses potential solutions through passage-level postprocessing. The objective is to develop a postprocessin...
Hierarchical search for large vocabulary conversational speech recognition - IEEE Signal Processing Magazine, 1999
Cited by 15 (5 self)
Speaker-independent speech recognition technology has made significant progress from the days of isolated word recognition. Today, state-of-the-art systems are capable of performing large vocabulary continuous speech recognition (LVCSR) on audio streams derived from complex information sources such as broadcast news and two-way telephone dialogs. A significant contribution to this advancement in technology is the development of search techniques that find suboptimal but accurate solutions in problems involving large search spaces and extremely complex statistical models. Moreover, these search strategies are capable of dynamically integrating information from a number of diverse knowledge sources to determine the correct word hypothesis, and limit the scope of the search by using a hierarchical search strategy. We refer to this problem as the decoding or search problem. This paper describes the complexity associated with decoding using hierarchical representations for linguistic and acoustic knowledge sources. An extensible object-oriented decoder available in the public domain, that leverages current state-of-the-art technology is described to illustrate these concepts. This decoder supports efficient handling of acoustic models for cross-word context-dependent phones, multiple pronunciations of words using lexical trees, and rescoring of word graphs based on N-gram language models in a single pass. It employs a state-of-the-art Viterbi-style dynamic programming algorithm, and is equipped with several heuristic pruning criteria to minimize the consumption of computational resources while maintaining good accuracy.
Anatomy of an extremely fast LVCSR decoder - in Proc. Interspeech, 2005
Cited by 15 (1 self)
We report in detail the decoding strategy that we used for the past two DARPA Rich Transcription evaluations (RT’03 and RT’04) which is based on finite state automata (FSA). We discuss the format of the static decoding graphs, the particulars of our Viterbi implementation, the lattice generation and the likelihood evaluation. This paper is intended to familiarize the reader with some of the design issues encountered when building an FSA decoder. Experimental results are given on the EARS database (English conversational telephone speech) with emphasis on our faster than real-time system.
Parse Scoring with Prosodic Information - In Int. Conf. on Spoken Language Processing, 1992
Cited by 13 (1 self)
The relative size and location of prosodic phrase boundaries provides an important cue for resolving syntactic ambiguity, and can be used to improve the accuracy of automatic speech understanding. This paper describes an approach to scoring candidate sentence hypotheses and associated parses using prosodic phrase cues. Specifically, for each hypothesized parse, prosodic breaks are automatically detected and the probability of these breaks given the parse is computed based on a stochastic model of the prosody/syntax relationship. The parse probability can be used to rank sentence hypotheses and associated parses, optionally in combination with other scores. Both the prosodic break recognition algorithm and the prosody/syntax model can be automatically trained and can therefore be designed specifically for different speaking styles or task domains, given appropriate labeled data. We have demonstrated the potential of this approach in experiments with a corpus of ambiguous sentences spoke...
Articulatory Methods for Speech Production and Recognition, 1996
Cited by 9 (0 self)
... production-based knowledge into the recognition framework. By using an explicit time-domain articulatory model of the mechanisms of co-articulation, it is hoped to obtain a more accurate model of contextual effects in the acoustic signal, while using fewer parameters than traditional acoustically-driven approaches. Separate articulatory and acoustic models are provided, and in each case the parameters of the models are automatically optimised over a training data set. A predictive statistically-based model of co-articulation is described, and found to yield improved articulatory modelling accuracy compared with X-ray articulatory traces. Parameterised acoustic vectors are synthesised by a set of artificial neural networks, and the resulting acoustic representations are used to re-score N-best recognition hypothesis lists produced by an HMM-based recogniser. The system is evaluated on two test databases, one including speaker-specific X-ray training data and the other aco
Integrating Large Context Language Models Into A Real Time Word Recognizer, 1996
Cited by 8 (8 self)
In this paper we present a new recognizer architecture that allows the efficient integration of language models with arbitrarily large context information, e.g. polygram models, into the recognition process. Instead of using these models for rescoring the n best word chains generated using bigram information, we extract the best word chain, or optionally the n best word chains, directly from the word lattice using an A* algorithm that incorporates full language model information. For comparison, we developed an improved architecture for fast generation of the n best word chains using bigram information. Experimental results show that direct incorporation of full language model information increases word accuracy significantly even when compared to rescoring the 1000 best word chains. At the same time, computation time is drastically reduced. 1 Introduction It is well known that the consideration of language constraints is vital for effective and efficient speech recognition. Typica...
Continuous Word Recognition Based on the Stochastic Segment Model - Proc. DARPA Workshop CSR, 1992
Cited by 8 (2 self)
This paper presents an overview of the Boston University continuous word recognition system, which is based on the Stochastic Segment Model (SSM). The key components of the system described here include: a segment-based acoustic model that uses a family of Gaussian distributions to characterize variable length segments; a divisive clustering technique for estimating robust context-dependent models; and recognition using the N-best rescoring formalism, which also provides a mechanism for combining different knowledge sources (e.g. SSM and HMM scores). Results are reported for the speaker-independent portion of the Resource Management Corpus, for both the SSM system and a combined BU-SSM/BBN-HMM system. 1. INTRODUCTION In the last decade, most of the research on continuous speech recognition has focused on different variations of hidden Markov models (HMMs), and the various efforts have led to significant improvements in recognition performance. However, some researchers have begun to ...
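The N-best rescoring formalism mentioned in this abstract, in which scores from different knowledge sources are combined to re-rank a list of hypotheses, can be sketched roughly as follows. This is a generic weighted log-linear combination written for illustration, not the Boston University system: the function name `rescore_nbest`, the dictionary layout, the score names `"hmm"` and `"ssm"`, and the weights are all assumptions.

```python
def rescore_nbest(hypotheses, weights):
    """Re-rank N-best hypotheses by a weighted sum of per-source log scores.

    hypotheses: list of dicts like {"words": [...], "scores": {"hmm": float, "ssm": float}}
                (hypothetical layout; scores are log-domain, higher is better).
    weights:    dict mapping each score name to its combination weight.
    Returns a new list sorted from best to worst combined score.
    """
    def combined(hyp):
        # Weighted sum of the log scores from each knowledge source.
        return sum(weights[name] * hyp["scores"][name] for name in weights)

    return sorted(hypotheses, key=combined, reverse=True)

# Toy 2-best list with made-up log scores.
toy_hyps = [
    {"words": ["show", "all", "ships"], "scores": {"hmm": -10.0, "ssm": -12.0}},
    {"words": ["show", "all", "chips"], "scores": {"hmm": -11.0, "ssm": -9.0}},
]
equal_weight = rescore_nbest(toy_hyps, {"hmm": 1.0, "ssm": 1.0})
hmm_only = rescore_nbest(toy_hyps, {"hmm": 1.0, "ssm": 0.0})
```

With equal weights the second hypothesis wins (combined score -20 versus -22), while using the HMM score alone keeps the first hypothesis on top, which shows how the combination weights change the ranking. In practice such weights would be tuned on held-out data.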

