A Word Graph Based N-Best Search in Continuous Speech Recognition
, 1996
Cited by 9 (2 self)
In this paper, we introduce an efficient algorithm for the exhaustive search of N best sentence hypotheses in a word graph. The search procedure is based on a two-pass algorithm. In the first pass, a word graph is constructed with standard time-synchronous beam search. The actual extraction of N best word sequences from the word graph takes place during the second pass.
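As a rough illustration of the second pass only, the sketch below extracts the N lowest-cost word sequences from an already-built word graph. It is not the paper's algorithm: the graph encoding (a dict of node -> list of `(next_node, word, cost)`), the function name `n_best_paths`, and the best-first strategy with exact remaining-cost estimates are all assumptions made for this example, and the graph is assumed to be acyclic and small enough for a recursive backward pass.

```python
import heapq

def n_best_paths(edges, start, end, n):
    """Return up to n (cost, words) pairs, cheapest first.

    edges: dict mapping node -> list of (next_node, word, cost).
    Lower cost is better; the graph is assumed to be acyclic.
    """
    best_to_end = {}

    def remaining(node):
        # Backward pass (memoized): cheapest cost from node to end.
        if node == end:
            return 0.0
        if node not in best_to_end:
            costs = [cost + remaining(nxt) for nxt, _, cost in edges.get(node, [])]
            best_to_end[node] = min(costs, default=float("inf"))
        return best_to_end[node]

    # Forward pass: best-first expansion ordered by
    # (cost so far + exact cheapest remaining cost).
    counter = 0  # tie-breaker so heap entries never compare nodes or words
    heap = [(remaining(start), 0.0, counter, start, ())]
    results = []
    while heap and len(results) < n:
        _, so_far, _, node, words = heapq.heappop(heap)
        if node == end:
            results.append((so_far, list(words)))
            continue
        for nxt, word, cost in edges.get(node, []):
            total = so_far + cost + remaining(nxt)
            if total == float("inf"):
                continue  # this edge cannot reach the end node
            counter += 1
            heapq.heappush(heap, (total, so_far + cost, counter, nxt, words + (word,)))
    return results

# Toy word graph with made-up costs (e.g. negative log scores).
toy_graph = {
    0: [(1, "the", 1.0), (1, "a", 1.25)],
    1: [(2, "cat", 2.0), (2, "cap", 2.5)],
}
for cost, words in n_best_paths(toy_graph, 0, 2, 3):
    print(cost, " ".join(words))
```

Because the remaining-cost estimate is exact, complete paths leave the priority queue in order of total cost, so the search can stop as soon as N of them have been collected.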
The development of SRI’s 1997 Broadcast News transcription system
- In Proceedings DARPA Broadcast News Transcription and Understanding Workshop
Cited by 9 (5 self)
This paper describes SRI’s 1997 broadcast news transcription system used for the 1997 DARPA H4 evaluations. Our system had several novel components. These include automatic segmentation of entire broadcast shows, word-internal and cross-word acoustic models robustly estimated with a new Gaussian Merging-Splitting (GMS) algorithm, the use of trigram language models (LMs) in lattices instead of for rescoring N-best lists, and an LM pruning algorithm that allows efficient representation of high-order (like 4- or 5-gram) LMs. We briefly describe these features and give comparative experimental results. We achieved an 18.7% relative improvement in performance on our 1996 H4 partitioned evaluation (PE) development test set as compared to our 1996 H4 PE evaluation system.
Lattice-Based Search Strategies For Large Vocabulary Speech Recognition
, 1995
Cited by 9 (1 self)
The design of search algorithms is an important issue in recognition, particularly for very large vocabulary, continuous speech. It is an especially crucial problem when computationally expensive knowledge sources are used in the system, as is necessary to achieve high accuracy. Recently, multi-pass search strategies have been used as a means of applying inexpensive knowledge sources early on to prune the search space for subsequent passes using more expensive knowledge sources. Three multi-pass search algorithms are investigated in this thesis work: the N-best search algorithm, a lattice dynamic programming search algorithm and a lattice local search algorithm. Both the lattice dynamic programming and lattice local search algorithms are shown to achieve comparable performance to the N-best search algorithm while running as much as 10 times faster on a 20,000 word vocabulary task. The lattice local search algorithm is also shown to have the additional advantage over the lattice dynamic programming search algorithm of allowing sentence-level knowledge sources to be incorporated into the search.
The JANUS Speech Recognizer
- In ARPA SLT Workshop
, 1995
Cited by 7 (0 self)
JANUS [17] was designed for the translation of spontaneous human-to-human speech. Before the 1994 CSR evaluation, JANUS was run with vocabularies of up to 2500 words. JANUS was also tested on the Conference Registration and the Resource Management tasks. The best error rate on the '89 Resource Management evaluation set was 5.9%. At the June 1994 Verbmobil speech component evaluation [1], JANUS scored best among eight participants on the German appointment scheduling task, a task of spontaneous human-to-human dialogs. In this paper we give a detailed description of the recognition engine of JANUS, focusing on the acoustic modeling and our first run with the WSJ task. 1. ACOUSTIC MODELING IN JANUS 1.1 PREPROCESSING For the 1994 CSR evaluation we computed 16 mel scale spectral coefficients from an FFT with a window size of 256 sample points and a window shift (frame rate) of 10 ms. 16 mel spectral coefficients, 16 delta coefficients, and 16 delta-delta coefficients were used to build a 4...
PARSEC: A Constraint-based Framework for Spoken Language Understanding
- In Proceedings of the International Conference on Spoken Language Processing
, 1992
Cited by 6 (6 self)
We have extended Maruyama's [5, 6, 7] constraint dependency grammar (CDG) to process a lattice or graph of sentence hypotheses instead of separate text strings. A post-processor to a speech recognizer producing N-best hypotheses generates the word graph representation, which is then augmented with information required for parsing. We will summarize the CDG parsing algorithm and then describe how the algorithm is extended to process a word graph on a single-processor machine. 1 Introduction The most successful of the current speech recognition systems which process continuous speech for a limited (1000 word) vocabulary are those which utilize hidden Markov models (HMM). Most systems utilizing this approach (e.g., [4, 10]) have reduced recognition errors by incorporating some language information (syntactic and semantic) directly into the HMM to reduce perplexity, but since the goal of these systems is recognition, not understanding, no structural analysis of the utterance is construc...
An Experimental Study Of Acoustic Adaptation Algorithms
- IEEE Int'l Conference on ASSP
, 1996
Cited by 6 (1 self)
Recently there has been much interest in the area of adaptation for improved speech recognition in the presence of mismatches between the training and testing conditions. In this paper we focus on transformation-based maximum-likelihood (ML) adaptation. Some of the important adaptation parameters include whether the adaptation is performed in the feature-space or model-space, and whether the adaptation is supervised or unsupervised. An additional parameter is the adaptation data. For example, adaptation may be performed using an independent dataset or the test data itself. The latter is referred to as transcription-mode adaptation. In this paper, we experimentally study the effect of these various parameters, and report on our findings. 1. INTRODUCTION Recently, there has been much interest in the area of transformation-based ML adaptation to reduce the recognition degradation caused by acoustic mismatches between the training and testing conditions [1, 2, 3]. It is assumed that...
Improved Modeling and Efficiency for Automatic Transcription of Broadcast News
, 2000
Cited by 5 (0 self)
Over the last few years, the DARPA-sponsored Hub4 continuous speech recognition evaluations have pushed speech recognition technology for the very interesting and difficult task of automatically transcribing broadcast news. In this paper, we report on our research and progress on this problem. We focus on individual techniques we developed, rather than on descriptions of our evaluation systems. We provide comparative experimental results showing the improvements obtained with the novel approaches we developed. 1 Introduction In recent years there has been increasing interest in developing large-vocabulary continuous speech recognition (LVCSR) systems for speech found in real sources. Broadcast news, in particular, has been the testbed for the DARPA-sponsored Hub4 continuous speech recognition (CSR) evaluations over the last few years, and represents a significant challenge to speech recognition researchers. Many interesting problems are associated with the automatic recognition of b...
Efficient Multilingual Phoneme-to-Grapheme Conversion Based on HMM
- Computational Linguistics
, 1996
Cited by 5 (1 self)
Grapheme-to-phoneme conversion (GTPC) has been achieved in most European languages by dictionary look-up or using rules. The application of these methods, however, in the reverse process (i.e., in phoneme-to-grapheme conversion [PTGC]) creates serious problems, especially in inflectionally rich languages. In this paper the PTGC problem is approached from a completely different point of view. Instead of rules or a dictionary, the statistics of language connecting pronunciation to spelling are exploited. The novelty lies in modeling the natural language intraword features using the theory of hidden Markov models (HMM) and performing the conversion using the Viterbi algorithm. The PTGC system has been established and tested on various multilingual corpora. Initially, the first-order HMM and the common Viterbi algorithm were used to obtain a single transcription for each word. Afterwards, the second-order HMM and the N-best algorithm adapted to PTGC were implemented to provide one or more transcriptions for each word input (homophones). This system gave an average score of more than 99% correctly transcribed words (overall success in the first four candidates) for most of the seven languages it was tested on (Dutch, English, French, German, Greek, Italian, and Spanish). The system can be adapted to almost any language with little effort and can be implemented in hardware to serve in real-time speech recognition systems.
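To make the first-order case concrete, here is a minimal sketch of Viterbi decoding in which the hidden states are graphemes and the observations are phonemes. It covers only the single-best, first-order setting mentioned in the abstract, not the second-order HMM or the N-best extension. The function name `viterbi`, the dictionary-based model layout, and every probability in the toy model are invented for illustration and do not come from the paper; the observation sequence is assumed to be non-empty.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (log_prob, best_state_sequence) for a first-order HMM.

    start_p[s], trans_p[r][s] and emit_p[s][o] are probabilities;
    missing entries are treated as probability zero.
    """
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # scores[s]: best log-probability of any state path ending in s.
    scores = {
        s: log(start_p.get(s, 0)) + log(emit_p[s].get(observations[0], 0))
        for s in states
    }
    backpointers = []
    for obs in observations[1:]:
        new_scores, pointers = {}, {}
        for s in states:
            # Best predecessor of s under the current scores.
            prev = max(states, key=lambda r: scores[r] + log(trans_p[r].get(s, 0)))
            new_scores[s] = (scores[prev] + log(trans_p[prev].get(s, 0))
                             + log(emit_p[s].get(obs, 0)))
            pointers[s] = prev
        scores = new_scores
        backpointers.append(pointers)

    # Trace the best path backwards from the best final state.
    last = max(states, key=lambda s: scores[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return scores[last], path

# Toy model: phoneme "F" may be spelled "f" or "ph"; phoneme "O" is spelled "o".
graphemes = ["f", "ph", "o"]
start_p = {"f": 0.6, "ph": 0.3, "o": 0.1}
trans_p = {
    "f": {"f": 0.3, "ph": 0.2, "o": 0.5},
    "ph": {"f": 0.05, "ph": 0.05, "o": 0.9},
    "o": {"f": 0.4, "ph": 0.2, "o": 0.4},
}
emit_p = {"f": {"F": 1.0}, "ph": {"F": 1.0}, "o": {"O": 1.0}}

log_prob, spelling = viterbi(["F", "O"], graphemes, start_p, trans_p, emit_p)
print(spelling, math.exp(log_prob))
```

Working in log-probabilities avoids numerical underflow on longer words; producing several candidate spellings per word, as the abstract describes for homophones, would require keeping more than one hypothesis per state.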
A Robust Loose Coupling for Speech Recognition and Natural Language Understanding
- IEEE, Bob O'Hara and Al
, 1995
Cited by 4 (0 self)
The focus of this thesis proposal is to improve the ability of a computational system to understand spoken utterances in a dialogue with a human. Available computational methods for word recognition do not perform as well on spontaneous speech as we would hope. Even a state-of-the-art recognizer achieves slightly worse than 70% word accuracy on (nearly) spontaneous speech in a conversation about a specific problem. To address this problem, I will explore novel methods for post-processing the output of a speech recognizer in order to correct errors. I adopt statistical techniques for modeling the noisy channel from the speaker to the listener in order to correct some of the errors introduced there. The statistical model accounts for frequent errors such as simple word/word confusions and short phrasal problems (one-to-many word substitutions and many-to-one word concatenations). To use the model, a search algorithm is required to find the most likely correction of a given word sequence ...

