The Use of Context in Large Vocabulary Speech Recognition
, 1995
Cited by 93 (0 self)
decide which contexts are similar and can share parameters. A key feature of this approach is that it allows the construction of models which are dependent upon contextual effects occurring across word boundaries. The use of cross word context dependent models presents problems for conventional decoders. The second part of the thesis therefore presents a new decoder design which is capable of using these models efficiently. The decoder is suitable for use with very large vocabularies and long span language models. It is also capable of generating a lattice of word hypotheses with little computational overhead. These lattices can be used to constrain further decoding, allowing efficient use of complex acoustic and language models. The effectiveness of these techniques has been assessed on a variety of large vocabulary continuous speech recognition tasks and results are presented which analyse performance in terms of computational complexity and recognition accuracy. The experiments dem
Large Vocabulary Continuous Speech Recognition: a Review
- of INCIS Project, Schedule 6 in (Small
, 1996
Cited by 62 (1 self)
This article will discuss the principles and architecture of current LVR systems and identify the key issues affecting their future deployment. To illustrate the various points raised, the Cambridge University HTK system will be described. This is a modern design giving state-of-the-art performance and it is typical of the current generation of recognition systems.
Dynamic Programming Search for Continuous Speech Recognition
, 1999
Cited by 30 (0 self)
Initially introduced in the late 1960s and early 1970s, dynamic programming algorithms have become increasingly popular in automatic speech recognition. There are two reasons why this has occurred: First, the dynamic programming strategy can be combined with a very efficient and practical pruning strategy so that very large search spaces can be handled. Second, the dynamic programming strategy has turned out to be extremely flexible in adapting to new requirements. Examples of such requirements are the lexical tree organization of the pronunciation lexicon and the generation of a word graph instead of the single best sentence. In this paper, we attempt to systematically review the use of dynamic programming search strategies for small-vocabulary and large-vocabulary continuous speech recognition. The following methods are described in detail: search using a linear lexicon, search using a lexical tree, language-model look-ahead and word graph generation. 1 Introduction Search strategie...
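The combination of dynamic programming with pruning described in this abstract can be illustrated with a minimal sketch of time-synchronous Viterbi search with a beam. This is not the paper's implementation: the function names `viterbi_beam` and `prune`, the dense state space, and the log-domain score layout are all assumptions chosen for illustration.

```python
import math


def prune(hyps, beam):
    """Keep only hypotheses within `beam` (log domain) of the best one."""
    if not hyps:
        raise ValueError("no active hypotheses left")
    best = max(score for score, _ in hyps.values())
    return {s: h for s, h in hyps.items() if h[0] >= best - beam}


def viterbi_beam(obs_loglik, log_trans, log_init, beam):
    """Time-synchronous Viterbi search with beam pruning (illustrative).

    obs_loglik: one list per frame of per-state observation log-likelihoods.
    log_trans:  log_trans[i][j] is the log-probability of moving from i to j.
    log_init:   per-state initial log-probabilities.
    beam:       states scoring more than `beam` below the frame best are dropped.
    Returns (best_log_score, best_state_sequence).
    """
    n_states = len(log_init)
    # Each active state maps to (accumulated score, state path so far).
    active = {
        s: (log_init[s] + obs_loglik[0][s], (s,)) for s in range(n_states)
    }
    active = prune(active, beam)
    for frame in obs_loglik[1:]:
        expanded = {}
        for i, (score, path) in active.items():
            for j in range(n_states):
                cand = score + log_trans[i][j] + frame[j]
                # Dynamic programming step: keep the best path into state j.
                if j not in expanded or cand > expanded[j][0]:
                    expanded[j] = (cand, path + (j,))
        active = prune(expanded, beam)
    best_state = max(active, key=lambda s: active[s][0])
    return active[best_state][0], list(active[best_state][1])
```

With `beam=math.inf` the function performs an exact Viterbi search. A narrow beam expands fewer states per frame but may discard the path that would have turned out best, which is the usual trade-off between search effort and search errors.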
Is N-Best Dead?
- In Proceedings of the Human Language Technology Workshop
, 1994
Cited by 15 (2 self)
We developed a faster search algorithm that avoids the use of the N-Best paradigm until after more powerful knowledge sources have been used. We found, however, that there was little or no decrease in word errors. We then showed that the use of the N-Best paradigm is still essential for the use of still more powerful knowledge sources, and for several other purposes that are outlined in the paper.
High Quality Word Graphs Using Forward-Backward Pruning
- In Proc. of the IEEE Int. Conf. on Acoustics, Speech and Signal Processing
, 1999
Cited by 14 (3 self)
This paper presents an efficient method for constructing high quality word graphs for large vocabulary continuous speech recognition. The word graphs are constructed in a two-pass strategy. In the first pass, a huge word graph is produced using the time-synchronous lexical tree search method. Then, in the second pass, this huge word graph is pruned by applying a modified forward-backward algorithm. To analyze the characteristic properties of this word graph pruning method, we present a detailed comparison with the conventional time-synchronous forward pruning. The recognition experiments, carried out on the North American Business (NAB) 20 000-word task, demonstrate that, in comparison to the forward pruning, the new method leads to a significant reduction in the size of the word graph without an increase in the graph word error rate. 1. INTRODUCTION In this paper, we present a different approach to the word graph forward pruning technique [7]. This approach is based on the paradigm of...
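The idea of pruning a word graph with both forward and backward information can be sketched as follows. The abstract does not give the details of the paper's modified forward-backward algorithm, so this sketch makes its own assumptions: it uses best-path (max) scores rather than summed probabilities, nodes are numbered in topological order, and the name `prune_word_graph` and the edge tuple layout are hypothetical.

```python
import math
from collections import defaultdict


def prune_word_graph(edges, start, final, beam):
    """Drop edges that lie on no sufficiently good complete path (illustrative).

    edges: list of (src, dst, word, log_score) with src < dst, i.e. the node
           numbers are assumed to follow a topological order.
    beam:  an edge is kept if the best complete path through it scores
           within `beam` of the best path in the whole graph.
    """
    fwd = defaultdict(lambda: -math.inf)  # best score from start to node
    bwd = defaultdict(lambda: -math.inf)  # best score from node to final
    fwd[start] = 0.0
    bwd[final] = 0.0
    # Forward pass: visit edges by increasing source node.
    for src, dst, _, score in sorted(edges, key=lambda e: e[0]):
        fwd[dst] = max(fwd[dst], fwd[src] + score)
    # Backward pass: visit edges by decreasing destination node.
    for src, dst, _, score in sorted(edges, key=lambda e: e[1], reverse=True):
        bwd[src] = max(bwd[src], bwd[dst] + score)
    best = fwd[final]
    return [
        e for e in edges
        if fwd[e[0]] + e[3] + bwd[e[1]] >= best - beam
    ]
```

Unlike pruning during the forward search alone, the decision for each edge here takes into account how well the rest of the utterance can be completed after it, so edges that look good locally but lead nowhere can be removed.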
The BBN/HARC spoken language understanding system
, 1993
Cited by 12 (3 self)
We describe the design and performance of a complete spoken language understanding system currently under development at BBN. The system, dubbed HARC (Hear And Respond to Continuous speech), successfully integrates state-of-the-art speech recognition and natural language understanding subsystems. The system has been tested extensively on a restricted airline travel information (ATIS) domain with a vocabulary of about 2000 words. HARC is implemented in portable, high-level software that runs in real time on today's workstations to support interactive online human-machine dialogs. No special purpose hardware is required other than an A/D converter to digitize the speech. The system works well for any native speaker of American English and does not require any enrollment data from the users. We present results of formal DARPA tests in Feb. '92 and Nov. '92.
Lattice-Based Search Strategies For Large Vocabulary Speech Recognition
, 1995
Cited by 9 (1 self)
The design of search algorithms is an important issue in recognition, particularly for very large vocabulary, continuous speech. It is an especially crucial problem when computationally expensive knowledge sources are used in the system, as is necessary to achieve high accuracy. Recently, multi-pass search strategies have been used as a means of applying inexpensive knowledge sources early on to prune the search space for subsequent passes using more expensive knowledge sources. Three multi-pass search algorithms are investigated in this thesis work: the N-best search algorithm, a lattice dynamic programming search algorithm and a lattice local search algorithm. Both the lattice dynamic programming and lattice local search algorithms are shown to achieve comparable performance to the N-best search algorithm while running as much as 10 times faster on a 20,000 word vocabulary task. The lattice local search algorithm is also shown to have the additional advantage over the lattice dynamic programming search algorithm of allowing sentence-level knowledge sources to be incorporated into the search.
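The multi-pass idea in this abstract, where a cheap first pass narrows the search space and an expensive knowledge source is applied only to the survivors, can be shown in its simplest form as N-best rescoring. The sketch below is an illustration under stated assumptions, not the thesis's algorithm: `rescore_nbest`, the `expensive_score` callback and the `weight` parameter are hypothetical names.

```python
def rescore_nbest(nbest, expensive_score, weight=1.0):
    """Re-rank first-pass hypotheses with an additional knowledge source.

    nbest:           list of (words, first_pass_log_score) pairs.
    expensive_score: callable mapping a word tuple to a log-domain score;
                     stands in for a costly acoustic or language model.
    weight:          scale applied to the second-pass score.
    Returns the hypotheses sorted from best to worst combined score.
    """
    rescored = [
        (words, first_pass + weight * expensive_score(words))
        for words, first_pass in nbest
    ]
    return sorted(rescored, key=lambda hyp: hyp[1], reverse=True)
```

The expensive scorer is called once per hypothesis, so its cost grows with N. A lattice represents many more hypotheses compactly, which is one motivation for the lattice-based searches the abstract compares against N-best.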
Efficient 2-Pass N-Best Decoder
- DARPA Speech Recognition Workshop
, 1997
Cited by 8 (5 self)
In this paper, we describe the new BBN BYBLOS efficient 2-Pass N-Best decoder used for the 1996 Hub-4 Benchmark Tests. The decoder uses a quick fastmatch to determine the likely word endings. Then in the second pass, it performs a time-synchronous beam search using a detailed continuous-density HMM and a trigram language model to decide the word starting positions. From these word starts, the decoder, without looking at the input speech, constructs a trigram word lattice, and generates the top N likely hypotheses. This new 2-pass N-Best decoder maintains comparable recognition performance as the old 4-pass N-Best decoder, while its search strategy is simpler and much more efficient. 1. INTRODUCTION As previously described in [2], the old BBN BYBLOS decoder used a multi-pass search strategy consisting of 4 passes to generate the top N most likely hypotheses, which were then rescored using more detailed, but expensive knowledge sources. These N best hypotheses were then reordered and th...
The JANUS Speech Recognizer
- In ARPA SLT Workshop
, 1995
Cited by 7 (0 self)
JANUS [17] was designed for the translation of spontaneous human-to-human speech. Before the 1994 CSR evaluation, JANUS was run with vocabularies of up to 2500 words. JANUS was also tested on the Conference Registration and the Resource Management tasks. The best error rate on the '89 Resource Management evaluation set was 5.9%. At the June 1994 Verbmobil speech component evaluation [1], JANUS scored best among eight participants on the German appointment scheduling task, a task of spontaneous human to human dialogs. In this paper we give a detailed description of the recognition engine of JANUS, focusing on the acoustic modeling and our first run with the WSJ task. 1. ACOUSTIC MODELING IN JANUS 1.1 PREPROCESSING For the 1994 CSR evaluation we computed 16 mel scale spectral coefficients from an FFT with a window size of 256 sample points and a window shift (frame rate) of 10 ms. 16 mel spectral coefficients, 16 delta coefficients, and 16 delta-delta coefficients were used to build a 4...
Language Models For A Spelled Letter Recognizer
- In Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing
, 1995
Cited by 6 (3 self)
In some speech recognition applications, it is reasonable to constrain the search space of a speech recognizer to a large but finite set of sentences. We demonstrate the problem on a spelling task, where the recognition of continuously spelled last names is constrained to 110,000 entries (= 43,000 unique names) of a telephone book. Several techniques to address this problem are compared: recognition without any language model, bigrams, functions to map a hypothesis onto a legal string, n-best lists, and finally a newly developed method which integrates all constraints directly into the search process within reasonable memory and time bounds. The baseline result of 56% string accuracy is improved to 62, 85, 88, and 92%, respectively. To appear in: Proc. IEEE International Conf. on Acoustics, Speech, and Signal Processing, Detroit, USA, May 1995. 1. INTRODUCTION Spelled letter recognition is an essential subtask of many speech recognition systems. Applications include spelling of arbit...
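One of the techniques listed in this abstract is a function that maps a recognizer hypothesis onto a legal string. The abstract does not say which mapping the authors used, so the sketch below assumes one plausible choice: pick the list entry with the smallest Levenshtein edit distance to the hypothesized letter string. The function names and the example names in the usage note are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, item_a in enumerate(a, 1):
        cur = [i]
        for j, item_b in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,                       # delete item_a
                cur[j - 1] + 1,                    # insert item_b
                prev[j - 1] + (item_a != item_b),  # substitute or match
            ))
        prev = cur
    return prev[-1]


def map_to_legal(hypothesis, legal_strings):
    """Return the legal string closest to the hypothesis (illustrative).

    Ties are resolved in favor of the entry that appears first in
    `legal_strings`.
    """
    return min(legal_strings, key=lambda s: edit_distance(hypothesis, s))
```

For instance, `map_to_legal("MILER", ["MILLER", "MUELLER", "MEYER"])` selects `"MILLER"`. This post-hoc mapping scans the whole list for every hypothesis and only corrects the output after the search has finished, whereas the newly developed method in the abstract integrates the constraints directly into the search process.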

