Dynamic Programming Search for Continuous Speech Recognition
, 1999
Cited by 30 (0 self)
Initially introduced in the late 1960s and early 1970s, dynamic programming algorithms have become increasingly popular in automatic speech recognition. There are two reasons why this has occurred: First, the dynamic programming strategy can be combined with a very efficient and practical pruning strategy so that very large search spaces can be handled. Second, the dynamic programming strategy has turned out to be extremely flexible in adapting to new requirements. Examples of such requirements are the lexical tree organization of the pronunciation lexicon and the generation of a word graph instead of the single best sentence. In this paper, we attempt to systematically review the use of dynamic programming search strategies for small-vocabulary and large-vocabulary continuous speech recognition. The following methods are described in detail: search using a linear lexicon, search using a lexical tree, language-model look-ahead and word graph generation.
1 Introduction Search strategie...
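The combination of dynamic programming with pruning mentioned in this abstract can be illustrated with a minimal sketch of time-synchronous Viterbi search with beam pruning. This is not the paper's implementation: the function names `beam_viterbi` and `prune`, the dense-matrix model layout, and the toy two-state numbers are all assumptions chosen for illustration.

```python
import math


def prune(hyps, beam):
    """Keep only hypotheses whose score is within `beam` of the best one."""
    if not hyps:
        return hyps
    best = max(score for score, _ in hyps.values())
    return {s: h for s, h in hyps.items() if h[0] >= best - beam}


def beam_viterbi(obs_logprob, trans_logprob, init_logprob, beam):
    """Time-synchronous Viterbi search with beam pruning (log domain).

    obs_logprob[t][s]   : log-probability of frame t under state s
    trans_logprob[r][s] : log-probability of the transition r -> s
    init_logprob[s]     : log-probability of starting in state s
    beam                : maximum allowed score gap to the best hypothesis
    Returns (best_log_score, best_state_path).
    """
    n_states = len(init_logprob)
    # Active hypotheses for the current frame: state -> (score, path).
    active = {
        s: (init_logprob[s] + obs_logprob[0][s], [s]) for s in range(n_states)
    }
    active = prune(active, beam)

    for t in range(1, len(obs_logprob)):
        expanded = {}
        for r, (score_r, path_r) in active.items():
            for s in range(n_states):
                score = score_r + trans_logprob[r][s] + obs_logprob[t][s]
                # Dynamic programming recombination: keep the best path per state.
                if s not in expanded or score > expanded[s][0]:
                    expanded[s] = (score, path_r + [s])
        active = prune(expanded, beam)

    best_state = max(active, key=lambda s: active[s][0])
    return active[best_state]


# Toy two-state model with hypothetical numbers, for illustration only.
log = math.log
init = [log(0.6), log(0.4)]
trans = [[log(0.7), log(0.3)], [log(0.4), log(0.6)]]
obs = [[log(0.5), log(0.1)], [log(0.4), log(0.3)], [log(0.1), log(0.6)]]

score, path = beam_viterbi(obs, trans, init, beam=math.inf)
print(path, math.exp(score))
```

Setting `beam=math.inf` disables pruning and gives the exact Viterbi result. A finite beam trades a risk of search errors for fewer active hypotheses per frame.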
A One-Pass Decoder Based on Polymorphic Linguistic Context Assignment
, 2001
Cited by 29 (10 self)
In this study, we examine how fast decoding of conversational speech with large vocabularies profits from efficient use of linguistic information, i.e. language models and grammars. Based on a re-entrant single pronunciation prefix tree, we use the concept of linguistic context polymorphism to allow an early incorporation of language model information. This approach allows us to use all available language model information in a one-pass decoder, using the same engine to decode with statistical n-gram language models as well as context-free grammars or re-scoring of lattices in an efficient way.
Integrated Context-Dependent Networks in Very Large Vocabulary Speech Recognition
- in Proceedings of the 6th European Conference on Speech Communication and Technology (Eurospeech '99)
, 1999
Cited by 24 (10 self)
All the components used in the search stage of speech recognition systems -- language model, pronunciation dictionary, context-dependent network, HMM model -- can be represented by finite-state labeled networks. To construct real-time recognition systems, it is important to optimize these networks and to efficiently combine them. We present new methods that substantially improve these steps. We show that an efficient recognition network including context-dependent and HMM models can be built using weighted determinization of transducers [6]. We report experiments with a 463,331-word vocabulary North American Business News Task that show a substantial improvement of the recognition speed over our previous method [9]. Furthermore, the size of the integrated context-dependent networks constructed can be dramatically reduced using a factoring algorithm that we briefly describe. With our construction, the integrated NAB network contains only about 1.3 times as many arcs as the language mode...
Language-Model Look-Ahead For Large Vocabulary Speech Recognition
- Proc. Int. Conf. on Spoken Language Processing
, 1996
Cited by 22 (9 self)
In this paper, we present an efficient look-ahead technique which incorporates the language model knowledge at the earliest possible stage during the search process. This so-called language model look-ahead is built into the time synchronous beam search algorithm using a tree-organized pronunciation lexicon for a bigram language model. The language model look-ahead technique exploits the full knowledge of the bigram language model by distributing the language model probabilities over the nodes of the lexical tree for each predecessor word. We present a method for handling the resulting memory requirements. The recognition experiments performed on the 20 000-word North American Business task (Nov.'96) demonstrate that in comparison with the unigram look-ahead a reduction by a factor of 5 in the acoustic search effort can be achieved without loss in recognition accuracy.
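The idea of distributing bigram probabilities over the nodes of the lexical tree can be sketched as follows. This is a minimal illustration, assuming that the look-ahead value of a node is the best bigram probability of any word reachable from it for a given predecessor word. The class `TreeNode`, the functions `build_lexical_tree` and `lookahead_scores`, the toy lexicon, and the probabilities are hypothetical and not taken from the paper.

```python
class TreeNode:
    """Node of a tree-organized pronunciation lexicon (prefix tree over phonemes)."""

    def __init__(self):
        self.children = {}  # phoneme -> TreeNode
        self.words = []     # words whose pronunciation ends at this node


def build_lexical_tree(lexicon):
    """Build a prefix tree from a mapping word -> list of phonemes."""
    root = TreeNode()
    for word, phonemes in lexicon.items():
        node = root
        for ph in phonemes:
            node = node.children.setdefault(ph, TreeNode())
        node.words.append(word)
    return root


def lookahead_scores(root, bigram, predecessor):
    """Map each node to max p(w | predecessor) over words w reachable from it.

    bigram maps (predecessor, word) pairs to probabilities; unseen pairs
    count as 0.0 in this sketch (a real system would back off instead).
    """
    scores = {}

    def visit(node):
        best = max(
            (bigram.get((predecessor, w), 0.0) for w in node.words),
            default=0.0,
        )
        for child in node.children.values():
            best = max(best, visit(child))
        scores[node] = best
        return best

    visit(root)
    return scores


# Hypothetical toy lexicon and bigram probabilities, for illustration only.
lexicon = {
    "speech": ["s", "p", "iy", "ch"],
    "speak": ["s", "p", "iy", "k"],
    "tree": ["t", "r", "iy"],
}
bigram = {("the", "speech"): 0.02, ("the", "speak"): 0.001, ("the", "tree"): 0.05}

root = build_lexical_tree(lexicon)
scores = lookahead_scores(root, bigram, predecessor="the")
print(scores[root.children["s"]], scores[root.children["t"]])
```

During beam search, such a per-node value can be combined with the acoustic score of a hypothesis before pruning, so that unlikely words are discarded before their word end is reached. Computing one table per predecessor word is what drives the memory requirements the abstract refers to.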
Network Optimizations for Large Vocabulary Speech Recognition
- Speech Communication
, 1998
Cited by 16 (7 self)
The redundancy and the size of networks in large-vocabulary speech recognition systems can have a critical effect on their overall performance. We describe the use of two new algorithms: weighted determinization and minimization [12]. These algorithms transform recognition labeled networks into equivalent ones that require much less time and space in large-vocabulary speech recognition. They are both optimal: weighted determinization reduces the number of alternatives at each state to the minimum, and weighted minimization reduces the size of deterministic networks to the smallest possible number of states and transitions. These algorithms generalize classical automata determinization and minimization to deal properly with the probabilities of alternative hypotheses and with the relationships between units (distributions, phones, words) at different levels in the recognition system. We illustrate their use in several applications, and report the results of our experiments.
Key words...
Improvements In Tree-Based Language Model Representation
- in Proc. of EUROSPEECH
, 1995
Cited by 14 (10 self)
This paper describes an efficient way of representing a bigram language model with a finite state network used by a beam-search based and continuous speech HMM recognizer. In a previous paper [1], a compact tree-based organization of the search space was presented, that could be further reduced through an optimization algorithm. There, it was pointed out that for a 10,000-word newspaper dictation task the minimization step could have taken a lot of time and space on a standard workstation. In this paper, a new compilation technique that takes into account the particular tree-based topology is described. Results show that without additional time and space costs, the new technique produces networks equivalent to the tree-based ones but almost as small as the optimized one.
1 INTRODUCTION The most widely used Language Models (LMs) in speech recognition are n-gram models, due to both easy inference from the training corpus and easy integrability with the decoding algorithms commonly used...
N-Best Breadth Search For Large Vocabulary Continuous Speech Recognition Using A Long Span Language Model
, 1998
Cited by 3 (3 self)
In large vocabulary continuous speech recognition, high level linguistic knowledge can enhance performance. However, integration of high level linguistic knowledge and complex acoustic models under an efficient search scheme is still an open question. In this paper, we propose the n-best breadth search algorithm under the framework of a state space search. The n-best breadth search is a combination of the best first search and the breadth first search, and it efficiently accommodates the long span language models and complex acoustic models. Our pilot experiment shows that the proposed algorithm decreases execution time with little effect on performance.
136th Meeting of Acoustical Society of America
Contents: 1 INTRODUCTION, 2 REVIEW OF DECODING ALGORITHMS, 3 N-BEST BREADTH SEARCH, 4 IMPLEMENTATION ISSUES, 5 EXPERIMENTAL RESULTS, 6 CONCLUSIONS, 7 ACKNOWLEDGMENT
1 INTRODUCTION In the statistical approach, speech recognition ...
Look-Ahead Techniques For Improved Beam Search
- In Proc. of the CRIM-FORWISS Workshop
, 1996
Cited by 3 (2 self)
This paper presents two look-ahead techniques for large vocabulary continuous speech recognition. These two techniques, which are referred to as language model look-ahead and phoneme look-ahead, are incorporated into the pruning process of the time-synchronous one-pass beam search algorithm. The search algorithm is based on a tree-organized pronunciation lexicon in connection with a bigram language model. Both look-ahead techniques have been tested on the 20 000-word NAB'94 task (ARPA North American Business Corpus). The recognition experiments show that the combination of bigram language model look-ahead and phoneme look-ahead reduces the size of the search space by a factor of about 27 without affecting the word recognition accuracy.
1 Introduction In this paper, we describe two look-ahead techniques for improved beam search, namely language model look-ahead and phoneme look-ahead, for large vocabulary continuous speech recognition. The basic idea of the language model look-ahead is t...
Towards A Compact Speech Recognizer: Subspace Distribution Clustering Hidden Markov Model
, 1998
Cited by 2 (1 self)
Contents:
1 Introduction
1.1 The Problem: Too Many Parameters
1.2 Proposed Solution: It Is Time to Share More!
1.3 Thesis Summary and Outline
2 Review of Acoustic Modeling Using Hidden Markov Model
2.1 Speech Characteristics
2.2 Selection of Input Speech Space and Speech Model
2.2.1 Cepstral Input
2.2.2 Hidden Markov Model
2.2.3 Our Choice of HMM for Acoustic Modeling
2.3 Speech Unit to Model
...
Acoustic And Syntactical Modeling in the ATROS System
, 1999
Cited by 1 (1 self)
Current speech technology allows us to build efficient speech recognition systems. However, model learning of knowledge sources in a speech recognition system is not a closed problem. In addition, lower computational requirements are crucial to building real-time systems.

