Dynamic Programming Search for Continuous Speech Recognition
1999
Cited by 30 (0 self)
Initially introduced in the late 1960s and early 1970s, dynamic programming algorithms have become increasingly popular in automatic speech recognition. There are two reasons why this has occurred: First, the dynamic programming strategy can be combined with a very efficient and practical pruning strategy so that very large search spaces can be handled. Second, the dynamic programming strategy has turned out to be extremely flexible in adapting to new requirements. Examples of such requirements are the lexical tree organization of the pronunciation lexicon and the generation of a word graph instead of the single best sentence. In this paper, we attempt to systematically review the use of dynamic programming search strategies for small-vocabulary and large-vocabulary continuous speech recognition. The following methods are described in detail: search using a linear lexicon, search using a lexical tree, language-model look-ahead and word graph generation.
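The abstract's central idea, dynamic programming combined with pruning, can be illustrated with a minimal sketch of a time-synchronous Viterbi search with beam pruning. This is not the paper's implementation: the function name `viterbi_beam`, the `beam` parameter, and the two-state toy model are assumptions made purely for illustration.

```python
import math

def viterbi_beam(observations, log_init, log_trans, log_emit, beam=10.0):
    """Time-synchronous Viterbi search with beam pruning (log domain).

    log_init[s], log_trans[s][t] and log_emit[s][o] hold log-probabilities;
    a missing entry is treated as an impossible event.
    Returns (best_log_score, best_state_path).
    """
    # active maps each state to (score, best path ending in that state).
    active = {
        s: (lp + log_emit[s][observations[0]], [s])
        for s, lp in log_init.items()
        if observations[0] in log_emit[s]
    }
    for obs in observations[1:]:
        if not active:
            raise ValueError("all hypotheses were pruned")
        best = max(score for score, _ in active.values())
        new_active = {}
        for s, (score, path) in active.items():
            # Beam pruning: skip hypotheses too far below the current best.
            if score < best - beam:
                continue
            for t, lt in log_trans.get(s, {}).items():
                if obs not in log_emit[t]:
                    continue
                cand = score + lt + log_emit[t][obs]
                # DP recombination: keep only the best path into each state.
                if t not in new_active or cand > new_active[t][0]:
                    new_active[t] = (cand, path + [t])
        active = new_active
    if not active:
        raise ValueError("all hypotheses were pruned")
    return max(active.values(), key=lambda h: h[0])

def logs(probs):
    """Convert a dict of probabilities to log-probabilities."""
    return {k: math.log(v) for k, v in probs.items()}

# Hypothetical two-state toy model, invented for this example.
log_init = logs({"A": 0.6, "B": 0.4})
log_trans = {"A": logs({"A": 0.7, "B": 0.3}), "B": logs({"A": 0.4, "B": 0.6})}
log_emit = {"A": logs({"x": 0.9, "y": 0.1}), "B": logs({"x": 0.2, "y": 0.8})}

score, path = viterbi_beam(["x", "x", "y", "y"], log_init, log_trans, log_emit)
print(path, score)
```

A smaller `beam` discards more hypotheses per frame, which reduces work but may prune the path that would have turned out best; that trade-off is why pruned search is generally described as approximate.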
A Comparison Of Time Conditioned And Word Conditioned Search Techniques For Large Vocabulary Speech Recognition
Proc. Int. Conf. on Spoken Language Processing, 1996
Cited by 19 (14 self)
In this paper, we compare the search effort of the word-conditioned and the time-conditioned tree search methods. Both methods are based on a time-synchronous, left-to-right beam search using a tree-organized lexicon. Whereas the word-conditioned method is well known and widely used, the time-conditioned method is novel in the context of 20 000-word vocabulary recognition. We extend both methods to handle trigram language models in a one-pass strategy. Both methods were tested on a train schedule inquiry task (1 850 words, telephone speech) and on the North American Business (Nov. '94) development corpus (20 000 words).
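Both methods rely on a tree-organized lexicon, in which words that share an initial phoneme sequence share the corresponding arcs. The sketch below builds such a prefix tree under simple assumptions; the helper names `build_lexical_tree` and `count_arcs` and the three-word lexicon with its phoneme transcriptions are hypothetical and not taken from the paper.

```python
def build_lexical_tree(lexicon):
    """Build a prefix tree over phoneme sequences.

    lexicon maps each word to its list of phonemes. Every node is a dict
    with 'children' (phoneme -> node) and 'words' (words ending there).
    """
    root = {"children": {}, "words": []}
    for word, phones in lexicon.items():
        node = root
        for phone in phones:
            node = node["children"].setdefault(
                phone, {"children": {}, "words": []}
            )
        node["words"].append(word)
    return root

def count_arcs(node):
    """Count phoneme arcs in the tree; shared prefixes are counted once."""
    return sum(1 + count_arcs(child) for child in node["children"].values())

# Illustrative lexicon with made-up transcriptions.
lexicon = {
    "speech": ["s", "p", "iy", "ch"],
    "speed": ["s", "p", "iy", "d"],
    "spin": ["s", "p", "ih", "n"],
}
tree = build_lexical_tree(lexicon)
linear_arcs = sum(len(phones) for phones in lexicon.values())
print(linear_arcs, count_arcs(tree))
```

In this toy case the linear lexicon needs 12 phoneme arcs while the tree needs 7. The word identity only becomes known at the node where a word ends, which is the property that word-conditioned and time-conditioned organizations of the search handle in different ways.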
The EuTRANS-I Speech Translation System
1999
Cited by 18 (10 self)
The EuTRANS project aims at using Example-Based approaches for the automatic development of Machine Translation systems (accepting text and speech input) for limited domain applications. During the first phase of the project, a speech translation system based on automatically learnt Subsequential Transducers has been built. This paper contains a detailed and to a large extent self-contained overview of the transducer learning algorithms and system architecture, along with a new approach for using categories representing words or short phrases in both input and output languages. Experimental results using this approach are reported for a task involving the recognition and translation of sentences in the hotel reception communication domain, with a vocabulary of 683 words in Spanish. A translation word error rate of 1.97% is achieved at a real-time factor of 2.7 on a personal computer.
Hierarchical search for large vocabulary conversational speech recognition
IEEE Signal Processing Magazine, 1999
Cited by 15 (5 self)
Speaker-independent speech recognition technology has made significant progress from the days of isolated word recognition. Today, state-of-the-art systems are capable of performing large vocabulary continuous speech recognition (LVCSR) on audio streams derived from complex information sources such as broadcast news and two-way telephone dialogs. A significant contribution to this advancement in technology is the development of search techniques that find suboptimal but accurate solutions in problems involving large search spaces and extremely complex statistical models. Moreover, these search strategies are capable of dynamically integrating information from a number of diverse knowledge sources to determine the correct word hypothesis, and limit the scope of the search by using a hierarchical search strategy. We refer to this problem as the decoding or search problem. This paper describes the complexity associated with decoding using hierarchical representations for linguistic and acoustic knowledge sources. An extensible object-oriented decoder, available in the public domain, that leverages current state-of-the-art technology is described to illustrate these concepts. This decoder supports efficient handling of acoustic models for cross-word context-dependent phones, multiple pronunciations of words using lexical trees, and rescoring of word graphs based on N-gram language models in a single pass. It employs a state-of-the-art Viterbi-style dynamic programming algorithm, and is equipped with several heuristic pruning criteria to minimize the consumption of computational resources while maintaining good accuracy.
Real-Time Self-Localization in Unknown Indoor Environments using a Panorama Laser Range Finder
In IEEE/RSJ International Workshop on Robots and Systems, IROS 97, 1997
Cited by 15 (0 self)
This paper deals with self-localization of a mobile robot on the condition that no a priori knowledge about the environment is available. The applied method is accurate, robust, independent of any artificial landmarks, and feasible with such moderate computational effort that all necessary tasks can be executed in real time on a standard PC. The perception system used is a panorama laser range finder (PLRF) which takes scans of its present environment. A modified Dynamic Programming (DP) algorithm provides pattern matching and pattern recognition on the preprocessed panorama scans and thereby renders a qualitative fusion of the sensory data. For an exact quantitative estimate of the robot's current position, a robust localization module is employed. The knowledge gained about the environment along the way is stored in a self-growing, graph-based map which combines geometrical information and topological restrictions. Preliminary experiments in a common office environment ...
Progress in Dynamic Programming Search for LVCSR
Proceedings of the IEEE, 1997
Cited by 10 (2 self)
This paper gives an overview of the recent improvements in dynamic programming search for large vocabulary continuous speech recognition: search using lexical trees, time-conditioned search and word graph construction.
An RNN-Based Pre-classification Method for Fast Continuous Mandarin Speech Recognition
IEEE Trans. Speech Audio Processing
Cited by 3 (0 self)
A novel RNN-based front-end pre-classification scheme for fast continuous Mandarin speech recognition is proposed in this paper. First, an RNN is employed to discriminate each input frame among the three broad classes of initial, final, and silence. A finite state machine (FSM) is then used to classify the input frame into four states: the three stable states of Initial (I), Final (F), and Silence (S), and a Transient (T) state. The decision is made by examining whether the RNN discriminates well between classes. We then restrict the search space for the three stable states in the following DP search to speed up the recognition process. The efficiency of the proposed scheme was examined by simulations in which we incorporated it into an HMM-based continuous 411 Mandarin base-syllable recognizer. Experimental results showed that it can be used in conjunction with the beam search to greatly reduce the computational complexity of the HMM recognizer while keeping the recognition rate a...
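The abstract does not spell out the FSM or the exact decision criterion, so the following is only a rough sketch of the general idea, using a plain margin threshold in place of the paper's FSM. The function names `classify_frames` and `allowed_classes`, the `margin` value, and the example scores are all assumptions made for illustration.

```python
def classify_frames(scores, margin=0.3):
    """Label each frame 'I', 'F', 'S', or 'T' from three-class scores.

    scores is a list of dicts with keys 'I', 'F', 'S'. A frame receives a
    stable label only if its top score beats the runner-up by at least
    `margin`; otherwise it is marked transient ('T').
    """
    labels = []
    for frame in scores:
        ranked = sorted(frame.items(), key=lambda kv: kv[1], reverse=True)
        (top, top_score), (_, second_score) = ranked[0], ranked[1]
        labels.append(top if top_score - second_score >= margin else "T")
    return labels

def allowed_classes(label):
    """Broad classes the later DP search may consider for a frame."""
    # Transient frames stay unrestricted; stable frames are restricted
    # to the single class chosen by the pre-classifier.
    return {"I", "F", "S"} if label == "T" else {label}

# Hypothetical per-frame classifier outputs.
scores = [
    {"I": 0.80, "F": 0.10, "S": 0.10},
    {"I": 0.45, "F": 0.50, "S": 0.05},
    {"I": 0.05, "F": 0.90, "S": 0.05},
    {"I": 0.10, "F": 0.10, "S": 0.80},
]
labels = classify_frames(scores)
print(labels)
print([sorted(allowed_classes(label)) for label in labels])
```

Leaving transient frames unrestricted is the conservative choice in this sketch: the search space shrinks only where the classifier is confident, so an ambiguous frame cannot force the search onto the wrong class.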
Improving the recognition of interleaved activities
In submission, 2008
Cited by 3 (1 self)
We introduce Interleaved Hidden Markov Models for recognizing multitasked activities. The model captures both inter-activity and intra-activity dynamics. Although the state space is intractably large, we describe an approximation that is both effective and efficient. This method significantly reduces the error rate when compared with previously proposed methods. The algorithm is suitable for mobile platforms where computational resources may be limited.
Extensions To The Word Graph Method For Large Vocabulary Continuous Speech Recognition
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, 1997
Cited by 3 (3 self)
This paper describes two methods for constructing word graphs for large vocabulary continuous speech recognition. Both word graph methods are based on a time-synchronous, left-to-right beam search strategy in connection with a tree-organized pronunciation lexicon. The first method is based on the so-called word pair approximation and fits directly into a word-conditioned search organization. In order to avoid the assumptions made in the word pair approximation, we design another word graph method. This method is based on a time-conditioned factoring of the search space. For the case of a trigram language model, we give a detailed comparison of both word graph methods with an integrated search method. The experiments have been carried out on the North American Business (NAB'94) 20,000-word task.
Towards A Compact Speech Recognizer: Subspace Distribution Clustering Hidden Markov Model
1998
Cited by 2 (1 self)
Table of contents (excerpt):
1 Introduction
  1.1 The Problem: Too Many Parameters
  1.2 Proposed Solution: It Is Time to Share More!
  1.3 Thesis Summary and Outline
2 Review of Acoustic Modeling Using Hidden Markov Model
  2.1 Speech Characteristics
  2.2 Selection of Input Speech Space and Speech Model
    2.2.1 Cepstral Input
    2.2.2 Hidden Markov Model
    2.2.3 Our Choice of HMM for Acoustic Modeling
  2.3 Speech Unit to Model
  ...

