Fast Likelihood Computation Methods For Continuous Mixture Densities In Large Vocabulary Speech Recognition
In Proc. of the European Conf. on Speech Communication and Technology, 1997
Cited by 11 (8 self)
This paper studies algorithms for reducing the computational effort of the mixture density calculations in HMM-based speech recognition systems. These likelihood calculations take about 70–85% of the total recognition time in the RWTH system for large vocabulary continuous speech recognition. To reduce the computational cost of the likelihood calculations, we investigate several space partitioning methods. A detailed comparison of these techniques is given on the North American Business Corpus (NAB'94) for a 20 000-word task. As a result, the so-called projection search algorithm in combination with the VQ method reduces the cost of likelihood computation by a factor of about 8 with no significant loss in the word recognition accuracy.
Efficient Handling of N-gram Language Models for Statistical Machine Translation
Cited by 10 (0 self)
Statistical machine translation, as well as other areas of human language processing, has recently pushed toward the use of large-scale n-gram language models. This paper presents efficient algorithmic and architectural solutions which have been tested within the Moses decoder, an open source toolkit for statistical machine translation. Experiments are reported with a high-performing baseline, trained on the Chinese-English NIST 2006 Evaluation task and running on a standard Linux 64-bit PC architecture. Comparative tests show that our representation halves the memory required by SRI LM Toolkit, at the cost of 44% slower translation speed. However, as it can take advantage of memory mapping on disk, the proposed implementation seems to scale up much better to very large language models: decoding with a 289-million 5-gram language model runs in 2.1Gb of RAM.
Improved Lexical Tree Search For Large Vocabulary Speech Recognition
In Proceedings of the ICASSP, 1998
Cited by 9 (3 self)
This paper describes some extensions to the language model (LM) look-ahead pruning approach which is integrated into the time-synchronous beam search algorithm. The search algorithm is based on a lexical prefix tree in combination with a word-conditioned dynamic search space organization for handling trigram language models in a one-pass strategy. In particular, we study several LM look-ahead pruning techniques. Further, we improve the efficiency of this look-ahead technique by exploiting subtree dominance. This method avoids the computation of redundant subtrees within the copies of the lexical prefix tree and thus reduces the memory requirements of the search algorithm. In addition, we present a pruning criterion depending on the state index. The experimental results on the 20 000-word NAB'94 task (ARPA North American Business Corpus) indicate that the computational effort can be reduced to 4 times real time on an ALPHA5000 PC without a significant loss in the recognition accuracy.
A Comparison Of Dialogue-State Dependent Language Models
In Proceedings of ECSA Workshop on Interactive Dialogue in Multi-Modal Systems, Irsee, 1999
Cited by 6 (1 self)
Dialogue-state dependent language models in automatic inquiry systems can be employed to improve speech recognition and understanding. In this paper, the dialogue state is defined by the set of parameters contained in the system prompt. Using this knowledge, a separate language model for each state can be constructed. In order to obtain robust language models we study the linear interpolation of all dialogue-state dependent language models and an automatic text clustering algorithm. In particular, we extend the clustering algorithm so as to automatically determine the optimal number of clusters. These clusters are then combined with linear interpolation. We present experimental results on a Dutch corpus which has been recorded in the Netherlands with a train timetable information system in the framework of the ARISE project [1]. The perplexity, the word error rate, and the attribute error rate can be reduced significantly with all of these methods.
Word Prediction via a Clustered Optimal Binary Search Tree
2003
Word prediction methodologies depend heavily on the statistical approach that uses the unigram, bigram, and trigram of words. However, the construction of the N-gram model requires a very large amount of memory, which is beyond the capability of many existing computers. Besides this, the approximation reduces the accuracy of word prediction. In this paper, we suggest using a cluster of computers to build an Optimal Binary Search Tree (OBST) that will be used for the statistical approach in word prediction. The OBST will contain extra links so that the bigram and the trigram of the language will be represented. In addition, we suggest the incorporation of other enhancements to achieve optimal performance of word prediction. Our experimental results showed that the suggested approach improves the keystroke saving. Keywords: Bigram, cluster computing, N-gram, unigram, trigram, word frequency, word prediction.
Robust Appearance-based Sign Language Recognition
2007
In this work, we introduce a robust appearance-based sign language recognition system which is derived from a large vocabulary speech recognition system. The system employs a large variety of methods known from automatic speech recognition research for the modeling of temporal and language-specific issues. The feature extraction part of the system is based on recent developments in image processing which model different aspects of the signs and account for visual variabilities in appearance. Different issues of appearance-based sign language recognition such as datasets, appearance-based features, geometric features, training, and recognition parts are investigated and analyzed. We discuss the state of the art in sign language and gesture recognition. In contrast to the proposed system, most of the existing approaches use special data acquisition tools to collect the data of the signings. The systems which use this kind of data capturing tools are not useful in practical environments. Furthermore, the datasets created within their own
The Time-Conditioned Approach in Dynamic Programming Search for LVCSR
This paper presents the time-conditioned approach in dynamic programming search for large-vocabulary continuous-speech recognition. The following topics are presented: the baseline algorithm, a time-synchronous beam search version, a comparison with the word-conditioned approach, and a comparison with stack decoding. The approach has been successfully tested on the NAB task using a vocabulary of 64 000 words. Index Terms: Beam search, dynamic programming, large vocabulary speech recognition, one-pass DP search, search organization, time-conditioned DP search.
Appearance-Based Features for Automatic Continuous Sign Language Recognition
2006
This diploma thesis investigates appearance-based features for the person-independent vision-based recognition of continuous sign language. A large variety of methods which have been successfully used for automatic speech recognition is applied to this task. Appearance-based approaches do not rely on a segmentation of the images or on predefined models of the image content and use the image itself as the feature. A novel tracking algorithm is introduced and applied to hand and head tracking. The tracked body parts are used in order to calculate additional features to improve recognition performance. The presented automatic sign language recognition system is evaluated on a set of sentences in American Sign Language.

