An RNN-Based Pre-classification Method for Fast Continuous Mandarin Speech Recognition
- IEEE Trans. Speech Audio Processing
Cited by 3 (0 self)
A novel RNN-based front-end pre-classification scheme for fast continuous Mandarin speech recognition is proposed in this paper. First, an RNN is employed to classify each input frame into one of three broad classes: initial, final, and silence. A finite state machine (FSM) is then used to assign the input frame to one of four states: the three stable states Initial (I), Final (F), and Silence (S), and a Transient (T) state. The decision is based on whether the RNN discriminates well between the classes. The search space for the three stable states is then restricted in the subsequent DP search to speed up the recognition process. The efficiency of the proposed scheme was examined by simulations in which it was incorporated into an HMM-based continuous recognizer of 411 Mandarin base-syllables. Experimental results showed that it can be used in conjunction with beam search to greatly reduce the computational complexity of the HMM recognizer while keeping the recognition rate a...
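The pre-classification idea in this abstract can be illustrated with a small sketch. It is not the authors' FSM: the abstract does not give the decision rule, so the sketch assumes a simple memoryless margin test on hypothetical RNN posteriors, and the names `classify_frame`, `ALLOWED_MODELS`, and the `margin` value are invented for illustration.

```python
from typing import List, Sequence

CLASSES = ("I", "F", "S")  # initial, final, silence

# Assumption: in a stable state only the matching model type is searched,
# while a transient frame leaves the search space unrestricted.
ALLOWED_MODELS = {
    "I": {"initial"},
    "F": {"final"},
    "S": {"silence"},
    "T": {"initial", "final", "silence"},
}


def classify_frame(posteriors: Sequence[float], margin: float = 0.3) -> str:
    """Return I, F or S when one class clearly wins, otherwise T (transient).

    `margin` is a hypothetical threshold on the gap between the two
    highest class scores; the paper's actual criterion may differ.
    """
    ranked = sorted(range(len(CLASSES)), key=lambda k: posteriors[k], reverse=True)
    best, runner_up = ranked[0], ranked[1]
    if posteriors[best] - posteriors[runner_up] >= margin:
        return CLASSES[best]
    return "T"


def classify_utterance(frames: Sequence[Sequence[float]]) -> List[str]:
    """Label every frame of an utterance with I, F, S or T."""
    return [classify_frame(p) for p in frames]


# Made-up posteriors for four frames, ordered as (initial, final, silence).
frames = [(0.9, 0.05, 0.05), (0.5, 0.45, 0.05), (0.1, 0.8, 0.1), (0.05, 0.05, 0.9)]
labels = classify_utterance(frames)
search_space = [ALLOWED_MODELS[label] for label in labels]
```

Only the second frame, where the top two scores are close, is labelled T, so the decoder would keep all model types active there and a single type elsewhere.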
Look-Ahead Techniques For Improved Beam Search
- In Proc. of the CRIM-FORWISS Workshop, 1996
Cited by 3 (2 self)
This paper presents two look-ahead techniques for large vocabulary continuous speech recognition. These two techniques, which are referred to as language model look-ahead and phoneme look-ahead, are incorporated into the pruning process of the time-synchronous one-pass beam search algorithm. The search algorithm is based on a tree-organized pronunciation lexicon in connection with a bigram language model. Both look-ahead techniques have been tested on the 20,000-word NAB'94 task (ARPA North American Business Corpus). The recognition experiments show that the combination of bigram language model look-ahead and phoneme look-ahead reduces the size of the search space by a factor of about 27 without affecting the word recognition accuracy. 1 Introduction. In this paper, we describe two look-ahead techniques for improved beam search, namely language model look-ahead and phoneme look-ahead, for large vocabulary continuous speech recognition. The basic idea of the language model look-ahead is t...
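As a rough illustration of how a look-ahead term can enter beam pruning, the sketch below adds an anticipated score to each hypothesis before comparing it with the beam threshold. It is a generic sketch, not the paper's algorithm: the function name `prune_with_lookahead`, the dictionary layout, and all scores are assumptions made for this example.

```python
from typing import Dict


def prune_with_lookahead(
    scores: Dict[str, float],
    lookahead: Dict[str, float],
    beam: float,
) -> Dict[str, float]:
    """Keep hypotheses whose anticipated log score lies within `beam` of the best.

    scores:    accumulated log scores of the active hypotheses
    lookahead: estimated log score of what each hypothesis will still incur
               (language model or phoneme look-ahead); missing entries count as 0
    """
    if not scores:
        return {}
    anticipated = {h: s + lookahead.get(h, 0.0) for h, s in scores.items()}
    threshold = max(anticipated.values()) - beam
    return {h: scores[h] for h, a in anticipated.items() if a >= threshold}


# Made-up log scores for three hypotheses.
active = {"a": -10.0, "b": -11.0, "c": -12.0}
estimate = {"a": -1.0, "b": -6.0, "c": -0.5}

plain = prune_with_lookahead(active, {}, beam=3.0)           # no look-ahead
pruned = prune_with_lookahead(active, estimate, beam=3.0)    # with look-ahead
```

With these numbers the plain beam keeps all three hypotheses, while the look-ahead estimate removes `b` early because its anticipated score falls outside the beam.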
Towards A Compact Speech Recognizer: Subspace Distribution Clustering Hidden Markov Model
- 1998
Cited by 2 (1 self)
1 Introduction
1.1 The Problem: Too Many Parameters
1.2 Proposed Solution: It Is Time to Share More!
1.3 Thesis Summary and Outline
2 Review of Acoustic Modeling Using Hidden Markov Model
2.1 Speech Characteristics
2.2 Selection of Input Speech Space and Speech Model
2.2.1 Cepstral Input
2.2.2 Hidden Markov Model
2.2.3 Our Choice of HMM for Acoustic Modeling
2.3 Speech Unit to Model ...
Sooner Or Later: Exploring Asynchrony In Multi-Band Speech Recognition
- Proceedings of Eurospeech-99, Budapest, 1999
Cited by 2 (0 self)
Multi-band speech recognition is an exploratory paradigm in which each frequency region is treated as a distinct source of information and the streams are combined after each is processed independently. A number of researchers have hypothesized that it is advantageous to combine the sub-frequency information in an asynchronous manner. This paper examines this hypothesis, using two different approaches to relaxing synchrony constraints: HMM decomposition/recombination [19] and two-level dynamic programming (DP) [16]. Drawing on this work and that of others [2, 18], we conclude that relaxing the synchrony constraints indiscriminately for all phone-to-phone transitions does not consistently and significantly reduce the word error rate. The optimal permissible asynchrony must depend on both the phone-class transitions and the training-data statistics. 1. INTRODUCTION Multi-band approaches have generated a great deal of interest in the automatic speech recognition (ASR) community [9, 2,...
A Fast Version Of The ATROS System
- 1999
Cited by 1 (1 self)
Atros is an automatic speech recognition/understanding/translation system whose knowledge sources (acoustic models, lexical models, syntactic language models, semantic models and translation models) can be learnt automatically from training data by using similar techniques. The search process in Atros is performed through a Synchronous Beam Search technique. In this paper, a faster version of Atros is presented and evaluated. This version supports improved acoustic and syntactical models. It also incorporates improved search algorithms to reduce the computational requirements for decoding: Fast Phoneme Look-Ahead and Histogram Pruning. The system has been tested on a Spanish task of queries to a geographical database (with a vocabulary of 1,264 words). The best result achieved (in real time) was a word error rate of 7.10%. 1 System overview Optimal speech decoding based on a search process in an integrated network of different knowledge sources is a hard computational problem [1]. ...
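Histogram pruning, as the term is commonly used, caps the number of active hypotheses per frame regardless of how close their scores are. The sketch below shows that cap under stated assumptions: it selects the survivors with a heap rather than an actual score histogram, and the name `histogram_prune` and the scores are made up; the Atros implementation is not described in the abstract.

```python
import heapq
from typing import Dict


def histogram_prune(scores: Dict[str, float], max_active: int) -> Dict[str, float]:
    """Keep at most `max_active` hypotheses, preferring the highest log scores."""
    if len(scores) <= max_active:
        return dict(scores)
    kept = heapq.nlargest(max_active, scores.items(), key=lambda item: item[1])
    return dict(kept)


# Made-up log scores for four active hypotheses.
active = {"a": -1.0, "b": -5.0, "c": -2.0, "d": -9.0}
survivors = histogram_prune(active, max_active=2)
```

A decoder would typically apply such a cap after the usual beam threshold, so that the cost per frame stays bounded even when many hypotheses score similarly.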
Acoustic And Syntactical Modeling in the ATROS System
- 1999
Cited by 1 (1 self)
Current speech technology allows us to build efficient speech recognition systems. However, learning the models of the knowledge sources in a speech recognition system is not a closed problem. In addition, lower computational requirements are crucial to building real-time systems.
Phonetic And Prosodic Analysis Of Speech
- Modern Modes of Man-Machine Communication, 1994
Cited by 1 (0 self)
In order to cope with the problems of spontaneous speech (including, for example, hesitations and non-words) it is necessary to extract from the speech signal all information it contains. Modeling of words by segmental units should be supported by suprasegmental units since valuable information is represented in the prosody of an utterance. We present an approach to flexible and efficient modeling of speech by segmental units and describe extraction and use of suprasegmental information. Keywords: speech recognition, hidden Markov models, prosody. INTRODUCTION This paper presents an approach towards statistical modeling and use of segmental and suprasegmental information in a speech signal. We treat the aspects of word recognition and improvement of linguistic analysis by suprasegmental information. Sect. 1 gives an account of acoustic-phonetic analysis in the ISADORA system for word recognition. It will be demonstrated that it is general enough to also include prosodic informati...
A New Verification-Based Fast Match Approach To Large Vocabulary Continuous Speech Recognition
- Proc. of European Conference on Speech Communication and Technology, 2001
Cited by 1 (1 self)
Acoustic fast match is usually used to accelerate search in large vocabulary continuous speech recognition. This paper discusses a new acoustic fast match algorithm. This proposed fast match is based on incremental evaluation of the score and the use of normalized likelihood scores. This is in contrast to more traditional fast matches where a likelihood score is used. In addition, streaming SIMD extensions (SSE) for Intel machine instructions are used for fast Gaussian calculation. Results on a 20K Japanese broadcast news task show that the proposed fast match leads to about 30% improvement in speed with a slight performance degradation.
The LIMSI SDR System for TREC-9
- 2000
Cited by 1 (0 self)
In this paper we describe the LIMSI Spoken Document Retrieval system used in the TREC-9 evaluation. This system combines an adapted version of the LIMSI 1999 Hub-4E transcription system for speech recognition with text-based IR methods. Compared with the LIMSI TREC-8 system, this year's system is able to index the audio data without knowledge of the story boundaries using a double windowing approach. The query expansion procedure of the information retrieval component has been revised and makes use of contemporaneous text sources. Experimental results are reported in terms of mean average precision for both the TREC SDR'99 and SDR'00 queries using the same 557h data set. The mean average precision of this year's system is 0.5250 for SDR'99 and 0.3706 for SDR'00 for the focus unknown story boundary condition with a 20% word error rate.
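For readers unfamiliar with the metric quoted above, mean average precision can be computed as in the sketch below. It uses the standard definition (precision averaged over the ranks of the relevant documents, then averaged over queries); the document identifiers and rankings are invented and have no connection to the TREC data.

```python
from typing import List, Sequence, Set, Tuple


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Average of the precision values at each rank where a relevant document appears."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return total / len(relevant)


def mean_average_precision(runs: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """Mean of the per-query average precision over all queries."""
    return sum(average_precision(ranked, rel) for ranked, rel in runs) / len(runs)


# Two made-up queries: (system ranking, set of relevant documents).
runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),  # AP = (1/1 + 2/3) / 2
    (["d5", "d6"], {"d6"}),                    # AP = 1/2
]
map_score = mean_average_precision(runs)
```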
CarpeDiem: Optimizing the Viterbi Algorithm and Applications to Supervised Sequential Learning
Cited by 1 (0 self)
The growth of information available to learning systems and the increasing complexity of learning tasks determine the need for devising algorithms that scale well with respect to all learning parameters. In the context of supervised sequential learning, the Viterbi algorithm plays a fundamental role, by allowing the evaluation of the best (most probable) sequence of labels with a time complexity linear in the number of time events, and quadratic in the number of labels. In this paper we propose CarpeDiem, a novel algorithm allowing the evaluation of the best possible sequence of labels with a sub-quadratic time complexity. We provide theoretical grounding together with solid empirical results supporting two chief facts. CarpeDiem always finds the optimal solution requiring, in most cases, only a small fraction of the time taken by the Viterbi algorithm; meantime, CarpeDiem is never asymptotically worse than the Viterbi algorithm, thus confirming it as a sound replacement.
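For reference, the baseline that the abstract compares against can be sketched as follows. This is the textbook Viterbi recursion in the log domain, not CarpeDiem; the function signature and the two-label toy model are assumptions chosen for the example. The nested loops over labels make the quadratic cost per time step visible.

```python
import math
from typing import List, Sequence


def viterbi(
    log_init: Sequence[float],
    log_trans: Sequence[Sequence[float]],
    log_emit: Sequence[Sequence[float]],
) -> List[int]:
    """Return the most probable label sequence.

    log_init[k]     log probability of starting in label k
    log_trans[j][k] log probability of moving from label j to label k
    log_emit[t][k]  log probability of the observation at time t under label k
    """
    n_labels = len(log_init)
    score = [log_init[k] + log_emit[0][k] for k in range(n_labels)]
    back: List[List[int]] = []
    for t in range(1, len(log_emit)):            # linear in the number of time events
        new_score, pointers = [], []
        for k in range(n_labels):                # quadratic in the number of labels
            prev = max(range(n_labels), key=lambda j: score[j] + log_trans[j][k])
            pointers.append(prev)
            new_score.append(score[prev] + log_trans[prev][k] + log_emit[t][k])
        score = new_score
        back.append(pointers)
    # Trace the best final label back to the start.
    path = [max(range(n_labels), key=lambda k: score[k])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Toy two-label model with made-up probabilities.
init = [math.log(0.6), math.log(0.4)]
trans = [[math.log(0.7), math.log(0.3)], [math.log(0.4), math.log(0.6)]]
emit = [
    [math.log(0.5), math.log(0.1)],
    [math.log(0.4), math.log(0.3)],
    [math.log(0.1), math.log(0.6)],
]
best_path = viterbi(init, trans, emit)
```

On this toy input the best path stays in label 0 for the first two steps and switches to label 1 at the last one.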

