Efficient Evaluation of the LVCSR Search Space Using the noway Decoder
In ICASSP, 1996
Cited by 11 (1 self)
This work further develops and analyses the large vocabulary continuous speech recognition (LVCSR) search strategy reported at ICASSP-95 [1]. In particular, the posterior-based phone deactivation pruning approach has been extended to include phone-dependent thresholds, and an improved estimate of the least upper bound on the utterance log-probability has been developed. Analysis of the pruning procedures and of the search's interaction with the language model has also been performed. Experiments were carried out using the ARPA North American Business News task with a 20,000 word vocabulary and a trigram language model. As a result of these improvements and analyses, the computational cost of the recognition process performed by the noway decoder has been substantially reduced. 1. INTRODUCTION At ICASSP-95, we introduced an efficient search procedure [1] that was implemented as a software decoder known as noway and used in the Abbot hybrid connectionist/HMM LVCSR system [2, 3]. Key fea...
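As a rough illustration of the idea named in this abstract, posterior-based phone deactivation pruning with phone-dependent thresholds can be sketched as below. This is a minimal sketch under stated assumptions, not the noway decoder's actual code: the function names, the `>=` comparison, and the example posteriors and thresholds are all hypothetical.

```python
from typing import List, Sequence


def active_phones(posteriors: Sequence[float],
                  thresholds: Sequence[float]) -> List[int]:
    """Return indices of phones that stay active for one frame.

    Assumption: a phone is deactivated when its estimated posterior
    falls below that phone's own threshold.
    """
    if len(posteriors) != len(thresholds):
        raise ValueError("need one threshold per phone")
    return [phone
            for phone, (post, thr) in enumerate(zip(posteriors, thresholds))
            if post >= thr]


def prune_utterance(frames: Sequence[Sequence[float]],
                    thresholds: Sequence[float]) -> List[List[int]]:
    """Apply the per-phone thresholds to every frame of an utterance."""
    return [active_phones(frame, thresholds) for frame in frames]


# Hypothetical posteriors for 2 frames and 3 phones.
frames = [
    [0.90, 0.08, 0.02],
    [0.30, 0.65, 0.05],
]
# Hypothetical phone-dependent thresholds.
thresholds = [0.05, 0.10, 0.03]

active = prune_utterance(frames, thresholds)
```

With these made-up numbers, only phone 0 survives in the first frame, while all three phones stay active in the second; a search would then skip the deactivated phones when extending hypotheses at that frame.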
Phonetic Context-Dependency In a Hybrid ANN/HMM Speech Recognition System
1997
Cited by 8 (0 self)
This report uses a bark scale, which has been replaced here with a mel-scale.
The THISL Spoken Document Retrieval System
In TREC-6, 1998
Cited by 7 (1 self)
THISL is an ESPRIT Long Term Research Project focused on the development and construction of a system to retrieve items from an archive of television and radio news broadcasts. In this paper we outline our spoken document retrieval system based on the ABBOT speech recognizer and a text retrieval system based on Okapi term-weighting. The system has been evaluated as part of the TREC-6 and TREC-7 spoken document retrieval evaluations and we report on the results of the TREC-7 evaluation based on a document collection of 100 hours of North American broadcast news. Keywords: Multimedia Information Retrieval; Spoken Document Retrieval; Speech Recognition; Broadcast Data. 1 INTRODUCTION THISL is an ESPRIT Long Term Research project in the area of speech retrieval. It is concerned with the construction of a system which performs good recognition of broadcast speech from television and radio news programmes, from which it can produce multimedia indexing data. The project is concentrating on British an...
The 1995 Abbot LVCSR System For Multiple Unknown Microphones
In Int. Conf. on Spoken Language Processing, 1996
Cited by 6 (3 self)
ABBOT is the hybrid connectionist-hidden Markov model large-vocabulary speech recognition system developed at Cambridge University. In this system, a recurrent network maps each acoustic vector to an estimate of the posterior probabilities of the phone classes, which are used as observation probabilities within an HMM. This paper describes the system which participated in the November 1995 ARPA Hub-3 Multiple Unknown Microphones (MUM) evaluation of continuous speech recognition systems, under the guise of the CU-CON system. The emphasis of the paper is on the changes made to the 1994 ABBOT system, specifically to accommodate the H3 task. This includes improved acoustic modelling using limited word-internal context-dependent models, training on the Wall Street Journal secondary channel database, and using the linear input network for speaker and environmental adaptation. Experimental results are reported for various test and development sets from the November 1994 and 1995 ARPA benchmark tests.
The 1995 Abbot Hybrid Connectionist-HMM Large-Vocabulary Recognition System
Cited by 5 (0 self)
Abbot is the hybrid connectionist-hidden Markov model large-vocabulary speech recognition system developed at Cambridge University. In this system, a recurrent network maps each acoustic vector to an estimate of the posterior probabilities of the phone classes. This paper describes the system which participated in the November 1995 ARPA H3 Multiple Unknown Microphones (MUM) evaluation of continuous speech recognition systems, under the guise of the CU-CON system. The emphasis of the paper is on the changes made to the 1994 Abbot system, specifically to accommodate the H3 task. This includes improved acoustic modelling using limited word-internal context-dependent models, training on the Wall Street Journal secondary channel database, the linear input network for speaker and environmental adaptation, and the continued development of a real-time single-pass decoder well suited to the hybrid approach. Experimental results are reported for various test and development sets from the November 1...
Smoothed Local Adaptation Of Connectionist Systems
Proc. ICSLP, 1996
Cited by 5 (4 self)
ABBOT is the hybrid connectionist hidden Markov model (HMM) large vocabulary continuous speech recognition system developed at Cambridge University Engineering Department. ABBOT makes effective use of the linear input network (LIN) adaptation technique to achieve speaker and channel adaptation. Although the LIN is effective at adapting to new speakers or a new environment (e.g. a different microphone), the transform is global over the input space. In this paper we describe a technique by which the transform may be made locally linear over different regions of the input space. The local linear transforms are combined by an additional network using a non-linear transform. This scheme falls naturally into the mixtures of experts framework.
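To make the combination of local linear transforms more concrete, the following is a minimal sketch in the mixtures-of-experts style. It is an illustration under assumptions, not the paper's implementation: the softmax gating network, the function names, and the toy matrices are all hypothetical choices.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def matvec(matrix: Matrix, x: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in matrix]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def locally_linear_transform(x: Vector,
                             experts: List[Matrix],
                             gate: Matrix) -> Vector:
    """Blend several linear input transforms with input-dependent weights.

    Assumption: the gate is a single linear layer followed by a softmax,
    giving one mixing weight per expert transform.
    """
    weights = softmax(matvec(gate, x))
    outputs = [matvec(expert, x) for expert in experts]
    dim = len(outputs[0])
    return [sum(w * out[d] for w, out in zip(weights, outputs))
            for d in range(dim)]


# Two hypothetical expert transforms: identity and a scaling by 2.
experts = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[2.0, 0.0], [0.0, 2.0]],
]
# A gate with all-zero weights gives each expert equal weight.
gate = [[0.0, 0.0], [0.0, 0.0]]

adapted = locally_linear_transform([1.0, 2.0], experts, gate)
```

Because the gate weights depend on the input, different regions of the input space can favour different linear transforms, while a single expert with a constant gate reduces to one global linear transform.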
Look-Ahead Techniques For Improved Beam Search
In Proc. of the CRIM-FORWISS Workshop, 1996
Cited by 3 (2 self)
This paper presents two look-ahead techniques for large vocabulary continuous speech recognition. These two techniques, which are referred to as language model look-ahead and phoneme look-ahead, are incorporated into the pruning process of the time-synchronous one-pass beam search algorithm. The search algorithm is based on a tree-organized pronunciation lexicon in connection with a bigram language model. Both look-ahead techniques have been tested on the 20 000-word NAB'94 task (ARPA North American Business Corpus). The recognition experiments show that the combination of bigram language model look-ahead and phoneme look-ahead reduces the size of search space by a factor of about 27 without affecting the word recognition accuracy. 1 Introduction In this paper, we describe two look-ahead techniques for improved beam search, namely language model look-ahead and phoneme look-ahead, for large vocabulary continuous speech recognition. The basic idea of the language model look-ahead is t...
A Continuous Density Interpretation of Discrete HMM Systems and MMI-Neural Networks
IEEE Transactions on Speech and Audio Processing, 2001
Cited by 3 (0 self)
The subject of this paper is the integration of the traditional vector quantizer (VQ) and discrete hidden Markov model (HMM) combination in the mixture emission density framework commonly used in automatic speech recognition (ASR). It is shown that the probability density of a system that consists of a VQ and a discrete classifier can be interpreted as a special case of a semicontinuous mixture model. Thus, the VQ parameters and the classifier can be trained jointly. In this framework, a gradient-based VQ training method for single and multiple feature stream systems is derived. This leads to an approach that is directly related to the paradigm of maximum mutual information (MMI) neural networks, which has been successfully applied as VQ in ASR earlier. In continuous speech recognition experiments that were carried out for the Resource Management and Wall Street Journal databases, the presented systems achieve recognition accuracies that compete well with comparable Gaussian mixture HMMs. Thus, we demonstrate that the performance degradations often reported for discrete HMM systems are not mainly caused by the vector quantization process in itself, but are due to the traditional separation of the VQ and the HMM during parameter estimation. These degradations can be avoided by training the entire system as described here, while keeping the attractive computational speed of discrete HMMs.
Hybrid Speech Recognition Systems: A Real Alternative To Traditional Approaches?
Survey Lecture, Proc. International Workshop Speech and Computer (SPECOM'98), 1998
Cited by 2 (2 self)
In this paper, an introduction to hybrid modeling techniques for speech recognition is presented. A hybrid speech recognition system consists of the combination of Hidden Markov Models (HMMs) with Neural Networks (NNs) in order to combine the advantages of these two powerful pattern recognition techniques for improved speech recognition. An overview of several different hybrid speech recognition approaches is presented, and special emphasis is given to the establishment of relationships between these different techniques and traditional speech recognition techniques. In this way, it is demonstrated that all popular speech recognition techniques are more or less related to each other, but that hybrid approaches can still be considered to be an interesting alternative to traditional techniques and will take an important role in future speech technology research. 1. INTRODUCTION The technique of Hidden Markov Models (HMMs) has emerged as the dominating speech technology since the late 19...
Efficient Search With Posterior Probability Estimates In HMM-Based Speech Recognition
In Proc. Int. Conf. Acoustics, Speech and Signal Processing, 1998
Cited by 1 (0 self)
In this paper we present the methods we developed to estimate posterior probabilities for HMM states in continuous and discrete HMM-based speech recognition systems and several ways to speed up decoding by using these posterior probability estimates. The proposed pruning techniques are State Deactivation Pruning (SDP), similar to an approach proposed for hybrid recognition systems, and a novel posteriori-based lookahead technique, Posteriori Lookahead Pruning (PLP), that evaluates future posteriors in order to exclude unlikely HMM states as early as possible during search. By applying the proposed methods we managed to vastly reduce the decoding time consumed by our time-synchronous Viterbi decoder for recognition systems based on the Verbmobil and the Wall Street Journal database with hardly any additional search error. 1. INTRODUCTION With the introduction of long-span language models, very large vocabularies and context-dependent acoustic models, the problem of an efficient searc...

