Connectionist speech recognition of Broadcast News
, 2002
Cited by 28 (10 self)
This paper describes connectionist techniques for recognition of Broadcast News. The fundamental difference between connectionist systems and more conventional mixture-of-Gaussian systems is that connectionist models directly estimate posterior probabilities as opposed to likelihoods. Access to posterior probabilities has enabled us to develop a number of novel approaches to confidence estimation, pronunciation modelling and search. In addition we have investigated a new feature extraction technique based on the modulation-filtered spectrogram (MSG), and methods for combining multiple information sources. We have incorporated all of these techniques into a system for the transcription
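The posterior-versus-likelihood distinction drawn in this abstract can be illustrated with a minimal sketch. In hybrid connectionist/HMM systems, a network's phone posterior P(q|x) is commonly divided by the phone prior P(q) to give a scaled likelihood that an HMM decoder can use. The function name, the example values, and the flooring constant are illustrative assumptions, not details taken from the paper.

```python
import math

def scaled_log_likelihoods(posteriors, priors, floor=1e-10):
    """Convert phone posteriors P(q|x) to scaled log-likelihoods log(P(q|x) / P(q)).

    By Bayes' rule, P(q|x) / P(q) = p(x|q) / p(x), which differs from the
    likelihood only by a factor that is the same for every phone in a frame.
    `floor` is an assumed value that guards against log(0).
    """
    if len(posteriors) != len(priors):
        raise ValueError("posteriors and priors must have the same length")
    return [math.log(max(post, floor) / max(prior, floor))
            for post, prior in zip(posteriors, priors)]

# Hypothetical three-phone example for a single frame.
frame_posteriors = [0.7, 0.2, 0.1]
phone_priors = [0.5, 0.3, 0.2]
scores = scaled_log_likelihoods(frame_posteriors, phone_priors)
```

A phone scores above zero when the acoustic evidence makes it more probable than its prior alone, and below zero otherwise.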
Start-synchronous search for large vocabulary continuous speech recognition
- IEEE Trans. Speech and Audio Processing
Cited by 17 (9 self)
In this paper, we present a novel, efficient search strategy for large vocabulary continuous speech recognition. The search algorithm, based on a stack decoder framework, utilizes phone-level posterior probability estimates (produced by a connectionist/hidden Markov model acoustic model) as a basis for phone deactivation pruning, a highly efficient method of reducing the required computation. The single-pass algorithm is naturally factored into the time-asynchronous processing of the word sequence and the time-synchronous processing of the hidden Markov model state sequence. This enables the search to be decoupled from the language model while still maintaining the computational benefits of time-synchronous processing. The incorporation of the language model in the search is discussed and computationally cheap approximations to the full language model are introduced. Experiments were performed on the North American Business News task using a 60,000-word vocabulary and a trigram language model. Results indicate that the computational cost of the search may be reduced by more than a factor of 40 with a relative search error of less than 2% using the techniques discussed in the paper. Index Terms: Hidden Markov model, large vocabulary continuous speech recognition, phone deactivation pruning, search, stack decoding.
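Phone deactivation pruning, as named in this abstract, can be sketched under one plausible reading: in each frame, a phone whose posterior estimate falls below a threshold is deactivated, so the search skips any hypothesis that would pass through it in that frame. The function name, the threshold value, and the toy posteriors below are assumptions made for illustration; the paper's actual procedure may differ.

```python
def active_phones_per_frame(posteriors, threshold=1e-3):
    """Return, for each frame, the indices of phones kept active.

    `posteriors` is a list of frames, each a list of per-phone posterior
    probabilities. A phone stays active in a frame only if its posterior
    is at least `threshold` (an assumed, tunable value).
    """
    return [
        [phone for phone, prob in enumerate(frame) if prob >= threshold]
        for frame in posteriors
    ]

# Hypothetical posteriors for two frames over three phones.
toy_posteriors = [
    [0.90, 0.0999, 0.0001],
    [0.00005, 0.5, 0.49995],
]
active = active_phones_per_frame(toy_posteriors)

# Fraction of (frame, phone) pairs removed from the search.
total = sum(len(frame) for frame in toy_posteriors)
pruned_fraction = 1 - sum(len(kept) for kept in active) / total
```

Raising the threshold prunes more phones and cuts computation, at the risk of discarding a phone on the correct path and introducing search errors.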
Efficient Evaluation of the LVCSR Search Space Using the Noway Decoder
- In ICASSP
, 1996
Cited by 11 (1 self)
This work further develops and analyses the large vocabulary continuous speech recognition (LVCSR) search strategy reported at ICASSP-95 [1]. In particular, the posterior-based phone deactivation pruning approach has been extended to include phone-dependent thresholds and an improved estimate of the least upper bound on the utterance log-probability has been developed. Analysis of the pruning procedures and of the search's interaction with the language model has also been performed. Experiments were carried out using the ARPA North American Business News task with a 20,000-word vocabulary and a trigram language model. As a result of these improvements and analyses, the computational cost of the recognition process performed by the noway decoder has been substantially reduced. 1. INTRODUCTION At ICASSP-95, we introduced an efficient search procedure [1] that was implemented as a software decoder known as noway and used in the Abbot hybrid connectionist/HMM LVCSR system [2, 3]. Key fea...
The THISL Spoken Document Retrieval System
- In TREC-6
, 1998
Cited by 7 (1 self)
THISL is an ESPRIT Long Term Research Project focused on the development and construction of a system to retrieve items from an archive of television and radio news broadcasts. In this paper we outline our spoken document retrieval system based on the ABBOT speech recognizer and a text retrieval system based on Okapi term-weighting. The system has been evaluated as part of the TREC-6 and TREC-7 spoken document retrieval evaluations and we report on the results of the TREC-7 evaluation based on a document collection of 100 hours of North American broadcast news. Keywords: Multimedia Information Retrieval; Spoken Document Retrieval; Speech Recognition; Broadcast Data. 1 INTRODUCTION THISL is an ESPRIT Long Term Research project in the area of speech retrieval. It is concerned with the construction of a system which performs good recognition of broadcast speech from television and radio news programmes, from which it can produce multimedia indexing data. The project is concentrating on British an...
DynaSpeak: SRI’s scalable speech recognizer for embedded and mobile systems
- in Proceedings of HLT
, 2002
Cited by 6 (3 self)
We introduce SRI’s new speech recognition engine, DynaSpeak™, which is characterized by its scalability and flexibility, high recognition accuracy, memory and speed efficiency, adaptation capability, efficient grammar optimization, support for natural language parsing functionality, and operation based on integer arithmetic. These features are designed to address the needs of the fast-developing and changing domain of embedded and mobile computing platforms.
A New Verification-Based Fast Match Approach To Large Vocabulary Continuous Speech Recognition
- Proc. of European Conference on Speech Communication and Technology
, 2001
Cited by 1 (1 self)
Acoustic fast match is usually used to accelerate search in large vocabulary continuous speech recognition. This paper discusses a new acoustic fast match algorithm. This proposed fast match is based on incremental evaluation of the score and the use of normalized likelihood scores. This is in contrast to more traditional fast matches where a likelihood score is used. In addition, streaming SIMD extensions (SSE) for Intel machine instructions are used for fast Gaussian calculation. Results on a 20K Japanese broadcast news task show that the proposed fast match leads to about 30% improvement in speed with a slight performance degradation.
The THISL Spoken Document Retrieval System
, 1998
INTRODUCTION The THISL spoken document retrieval system is based on the ABBOT Large Vocabulary Continuous Speech Recognition (LVCSR) system developed by Cambridge University, Sheffield University and SoftSound, and uses PRISE (NIST) for indexing and retrieval. We participated in full SDR mode. Our approach was to transcribe the spoken documents at the word level using ABBOT, indexing the resulting text transcriptions using PRISE. The LVCSR system uses a recurrent network-based acoustic model (with no adaptation to different conditions) trained on the 50-hour Broadcast News training set, a 65,000-word vocabulary and a trigram language model derived from Broadcast News text. Words in queries which were out-of-vocabulary (OOV) were word spotted at query time (utilizing the posterior phone probabilities output by the acoustic model), added to the transcriptions of the relevant documents and the collection was then re-indexed. We generated pronunciati
DynaSpeak: SRI's Scalable Speech Recognizer for
- in Proceedings of HLT
, 2002
We introduce SRI's new speech recognition engine, DynaSpeak, which is characterized by its scalability and flexibility, high recognition accuracy, memory and speed efficiency, adaptation capability, efficient grammar optimization, support for natural language parsing functionality, and operation based on integer arithmetic. These features are designed to address the needs of the fast-developing and changing domain of embedded and mobile computing platforms.

