Abberley The THISL SDR system at TREC-9
- Proceedings of TREC-9, 2000
Cited by 3 (0 self)
This paper describes our participation in the TREC-9 Spoken Document Retrieval (SDR) track. The THISL SDR system consists of a realtime version of a hybrid connectionist/HMM large vocabulary speech recognition system and a probabilistic text retrieval system. This paper describes the configuration of the speech recognition and text retrieval systems, including segmentation and query expansion. We report our results for development tests using the TREC-8 queries, and for the TREC-9 evaluation.
Hybrid Connectionist-Structural Acoustical Modeling In The Atros System
- In Proc. Eurospeech'99, 1999
Cited by 2 (2 self)
In this paper, we introduce several hybrid connectionist-structural acoustic models for context-independent phone-like units in the ATROS recognition system. The structural part of the acoustic models has been modeled with Markov chains, and a multilayer perceptron (or a committee of multilayer perceptrons) is used to estimate the emission probabilities of the Markov chains. We compare the recognition performance attained by these models with the performance obtained by classical continuous density hidden Markov models on a semantically restricted task.
1 Introduction
Acoustic-phonetic decoding for continuous speech recognition is an open problem in speech research, because the final performance of an automatic speech recognition system greatly depends on the acoustic modeling quality. Hidden Markov models (HMMs) of phone-like units are the most popular option for modeling speech sounds. Under the statistical framework [1], the problem of speech recognition is to search for a word string ^ ...
Mlp Emulation Of N-Gram Models As A First Step To Connectionist Language Modeling
- In: Proc. of the ICANN, 1992
Cited by 1 (1 self)
In problems such as automatic speech recognition and machine translation, where the system response must be a sentence in a given language, language models are employed in order to improve system performance. These language models are usually N-gram models (for instance, bigram or trigram models) which are estimated from large text databases using the occurrence frequencies of these N-grams.
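The frequency-based estimation mentioned in this abstract can be illustrated with a minimal sketch. It assumes a plain maximum-likelihood bigram estimate with no smoothing; the function name `bigram_probabilities`, the boundary markers `<s>` and `</s>`, and the toy corpus are hypothetical and do not come from the paper.

```python
from collections import Counter


def bigram_probabilities(sentences):
    """Estimate P(cur | prev) as count(prev, cur) / count(prev).

    Illustrative assumption: whitespace tokenisation, sentence boundary
    markers <s> and </s>, and no smoothing of unseen bigrams.
    """
    context_counts = Counter()  # how often each word appears as a left context
    bigram_counts = Counter()   # how often each (prev, cur) pair appears
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return {
        pair: count / context_counts[pair[0]]
        for pair, count in bigram_counts.items()
    }


# Toy corpus (hypothetical): "the" is followed once by "cat" and once by "dog".
probs = bigram_probabilities(["the cat sat", "the dog sat"])
print(probs[("the", "cat")])  # 0.5
```

A practical language model would add smoothing or back-off for unseen N-grams, which this sketch deliberately omits.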
Acoustic And Syntactical Modeling in the ATROS System
- 1999
Cited by 1 (1 self)
Current speech technology allows us to build efficient speech recognition systems. However, model learning of knowledge sources in a speech recognition system is not a closed problem. In addition, lower computational requirements are crucial to building real-time systems.
Modular Recurrent Neural Networks for Mandarin Syllable Recognition
- 1998
Cited by 1 (0 self)
A new modular recurrent neural network (MRNN)-based speech-recognition method that can recognize the entire vocabulary of 1280 highly confusable Mandarin syllables is proposed in this paper. The basic idea is to first split the complicated task, in both feature and temporal domains, into several much simpler subtasks involving subsyllable and tone discrimination, and then to use two weighting RNNs to generate several dynamic weighting functions to integrate the subsolutions into a complete solution. The novelty of the proposed method lies mainly in the use of appropriate a priori linguistic knowledge of simple initial-final structures of Mandarin syllables in the architecture design of the MRNN. The resulting MRNN is therefore effective and efficient in discriminating among highly confusable Mandarin syllables. Thus both the time-alignment and scaling problems of the ANN-based approach for large-vocabulary speech recognition can be addressed. Experimental results show that the proposed method and its extensions, the reverse-time MRNN (Rev-MRNN) and bidirectional MRNN (Bi-MRNN), all outperform an advanced HMM method trained with the MCE/GPD algorithm in both recognition rate and system complexity.
The Thisl Spoken Document Retrieval System
- 1998
INTRODUCTION
The THISL spoken document retrieval system is based on the ABBOT Large Vocabulary Continuous Speech Recognition (LVCSR) system developed by Cambridge University, Sheffield University and SoftSound, and uses PRISE (NIST) for indexing and retrieval. We participated in full SDR mode. Our approach was to transcribe the spoken documents at the word level using ABBOT, indexing the resulting text transcriptions using PRISE. The LVCSR system uses a recurrent network-based acoustic model (with no adaptation to different conditions) trained on the 50-hour Broadcast News training set, a 65,000-word vocabulary and a trigram language model derived from Broadcast News text. Words in queries which were out-of-vocabulary (OOV) were word spotted at query time (utilizing the posterior phone probabilities output by the acoustic model), added to the transcriptions of the relevant documents and the collection was then re-indexed. We generated pronunciati
Fast Phoneme Look-Ahead in the ATROS system
- Accepted in VIII Spanish Symposium on Pattern Recognition and Image Analysis, 1999
Current speech recognition systems require a lot of computational resources to decode an input utterance. Much effort has gone into reducing these requirements. One of the techniques being explored is fast phoneme look-ahead. The idea is to quickly compute approximate scores in order to prune unpromising hypotheses. These scores are computed by using simple phone-like units and analysing a look-ahead acoustic segment.
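The pruning idea in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the ATROS implementation: the function `prune_by_lookahead`, the hypothesis layout (a dict with a `next_phone` key), the log-score dictionary, and the `beam` threshold are all hypothetical.

```python
import math


def prune_by_lookahead(hypotheses, lookahead_scores, beam):
    """Keep hypotheses whose next phone scores within `beam` of the best phone.

    hypotheses:       list of dicts, each with a "next_phone" key (assumed layout)
    lookahead_scores: approximate log-scores per phone, computed cheaply on the
                      look-ahead acoustic segment (assumed to be given)
    beam:             maximum allowed log-score gap to the best-scoring phone
    """
    if not lookahead_scores:
        return list(hypotheses)  # nothing to prune against
    best = max(lookahead_scores.values())
    return [
        hyp
        for hyp in hypotheses
        # Phones without a look-ahead score are treated as impossible.
        if lookahead_scores.get(hyp["next_phone"], -math.inf) >= best - beam
    ]


# Hypothetical usage: the hypothesis expecting "k" falls outside the beam.
active = [{"next_phone": "a"}, {"next_phone": "b"}, {"next_phone": "k"}]
scores = {"a": -1.0, "b": -5.0, "k": -20.0}
survivors = prune_by_lookahead(active, scores, beam=10.0)
```

The detailed acoustic models are then evaluated only for the surviving hypotheses, which is where the computational saving would come from.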
The Thisl Sdr System At Trec-8
- Proc. of the 8th Text Retrieval Conference TREC-8, Nov 1999. Martine Adda-Decker, Gilles Adda
This paper describes the participation of the THISL group at the TREC-8 Spoken Document Retrieval (SDR) track. The THISL SDR system consists of the realtime version of the ABBOT large vocabulary speech recognition system and the THISLIR text retrieval system. The TREC-8 evaluation assessed SDR performance on a corpus of 500 hours of broadcast news material collected over a five-month period. The main test condition involved retrieval of stories defined by manual segmentation of the corpus, in which non-news material, such as commercials, was excluded. An optional test condition required retrieval of the same stories from the unsegmented audio stream. The THISL SDR system participated in both test conditions. The results show that a system such as THISL can produce respectable information retrieval performance on a realistically sized corpus of unsegmented audio material.
1. INTRODUCTION
The TREC-8 test collection was obtained from the TDT2 corpus and consisted of 902 shows (...
Speech Recognition Issues for Dutch Spoken Document Retrieval
- 2001
This paper describes ongoing work on the development of the speech recognition modules of an MMIR environment for Dutch. The work on the generation of acoustic models and language models is presented along with their current performance. Some characteristics of the Dutch language and of the target video archives that require special treatment are discussed.

