Results 1-8 of 8
Spoken document retrieval for TREC-8 at Cambridge University
- In Proc. TREC-8
, 2000
Cited by 22 (5 self)
This paper presents work done at Cambridge University on the TREC-8 Spoken Document Retrieval (SDR) Track. The 500 hours of broadcast news audio was filtered using an automatic scheme for detecting commercials and then transcribed using a 2-pass HTK speech recogniser that ran at 13 times real time. The system gave an overall word error rate of 20.5% on the 10-hour scored subset of the corpus, the lowest in the track. Our retrieval engine used an Okapi scheme with traditional stopping and Porter stemming, enhanced with part-of-speech weighting on query terms, a stemmer exceptions list, semantic ‘poset’ indexing, parallel collection frequency weighting, both parallel and traditional blind relevance feedback, and document expansion using parallel blind relevance feedback. The final system gave an Average Precision of 55.29% on our transcriptions. For the case where story boundaries are unknown, a similar retrieval system, without the document expansion, was run on a set of “stories” derived by windowing the transcriptions after commercials had been removed. Boundaries were forced at “commercial” or “music” changes, and some recombination of temporally close stories was allowed after retrieval. When duplicate story hits and commercials were scored as irrelevant, this system gave an Average Precision of 41.47% on our transcriptions. The paper also presents results for cross-recogniser experiments using our retrieval strategies on transcriptions from our own first-pass output, AT&T, CMU, two NIST-run BBN baselines, LIMSI and Sheffield University, and shows the relationship between retrieval performance and transcription error rate.
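The Okapi scheme named in this abstract weights each query term by how often it occurs in a document (normalised for document length) and by how rare it is in the collection. A minimal Python sketch of one common form of the Okapi combined weight follows. It assumes documents and queries are already stopped and stemmed token lists and that documents are non-empty; the function name `okapi_scores` and the defaults `k=1.2`, `b=0.75` are illustrative assumptions, not the settings used in the paper, and none of the paper's enhancements (part-of-speech weighting, posets, parallel collection frequency weighting, relevance feedback, document expansion) are included.

```python
import math
from collections import Counter


def okapi_scores(query, docs, k=1.2, b=0.75):
    """Return an Okapi-style score for each tokenised document.

    Combined weight per matching query term:
        cfw * tf * (k + 1) / (k * ((1 - b) + b * ndl) + tf)
    where cfw = log(N / n) is the collection frequency weight (N documents,
    n of them containing the term) and ndl is the document length divided
    by the average document length. k and b are textbook defaults (assumed).
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter()
    for d in docs:
        doc_freq.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        ndl = len(d) / avg_len
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue  # term absent from this document contributes nothing
            cfw = math.log(n_docs / doc_freq[term])
            score += cfw * tf[term] * (k + 1) / (k * ((1 - b) + b * ndl) + tf[term])
        scores.append(score)
    return scores
```

With `docs = [["broadcast", "news", "retrieval"], ["speech", "recogniser"], ["broadcast", "news"]]`, the query `["news"]` scores the shorter third document above the first, which is the effect of the `b` length-normalisation term; a term present in every document gets `cfw = 0` and adds nothing.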
Audio Indexing and Retrieval of Complete Broadcast News Shows
, 2000
Cited by 6 (2 self)
This paper describes a system for retrieving relevant portions of complete broadcast news shows starting with only the audio data. A novel system for automatically detecting and removing commercials is described and shown to increase the performance of the system whilst also reducing the computational effort required. The sophisticated large-vocabulary speech recogniser that produces the high-quality transcriptions and the window-based retrieval system with post-merging are also described. Results are
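Window-based retrieval with post-merging, as described in this abstract, can be illustrated with a short sketch: a time-stamped transcript is cut into overlapping fixed-length windows that act as pseudo-documents, and retrieved windows from the same show that lie close together in time are merged afterwards. The helper names `make_windows` and `merge_hits` and the window length, step and merge gap are hypothetical choices for illustration; the paper's actual parameters are not given here, and forced boundaries at commercial or music changes are omitted.

```python
def make_windows(words, times, length=30.0, step=15.0):
    """Cut a time-stamped transcript into overlapping windows.

    words: recognised words; times: start time (seconds) of each word, ascending.
    Returns (start, end, tokens) for every non-empty window [start, end).
    length and step are hypothetical values; step must be positive.
    """
    windows = []
    if not words:
        return windows
    start = times[0]
    while start <= times[-1]:
        end = start + length
        tokens = [w for w, t in zip(words, times) if start <= t < end]
        if tokens:
            windows.append((start, end, tokens))
        start += step
    return windows


def merge_hits(hits, gap=45.0):
    """Merge retrieved windows from one show that lie within `gap` seconds.

    hits: (start, end, score) tuples; each merged hit keeps the best score.
    """
    merged = []
    for start, end, score in sorted(hits):
        if merged and start - merged[-1][1] <= gap:
            prev_start, prev_end, prev_score = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), max(prev_score, score))
        else:
            merged.append((start, end, score))
    return merged
```

Merging after retrieval, rather than before, lets each short window be scored on its own and only then collapses overlapping hits into one result per region of the show.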
Automatic Language Model Adaptation for Spoken Document Retrieval
- In Proceedings of the RIAO 2000 Conference on Content-Based Multimedia Information Access
, 2000
Cited by 6 (0 self)
This paper describes experiments carried out at NIST in adapting language models over time to improve recognition of broadcast news recorded over many months. These experiments were designed specifically to improve the utility of automatically generated transcripts for retrieval applications. To evaluate the potential of the approach, a time-adaptive automatic speech recognition run was implemented to support the 1999 TREC Spoken Document Retrieval (SDR) Track, covering more than 500 hours of broadcast news sampled across five months. The retrieval accuracy of several systems using the time-adaptive transcripts was evaluated against transcripts produced by virtually the same recognition system with a fixed language model. This paper details the process we employed to identify and implement the time-adaptive language model and discusses the effect of the experiment on word error rate, out-of-vocabulary rate and retrieval accuracy (Mean Average Precision).
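Retrieval accuracy in these SDR entries is reported as (mean) average precision. A minimal sketch of non-interpolated average precision, as commonly computed for TREC runs: precision is taken at the rank of each relevant document retrieved, summed, and divided by the total number of relevant documents; averaging over queries gives MAP. Counting a repeated document ID only at its first occurrence is an assumption here, chosen to mirror the rule of scoring duplicate story hits as irrelevant mentioned in the Cambridge TREC-8 abstract; the function names are illustrative.

```python
def average_precision(ranked, relevant):
    """Non-interpolated average precision for one query.

    ranked: document IDs in retrieval order; relevant: IDs judged relevant.
    A repeated ID counts as non-relevant after its first occurrence (assumption).
    """
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    seen = set()
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant and doc not in seen:
            hits += 1
            total += hits / rank  # precision at this relevant document
        seen.add(doc)
    # Relevant documents never retrieved contribute zero precision.
    return total / len(relevant)


def mean_average_precision(runs):
    """runs: list of (ranked, relevant) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

For example, retrieving `["d1", "d2", "d3", "d4"]` when `d1` and `d3` are relevant gives (1/1 + 2/3) / 2 = 5/6.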
The Cambridge University Multimedia Document Retrieval Demo System
- International Journal of Speech Technology
, 2000
Cited by 5 (3 self)
The Cambridge University Multimedia Document Retrieval Demo System is a web-based application that allows the user to query a database of automatically generated transcripts of radio broadcasts that are available on-line. The paper describes how speech recognition and information retrieval techniques are combined in this system and shows how the user can interact with it.
1 Introduction
To provide content-specific access to the vast amount of text data available on the Internet, search engines have been developed that operate on text documents of various formats (e.g. HTML). Since there is an increasing amount of audio data containing speech on the Internet, a similar tool that operates automatically on audio streams, without the need for manual transcription, is desirable. The Cambridge University Multimedia Document Retrieval (CU-MDR) demo system tries to fill this gap.
The THISL SDR system at TREC-9
- Abberley, Proceedings of TREC-9
, 2000
Cited by 3 (0 self)
This paper describes our participation in the TREC-9 Spoken Document Retrieval (SDR) track. The THISL SDR system consists of a real-time version of a hybrid connectionist/HMM large-vocabulary speech recognition system and a probabilistic text retrieval system. This paper describes the configuration of the speech recognition and text retrieval systems, including segmentation and query expansion. We report our results for development tests using the TREC-8 queries and for the TREC-9 evaluation.
The 1998 HTK Broadcast News Transcription System: Development and Results
, 1999
Cited by 3 (0 self)
This paper presents the development of the HTK broadcast news transcription system for the November 1998 Hub4 evaluation. Relative to the previous year's system, a number of features were added, including vocal tract length normalisation; cluster-based variance normalisation; a doubling of the quantity of acoustic training data; interpolated word-level language models to combine text sources; increased broadcast news language model training data; and an extra adaptation stage using a full-variance transform. Overall, these changes reduced the error rate by 13% on the 1997 evaluation data, and the final system had an overall word error rate of 13.8% on the 1998 evaluation data sets.
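The word error rates quoted in these entries (13.8% here, 20.5% for the Cambridge TREC-8 system) count the substitutions, deletions and insertions in a minimum-edit-distance alignment of the recogniser output against a reference transcript, divided by the number of reference words. A minimal sketch follows, assuming whitespace-tokenised text and ignoring the text normalisation rules applied by official NIST scoring tools; `word_error_rate` is an illustrative name.

```python
def word_error_rate(reference, hypothesis):
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # d[i][j]: minimum edits turning the first i reference words
    # into the first j hypothesis words (Levenshtein distance over words).
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(substitution, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)
```

Dropping one word from "the cat sat on the mat" gives a WER of 1/6; because insertions are counted, WER can exceed 100% when the hypothesis is much longer than the reference.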
Automatic Capitalisation Generation for Speech Input
Cited by 3 (0 self)
Two different systems are proposed for the task of capitalisation generation. The first system is a slightly modified speech recogniser in which every word in the vocabulary is duplicated: once in a decapitalised form and again in capitalised forms. In addition, the language model is retrained on mixed-case texts. The other system
Named entity recognition from speech and its use in the generation of enhanced speech recognition output
, 2001
Cited by 1 (0 self)
The work in this thesis concerns Named Entity (NE) recognition from speech and its use in the generation of enhanced speech recognition output with automatic punctuation and automatic capitalisation. A method for the automatic generation of rules is proposed for NE recognition. Punctuation marks are generated using context and prosody information. Capitalisation is produced based on the results of NE recognition and punctuation generation. Previous work on the NE task falls mainly into two categories: hand-crafted rule-based systems and stochastic systems. By contrast, this thesis proposes an automatic rule generation method that uses the Brill rule inference approach. The performance of the rule-based NE recogniser is compared with that of BBN’s commercial implementation, IdentiFinder. When only the word sequences are available, the two systems perform almost equally, as they also do when additional information such as punctuation, capitalisation and name lists is provided. When the input texts are corrupted by speech recognition errors, the performance of both systems degrades by almost the same amount. Although the rule-based approach is different
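The Brill rule inference approach mentioned here is transformation-based learning: starting from an initial tagging, the learner repeatedly picks the rule that most reduces tagging errors on training data, applies it, and records it. The sketch below is a deliberately tiny, hypothetical version with a single rule template (change tag A to B when the previous word is W) and NE tags as plain strings; the function names are illustrative, and it is not the thesis's rule generator, whose templates and features are richer.

```python
def apply_rule(words, tags, rule):
    """Apply (from_tag, to_tag, prev_word) to every matching position at once.

    Position 0 is never changed because the template needs a previous word.
    """
    from_tag, to_tag, prev_word = rule
    new_tags = list(tags)
    for i in range(1, len(words)):
        if tags[i] == from_tag and words[i - 1] == prev_word:
            new_tags[i] = to_tag
    return new_tags


def count_correct(tags, gold):
    return sum(t == g for t, g in zip(tags, gold))


def learn_rules(words, gold, initial, max_rules=10, min_gain=1):
    """Greedy transformation-based learning with one hypothetical rule template."""
    tags = list(initial)
    rules = []
    for _ in range(max_rules):
        # Propose candidate rules only from positions that are currently wrong.
        candidates = sorted({
            (tags[i], gold[i], words[i - 1])
            for i in range(1, len(words))
            if tags[i] != gold[i]
        })
        if not candidates:
            break
        base = count_correct(tags, gold)
        # Net gain = errors fixed minus correct tags broken by the rule.
        best_gain, best_rule = max(
            (count_correct(apply_rule(words, tags, r), gold) - base, r)
            for r in candidates
        )
        if best_gain < min_gain:
            break
        tags = apply_rule(words, tags, best_rule)
        rules.append(best_rule)
    return rules, tags
```

Trained on `["Mr", "Smith", "said", "Mr", "Jones", "left"]` with person tags on the surnames and an all-`"O"` initial tagging, this learns the single rule `("O", "PER", "Mr")`, which then tags `"Brown"` in `["Mr", "Brown"]` as `"PER"`.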

