Results 1 - 6 of 6
An overview of audio information retrieval
, 1999
Cited by 112 (1 self)
The problem of audio information retrieval is familiar to anyone who has returned from vacation to find an answering machine full of messages. While there is not yet an “AltaVista” for the audio data type, many workers are finding ways to automatically locate, index, and browse audio using recent advances in speech recognition and machine listening. This paper reviews the state of the art in audio information retrieval, and presents recent advances in automatic speech recognition, word spotting, speaker and music identification, and audio similarity with a view towards making audio less “opaque”. A special section addresses intelligent interfaces for navigating and browsing audio and multimedia documents, using automatically derived information to go beyond the tape recorder metaphor.
Robust Talker-Independent Audio Document Retrieval
, 1996
Cited by 8 (3 self)
The goal of the Video Mail Retrieval (VMR) project is to integrate state-of-the-art document retrieval methods with speech recognition to yield a robust and efficient retrieval system. The work presented here extends VMR towards an open-vocabulary, talker-independent system for retrieving spontaneously spoken audio and video messages. We present results showing successful retrieval using a standard large-vocabulary (LV) recogniser, despite the lack of a matched language model and vocabulary. We further show that integrating an LV recogniser with conventional word spotting (WS) gives more robust retrieval performance than either method alone. This paper gives details of the message archive used, the speech recognition methodologies, the information retrieval methods, and experimental results.
Hub4 language modeling using domain interpolation and data clustering
- In Proceedings DARPA Speech Recognition Workshop
, 1997
Cited by 5 (2 self)
In SRI’s language modeling experiments for the Hub4 domain, three basic approaches were pursued: interpolating multiple models estimated from Hub4 and non-Hub4 training data, adapting the language model (LM) to the focus conditions, and adapting the LM to different topic types. In the first approach, we built separate LMs for the closely transcribed Hub4 material (acoustic training transcripts) and the loosely transcribed Hub4 material (LM training data), as well as the North-American Business News (NABN) and Switchboard training data, projected onto the Hub4 vocabulary. By interpolating the probabilities obtained from these models, we obtained a 20% reduction in perplexity and a 1.8% reduction in word error rate, compared to a baseline Hub4-only language model. Two adaptation approaches are also described: adapting language models to the speech styles correlated with different focus conditions, and building cluster-specific LM mixtures. These two approaches give some reduction in perplexity, but no significant reduction in word error. Finally, we identify the problems and future directions of our work.
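As a rough illustration of the interpolation idea in this abstract, the sketch below mixes per-word probabilities from two component models with fixed weights and compares perplexities. It is not the paper's system: the function names, the probability values, and the weights are all hypothetical, and a real setup would typically estimate the weights on held-out data.

```python
import math


def interpolate(probs, weights):
    """Linear interpolation: P(w | h) = sum_i lambda_i * P_i(w | h)."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("interpolation weights must sum to 1")
    return sum(lam * p for lam, p in zip(weights, probs))


def perplexity(word_probs):
    """Perplexity of a word sequence given its per-word probabilities."""
    log_sum = sum(math.log(p) for p in word_probs)
    return math.exp(-log_sum / len(word_probs))


# Hypothetical per-word probabilities assigned to the same four test words
# by an in-domain model and an out-of-domain model (made-up numbers).
in_domain_probs = [0.10, 0.02, 0.30, 0.05]
out_of_domain_probs = [0.05, 0.08, 0.10, 0.20]
weights = [0.6, 0.4]  # assumed mixture weights, not taken from the paper

mixed_probs = [
    interpolate([p_in, p_out], weights)
    for p_in, p_out in zip(in_domain_probs, out_of_domain_probs)
]

print("in-domain perplexity:", perplexity(in_domain_probs))
print("interpolated perplexity:", perplexity(mixed_probs))
```

On these made-up numbers the mixture helps because the out-of-domain model assigns more mass to the words the in-domain model finds unlikely; with other numbers it may not.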
Automatic language identification using large vocabulary continuous speech recognition
- In Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing
, 1996
Cited by 4 (0 self)
We have developed a highly accurate automatic language identification system based on large vocabulary continuous speech recognition (LVCSR). Each test utterance is recognized in a number of languages, and the language ID decision is based on the probability of the output word sequence reported by each recognizer. Recognizers were implemented for this test in English, Japanese, and Spanish, using the Ricardo corpus of telephone monologues. When tested on the OGI corpus of digitally recorded telephone speech, we obtained error rates of 3% or lower on 2-way and 3-way closed-set classification of ten-second and one-minute speech segments.
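The decision rule described in this abstract can be sketched as picking the language whose recognizer reports the highest score for the utterance. The sketch below assumes each recognizer returns a single log-probability; the function name and the score values are hypothetical, and the actual system may normalize or otherwise adjust scores before comparing them.

```python
def identify_language(scores):
    """Return the language whose recognizer reported the highest log-probability.

    `scores` maps a language name to the log-probability that language's
    recognizer assigned to its own output word sequence (assumed interface).
    """
    if not scores:
        raise ValueError("at least one recognizer score is required")
    return max(scores, key=scores.get)


# Hypothetical log-probabilities for one test utterance (made-up numbers).
utterance_scores = {"english": -120.5, "japanese": -150.2, "spanish": -133.0}
print(identify_language(utterance_scores))
```

A closed-set 2-way test corresponds to passing only two of the languages in `scores`.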
Language identification via large vocabulary speaker independent continuous speech recognition
- In Proceedings of ARPA Human Language Technology
, 1994
Cited by 3 (0 self)
The goal of this study is to evaluate the potential for using large vocabulary continuous speech recognition as an engine for automatically classifying utterances according to the language being spoken. The problem of language identification is often thought of as being separate from the problem of speech recognition. But in this paper, as in Dragon's earlier work on topic and speaker identification, we explore a unifying approach to all three message classification problems based on the underlying stochastic process which gives rise to speech. We discuss the theoretical framework upon which our message classification systems are built and report on a series of experiments in which this theory is tested, using large vocabulary continuous speech recognition to distinguish English from Spanish.
Hub4 Language Modeling Using Domain Interpolation and Data Clustering
, 1997
In SRI's language modeling experiments for the Hub4 domain, three basic approaches were pursued: interpolating multiple models estimated from Hub4 and non-Hub4 training data, adapting the language model (LM) to the focus conditions, and adapting the LM to different topic types. In the first approach, we built separate LMs for the closely transcribed Hub4 material (acoustic training transcripts) and the loosely transcribed Hub4 material (LM training data), as well as the North-American Business News (NABN) and Switchboard training data, projected onto the Hub4 vocabulary. By interpolating the probabilities obtained from these models, we obtained a 20% reduction in perplexity and a 1.8% reduction in word error rate, compared to a baseline Hub4-only language model. Two adaptation approaches are also described: adapting language models to the speech styles correlated with different focus conditions, and building cluster-specific LM mixtures. These two approaches give some reduction in per...

