Results 1 - 10 of 24
An overview of audio information retrieval
, 1999
Cited by 153 (1 self)
Abstract
The problem of audio information retrieval is familiar to anyone who has returned from vacation to find an answering machine full of messages. While there is not yet an “AltaVista” for the audio data type, many workers are finding ways to automatically locate, index, and browse audio using recent advances in speech recognition and machine listening. This paper reviews the state of the art in audio information retrieval and presents recent advances in automatic speech recognition, word spotting, speaker and music identification, and audio similarity, with a view toward making audio less “opaque”. A special section addresses intelligent interfaces for navigating and browsing audio and multimedia documents, using automatically derived information to go beyond the tape-recorder metaphor.
The Application of Classical Information Retrieval Techniques to Spoken Documents
, 1995
Cited by 46 (1 self)
Abstract
[Table 4.1: Message classes in classification experiments of Rose et al. The extracted classes include Object Description, General Discussion, Map Reading, Photographic Interpretation, and Cartoon Description.]
Now, an estimate of $I(C_i; w_k)$ can be calculated by a four-way partition of the set of test messages, depending on (a) whether or not a message belongs to topic class $C_i$ and (b) whether or not it contains word $w_k$. If $N$ is the number of messages in the test collection, $R_i$ is the number belonging to topic class $C_i$, $n_k$ is the number of messages containing word $w_k$, and $r_{ik}$ is the number of messages in class $C_i$ containing word $w_k$, then, estimating the probabilities by frequency counts,

$$I(C_i; w_k) = \log \frac{r_{ik} / R_i}{n_k / N}.$$

This is actually identical to a form of retrospective term relevance weight, initially proposed in the IR literature by both Barkla [66] and Miller [67], and reviewed by Robertson and Sparck Jones in their classic paper on the subject [42]. Moreover, Rose proposed, but did no...
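The frequency-count estimate above can be computed directly from message counts. The Python sketch below is illustrative only: it assumes each message has been reduced to a topic label and a set of recognized words, it uses the natural logarithm (the excerpt does not state a base), and the toy messages and the helper names `partition_counts` and `term_class_mi` are hypothetical.

```python
import math
from typing import Iterable, Set, Tuple

# Hypothetical representation: (topic label, set of words found in the message).
Message = Tuple[str, Set[str]]


def partition_counts(messages: Iterable[Message], topic: str,
                     word: str) -> Tuple[int, int, int, int]:
    """Return (N, R_i, n_k, r_ik) from the four-way partition of the messages."""
    N = R_i = n_k = r_ik = 0
    for label, words in messages:
        in_class = label == topic
        has_word = word in words
        N += 1
        R_i += int(in_class)
        n_k += int(has_word)
        r_ik += int(in_class and has_word)
    return N, R_i, n_k, r_ik


def term_class_mi(N: int, R_i: int, n_k: int, r_ik: int) -> float:
    """Estimate I(C_i; w_k) = log((r_ik / R_i) / (n_k / N)) by frequency counts."""
    if r_ik == 0:
        return float("-inf")  # word never seen in the class: log(0)
    return math.log((r_ik / R_i) / (n_k / N))


msgs = [("Map Reading", {"north", "left"}), ("Map Reading", {"north"}),
        ("Cartoon Description", {"dog"}), ("Cartoon Description", {"north", "dog"})]
counts = partition_counts(msgs, "Map Reading", "north")  # (4, 2, 3, 2)
print(round(term_class_mi(*counts), 3))                  # log(4/3) -> 0.288
```

A positive value means the word is more common inside the class than in the collection overall, which is why the same quantity can serve as a retrospective relevance weight.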
Discriminating Capabilities of Syllable-based Features and Approaches of Utilizing Them for Voice Retrieval of Speech Information in Mandarin Chinese
- IEEE Trans. on Speech and Audio Processing
, 2002
Cited by 31 (14 self)
Abstract
With the rapidly growing use of audio and multimedia information over the Internet, technology for retrieving speech information using voice queries is becoming increasingly important. In this paper, considering the monosyllabic structure of the Chinese language, a whole class of syllable-based indexing features, including overlapping segments of syllables and syllable pairs separated by a few syllables, is extensively investigated on a Mandarin broadcast news database. The strong discriminating capabilities of such syllable-based features were verified by comparison with word- or character-based features. Approaches for better exploiting these capabilities, including fusion with word- and character-level information and improved methods for obtaining syllable-based features and query expressions, were also extensively investigated, with very encouraging experimental results. Index Terms: confidence measure, retrieval of speech information, syllable-based features, term association matrix.
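As a rough illustration of the two feature families named above, the following Python sketch extracts overlapping syllable n-grams and syllable pairs separated by a fixed number of intervening syllables. The toneless pinyin input, the cut-offs `max_n` and `max_gap`, and the string encoding of each feature are assumptions made for illustration, not the paper's settings.

```python
from collections import Counter
from typing import List


def syllable_ngrams(sylls: List[str], n: int) -> List[str]:
    """Overlapping segments of n consecutive syllables."""
    return [" ".join(sylls[i:i + n]) for i in range(len(sylls) - n + 1)]


def syllable_skip_pairs(sylls: List[str], gap: int) -> List[str]:
    """Syllable pairs separated by exactly `gap` intervening syllables."""
    return [f"{sylls[i]}_{sylls[i + gap + 1]}" for i in range(len(sylls) - gap - 1)]


def syllable_features(sylls: List[str], max_n: int = 3, max_gap: int = 2) -> Counter:
    """Bag of syllable n-grams (lengths 1..max_n) plus skip pairs (gaps 1..max_gap)."""
    feats: Counter = Counter()
    for n in range(1, max_n + 1):
        feats.update(syllable_ngrams(sylls, n))
    for gap in range(1, max_gap + 1):
        feats.update(syllable_skip_pairs(sylls, gap))
    return feats


sylls = ["tai", "wan", "da", "xue"]   # toneless pinyin, for illustration only
print(syllable_ngrams(sylls, 2))      # ['tai wan', 'wan da', 'da xue']
print(syllable_skip_pairs(sylls, 1))  # ['tai_da', 'wan_xue']
```

Because every feature is built from syllables rather than words, a query and a document can still share features when word segmentation or word recognition disagrees.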
A system for spoken query information retrieval on mobile devices
- IEEE Trans. on Speech and Audio Processing
, 2002
Cited by 28 (0 self)
Abstract
With the proliferation of handheld devices, information access on mobile devices is a topic of growing relevance. This paper presents a system that allows the user to search for information on mobile devices using spoken natural-language queries. We explore several issues related to the creation of this system, which combines state-of-the-art speech-recognition and information-retrieval technologies. This is the first work we are aware of that evaluates spoken-query-based information retrieval on a commonly available and well-researched text database: the Chinese news corpus used in the National Institute of Standards and Technology (NIST) TREC-5 and TREC-6 benchmarks. To compare spoken-query retrieval performance across different usage scenarios and recognition accuracies, the benchmark queries, read verbatim by 20 speakers, were recorded simultaneously through three channels: headset microphone, PDA microphone, and cellular phone. Our results show that for mobile devices with high-quality microphones, spoken-query retrieval based on existing technologies yields retrieval precision close to that for perfect text input (mean average precision 0.459 and 0.489, respectively, on TREC-6).
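The figures quoted above are mean average precision (MAP) scores. A minimal sketch of the standard non-interpolated definition follows, assuming each query comes with a ranked list of document ids and a set of relevant ids. The toy runs are made up, and this is not the evaluation code used in the paper.

```python
from typing import List, Sequence, Set, Tuple


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Non-interpolated AP: precision at each relevant document's rank, averaged
    over all relevant documents (unretrieved ones contribute zero)."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(runs: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """MAP over queries; each run is (ranked doc ids, relevant doc ids)."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


runs = [(["d1", "d2", "d3", "d4"], {"d1", "d3"}),  # AP = (1/1 + 2/3) / 2
        (["d5", "d6"], {"d6", "d9"})]              # AP = (1/2) / 2
print(round(mean_average_precision(runs), 4))      # 0.5417
```

Since recognition errors mostly push relevant documents down the ranking rather than removing them outright, MAP is a natural way to compare spoken and typed queries on the same benchmark.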
A System For Unrestricted Topic Retrieval From Radio News Broadcasts
- Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
, 1996
Cited by 25 (0 self)
Abstract
The "topic classification" systems described in the speech literature typically partition a collection of spoken messages into a small number of predefined topics. As such, they are only useful if the set of message topics does not vary over time. However, the techniques of textual information retrieval (IR) have long allowed retrieval by arbitrary subject from a document collection. This paper describes experiments in unrestricted retrieval from a collection of radio news broadcasts. A hybrid message indexing strategy, combining conventional word recognition with a fast lattice-based wordspotter, allows the retrieval of news reports concerning any subject. The results show that retrieval can be carried out extremely quickly and that high accuracy is possible, even with errorful recognition output.
1. THE MESSAGE RETRIEVAL PROBLEM
There is considerable interest, in the speech research community, in the automatic classification of spoken-word recordings solely by their acoustic cont...
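One way to read the hybrid strategy above is to answer query words inside the recognizer vocabulary from an index of recognizer transcripts and to send all other words to the wordspotter. The Python sketch below assumes exactly that split, ranks messages by simple coordination level (number of matched query words), and substitutes a stub callback for the lattice-based wordspotter. The class and method names are hypothetical, and the paper's actual design may differ.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Set


class HybridIndex:
    """Toy hybrid message index (hypothetical design, for illustration only)."""

    def __init__(self, vocabulary: Iterable[str], wordspot: Callable[[str], Set[str]]):
        self.vocab = set(vocabulary)            # recognizer vocabulary
        self.wordspot = wordspot                # term -> ids of messages it was spotted in
        self.postings: Dict[str, Set[str]] = defaultdict(set)

    def add_transcript(self, msg_id: str, words: Iterable[str]) -> None:
        """Index the word-recognizer transcript of one message."""
        for w in words:
            self.postings[w].add(msg_id)

    def lookup(self, term: str) -> Set[str]:
        if term in self.vocab:
            return set(self.postings.get(term, set()))
        return self.wordspot(term)              # out of vocabulary: fall back to spotting

    def search(self, query: Iterable[str]) -> List[str]:
        """Rank messages by how many query words they match."""
        scores: Dict[str, int] = defaultdict(int)
        for term in query:
            for msg in self.lookup(term):
                scores[msg] += 1
        return sorted(scores, key=lambda m: (-scores[m], m))


spotter = lambda term: {"m3"} if term == "hurricane" else set()  # stub wordspotter
idx = HybridIndex({"election", "minister", "weather"}, spotter)
idx.add_transcript("m1", ["election", "weather"])
idx.add_transcript("m2", ["minister", "election"])
idx.add_transcript("m3", ["weather"])
print(idx.search(["hurricane", "weather"]))  # ['m3', 'm1']
```

The point of the split is that a fixed-vocabulary index stays fast for common words, while the wordspotter keeps the set of searchable subjects open.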
On interfaces for mobile information retrieval
- In Proc. 4th Int. Symp. Human Computer Interaction with Mobile Devices
, 2002
Cited by 13 (5 self)
Abstract
We consider the task of retrieving online information in mobile environments. We propose question answering as a more appropriate interface than page browsing for small displays. We assess different modalities for communicating with question-answering systems from a mobile device, focusing on speech. We then survey existing research in spoken information retrieval, present some new findings, and assess the feasibility of the endeavor.
Robust Talker-Independent Audio Document Retrieval
, 1996
Cited by 10 (4 self)
Abstract
The goal of the Video Mail Retrieval (VMR) project is to integrate state-of-the-art document retrieval methods with speech recognition to yield a robust and efficient retrieval system. The work presented here extends VMR towards an open-vocabulary, talker-independent system for retrieving spontaneously spoken audio and video messages. We present results showing successful retrieval using a standard large-vocabulary (LV) recogniser, despite the lack of a matched language model and vocabulary. We further show that integrating an LV recogniser with conventional word spotting (WS) gives more robust retrieval performance than either method alone. This paper gives details of the message archive used, the speech recognition methodologies, the information retrieval methods, and experimental results.
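The abstract does not say how the LV recogniser and word spotting are combined. One simple option, sketched here purely as an assumption, is to max-normalize the retrieval scores from each system and interpolate them linearly. The weight `lam`, the function names, and the toy scores are illustrative only.

```python
from typing import Dict, List, Tuple


def max_normalise(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale scores into [0, 1] by the top score so the two systems are comparable."""
    top = max(scores.values(), default=0.0)
    return {d: s / top for d, s in scores.items()} if top > 0 else dict(scores)


def fuse(lv_scores: Dict[str, float], ws_scores: Dict[str, float],
         lam: float = 0.5) -> List[Tuple[str, float]]:
    """Linearly interpolate LV-recogniser and word-spotting scores per message.
    A message missing from one system gets 0 from that system."""
    lv, ws = max_normalise(lv_scores), max_normalise(ws_scores)
    fused = {d: lam * lv.get(d, 0.0) + (1 - lam) * ws.get(d, 0.0)
             for d in set(lv) | set(ws)}
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


print(fuse({"a": 4.0, "b": 2.0}, {"b": 1.0, "c": 0.5}))
# [('b', 0.75), ('a', 0.5), ('c', 0.25)]
```

A message found by both systems ("b" here) can overtake one that only a single system scores highly, which is one way combined evidence can make retrieval more robust than either source alone.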
A method for open-vocabulary speech-driven text retrieval
- in Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing, 2002
Cited by 8 (3 self)
Abstract
While recent retrieval techniques do not limit the number of index terms, out-of-vocabulary (OOV) words remain a crucial problem in speech recognition. Aiming at retrieving information with spoken queries, we fill the gap between speech recognition and text retrieval in terms of vocabulary size. Given a spoken query, we generate a transcription and detect OOV words through speech recognition. We then map the detected OOV words to terms indexed in the target collection to complete the transcription, and search the collection for documents relevant to the completed transcription. Experiments show the effectiveness of our method.
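The abstract does not specify how a detected OOV word is matched to index terms. The minimal sketch below makes two assumptions: the recognizer reports each OOV region as a phone sequence, and every index term has a known pronunciation. Under those assumptions it picks the term with the smallest Levenshtein distance. The lexicon, the phone symbols, and the function names are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple, Union

Phones = Tuple[str, ...]
Token = Union[str, Phones]  # a recognized word, or the phones of a detected OOV word


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phone sequences (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution or match
        prev = cur
    return prev[-1]


def complete_transcription(tokens: List[Token], lexicon: Dict[str, Phones]) -> List[str]:
    """Replace each detected OOV phone string with the closest-sounding index term."""
    out = []
    for tok in tokens:
        if isinstance(tok, str):
            out.append(tok)
        else:
            out.append(min(lexicon, key=lambda term: (edit_distance(tok, lexicon[term]), term)))
    return out


lexicon = {"tokyo": ("t", "o", "o", "k", "y", "o"),
           "kyoto": ("k", "y", "o", "o", "t", "o"),
           "osaka": ("o", "o", "s", "a", "k", "a")}
print(complete_transcription(["weather", "in", ("t", "o", "k", "y", "o")], lexicon))
# ['weather', 'in', 'tokyo']
```

The completed transcription can then be passed to any ordinary text retrieval engine, which is the gap-filling step the abstract describes.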
A Speech-In List-Out Approach to Spoken User Interfaces
- in Proc. Human Language Technologies 2004
, 2004
Cited by 7 (4 self)
Abstract
Spoken user interfaces are conventionally either dialogue-based or menu-based. In this paper we propose a third approach, in which the task of invoking responses from the system is treated as one of retrieval from the set of all possible responses. Unlike conventional spoken user interfaces that return a unique response to the user, the proposed interface returns a shortlist of possible responses, from which the user must make the final selection. We refer to such interfaces as Speech-In List-Out or SILO interfaces. Experiments show that SILO interfaces can be very effective, are highly robust to degraded speech recognition performance, and can impose significantly lower cognitive load on the user as compared to menu-based interfaces.
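To make the retrieval framing concrete, the sketch below treats every possible system response as a document and returns a ranked shortlist rather than a single answer. Word-overlap scoring, the shortlist size `k`, and the example responses are assumptions for illustration. The abstract does not describe the paper's actual retrieval model.

```python
from typing import List, Sequence


def silo_shortlist(recognized: Sequence[str], responses: Sequence[str], k: int = 3) -> List[str]:
    """Rank all possible responses by word overlap with the (possibly errorful)
    recognized query and return the top k for the user to choose from."""
    query = {w.lower() for w in recognized}
    scored = []
    for resp in responses:
        overlap = len(query & set(resp.lower().split()))
        if overlap > 0:
            scored.append((-overlap, resp))  # negate so sort() puts the best first
    scored.sort()
    return [resp for _, resp in scored[:k]]


responses = ["call john smith", "call mary jones", "play jazz radio", "play news radio"]
print(silo_shortlist(["call", "john"], responses))  # ['call john smith', 'call mary jones']
```

Because the user picks from a list, a misrecognized word only needs to leave the intended response somewhere in the top `k`, rather than at rank one. This is one intuition for the robustness to degraded recognition reported above.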