Results 1 - 10 of 25
SpeechSkimmer: A System for Interactively Skimming Recorded Speech
- ACM Transactions on Computer Human Interaction
, 1997
Cited by 85 (1 self)
Note that the text that appeared in printed journal contains very minor typographic and grammatical corrections that do not appear in this version. SpeechSkimmer:
SpeechSkimmer: Interactively Skimming Recorded Speech
, 1993
Cited by 67 (2 self)
Skimming or browsing audio recordings is much more difficult than visually scanning a document because of the temporal nature of audio. By exploiting properties of spontaneous speech it is possible to automatically select and present salient audio segments in a time-efficient manner. Techniques for segmenting recordings and a prototype user interface for skimming speech are described. The system developed incorporates time-compressed speech and pause removal to reduce the time needed to listen to speech recordings. This paper presents a multi-level approach to auditory skimming, along with user interface techniques for interacting with the audio and providing feedback. Several time compression algorithms and an adaptive speech detection technique are also summarized. KEYWORDS Speech skimming, browsing, speech user interfaces, interactive listening, time compression, speech detection, speech as data, non-speech audio. INTRODUCTION This paper describes SpeechSkimmer, a user interface...
Subword-based Approaches for Spoken Document Retrieval
, 2000
Cited by 40 (0 self)
This thesis explores approaches to the problem of spoken document retrieval (SDR), which is the task of automatically indexing and then retrieving relevant items from a large collection of recorded speech messages in response to a user specified natural language text query. We investigate the use of subword unit representations for SDR as an alternative to words generated by either keyword spotting or continuous speech recognition. Our investigation is motivated by the observation that word-based retrieval approaches face the problem of either having to know the keywords to search for a priori, or requiring a very large recognition vocabulary in order to cover the contents of growing and diverse message collections. The use of subword units in the recognizer constrains the size of the vocabulary needed to cover the language; and the use of subword units as indexing terms allows for the detection of new user-specified query terms during retrieval. Four
Video Mail Retrieval: The Effect of Word Spotting Accuracy on Precision
- Proceedings of ICASSP 95
, 1995
Cited by 26 (7 self)
The goal of the Video Mail Retrieval project is to integrate state-of-the-art document retrieval methods with high accuracy word spotting to yield a robust and efficient retrieval system. This paper describes a preliminary study to determine the extent to which retrieval precision is affected by word spotting performance. It includes a description of the database design, the word spotting algorithm, and the information retrieval method used. Results are presented which show audio retrieval performance very close to that of text.
A Fast Lattice-Based Approach to Vocabulary Independent Wordspotting
- In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing
, 1994
Cited by 26 (7 self)
Practical applications of wordspotting, such as spoken message retrieval and browsing, require the ability to process large amounts of speech data at speeds many times faster than real-time. This paper presents a novel approach to this problem in which all of the stored audio material is preprocessed off-line to generate a phoneme lattice. At search time, putative word matches are found in this lattice using symmetric dynamic programming. The paper presents the details of the algorithms used and compares performance with a number of conventional approaches using a 20 keyword vocabulary on the DARPA Resource Management Task. The results show that the proposed method is very much faster yet performs acceptably compared to conventional systems which depend on keyword-specific training or prior knowledge of the test set vocabulary. 1. INTRODUCTION In recent years, computers have become increasingly able to manipulate non-textual data, and applications such as video and voice mail have ari...
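The search step in the abstract above can be illustrated with a minimal sketch. It assumes a much simpler setting than the paper's: a single best phoneme string instead of a lattice, unit edit costs, and plain approximate substring matching rather than the authors' symmetric dynamic programming. The function name `spot_keyword`, the `max_cost` parameter, and the phoneme labels are hypothetical.

```python
def spot_keyword(keyword, stream, max_cost=1):
    """Find approximate occurrences of a keyword's phoneme sequence.

    keyword, stream: lists of phoneme labels (hypothetical labels).
    Returns (end_index, cost) pairs where the keyword matches a
    substring of `stream` ending at end_index with edit cost <= max_cost.
    Simplification: one phoneme string, not a lattice.
    """
    m = len(keyword)
    # prev[i]: cheapest alignment of keyword[:i] before any stream symbol
    prev = list(range(m + 1))
    hits = []
    for j, phoneme in enumerate(stream):
        cur = [0]  # cost 0 for the empty prefix: a match may start anywhere
        for i in range(1, m + 1):
            substitute = prev[i - 1] + (keyword[i - 1] != phoneme)
            extra_in_stream = prev[i] + 1        # stream has a spurious phoneme
            missing_in_stream = cur[i - 1] + 1   # stream dropped a keyword phoneme
            cur.append(min(substitute, extra_in_stream, missing_in_stream))
        if cur[m] <= max_cost:
            hits.append((j, cur[m]))
        prev = cur
    return hits


keyword = ["s", "p", "iy", "ch"]
stream = ["dh", "ax", "s", "p", "iy", "ch", "w", "ah", "z"]
print(spot_keyword(keyword, stream, max_cost=0))
```

Because the stored audio is decoded to phonemes once, off-line, a new keyword only costs one pass of this table over the stored symbols, which is the property that makes vocabulary-independent search fast.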
Metadata for Integrating Speech Documents in a Text Retrieval System
- SIGMOD Record
, 1994
"... CH-8092 Zürich (Switzerland) ..."
A System For Unrestricted Topic Retrieval From Radio News Broadcasts
- Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
, 1996
Cited by 18 (0 self)
The "topic classification" systems described in the speech literature typically partition a collection of spoken messages into a small number of pre-defined topics. As such, they are only useful if the set of message topics does not vary over time. However, the techniques of textual information retrieval (IR) have long allowed for retrieval by arbitrary subject from a document collection. This paper describes experiments in unrestricted retrieval from a collection of radio news broadcasts. A hybrid message indexing strategy, with conventional word recognition and a fast lattice-based wordspotter, allows for the retrieval of news reports concerning any subject. The results show that retrieval can be carried out extremely quickly and that high accuracy is possible, even with errorful recognition output. 1. THE MESSAGE RETRIEVAL PROBLEM There is considerable interest, in the speech research community, in the automatic classification of spoken-word recordings solely by their acoustic cont...
Speech-based retrieval using semantic co-occurrence filtering
- In Proc. ARPA Human Language Technology Workshop, Plainsboro, NJ
, 1994
Cited by 18 (1 self)
In this paper we demonstrate that speech recognition can be effectively applied to information retrieval (IR) applications. Our system exploits the fact that the intended words of a spoken query tend to co-occur in text documents in close proximity, whereas word combinations that are the result of recognition errors are usually not semantically correlated and thus do not appear together. Termed "Semantic Co-occurrence Filtering", this enables the system to simultaneously disambiguate word hypotheses and find relevant text for retrieval. The system is built by integrating standard IR and speech recognition techniques. An evaluation of the system is presented and we discuss several refinements to the functionality.
Speaker Identification Based Text To Audio Alignment For An Audio Retrieval System
, 1997
Cited by 14 (0 self)
We report on an audio retrieval system which lets Internet users efficiently access a large audio database containing recordings of the proceedings of the United States House of Representatives. The audio has been temporally aligned to text transcripts of the proceedings (which are manually generated by the U.S. Government) using a novel method based on speaker identification. Speaker sequence and approximate timing information is extracted from the text transcript and used to constrain a Viterbi alignment of speaker models to the observed audio. Speakers are modeled by computing Gaussian statistics of cepstral coefficients extracted from samples of each person's speech. The speaker identification is used to locate speaker transition points in the audio which are then linked to corresponding speaker transitions in the text transcript. The alignment system has been successfully integrated into a World Wide Web based search and browse system as an experimental service on the Internet.
Cross-Language Speech Retrieval: Establishing a Baseline Performance
- In Proceedings of the 20th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
, 1997
Cited by 12 (1 self)
We present here the realisation of a cross-language speech retrieval system which retrieves German speech documents in response to user queries specified as French text. This has been achieved through the integration of two existing modules of the SPIDER information retrieval system, namely the query pseudo-translation module and the speech retrieval module. Our approach to cross-language retrieval uses an automatically constructed corpus-based information structure called a similarity thesaurus. A similarity thesaurus can be constructed over any loosely comparable corpus - a parallel corpus is not necessary. The similarity thesaurus used here was constructed over a 330 MByte corpus of comparable German and French news stories. Our speech retrieval module is based on a speaker-independent phoneme recognizer and it indexes speech documents by N-grams of phonemic features. The speech retrieval module includes an additional probabilistic matching technique designed to aid retrieval from e...
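Indexing speech documents by phoneme N-grams, as this abstract describes, might look like the sketch below. It is a simplification resting on several assumptions: documents are already decoded to plain phoneme label sequences (the paper indexes N-grams of phonemic features), N is fixed at 3, and scoring is a raw count of shared N-grams rather than the paper's probabilistic matching. All function names and phoneme labels are hypothetical.

```python
from collections import Counter, defaultdict


def ngrams(phonemes, n=3):
    """All contiguous n-grams of a phoneme sequence, as tuples."""
    return [tuple(phonemes[i:i + n]) for i in range(len(phonemes) - n + 1)]


def build_index(docs, n=3):
    """Inverted index: n-gram -> Counter of document ids.

    docs: dict mapping a document id to its recognized phoneme list.
    """
    index = defaultdict(Counter)
    for doc_id, phonemes in docs.items():
        for gram in ngrams(phonemes, n):
            index[gram][doc_id] += 1
    return index


def search(index, query_phonemes, n=3):
    """Rank documents by the number of n-gram occurrences shared with the query."""
    scores = Counter()
    for gram in ngrams(query_phonemes, n):
        for doc_id, count in index.get(gram, {}).items():
            scores[doc_id] += count
    return scores.most_common()


docs = {
    "news-1": "h aw s b r ae n t".split(),
    "news-2": "k ae t".split(),
}
index = build_index(docs)
print(search(index, "h aw s".split()))
```

Since the index holds subword units instead of words, a query term outside any recognition vocabulary can still be matched once it is converted to phonemes.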

