An overview of audio information retrieval
1999
Cited by 112 (1 self)
The problem of audio information retrieval is familiar to anyone who has returned from vacation to find an answering machine full of messages. While there is not yet an “AltaVista” for the audio data type, many workers are finding ways to automatically locate, index, and browse audio using recent advances in speech recognition and machine listening. This paper reviews the state of the art in audio information retrieval, and presents recent advances in automatic speech recognition, word spotting, speaker and music identification, and audio similarity with a view towards making audio less “opaque”. A special section addresses intelligent interfaces for navigating and browsing audio and multimedia documents, using automatically derived information to go beyond the tape recorder metaphor.
A Probabilistic Framework For Feature-Based Speech Recognition
1996
Cited by 101 (24 self)
Most current speech recognizers use an observation space based on a temporal sequence of "frames" (e.g., Mel-cepstra). Another class of recognizer further processes these frames to produce a segment-based network, and represents each segment by fixed-dimensional "features." In such feature-based recognizers the observation space takes the form of a temporal network of feature vectors, so that any single segmentation of an utterance uses only a subset of all possible feature vectors. In this work we examine a maximum a posteriori (MAP) decoding strategy for feature-based recognizers and develop a normalization criterion useful for a segment-based Viterbi or A* search. We report experimental results for the task of phonetic recognition on the TIMIT corpus, where we achieved context-independent and context-dependent (using diphones) results of 64.1% and 69.5%, respectively, on the core test set.
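To make the idea of a segment-based search concrete, here is a minimal Python sketch of a Viterbi-style dynamic program over a segment network. It assumes a hypothetical input format in which each candidate segment (start frame, end frame) already carries a log score per label. It simply sums raw log scores along a path and does not implement the normalization criterion the paper develops, which is what makes paths that use different subsets of the feature vectors comparable.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical segment network: (start_frame, end_frame) -> {label: log score}.
SegmentScores = Dict[Tuple[int, int], Dict[str, float]]


def best_segmentation(scores: SegmentScores,
                      num_frames: int) -> Tuple[float, List[Tuple[int, int, str]]]:
    """Find the highest-scoring labelled segmentation of frames [0, num_frames)."""
    # best[t] = best total log score of a segment path covering frames [0, t).
    best = [-math.inf] * (num_frames + 1)
    back: List[Tuple[int, str]] = [(-1, "")] * (num_frames + 1)
    best[0] = 0.0
    for end in range(1, num_frames + 1):
        for (start, seg_end), labels in scores.items():
            if seg_end != end or best[start] == -math.inf:
                continue
            for label, log_score in labels.items():
                # Raw sum of log scores: no normalization across segmentations.
                candidate = best[start] + log_score
                if candidate > best[end]:
                    best[end] = candidate
                    back[end] = (start, label)
    # Trace back from the final boundary to recover the segment sequence.
    path: List[Tuple[int, int, str]] = []
    t = num_frames
    while t > 0:
        start, label = back[t]
        if start < 0:
            raise ValueError("no segmentation covers all frames")
        path.append((start, t, label))
        t = start
    path.reverse()
    return best[num_frames], path


scores = {(0, 2): {"a": -1.0, "b": -2.0}, (2, 5): {"c": -1.5}, (0, 5): {"a": -4.0}}
print(best_segmentation(scores, 5))  # (-2.5, [(0, 2, 'a'), (2, 5, 'c')])
```

In the example, the two-segment path wins with -2.5 over the single long segment at -4.0. With unnormalized scores like these, paths with more segments accumulate more negative terms, which hints at why a normalization criterion matters.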
Document and Passage Retrieval Based on Hidden Markov Models
- In Proceedings of the Seventeenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
1994
Cited by 49 (2 self)
A new approach to Information Retrieval, developed on the basis of Hidden Markov Models (HMMs), is introduced. HMMs are shown to provide a mathematically sound framework for retrieving both documents with predefined boundaries and entities of information of arbitrary length and format (passage retrieval). Our retrieval model is shown to offer several promising capabilities. First, the positions of occurrences of indexing features can be used for indexing. Positional information is essential, for instance, when considering phrases, negation, and the proximity of features. Second, optimal weights for arbitrary features can be derived automatically from training collections. Third, a query-dependent structure can be determined for every document by segmenting the documents into passages that are either relevant or irrelevant to the query. The theoretical analysis of our retrieval model is complemented by the results of preliminary experiments.
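The third capability, segmenting a document into passages that are relevant or irrelevant to a query, can be illustrated with a toy two-state HMM decoded by the Viterbi algorithm. This is a generic sketch, not the paper's retrieval model: the emission model (which only distinguishes query terms from other words), the state names, and the default probabilities are assumptions chosen for illustration.

```python
import math
from typing import List, Set

STATES = ("irrelevant", "relevant")


def segment_passages(words: List[str], query: Set[str], stay: float = 0.9,
                     p_query_rel: float = 0.5, p_query_irr: float = 0.05) -> List[str]:
    """Label each word 'relevant' or 'irrelevant' by Viterbi decoding of a 2-state HMM."""
    if not words:
        return []

    def emit(state: str, word: str) -> float:
        # Hypothetical emission model: P(word is a query term | state).
        p_q = p_query_rel if state == "relevant" else p_query_irr
        return math.log(p_q if word in query else 1.0 - p_q)

    trans = {(a, b): math.log(stay if a == b else 1.0 - stay)
             for a in STATES for b in STATES}
    # Uniform initial distribution over the two states.
    delta = {s: math.log(0.5) + emit(s, words[0]) for s in STATES}
    backptrs = []
    for word in words[1:]:
        new_delta, ptr = {}, {}
        for b in STATES:
            prev = max(STATES, key=lambda a: delta[a] + trans[(a, b)])
            new_delta[b] = delta[prev] + trans[(prev, b)] + emit(b, word)
            ptr[b] = prev
        delta = new_delta
        backptrs.append(ptr)
    # Backtrack from the best final state.
    state = max(STATES, key=lambda s: delta[s])
    labels = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        labels.append(state)
    return labels[::-1]


print(segment_passages("a b c d q q q q e f g h".split(), {"q"}))
# ['irrelevant'] * 4 + ['relevant'] * 4 + ['irrelevant'] * 4
```

With these default parameters, switching state is expensive enough that a single isolated query term does not open a passage on its own, while a run of several query terms does.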
Neural-Network Based Measures Of Confidence For Word Recognition
- in Proc. ICASSP
1997
Cited by 41 (4 self)
This paper proposes a probabilistic framework to define and evaluate confidence measures for word recognition. We describe a novel method to combine different knowledge sources and estimate the confidence in a word hypothesis, via a neural network. We also propose a measure of the joint performance of the recognition and confidence systems. The definitions and algorithms are illustrated with results on the Switchboard Corpus.
1. INTRODUCTION
In the last few years, a lot of research has been devoted to the development of confidence scores associated with the outputs of automatic speech recognition (ASR) systems. These scores were used mostly to help spot keywords in spontaneous or read texts, and to provide a basis for the rejection of out-of-vocabulary words (e.g. [4-11]). Many other ASR applications could also benefit from knowing the level of confidence in correct recognition. For example, text-dependent speaker recognition systems could put more emphasis on words recognized with h...
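As a rough illustration of combining several knowledge sources into one confidence value, the sketch below trains the simplest possible network, a single logistic unit, by batch gradient descent on cross-entropy. The paper's network architecture and inputs are not given in the abstract; the two per-word features in the example (a normalized acoustic score and the fraction of N-best hypotheses containing the word) and the toy data are hypothetical.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def confidence(w: Sequence[float], x: Sequence[float]) -> float:
    """Estimated probability that the word hypothesis with features x is correct."""
    # zip stops at len(x), so the bias w[-1] is added separately.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1])


def train_confidence(features: List[Sequence[float]], correct: List[int],
                     lr: float = 0.5, epochs: int = 500) -> List[float]:
    """Fit a logistic unit; returns weights with the bias stored last."""
    n = len(features[0])
    w = [0.0] * (n + 1)
    for _ in range(epochs):
        grad = [0.0] * (n + 1)
        for x, y in zip(features, correct):
            err = confidence(w, x) - y  # gradient of cross-entropy w.r.t. the logit
            for i, xi in enumerate(x):
                grad[i] += err * xi
            grad[-1] += err
        w = [wi - lr * g / len(features) for wi, g in zip(w, grad)]
    return w


# Hypothetical features per word: [acoustic score, N-best fraction]; 1 = correctly recognized.
X = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.7], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2]]
y = [1, 1, 1, 0, 0, 0]
w = train_confidence(X, y)
print(confidence(w, [0.85, 0.85]), confidence(w, [0.15, 0.2]))
```

A multi-layer network would replace the single weighted sum with hidden units, but the training signal (predicted probability minus the 0/1 correctness label) stays the same.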
Subword-based Approaches for Spoken Document Retrieval
2000
Cited by 40 (0 self)
This thesis explores approaches to the problem of spoken document retrieval (SDR), which is the task of automatically indexing and then retrieving relevant items from a large collection of recorded speech messages in response to a user-specified natural language text query. We investigate the use of subword unit representations for SDR as an alternative to words generated by either keyword spotting or continuous speech recognition. Our investigation is motivated by the observation that word-based retrieval approaches face the problem of either having to know the keywords to search for a priori, or requiring a very large recognition vocabulary in order to cover the contents of growing and diverse message collections. The use of subword units in the recognizer constrains the size of the vocabulary needed to cover the language; and the use of subword units as indexing terms allows for the detection of new user-specified query terms during retrieval. Four
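The idea of subword units as indexing terms can be sketched with overlapping phone trigrams: each message's phone sequence (assumed to come from a phone recognizer) is indexed by its trigrams, and a query term, converted to phones, is matched by trigram overlap. Because the index never stores whole words, a query term that was never in any recognition vocabulary can still be matched. The ARPAbet-style transcriptions, the trigram length, and the plain overlap-count scoring are illustrative assumptions, not the thesis's retrieval model.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

Gram = Tuple[str, ...]


def phone_ngrams(phones: Sequence[str], n: int = 3) -> List[Gram]:
    """Overlapping phone n-grams used as subword indexing terms."""
    return [tuple(phones[i:i + n]) for i in range(len(phones) - n + 1)]


def build_index(docs: Dict[str, Sequence[str]], n: int = 3) -> Dict[Gram, Counter]:
    """Inverted index: phone n-gram -> {message id: occurrence count}."""
    index: Dict[Gram, Counter] = defaultdict(Counter)
    for doc_id, phones in docs.items():
        for gram in phone_ngrams(phones, n):
            index[gram][doc_id] += 1
    return index


def search(index: Dict[Gram, Counter], query_phones: Sequence[str],
           n: int = 3) -> List[Tuple[str, int]]:
    """Rank messages by the number of query n-gram occurrences they contain."""
    scores: Counter = Counter()
    for gram in phone_ngrams(query_phones, n):
        for doc_id, count in index.get(gram, Counter()).items():
            scores[doc_id] += count
    return scores.most_common()


# Hypothetical phone-recognizer output (ARPAbet-style symbols).
docs = {
    "msg1": "b aa s t ax n w eh dh er".split(),
    "msg2": "n uw y ao r k t r ae f ih k".split(),
}
index = build_index(docs)
print(search(index, "b aa s t ax n".split()))  # [('msg1', 4)]
```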
Gestural Interface to a Visual Computing Environment for Molecular Biologists
- in: FG ’96: 2nd International Conference on Automatic Face and Gesture Recognition, IEEE Computer Society
1996
Cited by 34 (5 self)
In recent years there has been tremendous progress in 3D, immersive display and virtual reality (VR) technologies. Scientific visualization of data is one of many applications that has benefited from this progress. To fully exploit the potential of these applications in the new environment there is a need for "natural" interfaces that allow the manipulation of such displays without burdensome attachments. This paper describes the use of visual hand gesture analysis enhanced with speech recognition for developing a bimodal gesture/speech interface for controlling a 3-D display. The interface augments an existing application, VMD, which is a VR visual computing environment for molecular biologists. The free hand gestures are used for manipulating the 3-D graphical display together with a set of speech commands. We concentrate on the visual gesture analysis techniques used in developing this interface. The dual modality of gesture/speech is found to greatly aid the interaction capability....
A Fast Lattice-Based Approach to Vocabulary Independent Wordspotting
- In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing
1994
Cited by 26 (7 self)
Practical applications of wordspotting, such as spoken message retrieval and browsing, require the ability to process large amounts of speech data at speeds many times faster than real-time. This paper presents a novel approach to this problem in which all of the stored audio material is preprocessed off-line to generate a phoneme lattice. At search time, putative word matches are found in this lattice using symmetric dynamic programming. The paper presents the details of the algorithms used and compares performance with a number of conventional approaches using a 20-keyword vocabulary on the DARPA Resource Management Task. The results show that the proposed method is much faster yet performs acceptably compared to conventional systems, which depend on keyword-specific training or prior knowledge of the test set vocabulary.
1. INTRODUCTION
In recent years, computers have become increasingly able to manipulate non-textual data, and applications such as video and voice mail have ari...
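The search step can be approximated by a dynamic-programming alignment that lets a keyword's phone sequence start and end anywhere in the recognized phones. For simplicity this sketch searches a single best phone string rather than a full phoneme lattice and uses unit substitution and insertion/deletion costs; it is a standard approximate-substring edit distance, not the paper's symmetric DP algorithm. The function name and cost parameters are hypothetical.

```python
from typing import Sequence, Tuple


def spot_keyword(keyword: Sequence[str], phones: Sequence[str],
                 sub_cost: float = 1.0, indel_cost: float = 1.0) -> Tuple[float, int]:
    """Best (cost, end position) of keyword aligned to any span of phones.

    Edit distance in which leading and trailing phones of the string are free,
    so the keyword may start and end anywhere.
    """
    m = len(keyword)
    # prev[i]: cheapest alignment of keyword[:i] to a span ending at the previous phone.
    prev = [i * indel_cost for i in range(m + 1)]
    best = (prev[m], 0)
    for j, phone in enumerate(phones, start=1):
        cur = [0.0] * (m + 1)  # cur[0] = 0: a match may begin at any position
        for i in range(1, m + 1):
            cur[i] = min(
                prev[i - 1] + (0.0 if keyword[i - 1] == phone else sub_cost),  # match/substitute
                prev[i] + indel_cost,     # extra phone inside the putative match
                cur[i - 1] + indel_cost,  # keyword phone missing from the string
            )
        if cur[m] < best[0]:
            best = (cur[m], j)
        prev = cur
    return best


keyword = "b aa s t ax n".split()
print(spot_keyword(keyword, "dh ax w eh dh er ih n b aa s t ax n ih z".split()))  # (0.0, 14)
print(spot_keyword(keyword, "b aa z t ax n".split()))  # (1.0, 6)
```

Searching a lattice instead of one string would run the same recurrence over lattice arcs, taking the minimum over all predecessor nodes rather than a single previous position.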
Multimodal Interfaces
- Artificial Intelligence Review Journal, special issue
1994
Cited by 23 (3 self)
In this paper, we present an overview of research in our laboratories on Multimodal Human Computer Interfaces. The goal for such interfaces is to free human computer interaction from the limitations and acceptance barriers due to rigid operating commands and keyboards as the only or main I/O device. Instead we move to involve all available human communication modalities. These human modalities include Speech, Gesture and Pointing,
Metadata for Integrating Speech Documents in a Text Retrieval System
- SIGMOD Record
1994
"... CH-8092 Zürich (Switzerland) ..."
A System For Unrestricted Topic Retrieval From Radio News Broadcasts
- Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
1996
Cited by 18 (0 self)
The "topic classification" systems described in the speech literature typically partition a collection of spoken messages into a small number of pre-defined topics. As such, they are only useful if the set of message topics does not vary over time. However, the techniques of textual information retrieval (IR) have long allowed for retrieval by arbitrary subject from a document collection. This paper describes experiments in unrestricted retrieval from a collection of radio news broadcasts. A hybrid message indexing strategy, with conventional word recognition and a fast lattice-based wordspotter, allows for the retrieval of news reports concerning any subject. The results show that retrieval can be carried out extremely quickly and that high accuracy is possible, even with errorful recognition output.
1. THE MESSAGE RETRIEVAL PROBLEM
There is considerable interest, in the speech research community, in the automatic classification of spoken-word recordings solely by their acoustic cont...
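A hybrid index of the kind described can be sketched by merging, for each message, the word counts from the recognizer transcript with the hits reported by a wordspotter, then ranking messages against a text query with tf-idf weights. The message contents, the spotted term, and the tf-idf scoring are assumptions for illustration; the abstract does not specify the paper's retrieval function.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def message_terms(transcript: List[str], spotter_hits: List[str]) -> Counter:
    """Hybrid index entry: recognized words plus terms found by the wordspotter."""
    return Counter(transcript) + Counter(spotter_hits)


def rank(messages: Dict[str, Counter], query: List[str]) -> List[Tuple[str, float]]:
    """Score each message by the sum over query terms of tf * idf, highest first."""
    n = len(messages)
    scores: Dict[str, float] = {}
    for msg_id, terms in messages.items():
        score = 0.0
        for term in set(query):
            df = sum(1 for other in messages.values() if term in other)
            if term in terms:  # df >= 1 whenever the term occurs in this message
                score += terms[term] * math.log(n / df)
        scores[msg_id] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical messages; "hurricane" is assumed to be found only by the wordspotter.
messages = {
    "report1": message_terms("election results announced today".split(), []),
    "report2": message_terms("storm warning issued today".split(), ["hurricane"]),
    "report3": message_terms("football scores today".split(), []),
}
print(rank(messages, ["hurricane", "warning"]))
```

Here report2 ranks first with a score of 2 ln 3, while a term such as "today" that occurs in every message gets an idf of zero and does not help discriminate.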

