Speech Recognition And Information Retrieval: Experiments In Retrieving Spoken Documents

by Michael J. Witbrock, Alexander G. Hauptmann

Results 1 - 4 of 4

Subword-based Approaches for Spoken Document Retrieval

by Kenney Ng, 2000
Cited by 40 (0 self)
This thesis explores approaches to the problem of spoken document retrieval (SDR), which is the task of automatically indexing and then retrieving relevant items from a large collection of recorded speech messages in response to a user-specified natural language text query. We investigate the use of subword unit representations for SDR as an alternative to words generated by either keyword spotting or continuous speech recognition. Our investigation is motivated by the observation that word-based retrieval approaches face the problem of either having to know the keywords to search for a priori, or requiring a very large recognition vocabulary in order to cover the contents of growing and diverse message collections. The use of subword units in the recognizer constrains the size of the vocabulary needed to cover the language, and the use of subword units as indexing terms allows for the detection of new user-specified query terms during retrieval. Four
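As a rough illustration of using subword units as indexing terms, the sketch below indexes documents by overlapping phone n-grams and scores a query by the number of distinct n-grams it shares with each document. The phone strings, document ids, and function names are hypothetical, and the simple overlap count is a simplification chosen for brevity, not the scoring used in the thesis.

```python
from collections import Counter, defaultdict

def phone_ngrams(phones, n=3):
    """Return overlapping n-grams from a sequence of phone symbols."""
    return [tuple(phones[i:i + n]) for i in range(len(phones) - n + 1)]

def build_index(docs, n=3):
    """Map each phone n-gram to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, phones in docs.items():
        for gram in phone_ngrams(phones, n):
            index[gram].add(doc_id)
    return index

def score(query_phones, index, n=3):
    """Count, per document, how many distinct query n-grams it contains."""
    hits = Counter()
    for gram in set(phone_ngrams(query_phones, n)):
        for doc_id in index.get(gram, ()):
            hits[doc_id] += 1
    return hits

# Hypothetical phone transcriptions (illustrative only, not from the thesis).
docs = {
    "d1": "dh ax w eh dh er f ao r k ae s t".split(),
    "d2": "s t aa k m aa r k ax t n uw z".split(),
}
index = build_index(docs)
ranking = score("f ao r k ae s t".split(), index).most_common()
```

Because the index is built from subword units, a query word needs no entry in a recognition vocabulary; it only has to be converted to a phone sequence before lookup.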

Spoken Document Retrieval Based on Phoneme Recognition

by Martin Wechsler (Dipl. Informatik-Ing. ETH), Prof. P. Schäuble, 1998
Cited by 10 (0 self)
Recently, vast amounts of audio and video material containing spoken information have become available in digital format, for example radio news recordings. Along with this development there is an increased demand to retrieve spoken information in response to a user's information need. This thesis addresses the problem of spoken document retrieval (SDR). A particular goal is to perform experimental studies on documents spoken in German. The approach taken by this thesis requires a phoneme recognizer, which initially generates phoneme sequences from the spoken documents. The main issues of phoneme-recognition-based SDR are (1) missing word boundaries in the phoneme sequences and (2) the high number of phoneme recognition errors, requiring an error-tolerant method to detect query words. We present Probabilistic String Matching (PSM), a new retrieval method in which query words are spotted in document phoneme sequences that are corrupted by recognition errors. This method includes the detec...
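To make the idea of error-tolerant word spotting in an unsegmented phoneme stream concrete, here is a minimal sketch that uses plain unit-cost edit distance against every substring of the document. This is a stand-in chosen for simplicity and is not the PSM method itself, whose probabilistic details are not given in the abstract; the function name and phoneme strings are hypothetical.

```python
def best_match_cost(query, doc):
    """Smallest edit distance between `query` and any substring of `doc`.

    Both arguments are sequences of phoneme symbols. The first row is all
    zeros so a match may start at any position, which sidesteps the
    missing word boundaries in the document sequence.
    """
    prev = [0] * (len(doc) + 1)
    for i, q in enumerate(query, 1):
        cur = [i] + [0] * len(doc)
        for j, d in enumerate(doc, 1):
            cur[j] = min(
                prev[j] + 1,             # query phoneme missing from doc
                cur[j - 1] + 1,          # extra phoneme in doc
                prev[j - 1] + (q != d),  # match or substitution
            )
        prev = cur
    # The match may end at any position in the document.
    return min(prev)

# Hypothetical recognizer output in which "b" was misrecognized as "p".
doc = "d i p a n k i s t".split()
cost = best_match_cost("b a n k".split(), doc)
```

A query word could then be treated as detected when the cost falls below a threshold scaled by the query length; a probabilistic method would replace the unit costs with error likelihoods estimated from the recognizer.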

Combining LVCSR and Vocabulary-Independent Ranked Utterance Retrieval for Robust Speech Search

by J. Scott Olsson
Cited by 2 (2 self)
Large-Vocabulary Continuous Speech Recognition (LVCSR) has been shown to generally be more effective than vocabulary-independent techniques for ranked retrieval of spoken content when one or the other approach is used alone. Tuning LVCSR systems to a topic domain can be costly, however, and the experiments in this paper show that Out-Of-Vocabulary (OOV) query terms can significantly reduce retrieval effectiveness when that tuning is not performed. Further experiments demonstrate, however, that retrieval effectiveness for queries with OOV terms can be substantially improved by combining evidence from LVCSR with additional evidence from vocabulary-independent Ranked Utterance Retrieval (RUR). The combination is performed by using relevance judgments from held-out topics to learn generic (i.e., topic-independent), smooth, non-decreasing transformations from LVCSR and RUR system scores to probabilities of topical relevance. Evaluated using a CLEF collection that includes topics, spontaneous conversational speech audio, and relevance judgments, the system recovers 57% of the mean uninterpolated average precision that could have been obtained through LVCSR domain tuning for very short queries (or 41% for longer queries).

Semantic Multi-modal Analysis, Structuring, and Visualization for Candid Personal Interaction Videos

by Alexander Haubold
Videos are rich in multimedia content and semantics, which should be used by video browsers to better present the audio-visual information to the viewer. Ubiquitous video players allow for content to be scanned linearly, rarely providing summaries or methods for searching. Through analysis of audio and video tracks, it is possible to extract text transcripts from audio, displayed text from video, and higher-level semantics through speaker identification and scene analysis. External data sources, when available, can be used to cross-reference the video content and impose a structure for organization. Various research tools have addressed video summarization and browsing using one or more of these modalities; however, most of them assume edited videos as input. We focus our research on genres in personal interaction videos and collections of such videos in their unedited form. We present and verify formal models for their structure, and develop methods for their automatic analysis, summarization and indexing. We specify the characteristic semantic components of three related genres of candidly captured videos: formal instructions or lectures, student team project presentations, and discussions. For each genre, we design and