Advances in automatic meeting record creation and access
in Proc. IEEE ICASSP, 2001
Cited by 52 (6 self)
Oral communication is transient but many important decisions, social contracts and fact findings are first carried out in an oral setup, documented in written form and later retrieved. At Carnegie Mellon University's Interactive Systems Laboratories we have been experimenting with the documentation of meetings. This paper summarizes part of the progress that we have made in this test bed, specifically on the question of automatic transcription using LVCSR, information access using non-keyword based methods, summarization and user interfaces. The system is capable to automatically construct a searchable and browsable audiovisual database of meetings and provide access to these records.
SCANMail: Browsing and Searching Speech Data by Content
2001
Cited by 24 (5 self)
Increasing amounts of public, corporate, and private audio data are available for use, but limited in usefulness by the lack of tools to permit their browsing and search. In this paper, we describe SCANMail, a system that employs automatic speech recognition, information retrieval, information extraction, and human computer interaction technology to permit users to browse and search their voicemail messages by content through a graphical user interface. The SCANMail client also provides note-taking capabilities as well as browsing and querying features. A CallerId server also proposes caller names from existing caller acoustic models and is trained from user feedback. An Email server sends the original message plus its transcription to a mailing address specified in the user's profile.
Automatic Recognition of Spontaneous Speech for Access to Multilingual Oral History Archives
IEEE Transactions on Speech and Audio Processing, 2004
Cited by 20 (6 self)
Much is known about the design of automated systems to search broadcast news, but it has only recently become possible to apply similar techniques to large collections of spontaneous speech. This paper presents initial results from experiments with speech recognition, topic segmentation, topic categorization, and named entity detection using a large collection of recorded oral histories. The work leverages a massive manual annotation effort on 10 000 h of spontaneous speech to evaluate the degree to which automatic speech recognition (ASR)-based segmentation and categorization techniques can be adapted to approximate decisions made by human annotators. ASR word error rates near 40% were achieved for both English and Czech for heavily accented, emotional and elderly spontaneous speech based on 65–84 h of transcribed speech. Topical segmentation based on shifts in the recognized English vocabulary resulted in 80% agreement with ...
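For readers unfamiliar with the metric, a word error rate such as the one quoted in this abstract is conventionally computed as the word-level edit distance between the recognizer output and a reference transcript, divided by the length of the reference. The following is a minimal sketch under that standard definition; the function name and the sample sentences are illustrative assumptions and do not come from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] is the edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Made-up sentences: one substitution ("the" -> "a") and one insertion ("now")
# against a five-word reference.
reference = "we left the village together"
hypothesis = "we left a village together now"
print(f"WER = {word_error_rate(reference, hypothesis):.2f}")
```

Because insertions count as errors, this measure can exceed 1.0 when the hypothesis is much longer than the reference.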
Improving Speech Playback using Time-Compression and Speech Recognition
ACM CHI Conference on Human Factors in Computing Systems, 2004
Cited by 19 (6 self)
Despite the ready availability of digital recording technology and the continually decreasing cost of digital storage, browsing audio recordings remains a tedious task. This paper presents evidence in support of a system designed to assist with information comprehension and retrieval tasks from a large collection of recorded speech. Two techniques are employed to assist users with these tasks. First, a speech recognizer creates necessarily error-laden transcripts of the recorded speech. Second, audio playback is time-compressed using the SOLAFS technique. When used together, subjects are able to perform comprehension tasks with more speed and accuracy.
From Text Summarisation to Style-Specific Summarisation for Broadcast News
2004
Cited by 18 (4 self)
In this paper we report on a series of experiments investigating the path from text summarisation to style-specific summarisation of spoken news stories. We show that the portability of traditional text summarisation features to broadcast news is dependent on the diffusiveness of the information in the broadcast news story. An analysis of two categories of news stories (containing only read speech or including some spontaneous speech) demonstrates the importance of the style and the quality of the transcript, when extracting the summary-worthy information content. Further experiments indicate the advantages of doing style-specific summarisation of broadcast news.
Mandarin-English Information (MEI): Investigating Translingual Speech Retrieval
In First International Conference on Human Language Technologies, 2000
Cited by 14 (10 self)
We describe a system which supports English text queries searching for Mandarin Chinese spoken documents.
An audio-based personal memory aid
Proceedings of UbiComp 2004: Ubiquitous Computing, 2004
Cited by 12 (3 self)
We are developing a wearable device that attempts to alleviate some everyday memory problems. The “memory prosthesis” records audio and contextual information from conversations and provides a suite of retrieval tools (on both the wearable and a personal computer) to help users access forgotten memories in a timely fashion. This paper describes the wearable device, the personal-computer-based retrieval tool, and their supporting technologies. Anecdotal observations based on real-world use and quantitative results based on a controlled memory-retrieval task are reported. Finally, some social, legal, and design challenges of ubiquitous recording and remembering via a personal audio archive are discussed.
Perspectives on Information Retrieval and Speech
SIGIR Workshop: Information Retrieval Techniques for Speech Applications 2001, 2002
Cited by 11 (0 self)
Several years of research have suggested that the accuracy of spoken document retrieval systems is not adversely affected by speech recognition errors. Even with error rates of around 40%, the effectiveness of an IR system falls less than 10%. The paper hypothesizes that this robust behavior is the result of repetition of important words in the text (meaning that losing one or two occurrences is not crippling) and the result of additional related words providing a greater context (meaning that those words will match even if the seemingly critical word is misrecognized). This hypothesis is supported by examples from TREC's SDR track, the TDT evaluation, and some work showing the impact of recognition errors on spoken queries. 1 IR and ASR Information Retrieval (IR) research encompasses algorithms that process large amounts of unstructured or semi-structured information, though most work has been done with human-generated text. Search engines (such as those on the Web) ar...
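The repetition hypothesis in this abstract can be illustrated with a toy example. The sketch below is not the paper's experiment: the two documents, the query, the term-frequency scoring, and the fixed corruption pattern (2 of every 5 words replaced) are all assumptions chosen to keep the example deterministic. Real recognition errors are not spread this evenly.

```python
from collections import Counter


def simulate_asr_errors(text: str) -> str:
    """Replace 2 of every 5 words with a placeholder (roughly 40% word errors)."""
    words = text.split()
    return " ".join("<unk>" if i % 5 in (1, 4) else w for i, w in enumerate(words))


def score(query: str, document: str) -> int:
    """Sum of the document's term frequencies over the query words."""
    counts = Counter(document.split())
    return sum(counts[term] for term in query.split())


def rank(query: str, collection: dict) -> list:
    """Return document ids sorted from best to worst match."""
    return sorted(collection, key=lambda doc_id: score(query, collection[doc_id]), reverse=True)


# Hypothetical two-document collection and query.
clean = {
    "budget": "the budget meeting covered the budget deficit and the tax plan for the budget year",
    "weather": "the weather report said rain and wind will reach the coast by the weekend",
}
noisy = {doc_id: simulate_asr_errors(text) for doc_id, text in clean.items()}
query = "budget tax"

print("clean ranking:", rank(query, clean))
print("noisy ranking:", rank(query, noisy))
```

In the corrupted copy the single occurrence of "tax" and one occurrence of "budget" are lost, yet the remaining occurrences of "budget" still place the relevant document first.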
A Qualitative Study Towards Using Large Vocabulary Automatic Speech Recognition to Index Recorded Presentations for Search and Access over the Web
In Proceedings of IADIS WWW/Internet 2002 Conference, 2002
Cited by 8 (1 self)
Recording lectures and putting them on the Web for access by students has become a general trend at various universities. To take full advantage of the knowledge base that is built by these documents, elaborate search functionality has to be provided that goes beyond search on the metadata level and performs a detailed analysis of the corresponding multimedia documents. In this paper, we present some experiments we did towards setting up a Web-based search engine for audio recordings of presentations. We evaluate standard, state-of-the-art speech recognition software as well as achievable retrieval performance. In addition, we compare the speech retrieval results with a traditional, text-based approach for searching to evaluate the value of speech processing for lecture retrieval.
Segmenting Conversations by Topic, Initiative and Style
2001
Cited by 7 (0 self)
Topical segmentation is a basic tool for information access to audio records of meetings and other types of speech documents which may be fairly long and contain multiple topics. Standard segmentation algorithms are typically based on keywords, pitch contours or pauses. This work demonstrates that speaker initiative and style may be used as segmentation criteria as well. A probabilistic segmentation procedure is presented which allows the integration and modeling of these features in a clean framework with good results.

