“I’ll Get That Off the Audio”: A case study of salvaging multimedia meeting records
, 1997
Cited by 59 (4 self)
We describe a case study of a complex, ongoing, collaborative work process, where the central activity is a series of meetings reviewing a wide range of subtle technical topics. The problem is the accurate reporting of the results of these meetings, which is the responsibility of a single person, who is not well-versed in all the topics. We provided tools to capture the meeting discussions and tools to “salvage” the captured multimedia recordings. Salvaging is a new kind of activity involving replaying, extracting, organizing, and writing. We observed a year of mature salvaging work in the case study. From this we describe the nature of salvage work (the constituent activities, the use of the workspace, the affordances of the audio medium, how practices develop and differentiate, how the content material affects practice). We also demonstrate how this work relates to the larger work processes (the task demands of the setting, the interplay of salvage with capture, the influence on the people being reported on and reported to). Salvaging tools are shown to be valuable for dealing with free-flowing discussions of complex subject matter and for producing high-quality documentation.
The Audio Notebook - Paper and Pen Interaction with Structured Speech
, 2001
Cited by 59 (2 self)
This paper addresses the problem that a listener experiences when attempting to capture information presented during a lecture, meeting, or interview. Listeners must divide their attention between the talker and their notetaking activity. We propose a new device -- the Audio Notebook -- for taking notes and interacting with a speech recording. The Audio Notebook is a combination of a digital audio recorder and paper notebook, all in one device. Audio recordings are structured using two techniques: user structuring based on notetaking activity, and acoustic structuring based on a talker's changes in pitch, pausing, and energy. A field study showed that the interaction techniques enabled a range of usage styles, from detailed review to high speed skimming. The study motivated the addition of phrase detection and topic suggestions to improve access to the audio recordings. Through these audio interaction techniques, the Audio Notebook defines a new approach for navigation in the audio domain.
Scanmail: a voicemail interface that makes speech browsable, readable and searchable
- in Proceedings of CHI2002 Conference on Human Computer Interaction
, 2002
Cited by 38 (10 self)
Increasing amounts of public, corporate, and private speech data are now available on-line. These are limited in their usefulness, however, by the lack of tools to permit their browsing and search. The goal of our research is to provide tools to overcome the inherent difficulties of speech access, by supporting visual scanning, search, and information extraction. We describe a novel principle for the design of UIs to speech data: What You See Is Almost What You Hear (WYSIAWYH). In WYSIAWYH, automatic speech recognition (ASR) generates a transcript of the speech data. The transcript is then used as a visual analogue to that underlying data. A graphical user interface allows users to visually scan, read, annotate and search these transcripts. Users can also use the transcript to access and play specific regions of the underlying message. We first summarize previous studies of voicemail usage that motivated the WYSIAWYH principle, and describe a voicemail UI, SCANMail, that embodies WYSIAWYH. We report on a laboratory experiment and a two-month field trial evaluation. SCANMail outperformed a state-of-the-art voicemail system on core voicemail tasks. This was attributable to SCANMail’s support for visual scanning, search and information extraction. While the ASR transcripts contain errors, they nevertheless improve the efficiency of voicemail processing. Transcripts provide enough information for users either to extract key points or to navigate to important regions of the underlying speech, which they can then play directly.
SCAN: Designing and evaluating user interfaces to support retrieval from speech archives
- In Proceedings of the 22nd ACM-SIGIR International Conference on Research and Development in Information Retrieval
, 1999
Cited by 36 (7 self)
Previous examinations of search in textual archives have assumed that users first retrieve a ranked set of documents relevant to their query, and then visually scan through these documents to identify the information they seek. While document scanning is possible in text, it is much more laborious in speech archives, due to the inherently serial nature of speech. Yet, in developing tools for speech access, little attention has so far been paid to users' problems in scanning and extracting information from within "speech documents". We demonstrate the extent of these problems in two user studies. We show that users experience severe problems with local navigation in extracting relevant information from within "speech documents". Based on these results, we propose a new user interface (UI) design paradigm: What You See Is (Almost) What You Hear (WYSIAWYH), a multimodal method for accessing speech archives. This paradigm presents a visual analogue to the underlying speech, enabling vi...
Using NLP or NLP Resources for Information Retrieval Tasks
- Natural Language Information Retrieval
, 1997
Cited by 32 (1 self)
The impact of NLP on information retrieval tasks has largely been one of promise rather than substance. While there are exceptions to this, as some of the chapters in the present volume demonstrate, for the most part NLP and information retrieval have only recently started to dovetail together. In this chapter we present a précis of our experiments in information retrieval using NLP, which have had mixed success over the last few years. We introduce the respective roles of NLP and IR and then we summarise our early experiments on using syntactic analysis to derive term dependencies and structured representations of term-term relationships. We then re-thought the role that NLP could have for IR tasks and decided to concentrate our efforts on using NLP resources rather than NLP tools in information retrieval; our more recent experiments in this area, in which we use WordNet, are summarised. Finally we present our conclusions and the status of our work.
Jotmail: a voicemail interface that enables you to see what was said
- In Proceedings of CHI2000 Conference on Human Computer Interaction
, 2000
Cited by 21 (13 self)
Voicemail is a pervasive, but under-researched tool for workplace communication. Despite potential advantages of voicemail over email, current phone-based voicemail UIs are highly problematic for users. We present a novel, Web-based, voicemail interface, Jotmail. The design was based on data from several studies of voicemail tasks and user strategies. The GUI has two main elements: (a) personal annotations that serve as a visual analogue to underlying speech; (b) automatically derived message header information. We evaluated Jotmail in an 8-week field trial, where people used it as their only means for accessing voicemail. Jotmail was successful in supporting most key voicemail tasks, although users' electronic annotation and archiving behaviors were different from our initial predictions. Our results argue for the utility of a combination of annotation-based indexing and automatically derived information, as a general technique for accessing speech archives.
Accessing Multimedia through Concept Clustering
, 1997
Cited by 17 (0 self)
Multimedia information retrieval is a challenging problem because multimedia information is not inherently structured. Jabber is an experimental system that attempts to bring some structure to this task. Jabber allows users to retrieve records of videoconferences based upon the concepts discussed. In this paper we introduce ConceptFinder, a sub-system within Jabber, and show how it is able to process the spoken text of a meeting into meeting topics. ConceptFinder can make subtle distinctions among different senses of the same words, and is able to summarize a set of related words, giving a name to each topic. Users can then use this name to query or browse the stored multimedia, through Jabber's user interface. By presenting information that closely matches a user's expectations, the challenge of multimedia retrieval is rendered more tractable. Keywords: multimedia indexing, information retrieval and browsing, concept clustering.
Accessing multimodal meeting data: Systems, problems and possibilities
- Lecture Notes in Computer Science, Machine Learning for Multimodal Interaction
, 2004
Cited by 17 (1 self)
As the amount of multimodal meetings data being recorded increases, so does the need for sophisticated mechanisms for accessing this data. This process is complicated by the different informational needs of users, as well as the range of data collected from meetings. This paper examines the current state of the art in meeting browsers. We examine both systems specifically designed for browsing multimodal meetings data and those designed to browse data collected from different environments, for example broadcast news and lectures. As a result of this analysis, we highlight potential directions for future research: semantic access, filtered presentation, limited display environments, browser evaluation and user requirements capture.
Look or listen: Discovering effective techniques for accessing speech data
- in Proc. of the Human-Computer Interaction Conference
, 2003
Cited by 10 (2 self)
Commercial interfaces for accessing digital speech data are based on ‘tape recorder’ metaphors. However, such interfaces make it highly laborious to access complex speech data. The absence of effective interfaces is a major obstacle to exploiting the increasing number of speech archives now available online. More novel research interfaces provide potentially more effective access by presenting visual or textual indices into the underlying speech data. The current experimental study evaluates the utility of these newer techniques compared with a ‘tape recorder’ interface. We compare: (a) High-level Visual Overviews showing the distribution and density of user query terms; (b) Textual Transcripts generated using state-of-the-art ASR; (c) a tape recorder baseline. Laboratory tests showed that, contrary to our expectations, high-level visual information proved more useful than textual information, although both perform better than a tape recorder baseline. Visual overviews enable users to quickly identify relevant regions to be played. In contrast, textual transcripts can mislead users who try to extract detailed information solely by reading the transcript, without listening to the underlying speech.
Speech-Based Information Retrieval for Digital Libraries
- In Notes from AAAI Spring Symposium on Cross-Language Text and Speech Retrieval
, 1997
Cited by 8 (2 self)
Libraries and archives collect recorded speech and multimedia objects that contain recorded speech, and such material may comprise a substantial portion of the collection in future digital libraries. Presently, access to most of this material is provided using a combination of manually annotated metadata and linear search. Recent advances in speech processing technology have produced a number of techniques for extracting features from recorded speech that could provide a useful basis for the retrieval of speech or multimedia objects in large digital library collections. Among these features are the semantic content of the speech, the identity of the speaker, and the language in which the speech was spoken. We propose to develop a graphical and auditory user interface for speech-based information retrieval that exploits these features to facilitate selection of recorded speech and multimedia information objects that include recorded speech. We plan to use that interface to evaluate the effectiveness and usability of alternative ways of exploiting those features and as a testbed for the evaluation of advanced retrieval techniques such as cross-language speech retrieval.

