Results 1 -
6 of
6
Scanmail: a voicemail interface that makes speech browsable, readable and searchable
- in Proceedings of CHI2002 Conference on Human Computer Interaction
, 2002
"... Increasing amounts of public, corporate, and private speech data are now available on-line. These are limited in their usefulness, however, by the lack of tools to permit their browsing and search. The goal of our research is to provide tools to overcome the inherent difficulties of speech access, b ..."
Abstract
-
Cited by 55 (14 self)
- Add to MetaCart
(Show Context)
Increasing amounts of public, corporate, and private speech data are now available on-line. These are limited in their usefulness, however, by the lack of tools to permit their browsing and search. The goal of our research is to provide tools to overcome the inherent difficulties of speech access, by supporting visual scanning, search, and information extraction. We describe a novel principle for the design of UIs to speech data: What You See Is Almost What You Hear (WYSIAWYH). In WYSIAWYH, automatic speech recognition (ASR) generates a transcript of the speech data. The transcript is then used as a visual analogue to that underlying data. A graphical user interface allows users to visually scan, read, annotate and search these transcripts. Users can also use the transcript to access and play specific regions of the underlying message. We first summarize previous studies of voicemail usage that motivated the WYSIAWYH principle, and describe a voicemail UI, SCANMail, that embodies WYSIAWYH. We report on a laboratory experiment and a two-month field trial evaluation. SCANMail outperformed a state of the art voicemail system on core voicemail tasks. This was attributable to SCANMail’s support for visual scanning, search and information extraction. While the ASR transcripts contain errors, they nevertheless improve the efficiency of voicemail processing. Transcripts either provide enough information for users to extract key points or to navigate to important regions of the underlying speech, which they can then play directly.
Jotmail: a voicemail interface that enables you to see what was said
- In Proceedings of CHI2000 Conference on Human Computer Interaction
, 2000
"... stevew/julia/urs @ research.att.com Voicemail is a pervasive, but under-researched tool for workplace communication. Despite potential advantages of voicemail over email, current phone-based voicemail UIs are highly problematic for users. We present a novel, Web-based, voicemail interface, Jotmail. ..."
Abstract
-
Cited by 26 (14 self)
- Add to MetaCart
(Show Context)
stevew/julia/urs @ research.att.com Voicemail is a pervasive, but under-researched tool for workplace communication. Despite potential advantages of voicemail over email, current phone-based voicemail UIs are highly problematic for users. We present a novel, Web-based, voicemail interface, Jotmail. The design was based on data from several studies of voicemail tasks and user strategies. The GUI has two main elements: (a) personal annotations that serve as a visual analogue to underlying speech; (b) automatically derived message header information. We evaluated Jotmail in an 8-week field trial, where people used it as their only means for accessing voicemail. Jotmail was successful in supporting most key voicemail tasks, although users ' electronic annotation and archiving behaviors were different from our initial predictions. Our results argue for the utility of a combination of annotation based indexing and automatically derived information, as a general technique for accessing speech archives.
Error Correction of Voicemail Transcripts in SCANMail
"... Despite its widespread use, voicemail presents numerous usability challenges: People must listen to messages in their entirety, they cannot search by keywords, and audio files do not naturally support visual skimming. SCANMail overcomes these flaws by automatically generating text transcripts of voi ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
(Show Context)
Despite its widespread use, voicemail presents numerous usability challenges: People must listen to messages in their entirety, they cannot search by keywords, and audio files do not naturally support visual skimming. SCANMail overcomes these flaws by automatically generating text transcripts of voicemail messages and presenting them in an email-like interface. Transcripts facilitate quick browsing and permanent archive. However, errors from the automatic speech recognition (ASR) hinder the usefulness of the transcripts. The work presented here specifically addresses these problems by evaluating user-initiated error correction of transcripts. User studies of two editor interfaces—a grammar-assisted menu and simple replacement by typing—reveal reduced audio playback times and an emphasis on editing important words with the menu, suggesting its value in mobile environments where limited input capabilities are the norm and user privacy is essential. The study also adds to the scarce body of work on ASR confidence shading, suggesting that shading may be more helpful than previously reported. Author Keywords Voicemail, error correction, speech recognition, editor
Accessing Speech Data Using Startegic Fixation
, 2006
"... When users access information from text, they engage in strategic fixation, visually scanning the text to focus on regions of interest. However, because speech is both serial and ephemeral, it does not readily support strategic fixation. This paper describes two design principles, indexing and trans ..."
Abstract
- Add to MetaCart
When users access information from text, they engage in strategic fixation, visually scanning the text to focus on regions of interest. However, because speech is both serial and ephemeral, it does not readily support strategic fixation. This paper describes two design principles, indexing and transcript-centric access that address the problem of speech access by supporting strategic fixation. Indexing involves users constructing external visual indices into speech. Users visually scan these indices to find information-rich regions of speech for more detailed processing and playback. Transcription involves transcribing speech using automatic speech recognition (ASR) and enriching that transcription with visual cues. The resulting enriched transcript is time-aligned to the original speech, allowing users to scan the transcript as a whole or the additional visual cues present in the transcript, to fixate and play regions of interest. We tested the effectiveness of these two approaches on a set of reference tasks derived from observations of current voicemail practice. A field trial evaluation of JotMail, an indexed-based interface similar to commercial unified messaging clients, showed that our approaches were effective in supporting speech scanning, information extraction and status tracking, but not archive management. However, users found it onerous to take manual notes with JotMail to provide effective retrieval indices. We therefore built SCANMail, a transcript-based interface that constructs indices automatically, using ASR to generate a transcript of the speech data. SCANMail also uses information extraction techniques to identify regions of
unknown title
"... Editing speech data is currently time-consuming and errorprone. Speech editors rely on acoustic waveform representations, which force users to repeatedly sample the underlying speech to identify words and phrases to edit. Instead we developed a semantic editor that reduces the need for extensive sam ..."
Abstract
- Add to MetaCart
(Show Context)
Editing speech data is currently time-consuming and errorprone. Speech editors rely on acoustic waveform representations, which force users to repeatedly sample the underlying speech to identify words and phrases to edit. Instead we developed a semantic editor that reduces the need for extensive sampling by providing access to meaning. The editor shows a time-aligned errorful transcript produced by applying automatic speech recognition (ASR) to the original speech. Users visually scan the words in the transcript to identify important phrases. They then edit the transcript directly using standard word processing ‘cut and paste ’ operations, which extract the corresponding time-aligned speech. ASR errors mean that users must supplement what they read in the transcript by accessing the original speech. Even when there are transcript errors, however, the semantic representation still provides users with enough information to target what they edit and play, reducing the need for extensive sampling. A laboratory evaluation showed that semantic editing is more efficient than acoustic editing even when ASR is highly inaccurate.
SEE PROFILE
, 2002
"... SCANMail: A voicemail interface that makes speech browsable, readable and searchable ..."
Abstract
- Add to MetaCart
(Show Context)
SCANMail: A voicemail interface that makes speech browsable, readable and searchable