The Effect of Speech Recognition Accuracy Rates on the Usefulness and Usability of Webcast Archives
2006
Cited by 15 (7 self)
The widespread availability of broadband connections has led to an increase in the use of Internet broadcasting (webcasting). Most webcasts are archived and accessed numerous times retrospectively. In the absence of transcripts of what was said, users have difficulty searching and scanning for specific topics. This research investigates user needs for transcription accuracy in webcast archives, and measures how the quality of transcripts affects both user performance in a question-answering task and overall user experience. We tested 48 subjects in a within-subjects design under 4 conditions: perfect transcripts, transcripts with 25% Word Error Rate (WER), transcripts with 45% WER, and no transcript. Our data reveal that speech recognition accuracy linearly influences both user performance and experience, show that transcripts with 45% WER are unsatisfactory, and suggest that transcripts having a WER of 25% or less would be useful and usable in webcast archives.
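For reference, Word Error Rate is conventionally computed as the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the recognizer output, divided by the number of words in the reference. The Python sketch below follows that standard definition; the function name `word_error_rate` and the whitespace tokenization are illustrative assumptions, not details taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Hypothetical helper for illustration: tokens are split on whitespace and
    compared case-sensitively, which a real scoring tool would likely normalize.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words (row-by-row Levenshtein).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)
```

Under this definition, a four-word reference with one wrongly recognized word yields a WER of 0.25, the threshold the abstract reports as still useful and usable.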
Software or Wetware? Discovering When and Why People Use Digital Prosthetic Memory
Cited by 13 (4 self)
Our lives are full of memorable and important moments, as well as important items of information. The last few years have seen the proliferation of digital devices intended to support prosthetic memory (PM), to help users recall experiences and conversations and retrieve personal information. We nevertheless have little systematic understanding of when and why people might use such devices in preference to their own organic memory (OM). Although OM is fallible, it may be more efficient than accessing information from a complex PM device. We report a controlled lab study that investigates when and why people use PM and OM. We found that PM use depended on users' evaluation of the quality of their OM, as well as PM device properties. In particular, we found that users trade off Accuracy and Efficiency, preferring rapid access to potentially inaccurate information over laborious access to accurate information. We discuss the implications of these results for future PM design and theory. Rather than replacing OM, future PM designs need to focus on allowing OM and PM to work in synergy.
Searching in audio: the utility of transcripts, dichotic presentation, and time-compression
ACM CHI Conference on Human Factors in Computing Systems, 2006
Cited by 5 (1 self)
Searching audio data can potentially be facilitated by the use of automatic speech recognition (ASR) technology to generate text transcripts, which can then be easily queried. However, since current ASR technology cannot reliably generate 100% accurate transcripts, additional techniques for fluid browsing and searching of the audio itself are required. We explore the impact of transcripts of various qualities, dichotic presentation, and time-compression on an audio search task. Results show that dichotic presentation and reasonably accurate transcripts can assist in the search process, but suggest that time-compression and low-accuracy transcripts should be used carefully.
Author Keywords: Dichotic listening, transcripts, audio time-compression.
Wiki-like editing of imperfect computer-generated webcast transcripts
In Proc. Demo track of ACM Conf. on Computer Supported Cooperative Work – CSCW
Cited by 2 (1 self)
As the use of Internet broadcasting (webcasting) increases, more webcasts will be archived and accessed numerous times retrospectively. One challenge in skimming and browsing through such archives is the lack of textual transcripts of the archived media's audio channel. Ideally, transcripts would be obtainable through Automatic Speech Recognition (ASR). However, current ASR systems can only deliver, in realistic conditions, Word Error Rates (WERs) of around 45%, which is unsatisfactory, as shown in our recent study [1], which revealed that transcripts are useful and usable in webcast archives for WERs equal to or less than 25%. We therefore propose an extension to the ePresence webcast system that engages users to collaborate in a wiki manner on editing the imperfect transcripts obtained through ASR.
Measuring the Acceptable Word Error Rate of Machine-Generated Webcast Transcripts
2006
Cited by 1 (0 self)
The increased availability of broadband connections has recently led to an increase in the use of Internet broadcasting (webcasting). Most webcasts are archived and accessed numerous times retrospectively. One of the hurdles users face when browsing and skimming through archives is the lack of text transcripts of the audio channel of the webcast archive. In this paper, we propose a procedure for prototyping an Automatic Speech Recognition (ASR) system that generates realistic transcripts of any desired Word Error Rate (WER), thus overcoming the drawbacks of both prototype-based and Wizard of Oz simulations. We used such a system in a study where human subjects performed question-answering tasks using archives of webcast lectures, and showed that their performance and perception of transcript quality are linearly affected by WER, and that transcripts with a WER equal to or less than 25% would be acceptable for use in webcast archives.
Does Taking Notes Help You Remember Better? Exploring How Note Taking Relates to Memory
People are aware that their memories are fallible, and as a result they spend significant amounts of time preparing for subsequent memory challenges, e.g. by taking notes about information they think they will later have to remember. There has been extensive research into note taking and whether it is effective as a memory aid, but most of this has concerned pen and paper rather than digital notes. We conducted an experiment investigating the relationship between note-taking behaviors (whether digital or paper-based) and subsequent recall. We gave people two systems: a note-taking device called ChittyChatty (CC) that combines digital notes with an audio record (Fig. 1), and conventional Pen & Paper (PP) (Fig. 2). We observed the note-taking patterns that occurred in digital CC notes and paper-based PP notes. We then examined whether the quality and quantity of
Accessing Speech Data Using Strategic Fixation
2006
When users access information from text, they engage in strategic fixation, visually scanning the text to focus on regions of interest. However, because speech is both serial and ephemeral, it does not readily support strategic fixation. This paper describes two design principles, indexing and transcript-centric access, that address the problem of speech access by supporting strategic fixation. Indexing involves users constructing external visual indices into speech. Users visually scan these indices to find information-rich regions of speech for more detailed processing and playback. Transcription involves transcribing speech using automatic speech recognition (ASR) and enriching that transcription with visual cues. The resulting enriched transcript is time-aligned to the original speech, allowing users to scan the transcript as a whole, or the additional visual cues present in the transcript, to fixate on and play regions of interest. We tested the effectiveness of these two approaches on a set of reference tasks derived from observations of current voicemail practice. A field trial evaluation of JotMail, an index-based interface similar to commercial unified messaging clients, showed that our approaches were effective in supporting speech scanning, information extraction and status tracking, but not archive management. However, users found it onerous to take manual notes with JotMail to provide effective retrieval indices. We therefore built SCANMail, a transcript-based interface that constructs indices automatically, using ASR to generate a transcript of the speech data. SCANMail also uses information extraction techniques to identify regions of
Usable speech recognition: toward improved access to webcast lectures
2008
A growing number of lecture webcasts are archived after being delivered live. In the absence of transcripts, users face increased difficulty in performing tasks easily achieved with text documents (retrieval, browsing, skimming). Unfortunately, speech recognition systems do not perform satisfactorily when transcribing lectures. In this paper, we present an overview of the ePresence lecture transcription project, whose goal is to improve the usefulness and usability of automatically generated transcripts of webcast lectures. We achieve this by integrating novel speech recognition techniques specifically aimed at increasing the accuracy of webcast transcriptions with the development of an interactive collaborative interface that facilitates users' contribution to the improvement of machine-generated transcripts. We conclude by discussing the challenges of (and possible solutions for) successfully integrating transcripts into archives of webcast lectures.

