An audio-based personal memory aid
- Proceedings of UbiComp 2004: Ubiquitous Computing, 2004
- Cited by 12 (3 self)
We are developing a wearable device that attempts to alleviate some everyday memory problems. The “memory prosthesis” records audio and contextual information from conversations and provides a suite of retrieval tools (on both the wearable and a personal computer) to help users access forgotten memories in a timely fashion. This paper describes the wearable device, the personal-computer-based retrieval tool, and their supporting technologies. Anecdotal observations based on real-world use and quantitative results based on a controlled memory-retrieval task are reported. Finally, some social, legal, and design challenges of ubiquitous recording and remembering via a personal audio archive are discussed.
Searching in audio: the utility of transcripts, dichotic presentation, and time-compression
- ACM CHI Conference on Human Factors in Computing Systems, 2006
- Cited by 5 (1 self)
Searching audio data can potentially be facilitated by the use of automatic speech recognition (ASR) technology to generate text transcripts which can then be easily queried. However, since current ASR technology cannot reliably generate 100% accurate transcripts, additional techniques for fluid browsing and searching of the audio itself are required. We explore the impact of transcripts of various qualities, dichotic presentation, and time-compression on an audio search task. Results show that dichotic presentation and reasonably accurate transcripts can assist in the search process, but suggest that time-compression and low-accuracy transcripts should be used carefully. Author Keywords: Dichotic listening, transcripts, audio time-compression.
Second Messenger: Increasing the Visibility of Minority Viewpoints with a Face-to-face Collaboration Tool
- 2004
- Cited by 4 (3 self)
This paper introduces the application Second Messenger, a tool for supporting face-to-face meetings and discussions. Second Messenger uses a speech-recognition engine as an input method and outputs filtered keywords from the group's conversation onto an interactive display. The goal of this interface is to improve the quality of a group discussion by increasing the visibility of diverse viewpoints.
Vote and Be Heard: Adding Back-Channel Cues to Social Mirrors (to be published)
- Proc. 12th IFIP Int’l Conf. Human-Computer Interaction (Interact 09), Int’l Federation for Information Processing, 2009
- Cited by 4 (4 self)
In face-to-face group situations, social pressure and organizational hierarchy relegate the less outspoken into silence, often resulting in fewer voices, fewer ideas, and group-think. However, in mediated interaction, such as email, it has been shown that more people join in the discussion. With this work, we aim to combine the benefits of mediated communication with the benefits and affordances of face-to-face interaction by adding a mediated back-channel. We describe Conversation Votes, a tabletop system that augments verbal conversation with a shared anonymous back-channel to highlight agreement. We then discuss a study of our design for groups engaged in repeated discussion. Our results show that anonymous visual back-channels provide a medium for the underrepresented voices of a conversation and balance interaction among all participants.
Conversation Clusters: Grouping Conversation Topics through Human-Computer Dialog
- Proc. 27th Int’l Conf. Human Factors in Computing Systems (CHI 09), ACM, 2009
- Cited by 4 (1 self)
Conversation Clusters explores the use of visualization to highlight salient moments of live conversation while archiving a meeting. Cheaper storage and easy access to recording devices allow extensive archival. However, as the size of the archive grows, retrieving the desired moments becomes increasingly difficult. We approach this problem from a socio-technical perspective and utilize human intuition aided by computer memory. We present computationally detected topics of conversation as visual summaries of discussion and as reference points into the archive. To further bootstrap the system, humans can participate in a dialog with the visualization of the clustering process and shape the development of clustering models.
On the benefits of confidence visualization in speech recognition
- Proc. CHI 2008, ACM Press, 2008
- Cited by 3 (2 self)
In a typical speech dictation interface, the recognizer’s best guess is displayed as normal, unannotated text. This ignores potentially useful information about the recognizer’s confidence in its recognition hypothesis. Using a confidence measure (which itself may sometimes be inaccurate), we investigated providing visual feedback about low-confidence portions of the recognition using shaded, red underlining. An evaluation showed that, compared to a baseline without underlining, underlining low-confidence areas did not increase users’ speed or accuracy in detecting errors. However, we found that when recognition errors were correctly underlined, they were discovered significantly more often than baseline. Conversely, when errors failed to be underlined, they were discovered less often. Our results indicate confidence visualization can be effective, but only if the confidence measure has high accuracy. Further, since our results show that users tend to trust confidence visualization, designers should be careful in its application if a high-accuracy confidence measure is not available. Author Keywords: Speech recognition, visualization, recognition interfaces.
Vote and Be Heard: Adding Back-Channel Signals to Social Mirrors
- Cited by 2 (0 self)
In face-to-face group situations, social pressure and organizational hierarchy relegate the less outspoken to silence, often resulting in fewer voices, fewer ideas, and groupthink. However, in mediated interaction like email, more people join in the discussion to offer their opinion. With this work, we aim to combine the benefits of mediated communication with the benefits and affordances of face-to-face interaction by adding a mediated back-channel. We describe Conversation Votes, a tabletop system that augments verbal conversation with a shared anonymous back-channel to highlight agreement. We then discuss a study of our design with groups engaged in repeated discussion. Our results show that anonymous visual back-channels provide a medium for the underrepresented voices of a conversation and balance interaction among all participants.
Audio surrogation for digital video: A design framework
- 2006
- Cited by 1 (1 self)
Video content becomes increasingly important in WWW applications as the emerging global cyber infrastructure develops. Improvements in hardware (e.g., fast CPUs, graphics chips, inexpensive mass storage, inexpensive video cameras, cell phones), software (e.g., video editors, video player extensions to web browsers and other general-purpose applications, web development environments), and networking
General Terms
This paper introduces the application Second Messenger, a tool for supporting face-to-face meetings and discussions. Second Messenger uses a speech-recognition engine as an input method and outputs filtered keywords from the group’s conversation onto an interactive display. The goal of this interface is to improve the quality of a group discussion by increasing the visibility of diverse viewpoints. Categories and Subject Descriptors: H.5.3 [Information Interfaces and Presentation]: Group and Organization Interfaces - computer-supported cooperative work,
Accessing Speech Data Using Strategic Fixation
- 2006
When users access information from text, they engage in strategic fixation, visually scanning the text to focus on regions of interest. However, because speech is both serial and ephemeral, it does not readily support strategic fixation. This paper describes two design principles, indexing and transcript-centric access, that address the problem of speech access by supporting strategic fixation. Indexing involves users constructing external visual indices into speech. Users visually scan these indices to find information-rich regions of speech for more detailed processing and playback. Transcription involves transcribing speech using automatic speech recognition (ASR) and enriching that transcription with visual cues. The resulting enriched transcript is time-aligned to the original speech, allowing users to scan the transcript as a whole, or the additional visual cues present in the transcript, to fixate and play regions of interest. We tested the effectiveness of these two approaches on a set of reference tasks derived from observations of current voicemail practice. A field trial evaluation of JotMail, an index-based interface similar to commercial unified messaging clients, showed that our approaches were effective in supporting speech scanning, information extraction and status tracking, but not archive management. However, users found it onerous to take manual notes with JotMail to provide effective retrieval indices. We therefore built SCANMail, a transcript-based interface that constructs indices automatically, using ASR to generate a transcript of the speech data. SCANMail also uses information extraction techniques to identify regions of

