Creating conversational interfaces for children
- IEEE Transactions on Speech and Audio Processing, 2002
Cited by 20 (1 self)
Creating conversational interfaces for children is challenging in several respects. These include acoustic modeling for automatic speech recognition (ASR), language and dialog modeling, and multimodal-multimedia user interface design. First, issues in ASR of children's speech are introduced through an analysis of developmental changes in the spectral and temporal characteristics of the speech signal, using data obtained from 456 children aged five to 18 years. Acoustic modeling adaptation and vocal tract normalization algorithms that yielded state-of-the-art ASR performance on children's speech are described. Second, an experiment designed to better understand how children interact with machines using spoken language is described. Realistic conversational multimedia interaction data were obtained from 160 children who played a voice-activated computer game in a Wizard of Oz (WoZ) scenario. Results of using these data to develop novel language and dialog models, and in a unified maximum likelihood framework for acoustic decoding in ASR and semantic classification for spoken language understanding, are described. Leveraging the lessons learned from the WoZ study and a concurrent user experience evaluation, a multimedia personal agent prototype for children was designed. Its architecture and application are described in detail. Informal evaluation by children was positive, especially for the animated agent and the speech interface.
Voice-IF: A Mixed-Initiative Spoken Dialogue System for AT&T Conference Services
- Eurospeech ’01
Cited by 1 (0 self)
This paper presents the Voice-IF system, a mixed-initiative spoken dialogue system for AT&T conference services. One objective in creating Voice-IF is to provide a vehicle for evaluating our technologies in speech synthesis, recognition, understanding, dialogue and user interfaces on a real application with relatively novice users. Another objective is to design, build and test a set of tools that allow us to rapidly prototype spoken dialogue applications. In this paper, we describe the performance of Voice-IF during its 6-week deployment period. In particular, we report a) results of perceptual evaluations of the synthesized speech, b) system performance and user satisfaction ratings, c) PARADISE analysis of the data, and d) comparisons with other systems, including the W99 conference registration system used at the ASRU’99 workshop and the Travel Communicator system.
WEB-BASED MONITORING, LOGGING AND REPORTING TOOLS FOR MULTI-SERVICE MULTI-MODAL SYSTEMS
This paper describes MILER (Multi-modal data Logger for Evaluation and Report), a web-based multi-service monitoring, logging and reporting tool for advanced multi-modal dialog systems. MILER has been designed to directly arrange and synchronize logging data collected from live services and to provide real-time reports about service usage and system performance. Special attention has been given to the architecture design in order to achieve service and access-device independence and reliable synchronization of data from distributed logs. MILER allows researchers to analyze multimodal interactions and the call flow, reconstruct the system/user dialogue turns, and play the recorded user utterances, and it provides a preliminary dialogue performance evaluation. It also supports labeling and annotation of the dialogue turns for further offline analysis. Once the user inputs (i.e., speech and other input modalities) are manually transcribed and labeled, along with detailed log events from each dialog, MILER derives a set of objective measures, which includes word and concept accuracy, number of attempts per concept, dialog turn counts and duration, and task completion rates. Subjective measures extracted from user surveys, including perceived task success and ease-of-use measures, can be combined with the objective measures and the results used later for accuracy computation.

