Creating conversational interfaces for children
- IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING
, 2002
- Cited by 20 (1 self)
Creating conversational interfaces for children is challenging in several respects. These include acoustic modeling for automatic speech recognition (ASR), language and dialog modeling, and multimodal-multimedia user interface design. First, issues in ASR of children's speech are introduced by an analysis of developmental changes in the spectral and temporal characteristics of the speech signal using data obtained from 456 children, ages five to 18 years. Acoustic modeling adaptation and vocal tract normalization algorithms that yielded state-of-the-art ASR performance on children's speech are described. Second, an experiment designed to better understand how children interact with machines using spoken language is described. Realistic conversational multimedia interaction data were obtained from 160 children who played a voice-activated computer game in a Wizard of Oz (WoZ) scenario. Results of using these data in developing novel language and dialog models as well as in a unified maximum likelihood framework for acoustic decoding in ASR and semantic classification for spoken language understanding are described. Leveraging the lessons learned from the WoZ study and a concurrent user experience evaluation, a multimedia personal agent prototype for children was designed. The architecture and the application are described in detail. Informal evaluation by children was positive, especially for the animated agent and the speech interface.
Analysis of User Behavior under Error Conditions in Spoken Dialogs
- in ICSLP2002
, 2002
- Cited by 10 (1 self)
We focus on developing an account of user behavior under error conditions, working with annotated data from real human-machine mixed-initiative dialogs. In particular, we examine categories of error perception, user behavior under error, the effect of user strategies on error recovery, and the role of user initiative in error situations. A conditional probability model smoothed by weighted ASR error rate is proposed. Results show that users who discover errors through implicit confirmations are less likely to get back on track (or succeed), and take longer to do so, than users who discover errors in other ways, such as system rejects and reprompts. Further, successful user error-recovery strategies included more rephrasing, less contradicting, and a tendency to terminate error episodes (cancel and start over) rather than attempt to repair a chain of errors.
Combining Prior Knowledge and Boosting for Call Classification in Spoken Language Dialogue
, 2002
- Cited by 8 (1 self)
Data collection and annotation are major bottlenecks in the rapid development of accurate syntactic and semantic models for natural-language dialogue systems. In this paper we show how human knowledge can be used when designing a language understanding system in a manner that alleviates the dependence on large sets of data. In particular, we extend BoosTexter, a member of the boosting family of algorithms, to combine and balance hand-crafted rules with the statistics of available data. Experiments on two voice-enabled applications for customer care and help desk are presented.
Learning Context-Dependent Mappings from Sentences to Logical Form
- Cited by 7 (0 self)
We consider the problem of learning context-dependent mappings from sentences to logical form. The training examples are sequences of sentences annotated with lambda-calculus meaning representations. We develop an algorithm that maintains explicit, lambda-calculus representations of salient discourse entities and uses a context-dependent analysis pipeline to recover logical forms. The method uses a hidden-variable variant of the perceptron algorithm to learn a linear model used to select the best analysis. Experiments on context-dependent utterances from the ATIS corpus show that the method recovers fully correct logical forms with 83.7% accuracy.
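The hidden-variable perceptron mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the candidate representation (a list of `(features, output)` pairs, where several candidates may yield the same output through different hidden analyses), the function names, and the epoch count are all assumptions made for illustration.

```python
from collections import defaultdict


def score(w, feats):
    """Linear score of one candidate analysis under weights w."""
    return sum(w[f] * v for f, v in feats.items())


def train(examples, epochs=5):
    """Hidden-variable perceptron sketch (hypothetical data format).

    examples: list of (candidates, gold), where candidates is a list of
    (features: dict, output) pairs. The hidden variable is which candidate
    analysis produced the output; only the output is annotated.
    """
    w = defaultdict(float)
    for _ in range(epochs):
        for candidates, gold in examples:
            best = max(candidates, key=lambda c: score(w, c[0]))
            if best[1] == gold:
                continue  # prediction already correct, no update
            correct = [c for c in candidates if c[1] == gold]
            if not correct:
                continue  # no analysis reaches the gold output, skip
            # Move toward the best-scoring analysis that yields the gold output
            # and away from the incorrectly predicted analysis.
            best_correct = max(correct, key=lambda c: score(w, c[0]))
            for f, v in best_correct[0].items():
                w[f] += v
            for f, v in best[0].items():
                w[f] -= v
    return w


def predict(w, candidates):
    """Return the output of the highest-scoring candidate analysis."""
    return max(candidates, key=lambda c: score(w, c[0]))[1]
```

The update differs from the standard perceptron only in that the "correct" side of the update is itself chosen by the current model, since the gold analysis is not observed.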
Learning database content for spoken dialogue system design
- In 5th International Conference on Language Resources and Evaluation (LREC
, 2006
- Cited by 6 (0 self)
Spoken dialogue systems are common interfaces to backend data in information retrieval domains. As more data is made available on the Web and IE technology matures, dialogue systems, whether speech- or text-based, will be more in demand to provide user-friendly access to this data. However, dialogue systems must become both easier to configure and more informative than the traditional form-based systems that are currently available. In this paper we present techniques for automating content selection for use both in summary responses and in system-initiative queries.
EVALUATING DIALOGUE STRATEGIES IN MULTIMODAL DIALOGUE SYSTEMS
- Cited by 5 (0 self)
Previous research suggests that multimodal dialogue systems providing both speech and pen input, and outputting a combination of spoken language and graphics, are more robust than unimodal systems based on speech or graphics alone (André, 2002; Oviatt, 1999). Such systems are complex to build and significant research and evaluation effort must typically be expended to generate well-tuned modules for each system component. This chapter describes experiments utilising two complementary evaluation methods that can expedite the design process: (1) a Wizard-of-Oz data collection and evaluation using a novel Wizard tool we developed; and (2) an Overhearer evaluation experiment utilising logged interactions with the real system. We discuss the advantages and disadvantages of both methods and summarise how these two experiments have informed our research on dialogue management and response generation for the multimodal dialogue system MATCH.
Confirmation Strategy for Document Retrieval Systems with Spoken Dialog Interface
- In Proc. ICSLP
, 2004
- Cited by 3 (1 self)
Adequate confirmation is indispensable in spoken dialog systems to eliminate misunderstandings caused by speech recognition errors. Spoken language also inherently includes redundant expressions such as disfluencies and out-of-domain phrases, which do not contribute to task achievement. It is easy to define a set of keywords to be confirmed for conventional database query tasks, but this is not straightforward in general document retrieval tasks. In this paper, we propose two statistical measures for identifying portions to be confirmed. A relevance score (RS) represents the degree of matching with the document set. A significance score (SS) detects portions that consequently affect the retrieval results. With these measures, the system can generate confirmations prior to and posterior to the retrieval, respectively. The strategy is implemented and evaluated with retrieval from a software support knowledge base of 40K entries. It is shown that the proposed strategy using the two measures is more efficient than using the conventional confidence measure.
Voice-IF: A Mixed-Initiative Spoken Dialogue System for AT&T Conference Services
- Eurospeech ’01
- Cited by 1 (0 self)
This paper presents the Voice-IF system, a mixed-initiative spoken dialogue system for AT&T conference services. One objective for creating Voice-IF is to provide a vehicle for evaluating our technologies in speech synthesis, recognition, understanding, dialogue and user interfaces on a real application with relatively novice users. Another objective is to design, build and test a set of tools that allow us to rapidly prototype spoken dialogue applications. In this paper, we describe the performance of Voice-IF during its 6-week deployment period. In particular, we report a) results of perceptual evaluations of the synthesized speech, b) system performance and user satisfaction ratings, c) PARADISE analysis of the data, and d) comparisons with other systems, including the W99 conference registration system used at the ASRU’99 workshop and the Travel Communicator system.
Speech-based information retrieval system with clarification dialogue strategy
- in Proc. Human Language Technology Conf. (HLT/EMNLP
, 2005
- Cited by 1 (0 self)
This paper addresses a dialogue strategy to clarify and constrain the queries for speech-driven document retrieval systems. In spoken dialogue interfaces, users often make utterances before the query is completely formed in their mind; thus input queries are often vague or fragmentary. As a result, many items are usually matched. We propose an efficient dialogue framework, where the system dynamically selects an optimal question based on information gain (IG), which represents the reduction of matched items. A set of possible questions is prepared using various knowledge sources. As a bottom-up knowledge source, we extract a list of words that can take a number of objects and potentially cause ambiguity, using a dependency structure analysis of the document texts. This is complemented by top-down knowledge sources of metadata and handcrafted questions. An experimental evaluation showed that the method significantly improved the success rate of retrieval, and all categories of the prepared questions contributed to the improvement.
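Selecting a clarification question by information gain can be sketched as follows. The abstract does not give the exact IG formula, so this sketch assumes an entropy-based definition: matched items are treated as equally likely, each item has exactly one answer per question, and the probability of an answer is proportional to the number of items it keeps. The function names and data layout are hypothetical.

```python
import math
from collections import Counter


def information_gain(matched, answer_of):
    """Expected reduction, in bits, of uncertainty over the matched items.

    matched:   list of item ids currently matching the query
    answer_of: dict mapping item id -> the answer that would keep that item
               (assumption: exactly one answer per item for this question)
    """
    n = len(matched)
    if n == 0:
        return 0.0
    counts = Counter(answer_of[item] for item in matched)
    # Entropy before asking is log2(n); after answer a it is log2(n_a),
    # and answer a occurs with probability n_a / n.
    return math.log2(n) - sum((c / n) * math.log2(c) for c in counts.values())


def select_question(matched, questions):
    """Pick the question whose answers split the matched items best.

    questions: dict mapping question name -> answer_of dict for that question
    """
    return max(questions, key=lambda q: information_gain(matched, questions[q]))
```

Under these assumptions, a question whose answer is the same for every matched item has zero gain and is never preferred over one that actually partitions the items.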

