Galaxy-II: A Reference Architecture For Conversational System Development
In Proc. ICSLP, 1998
Cited by 108 (14 self)
GALAXY is a client-server architecture for accessing on-line information using spoken dialogue that we introduced at ICSLP94. It has served as the testbed for developing human language technologies for our group for several years. Recently, we have initiated a significant redesign of the GALAXY architecture to make it easier for many researchers to develop their own applications, using either exclusively their own servers or intermixing them with servers developed by others. This redesign was done in part due to the fact that GALAXY has been designated as the first reference architecture for the new DARPA Communicator Program. The purpose of this paper is to document the changes to GALAXY that led to this first reference architecture, which makes use of a scripting language for flow control to provide flexible interaction among the servers, and a set of libraries to support rapid prototyping of new servers. We describe the new reference architecture in some detail, and report on the cu...
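As a rough illustration of the hub-and-script idea described in the abstract above, a minimal Python sketch might look like the following. This is not the GALAXY-II hub or its scripting language: the `Hub` class, the rule format (fire a server when one key is present in the shared frame and another is absent), and the three toy servers are all hypothetical stand-ins.

```python
from typing import Callable, Dict, List, Tuple

Frame = Dict[str, object]
Server = Callable[[Frame], Frame]


class Hub:
    """Toy hub: routes one shared frame among servers according to a rule script."""

    def __init__(self) -> None:
        self.servers: Dict[str, Server] = {}
        # Each rule is (key that must be present, key that must be absent, server name).
        self.rules: List[Tuple[str, str, str]] = []

    def register(self, name: str, server: Server) -> None:
        self.servers[name] = server

    def add_rule(self, when_present: str, when_absent: str, server: str) -> None:
        self.rules.append((when_present, when_absent, server))

    def run(self, frame: Frame, max_steps: int = 20) -> Frame:
        for _ in range(max_steps):
            for present, absent, name in self.rules:
                if present in frame and absent not in frame:
                    # The server sees the frame and returns new keys to merge in.
                    frame.update(self.servers[name](frame))
                    break
            else:
                return frame  # no rule fired, so the script is finished
        raise RuntimeError("hub script did not terminate")


# Hypothetical servers; real ones would be separate processes.
def recognize(frame: Frame) -> Frame:
    return {"text": str(frame["audio"]).lower()}


def understand(frame: Frame) -> Frame:
    return {"meaning": {"words": str(frame["text"]).split()}}


def respond(frame: Frame) -> Frame:
    return {"reply": f"heard {len(frame['meaning']['words'])} words"}


def build_demo_hub() -> Hub:
    hub = Hub()
    hub.register("recognizer", recognize)
    hub.register("parser", understand)
    hub.register("generator", respond)
    hub.add_rule("audio", "text", "recognizer")
    hub.add_rule("text", "meaning", "parser")
    hub.add_rule("meaning", "reply", "generator")
    return hub


result = build_demo_hub().run({"audio": "FLIGHTS TO BOSTON"})
```

Because the control flow lives in the rule list rather than in the servers, swapping one server for another only requires re-registering the name; that is the property the sketch is meant to show.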
Conversational Interfaces: Advances and Challenges
2000
Cited by 61 (4 self)
The last decade has witnessed the emergence of a new breed of human computer interfaces that combines several human language technologies to enable information access and transactional processing using spoken dialogue. In this paper, I discuss my view on the research issues involved in the development of such interfaces, describe the recent work done in this area at the MIT Laboratory for Computer Science, and outline some of the unmet research challenges, including the need to work in real domains, spoken language generation, and portability across domains and languages.
The CU Communicator System
1999
Cited by 23 (4 self)
The CU Communicator system is our initial testbed for research leading to advanced dialog systems enabling robust and graceful human computer interaction. It is a DARPA hub compliant system for the DARPA Communicator task, and was demonstrated at the DARPA workshop in June 1999. Robustness and portability of spoken dialog systems are two of the issues we attempt to address in the project. We use robust parsing and dialog control strategies to be as flexible as possible to user variance. In order to make the systems easier to develop, we have adopted a largely declarative representation where the bulk of the domain specific information is provided in external files. 1. INTRODUCTION In April 1999, the University of Colorado speech group began development of the CU Communicator system, a Hub-compliant implementation of the DARPA Communicator task [1][2]. The system combines continuous speech recognition, natural language understanding and flexible dialog control to enable natural conversat...
A Flexible, Scalable Finite-State Transducer Architecture For Corpus-Based Concatenative Speech Synthesis
2000
Cited by 19 (3 self)
In this paper we describe our work involving the conversion of our phonologically-based synthesizer into a finite-state transducer (FST) representation which can be used for real-time natural-sounding synthesis. We have designed a transducer structure to efficiently perform the common task of unit selection in concatenative speech synthesis. By encapsulating domain-independent concatenative synthesis costs into a constraint kernel, we have obtained a topology that scales linearly with the size of the synthesis corpus. The FST representation provides a flexible, unified framework in which we can leverage our previous work in speech recognition in areas such as pronunciation modelling and search. The FST synthesizer has been incorporated into two servers which operate within our conversational system architecture to convert meaning representations into waveforms. We have had preliminary success with the new FST-based synthesis in several constrained spoken dialogue applications. 1. INTRO...
Response planning and generation in the MERCURY flight reservation system
2002
Cited by 19 (5 self)
This paper describes the response planning and generation components of the MERCURY flight reservation system, a mixed-initiative spoken dialogue system that supports both voice-only interaction and multimodal interaction augmenting spoken inputs with typing or clicking at a displayed Web page. MERCURY is configured using the Galaxy Communicator architecture (Seneff, Hurley, Lau, Schmid & Zue, 1998), where a suite of servers interact via program control mediated by a central hub. Language generation is performed in two steps: response planning, or deep-structure generation, is carried out by the dialogue manager, and is well-integrated with other aspects of dialogue control. Dialogue flow is specified by a dialogue control table (Seneff & Polifroni, 2000a). Response generation, or surface-form generation, is executed by a separate language generation server, under the guidance of a set of recursive generation rules and an associated lexicon (Baptist & Seneff, 2000). The generation of the textual string for the graphical interface and the marked-up synthesis string for spoken outputs are controlled by a shared set of generation rules (Seneff & Polifroni, 2000b). Thus there is a direct meaning-to-speech mapping that eliminates the need to analyze linguistic structure for synthesis. To date, we have collected over 25 000 utterances from users interacting with the MERCURY system. We report here on both the results of user satisfaction studies conducted by the National Institute of Standards and Technology (NIST), and on our own tabulation of a number of different measures of dialogue success.
Galaxy-II as an Architecture for Spoken Dialogue Evaluation
Proceedings of Second International Conference on Language Resources and Evaluation, 2000
Cited by 15 (4 self)
The GALAXY-II architecture, comprised of a centralized hub mediating the interaction among a suite of human language technology servers, provides both a useful tool for implementing systems and also a streamlined way of configuring the evaluation of these systems. In this paper, we discuss our ongoing efforts in evaluation of spoken dialogue systems, with particular attention to the way in which the architecture facilitates the development of a variety of evaluation configurations. We furthermore propose two new metrics for automatic evaluation of the discourse and dialogue components of a spoken dialogue system, which we call "user frustration" and "information bit rate." 1. Introduction Through our experience over the last decade in designing spoken dialogue systems, we have come to realize that an essential element in being able to rapidly configure new systems is to allow as many aspects of the system design as possible to be specifiable without modifying source code. To this end...
Joint Prosody Prediction And Unit Selection For Concatenative Speech Synthesis
2001
Cited by 15 (4 self)
In this paper we describe how prosody prediction can be efficiently integrated with the unit selection process in a concatenative speech synthesizer under a weighted finite-state transducer (WFST) architecture. WFSTs representing prosody prediction and unit selection can be composed during synthesis, thus effectively expanding the space of possible prosodic targets. We implemented a symbolic prosody prediction module and a unit selection database as the synthesis components of a travel planning system. Results of perceptual experiments show that by combining the steps of prosody prediction and unit selection we are able to achieve improved naturalness of synthetic speech compared to the sequential implementation. 1. INTRODUCTION The growing popularity of speech-enabled computer interfaces demands high quality speech output, particularly for telephone applications. The perceived quality of standard general purpose text-to-speech (TTS) systems is not good enough, which forces applicatio...
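To make the contrast between sequential and joint selection concrete, here is a small Python sketch for a single target position. It does not use WFSTs; it only mimics the effect of composing two weighted machines and taking the best path, where weights add along a path and the cheapest total wins. The function names, the prosodic tags, and all cost values are invented for illustration.

```python
from typing import Dict, Tuple

ProsodyCosts = Dict[str, float]          # prosodic tag -> prediction cost
UnitCosts = Dict[str, Dict[str, float]]  # prosodic tag -> {unit id -> selection cost}


def sequential_choice(prosody: ProsodyCosts, units: UnitCosts) -> Tuple[str, str, float]:
    """Commit to the cheapest prosodic tag first, then pick the best unit for it."""
    tag = min(prosody, key=prosody.get)
    unit = min(units[tag], key=units[tag].get)
    return tag, unit, prosody[tag] + units[tag][unit]


def joint_choice(prosody: ProsodyCosts, units: UnitCosts) -> Tuple[str, str, float]:
    """Minimise the summed cost over every (tag, unit) pair at once."""
    return min(
        (
            (tag, unit, prosody[tag] + cost)
            for tag, tag_units in units.items()
            for unit, cost in tag_units.items()
        ),
        key=lambda choice: choice[2],
    )


# Hypothetical costs: the preferred tag has only a poorly matching unit.
prosody_cost = {"accented": 0.25, "unaccented": 0.5}
unit_cost = {
    "accented": {"u1": 2.0},
    "unaccented": {"u2": 0.5, "u3": 1.0},
}

sequential = sequential_choice(prosody_cost, unit_cost)
joint = joint_choice(prosody_cost, unit_cost)
```

With these made-up numbers the sequential strategy is locked into the "accented" tag and pays for a poor unit, while the joint strategy accepts a slightly worse prosodic target to reach a much better unit.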
Prosody modeling in concept-to-speech generation
2002
Cited by 15 (1 self)
With the development of speech recognition and synthesis technology, speech interfaces for practical applications are in high demand. For applications like spoken dialogue systems, where not only the waveform but also the content of a system’s query/response has to be generated automatically, a Concept-to-Speech system is needed. One key module in a Concept-to-Speech system is prosody modeling. It determines how prosody (intonation), the suprasegmental aspect of speech that communicates the structure and meaning of utterances, should be represented and generated automatically. Since prosody is directly affected by the meaning and structure of the sentences automatically produced by a natural language generator and, at the same time, has significant influence on the naturalness and effectiveness of the synthesized speech, its performance is critical to the success of a Concept-to-Speech system where both natural language generation and speech synthesis are used together to generate the final spoken output. In this thesis, I focus on two aspects of the prosody modeling process. First, I explore novel features that are available during natural language generation, such as the meaning, structure, and context of sentences, and demonstrate how these features are related to prosody, based on empirical evidence derived from annotated speech corpora. Second, I propose a new prosody modeling approach that automatically combines different natural language features for prosody prediction. More specifically, I designed an augmented instance-based learning algorithm that makes use of the natural prosody in human speech to produce natural and vivid synthesized speech. Our subjective evaluation demonstrates the effectiveness of this approach. I implement the prosody modeling system for a medical application called MAGIC.
Unit Selection for Speech Synthesis Using Splicing Costs with Weighted Finite State Transducers
2001
Cited by 9 (5 self)
In this paper we describe how unit selection for concatenative speech synthesis can be implemented efficiently for sub-phonetic units using weighted finite state transducers (WFST). We also introduce splicing costs as a measure to indicate which unit boundaries are particularly good or poor joint points. Splicing costs extend the flexibility offered by the unit selection paradigm. Through a perceptual experiment we demonstrate an improvement in speech quality achieved by using splicing costs during unit selection.
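The search problem behind unit selection can be sketched without transducers as a dynamic program over candidate units. The Python below is only such a sketch, not the WFST implementation the abstract describes: `select_units`, the unit names, the target costs, and the `join_cost` function (which stands in for a splicing cost by penalising a join between units taken from different recordings) are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple


def select_units(
    candidates: Sequence[Sequence[str]],
    target_cost: Callable[[int, str], float],
    join_cost: Callable[[str, str], float],
) -> Tuple[float, List[str]]:
    """Return the cheapest unit sequence, one unit per target position."""
    if not candidates or any(len(c) == 0 for c in candidates):
        raise ValueError("every target position needs at least one candidate")
    # best holds, per candidate at the current position, (path cost, path so far).
    best = [(target_cost(0, u), [u]) for u in candidates[0]]
    for i in range(1, len(candidates)):
        new_best = []
        for u in candidates[i]:
            # Cheapest way to reach u from any unit at the previous position.
            cost, path = min(
                ((c + join_cost(p[-1], u), p) for c, p in best),
                key=lambda cp: cp[0],
            )
            new_best.append((cost + target_cost(i, u), path + [u]))
        best = new_best
    return min(best, key=lambda cp: cp[0])


# Hypothetical data: the trailing digit names the recording a unit came from.
TARGET = {"a1": 0.0, "a2": 0.5, "b1": 0.6, "b2": 0.0}


def demo_target_cost(position: int, unit: str) -> float:
    return TARGET[unit]


def demo_join_cost(prev: str, nxt: str) -> float:
    # Contiguous units from the same recording join for free; others pay a penalty.
    return 0.0 if prev[-1] == nxt[-1] else 1.0


total, path = select_units([["a1", "a2"], ["b1", "b2"]], demo_target_cost, demo_join_cost)
```

In this toy case the unit with the best target cost at the first position ("a1") is not chosen, because every continuation from it is more expensive overall; a cost attached to unit boundaries changes which sequence wins.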
Efficient Integrated Response Generation from Multiple Targets using Weighted Finite-State Transducers
2002