Vector-based Natural Language Call Routing
- Computational Linguistics
, 1999
Cited by 61 (3 self)
Abstract
This paper describes a domain-independent, automatically trained natural language call router for directing incoming calls in a call center. Our call router directs customer calls based on their response to an open-ended "How may I direct your call?" prompt. Routing behavior is trained from a corpus of transcribed and hand-routed calls and then carried out using vector-based information retrieval techniques. Terms consist of n-gram sequences of morphologically reduced content words, while documents representing routing destinations consist of weighted term frequencies derived from calls to that destination in the training corpus. Based on the statistical discriminating power of the n-gram terms extracted from the caller's request, the caller is 1) routed to the appropriate destination, 2) transferred to a human operator, or 3) asked a disambiguation question. In the last case, the system dynamically generates queries tailored to the caller's request and the destinations with which it is consistent, based on our extension of the vector model. Evaluation of the call router's performance on a financial services call center, using both accurate transcriptions of calls and fairly noisy speech recognizer output, demonstrated robustness in the face of speech recognition errors. More specifically, using accurate transcriptions of speech input, our system correctly routed 93.8% of the calls after redirecting 10.2% of all calls to a human operator. Using speech recognizer output with a 23% error rate reduced the number of correctly routed calls by 4%.
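As a rough illustration of the three-way routing decision described above, the sketch below scores a caller's extracted terms against per-destination term-weight vectors with cosine similarity, then routes, hands off to an operator, or returns the tied candidates for a disambiguation question. It is a minimal sketch under stated assumptions: the destination vectors, the terms, and the `floor` and `margin` thresholds are hypothetical, and term extraction (morphological reduction, n-gram building) is assumed to have already happened.

```python
import math
from collections import Counter

# Hypothetical destination "documents": weighted term frequencies per destination.
DESTINATIONS = {
    "loans": {"car loan": 3.0, "loan": 2.0, "payment": 1.0},
    "deposits": {"deposit": 3.0, "checking": 2.0},
    "cards": {"card": 3.0, "lost card": 2.0, "payment": 1.0},
}


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def route(query_terms, destinations, floor=0.2, margin=0.1):
    """Return ("route", dest), ("operator", None) or ("disambiguate", [dests]).

    floor and margin are hypothetical thresholds, not values from the paper.
    """
    query = Counter(query_terms)
    scores = sorted(
        ((cosine(query, vec), name) for name, vec in destinations.items()),
        reverse=True,
    )
    if not scores or scores[0][0] < floor:
        return ("operator", None)  # nothing scores well enough: send to a human
    best = scores[0][0]
    # Destinations scoring close to the best one remain plausible.
    candidates = sorted(name for score, name in scores if best - score < margin)
    if len(candidates) == 1:
        return ("route", candidates[0])
    return ("disambiguate", candidates)


print(route(["car loan", "loan"], DESTINATIONS))  # ('route', 'loans')
print(route(["payment"], DESTINATIONS))           # ('disambiguate', ['cards', 'loans'])
print(route(["mortgage"], DESTINATIONS))          # ('operator', None)
```

The "disambiguate" branch is where the paper's dynamically generated clarification query would take over; that step is not sketched here.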
Subword-based Approaches for Spoken Document Retrieval
, 2000
Cited by 40 (0 self)
Abstract
This thesis explores approaches to the problem of spoken document retrieval (SDR), which is the task of automatically indexing and then retrieving relevant items from a large collection of recorded speech messages in response to a user-specified natural language text query. We investigate the use of subword unit representations for SDR as an alternative to words generated by either keyword spotting or continuous speech recognition. Our investigation is motivated by the observation that word-based retrieval approaches must either know the keywords to search for a priori or require a very large recognition vocabulary to cover the contents of growing and diverse message collections. The use of subword units in the recognizer constrains the size of the vocabulary needed to cover the language, and the use of subword units as indexing terms allows new user-specified query terms to be detected during retrieval. Four ...
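To make the idea of subword indexing terms concrete, the following sketch indexes a phone sequence by its overlapping phone n-grams and scores a query by the n-grams it shares with a document, so a query word outside any recognition vocabulary can still match. Phone trigrams and the plain overlap score are illustrative assumptions; the thesis compares several subword unit types and retrieval models, and the phone strings below are hand-made examples.

```python
from collections import Counter


def phone_ngrams(phones, n=3):
    """Overlapping phone n-grams used as indexing terms (n=3 is an assumption)."""
    return Counter(tuple(phones[i:i + n]) for i in range(len(phones) - n + 1))


def overlap_score(query_phones, doc_phones, n=3):
    """Number of n-gram occurrences shared by query and document (min counts)."""
    shared = phone_ngrams(query_phones, n) & phone_ngrams(doc_phones, n)
    return sum(shared.values())


# Hypothetical phone strings, e.g. as produced by a phonetic recognizer.
query = ["b", "aa", "s", "t", "ax", "n"]                                  # "boston"
doc_a = ["f", "l", "ay", "t", "s", "t", "ax", "b", "aa", "s", "t", "ax", "n"]  # roughly "flights to boston"
doc_b = ["d", "ae", "l", "ax", "s"]                                       # "dallas"

print(overlap_score(query, doc_a))  # 4
print(overlap_score(query, doc_b))  # 0
```

A real system would weight the n-gram terms (for example with a vector-space model) rather than count raw overlaps.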
Concept-to-Speech Synthesis by Phonological Structure Matching
- Philosophical Transactions of the Royal Society, Series A
, 2000
Cited by 28 (2 self)
Abstract
This paper presents a new way of generating synthetic speech waveforms from a linguistic description. The algorithm is presented as a proposed solution to the speech generation problem in a concept-to-speech system. Off-line, a database of recorded speech is annotated so as to produce a phonological tree for each sentence in that database. Synthesis is performed by generating a phonological tree, called the target tree, and searching the database of trees for nodes that match those in the target tree. A search strategy using target and concatenation costs is then used to find the optimal sequence of units for the target sentence. This paper explains this algorithm, compares it to existing algorithms, and concludes with a discussion of future directions.
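The final search step, choosing one database unit per target position so that the sum of target costs and concatenation (join) costs is minimal, is commonly done with a Viterbi-style dynamic program. The sketch below assumes that formulation; the candidate lists, the pitch-based cost functions, and the factor of 2 on join costs are hypothetical, and the phonological tree matching that produces the candidates in the paper is not modeled.

```python
def select_units(targets, candidates, target_cost, join_cost):
    """Viterbi search over candidate units.

    targets[i] is the specification for position i and candidates[i] the
    database units that could realize it. Returns (total_cost, unit_path).
    """
    # best maps each unit at the current position to (cost of best path ending there, path)
    best = {u: (target_cost(targets[0], u), [u]) for u in candidates[0]}
    for target, units in zip(targets[1:], candidates[1:]):
        new_best = {}
        for u in units:
            # Cheapest way to reach u: best predecessor path plus the join cost into u.
            cost, path = min(
                ((c + join_cost(p[-1], u), p) for c, p in best.values()),
                key=lambda item: item[0],
            )
            new_best[u] = (cost + target_cost(target, u), path + [u])
        best = new_best
    return min(best.values(), key=lambda item: item[0])


# Hypothetical units: (unit id, pitch in Hz); targets are desired pitches.
targets = [100, 120]
candidates = [[("a1", 100), ("a2", 112)], [("b1", 121), ("b2", 135)]]
cost, path = select_units(
    targets,
    candidates,
    target_cost=lambda t, u: abs(t - u[1]),
    join_cost=lambda prev, u: 2 * abs(prev[1] - u[1]),
)
print(cost, [u[0] for u in path])  # 31 ['a2', 'b1']
```

Choosing the best unit for each position on target cost alone would take a1 (target cost 0) and then b1, for a total of 43; the joint search accepts a slightly worse first unit in exchange for a much cheaper join.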
MIMIC: An Adaptive Mixed Initiative Spoken Dialogue System for Information Queries
, 2000
Cited by 26 (0 self)
Abstract
This paper describes MIMIC, an adaptive mixed-initiative spoken dialogue system that provides movie showtime information. MIMIC improves upon previous dialogue systems in two respects. First, it employs initiative-oriented strategy adaptation to automatically adapt response generation strategies based on the cumulative effect of information dynamically extracted from user utterances during the dialogue. Second, MIMIC's dialogue management architecture decouples its initiative module from the goal and response strategy selection processes, providing a general framework for developing spoken dialogue systems with different adaptation behaviors.
Word informativeness and automatic pitch accent modeling
- in EMNLP/VLC
, 1999
Cited by 23 (3 self)
Abstract
To appear in Proc. of EMNLP/VLC, 1999. In intonational phonology and speech synthesis research, it has been suggested that the relative informativeness of a word can be used to predict pitch prominence: the more information a word conveys, the more likely it is to be accented. Others, however, have expressed doubts about such a correlation. In this paper, we provide empirical evidence supporting the existence of such a correlation, using two widely accepted measures of informativeness. Our experiments show a positive correlation between the informativeness of a word and its pitch accent assignment. They also show that informativeness enables statistically significant improvements in pitch accent prediction. Word informativeness is inexpensive to compute and can easily be incorporated into speech synthesis systems.
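As an example of a cheap informativeness score, the sketch below computes inverse document frequency (IDF) over a small document collection and applies a threshold to guess whether a word is accented. The abstract does not name its two measures; IDF is used here only as a familiar example, and the toy corpus and the 0.5 threshold are hypothetical stand-ins for a trained accent predictor.

```python
import math

# Hypothetical toy collection: each document is a set of lowercase words.
DOCUMENTS = [
    {"the", "patient", "has", "a", "fever"},
    {"the", "doctor", "ordered", "tests"},
    {"the", "patient", "is", "stable"},
    {"the", "surgery", "went", "well"},
]


def idf(word, documents):
    """Inverse document frequency: log(N / df); unseen words get infinity."""
    df = sum(1 for doc in documents if word in doc)
    return math.log(len(documents) / df) if df else math.inf


def predict_accent(word, documents, threshold=0.5):
    """Toy rule: treat words at or above an informativeness threshold as accented."""
    return idf(word, documents) >= threshold


for w in ["the", "patient", "fever"]:
    print(w, round(idf(w, DOCUMENTS), 3), predict_accent(w, DOCUMENTS))
# the 0.0 False
# patient 0.693 True
# fever 1.386 True
```

A function word that occurs in every document scores zero, while rarer content words score higher, which is the intuition behind using informativeness to predict accent.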
Rare Events and Closed Domains: Two Delicate Concepts in Speech Synthesis
, 2003
Cited by 18 (6 self)
Abstract
One of the most serious challenges for speech synthesis is the systematic treatment of events in language and speech that are known to have low frequencies of occurrence. The problems that extremely unbalanced frequency distributions pose for rule-based or data-driven models are often underestimated or even unrecognized. This paper discusses these problems in the contexts of morphology, syllabification, segmental duration and unit selection, and also suggests possible solutions. The design of databases for restricted application domains, where the distributions of linguistic and phonetic factors are known, is also critically reviewed.
Chinese Tone Modeling with Stem-ML
- in ICSLP
, 2000
Cited by 17 (12 self)
Abstract
This paper models tonal variations with Stem-ML tags. Surface tone shapes in natural sentences often deviate from their expected canonical shapes, which makes tone modeling challenging. In this study we employed a subset of Stem-ML tags that incorporated information about lexical tones and the linguistically motivated prosodic strength of each syllable. The tags successfully captured the "distorted" tone shapes and produced contextually appropriate surface variations.
High-Accuracy Automatic Segmentation
, 1999
Cited by 15 (7 self)
Abstract
We propose a system for automatically determining the boundaries between phonetic segments in a speech waveform given a phonetic transcription, a task known as automatic segmentation. The system uses edge detectors applied to various speech representations; both the detectors and the representations are optimized for each diphone or diphone class. The output of these detectors, which contains spuriously detected edges, is then combined with alternative pronunciations generated by rules from the canonical pronunciation. The final output is generated by applying lowest-cost path algorithms to finite-state transducers.
1. INTRODUCTION
Automatic segmentation is critical both for speech research and for speech technologies that rely on segmented speech corpora for training or construction purposes. For example, in text-to-speech synthesis (TTS), segmented corpora are used to construct the intonation, duration, and synthesis components [4]. The standard approach to automated segmentation is to adapt an automatic speech recognition (ASR)...
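The idea of picking true boundaries out of detector output that contains spurious edges can be illustrated with a small lowest-cost path search. The sketch assumes one boundary per expected position, expected times from some prior estimate, a per-edge detector cost, and a hypothetical weight on deviation from the expected time; it does not model the diphone-specific detectors, the pronunciation rules, or the finite-state transducers used in the paper.

```python
import math


def align_boundaries(expected, edges, weight=10.0):
    """Choose one detected edge per expected boundary, in increasing time order.

    expected: expected boundary times in seconds (hypothetical prior estimate).
    edges: (time, detector_cost) pairs, possibly including spurious edges.
    Minimizes sum(detector_cost + weight * |time - expected_time|).
    """
    if not expected or not edges:
        raise ValueError("need at least one expected boundary and one edge")
    edges = sorted(edges)
    # dp[j] = (cost, path) of the best assignment so far whose last boundary is edges[j]
    dp = [(c + weight * abs(t - expected[0]), [t]) for t, c in edges]
    for exp_t in expected[1:]:
        new_dp = []
        for j, (t, c) in enumerate(edges):
            # Best earlier assignment whose last boundary lies strictly before t.
            prev_cost, prev_path = min(
                (dp[i] for i in range(j) if edges[i][0] < t),
                key=lambda item: item[0],
                default=(math.inf, []),
            )
            new_dp.append((prev_cost + c + weight * abs(t - exp_t), prev_path + [t]))
        dp = new_dp
    cost, path = min(dp, key=lambda item: item[0])
    if math.isinf(cost):
        raise ValueError("not enough edges for the expected boundaries")
    return path


expected_times = [0.10, 0.25, 0.40]
detected_edges = [(0.09, 0.1), (0.12, 0.5), (0.20, 0.2), (0.26, 0.1), (0.33, 0.9), (0.41, 0.1)]
print(align_boundaries(expected_times, detected_edges))  # [0.09, 0.26, 0.41]
```

The spurious edges at 0.12, 0.20 and 0.33 s are skipped because every path through them costs more than the path through the three well-placed, low-cost edges.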
Joint Prosody Prediction And Unit Selection For Concatenative Speech Synthesis
, 2001
Cited by 15 (4 self)
Abstract
In this paper we describe how prosody prediction can be efficiently integrated with the unit selection process in a concatenative speech synthesizer under a weighted finite-state transducer (WFST) architecture. WFSTs representing prosody prediction and unit selection can be composed during synthesis, effectively expanding the space of possible prosodic targets. We implemented a symbolic prosody prediction module and a unit selection database as the synthesis components of a travel planning system. Results of perceptual experiments show that combining the steps of prosody prediction and unit selection improves the naturalness of synthetic speech compared to the sequential implementation.
1. INTRODUCTION
The growing popularity of speech-enabled computer interfaces demands high-quality speech output, particularly for telephone applications. The perceived quality of standard general-purpose text-to-speech (TTS) systems is not good enough, which forces applicatio...
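The benefit of combining the two steps can be seen on a toy example. In the sketch below each word has a few symbolic prosody labels with prediction costs, and each label has the cost of the best matching database unit. The sequential pipeline commits to the cheapest prosody first, while the joint search minimizes the summed cost, which is what a shortest path through the composed machines would compute in this simplified setting. All costs and tone labels are hypothetical, plain dictionaries stand in for WFSTs, and because no join costs link the words, the joint minimum decomposes word by word.

```python
# Hypothetical costs: PROSODY[word][label] is the prosody-prediction cost and
# UNITS[word][label] the cost of the best database unit realizing that label.
PROSODY = {
    "flights": {"H*": 0.1, "!H*": 0.2},
    "boston": {"H*": 0.0, "L+H*": 0.4},
}
UNITS = {
    "flights": {"H*": 0.2, "!H*": 0.9},
    "boston": {"H*": 1.5, "L+H*": 0.3},
}


def sequential(words):
    """Pick the cheapest prosody first, then pay whatever unit cost results."""
    total, labels = 0.0, []
    for w in words:
        label = min(PROSODY[w], key=PROSODY[w].get)
        total += PROSODY[w][label] + UNITS[w][label]
        labels.append(label)
    return total, labels


def joint(words):
    """Minimize prosody cost plus unit cost together for each word."""
    total, labels = 0.0, []
    for w in words:
        label = min(PROSODY[w], key=lambda lab: PROSODY[w][lab] + UNITS[w][lab])
        total += PROSODY[w][label] + UNITS[w][label]
        labels.append(label)
    return total, labels


words = ["flights", "boston"]
for name, search in [("sequential", sequential), ("joint", joint)]:
    total, labels = search(words)
    print(name, round(total, 2), labels)
# sequential 1.8 ['H*', 'H*']
# joint 1.0 ['H*', 'L+H*']
```

The joint search gives up the most likely accent on "boston" because the database holds a much better unit for the second-choice label.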
Prosody modeling in concept-to-speech generation
, 2002
Cited by 15 (1 self)
Abstract
With the development of speech recognition and synthesis technology, speech interfaces for practical applications are in high demand. For applications such as spoken dialogue systems, where not only the waveform but also the content of the system's queries and responses must be generated automatically, a Concept-to-Speech system is needed. One key module in a Concept-to-Speech system is prosody modeling, which determines how prosody (intonation), the suprasegmental aspect of speech that communicates the structure and meaning of utterances, should be represented and generated automatically. Prosody is directly affected by the meaning and structure of the sentences automatically produced by a natural language generator, and it also has a significant influence on the naturalness and effectiveness of the synthesized speech. The performance of prosody modeling is therefore critical to the success of a Concept-to-Speech system, in which natural language generation and speech synthesis are used together to generate the final spoken output. In this thesis, I focus on two aspects of the prosody modeling process. First, I explore novel features that are available during natural language generation, such as the meaning, structure, and context of sentences, and demonstrate how these features are related to prosody, based on empirical evidence derived from annotated speech corpora. Second, I propose a new prosody modeling approach that automatically combines different natural language features for prosody prediction. More specifically, I designed an augmented instance-based learning algorithm that makes use of the natural prosody in human speech to produce natural and vivid synthesized speech. Our subjective evaluation demonstrates the effectiveness of this approach. I implemented the prosody modeling system for a medical application called MAGIC.
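Instance-based learning predicts a label for a new item from the most similar stored examples. The sketch below is a plain k-nearest-neighbor classifier with a Hamming distance over symbolic features, predicting whether a word is accented; it is not the thesis's augmented algorithm, and the feature set (part of speech, given/new status, syntactic function) and training examples are hypothetical.

```python
from collections import Counter

# Hypothetical training instances: (part of speech, given/new, syntactic function) -> label
TRAIN = [
    (("noun", "new", "subject"), "accent"),
    (("noun", "new", "object"), "accent"),
    (("noun", "given", "object"), "none"),
    (("det", "given", "subject"), "none"),
    (("verb", "new", "predicate"), "accent"),
    (("det", "new", "object"), "none"),
]


def hamming(a, b):
    """Number of feature positions on which two instances differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train, features, k=3):
    """Majority label among the k stored instances closest to `features`.

    sorted() is stable, so distance ties are broken by training order.
    """
    nearest = sorted(train, key=lambda ex: hamming(ex[0], features))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


print(knn_predict(TRAIN, ("noun", "new", "predicate")))  # accent
print(knn_predict(TRAIN, ("det", "given", "object")))    # none
```

Because the stored examples themselves carry the prediction, adding annotated speech to the instance base changes behavior without retraining a model, which is one reason instance-based methods suit corpus-driven prosody work.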

