Natural-Sounding Speech Synthesis Using Variable-Length Units (1998)
Citations: 33 (4 self)
BibTeX
@MISC{Yi98natural-soundingspeech,
author = {Jon R. W. Yi and James R. Glass},
title = {Natural-Sounding Speech Synthesis Using Variable-Length Units},
year = {1998}
}
Abstract
The goal of this work was to develop a speech synthesis system that concatenates variable-length units to create natural-sounding speech. Our initial work in this area showed that, by carefully designing system responses to ensure consistent intonation contours, natural-sounding speech synthesis was achievable with word- and phrase-level concatenation. To extend the flexibility of this framework, we focused on the problem of generating novel words from a corpus of sub-word units. The design of the sub-word units was motivated by perceptual studies that investigated where speech could be spliced with minimal audible distortion and which contextual constraints had to be maintained to produce natural-sounding speech. During synthesis, the sub-word corpus is searched with a Viterbi search, which selects a sequence of units based on how well they individually match the input specification and on how well they sound as an ensemble. This concatenative speech synthesis system, ENVOICE, has been used in a conversational information retrieval system in two application domains to convert meaning representations into speech waveforms.
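
The unit search described in the abstract can be illustrated with a minimal sketch. It is not the ENVOICE implementation: the unit representation `(phone, utterance, position)`, the function names, and both cost functions are assumptions made up for this example. The sketch only shows the general shape of a Viterbi search that trades off a per-unit match cost against a cost for joining adjacent units.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical unit representation: (phone label, source utterance id, position).
Unit = Tuple[str, int, int]


def viterbi_unit_selection(
    targets: List[str],
    candidates: List[List[Unit]],
    unit_cost: Callable[[str, Unit], float],
    concat_cost: Callable[[Unit, Unit], float],
) -> Tuple[float, List[Unit]]:
    """Return the lowest-cost unit sequence, one unit per target position."""
    if not targets:
        return 0.0, []
    # Each layer maps a candidate unit to (best cost so far, best predecessor).
    layer: Dict[Unit, Tuple[float, Optional[Unit]]] = {
        u: (unit_cost(targets[0], u), None) for u in candidates[0]
    }
    layers = [layer]
    for target, units in zip(targets[1:], candidates[1:]):
        new_layer: Dict[Unit, Tuple[float, Optional[Unit]]] = {}
        for u in units:
            # Pick the predecessor that minimizes accumulated plus join cost.
            prev, cost = min(
                ((p, c + concat_cost(p, u)) for p, (c, _) in layer.items()),
                key=lambda pair: pair[1],
            )
            new_layer[u] = (cost + unit_cost(target, u), prev)
        layers.append(new_layer)
        layer = new_layer
    # Backtrace from the cheapest unit in the final layer.
    last = min(layer, key=lambda u: layer[u][0])
    total = layer[last][0]
    path = [last]
    for lyr in reversed(layers[1:]):
        last = lyr[last][1]
        path.append(last)
    path.reverse()
    return total, path


def demo_unit_cost(target: str, unit: Unit) -> float:
    # Assumed cost: zero when the phone label matches the target, else 1.
    return 0.0 if unit[0] == target else 1.0


def demo_concat_cost(left: Unit, right: Unit) -> float:
    # Assumed cost: zero when the units were adjacent in the same recording.
    contiguous = left[1] == right[1] and left[2] + 1 == right[2]
    return 0.0 if contiguous else 1.0


demo_targets = ["b", "ae", "t"]
demo_candidates = [
    [("b", 1, 0), ("b", 2, 3)],
    [("ae", 1, 1), ("ae", 3, 0)],
    [("t", 1, 2), ("t", 2, 5)],
]
total, path = viterbi_unit_selection(
    demo_targets, demo_candidates, demo_unit_cost, demo_concat_cost
)
print(total, path)
```

In this toy corpus the search prefers the three units taken contiguously from utterance 1, because any other sequence pays at least one join penalty. Favoring contiguous stretches in this way is one route by which a cost-based search can end up using units of variable length.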







