Natural-Sounding Speech Synthesis Using Variable-Length Units (1998)

by Jon R. W. Yi, James R. Glass
Citations: 33 (4 self)

BibTeX

@MISC{Yi98natural-soundingspeech,
    author = {Jon R. W. Yi and James R. Glass},
    title = {Natural-Sounding Speech Synthesis Using Variable-Length Units},
    year = {1998}
}


Abstract

The goal of this work was to develop a speech synthesis system that concatenates variable-length units to create natural-sounding speech. Our initial work in this area showed that, by carefully designing system responses to ensure consistent intonation contours, natural-sounding speech synthesis was achievable with word- and phrase-level concatenation. To extend the flexibility of this framework, we focused on the problem of generating novel words from a corpus of sub-word units. The design of the sub-word units was motivated by perceptual studies that investigated where speech could be spliced with minimal audible distortion and which contextual constraints had to be maintained to produce natural-sounding speech. During synthesis, the sub-word corpus is searched with a Viterbi search, which selects a sequence of units based on how well they individually match the input specification and on how well they sound as an ensemble. This concatenative speech synthesis system, ENVOICE, has been used in a conversational information retrieval system in two application domains to convert meaning representations into speech waveforms.
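The Viterbi search mentioned in the abstract can be illustrated with a minimal sketch. It assumes each target position has a list of candidate units, a per-unit cost for how well a candidate matches the input specification, and a pairwise cost for how well two adjacent units join. The names `viterbi_unit_selection`, `toy_unit_cost`, and `toy_join_cost`, the tuple layout of a unit, and the 0/1 cost values are all hypothetical and chosen for illustration; they are not taken from ENVOICE, whose actual costs are not given here.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical unit layout: (phone label, source utterance id, position in utterance)
Unit = Tuple[str, int, int]


def viterbi_unit_selection(
    targets: Sequence[str],
    candidates: Sequence[Sequence[Unit]],
    unit_cost: Callable[[str, Unit], float],
    join_cost: Callable[[Unit, Unit], float],
) -> Tuple[float, List[Unit]]:
    """Return the lowest-cost unit sequence and its total cost."""
    if len(targets) != len(candidates):
        raise ValueError("need one candidate list per target position")
    if not targets:
        return 0.0, []
    if any(len(c) == 0 for c in candidates):
        raise ValueError("every target position needs at least one candidate")

    # best[j]: lowest cost of any path ending in candidate j of the current position
    best = [unit_cost(targets[0], u) for u in candidates[0]]
    back: List[List[int]] = []  # back[i-1][j]: best predecessor index at position i-1

    for i in range(1, len(targets)):
        new_best, pointers = [], []
        for u in candidates[i]:
            # Cheapest way to reach u from any candidate at the previous position
            k, cost = min(
                ((k, best[k] + join_cost(prev, u))
                 for k, prev in enumerate(candidates[i - 1])),
                key=lambda kc: kc[1],
            )
            new_best.append(cost + unit_cost(targets[i], u))
            pointers.append(k)
        best = new_best
        back.append(pointers)

    # Trace back from the cheapest final candidate
    j = min(range(len(best)), key=best.__getitem__)
    total = best[j]
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return total, [candidates[i][j] for i, j in enumerate(path)]


def toy_unit_cost(target: str, unit: Unit) -> float:
    # Assumption: a unit matches the specification if its label equals the target
    return 0.0 if unit[0] == target else 1.0


def toy_join_cost(left: Unit, right: Unit) -> float:
    # Assumption: units that were adjacent in the same recording join for free
    same_utterance = left[1] == right[1]
    return 0.0 if same_utterance and right[2] == left[2] + 1 else 1.0


targets = ["b", "aa", "s"]
candidates = [
    [("b", 1, 0), ("b", 2, 4)],
    [("aa", 2, 5), ("aa", 3, 1)],
    [("s", 3, 2), ("s", 2, 6)],
]
total, units = viterbi_unit_selection(targets, candidates, toy_unit_cost, toy_join_cost)
```

In this toy corpus the search picks the three units that were contiguous in utterance 2. Preferring contiguous stretches in this way is how a cost-based search can end up with units of variable length, though the real system's cost design may differ.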
