A hidden Markov-model-based trainable speech synthesizer
, 1999
Cited by 19 (0 self)
This paper presents a new approach to speech synthesis in which a set of cross-word decision-tree state-clustered context-dependent hidden Markov models is used to define a set of subphone units to be used in a concatenation synthesizer. The models, trees, waveform segments and other parameters representing each clustered state are obtained completely automatically through training on a 1-hour single-speaker continuous-speech database. During synthesis the required utterance, specified as a string of words of known phonetic pronunciation, is generated as a sequence of these clustered states using a TD-PSOLA waveform concatenation synthesizer. The system produces speech which, though in a monotone, is both natural sounding and highly intelligible. A Modified Rhyme Test conducted to measure segmental intelligibility yielded a 5.0% error rate. The speech produced by the system mimics the voice of the speaker used to record the training database. The system can be retrained on...
The Bell Labs German Text-To-Speech System: An Overview
- Computer Speech and Language
, 1999
Cited by 14 (7 self)
In this paper we present an overview of the German version of the Bell Labs text-to-speech system, a high-quality concatenative synthesis system with extensive text analysis capabilities. We discuss problems of text analysis, and our solutions to these problems, including: the integration of text normalization tasks into linguistic text analysis; the capability to morphologically analyze compounds and unseen words; name analysis and pronunciation. We briefly describe the prosodic components of the text-to-speech system and their underlying duration and intonation models. Finally, the phonetically motivated structure of the acoustic inventory is presented.
Interaction of Units in a Unit Selection Database
- In Proceedings of the European Conference on Speech Communication and Technology
, 1999
Cited by 4 (0 self)
The purpose of this paper is to examine some aspects of unit selection for Text to Speech synthesis (TTS). We use Unit Selection as described in [2],[3]. The approach taken was to synthesize a large number of sentences and capture information about the selected units. We used approximately 25 million phonemes resulting from 10,000 files of AP newswire text. Given these statistics we looked at the units selected in order to try to analyse how unit selection works from a statistical point of view. Results of our analysis are presented.
1. Introduction
Unit selection synthesis techniques grew out of a dissatisfaction with older diphone concatenation techniques which allowed for only one example of any particular diphone. Diphone synthesis tended to sound unnatural. Much effort in diphone concatenation synthesis was spent on unit selection, although done off-line rather than the on-line version described here. Units were selected for their ability to join well to neighboring units on aver...
Recent Advances In Multilingual Text-To-Speech Synthesis
- In Fortschritte der Akustik - DAGA '96
, 1996
A Diphone-Based Text-to-Speech System for Scottish Gaelic
, 1997
Cited by 1 (1 self)
In this thesis, a diphone-based text-to-speech system for Scottish Gaelic, a language spoken by about 80,000 native speakers in Scotland and Canada, is presented. Text-to-speech systems convert orthographic text input into speech output. The present system consists of two main parts:
- an automatic phonetic transcription module which produces an orthophonic transcription of the orthographic input text
- a speech synthesis module which synthesizes an utterance from its transcription by concatenating and modifying previously recorded speech units.
Diphones, speech units that cover two sounds and the transition between them, form the basis of the synthesis module. Duration and intonation are modelled on the basis of simple heuristics. The diphone inventory was designed for the Gaelic of Bayble, Lewis. Scottish Gaelic distinguishes four main phonetic settings: velarised, palatalised, nasalised, and neutral. As the domain of these settings is the syllable, they are difficult t...
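As a rough illustration of the diphone idea in the abstract above (units that cover two sounds and the transition between them), the sketch below turns a phone sequence into the diphone names a synthesizer would look up. It is not taken from the thesis: the function name `to_diphones`, the boundary symbol `_`, and the example phone labels are all assumptions made for illustration.

```python
def to_diphones(phones, boundary="_"):
    """Map a phone sequence to overlapping diphone names.

    The utterance is padded with a boundary (silence) symbol on both
    sides, so n phones yield n + 1 diphones. The padding convention and
    the "a-b" naming scheme are illustrative assumptions.
    """
    padded = [boundary] + list(phones) + [boundary]
    # Pair every symbol with its right-hand neighbour.
    return [f"{left}-{right}" for left, right in zip(padded, padded[1:])]


# Hypothetical three-phone utterance.
units = to_diphones(["k", "a", "t"])
print(units)  # ['_-k', 'k-a', 'a-t', 't-_']
```

Each name would then index a recorded unit in the diphone inventory; how the recorded units are modified for duration and intonation is outside this sketch.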
Inducing Concatenative Units from Machine Readable Dictionaries and Corpora for Speech Synthesis
, 1994
The purpose of this research is to determine the best method for deciding on an optimal set of concatenative units for concatenative speech synthesis. Of the two main approaches to speech synthesis, segmental synthesis and rule-based synthesis, the former relies heavily on the successful choice of concatenative units. Segmental synthesis consists of concatenating segmental units (diphones, triphones, etc.); rule-based synthesis consists of the computation of control parameters based on pre-established rules. Deciding on the set of diphones is quite straightforward in the sense that it suffices to take the phoneme inventory of a language, and simply combine each phoneme with every other one. For example, taking the approximately 35 French phonemes, 1225 phonemic pairs (35 × 35) constitute the complete and exhaustive starting diphone inventory. On the other hand, deciding on the set of triphones, quadriphones and larger units raises difficult questions about the nature of phonemes in a give...
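The diphone count quoted in the abstract above can be checked with a short sketch. The 35 × 35 figure implies that all ordered pairs are counted, including a phoneme paired with itself. The placeholder labels `p0` … `p34` and the function name `diphone_inventory` are hypothetical and stand in for a real phoneme inventory.

```python
from itertools import product


def diphone_inventory(phonemes):
    """Return every ordered phoneme pair, self-pairs included."""
    return list(product(phonemes, repeat=2))


# Placeholder symbols standing in for a 35-phoneme inventory (assumed labels).
phonemes = [f"p{i}" for i in range(35)]
inventory = diphone_inventory(phonemes)
print(len(inventory))  # 1225
```

The same enumeration with `repeat=3` gives 42,875 candidate triphones, which suggests why larger units cannot simply be listed exhaustively.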
Models of Speech Synthesis
, 1994
In this paper we review some of the approaches used to generate synthetic speech and discuss some of the basic motivations for choosing one method over another. Primarily, we discuss different methods of generating synthetic speech in a text-to-speech system. In the last part of the paper, general issues such as different voices, accents and multiple languages are discussed.

