A hidden Markov-model-based trainable speech synthesizer, 1999
Cited by 19 (0 self)
This paper presents a new approach to speech synthesis in which a set of cross-word decision-tree state-clustered context-dependent hidden Markov models are used to define a set of subphone units to be used in a concatenation synthesizer. The models, trees, waveform segments and other parameters representing each clustered state are obtained completely automatically through training on a 1 hour single-speaker continuous-speech database. During synthesis the required utterance, specified as a string of words of known phonetic pronunciation, is generated as a sequence of these clustered states using a TD-PSOLA waveform concatenation synthesizer. The system produces speech which, though in a monotone, is both natural sounding and highly intelligible. A Modified Rhyme Test conducted to measure segmental intelligibility yielded a 5.0% error rate. The speech produced by the system mimics the voice of the speaker used to record the training database. The system can be retrained on...
Flexible Speech Synthesis Using Weighted Finite State Transducers, 1996
Cited by 3 (1 self)
The main focus of this thesis is on improving the quality of concatenative speech synthesis by taking advantage of the natural (allowable) variability in spoken language, namely, the fact that there are multiple ways of uttering a given sentence and there are several word sequences that can represent a given concept. An architecture for speech generation for constrained domain applications is proposed that tightly integrates language generation and speech synthesis, allowing the choice of words and desired intonation in the system's response to be optimized jointly with the speech output quality. Experiments with a travel planning dialog system have demonstrated that by expanding the space of candidate responses and possible prosodic realizations we achieve higher quality speech output.
A Diphone-Based Text-to-Speech System for Scottish Gaelic, 1997
Cited by 1 (1 self)
In this thesis, a diphone-based text-to-speech system for Scottish Gaelic, a language spoken by about 80,000 native speakers in Scotland and Canada, is presented. Text-to-speech systems convert orthographic text input into speech output. The present system consists of two main parts: (1) an automatic phonetic transcription module which produces an orthophonic transcription of the orthographic input text, and (2) a speech synthesis module which synthesizes an utterance from its transcription by concatenating and modifying previously recorded speech units. Diphones, speech units that cover two sounds and the transition between them, form the basis of the synthesis module. Duration and intonation are modelled on the basis of simple heuristics. The diphone inventory was designed for the Gaelic of Bayble, Lewis. Scottish Gaelic distinguishes four main phonetic settings: velarised, palatalised, nasalised, and neutral. As the domain of these settings is the syllable, they are difficult t...
Diphone Synthesis For Welsh
Proceedings of the Institute of Acoustics, 1994
Cited by 1 (1 self)
INTRODUCTION The Welsh language is one of the lesser-used and lesser-researched languages of Europe. This work represents the first known attempt at developing a speech synthesiser for Welsh. Because comparatively little is known about the acoustic characteristics of Welsh speech sounds, it was decided to use diphone concatenation rather than rule-based parametric synthesis. The software of an existing text-to-speech synthesis system for English (described in [1]) was adapted for use with Welsh. This software uses the PSOLA synthesis technique, as described in [2], [3]. The software can run on a SUN workstation or on a PC with an LSI DSP board. The number of Welsh phonemes included was 51, including 3 used only in English loanwords (/z/, affricates /ch/ and /jh/) and 3 used in restricted contexts (labialised /lw, nw, rw/). In total, there were 32 consonants and 19 vowels. Also, it was decided that the synthesiser should be able to handle English as well, due to the number of En

