Results 1 - 10
of
19
Speech Parameter Generation Algorithms for HMM-Based Speech Synthesis
- Proc. ICASSP
, 2000
"... This paper derives a speech parameter generation algorithm for HMM-based speech synthesis, in which speech parameter sequence is generated from HMMs whose observation vector consists of spectral parameter vector and its dynamic feature vectors. In the algorithm, we assume that the state sequence (s ..."
Abstract
-
Cited by 71 (11 self)
- Add to MetaCart
This paper derives a speech parameter generation algorithm for HMM-based speech synthesis, in which speech parameter sequence is generated from HMMs whose observation vector consists of spectral parameter vector and its dynamic feature vectors. In the algorithm, we assume that the state sequence (state and mixture sequence for the multi-mixture case) or a part of the state sequence is unobservable (i.e., hidden or latent). As a result, the algorithm iterates the forward-backward algorithm and the parameter generation algorithm for the case where state sequence is given. Experimental results show that by using the algorithm, we can reproduce clear formant structure from multi-mixture HMMs as compared with that produced from single-mixture HMMs.
Joint Prosody Prediction And Unit Selection For Concatenative Speech Synthesis
, 2001
"... In this paper we describe how prosody prediction can be efficiently integrated with the unit selection process in a concatenative speech synthesizer under a weighted finite-state transducer (WFST) architecture. WFSTs representing prosody prediction and unit selection can be composed during synthesis ..."
Abstract
-
Cited by 15 (4 self)
- Add to MetaCart
In this paper we describe how prosody prediction can be efficiently integrated with the unit selection process in a concatenative speech synthesizer under a weighted finite-state transducer (WFST) architecture. WFSTs representing prosody prediction and unit selection can be composed during synthesis, thus effectively expanding the space of possible prosodic targets. We implemented a symbolic prosody prediction module and a unit selection database as the synthesis components of a travel planning system. Results of perceptual experiments show that by combining the steps of prosody prediction and unit selection we are able to achieve improved naturalness of synthetic speech compared to the sequential implementation. 1. INTRODUCTION The growing popularity of speech-enabled computer interfaces demands high quality speech output, particularly for telephone applications. The perceived quality of standard general purpose text-tospeech (TTS) systems is not good enough, which forces applicatio...
Corpus-Based Speech Synthesis: Methods and Challenges
"... Corpus-based approaches to speech synthesis have been advocated to overcome the limitations of concatenative synthesis from a xed acoustic unit inventory. The frequency of unit concatenations in, e.g., diphone synthesis has been argued to contribute to the perceived lack of naturalness of synthetic ..."
Abstract
-
Cited by 11 (0 self)
- Add to MetaCart
Corpus-based approaches to speech synthesis have been advocated to overcome the limitations of concatenative synthesis from a xed acoustic unit inventory. The frequency of unit concatenations in, e.g., diphone synthesis has been argued to contribute to the perceived lack of naturalness of synthetic speech. The key idea of corpus-based synthesis, or unit selection, is to use an entire speech corpus as the acoustic inventory and to select at run-time from this corpus the longest available strings of phonetic segments that match a sequence of target speech sounds in the utterance to be synthesized, thereby minimizing the number of concatenations and reducing the need for signal processing. This paper reviews the assumptions underlying this synthesis strategy and the dierent approaches to unit selection, as well as the major challenges encountered by corpus-based methods. One of the biggest problems to date is the relative weighting of acoustic distance measures. We further argue agains...
Segment Pre-Selection In Decision-Tree Based Speech Synthesis Systems
, 2000
"... Corpus based approaches to unit selection for concatenative speech synthesis have become popular in recent years due to their improved sensitivity to unit context over their more simple predecessors. These systems usually make use of large speech databases and employ sophisticated search algorithm ..."
Abstract
-
Cited by 10 (0 self)
- Add to MetaCart
Corpus based approaches to unit selection for concatenative speech synthesis have become popular in recent years due to their improved sensitivity to unit context over their more simple predecessors. These systems usually make use of large speech databases and employ sophisticated search algorithms to determine the optimal unit sequence to use to synthesise each sentence. For many applications it is not possible to have the entire database, which may be as large as several hundred megabytes, available to the synthesiser at runtime. What is required is some form of off-line pre-selection algorithm to determine which subset of the database enables the highest quality speech synthesis to be performed for a given runtime system size. This paper describes a pre-selection algorithm developed at IBM for use with decision-tree-based concatenative speech synthesisers. 1. INTRODUCTION In recent years corpus based approaches to unit selection for concatenative speech synthesis have become inc...
An introduction of trajectory model into HMM-based speech synthesis
- in Proc. of ISCA SSW5
, 2004
"... ..."
Unit Selection for Speech Synthesis Using Splicing Costs with Weighted Finite State Transducers
, 2001
"... In this paper we describe how unit selection for concatenative speech synthesis can be implemented efficiently for sub-phonetic units using weighted finite state transducers (WFST). We also introduce splicing costs as a measure to indicate which unit boundaries are particularly good or poor joint po ..."
Abstract
-
Cited by 9 (5 self)
- Add to MetaCart
In this paper we describe how unit selection for concatenative speech synthesis can be implemented efficiently for sub-phonetic units using weighted finite state transducers (WFST). We also introduce splicing costs as a measure to indicate which unit boundaries are particularly good or poor joint points. Splicing costs extend the flexibility offered by the unit selection paradigm. Through a perceptual experiment we demonstrate an improvement in speech quality achieved by using splicing costs during unit selection.
Efficient Integrated Response Generation from Multiple Targets using Weighted Finite-State Transducers
, 2002
"... Abstract goes here. ..."
The Impact Of Speech Recognition On Speech Synthesis
, 2002
"... Speech synthesis has changed dramatically in the past few years to have a corpus-based focus, borrowing heavily from advances in automatic speech recognition. In this paper, we survey technology in speech recognition systems and how it translates (or doesn't translate) to speech synthesis systems. W ..."
Abstract
-
Cited by 4 (0 self)
- Add to MetaCart
Speech synthesis has changed dramatically in the past few years to have a corpus-based focus, borrowing heavily from advances in automatic speech recognition. In this paper, we survey technology in speech recognition systems and how it translates (or doesn't translate) to speech synthesis systems. We further speculate on future areas where ASR may impact synthesis and vice versa.
Flexible Speech Synthesis Using Weighted Finite State Transducers
, 1996
"... The main focus of this thesis is on improving the quality of concatenative speech synthesis by taking advantage of the natural (allowable) variability in spoken language, namely, the fact that there are multiple ways of uttering a given sentence and there are several word sequences that can represen ..."
Abstract
-
Cited by 3 (1 self)
- Add to MetaCart
The main focus of this thesis is on improving the quality of concatenative speech synthesis by taking advantage of the natural (allowable) variability in spoken language, namely, the fact that there are multiple ways of uttering a given sentence and there are several word sequences that can represent a given concept. An architecture for speech generation for constrained domain applications is proposed that tightly integrates language generation and speech synthesis, allowing the choice of words and desired intonation in the system's response to be optimized jointly with the speech output quality. Experiments with a travel planning dialog system have demonstrated that by expanding the space of candidate responses and possible prosodic realizations we achieve higher quality speech output.
Corpus-based unit selection for natural-sounding speech synthesis
, 2003
"... Speech synthesis is an automatic encoding process carried out by machine through which symbols conveying linguistic information are converted into an acoustic waveform. In the past decade or so, a recent trend toward a non-parametric, corpus-based approach has focused on using real human speech as s ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
Speech synthesis is an automatic encoding process carried out by machine through which symbols conveying linguistic information are converted into an acoustic waveform. In the past decade or so, a recent trend toward a non-parametric, corpus-based approach has focused on using real human speech as source material for producing novel natural-sounding speech. This work proposes a communication-theoretic formulation in which unit selection is a noisy channel through which an input sequence of symbols passes and an output sequence, possibly corrupted due to the coverage limits of the corpus, emerges. The penalty of approximation is quantified by substitution and concatenation costs which grade what unit contexts are interchangeable and where concatenations are not perceivable. These costs are semi-automatically derived from data and are found to agree with acoustic-phonetic knowledge.

