A speech parameter generation algorithm considering global variance for HMM-based speech synthesis
- IEICE - Trans. Inf. Syst
, 2007
Cited by 41 (13 self)
This paper describes a novel parameter generation algorithm for HMM-based speech synthesis. The conventional algorithm generates a trajectory of static features that maximizes the output probability of a parameter sequence consisting of static and dynamic features from HMMs under an actual constraint between the two features. The generated trajectory is often excessively smoothed due to the statistical processing, and using the over-smoothed trajectory causes muffled sound. To alleviate the over-smoothing effect, we propose a generation algorithm that considers not only the output probability used in the conventional method but also that of a global variance (GV) of the generated trajectory. The latter probability works as a penalty for a reduction of the variance of the generated trajectory. The result of a perceptual evaluation demonstrates that the proposed method yields large improvements in the naturalness of synthetic speech.
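As a rough illustration of the quantity this abstract refers to, the sketch below computes a global variance as the per-dimension variance of a feature trajectory over all frames, and shows that smoothing lowers it. It is a minimal sketch and not the paper's algorithm: the function names, the toy trajectory, and the moving-average filter (used here only as a stand-in for over-smoothing) are all assumptions made for illustration.

```python
def global_variance(trajectory):
    """Per-dimension variance of a feature trajectory over all frames.

    trajectory: list of frames, each a list of floats of equal length.
    (Assumed definition: population variance over the whole utterance.)
    """
    n_frames = len(trajectory)
    n_dims = len(trajectory[0])
    means = [sum(f[d] for f in trajectory) / n_frames for d in range(n_dims)]
    return [
        sum((f[d] - means[d]) ** 2 for f in trajectory) / n_frames
        for d in range(n_dims)
    ]


def moving_average(trajectory, width=3):
    """Hypothetical stand-in for over-smoothing: average each frame with its neighbours."""
    half = width // 2
    n_dims = len(trajectory[0])
    smoothed = []
    for t in range(len(trajectory)):
        window = trajectory[max(0, t - half): t + half + 1]
        smoothed.append(
            [sum(f[d] for f in window) / len(window) for d in range(n_dims)]
        )
    return smoothed


# Toy one-dimensional trajectory that alternates between two values.
original = [[1.0], [-1.0], [1.0], [-1.0], [1.0], [-1.0]]
smoothed = moving_average(original)

gv_original = global_variance(original)
gv_smoothed = global_variance(smoothed)

# The smoothed trajectory has a much smaller global variance.
print(gv_original, gv_smoothed)
```

A GV-aware generator would penalize trajectories whose variance falls well below what is typical of natural speech; the sketch only measures the variance and does not implement that penalty.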
Statistical parametric speech synthesis
- in Proc. ICASSP
, 2007
Cited by 37 (5 self)
This paper gives a general overview of techniques in statistical parametric speech synthesis. One of the instances of these techniques, called HMM-based generation synthesis (or simply HMM-based synthesis), has recently been shown to be very effective in generating acceptable speech synthesis. This paper also contrasts these techniques with the more conventional unit selection technology that has dominated speech synthesis over the last ten years. Advantages and disadvantages of statistical parametric synthesis are highlighted, and we identify where we expect the key developments to appear in the immediate future. Index Terms: speech synthesis, hidden Markov models.
An HMM-Based Speech Synthesis System Applied To English
, 2002
Cited by 34 (7 self)
This paper describes an HMM-based speech synthesis system (HTS), in which the speech waveform is generated from the HMMs themselves, and applies it to English speech synthesis using the general speech synthesis architecture of Festival. Like other data-driven speech synthesis approaches, HTS has a compact language-dependent module: a list of contextual factors. Thus, it can easily be extended to other languages, though the first version of HTS was implemented for Japanese. The resulting run-time engine of HTS has the advantage of being small: less than 1 Mbyte, excluding the text analysis part. Furthermore, HTS can easily change the voice characteristics of synthesized speech by using a speaker adaptation technique developed for speech recognition. The relation between the HMM-based approach and other unit selection approaches is also discussed.
Talking To Machines (Statistically Speaking)
Cited by 31 (10 self)
Statistical methods have long been the dominant approach in speech recognition and probabilistic modelling in ASR is now a mature technology. The use of statistical methods in other areas of spoken dialogue is however more recent and rather less mature. This paper reviews spoken dialogue systems from a statistical modelling perspective. The complete system is first presented as a partially observable Markov decision process. The various sub-components are then exposed by introducing appropriate intermediate variables. Samples of existing work are reviewed within this framework, including dialogue control and optimisation, semantic interpretation, goal detection, natural language generation and synthesis.
Spectral Conversion Based on Maximum Likelihood Estimation Considering Global Variance of Converted Parameter
- Proc. ICASSP
, 2005
Cited by 28 (15 self)
This paper describes a novel spectral conversion method for voice transformation. We perform spectral conversion between speakers using a Gaussian Mixture Model (GMM) on the joint probability density of source and target features. A smooth spectral sequence can be estimated by applying maximum likelihood (ML) estimation using dynamic features to the GMM-based mapping. However, the converted speech quality is still degraded by over-smoothing of the converted spectra, which is inevitable in conventional ML-based parameter estimation. To alleviate the over-smoothing, we propose an ML-based conversion that takes account of the global variance of the converted parameter in each utterance. Experimental results show that the performance of the voice conversion can be improved by using the global variance information. Moreover, it is demonstrated that the proposed algorithm is more effective than spectral enhancement by postfiltering.
Voice Conversion Based on Maximum Likelihood Estimation of Spectral Parameter Trajectory
Cited by 27 (21 self)
In this paper, we describe a novel spectral conversion
Perfect Synthesis For All Of The People All Of The Time
- in IEEE 2002 Workshop on Speech Synthesis
, 2002
Cited by 16 (3 self)
The quality of speech synthesis has drastically improved over the last ten years. Or at least it appears that this is the case. We have moved from diphones to unit selection. However, although we can produce much more natural sounding examples, we have also given up a certain amount of control over what can be synthesized. We have reached the stage where playing a few examples to a non-expert can easily convince them that speech synthesis is a solved problem. This paper looks at how we might not only convince some of the people some of the time, but what we must do to produce perfect synthesis for all of the people all of the time.
Mapping from Articulatory Movements to Vocal Tract Spectrum with Gaussian Mixture Model for Articulatory Speech Synthesis
- in 5th ISCA Speech Synthesis Workshop
, 2004
Cited by 15 (6 self)
This paper describes a method for determining the vocal tract spectrum from articulatory movements using a Gaussian Mixture Model (GMM) to synthesize speech with articulatory information. The GMM on the joint probability density of articulatory parameters and acoustic spectral parameters is trained using a parallel acoustic-articulatory speech database. We evaluate the performance of the GMM-based mapping by a spectral distortion measure. Experimental results demonstrate that the distortion can be reduced by using not only the articulatory parameters of the vocal tract but also power and voicing information as input features. Moreover, in order to determine the best mapping, we apply maximum likelihood estimation (MLE) to the GMM-based mapping method. Experimental results show that MLE using both static and dynamic features can improve the mapping accuracy compared with the conventional GMM-based mapping.
USTC System for Blizzard Challenge 2006: An Improved HMM-Based Speech Synthesis Method
- in Blizzard Challenge Workshop
, 2006
Cited by 15 (9 self)
This paper introduces the USTC speech synthesis system for Blizzard Challenge 2006. The HMM-based parametric synthesis approach was adopted for its convenience and effectiveness in building a new voice, especially for non-native developers. Some useful techniques were also integrated into our system, such as minimum generation error (MGE) training, phone duration modeling and linear spectral pair (LSP) based formant enhancement. The evaluation results show that the proposed system is able to synthesize speech with high naturalness and intelligibility using either the full database or only the ARCTIC subset.
Statistical Mapping between Articulatory Movements and Acoustic Spectrum Using a Gaussian Mixture Model
, 2007
Cited by 13 (3 self)
In this paper, we describe a statistical approach to both an articulatory-to-acoustic mapping and an acoustic-to-articulatory inversion mapping without using phonetic information. The joint probability density of an articulatory parameter and an acoustic parameter is modeled using a Gaussian mixture model (GMM) based on a parallel acoustic-articulatory speech database. We apply the GMM-based mapping using the minimum mean-square error (MMSE) criterion, which has been proposed for voice conversion, to the two mappings. Moreover, to improve the mapping performance, we apply maximum likelihood estimation (MLE) to the GMM-based mapping method. A target parameter trajectory having appropriate static and dynamic properties is determined by imposing an explicit relationship between static and dynamic features in the MLE-based mapping. Experimental results demonstrate that the MLE-based mapping with dynamic features can significantly improve the mapping performance compared with the MMSE-based mapping in both the articulatory-to-acoustic mapping and the inversion mapping.
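The MMSE-based GMM mapping mentioned in this abstract can be sketched for the scalar case as a posterior-weighted sum of per-component linear regressions. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the dictionary keys, and all parameter values are hypothetical, and real systems use multivariate features with full covariance matrices.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mmse_map(x, components):
    """MMSE estimate of y given x under a joint GMM over (x, y).

    components: list of dicts with the (hypothetical) keys
    weight, mu_x, mu_y, var_xx, cov_yx describing each mixture component.
    """
    # Weighted likelihood of the source value under each component.
    likelihoods = [
        c["weight"] * gaussian_pdf(x, c["mu_x"], c["var_xx"]) for c in components
    ]
    total = sum(likelihoods)
    y = 0.0
    for lik, c in zip(likelihoods, components):
        posterior = lik / total  # P(component | x)
        # Conditional mean of y given x within this component.
        conditional_mean = c["mu_y"] + c["cov_yx"] / c["var_xx"] * (x - c["mu_x"])
        y += posterior * conditional_mean
    return y


# Toy two-component model, symmetric about x = 0 (illustrative values only).
toy_gmm = [
    {"weight": 0.5, "mu_x": -1.0, "mu_y": -2.0, "var_xx": 1.0, "cov_yx": 0.0},
    {"weight": 0.5, "mu_x": 1.0, "mu_y": 2.0, "var_xx": 1.0, "cov_yx": 0.0},
]
print(mmse_map(0.0, toy_gmm))
```

Because each frame is mapped independently, this estimator ignores the relationship between static and dynamic features, which is the constraint the abstract says the MLE-based mapping imposes.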

