Results 1 - 10
of
57
Speech Parameter Generation Algorithms for HMM-Based Speech Synthesis
- Proc. ICASSP
, 2000
"... This paper derives a speech parameter generation algorithm for HMM-based speech synthesis, in which speech parameter sequence is generated from HMMs whose observation vector consists of spectral parameter vector and its dynamic feature vectors. In the algorithm, we assume that the state sequence (s ..."
Abstract
-
Cited by 71 (11 self)
- Add to MetaCart
This paper derives a speech parameter generation algorithm for HMM-based speech synthesis, in which speech parameter sequence is generated from HMMs whose observation vector consists of spectral parameter vector and its dynamic feature vectors. In the algorithm, we assume that the state sequence (state and mixture sequence for the multi-mixture case) or a part of the state sequence is unobservable (i.e., hidden or latent). As a result, the algorithm iterates the forward-backward algorithm and the parameter generation algorithm for the case where state sequence is given. Experimental results show that by using the algorithm, we can reproduce clear formant structure from multi-mixture HMMs as compared with that produced from single-mixture HMMs.
A speech parameter generation algorithm considering global variance for HMM-based speech synthesis
- IEICE - Trans. Inf. Syst
, 2007
"... This paper describes a novel parameter generation algorithm for the HMM-based speech synthesis. The conventional algorithm generates a trajectory of static features that maximizes an output probability of a parameter sequence consisting of the static and dynamic features from HMMs under an actual co ..."
Abstract
-
Cited by 41 (13 self)
- Add to MetaCart
This paper describes a novel parameter generation algorithm for the HMM-based speech synthesis. The conventional algorithm generates a trajectory of static features that maximizes an output probability of a parameter sequence consisting of the static and dynamic features from HMMs under an actual constraint between the two features. The generated trajectory is often excessively smoothed due to the statistical processing. Using the over-smoothed trajectory causes the muffled sound. In order to alleviate the over-smoothing effect, we propose the generation algorithm considering not only the output probability used for the conventional method but also that of a global variance (GV) of the generated trajectory. The latter probability works as a penalty for a reduction of the variance of the generated trajectory. A result of a perceptual evaluation demonstrates that the proposed method causes large improvements of the naturalness of synthetic speech. 1.
Statistical parametric speech synthesis
- in Proc. ICASSP, 2007
, 2007
"... This paper gives a general overview of techniques in statistical parametric speech synthesis. One of the instances of these techniques, called HMM-based generation synthesis (or simply HMM-based synthesis), has recently been shown to be very effective in generating acceptable speech synthesis. This ..."
Abstract
-
Cited by 37 (5 self)
- Add to MetaCart
This paper gives a general overview of techniques in statistical parametric speech synthesis. One of the instances of these techniques, called HMM-based generation synthesis (or simply HMM-based synthesis), has recently been shown to be very effective in generating acceptable speech synthesis. This paper also contrasts these techniques with the more conventional unit selection technology that has dominated speech synthesis over the last ten years. Advantages and disadvantages of statistical parametric synthesis are highlighted as well as identifying where we expect the key developments to appear in the immediate future. Index Terms — Speech synthesis, hidden Markov models 1. BACKGROUND With the increase in power and resources of computer technology, building natural sounding synthetic voices has progressed from a
An overview of Nitech HMMbased speech synthesis system for Blizzard Challenge 2005
- in Proc. Blizzard Workshop
, 2005
"... In the present paper, hidden Markov model (HMM) based speech synthesis system developed in Nagoya Institute of Technology (Nitech-HTS) for a competition of text-to-speech synthesis systems using the same speech databases, named Blizzard Challenge 2005, is described. We show an overview of the basic ..."
Abstract
-
Cited by 33 (4 self)
- Add to MetaCart
In the present paper, hidden Markov model (HMM) based speech synthesis system developed in Nagoya Institute of Technology (Nitech-HTS) for a competition of text-to-speech synthesis systems using the same speech databases, named Blizzard Challenge 2005, is described. We show an overview of the basic HMM-based speech synthesis system and then recent developments to the latest one such as STRAIGHT-based vocoding, hidden semi-Markov model (HSMM) based acoustic modeling, and parameter generation considering global variance are illustrated. Constructed voices can synthesize speech around 0.3 xRT (real time ratio) and their footprints are less than 2 MB. The listening test results show that performances of our systems are much better than we expected. 1.
Voice Conversion Based on Maximum Likelihood Estimation of Spectral Parameter Trajectory
"... In this paper, we describe a novel spectral conversion ..."
Abstract
-
Cited by 27 (21 self)
- Add to MetaCart
In this paper, we describe a novel spectral conversion
Adaptation of Pitch and Spectrum for HMM-Based Speech Synthesis Using MLLR
- Proc. ICASSP
, 2001
"... This paper describes a technique for synthesizing speech with an arbitrary speaker characteristics using speaker independent speech units, which we call "average voice" units. The technique is based on an HMM-based text-to-speech (TTS) system and MLLR adaptation algorithm. In the HMM-based TTS syste ..."
Abstract
-
Cited by 18 (8 self)
- Add to MetaCart
This paper describes a technique for synthesizing speech with an arbitrary speaker characteristics using speaker independent speech units, which we call "average voice" units. The technique is based on an HMM-based text-to-speech (TTS) system and MLLR adaptation algorithm. In the HMM-based TTS system, speech synthesis units are modeled by multi-space probability distribution (MSD) HMMs which can model spectrum and pitch simultaneously in a unified framework. We derive an extension of the MLLR algorithm to apply it to MSD-HMMs. We demonstrate that a few sentences uttered by a target speaker are sufficient to adapt not only voice characteristics but also prosodic features. Synthetic speech generated from adapted models using only four sentences is very close to that from speaker dependent models trained using 450 sentences. 1.
USTC system for Blizzard Challenge 2006 an improved HMM-based speech synthesis method
- in Blizzard Challenge Workshop
, 2006
"... This paper introduces the USTC speech synthesis system for Blizzard Challenge 2006. The HMM-based parametric synthesis approach was adopted for its convenience and effectiveness in building a new voice, especially for the nonnative developers. Some useful techniques were also integrated into our sys ..."
Abstract
-
Cited by 15 (9 self)
- Add to MetaCart
This paper introduces the USTC speech synthesis system for Blizzard Challenge 2006. The HMM-based parametric synthesis approach was adopted for its convenience and effectiveness in building a new voice, especially for the nonnative developers. Some useful techniques were also integrated into our system, such as minimum generation error (MGE) training, phone duration modeling and linear spectral pair (LSP) based formant enhancement. The evaluation results show that the proposed system is able to synthesize speech with high naturalness and intelligibility by using either full database or only ARCTIC subset. 1.
Hidden Semi-Markov Model Based Speech Synthesis
- in Proc. of ICSLP, 2004
, 2004
"... In the present paper, a hidden-semi Markov model (HSMM) based speech synthesis system is proposed. In a hidden Markov model (HMM) based speech synthesis system which we have proposed, rhythm and tempo are controlled by state duration probability distributions modeled by single Gaussian distributions ..."
Abstract
-
Cited by 12 (5 self)
- Add to MetaCart
In the present paper, a hidden-semi Markov model (HSMM) based speech synthesis system is proposed. In a hidden Markov model (HMM) based speech synthesis system which we have proposed, rhythm and tempo are controlled by state duration probability distributions modeled by single Gaussian distributions. To synthesis speech, it constructs a sentence HMM corresponding to an arbitralily given text and determine state durations maximizing their probabilities, then a speech parameter vector sequence is generated for the given state sequence. However, there is an inconsistency: although the speech is synthesized from HMMs with explicit state duration probability distributions, HMMs are trained without them. In the present paper, we introduce an HSMM, which is an HMM with explicit state duration probability distributions, into the HMM-based speech synthesis system. Experimental results show that the use of HSMM training improves the naturalness of the synthesized speech.
Trajectory Modeling based on HMMs with the Explicit Relashinship between Static and Dynamic Features
, 2002
"... This paper shows that the HMM whose state output vector includes static and dynamic feature parameters can be reformulated as a trajectory model by imposing the explicit relationship between the static and dynamic features. The derived model, named trajectory HMM, can alleviate the limitations of HM ..."
Abstract
-
Cited by 10 (2 self)
- Add to MetaCart
This paper shows that the HMM whose state output vector includes static and dynamic feature parameters can be reformulated as a trajectory model by imposing the explicit relationship between the static and dynamic features. The derived model, named trajectory HMM, can alleviate the limitations of HMMs: i) constant statistics within an HMM state and ii) independence assumption of state output probabilities. We also derive a Viterbi-type training algorithm for the trajectory HMM. A preliminary speech recognition experiment based on N-best rescoring demonstrates that the training algorithm can improve the recognition performance significantly even though the trajectory HMM has the same parameterization as the standard HMM.
An introduction of trajectory model into HMM-based speech synthesis
- in Proc. of ISCA SSW5
, 2004
"... ..."

