
High-quality and flexible speech synthesis with segment selection and voice conversion (2003)

by T Toda

Results 1 - 4 of 4

Spectral Conversion Based on Maximum Likelihood Estimation Considering Global Variance of Converted Parameter

by Tomoki Toda, Keiichi Tokuda - Proc. ICASSP, 2005
Abstract - Cited by 28 (15 self)
This paper describes a novel spectral conversion method for the voice transformation. We perform spectral conversion between speakers using a Gaussian Mixture Model (GMM) on joint probability density of source and target features. A smooth spectral sequence can be estimated by applying maximum likelihood (ML) estimation using dynamic features to the GMM-based mapping. However, the degradation of the converted speech quality is still caused due to an over-smoothing of the converted spectra, which is inevitable in the conventional ML-based parameter estimation. In order to alleviate the over-smoothing, we propose an ML-based conversion taking account of the global variance of the converted parameter in each utterance. Experimental results show that the performance of the voice conversion can be improved by using the global variance information. Moreover, it is demonstrated that the proposed algorithm is more effective than spectral enhancement by postfiltering.
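The abstract above refers to a GMM on the joint density of source and target features and to the global variance of a converted sequence. As a rough illustration only, the sketch below shows the conventional frame-wise mapping (the conditional expectation of the target given the source under a joint GMM) and a per-utterance variance statistic. It is not the ML trajectory estimation or the global-variance-aware conversion proposed in the paper. Features are reduced to scalars, and the mixture parameters and the names `COMPONENTS`, `convert_frame`, and `global_variance` are hypothetical.

```python
import math

# Hypothetical 2-component joint GMM over scalar (source x, target y) features.
# Per component: weight w, means mx/my, source variance vxx, cross-covariance vxy.
# All numbers are made up for illustration; a real system would train them.
COMPONENTS = [
    {"w": 0.5, "mx": -1.0, "my": -2.0, "vxx": 1.0, "vxy": 0.5},
    {"w": 0.5, "mx": 1.0, "my": 2.0, "vxx": 1.0, "vxy": 0.5},
]


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def posteriors(x, components=COMPONENTS):
    """P(component | x), computed from the source marginal of each component."""
    likes = [c["w"] * gaussian_pdf(x, c["mx"], c["vxx"]) for c in components]
    total = sum(likes)
    return [like / total for like in likes]


def convert_frame(x, components=COMPONENTS):
    """Frame-wise mapping: E[y | x] under the joint GMM."""
    post = posteriors(x, components)
    return sum(
        p * (c["my"] + c["vxy"] / c["vxx"] * (x - c["mx"]))
        for p, c in zip(post, components)
    )


def global_variance(sequence):
    """Variance of a parameter over one utterance (population variance)."""
    mean = sum(sequence) / len(sequence)
    return sum((v - mean) ** 2 for v in sequence) / len(sequence)


if __name__ == "__main__":
    source = [-1.5, -0.5, 0.0, 0.5, 1.5]  # hypothetical source frames
    converted = [convert_frame(x) for x in source]
    print("converted:", converted)
    print("global variance of converted frames:", global_variance(converted))
```

Comparing `global_variance` of a converted sequence with that of natural target speech is one simple way to observe the over-smoothing the abstract mentions; the data needed for such a comparison is not part of this sketch.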

Improving the Understandability of Speech Synthesis by Modeling Speech in Noise

by Brian Langner, Alan W Black - ICASSP, 2005
Abstract - Cited by 6 (1 self)
Although the quality of synthetic speech has increased dramatically in the past several years, many people still have difficulty understanding speech produced by even the highest quality synthesizers. We describe an approach to improve understandability of synthetic speech using speech in noise. Natural speech in noise is a change in the style of speech that is used by people to improve the understandability to the listener when speaking in poor channel conditions. We show that altering the presentation of synthetic speech in similar ways also improves understandability. Further, we discuss methods of obtaining speech in noise for use in speech synthesis, as well as the results of an evaluation of several synthetic voices that “speak in noise”.

1. BACKGROUND

Despite vast improvements in the quality of synthetic speech, many people still find it difficult to understand, even when the best synthesizers are used. The CMU Let’s Go! project [1] is developing techniques to improve spoken dialog systems for non-native speakers and the elderly; specifically, improving the quality of spoken output to make it more understandable by those groups, as well as the general population. There are a number of factors that have an impact on understandability, including lexical choice, prosody, and spectral qualities of the speech itself.

In an earlier experiment which used recorded natural speech [2], it was found that understandability improved when the speech was delivered as if the listener had said, “I can’t hear you, can you say that again.” This change in speaking style can be elicited from people by having them speak in a noisy room. In order to reliably elicit such a delivery style – speech spoken in poor channel conditions – as well as obtain clean recordings for use in speech analysis and synthesis, we used the method that produced the CMU SIN database for speech synthesis [3] to record a small (30 sentence) database of speech in noise.
The CMU SIN database is publicly available

An examination of speech in noise and its effect on understandability for natural and synthetic speech

by Brian Langner, Alan W Black - Language Technologies Institute, CMU, Pittsburgh PA, 2004
Abstract - Cited by 5 (2 self)
www.lti.cs.cmu.edu

Using Articulatory Position Data in Voice Transformation

by Arthur R. Toth, Alan W Black
Abstract - Cited by 4 (2 self)
Articulatory position data is information about the location of various articulators in the vocal tract. One form of it has been made freely available in the MOCHA database [1]. This data is interesting in that it provides direct information on the production of speech, but there is the question of whether it actually provides information beyond what can be derived from the audio signal, which is much easier to collect. Although there has been some success in improving small-scale speech recognition and in demonstrating mappings between articulatory positions and spectral features of the audio signal, there are many problems to which this data has not been applied. This work investigates the possibility of using articulatory position data to improve voice transformation, which is the process of making speech from one person sound as if it had been spoken by another. After further investigation, it appears to be difficult to use articulatory position data to improve voice transformation using state-of-the-art voice transformation techniques as we only had a few positive results across a range of experiments. To achieve these results, it was necessary to modify our baseline voice transformation approach and/or consider features derived from the articulatory positions.