
A hidden Markov-model-based trainable speech synthesizer (1999)

by R.E. Donovan, P.C. Woodland

Results 1 - 10 of 13

Rare Events and Closed Domains: Two Delicate Concepts in Speech Synthesis

by Bernd Möbius, 2003
Cited by 18 (6 self)
One of the most serious challenges for speech synthesis is the systematic treatment of events in language and speech that are known to have low frequencies of occurrence. The problems that extremely unbalanced frequency distributions pose for rule-based or data-driven models are often underestimated or even unrecognized. This paper discusses these problems in the contexts of morphology, syllabification, segmental duration and unit selection, and also suggests possible solutions. The design of databases for restricted application domains, where the distributions of linguistic and phonetic factors are known, is also critically reviewed.

Corpus-Based Speech Synthesis: Methods and Challenges

by Bernd Möbius
Cited by 11 (0 self)
Corpus-based approaches to speech synthesis have been advocated to overcome the limitations of concatenative synthesis from a fixed acoustic unit inventory. The frequency of unit concatenations in, e.g., diphone synthesis has been argued to contribute to the perceived lack of naturalness of synthetic speech. The key idea of corpus-based synthesis, or unit selection, is to use an entire speech corpus as the acoustic inventory and to select at run-time from this corpus the longest available strings of phonetic segments that match a sequence of target speech sounds in the utterance to be synthesized, thereby minimizing the number of concatenations and reducing the need for signal processing. This paper reviews the assumptions underlying this synthesis strategy and the different approaches to unit selection, as well as the major challenges encountered by corpus-based methods. One of the biggest problems to date is the relative weighting of acoustic distance measures. We further argue agains...

A Characterization of Speech Recognition on Modern Computer Systems

by Kartik Agaram, Stephen W. Keckler, Doug Burger, 2001
Cited by 10 (0 self)
In this paper we describe and characterize the speech recognition process, and assess the suitability of current microprocessors and memory systems for running speech recognition applications. We use representative benchmark applications: RASTA [7] to characterize the signal processing on the front end, and SPHINX [13] for the graph search on the back end. Recognition time is dominated by the back end, which substantially exercises the memory system and exhibits low levels of instruction-level parallelism (ILP). As a result, SPHINX yields an average instructions per cycle (IPC) of 0.64 on a simulated 4-issue out-of-order microprocessor. We identify intelligent layout and thread-level parallelization as the primary methods to improve throughput, showing upper bounds on the performance improvements that these methods can achieve.

High-Quality and Flexible Speech Synthesis with Segment Selection and Voice Conversion

by Tomoki Toda, 2003
Cited by 8 (1 self)
Text-to-Speech (TTS) is a useful technology that converts any text into a speech signal. It can be utilized for various purposes, e.g. car navigation, announcements in railway stations, response services in telecommunications, and e-mail reading. Corpus-based TTS makes it possible to dramatically improve the naturalness of synthetic speech compared with the early TTS. However, no general-purpose TTS has been developed that can consistently synthesize sufficiently natural speech. Furthermore, there is not yet enough flexibility in corpus-based TTS.

The Impact Of Speech Recognition On Speech Synthesis

by Mari Ostendorf, Ivan Bulyko, 2002
Cited by 4 (0 self)
Speech synthesis has changed dramatically in the past few years to have a corpus-based focus, borrowing heavily from advances in automatic speech recognition. In this paper, we survey technology in speech recognition systems and how it translates (or doesn't translate) to speech synthesis systems. We further speculate on future areas where ASR may impact synthesis and vice versa.

Flexible Speech Synthesis Using Weighted Finite State Transducers

by Ivan Bulyko, Mari Ostendorf, Alex Acero, 1996
Cited by 3 (1 self)
The main focus of this thesis is on improving the quality of concatenative speech synthesis by taking advantage of the natural (allowable) variability in spoken language, namely, the fact that there are multiple ways of uttering a given sentence and there are several word sequences that can represent a given concept. An architecture for speech generation for constrained domain applications is proposed that tightly integrates language generation and speech synthesis, allowing the choice of words and desired intonation in the system's response to be optimized jointly with the speech output quality. Experiments with a travel planning dialog system have demonstrated that by expanding the space of candidate responses and possible prosodic realizations we achieve higher quality speech output.

Workload characterization of biometric applications on pentium 4 microarchitecture

by Chang-burm Cho, Asmita V. Ch, Yue Li, Tao Li, in IISWC, 2005
Cited by 3 (0 self)
Biometric computing is a technique that uses physiological and behavioral characteristics of persons to identify and authenticate individuals. Due to the increasing demand for security, privacy and anti-terrorism, biometric applications represent rapidly growing computing workloads. However, very few results on the execution characteristics of these applications on state-of-the-art microprocessor and memory systems have been published so far. This paper proposes a suite of biometric applications and reports the results of a biometric workload characterization effort, focusing on various architecture features. To understand the impacts and implications of biometric workloads on processor and memory architecture design, we contrast the characteristics of biometric workloads and the widely used SPEC 2000 integer benchmarks. Our experiments show that biometric applications typically have a small instruction footprint that can fit in the L1 instruction cache. Loads and stores account for more than 50% of the dynamic instructions, which indicates that biometric applications are data-centric in nature. Although biometric applications work across large-scale datasets to identify matched patterns, the active working sets of these workloads are usually small. As a result, prefetching and a large L2 cache effectively handle the data footprints of a majority of the studied benchmarks. The branch misprediction rate is less than 4% on all studied workloads. The IPC of the studied benchmarks ranges from 0.13 to 0.77, which indicates that out-of-order superscalar execution is not very efficient. The developed biometric benchmark suite (BMW) and input data sets are freely available and can be downloaded from

Linguistic and Mixed Excitation Improvements on a HMM-based speech synthesis for Castilian Spanish

by Xavier Gonzalvo, Joan Claudi Socoró, Ignasi Iriondo, Carlos Monzo, Elisa Martínez
Cited by 1 (0 self)
Hidden Markov model based text-to-speech (HMM-TTS) synthesis is one of the techniques for generating speech from trained statistical models, where the spectrum and prosody of basic speech units are modelled together. This paper presents the advances in our Spanish HMM-TTS, and a perceptual test is conducted to compare it with an extended PSOLA-based concatenative (E-PSOLA) system. The improvements concern phonetic information and contextual factors specific to Castilian Spanish, and speech generation using a mixed excitation (ME) technique. The results show that the new HMM-TTS system is preferred over the previous system and achieves a better MOS than a real E-PSOLA system in terms of acceptability, intelligibility and stability.

Prosody Beyond Fundamental Frequency

by Greg Kochanski, R. Meyer, P. Augurzky, I. Mleinek, N. Richter, J. Schließer
Cited by 1 (1 self)
Most of this book is concerned with tactical details of experiments: relatively detailed prescriptions for techniques after the goals of the experiment are already decided. In contrast, this chapter is intended to help with the broader questions of what to measure and what experimental strategies to use or

Automatic Classification of Hand Drawn Geometric Shapes using Constructional Sequence Analysis

by R. M. Guest, S. Chindaro, M. C. Fairhurst, J. M. Potter
Cited by 1 (0 self)
A method for automatically assessing the constructional sequence from a neuropsychological drawing task using Hidden Markov Models is presented. We also present a method of extracting and identifying the position of individual pen strokes relating to individual sides of a shape within a drawing to form training and testing sequences. Our results from two experiments using data from patients with visuo-spatial neglect show the HMM classifier is able to generalise on incorrectly extracted sequences and obtain a diagnostic classification which can be used alongside other forms of conventional assessment.