Results 1 - 5 of 5
A hidden Markov-model-based trainable speech synthesizer, 1999
Cited by 19 (0 self)
This paper presents a new approach to speech synthesis in which a set of cross-word decision-tree state-clustered context-dependent hidden Markov models are used to define a set of subphone units to be used in a concatenation synthesizer. The models, trees, waveform segments and other parameters representing each clustered state are obtained completely automatically through training on a 1 hour single-speaker continuous-speech database. During synthesis the required utterance, specified as a string of words of known phonetic pronunciation, is generated as a sequence of these clustered states using a TD-PSOLA waveform concatenation synthesizer. The system produces speech which, though in a monotone, is both natural sounding and highly intelligible. A Modified Rhyme Test conducted to measure segmental intelligibility yielded a 5.0% error rate. The speech produced by the system mimics the voice of the speaker used to record the training database. The system can be retrained on...
Segment Pre-Selection In Decision-Tree Based Speech Synthesis Systems, 2000
Cited by 10 (0 self)
Corpus-based approaches to unit selection for concatenative speech synthesis have become popular in recent years because they are more sensitive to unit context than their simpler predecessors. These systems usually make use of large speech databases and employ sophisticated search algorithms to determine the optimal unit sequence for synthesising each sentence. For many applications it is not possible to have the entire database, which may be as large as several hundred megabytes, available to the synthesiser at runtime. What is required is some form of off-line pre-selection algorithm to determine which subset of the database enables the highest quality speech synthesis for a given runtime system size. This paper describes a pre-selection algorithm developed at IBM for use with decision-tree-based concatenative speech synthesisers. 1. INTRODUCTION In recent years corpus based approaches to unit selection for concatenative speech synthesis have become inc...
Speech Processing with Linear and Neural Network Models, 1996
Cited by 4 (0 self)
ion, for imposing continuity between models of adjacent speech segments, and learning rate adaptation, for improving back-propagation training, are discussed. For synthesising real speech utterances, an audio tape demonstrates that ARX models produce the highest quality synthetic speech and that the quality is maintained when pitch modifications are applied. The second part of the dissertation studies the operation of recurrent neural networks in classifying patterns of correlated feature vectors. Such patterns are typical of speech classification tasks. The operation of a hidden node with a recurrent connection is explained in terms of a decision boundary which changes position in feature space. The feedback is shown to delay switching from one class to another and to smooth output decisions for sequences of feature vectors from the same class. For networks trained with constant class targets, a sequence of feature vectors from the same class tends to drive the operation of hidden nod
and
, 1989
Abstract. Previous work has shown that it is possible to train a multi-layer perceptron to estimate the voice fundamental period (Tx) for multiple speakers in the presence of high levels of background noise. The algorithm has been implemented in real-time on a TMS320C25 based development system. A prototype pocket-sized portable device has been constructed and the real-time software transferred to it. This will provide the basis for a new generation of signal processing hearing aids for the profoundly and totally deaf. Power supply current is sufficiently low for battery operation for periods of 12 hours between charges. The basic algorithm has been adapted to provide the higher time resolution which will make it applicable to a wide range of other applications. Zusammenfassung (translated from German). As earlier work has shown, it is possible to train a multi-layer perceptron to determine the fundamental period of speech signals in a speaker-independent way at high noise levels. The algorithm was developed in real time on a TMS320C25 signal processor system. This software was ported to a portable pocket-sized prototype. This design provides the basis for a new generation of signal-processing hearing aids for profoundly hearing-impaired and totally deaf patients. The power requirement is low enough to allow 12 hours of battery operation. The basic algorithm was extended to give a better time resolution
PROSICE: A spoken English database for prosody research, 1996
Introduction. Prosody - the study of the intonation, stress and rhythm of speech - is now assuming a greater importance in phonetics, phonology and speech technology than ever before. Once regarded as subservient to studies of segmental structure, it is now being seen as providing the 'framework' which holds different levels of phonetic description together. The recent past has seen novel views of the phonology of intonation (e.g. Pierrehumbert, 1980), a new interest in prosodic phrase structure and prominence (e.g. Liberman and Prince, 1977) and the rise of autosegmental or non-linear accounts of phonetic description which integrate metrical structure with phonetic substance (e.g. Clements and Keyser, 1983). The role of prosody is also changing in speech synthesis and recognition. In speech synthesis, the success of concatenative systems - whereby recorded segments of speech are glued together to make novel utterances - has meant that the key issues have changed from segmental to su

