Generating Expression in Synthesized Speech
1990
Cited by 34 (1 self)
The document examines the proposal that affect can be reproduced in synthesized speech by imitating the effects of emotion in human speech. A program, the Affect Editor, was constructed to systematically vary the influence of the speech correlates of emo...
Whistler: A Trainable Text-To-Speech System
Proc. ICSLP
Cited by 27 (4 self)
We introduce Whistler, a trainable Text-to-Speech (TTS) system, that automatically learns the model parameters from a corpus. Both prosody parameters and concatenative speech units are derived through the use of probabilistic learning methods that have been successfully used for speech recognition. Whistler can produce synthetic speech that sounds very natural and resembles the acoustic and prosodic characteristics of the original speaker. The underlying technologies used in Whistler can significantly facilitate the process of creating generic TTS systems for a new language, a new voice, or a new speech style.
The Computational Processing of Intonational Prominence: A Functional Prosody Perspective
1997
Cited by 16 (2 self)
Intonational prominence, or accent, is a fundamental prosodic feature that is said to contribute to discourse meaning. This thesis outlines a new, computational theory of the discourse interpretation of prominence, from a FUNCTIONAL PROSODY perspective. Functional prosody makes the following two important assumptions: first, there is an aspect of prominence interpretation that centrally concerns discourse processes, namely the discourse focusing nature of prominence; and second, the role of prominence in language processing in general, and discourse processing in particular, is not essentially separate from the processing of other grammatical, nonprosodic information. This thesis develops a computational theory of prominence interpretation by explaining how prominence serves as an inference cue in discourse processing. Prominence signals changes in the attentional status of entities in a discourse model, while nonprominence signals that the realized entities are already in discourse fo...
Characterisation of Rhythmic Patterns for Text-to-Speech Synthesis
1994
Cited by 9 (4 self)
This article proposes an alternative rhythmic unit to the syllable: the inter-Perceptual Center group (IPCG). This group is delimited by events which can be detected using only acoustic correlates [29]. The rhythmic patterns for French are described using this characterisation: we show that realisation of accents is gradual over the trailed accentual group and that this gradual lengthening is needed for perception. A model for distributing the IPCG duration among its segmental constituents, incorporating automatic generation of pauses (emergence and duration) according to speech rate, is then described.
A Computational Memory And Processing Model For Prosody
In Proceedings of the Intl. Conf. on Spoken Language Processing, 1998
Cited by 9 (0 self)
This paper links prosody to the information in the text and how it is processed by the speaker. It describes the operation and output of Loq, a text-to-speech implementation that includes a model of limited attention and working memory. Attentional limitations are key. Varying the attentional parameter in the simulations varies in turn what counts as given and new in a text, and therefore, the intonational contours with which it is uttered. Currently, the system produces prosody in three different styles: child-like, adult expressive, and knowledgeable. This prosody also exhibits differences within each style -- no two simulations are alike. The limited resource approach captures some of the stylistic and individual variety found in natural prosody.
1. INTRODUCTION Ask any lay person to imitate computer speech and you will be treated to an utterance delivered in melodic and rhythmic monotone, possibly accompanied by choppy articulation and a voice quality that is nasal and strained. ...
Recent improvements on Microsoft’s trainable text-to-speech synthesizer: Whistler
In ICASSP-97, volume II, 1997
Cited by 8 (2 self)
The Whistler Text-to-Speech engine was designed so that we can automatically construct the model parameters from training data [7]. This paper will focus on recent improvements to prosody and acoustic modeling, which are all derived through the use of probabilistic learning methods. Whistler can produce synthetic speech that sounds very natural and resembles the acoustic and prosodic characteristics of the original speaker. The underlying technologies used in Whistler can significantly facilitate the process of creating generic TTS systems for a new language, a new voice, or a new speech style. The Whistler TTS engine supports Microsoft Speech API [10] and requires less than 3 MB of working memory.
Prosodic Models, Automatic Speech Understanding, and Speech Synthesis: Towards the Common Ground
Cited by 8 (1 self)
Automatic speech understanding and speech synthesis, two of the major speech processing applications, impose strikingly different constraints and requirements on prosodic models. The prevalent models of prosody and intonation fail to offer a unified solution to these conflicting constraints. As a consequence, prosodic models have been applied only occasionally in end-to-end automatic speech understanding systems; in contrast, they have been applied extensively in speech synthesis systems. In this paper we discuss the reasons for this state of affairs as well as possible strategies to overcome the shortcomings of the use of prosodic modelling in automatic speech processing.
A Metrical Model Of Rhythm And Intonation For French Text-To-Speech Synthesis
Proc. ESCA Workshop on Intonation, 1997
Cited by 6 (3 self)
This paper presents the prosodic component of a French text-to-speech synthesis system based on a metrical model of rhythm and intonation in which the prosodic well-formedness of utterances is governed by a set of rhythmic and morphosyntactic constraints. We first set out the theoretical basis of the generation of prosodic levels that correspond to the metrical and tonal structure of utterances. Then, we outline the implementation in our system, and, in particular, the prosodic module that produces a metrical interpretation of phrase-level parsed text, by computing relative prominence levels and generating the F0 patterns and segmental duration. This approach produces high quality results for text-to-speech synthesis at a minimal implementation cost, and enables a realistic modelling of the prosodic variability observed in real speech.
1. INTRODUCTION We have undertaken a modular research program on the metrical, morpho-syntactic and semantico-pragmatic constraints which govern the...

