A Semantics of Contrast and Information Structure for Specifying Intonation in Spoken Language Generation
1996
Specifying Intonation from Context for Speech Synthesis
Speech Communication, 1994
Cited by 65 (14 self)
This paper presents a theory and a computational implementation for generating prosodically appropriate synthetic speech in response to database queries. Proper distinctions of contrast and emphasis are expressed in an intonation contour that is synthesized by rule under the control of a grammar, a discourse model, and a knowledge base. The theory is based on Combinatory Categorial Grammar, a formalism which easily integrates the notions of syntactic constituency, semantics, prosodic phrasing and information structure. Results from our current implementation demonstrate the system's ability to generate a variety of intonational possibilities for a given sentence depending on the discourse context.
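The abstract does not spell out how information structure is turned into a contour. The sketch below is a rough illustration of one such mapping, under stated assumptions.

- **Assumed input:** the theme/rheme split and the focused words are already computed. In the paper these come from the CCG grammar and the discourse model.
- **Assumed tune pairing:** the pairing often associated with Steedman's account of intonation. Focused theme words get L+H* accents and the theme ends in LH%. Focused rheme words get H* accents and the rheme ends in LL%.
- **Hypothetical names:** `Word`, `realize`, and `mark_intonation` are invented for illustration. They are not the system's interface.

```python
from dataclasses import dataclass
from typing import List

# ASSUMED tune inventory (not taken from the paper's implementation):
# theme phrases get L+H* accents and an LH% boundary; rheme phrases get
# H* accents and an LL% boundary.
THEME_ACCENT, THEME_BOUNDARY = "L+H*", "LH%"
RHEME_ACCENT, RHEME_BOUNDARY = "H*", "LL%"


@dataclass
class Word:
    text: str
    focused: bool  # True if the discourse model marks this word as contrastive


def realize(phrase: List[Word], accent: str, boundary: str) -> List[str]:
    """Accent the focused words of one phrase and close it with a boundary tone."""
    if not phrase:
        return []  # e.g. an all-rheme answer has no separate theme phrase
    tokens = [f"{w.text}({accent})" if w.focused else w.text for w in phrase]
    tokens.append(boundary)
    return tokens


def mark_intonation(theme: List[Word], rheme: List[Word]) -> str:
    """Return a ToBI-style annotation for a theme/rheme-partitioned answer."""
    return " ".join(
        realize(theme, THEME_ACCENT, THEME_BOUNDARY)
        + realize(rheme, RHEME_ACCENT, RHEME_BOUNDARY)
    )


# Hypothetical answer to "Alice likes velvet, but what does MARY prefer?"
theme = [Word("Mary", True), Word("prefers", False)]
rheme = [Word("corduroy", True)]
print(mark_intonation(theme, rheme))  # Mary(L+H*) prefers LH% corduroy(H*) LL%
```

The same rheme with a different theme would change only the theme's accents. This is the kind of context-dependent variation the abstract reports.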
Structure and ostension in the interpretation of discourse deixis
Language and Cognitive Processes, 1991
Cited by 61 (8 self)
This paper examines demonstrative pronouns used as deictics to refer to the interpretation of one or more clauses. Although this usage is frowned upon in style manuals (for example Strunk and White (1959) state that “This. The pronoun this, referring to the complete sense of a preceding sentence or clause, cannot always carry the load and so may produce an imprecise statement.”), it is nevertheless very common in written text. Handling this usage poses a problem for Natural Language Understanding systems. The solution I propose is based on distinguishing between what can be pointed to and what can be referred to by virtue of pointing. I argue that a restricted set of discourse segments yield what such demonstrative pronouns can point to and a restricted set of what Nunberg (1979) has called referring functions yield what they can refer to by virtue of that pointing.
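Webber's restriction on what such a demonstrative can point to is commonly summarized as a right-frontier constraint. Only the discourse segments along the right edge of the discourse tree built so far are available.

The sketch below is minimal and assumes the text has already been segmented into a tree. The `Segment` class and the example labels are hypothetical. It covers only the pointing step, not the referring functions that map a pointed-to segment to its referent.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Segment:
    label: str
    children: List["Segment"] = field(default_factory=list)


def right_frontier(root: Segment) -> List[str]:
    """Labels of the segments on the right frontier, from the root downward.

    Under the assumed constraint, these are the segments a demonstrative
    such as "this" or "that" may point to; all other segments are closed.
    """
    frontier: List[str] = []
    node: Optional[Segment] = root
    while node is not None:
        frontier.append(node.label)
        node = node.children[-1] if node.children else None
    return frontier


# Hypothetical discourse D made of segments A and B, where B has two subsegments.
discourse = Segment("D", [
    Segment("A"),
    Segment("B", [Segment("B1"), Segment("B2")]),
])
print(right_frontier(discourse))  # ['D', 'B', 'B2']
```

Here A and B1 are closed, so a following "this" could point to D, B, or B2 but not to A or B1.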
Generating Contextually Appropriate Intonation
In Proceedings of the 6th Conference of the European Chapter of the Association for Computational Linguistics, 1993
Cited by 38 (12 self)
One source of unnaturalness in the output of text-to-speech systems stems from the involvement of algorithmically generated default intonation contours, applied under minimal control from syntax and semantics.
From Data to Speech: A General Approach
Natural Language Engineering, 2000
Cited by 21 (9 self)
We present a data-to-speech system called D2S, which can be used for the creation of data-to-speech systems in different languages and domains. The most important characteristic of a data-to-speech system is that it combines language and speech generation: language generation is used to produce a natural language text expressing the system's input data, and speech generation is used to make this text audible. In D2S, this combination is exploited by using linguistic information available in the language generation module for the computation of prosody. This allows us to achieve a better prosodic output quality than can be achieved in a plain text-to-speech system. For language generation in D2S, the use of syntactically enriched templates is guided by knowledge of the discourse context, while for speech generation pre-recorded phrases are combined in a prosodically sophisticated manner. This combination of techniques makes it possible to create linguistically sound but efficient system...
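One simple way information from a language generation module can drive prosody is a givenness-based accent rule. Words expressing concepts new to the discourse are accented, and words expressing already-mentioned concepts are deaccented.

The sketch below assumes that rule. The token format, the function name `assign_accents`, and the example sentences are hypothetical. It is not D2S's prosody module, whose full use of linguistic information the abstract does not spell out.

```python
from typing import List, Set, Tuple

# HYPOTHETICAL token format: (surface word, concept it expresses);
# an empty concept marks a function word, which is never accented here.
Token = Tuple[str, str]


def assign_accents(sentence: List[Token], discourse: Set[str]) -> str:
    """Accent words expressing new concepts; leave given concepts deaccented.

    Accented words are shown in upper case. `discourse` is updated in place,
    so a concept counts as given once it has been mentioned.
    """
    words = []
    for surface, concept in sentence:
        if concept and concept not in discourse:
            words.append(surface.upper())  # new information: accent
            discourse.add(concept)
        else:
            words.append(surface)  # given information or function word
    return " ".join(words)


discourse: Set[str] = set()
print(assign_accents([("Ajax", "AJAX"), ("beat", "WIN"), ("PSV", "PSV")], discourse))
# AJAX BEAT PSV
print(assign_accents([("Kluivert", "KLUIVERT"), ("scored", "SCORE"),
                      ("for", ""), ("Ajax", "AJAX")], discourse))
# KLUIVERT SCORED for Ajax
```

A generator has this information without parsing its own output: it knows which concept each word realizes. A plain text-to-speech system would have to guess it from the text.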
Higher-order unification and the interpretation of focus
Linguistics and Philosophy, 1997
Cited by 15 (2 self)
1.1 The range of focus phenomena
Predicting intonational boundaries automatically from text: The ATIS domain
In Proceedings of the DARPA Speech and Natural Language Workshop, 1991
Cited by 5 (1 self)
Relating the intonational characteristics of an utterance to other features inferable from its text is important both for speech recognition and for speech synthesis. This work investigates techniques for predicting the location of intonational phrase boundaries in natural speech by analyzing utterances from the DARPA Air Travel Information Service database. For statistical modeling, we employ Classification and Regression Tree (CART) techniques. We achieve success rates of just over 90%.
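A CART model grows a binary decision tree by repeatedly choosing the split that most reduces impurity. The sketch below is minimal and uses Gini impurity on boolean features.

The features (`punct`, `next_func`, `long_span`) and the six training junctures are invented for illustration. They are not the ATIS features or data from the paper. Full CART implementations also handle numeric and categorical features and prune the grown tree.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple, Union

# One word juncture: HYPOTHETICAL boolean features and a label (1 = boundary).
Example = Tuple[Dict[str, bool], int]
# A tree is a leaf label or (feature, subtree if True, subtree if False).
Tree = Union[int, Tuple[str, "Tree", "Tree"]]


def gini(labels: List[int]) -> float:
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(data: List[Example], features: List[str]) -> Optional[Tuple[float, str]]:
    """Return (weighted impurity, feature) for the lowest-impurity binary split."""
    best = None
    for f in features:
        yes = [y for x, y in data if x[f]]
        no = [y for x, y in data if not x[f]]
        if not yes or not no:
            continue  # this feature does not partition the data
        score = (len(yes) * gini(yes) + len(no) * gini(no)) / len(data)
        if best is None or score < best[0]:
            best = (score, f)
    return best


def grow(data: List[Example], features: List[str], depth: int = 0, max_depth: int = 3) -> Tree:
    labels = [y for _, y in data]
    majority = Counter(labels).most_common(1)[0][0]
    split = best_split(data, features)
    # Stop at the depth limit, or when no split lowers impurity (incl. pure nodes).
    if depth == max_depth or split is None or split[0] >= gini(labels):
        return majority
    f = split[1]
    rest = [g for g in features if g != f]
    yes = [(x, y) for x, y in data if x[f]]
    no = [(x, y) for x, y in data if not x[f]]
    return (f, grow(yes, rest, depth + 1, max_depth), grow(no, rest, depth + 1, max_depth))


def predict(tree: Tree, x: Dict[str, bool]) -> int:
    while isinstance(tree, tuple):
        f, yes, no = tree
        tree = yes if x[f] else no
    return tree


FEATURES = ["punct", "next_func", "long_span"]
data: List[Example] = [
    ({"punct": True, "next_func": False, "long_span": False}, 1),
    ({"punct": True, "next_func": True, "long_span": True}, 1),
    ({"punct": False, "next_func": True, "long_span": True}, 1),
    ({"punct": False, "next_func": True, "long_span": False}, 0),
    ({"punct": False, "next_func": False, "long_span": True}, 0),
    ({"punct": False, "next_func": False, "long_span": False}, 0),
]
tree = grow(data, FEATURES)
print(tree)  # ('punct', 1, ('next_func', ('long_span', 1, 0), 0))
```

On this toy data the tree places a boundary after punctuation. Otherwise it places one only before a function word that ends a long stretch without a boundary. A success rate like the paper's would be measured by running `predict` on held-out junctures.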
Contextual Aspects of Prosody in Monologue Generation
In IJCAI Workshop on Context in Natural Language Processing, 1995
Cited by 2 (1 self)
This paper presents a theory concerning the dependence of intonational contours on discourse context, and expounds a framework for modeling contrastive stress, a phenomenon whereby pitch accents are strategically placed based on their ability to discriminate among available discourse entities. The intonation theory, which is presented with respect to a model of discourse coherence inspired by Centering Theory [Grosz et al., 1986], is utilized to implement a monologue generation system which produces paragraph-length, prosodically appropriate, spoken descriptions of entities from a small knowledge base.
Keywords: Natural Language Generation, Speech Synthesis
1 Introduction
In spoken English, the prosodic characteristics of an utterance determine how the listener perceives its meaning and how the information it conveys is accommodated into a listener's discourse model, a theoretical abstraction that relates various aspects of a discourse (e.g. propositions, concepts, obj...
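The abstract describes placing accents according to their ability to discriminate among available discourse entities. The sketch below is a minimal illustration of that idea, under these assumptions:

- Each entity is a flat attribute-value record.
- The contrast set is simply the salient entities of the same type as the referent.
- The entities, attributes, and the function name `describe` are hypothetical.

The paper's Centering-based account of salience is not reproduced.

```python
from typing import Dict, List

# HYPOTHETICAL knowledge base: attribute -> value; "type" names the head noun.
Entity = Dict[str, str]

KB: Dict[str, Entity] = {
    "a1": {"type": "amplifier", "origin": "British", "cost": "expensive"},
    "a2": {"type": "amplifier", "origin": "American", "cost": "expensive"},
    "t1": {"type": "tuner", "origin": "British", "cost": "cheap"},
}


def describe(ref_id: str, kb: Dict[str, Entity], salient: List[str],
             modifiers: List[str]) -> str:
    """Build a noun phrase, upper-casing modifiers that carry contrastive stress."""
    ref = kb[ref_id]
    # Assumed contrast set: other salient entities of the same type.
    alternatives = [kb[e] for e in salient
                    if e != ref_id and kb[e]["type"] == ref["type"]]
    words = []
    for attr in modifiers:
        value = ref[attr]
        # Accent a modifier only if it distinguishes the referent
        # from at least one alternative in the contrast set.
        if any(alt.get(attr) != value for alt in alternatives):
            words.append(value.upper())
        else:
            words.append(value)
    words.append(ref["type"])
    return "the " + " ".join(words)


print(describe("a1", KB, ["a1", "a2", "t1"], ["cost", "origin"]))
# the expensive BRITISH amplifier
print(describe("a1", KB, ["a1", "t1"], ["cost", "origin"]))
# the expensive British amplifier
```

When no other amplifier is salient, nothing needs contrasting and no modifier is stressed. When the American amplifier is salient, only the property that tells the two apart ("British") is stressed.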
Talking Heads: Physical, Linguistic and Cognitive Issues in Facial Animation
1995
Cited by 2 (1 self)
Facial modeling and animation are increasingly receiving attention in the graphics and artificial intelligence (AI) research communities, both of which share the common goal of synthesizing believable, simulated agents. While computer graphics researchers have been primarily concerned with physical and anatomical aspects of facial movements, AI researchers and cognitive scientists have focused on understanding and modeling the motivation behind those movements and expressions. The combination of these two avenues of research may eventually lead to agents that can interact autonomously, with humans or with each other, bearing faces that believably model the underlying meaning of the interactions. While such synthetic speaking faces are undoubtedly useful for cognitive research, their practical applications are also vast in number, encompassing such diverse fields as medicine, education, telecommunications and the entertainment industry. Facial expressions have fascinated mankind for ce...

