Learning to speak. Sensori-motor control of speech movements
, 1998
Cited by 14 (3 self)
This paper shows how an articulatory model, able to produce acoustic signals from articulatory motion, can learn to speak, i.e. coordinate its movements in such a way that it utters meaningful sequences of sounds belonging to a given language. This complex learning procedure is accomplished in four major steps: (a) a babbling phase, where the device builds up a model of the forward transforms, i.e. the articulatory-to-audio-visual mapping; (b) an imitation stage, where it tries to reproduce a limited set of sound sequences by audio-visual-to-articulatory inversion; (c) a "shaping" stage, where phonemes are associated with the most efficient available sensori-motor representation; and finally, (d) a "rhythmic" phase, where it learns the appropriate coordination of the activations of these sensori-motor targets.
Combining MRI, EMA and EPG measurements in a three-dimensional tongue model
- Speech Communication
, 2003
Cited by 11 (3 self)
A three-dimensional (3D) tongue model has been developed using MR images of a reference subject producing 44 artificially sustained Swedish articulations. Based on the difference in tongue shape between the articulations and a reference, the six linear parameters jaw height, tongue body, tongue dorsum, tongue tip, tongue advance and tongue width were determined using an ordered linear factor analysis controlled by articulatory measures. The first five factors explained 88% of the tongue data variance in the midsagittal plane and 78% in the 3D analysis. The six-parameter model is able to reconstruct the modelled articulations with an overall mean reconstruction error of 0.13 cm, and it specifically handles lateral differences and asymmetries in tongue shape. In order to correct articulations that were hyperarticulated due to the artificial sustaining in the magnetic resonance imaging (MRI) acquisition, the parameter values in the tongue model were readjusted based on a comparison of virtual and natural linguopalatal contact patterns, collected with electropalatography (EPG). Electromagnetic articulography (EMA) data was collected to control the kinematics of the tongue model for vowel-fricative sequences, and an algorithm to handle surface contacts has been implemented, preventing the tongue from protruding through the palate and teeth. © 2002 Elsevier B.V. All rights reserved.
Sensori-Motor Control of Speech Movements
- 4th Speech Production Seminar, ESCA
, 1996
Cited by 5 (3 self)
This paper shows how an articulatory model, able to produce acoustic signals from articulatory motion, can learn to speak, i.e. coordinate its movements in such a way that it utters meaningful sequences of sounds belonging to a given language. This complex learning procedure is accomplished in four major steps: (a) a babbling phase, where the device builds up a model of the forward transforms, i.e. the articulatory-to-audio-visual mapping; (b) an imitation stage, where it tries to reproduce a limited set of sound sequences by audio-visual-to-articulatory inversion; (c) a "shaping" stage, where phonemes are associated with the most efficient sensori-motor representation; and finally, (d) a "rhythmic" phase, where it learns the appropriate coordination of the activations of these sensori-motor targets. Résumé (translated from French): This article shows how an articulatory model, able to produce sounds from the movements of its articulators, can learn to speak, that is, co...
Articulatory Synthesis From X-Rays And Inversion For An Adaptive Speech Robot
- In Proceedings of ICSLP'96: The Fourth International Conference on Spoken Language Processing
, 1996
Cited by 2 (0 self)
This paper describes a speech robotic approach to articulatory synthesis. An anthropomorphic speech robot has been built, based on a real reference subject's data. This speech robot, called the Articulotron, has a set of relevant degrees of freedom for the speech articulators: jaw, tongue, lips, and larynx. The associated articulatory model has been elaborated from cineradiographic midsagittal profiles recorded in synchrony with front views of the lips; the model of the noise source for fricative excitation has been derived from acoustic and aerodynamic measurements on the same reference subject. In a first phase, the Articulotron has been used to perform the copy synthesis of the vowels and the fricative and plosive consonants in the X-ray corpus. This makes it possible to assess the performance of the Articulotron in producing fairly high quality speech, and provides a reference against which other attempts at articulatory synthesis can be compared. In a second phase, the Articulotron has been used to recover articulatory gestures from audio-visual speech prototypes. At the present stage, a gradient descent algorithm is used to learn the articulatory trajectories of the robot by optimisation, starting from the formant trajectories and the knowledge of constraints for the consonantal constriction or closure, in order to mimic the original VCV audio-visual sequences. The adaptive skill of the robot is demonstrated through articulator perturbation experiments and through the elaboration of relevant strategies in the hyper/hypo speech paradigm. A video tape will demonstrate an animation of the Articulotron, displaying the jaw, the tongue and the lips, for various examples of adaptive articulatory synthesis.
Hearing By Eyes Thanks To The "Labiophone": Exchanging Speech Movements
We present here the "labiophone", a virtual system for audio-visual speech communication. A clone of the speaker is animated at a distance by articulatory movements extracted from the speaker's image, captured by a video camera centered on the speaker's face. The clone consists of a mesh driven by a few articulatory parameters and clothed by blended textures. The characteristics of the articulatory model and the texture blending are transmitted at the initiation of the dialog. Then only articulatory parameters are transmitted, at a very low bit rate, through the telecommunication or web network. Preliminary evaluation of such a system is presented below. Keywords: speech, facial animation, articulatory modelling, movement estimation, texture mapping. 1. INTRODUCTION Speech communication is multi-modal: if auditory and visual perception provide complementary information about the speaker and his or her emotional state, they collaborate intimately to enhance the intelligibility of the ...

