Functional Phonology -- Formalizing the interactions between articulatory and perceptual drives
1998
A theoretical investigation of reference frames for the planning of speech movements
Psychological Review, 1998
Cited by 39 (21 self)
Running title: Speech reference frames
Does the speech motor control system utilize invariant vocal tract shape targets of any kind when producing phonemes? We present a four-part theoretical treatment favoring models whose only invariant targets are auditory perceptual targets over models that posit invariant constriction targets. When combined with earlier theoretical and experimental results (Guenther, 1995a,b; Perkell et al., 1993; Savariaux et al., 1995a,b), our hypothesis is that, for vowels and semi-vowels at least, the only invariant targets of the speech production process are multidimensional regions in auditory perceptual space. These auditory perceptual target regions are hypothesized to arise during development as an emergent property of neural map formation in the auditory system. Furthermore, speech movements are planned as trajectories in auditory perceptual space. These trajectories are then mapped into articulator movements through a neural mapping that allows motor equivalent variability in constriction locations and degrees when needed, but maintains approximate constriction invariance for a given sound in most instances. These hypotheses are illustrated and substantiated using computer simulations of the DIVA model of speech acquisition and production. Finally, we pose several difficult challenges to proponents of constriction theories based on this theoretical treatment.
A Modeling Framework for Speech Motor Development and Kinematic Articulator Control
1995
Cited by 17 (11 self)
This paper presents three hypotheses that are central to a computational model of speech production: (1) Sound targets take the form of regions, rather than points, in a planning reference frame. (2) The planning frame is more acoustic-like than the frames used in most recent models. (3) A direction-to-direction mapping transforms planned trajectories into articulator movements. These hypotheses are supported by experimental data and simulation results. 1. INTRODUCTION: REFERENCE FRAMES AND MAPPINGS It is useful to think of speech production as the process of formulating a trajectory within a planning reference frame to pass through a sequence of targets, each corresponding to a different phoneme in the string being produced. This trajectory can then be mapped into a set of articulator movements that carry out the planned trajectory. The articulator movements are defined within an articulatory reference frame that relates closely to the musculature or primary movement degrees of free...
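The third hypothesis above, a direction-to-direction mapping, can be sketched in a few lines. The sketch below is illustrative only: the forward map `forward`, the three articulator parameters, and the use of a Jacobian pseudoinverse are assumptions made for this example, not details taken from the abstract. It shows a planned direction in a two-dimensional planning frame being converted into a direction in a three-dimensional articulator frame, so that different starting configurations reach the same planning-frame target with different articulator settings.

```python
import math


def forward(theta):
    """Hypothetical toy map: 3 articulator parameters -> 2 planning coordinates."""
    a, b, c = theta
    return [a + 0.5 * b + 0.2 * math.sin(c), 0.3 * a - b + 0.5 * c]


def jacobian(theta, eps=1e-6):
    """Finite-difference Jacobian J[r][i] = d forward_r / d theta_i."""
    base = forward(theta)
    jac = [[0.0] * len(theta) for _ in base]
    for i in range(len(theta)):
        bumped = list(theta)
        bumped[i] += eps
        out = forward(bumped)
        for r in range(len(base)):
            jac[r][i] = (out[r] - base[r]) / eps
    return jac


def direction_to_direction(theta, planned_dir):
    """Map a planning-frame direction to an articulator direction.

    Uses the minimum-norm solution J^T (J J^T)^-1 d, which is an
    assumption of this sketch (valid here because J has 2 rows).
    """
    jac = jacobian(theta)
    n = len(theta)
    m = [[sum(jac[r][i] * jac[s][i] for i in range(n)) for s in range(2)]
         for r in range(2)]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    inv = [[m[1][1] / det, -m[0][1] / det],
           [-m[1][0] / det, m[0][0] / det]]
    y = [inv[r][0] * planned_dir[0] + inv[r][1] * planned_dir[1]
         for r in range(2)]
    return [jac[0][i] * y[0] + jac[1][i] * y[1] for i in range(n)]


def follow(theta, target, steps=200, rate=0.1):
    """Move toward a planning-frame target by repeatedly mapping directions."""
    theta = list(theta)
    for _ in range(steps):
        pos = forward(theta)
        planned = [rate * (target[r] - pos[r]) for r in range(2)]
        delta = direction_to_direction(theta, planned)
        theta = [t + d for t, d in zip(theta, delta)]
    return theta
```

Calling `follow([0.0, 0.0, 0.0], [0.8, 0.2])` and `follow([1.0, -1.0, 2.0], [0.8, 0.2])` drives both starting configurations to the same point in the planning frame while leaving the articulator parameters different, which is one simple way to picture motor-equivalent variability.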
Neural modeling and imaging of the cortical interactions underlying syllable production
Brain and Language, 2006
Cited by 16 (5 self)
Keywords: speech production; model; fMRI; Broca’s area; premotor cortex; motor cortex; speech acquisition; sensorimotor learning; neural transmission delays
This paper describes a neural model of speech acquisition and production that accounts for a wide range of acoustic, kinematic, and neuroimaging data concerning the control of speech movements. The model is a neural network whose components correspond to regions of the cerebral cortex and cerebellum, including premotor, motor, auditory, and somatosensory cortical areas. Computer simulations of the model verify its ability to account for compensation to lip and jaw perturbations during speech. Specific anatomical locations of the model’s components are estimated, and these estimates are used to simulate fMRI experiments of simple syllable production.
Articulatory Tradeoffs Reduce Acoustic Variability during American English /r/ Production
1999
Cited by 9 (7 self)
The American English phoneme /r/ has long been associated with large amounts of articulatory variability during production. This paper investigates the hypothesis that the articulatory variations used by a speaker to produce /r/ in different contexts exhibit systematic tradeoffs, or articulatory trading relations, that act to maintain a relatively stable acoustic signal despite the large variations in vocal tract shape. Acoustic and articulatory recordings were collected from seven speakers producing /r/ in five phonetic contexts. For every speaker, the different articulator configurations used to produce /r/ in the different phonetic contexts showed systematic tradeoffs, as evidenced by significant correlations between the positions of transducers mounted on the tongue. Analysis of acoustic and articulatory variabilities revealed that these tradeoffs act to reduce acoustic variability, thus allowing relatively large contextual variations in vocal tract shape for /r/ without seriously ...
Emergence of Sound Systems Through Self-Organisation
1998
Cited by 8 (1 self)
This paper tries to explain the emergence and structure of systems of speech sounds. It investigates how a coherent system of speech sounds can emerge in a population of agents and how the constraints under which the system emerges impose structure through self-organisation. If self-organisation can explain structure, then innate and biologically evolved mechanisms are not necessary. This effectively decreases the number of linguistic phenomena that have to be explained by biological evolution.
A Parametric Three-Dimensional Model Of The Vocal-Tract Based On MRI Data
In Proc. ICASSP, 1997
Cited by 7 (1 self)
In this paper, 24 three-dimensional (3D) vocal-tract (VT) shapes extracted from MRI data are used to derive a parametric model for the vocal-tract. The method is as follows: first, each 3D VT shape is sampled using a semi-cylindrical grid whose position is determined by reference points based on VT anatomy. After that, the VT projections onto each plane of the grid are represented by their two main components obtained via principal component analysis (PCA). PCA is once again used to parametrize the sequences of coefficients that represent the sections along the tract. It was verified that the first four components can explain about 90% of the total variance of the observed shapes. Following this procedure, 3D VT shapes are approximated by linear combinations of four 3D basis functions. Finally, it is shown that the four parameters of the model can be estimated from VT midsagittal profiles. 1. INTRODUCTION Vocal-tract (VT) models play important roles in the investigation of articulator...
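The idea of approximating shapes by a linear combination of a few principal components, and reporting how much variance they explain, can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the function names, the power-iteration method, and the single-stage PCA (the abstract describes a two-stage procedure on MRI-derived shapes, which is not reproduced).

```python
import math


def center(rows):
    """Subtract the per-column mean; return (centered rows, mean)."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - mean[j] for j in range(d)] for r in rows], mean


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_eigenpair(mat, iters=500):
    """Largest eigenvalue and eigenvector by power iteration."""
    d = len(mat)
    v = [float(i + 1) for i in range(d)]  # asymmetric start vector
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-15:  # nothing left to extract
            return 0.0, v
        v = [x / norm for x in w]
    mv = [sum(mat[i][j] * v[j] for j in range(d)) for i in range(d)]
    return sum(v[i] * mv[i] for i in range(d)), v


def pca(rows, k):
    """Return (mean, components, explained-variance ratios) for k components."""
    centered, mean = center(rows)
    mat = covariance(centered)
    total = sum(mat[i][i] for i in range(len(mat)))
    components, ratios = [], []
    for _ in range(k):
        lam, v = top_eigenpair(mat)
        components.append(v)
        ratios.append(lam / total)
        # Deflate: remove the component just found.
        for i in range(len(mat)):
            for j in range(len(mat)):
                mat[i][j] -= lam * v[i] * v[j]
    return mean, components, ratios


def reconstruct(row, mean, components):
    """Approximate a shape as mean + sum of coefficient * component."""
    diff = [x - m for x, m in zip(row, mean)]
    out = list(mean)
    for comp in components:
        coeff = sum(d * c for d, c in zip(diff, comp))
        out = [o + coeff * c for o, c in zip(out, comp)]
    return out
```

With real data, each row would hold one sampled vocal-tract shape, and the cumulative sum of the returned ratios would play the role of the "about 90% of the total variance" figure quoted in the abstract.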
3D Models of the Lips for Realistic Speech Animation
In Computer Animation '96, 1996
Cited by 7 (1 self)
3D models of the lips have been developed in the framework of an audiovisual articulatory speech synthesizer. Unlike most regions of the human face, the lips are essentially characterized by their border contours. The internal and external contours of the vermilion zone can be fitted by means of algebraic equations. The coefficients of these equations must be controlled so that the lip shape can be adapted to various speakers' conformations and to any speech gesture. To reach this goal, a 3D model of the lips has been worked out from geometrical analysis of the natural lips of a French speaker. Our lip model was developed by adjusting a set of continuous functions to best fit the contours of 22 reference lip shapes. Only five parameters are necessary to predict all the equations of the contours of the lip model. From this model, a volumetric model based on implicit surfaces was also developed to take lip contact into account. 1 Introduction Over the last score years, many researchers at...
From Form to Formation of Phonetic Structures: An evolutionary computing perspective
1996
Cited by 7 (2 self)
The purpose of this paper is to explain how evolutionary computing and machine learning open new perspectives in Phonetics and Speech Science. Using these techniques, it is possible to simulate the emergence and the evolution of a common language in a society of speech robots. Experimental results show how simple local rules of interaction between robots may explain some of the universal characteristics of the phonological structure of the world's languages. Ongoing work aiming to answer more complex questions, such as language evolution or the emergence of dialects, is presented. 1 INTRODUCTION Languages have very specific forms; their phonetic structures are not completely arbitrary. Typological studies have shown that the sound systems of the world's languages exhibit systematic structural characteristics. For example, the vowel /i/ is present in 87% of the world's languages, /a/ in 87% and /u/ in 82% [13, 18]. As regards the consonants, some regularities are also found, for example, the frequen...
A Method to Combine Acoustic and Morphological Constraints in the Speech Production Inverse Problem
1995
Cited by 6 (1 self)
This paper approaches the articulatory-to-acoustic speech production inverse case. A framework based on an explicit combination of vocal-tract morphological and acoustic constraints is proposed. The solution is based on a Fourier analysis of the vocal-tract log-area function: the relationship between the log-area Fourier cosine coefficients and the corresponding formants is used to formulate an acoustic constraint. The same set of coefficients is then used to express a morphological constraint. This representation of both acoustic and morphological constraints in the same parameter space allows an efficient solution for the inverse problem. The basis of the acoustic constraint formulation was first proposed by Mermelstein (1967). However, at that time, the combination with morphological constraints was not realized. The method is tested for some vowels. The results confirm the validity of the method, but they also show the need for dynamic constraints. Summary. This paper...
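As a rough illustration of the parameter space mentioned above, the following sketch computes Fourier cosine coefficients of a sampled log-area function and rebuilds the area function from them. The sampling at section midpoints, the function names, and the example area values are assumptions for this sketch; the relationship between the coefficients and the formants, which the abstract uses as the acoustic constraint, is not implemented here.

```python
import math


def log_area_cosine_coeffs(areas):
    """Cosine-series coefficients of ln A(x), sampled at section midpoints.

    With N sections, ln A_k = a_0 + sum_{m=1}^{N-1} a_m cos(pi m (k + 0.5) / N).
    """
    n = len(areas)
    logs = [math.log(a) for a in areas]
    coeffs = []
    for m in range(n):
        s = sum(f * math.cos(math.pi * m * (k + 0.5) / n)
                for k, f in enumerate(logs))
        coeffs.append(s / n if m == 0 else 2.0 * s / n)
    return coeffs


def areas_from_coeffs(coeffs, n_terms=None):
    """Rebuild section areas from the first n_terms coefficients.

    Using all coefficients reproduces the original areas; using fewer
    gives a smoothed area function.
    """
    n = len(coeffs)
    if n_terms is None:
        n_terms = n
    areas = []
    for k in range(n):
        log_a = sum(coeffs[m] * math.cos(math.pi * m * (k + 0.5) / n)
                    for m in range(n_terms))
        areas.append(math.exp(log_a))
    return areas


# Hypothetical 8-section area function in cm^2 (illustrative values only).
example_areas = [2.0, 1.5, 1.0, 0.6, 0.8, 1.4, 2.5, 3.0]
example_coeffs = log_area_cosine_coeffs(example_areas)
```

A uniform tube yields zero for every coefficient except the first, so the higher coefficients describe how the tract shape departs from a uniform tube, which is why a small set of them can serve as a compact shape description.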

