Functional Phonology -- Formalizing the interactions between articulatory and perceptual drives
, 1998
A theoretical investigation of reference frames for the planning of speech movements
- Psychological Review
, 1998
Cited by 39 (21 self)
Running title: Speech reference frames. Does the speech motor control system utilize invariant vocal tract shape targets of any kind when producing phonemes? We present a four-part theoretical treatment favoring models whose only invariant targets are auditory perceptual targets over models that posit invariant constriction targets. When combined with earlier theoretical and experimental results (Guenther, 1995a,b; Perkell et al., 1993; Savariaux et al., 1995a,b), our hypothesis is that, for vowels and semi-vowels at least, the only invariant targets of the speech production process are multidimensional regions in auditory perceptual space. These auditory perceptual target regions are hypothesized to arise during development as an emergent property of neural map formation in the auditory system. Furthermore, speech movements are planned as trajectories in auditory perceptual space. These trajectories are then mapped into articulator movements through a neural mapping that allows motor equivalent variability in constriction locations and degrees when needed, but maintains approximate constriction invariance for a given sound in most instances. These hypotheses are illustrated and substantiated using computer simulations of the DIVA model of speech acquisition and production. Finally, we pose several difficult challenges to proponents of constriction theories based on this theoretical treatment.
Combining MRI, EMA and EPG measurements in a three-dimensional tongue model
- Speech Communication
, 2003
Cited by 11 (3 self)
A three-dimensional (3D) tongue model has been developed using MR images of a reference subject producing 44 artificially sustained Swedish articulations. Based on the difference in tongue shape between the articulations and a reference, the six linear parameters jaw height, tongue body, tongue dorsum, tongue tip, tongue advance and tongue width were determined using an ordered linear factor analysis controlled by articulatory measures. The first five factors explained 88 % of the tongue data variance in the midsagittal plane and 78 % in the 3D analysis. The six-parameter model is able to reconstruct the modelled articulations with an overall mean reconstruction error of 0.13 cm, and it specifically handles lateral differences and asymmetries in tongue shape. In order to correct articulations that were hyperarticulated due to the artificial sustaining in the magnetic resonance imaging (MRI) acquisition, the parameter values in the tongue model were readjusted based on a comparison of virtual and natural linguopalatal contact patterns, collected with electropalatography (EPG). Electromagnetic articulography (EMA) data was collected to control the kinematics of the tongue model for vowel-fricative sequences and an algorithm to handle surface contacts has been implemented, preventing the tongue from protruding through the palate and teeth. © 2002 Elsevier B.V. All rights reserved.
Articulatory Methods for Speech Production and Recognition
, 1996
Cited by 9 (0 self)
[...] production-based knowledge into the recognition framework. By using an explicit time-domain articulatory model of the mechanisms of co-articulation, it is hoped to obtain a more accurate model of contextual effects in the acoustic signal, while using fewer parameters than traditional acoustically-driven approaches. Separate articulatory and acoustic models are provided, and in each case the parameters of the models are automatically optimised over a training data set. A predictive statistically-based model of co-articulation is described, and found to yield improved articulatory modelling accuracy compared with X-ray articulatory traces. Parameterised acoustic vectors are synthesised by a set of artificial neural networks, and the resulting acoustic representations are used to re-score N-best recognition hypothesis lists produced by an HMM-based recogniser. The system is evaluated on two test databases, one including speaker-specific X-ray training data and the other aco
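The abstract above mentions re-scoring N-best hypothesis lists from a first-pass recogniser. The following is only a minimal sketch of generic N-best re-scoring, not the thesis's method: the additive score combination, the `weight` parameter, the function name `rescore_nbest`, and all toy scores are assumptions made up for illustration.

```python
def rescore_nbest(hypotheses, rescoring_fn, weight=1.0):
    """Re-rank (text, first_pass_score) pairs with a second-pass score.

    The additive combination below is an assumed, commonly used scheme;
    the source abstract does not say how the scores are combined.
    """
    rescored = []
    for text, first_pass_score in hypotheses:
        combined = first_pass_score + weight * rescoring_fn(text)
        rescored.append((text, combined))
    # Higher (less negative) log-score is treated as better.
    rescored.sort(key=lambda item: item[1], reverse=True)
    return rescored


# Toy log-scores, invented for illustration only.
toy_nbest = [("a b", -10.0), ("a c", -11.0), ("d c", -12.0)]
toy_second_pass = {"a b": -5.0, "a c": -2.0, "d c": -4.0}

best = rescore_nbest(toy_nbest, toy_second_pass.get)[0]
```

In this toy list the second-pass score promotes the hypothesis that was ranked second by the first pass, which is the kind of reordering re-scoring is meant to allow.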
Emergence of Sound Systems Through Self-Organisation
, 1998
Cited by 8 (1 self)
This paper tries to explain the emergence and structure of systems of speech sounds. It investigates how a coherent system of speech sounds can emerge in a population of agents and how the constraints under which the system emerges impose structure through self-organisation. If self-organisation can explain structure, then innate and biologically evolved mechanisms are not necessary. This effectively decreases the number of linguistic phenomena that have to be explained by biological evolution.
A Parametric Three-Dimensional Model Of The Vocal-Tract Based On MRI Data
- in Proc ICASSP
, 1997
Cited by 7 (1 self)
In this paper, 24 three-dimensional (3D) vocal-tract (VT) shapes extracted from MRI data are used to derive a parametric model for the vocal-tract. The method is as follows: first, each 3D VT shape is sampled using a semi-cylindrical grid whose position is determined by reference points based on VT anatomy. After that, the VT projections onto each plane of the grid are represented by their two main components obtained via principal component analysis (PCA). PCA is once again used to parametrize the sequences of coefficients that represent the sections along the tract. It was verified that the first four components can explain about 90% of the total variance of the observed shapes. Following this procedure, 3D VT shapes are approximated by linear combinations of four 3D basis functions. Finally, it is shown that the four parameters of the model can be estimated from VT midsagittal profiles. 1. INTRODUCTION Vocal-tract (VT) models play important roles in the investigation of articulator...
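The abstract above describes keeping the first few principal components and approximating each shape as a linear combination of basis functions. A minimal sketch of those two ideas follows, assuming the component variances and basis vectors have already been computed elsewhere; the function names and every number below are hypothetical and are not taken from the paper.

```python
def cumulative_explained_variance(component_variances, k):
    """Fraction of total variance captured by the k largest components."""
    ordered = sorted(component_variances, reverse=True)
    total = sum(ordered)
    return sum(ordered[:k]) / total


def reconstruct(mean_shape, basis, coefficients):
    """Approximate a shape as mean + sum_i coefficients[i] * basis[i]."""
    shape = list(mean_shape)
    for weight, vector in zip(coefficients, basis):
        for j, value in enumerate(vector):
            shape[j] += weight * value
    return shape


# Toy numbers, invented for illustration only (not from the paper).
toy_variances = [5.0, 2.0, 1.0, 1.0, 0.5, 0.5]
toy_mean = [1.0, 1.0, 1.0]
toy_basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

ratio = cumulative_explained_variance(toy_variances, 4)
approx = reconstruct(toy_mean, toy_basis, [0.5, -0.25])
```

With these toy variances the four largest components account for 9 of 10 units of variance, and the reconstruction simply shifts the mean shape along the two toy basis vectors.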
The Sensorimotor Foundations of Phonology: A Computational Model of Early Childhood Articulatory and Phonetic Development
, 1994
Cited by 7 (0 self)
This thesis describes HABLAR, a computational model of the sensorimotor foundations of early childhood phonological development. HABLAR (an acronym for "Hierarchical Articulatory Based Language Acquisition by Reinforcement learning" and Spanish for "to speak") is intended to replicate the major milestones of emerging speech and demonstrate key characteristics of normal development, including the phonetic characteristics of babble, systematic and context-sensitive patterns of sound substitutions and deletions, overgeneralization errors, and the emergence of adult phonemic organization. It should also mimic abnormal phonological development under certain conditions of damage or degradation. HABLAR simulates a complete sensorimotor system consisting of an auditory system that detects and categorizes speech sounds using only acoustic cues drawn from its linguistic environment, an articulatory system that generates synthetic speech based on a realistic computer model of the vocal tract, an...
A Novel Self-Organising Speech Production System Using Pseudo-Articulators
- Int. Congr. Phon. Sc
, 1995
Cited by 4 (4 self)
A novel articulatory speech production system which is stochastically trained from a pre-specified initialisation state is presented. The target positions for a set of pseudo-articulators and the mapping from these to output speech spectral vectors are jointly optimised using linearised Kalman filtering and an assembly of neural networks. The techniques used to initialise and train the system are described, and preliminary results when synthesising speech are demonstrated. INTRODUCTION Articulatory speech synthesisers model human speech dynamics and hence theoretically can produce very high quality speech waveforms with explicit time-domain modelling of co-articulation [8, 12, 15]. Two major problems confronting such systems are:
• Specification of the sequence of articulator positions or vocal tract area functions corresponding to a given text.
• Provision of an accurate model of the human vocal tract.
The former is frequently achieved using an "inverse" model to map parametris...
Synthesizing static vowels and dynamic sounds using a 3D vocal tract model
Cited by 2 (1 self)
The KTH 3D Vocal Tract project aims at multimodal synthesis, producing both visual and acoustic output from an articulatory model. The intra-oral visual synthesis has been developed over the last couple of years, combining measurements from Magnetic Resonance Imaging, Electromagnetic articulography and Electropalatography. This paper presents the first acoustic evaluation of the model. Nine static vowels have been synthesized with fairly good correspondence between the reference subject's target and the model's formants. The synthesis is based on the area function calculated directly from the vocal tract model, sampling the cross-sectional area at 23 semi-polar planes. The generation of the vocal tract walls, modeled on one reference subject, the algorithms for collision handling and cross-sectional contour extraction and the results of the acoustic synthesis are presented.
Learning New Articulator Trajectories for a Speech Production Model using Artificial Neural Networks
- IEEE International Conference on Neural Networks, 4:2046--2051
, 1995
Cited by 2 (2 self)
We present a novel method for generating additional pseudo-articulator trajectories suitable for use within the framework of a stochastically trained speech production system recently developed at CUED. The system is initialised by inverting a codebook of (articulator, spectral vector) pairs, and the target positions for a set of pseudo-articulators and the mapping from these to speech spectral vectors are then jointly optimised using linearised Kalman filtering and an assembly of neural networks. A separate network is then used to hypothesise a new articulator trajectory as a function of the existing articulators and the output error of the system. The techniques used to initialise and train the system are described, and preliminary results for the generation of new pseudo-articulatory inputs are presented. 1. Introduction Articulatory speech synthesis from text requires the specification of a set of articulator trajectories corresponding to a time-aligned phoneme string, together wi...

