Functional Phonology -- Formalizing the interactions between articulatory and perceptual drives
, 1998
A theoretical investigation of reference frames for the planning of speech movements
- Psychological Review
, 1998
Cited by 39 (21 self)
Running title: Speech reference frames
Does the speech motor control system utilize invariant vocal tract shape targets of any kind when producing phonemes? We present a four-part theoretical treatment favoring models whose only invariant targets are auditory perceptual targets over models that posit invariant constriction targets. When combined with earlier theoretical and experimental results (Guenther, 1995a,b; Perkell et al., 1993; Savariaux et al., 1995a,b), our hypothesis is that, for vowels and semi-vowels at least, the only invariant targets of the speech production process are multidimensional regions in auditory perceptual space. These auditory perceptual target regions are hypothesized to arise during development as an emergent property of neural map formation in the auditory system. Furthermore, speech movements are planned as trajectories in auditory perceptual space. These trajectories are then mapped into articulator movements through a neural mapping that allows motor equivalent variability in constriction locations and degrees when needed, but maintains approximate constriction invariance for a given sound in most instances. These hypotheses are illustrated and substantiated using computer simulations of the DIVA model of speech acquisition and production. Finally, we pose several difficult challenges to proponents of constriction theories based on this theoretical treatment.
Towards Improved Speech Recognition Using A Speech Production Model
- Europ. Conf. Sp. Comm. Tech
, 1995
Cited by 11 (4 self)
Considerable improvement in the performance of continuous speech recognition systems, particularly those based on Hidden Markov Models (HMMs), has been shown in recent years. Nevertheless a number of unsolved problems remain which limit this progress, including the successful modelling of co-articulation and the identification of out-of-vocabulary utterances. One possible solution is to re-synthesise speech from the N-best time-aligned phonemic transcriptions produced by an HMM, and re-score this list based on a spectral comparison between the original and re-synthesised speech frames. In this paper a novel speech production model (SPM) suitable for use in such a system is introduced, and preliminary re-scoring results are presented.
1. INTRODUCTION
The application of speech production models to the task of automatic speech recognition is a relatively new area of research which has attracted increasing interest over the past few years [10]. The basic operation of such a combined syste...
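The re-scoring idea in this abstract can be illustrated with a minimal sketch. It is not the paper's speech production model: `synthesise` is a hypothetical stand-in for any re-synthesiser, the mean squared spectral distance and the `weight` that combines it with the HMM score are assumptions chosen for illustration, and each re-synthesised utterance is assumed to have the same number of frames as the original because the transcriptions are time-aligned.

```python
from typing import Callable, List, Sequence, Tuple

Frame = Sequence[float]  # one spectral vector per speech frame


def spectral_distance(original: Sequence[Frame], resynth: Sequence[Frame]) -> float:
    """Mean (over frames) squared Euclidean distance between two frame sequences."""
    if len(original) == 0 or len(original) != len(resynth):
        raise ValueError("frame sequences must be non-empty and equally long")
    total = 0.0
    for frame_a, frame_b in zip(original, resynth):
        total += sum((x - y) ** 2 for x, y in zip(frame_a, frame_b))
    return total / len(original)


def rescore_nbest(
    original: Sequence[Frame],
    hypotheses: List[Tuple[str, float]],
    synthesise: Callable[[str], Sequence[Frame]],  # hypothetical re-synthesiser
    weight: float = 1.0,  # assumed trade-off between HMM score and distance
) -> List[Tuple[str, float]]:
    """Re-rank (transcription, HMM log score) pairs, best hypothesis first."""
    rescored = []
    for transcription, hmm_log_score in hypotheses:
        distance = spectral_distance(original, synthesise(transcription))
        # Penalise hypotheses whose re-synthesised spectra differ from the input.
        rescored.append((transcription, hmm_log_score - weight * distance))
    return sorted(rescored, key=lambda item: item[1], reverse=True)
```

With a toy lookup table standing in for the synthesiser, a hypothesis that the HMM ranked second can move to the top when its re-synthesised frames match the original more closely.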
Combining MRI, EMA and EPG measurements in a three-dimensional tongue model
- Speech Communication
, 2003
Cited by 11 (3 self)
A three-dimensional (3D) tongue model has been developed using MR images of a reference subject producing 44 artificially sustained Swedish articulations. Based on the difference in tongue shape between the articulations and a reference, the six linear parameters jaw height, tongue body, tongue dorsum, tongue tip, tongue advance and tongue width were determined using an ordered linear factor analysis controlled by articulatory measures. The first five factors explained 88% of the tongue data variance in the midsagittal plane and 78% in the 3D analysis. The six-parameter model is able to reconstruct the modelled articulations with an overall mean reconstruction error of 0.13 cm, and it specifically handles lateral differences and asymmetries in tongue shape. In order to correct articulations that were hyperarticulated due to the artificial sustaining in the magnetic resonance imaging (MRI) acquisition, the parameter values in the tongue model were readjusted based on a comparison of virtual and natural linguopalatal contact patterns, collected with electropalatography (EPG). Electromagnetic articulography (EMA) data was collected to control the kinematics of the tongue model for vowel-fricative sequences and an algorithm to handle surface contacts has been implemented, preventing the tongue from protruding through the palate and teeth. © 2002 Elsevier B.V. All rights reserved.
Articulatory Methods for Speech Production and Recognition
, 1996
Cited by 9 (0 self)
roduction-based knowledge into the recognition framework. By using an explicit time-domain articulatory model of the mechanisms of co-articulation, it is hoped to obtain a more accurate model of contextual effects in the acoustic signal, while using fewer parameters than traditional acoustically-driven approaches. Separate articulatory and acoustic models are provided, and in each case the parameters of the models are automatically optimised over a training data set. A predictive statistically-based model of co-articulation is described, and found to yield improved articulatory modelling accuracy compared with X-ray articulatory traces. Parameterised acoustic vectors are synthesised by a set of artificial neural networks, and the resulting acoustic representations are used to re-score N-best recognition hypothesis lists produced by an HMM-based recogniser. The system is evaluated on two test databases, one including speaker-specific X-ray training data and the other aco
The Sensorimotor Foundations of Phonology: A Computational Model of Early Childhood Articulatory and Phonetic Development
, 1994
Cited by 7 (0 self)
This thesis describes HABLAR, a computational model of the sensorimotor foundations of early childhood phonological development. HABLAR (an acronym for "Hierarchical Articulatory Based Language Acquisition by Reinforcement learning" and Spanish for "to speak") is intended to replicate the major milestones of emerging speech and demonstrate key characteristics of normal development, including the phonetic characteristics of babble, systematic and context-sensitive patterns of sound substitutions and deletions, overgeneralization errors, and the emergence of adult phonemic organization. It should also mimic abnormal phonological development under certain conditions of damage or degradation. HABLAR simulates a complete sensorimotor system consisting of an auditory system that detects and categorizes speech sounds using only acoustic cues drawn from its linguistic environment, an articulatory system that generates synthetic speech based on a realistic computer model of the vocal tract, an...
Artisynth: A biomechanical simulation platform for the vocal tract and upper airway
- Journal of Motor Behavior
, 1986
Cited by 5 (4 self)
platform directed toward modeling the vocal tract and upper airway. It provides an open-source environment in which researchers can create and interconnect various kinds of dynamic and parametric models to form a complete integrated biomechanical system which is capable of articulatory speech synthesis. An interactive graphical Timeline runs the simulation and allows the temporal arrangement of input/output channels to control or observe properties of the model’s components. Library support is available for particle-spring and rigid body systems, finite element models, and spline-based curves and surfaces. To date, these have been used to create a dynamic muscle-based model of the jaw, a deformable tongue model, a deformable airway, and a linear acoustics model, which have been connected together to form a complete vocal tract that produces speech and is drivable both by data and by dynamics.
A Novel Self-Organising Speech Production System Using Pseudo-Articulators
- Int. Congr. Phon. Sc
, 1995
Cited by 4 (4 self)
A novel articulatory speech production system which is stochastically trained from a pre-specified initialisation state is presented. The target positions for a set of pseudo-articulators and the mapping from these to output speech spectral vectors are jointly optimised using linearised Kalman filtering and an assembly of neural networks. The techniques used to initialise and train the system are described, and preliminary results when synthesising speech are demonstrated.
INTRODUCTION
Articulatory speech synthesisers model human speech dynamics and hence theoretically can produce very high quality speech waveforms with explicit time-domain modelling of co-articulation [8, 12, 15]. Two major problems confronting such systems are:
• Specification of the sequence of articulator positions or vocal tract area functions corresponding to a given text.
• Provision of an accurate model of the human vocal tract.
The former is frequently achieved using an "inverse" model to map parametris...
Speech Processing with Linear and Neural Network Models
, 1996
Cited by 4 (0 self)
ion, for imposing continuity between models of adjacent speech segments, and learning rate adaptation, for improving back-propagation training, are discussed. For synthesising real speech utterances, an audio tape demonstrates that ARX models produce the highest quality synthetic speech and that the quality is maintained when pitch modifications are applied. The second part of the dissertation studies the operation of recurrent neural networks in classifying patterns of correlated feature vectors. Such patterns are typical of speech classification tasks. The operation of a hidden node with a recurrent connection is explained in terms of a decision boundary which changes position in feature space. The feedback is shown to delay switching from one class to another and to smooth output decisions for sequences of feature vectors from the same class. For networks trained with constant class targets, a sequence of feature vectors from the same class tends to drive the operation of hidden nod
Synthesizing static vowels and dynamic sounds using a 3D vocal tract model
Cited by 2 (1 self)
The KTH 3D Vocal Tract project aims at multimodal synthesis, producing both visual and acoustic output from an articulatory model. The intra-oral visual synthesis has been developed over the last couple of years, combining measurements from Magnetic Resonance Imaging, Electromagnetic Articulography and Electropalatography. This paper presents the first acoustic evaluation of the model. Nine static vowels have been synthesized with fairly good correspondence between the reference subject's target and the model's formants. The synthesis is based on the area function calculated directly from the vocal tract model, sampling the cross-sectional area at 23 semi-polar planes. The generation of the vocal tract walls, modeled on one reference subject, the algorithms for collision handling and cross-sectional contour extraction and the results of the acoustic synthesis are presented.

