The perceptual magnet effect as an emergent property of neural map formation
- Journal of the Acoustical Society of America
, 1996
Cited by 62 (7 self)
The perceptual magnet effect is one of the earliest known language-specific phenomena arising in infant speech development. The effect is characterized by a warping of perceptual space near phonemic category centers. Previous explanations have been formulated within the theoretical framework of cognitive psychology. The model proposed in this paper builds on research from both psychology and neuroscience in working toward a more complete account of the effect. The model embodies two principal hypotheses supported by considerable experimental and theoretical research from the neuroscience literature: (1) sensory experience guides language-specific development of an auditory neural map, and (2) a population vector can predict psychological phenomena based on map cell activities. These hypotheses are realized in a self-organizing neural network model. The magnet effect arises in the model from language-specific nonuniformities in the distribution of map cell firing preferences. Numerical simulations verify that the model captures the known general characteristics of the magnet effect and provides accurate fits to specific
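The population-vector idea in this abstract can be illustrated with a minimal sketch. It is not the paper's self-organizing network: the one-dimensional stimulus axis, the Gaussian tuning curves, the tuning width, and the hand-placed cell preferences (`background`, `cluster`) are all assumptions chosen for illustration. The sketch only shows how a nonuniform distribution of firing preferences can shrink perceived distances near a category center.

```python
import math

def population_vector(stimulus, preferred, width=0.5):
    """Decode a stimulus as the activity-weighted mean of cell preferences."""
    # Assumed Gaussian tuning curve for every map cell.
    activity = [math.exp(-0.5 * ((stimulus - p) / width) ** 2) for p in preferred]
    return sum(a * p for a, p in zip(activity, preferred)) / sum(activity)

# Hypothetical map: evenly spaced background cells plus a dense cluster
# of cells whose preferences sit near a category center at 0.0.
background = [-6.0 + 0.1 * i for i in range(121)]
cluster = [-0.5 + 0.025 * i for i in range(41)]
cells = background + cluster

def perceived_distance(a, b):
    """Distance between two stimuli after population-vector decoding."""
    return abs(population_vector(a, cells) - population_vector(b, cells))

near_center = perceived_distance(0.0, 0.3)  # pair close to the category center
far_away = perceived_distance(3.0, 3.3)     # equally spaced pair far from it
```

In this toy map, `near_center` comes out smaller than `far_away` even though both pairs are physically 0.3 apart, which is the qualitative warping the abstract describes.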
A theoretical investigation of reference frames for the planning of speech movements
- Psychological Review
, 1998
Cited by 39 (21 self)
Running title: Speech reference frames.
Does the speech motor control system utilize invariant vocal tract shape targets of any kind when producing phonemes? We present a four-part theoretical treatment favoring models whose only invariant targets are auditory perceptual targets over models that posit invariant constriction targets. When combined with earlier theoretical and experimental results (Guenther, 1995a,b; Perkell et al., 1993; Savariaux et al., 1995a,b), our hypothesis is that, for vowels and semi-vowels at least, the only invariant targets of the speech production process are multidimensional regions in auditory perceptual space. These auditory perceptual target regions are hypothesized to arise during development as an emergent property of neural map formation in the auditory system. Furthermore, speech movements are planned as trajectories in auditory perceptual space. These trajectories are then mapped into articulator movements through a neural mapping that allows motor equivalent variability in constriction locations and degrees when needed, but maintains approximate constriction invariance for a given sound in most instances. These hypotheses are illustrated and substantiated using computer simulations of the DIVA model of speech acquisition and production. Finally, we pose several difficult challenges to proponents of constriction theories based on this theoretical treatment.
Production Models As A Structural Basis For Automatic Speech Recognition
, 1996
Cited by 21 (1 self)
We postulate in this paper that highly structured speech production models will have much to contribute to the ultimate success of speech recognition in view of the weaknesses of the theoretical foundation underpinning current technology. These weaknesses are analyzed in terms of phonological modeling and of phonetic-interface modeling. We conclude by suggesting that many of the advantages to be gained from interaction between speech production and speech recognition communities will develop from integrating models from the production community with the probabilistic analysis-by-synthesis strategy currently used by the technology community.
Résumé (translated from the French): In this article, we propose that speech production models will contribute greatly to the eventual success of automatic recognition models, which are currently limited by the weaknesses of the theoretical basis of current technology. We analyze these weaknesses at the level of phonological models and ...
Learning to speak. Sensori-motor control of speech movements
, 1998
Cited by 14 (3 self)
This paper shows how an articulatory model, able to produce acoustic signals from articulatory motion, can learn to speak, i.e. coordinate its movements in such a way that it utters meaningful sequences of sounds belonging to a given language. This complex learning procedure is accomplished in four major steps: (a) a babbling phase, where the device builds up a model of the forward transforms, i.e. the articulatory-to-audio-visual mapping; (b) an imitation stage, where it tries to reproduce a limited set of sound sequences by audio-visual-to-articulatory inversion; (c) a "shaping" stage, where phonemes are associated with the most efficient available sensori-motor representation; and finally, (d) a "rhythmic" phase, where it learns the appropriate coordination of the activations of these sensori-motor targets.
Articulatory Tradeoffs Reduce Acoustic Variability during American English /r/ Production
, 1999
Cited by 9 (7 self)
The American English phoneme /r/ has long been associated with large amounts of articulatory variability during production. This paper investigates the hypothesis that the articulatory variations used by a speaker to produce /r/ in different contexts exhibit systematic tradeoffs, or articulatory trading relations, that act to maintain a relatively stable acoustic signal despite the large variations in vocal tract shape. Acoustic and articulatory recordings were collected from seven speakers producing /r/ in five phonetic contexts. For every speaker, the different articulator configurations used to produce /r/ in the different phonetic contexts showed systematic tradeoffs, as evidenced by significant correlations between the positions of transducers mounted on the tongue. Analysis of acoustic and articulatory variabilities revealed that these tradeoffs act to reduce acoustic variability, thus allowing relatively large contextual variations in vocal tract shape for /r/ without seriously ...
Neural Modeling of Speech Production
- Proceedings of the 6th International Seminar on Speech Production
, 2003
Cited by 9 (5 self)
This paper describes a neural model of speech production and perception-production interactions. This model has been developed to account for a wide variety of experimental data, ranging from kinematic analyses of articulator movements to functional imaging studies of the human brain. We have also tested predictions based on the model with these and other experimental techniques. Hypothesized neural correlates of the model's components have been identified to facilitate testing of model predictions with techniques such as fMRI. The model also serves as a framework for interpreting and organizing the accumulating mass of data from functional imaging studies of the human brain.
Articulatory Methods for Speech Production and Recognition
, 1996
Cited by 9 (0 self)
[...] production-based knowledge into the recognition framework. By using an explicit time-domain articulatory model of the mechanisms of co-articulation, it is hoped to obtain a more accurate model of contextual effects in the acoustic signal, while using fewer parameters than traditional acoustically-driven approaches. Separate articulatory and acoustic models are provided, and in each case the parameters of the models are automatically optimised over a training data set. A predictive statistically-based model of co-articulation is described, and found to yield improved articulatory modelling accuracy compared with X-ray articulatory traces. Parameterised acoustic vectors are synthesised by a set of artificial neural networks, and the resulting acoustic representations are used to re-score N-best recognition hypothesis lists produced by an HMM-based recogniser. The system is evaluated on two test databases, one including speaker-specific X-ray training data and the other aco
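The N-best re-scoring step mentioned in this abstract can be sketched generically. This is an illustration under stated assumptions, not the paper's system: the `synthesise` callback stands in for the neural-network acoustic synthesis, the negative squared error is an assumed stand-in for a synthesis score, and the interpolation `weight`, the toy templates, and the scores are all hypothetical.

```python
def rescore_nbest(nbest, observed, synthesise, weight=0.5):
    """Re-rank (hypothesis, hmm_log_score) pairs using a synthesis-based score."""
    rescored = []
    for hypothesis, hmm_score in nbest:
        predicted = synthesise(hypothesis)
        # Assumed score: negative squared error between the synthesised
        # and the observed acoustic vectors (higher is better).
        synth_score = -sum((p - o) ** 2 for p, o in zip(predicted, observed))
        combined = (1 - weight) * hmm_score + weight * synth_score
        rescored.append((hypothesis, combined))
    # Best combined score first.
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)

# Toy data: hypothetical two-dimensional acoustic templates per word.
toy_templates = {"bat": [1.0, 0.0], "pat": [0.0, 1.0]}
nbest = [("bat", -1.0), ("pat", -1.2)]  # first-pass list, best hypothesis first
observed = [0.1, 0.9]

ranked = rescore_nbest(nbest, observed, toy_templates.__getitem__)
```

Here the first-pass list prefers "bat", but the synthesised vector for "pat" matches the observation far better, so the combined score moves "pat" to the top of `ranked`.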
A neural model of speech production and its application to studies of the role of auditory feedback in speech
- In
, 2004
Cited by 7 (0 self)
Abstract. This paper describes a neural model of speech production and perception-production interactions. This model has been developed to account for a wide variety of experimental data, ranging from kinematic analyses of articulator movements to functional imaging studies of the human brain. Hypothesized neural correlates of the model’s components have been identified to facilitate testing of model predictions with techniques such as fMRI. The model also serves as a framework for interpreting and organizing the accumulating mass of data from functional imaging studies of the human brain. According to the model, the goals of speech movements are in auditory-temporal space and the movements are planned with the use of mappings between articulations and their acoustic and auditory consequences. It is hypothesized that the mappings are acquired and maintained with the use of auditory feedback. Data are presented from studies of changes in speech that occur in response to a change in hearing status. These data provide information about the nature of the mappings and how they are used in planning speech movements.
1. The DIVA Model of Speech Production
The overall objective of our research is to model the brain activity and the motor, biomechanical and sensory processes involved in speech production. Our approach is to use a combination of computational models and to develop and test them with brain imaging, psychophysical, physiological, anatomical and acoustic data. In particular, we have developed a neural network model of speech motor skill acquisition and speech production, called the DIVA model, that explains a wide range of data on contextual variability, motor equivalence,
Towards a Model of Target Oriented Production of Prosody
- In Proceedings of the European Conference on Speech Communication and Technology
, 2001
Cited by 4 (3 self)
A new paradigm for prosody research is presented, inspired by the speech production model recently proposed by Guenther, Perkell, and colleagues. This research paradigm aims at generalizing the production model by extending it from a predominantly segmental perspective to a new theory of the production of prosody. Speech movements in the prosodic domain are interpreted as intonational gestures that are planned to reach and traverse perceptual target regions. Evidence from F0 alignment studies suggests that the perceptual targets can be approximately represented by regions in a multidimensional acoustic-temporal space. These studies also indicate that segmental, spectral, temporal, and prosodic structure are co-produced in such a way as to mutually support and enhance, and not impair, the perceptual targets. Furthermore, examples of multilevel mappings between invariant and variable targets in the domain of prosody are provided, and a dichotomy of phonemic and postural prosodic settings is discussed.
Phonemic and Postural Effects on the Production of Prosody
- in Proceedings of the Speech Prosody 2002 Conference, B. Bel and I. Marlien, Eds., Aix-en-Provence
, 2002
Cited by 1 (0 self)
Phonemic settings and the internal models that they represent are learned in the process of language and speech acquisition. Postural settings, in contrast, rely on continuous auditory monitoring and tend to break down quickly if this monitoring process is inhibited during speech production. Evidence presented in the literature seems to indicate that stable internal models are mostly associated with segmental phonemic targets, whereas prosodic features often display postural characteristics. In this paper it is argued that the dichotomy of phonemic and postural settings applies not only to segmental properties of speech but to prosodic features as well. Phonemic and postural effects on the production of prosody are reviewed and it is suggested that the boundary between phonemic and postural effects on a given prosodic feature is flexible. We further hypothesize that the speaker may rely on a set of acquired internal models and select from this set a particular model depending on communicative and situative constraints.

