Results 1 - 8 of 8
Speech sound acquisition, coarticulation, and rate effects in a neural network model of speech production
- Psychological Review, 1995
Cited by 52 (21 self)
This article describes a neural network model of speech motor skill acquisition and speech production that explains a wide range of data on variability, motor equivalence, coarticulation, and rate effects. Model parameters are learned during a babbling phase. To explain how infants learn language-specific variability limits, speech sound targets take the form of convex regions, rather than points, in orosensory coordinates. Reducing target size for better accuracy during slower speech leads to differential effects for vowels and consonants, as seen in experiments previously used as evidence for separate control processes for the 2 sound types. Anticipatory coarticulation arises when targets are reduced in size on the basis of context; this generalizes the well-known look-ahead model of coarticulation. Computer simulations verify the model's properties. The primary goal of the modeling work described in this article is to provide a coherent theoretical framework that provides explanations for a wide range of data concerning the articulator movements used by humans to produce speech sounds. This is carried out by formulating a model that transforms strings of phonemes into continuous articulator movements for
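The convex-region idea in the abstract above can be illustrated geometrically. The following is a minimal sketch only, not the article's neural network model: it assumes a target is an axis-aligned box (one simple kind of convex region) in hypothetical two-dimensional orosensory coordinates, and that a movement heads for the nearest point of that box. The function names and the shrink factor are illustrative assumptions.

```python
import math

def nearest_point_in_box(position, lower, upper):
    """Clamp each coordinate into [lower, upper]; this is the closest point of the box."""
    return [min(max(p, lo), hi) for p, lo, hi in zip(position, lower, upper)]

def shrink_box(lower, upper, factor):
    """Shrink a box about its center; factor in (0, 1] (assumed stand-in for slower, more precise speech)."""
    new_lower, new_upper = [], []
    for lo, hi in zip(lower, upper):
        center = (lo + hi) / 2.0
        half = (hi - lo) / 2.0 * factor
        new_lower.append(center - half)
        new_upper.append(center + half)
    return new_lower, new_upper

def distance_to_target(position, lower, upper):
    """Movement needed to reach the region; zero if the position is already inside it."""
    goal = nearest_point_in_box(position, lower, upper)
    return math.sqrt(sum((g - p) ** 2 for g, p in zip(goal, position)))

# Hypothetical target region and starting configuration.
lower, upper = [0.2, 0.0], [0.8, 1.0]
start = [0.0, 0.5]
full = distance_to_target(start, lower, upper)
small_lower, small_upper = shrink_box(lower, upper, 0.5)
reduced = distance_to_target(start, small_lower, small_upper)
```

In this toy setup a position already inside the region requires no movement, and shrinking the region lengthens the movement from the same starting point, which is the qualitative effect the abstract attributes to reduced target size.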
Neural dynamics of variable-rate speech categorization
- J. Exp. Psych. Hum. Perception Performance, 1997
Cited by 46 (22 self)
What is the neural representation of a speech code as it evolves in time? A neural model simulates data concerning segregation and integration of phonetic percepts. Hearing two phonetically related stops in a VC-CV pair (V = vowel; C = consonant) requires 150 ms more closure time than hearing two phonetically different stops in a VC1-C2V pair. Closure time also varies with long-term stimulus rate. The model simulates rate-dependent category boundaries that emerge from feedback interactions between a working memory for short-term storage of phonetic items and a list categorization network for grouping sequences of items. The conscious speech code is a resonant wave. It emerges after bottom-up signals from the working memory select list chunks which read out top-down expectations that amplify and focus attention on consistent working memory items. In VC1-C2V pairs, resonance is reset by mismatch of C2 with the C1 expectation. In VC-CV pairs, resonance prolongs a repeated C. What is the nature of the process that converts brain events into behavioral percepts? An answer to this question is needed in order to understand how the brain controls behavior and how the brain is, in turn, shaped by environmental feedback that is experienced on the behavioral level. The nature of this connection also needs to be understood in order to develop neurally plausible connectionist models. Without it, a correct linking hypothesis cannot be developed between psychological data and the brain mechanisms from which they are generated.
Neural Dynamics Of Perceptual Order And Context Effects For Variable-Rate Speech Syllables
1998
Cited by 12 (6 self)
How does the brain extract invariant properties of variable-rate speech? A neural model, called PHONET, is developed to explain aspects of this process and, along the way, data about perceptual context effects. For example, in consonant vowel (CV) syllables such as /ba/ and /wa/, an increase in the duration of the vowel can cause a switch in the percept of the preceding consonant from /w/ to /b/ (Miller and Liberman, 1979). The frequency extent of the initial formant transitions of fixed duration also influences the percept (Schwab, Sawusch, and Nusbaum, 1981). PHONET quantitatively simulates over 98% of the variance in these data using a single set of parameters. The model also qualitatively explains many data about other perceptual context effects. In the model, C and V inputs are filtered by parallel auditory streams that respond preferentially to transient and sustained properties of the acoustic signal before being stored in parallel working memories. A lateral inhibitory network ...
A Spectral Network Model of Pitch Perception
1995
Cited by 5 (2 self)
A model of pitch perception, called the Spatial Pitch Network or SPINET model, is developed and analyzed. The model neurally instantiates ideas from the spectral pitch modeling literature and joins them to basic neural network signal processing designs to simulate a broader range of perceptual pitch data than previous spectral models. The components of the model are interpreted as peripheral mechanical and neural processing stages, which are capable of being incorporated into a larger network architecture for separating multiple sound sources in the environment. The core of the new model transforms a spectral representation of an acoustic source into a spatial distribution of pitch strengths. The SPINET model uses a weighted "harmonic sieve" whereby the strength of activation of a given pitch depends upon a weighted sum of narrow regions around the harmonics of the nominal pitch value, and higher harmonics contribute less to a pitch than lower ones. Suitably chosen harmonic weighting f...
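The weighted "harmonic sieve" described in the abstract above can be sketched in a few lines. This is a hedged illustration, not the SPINET implementation: the geometric weight `decay ** (h - 1)`, the relative window width `rel_width`, the number of harmonics, and the function names are all assumptions chosen for the example, and the input is a hand-written list of spectral peaks rather than the output of any peripheral processing stage.

```python
def harmonic_sieve(spectrum, candidates, n_harmonics=8, rel_width=0.03, decay=0.8):
    """Score each candidate pitch by a weighted sum of spectral energy near its harmonics.

    spectrum:   list of (frequency_hz, amplitude) pairs
    candidates: candidate pitch values in Hz
    Returns a dict mapping candidate pitch -> strength.
    """
    strengths = {}
    for f0 in candidates:
        total = 0.0
        for h in range(1, n_harmonics + 1):
            center = h * f0
            weight = decay ** (h - 1)  # assumed weighting: higher harmonics count less
            for freq, amp in spectrum:
                # Narrow window around the harmonic, proportional to its frequency.
                if abs(freq - center) <= rel_width * center:
                    total += weight * amp
        strengths[f0] = total
    return strengths

def best_pitch(spectrum, candidates, **kwargs):
    """Return the candidate with the largest sieve strength."""
    strengths = harmonic_sieve(spectrum, candidates, **kwargs)
    return max(strengths, key=strengths.get)

# Toy spectrum with components at 400, 600 and 800 Hz and no energy at 200 Hz.
spectrum = [(400.0, 1.0), (600.0, 1.0), (800.0, 1.0)]
candidates = [100.0, 150.0, 200.0, 250.0, 300.0, 400.0]
strengths = harmonic_sieve(spectrum, candidates)
winner = best_pitch(spectrum, candidates)
```

With these assumed settings the 200 Hz candidate scores highest even though the toy spectrum contains no component at 200 Hz, because 400, 600 and 800 Hz all fall on its low-numbered harmonics.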
Visual Schemas in Neural Networks for Object Recognition and Scene Analysis
1997
Cited by 2 (0 self)
VISOR is a large connectionist system that shows how visual schemas can be learned, represented, and used through mechanisms natural to neural networks. Processing in VISOR is based on cooperation, competition, and parallel bottom-up and top-down activation of schema representations. Simulations show that VISOR is robust against noise and variations in the inputs and parameters. It can indicate the confidence of its analysis, pay attention to important minor differences, and use context to recognize ambiguous objects. Experiments also suggest that the representation and learning are stable, and its behavior is consistent with human processes such as priming, perceptual reversal, and circular reaction in learning. The schema mechanisms of VISOR can serve as a starting point for building robust high-level vision systems, and perhaps for schema-based motor control and natural language processing systems as well.
Unsupervised Learning Of Simple Speech Production Based On Soft Competitive Learning
- Learning, in Eeckman F.H. & Bower J.M. (eds.), Computation and Neural Systems, Kluwer Academic Publishers, Boston/Dordrecht/London, 1993
Cited by 2 (1 self)
In this paper we present a simple connectionist model for the adaptive sensory-motor loop involved in perceiving and producing speech. At the heart of the production part lies an articulatory model which approximates the human vocal tract through polygons and splines. Output of this model is the envelope of the acoustic filter function, realized by this vocal tract, which is comparable to the spectrum of real speech segments. The goal of this research was to find a learning method to train a multi-layer neural network to produce the correct set of twelve articulatory parameters when given the spectrum of recorded real speech (stationary vowels). The method introduced in this paper explicitly makes use of a neural network categorization component. Through so-called
Running Title: Cortical Working Memory and Sequence Learning
continuous-distracter free recall, sensory-motor imitation, chunking, sequence learning, prefrontal cortex, parietal cortex, position coding, rank order cells, cerebral cortex, laminar computing * Authors are listed in alphabetical order. 1

