Results 1 - 10 of 53
The Dynamical Hypothesis in Cognitive Science
Behavioral and Brain Sciences, 1997
Cited by 79 (0 self)
The dynamical hypothesis is the claim that cognitive agents are dynamical systems. It stands opposed to the dominant computational hypothesis, the claim that cognitive agents are digital computers. This target article articulates the dynamical hypothesis and defends it as an open empirical alternative to the computational hypothesis. Carrying out these objectives requires extensive clarification of the conceptual terrain, with particular focus on the relation of dynamical systems to computers.
Key words: cognition, systems, dynamical systems, computers, computational systems, computability, modeling, time.
Long Abstract: The heart of the dominant computational approach in cognitive science is the hypothesis that cognitive agents are digital computers; the heart of the alternative dynamical approach is the hypothesis that cognitive agents are dynamical systems. This target article attempts to articulate the dynamical hypothesis and to defend it as an empirical alternative to the compu...
Speech sound acquisition, coarticulation, and rate effects in a neural network model of speech production
Psychological Review, 1995
Cited by 52 (21 self)
This article describes a neural network model of speech motor skill acquisition and speech production that explains a wide range of data on variability, motor equivalence, coarticulation, and rate effects. Model parameters are learned during a babbling phase. To explain how infants learn language-specific variability limits, speech sound targets take the form of convex regions, rather than points, in orosensory coordinates. Reducing target size for better accuracy during slower speech leads to differential effects for vowels and consonants, as seen in experiments previously used as evidence for separate control processes for the two sound types. Anticipatory coarticulation arises when targets are reduced in size on the basis of context; this generalizes the well-known look-ahead model of coarticulation. Computer simulations verify the model's properties. The primary goal of the modeling work described in this article is to provide a coherent theoretical framework that explains a wide range of data concerning the articulator movements used by humans to produce speech sounds. This is carried out by formulating a model that transforms strings of phonemes into continuous articulator movements for ...
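The region-target idea in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the article's model: each target is assumed to be an axis-aligned box (one interval per orosensory dimension, which is one simple kind of convex region), and the helper names `nearest_point`, `shrink`, and `reduce_toward` are hypothetical. The movement goal is the closest point of the region, so no movement is needed when the articulators are already inside it; shrinking the box stands in for the slower-speech accuracy manipulation, and `reduce_toward` is a hypothetical stand-in for context-based target reduction.

```python
from typing import List, Tuple

# Hypothetical representation: one (low, high) interval per orosensory dimension.
Box = List[Tuple[float, float]]


def nearest_point(box: Box, pos: List[float]) -> List[float]:
    """Closest point of the box to `pos`; equals `pos` if it is already inside."""
    return [min(max(x, lo), hi) for x, (lo, hi) in zip(pos, box)]


def shrink(box: Box, factor: float) -> Box:
    """Scale every interval about its centre (0 < factor <= 1).

    Stand-in for reducing target size to gain accuracy in slower speech.
    """
    out = []
    for lo, hi in box:
        centre = (lo + hi) / 2.0
        half = (hi - lo) / 2.0 * factor
        out.append((centre - half, centre + half))
    return out


def reduce_toward(box: Box, next_box: Box) -> Box:
    """Keep only the part of each interval closest to the upcoming target.

    Hypothetical simplification of context-based target reduction
    (anticipatory coarticulation); not the article's exact rule.
    """
    out = []
    for (lo, hi), (nlo, nhi) in zip(box, next_box):
        ilo, ihi = max(lo, nlo), min(hi, nhi)
        if ilo <= ihi:      # intervals overlap: keep the overlap
            out.append((ilo, ihi))
        elif nhi < lo:      # next target lies below: keep the lower edge
            out.append((lo, lo))
        else:               # next target lies above: keep the upper edge
            out.append((hi, hi))
    return out
```

For example, `reduce_toward([(0.0, 10.0)], [(8.0, 20.0)])` keeps `[(8.0, 10.0)]`, so the current sound would be aimed at the edge of its region nearest the next sound.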
A theoretical investigation of reference frames for the planning of speech movements
Psychological Review, 1998
Cited by 39 (21 self)
Running title: Speech reference frames
Does the speech motor control system utilize invariant vocal tract shape targets of any kind when producing phonemes? We present a four-part theoretical treatment favoring models whose only invariant targets are auditory perceptual targets over models that posit invariant constriction targets. When combined with earlier theoretical and experimental results (Guenther, 1995a,b; Perkell et al., 1993; Savariaux et al., 1995a,b), our hypothesis is that, for vowels and semi-vowels at least, the only invariant targets of the speech production process are multidimensional regions in auditory perceptual space. These auditory perceptual target regions are hypothesized to arise during development as an emergent property of neural map formation in the auditory system. Furthermore, speech movements are planned as trajectories in auditory perceptual space. These trajectories are then mapped into articulator movements through a neural mapping that allows motor equivalent variability in constriction locations and degrees when needed, but maintains approximate constriction invariance for a given sound in most instances. These hypotheses are illustrated and substantiated using computer simulations of the DIVA model of speech acquisition and production. Finally, we pose several difficult challenges to proponents of constriction theories based on this theoretical treatment.
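Planning a movement as a trajectory in auditory perceptual space can be illustrated by moving a point in formant space toward the nearest point of a target region. This sketch assumes a two-dimensional (F1, F2) auditory space in Hz, a box-shaped target region, and a fixed maximum step per time step; the function name and the example region values are hypothetical and illustrative, not taken from the article, and the mapping from this trajectory to articulator movements is not modeled here.

```python
import math
from typing import List, Sequence, Tuple


def plan_auditory_trajectory(
    start: Sequence[float],
    region: Sequence[Tuple[float, float]],
    step: float = 50.0,
    max_steps: int = 1000,
) -> List[Tuple[float, ...]]:
    """Straight-line path (in Hz) from `start` into a box-shaped auditory target region.

    The path stops as soon as the point is inside the region, so a sound that
    already lies within its target region needs no movement at all.
    """
    point = [float(x) for x in start]
    path = [tuple(point)]
    for _ in range(max_steps):
        # Nearest point of the region to the current auditory state.
        goal = [min(max(x, lo), hi) for x, (lo, hi) in zip(point, region)]
        diff = [g - x for g, x in zip(goal, point)]
        dist = math.hypot(*diff)
        if dist == 0.0:
            break  # already inside the target region
        if dist <= step:
            point = goal  # final step lands exactly on the region boundary
        else:
            point = [x + d * step / dist for x, d in zip(point, diff)]
        path.append(tuple(point))
    return path
```

With a hypothetical region `[(250.0, 350.0), (2000.0, 2400.0)]` and a start at `(700.0, 1200.0)`, the path ends on the region boundary at `(350.0, 2000.0)` rather than at the region's centre, which is the practical difference between region targets and point targets.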
Production Models As A Structural Basis For Automatic Speech Recognition
1996
Cited by 21 (1 self)
We postulate in this paper that highly structured speech production models will have much to contribute to the ultimate success of speech recognition in view of the weaknesses of the theoretical foundation underpinning current technology. These weaknesses are analyzed in terms of phonological modeling and of phonetic-interface modeling. We conclude by suggesting that many of the advantages to be gained from interaction between speech production and speech recognition communities will develop from integrating models from the production community with the probabilistic analysis-by-synthesis strategy currently used by the technology community.
Résumé (translated from French): In this article, we propose that speech production models will contribute greatly to the eventual success of automatic recognition models, which are currently limited by weaknesses in the theoretical foundation of current technology. We analyze these weaknesses at the level of phonological models and ...
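The analysis-by-synthesis strategy named in this abstract can be sketched as: synthesize the expected acoustics for each candidate hypothesis with a production model, score each synthesis against the observation with a probabilistic likelihood plus a prior, and keep the best-scoring hypothesis. In this minimal sketch the "production model" is a hypothetical lookup table of rough, illustrative formant values, the likelihood is an assumed isotropic Gaussian, and all names are hypothetical; a structured production model would take the lookup table's place in a real system.

```python
import math
from typing import Callable, Dict, Iterable, List, Optional, Sequence


def gaussian_log_likelihood(observed: Sequence[float], predicted: Sequence[float], sigma: float) -> float:
    """Log-likelihood of `observed` under independent N(predicted, sigma^2) noise."""
    return sum(
        -0.5 * math.log(2.0 * math.pi * sigma ** 2) - (o - p) ** 2 / (2.0 * sigma ** 2)
        for o, p in zip(observed, predicted)
    )


def analysis_by_synthesis(
    observed: Sequence[float],
    hypotheses: Iterable[str],
    synthesize: Callable[[str], Sequence[float]],
    log_prior: Callable[[str], float],
    sigma: float = 1.0,
) -> Optional[str]:
    """Return the hypothesis whose synthesized output best explains `observed`."""
    best, best_score = None, -math.inf
    for h in hypotheses:
        score = gaussian_log_likelihood(observed, synthesize(h), sigma) + log_prior(h)
        if score > best_score:
            best, best_score = h, score
    return best


# Hypothetical "production model": rough, illustrative (F1, F2) values in Hz.
TOY_FORMANTS: Dict[str, List[float]] = {
    "i": [300.0, 2300.0],
    "a": [750.0, 1200.0],
    "u": [320.0, 800.0],
}
```

Usage: `analysis_by_synthesis([720.0, 1250.0], TOY_FORMANTS, TOY_FORMANTS.__getitem__, lambda h: 0.0, sigma=100.0)` selects `"a"` under a uniform prior; a non-uniform `log_prior` is where a language or phonological model would enter.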
Structured speech modeling
IEEE Transactions on Audio, Speech and Language Processing (Special Issue on Rich Transcription), 2006
Cited by 19 (11 self)
Modeling the dynamic structure of speech is a novel paradigm in speech recognition research within the generative modeling framework, and it offers the potential to overcome limitations of the current hidden Markov modeling approach. Analogous to structured language models, where syntactic structure is exploited to represent long-distance relationships among words [5], the structured speech model described in this paper makes use of the dynamic structure in the hidden vocal tract resonance space to characterize long-span contextual influence among phonetic units. A general overview of hierarchically classified types of dynamic speech models in the literature is provided first. A detailed account is then given of a specific model type called the hidden trajectory model, and we describe detailed steps of model construction and the parameter estimation algorithms. We show how the use of resonance target parameters and their temporal filtering enables joint modeling of long-span coarticulation and phonetic reduction effects. Experiments on phonetic recognition evaluation demonstrate superior recognizer performance over a modern hidden Markov model-based system. Error analysis shows that the greatest performance gain occurs within the sonorant speech class.
Index Terms: Hidden dynamics, hidden trajectory, long span modeling, maximum-likelihood, nonlinear prediction, parameter learning, structured modeling, vocal tract resonance.
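The temporal filtering of resonance targets can be sketched for a single vocal tract resonance (VTR) dimension. The sketch assumes a symmetric, normalized FIR filter whose weights decay as gamma^|k| over a finite window, the general form associated with hidden-trajectory models; the single shared `gamma`, the `half_width` value, the edge padding, and the function name are hypothetical simplifications for illustration. Because each frame averages over neighboring targets, a long segment settles on its target while a short one falls short of it, which is how one filter can yield both coarticulation and reduction.

```python
from typing import List, Sequence


def htm_trajectory(targets: Sequence[float], gamma: float = 0.6, half_width: int = 7) -> List[float]:
    """Smooth a frame-level sequence of resonance targets (e.g. one VTR in Hz).

    Each output frame is a normalized weighted average of targets within
    +/- half_width frames, with weights gamma**|k| (0 < gamma < 1).
    Frames beyond the ends reuse the first/last target (edge padding).
    """
    offsets = range(-half_width, half_width + 1)
    weights = [gamma ** abs(k) for k in offsets]
    total = sum(weights)  # normalization so a constant target is reproduced
    n = len(targets)
    out = []
    for t in range(n):
        acc = 0.0
        for k, w in zip(offsets, weights):
            idx = min(max(t + k, 0), n - 1)  # clamp to the sequence ends
            acc += w * targets[idx]
        out.append(acc / total)
    return out
```

With a 3-frame segment targeting 1500 Hz between long 500 Hz segments, the smoothed track peaks at roughly 1060 Hz under these assumed parameters, a simple picture of target undershoot in short segments.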
A Modeling Framework for Speech Motor Development and Kinematic Articulator Control
1995
Cited by 17 (11 self)
This paper presents three hypotheses that are central to a computational model of speech production: (1) Sound targets take the form of regions, rather than points, in a planning reference frame. (2) The planning frame is more acoustic-like than the frames used in most recent models. (3) A direction-to-direction mapping transforms planned trajectories into articulator movements. These hypotheses are supported by experimental data and simulation results.
1. INTRODUCTION: REFERENCE FRAMES AND MAPPINGS
It is useful to think of speech production as the process of formulating a trajectory within a planning reference frame to pass through a sequence of targets, each corresponding to a different phoneme in the string being produced. This trajectory can then be mapped into a set of articulator movements that carry out the planned trajectory. The articulator movements are defined within an articulatory reference frame that relates closely to the musculature or primary movement degrees of free...
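A direction-to-direction mapping converts a desired movement direction in the planning frame into articulator velocities. The sketch below is a stand-in under stated assumptions, not the paper's mapping: it uses a fixed, hypothetical 2x3 Jacobian (two planning dimensions, three articulators) and resolves the redundancy with the Moore-Penrose right pseudoinverse J^T (J J^T)^-1, which picks the minimum-norm articulator velocity. It is written in pure Python so it needs no libraries.

```python
from typing import List

Matrix = List[List[float]]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def inverse_2x2(m: Matrix) -> Matrix:
    (p, q), (r, s) = m
    det = p * s - q * r
    if abs(det) < 1e-12:
        raise ValueError("Jacobian is (near) rank-deficient")
    return [[s / det, -q / det], [-r / det, p / det]]


def articulator_velocity(jacobian: Matrix, planned_direction: List[float]) -> List[float]:
    """Minimum-norm articulator velocity that produces `planned_direction`.

    Assumes a 2 x N Jacobian with full row rank (hypothetical and fixed here;
    in a real model it would depend on the current articulator configuration).
    """
    jt = transpose(jacobian)
    pinv = matmul(jt, inverse_2x2(matmul(jacobian, jt)))  # N x 2 right pseudoinverse
    return [sum(pinv[i][k] * planned_direction[k] for k in range(2)) for i in range(len(pinv))]
```

Any velocity in the Jacobian's null space could be added to the result without changing the planned direction, which is one way to picture motor equivalent variability; the pseudoinverse simply chooses the smallest such movement.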
Dynamical System Modelling Of Articulator Movement
1999
Cited by 16 (4 self)
We describe the modelling of articulatory movements using (hidden) dynamical system models trained on Electro-Magnetic Articulograph (EMA) data. These models can be used for automatic speech recognition and to give insights into articulatory behaviour. They belong to a class of continuous-state Markov models, which we believe can offer improved performance over conventional Hidden Markov Models (HMMs) by better accounting for the continuous nature of the underlying speech production process, that is, the movements of the articulators. To assess the performance of our models, a simple speech recognition task was used, on which the models show promising results.
1. INTRODUCTION
Our investigation of dynamical system models is motivated both by an interest in new models for speech recognition, and by the availability of new articulatory measurement data. For speech recognition, we are investigating alternatives to Hidden Markov Models (HMMs) in which speech is generally seen as a sequ...
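Recognition with continuous-state models of this kind is commonly done by scoring an observation sequence under each candidate model and choosing the model with the highest likelihood. This sketch assumes the simplest case, a scalar linear-Gaussian model x_t = a x_{t-1} + w_t, y_t = c x_t + v_t, whose likelihood is computed with a Kalman filter; the parameter names and the two toy models are hypothetical, and models trained on real EMA data would be multivariate.

```python
import math
from typing import Dict, NamedTuple, Sequence


class ScalarLDS(NamedTuple):
    a: float   # state transition coefficient
    c: float   # observation gain
    q: float   # state noise variance
    r: float   # observation noise variance
    m0: float  # initial state mean
    p0: float  # initial state variance


def log_likelihood(model: ScalarLDS, ys: Sequence[float]) -> float:
    """log p(y_1..y_T | model) from the Kalman filter's prediction errors."""
    m, p = model.m0, model.p0
    total = 0.0
    for t, y in enumerate(ys):
        if t == 0:
            x_pred, p_pred = m, p  # prior on the first state
        else:
            x_pred = model.a * m
            p_pred = model.a * model.a * p + model.q
        s = model.c * model.c * p_pred + model.r  # innovation variance
        e = y - model.c * x_pred                  # innovation (prediction error)
        total += -0.5 * (math.log(2.0 * math.pi * s) + e * e / s)
        k = p_pred * model.c / s                  # Kalman gain
        m = x_pred + k * e                        # filtered state mean
        p = (1.0 - k * model.c) * p_pred          # filtered state variance
    return total


def classify(ys: Sequence[float], models: Dict[str, ScalarLDS]) -> str:
    """Pick the model name with the highest log-likelihood for `ys`."""
    return max(models, key=lambda name: log_likelihood(models[name], ys))
```

In a recognizer, each candidate unit would have its own trained model and `classify` would compare them on the same segment; the contrast with an HMM is that the hidden state here is continuous rather than one of a finite set of states.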
Neural modeling and imaging of the cortical interactions underlying syllable production
Brain and Language, 2006
Cited by 16 (5 self)
Keywords: speech production; model; fMRI; Broca’s area; premotor cortex; motor cortex; speech acquisition; sensorimotor learning; neural transmission delays
This paper describes a neural model of speech acquisition and production that accounts for a wide range of acoustic, kinematic, and neuroimaging data concerning the control of speech movements. The model is a neural network whose components correspond to regions of the cerebral cortex and cerebellum, including premotor, motor, auditory, and somatosensory cortical areas. Computer simulations of the model verify its ability to account for compensation to lip and jaw perturbations during speech. Specific anatomical locations of the model’s components are estimated, and these estimates are used to simulate fMRI experiments of simple syllable production.
The dynamics of audiovisual behavior in speech
Speechreading by Humans and Machines: Models, Systems, and Applications, volume 150 of NATO ASI Series. Series F: Computer and Systems Sciences, 1996
Cited by 15 (4 self)
While it is well-known that faces provide linguistically relevant information during communication, most efforts to identify the visual correlates of the acoustic signal have focused on the shape, position and luminance of the oral aperture. In this work, we extend the analysis to full facial motion under the assumption that the process of producing speech acoustics generates linguistically salient visual information, which is distributed over large portions of the face. Support for this is drawn from our recent studies of the eye movements of perceivers during a variety of audiovisual speech perception tasks. These studies suggest that perceivers detect visual information at low spatial frequencies and that such information may not be restricted to the region of the oral aperture. Since the biomechanical linkage between the facial and vocal tract systems is one of close proximity and shared physiology, we propose that physiological models of speech and facial motion be integrated into one audiovisual model of speech production. In addition to providing a coherent account of audiovisual motor control, the proposed model could become a useful experimental tool, providing synthetic audiovisual stimuli with realistic control parameters.

