Results 1 - 10 of 33
Sound-Source Recognition: A Theory and Computational Model
1999
Cited by 61 (0 self)
The ability of a normal human listener to recognize objects in the environment from only the sounds they produce is extraordinarily robust with regard to characteristics of the acoustic environment and of other competing sound sources. In contrast, computer systems designed to recognize sound sources function precariously, breaking down whenever the target sound is degraded by reverberation, noise, or competing sounds. Robust listening requires extensive contextual knowledge, but the potential contribution of sound-source recognition to the process of auditory scene analysis has largely been neglected by researchers building computational models of the scene analysis process. This thesis proposes a theory of sound-source recognition, casting recognition as a process of gathering information to enable the listener to make inferences about
Phonology, reading acquisition, and dyslexia: insights from connectionist models
Psychol. Rev., 1999
Cited by 52 (3 self)
The development of reading skill and bases of developmental dyslexia were explored using connectionist models. Four issues were examined: the acquisition of phonological knowledge prior to reading, how this knowledge facilitates learning to read, phonological and non-phonological bases of dyslexia, and effects of literacy on phonological representation. Compared with simple feedforward networks, representing phonological knowledge in an attractor network yielded improved learning and generalization. Phonological and surface forms of developmental dyslexia, which are usually attributed to impairments in distinct lexical and nonlexical processing “routes,” were derived from different types of damage to the network. The results provide a computationally explicit account of many aspects of reading acquisition using connectionist principles.
Handling Missing Data In Speech Recognition
1994
Cited by 22 (6 self)
In this paper, we propose a new paradigm for robust ASR based on auditory scene analysis. In previous work, we have shown how models of auditory processing and grouping principles can be used to separate the evidence for a speech signal from arbitrary intrusions. However, this evidence will generally be incomplete since some spectrotemporal regions will be dominated by the other sources. Here, we address the problem of recognising such 'occluded' speech. Two investigations are reported: the first applies unsupervised learning and subsequent recognition to spectral vectors with missing components. The second adapts the Viterbi algorithm for HMM-based ASR to the occluded speech case. Both techniques are encouragingly robust: for instance, more than half of the observation vector can be obscured without appreciable deterioration in recognition performance. Additionally, our demonstration that it is possible to learn to recognise speech from partial information suggests a model for the for...
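The idea of recognising spectral vectors with missing components can be illustrated with a small sketch. It assumes each class is modelled by a diagonal-covariance Gaussian, which the abstract does not state; under that assumption, integrating out the missing components reduces to skipping their terms in the log-likelihood. The names `log_likelihood_present` and `classify` are hypothetical and are not taken from the paper.

```python
import math

def log_likelihood_present(x, present, mean, var):
    """Log-likelihood of the reliable components of x under a diagonal Gaussian.

    Assumption: diagonal covariance, so marginalising a missing component
    simply removes its term from the sum.
    """
    total = 0.0
    for xi, is_present, m, v in zip(x, present, mean, var):
        if not is_present:
            continue  # occluded component: integrated out
        total += -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
    return total

def classify(x, present, models):
    """Return the name of the model with the highest marginal log-likelihood.

    models maps a class name to a (mean, var) pair of equal-length lists.
    """
    return max(
        models,
        key=lambda name: log_likelihood_present(x, present, *models[name]),
    )
```

With two toy classes centred at 0 and 3, a vector whose second component is swamped by an intrusion is assigned to the intended class only when that component is marked as missing.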
Mid-level representations for Computational Auditory Scene Analysis
1998
Cited by 22 (4 self)
In this paper we consider representations for use in models of the processing that occurs between the eardrum and our conscious experience of sound. We first list "good" properties for such mid-level representations, then present a framework within which to discuss some examples. We compare in detail two popular schemes, sinusoid tracks and correlograms, and propose a new representation, wefts, which seeks to combine their advantages.
1 Introduction: Mid-level representations
Mid-level representation is a term usually associated with computer vision, particularly the ideas of David Marr [1982]. It has since become accepted by many in the computer audition community [Bregman, 1990; Cooke, 1991; Brown, 1992] as a concept useful to models of hearing as well. Auditory perception may be viewed as a sequence of representations from "low" to "high," where low-level representations are (roughly) those appropriate to describing the sound reaching the cochlea, and high-level representations are those to...
Division of Labor in a Computational Model of Visual Word Recognition
1998
Cited by 19 (2 self)
1 Introduction
1.1 Intuitions and Evidence
1.2 Previous Models
1.2.1 The Classical Dual Route Model
1.2.2 Seidenberg and McClelland 1989
1.2.3 Plaut and Shallice 1993
1.2.4 Plaut et al. 1996: Naming
1.2.5 Bullinaria 1996
1.2.6 Plaut 1997: Lexical Decision
1.2.7 Harm and Seidenberg 1998: Naming
1.3 Summary
2 A New Computational Model
2.1 Principles
...
Auditory Scene Analysis And Hidden Markov Model Recognition Of Speech In Noise
Proceedings of the International Conference on Acoustics, Speech and Signal Processing, 1995
Cited by 15 (5 self)
We describe a novel paradigm for automatic speech recognition in noisy environments in which an initial stage of auditory scene analysis separates out the evidence for the speech to be recognised from the evidence for other sounds. In general, this evidence will be incomplete, since intruding sound sources will dominate some spectro-temporal regions. We generalise continuous-density hidden Markov model recognition to this 'occluded speech' case. The technique is based on estimating the probability that a Gaussian mixture density distribution for an auditory firing rate map will generate an observation such that the separated components are at their observed values and the remaining components are not greater than their values in the acoustic mixture. Experiments on isolated digit recognition in noise demonstrate the potential of the new approach to yield performance comparable to that of listeners.
1. AUDITORY SCENE ANALYSIS AS A PREPROCESSOR FOR SPEECH RECOGNITION
Auditory scene anal...
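The occluded-speech likelihood described in this abstract can be sketched as follows. The sketch assumes diagonal-covariance mixture components, which the abstract does not specify: each separated component contributes a Gaussian density at its observed value, and each remaining component contributes the probability of lying at or below its value in the acoustic mixture (a Gaussian CDF). The function names are hypothetical, and the result is a score that mixes densities and probabilities, not a normalised probability.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def normal_cdf(x, mean, var):
    """Probability that a univariate Gaussian variable is <= x."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))

def bounded_mixture_score(mixture_obs, separated, weights, means, variances):
    """Score an observation under a diagonal Gaussian mixture.

    mixture_obs : values observed in the acoustic mixture, one per channel
    separated   : True where the channel was assigned to the speech source
    weights     : mixture weights, one per component
    means, variances : per-component lists of per-channel parameters
    """
    total = 0.0
    for w, mean, var in zip(weights, means, variances):
        term = w
        for x, is_sep, m, v in zip(mixture_obs, separated, mean, var):
            # Separated channel: exact value. Other channel: upper bound only.
            term *= normal_pdf(x, m, v) if is_sep else normal_cdf(x, m, v)
        total += term
    return total
```

In an HMM recogniser, a score of this kind would stand in for the usual state emission likelihood; how the original system integrates it into decoding is not shown here.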
Pattern Theory: the Mathematics of Perception
In ICM, 2002
Cited by 13 (1 self)
Introduction
How can we understand intelligent behavior? How can we design intelligent computers? These are questions that have been discussed by scientists and the public at large for over 50 years. As mathematicians, however, the question we want to ask is "is there a mathematical theory underlying intelligence?" I believe the first mathematical attack on these issues was Control Theory, led by Wiener and Pontryagin. They were studying how to design a controller which drives a motor affecting the world and also sits in a feedback loop receiving measurements from the world about the effect of the motor action. The goal was to control the motor so that the world, as measured, did something specific, i.e. move the tiller so that the boat stays on course. The main complication is that nothing is precisely predictable: the motor control is not exact, the world does unexpected things because of its complexities and the measurements you take of it are imprecise. All this led, in the simple
Computational and behavioral investigations of lexically induced delays in phoneme recognition
Journal of Memory & Language, 2005
Cited by 8 (5 self)
Previous studies have failed to demonstrate lexically induced delays in phoneme recognition, casting doubt on interactive models of speech perception. We present TRACE simulations that explain these failures: previously tested conditions failed to produce lexically induced delay effects because the input was too unambiguous and the control condition was conflated with lexical status and neighborhood structure. Since between-layer connections are solely excitatory, between-layer delay effects can emerge only indirectly through facilitation of within-layer competition. If the lexically consistent phoneme partially matches the input acoustics, it will become partially active. Additional support from lexical feedback will extend the duration of competition between the acoustically present phoneme and the lexically consistent phoneme, thus delaying detection. This prediction holds across a range of relevant parameter values. Two behavioral experiments tested and confirmed this prediction. These results answer one of the challenges to the interactive view of speech perception.
A schema-based model for phonemic restoration
Speech Comm, 2005
Cited by 6 (3 self)
Phonemic restoration refers to the synthesis of missing phonemes in speech when sufficient lexical context is present. Current models for phonemic restoration, however, make no use of any lexical knowledge. Such models are inherently inadequate for restoring unvoiced phonemes and may be limited in their ability to restore voiced phonemes too. We present a predominantly top-down model for phonemic restoration. The model uses a missing data speech recognition system to recognize speech utterances as word sequences and activates word templates corresponding to the words containing the masked phonemes. An activated template is dynamically time warped to the noisy word and is then used to restore the speech frames corresponding to the masked phoneme, thereby synthesizing it. The model is able to restore both voiced and unvoiced phonemes. Systematic testing shows that this model performs better than the Kalman-filter based model.
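The template-warping step in this abstract can be illustrated with a minimal dynamic time warping sketch. It works on sequences of scalar "frames" with an absolute-difference distance purely for illustration; a real system would compare spectral feature vectors. The functions `dtw` and `restore` and the `masked` flag list are hypothetical names, and the restoration rule (copy the aligned template frame into each masked position) is an assumption, not the paper's procedure.

```python
def dtw(template, observed, dist=lambda a, b: abs(a - b)):
    """Align template to observed; return (total cost, alignment path).

    The path is a list of (template_index, observed_index) pairs.
    """
    n, m = len(template), len(observed)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(template[i - 1], observed[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end, always moving to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path

def restore(template, observed, masked):
    """Replace masked observed frames with the template frames aligned to them."""
    _, path = dtw(template, observed)
    restored = list(observed)
    for ti, oj in path:
        if masked[oj]:
            restored[oj] = template[ti]
    return restored
```

For example, `restore([1, 2, 3], [1, 2.4, 3], [False, True, False])` replaces the corrupted middle frame with the template's middle frame.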

