Shortlist: a connectionist model of continuous speech recognition - Cognition, 1994
Cited by 117 (5 self)
Previous work has shown how a back-propagation network with recurrent connections can successfully model many aspects of human spoken word recognition (Norris, 1988, 1990, 1992, 1993). However, such networks are unable to revise their decisions in the light of subsequent context. TRACE (McClelland & Elman, 1986), on the other hand, manages to deal appropriately with following context, but only by using a highly implausible architecture that fails to account for some important experimental results. A new model is presented which displays the more desirable properties of each of these models. In contrast to TRACE, the new model is entirely bottom-up and can readily perform simulations with vocabularies of tens of thousands of words.
Heterogeneous Acoustic Measurement And Multiple Classifiers For Speech Recognition, 1998
Cited by 29 (1 self)
The acoustic-phonetic modeling component of most current speech recognition systems calculates a small set of homogeneous frame-based measurements at a single, fixed time-frequency resolution. This thesis presents evidence indicating that recognition performance can be significantly improved through a contrasting approach using more detailed and more diverse acoustic measurements, which we refer to as heterogeneous measurements.
Integration of acoustic and visual speech signals using neural networks - IEEE Communications Magazine, 1989
Cited by 27 (0 self)
rely almost exclusively on the acoustic speech signal and, consequently, these systems often perform poorly in noisy environments [1]. Attempts to clean up the acoustic input have had limited success [2]. Another approach is to use other sources of speech information, such as visual speech signals. The perception of acoustic speech by humans can be affected by the visible speech signals [3-5]. Specifically, when the acoustic signal is degraded by noise, the visual signal can provide supplemental speech information that improves speech perception [6-8]. When no acoustic signal is available, as for the profoundly deaf, the visual signal alone can provide speech information through lip reading [9-11]. Here we answer two questions: Can the speech information conveyed by visual speech signals be extracted automatically? How can this information be combined with information from the acoustic signal to improve automat
Multimodal Interfaces - Artificial Intelligence Review Journal, special issue, 1994
Cited by 23 (3 self)
In this paper, we present an overview of research in our laboratories on Multimodal Human Computer Interfaces. The goal for such interfaces is to free human computer interaction from the limitations and acceptance barriers due to rigid operating commands and keyboards as only/main I/O-device. Instead we move to involve all available human communication modalities. These human modalities include Speech, Gesture and Pointing,
Unsupervised Classification Learning from Cross-Modal Environmental Structure, 1994
Cited by 17 (2 self)
This dissertation addresses the problem of unsupervised learning for pattern classification or category learning. A model that is based on gross cortical anatomy and implements biologically plausible computations is developed and shown to have classification power approaching that of a supervised discriminant algorithm. The advantage of supervised learning is that the final error metric is available during training. Unfortunately, when modeling human category learning, or in constructing classifiers for autonomous robots, one must deal with not having an omniscient entity labeling all incoming sensory patterns. We show that we can substitute for the labels by making use of structure between the pattern distributions to different sensory modalities. For example the co-occurrence of a visual image of a cow with a "moo" sound can be used to simultaneously develop appropriate visual features for distinguishing the cow image and appropriate auditory features for recognizing the moo. We mode...
A temporal ratio model of memory - Psychological Review, 2007
Cited by 17 (1 self)
A model of memory retrieval is described. The model embodies 4 main claims: (a) temporal memory: traces of items are represented in memory partly in terms of their temporal distance from the present; (b) scale-similarity: similar mechanisms govern retrieval from memory over many different timescales; (c) local distinctiveness: performance on a range of memory tasks is determined by interference from near psychological neighbors; and (d) interference-based forgetting: all memory loss is due to interference and not trace decay. The model is applied to data on free recall and serial recall. The account emphasizes qualitative similarity in the retrieval principles involved in memory performance at all timescales, contrary to models that emphasize distinctions between short-term and long-term memory.
The Generalized Universal Law of Generalization - Journal of Mathematical Psychology, 2001
Cited by 14 (3 self)
It has been argued by Shepard that there is a robust psychological law that relates the distance between a pair of items in psychological space and the probability that they will be confused with each other. Specifically, the probability of confusion is a negative exponential function of the distance between the pair of items. In experimental contexts, distance is typically defined in terms of a multidimensional Euclidean space, but this assumption seems unlikely to hold for complex stimuli. We show that, nonetheless, the Universal Law of Generalization can be derived in the more complex setting of arbitrary stimuli, using a much more universal measure of distance. This universal distance is defined as the length of the shortest program that transforms the representations of the two items of interest into one another: the algorithmic information distance. It is universal in the sense that it minorizes every computable distance: it is the smallest computable distance. We show ...
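As a rough illustration of the negative exponential relation described in this abstract, the following Python sketch computes a confusion probability from the Euclidean distance between two points in a psychological space. The function name `confusion_probability`, the `scale` parameter, and the example coordinates are assumptions made for illustration only; the sketch covers the typical Euclidean setting mentioned in the abstract, not the algorithmic information distance that the paper goes on to use.

```python
import math

def confusion_probability(x, y, scale=1.0):
    """Negative exponential of the Euclidean distance between x and y.

    `scale` is a hypothetical sensitivity parameter, not taken from the abstract.
    """
    distance = math.dist(x, y)  # Euclidean distance (Python 3.8+)
    return math.exp(-scale * distance)

# Hypothetical stimuli placed in a two-dimensional psychological space.
p_same = confusion_probability((0.0, 0.0), (0.0, 0.0))
p_near = confusion_probability((0.0, 0.0), (0.3, 0.4))
p_far = confusion_probability((0.0, 0.0), (3.0, 4.0))

print(p_same, p_near, p_far)
```

Identical items give the maximum value, and the value falls off exponentially as the items move apart.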
LANDMARK-BASED SPEECH RECOGNITION: REPORT OF THE 2004 Johns Hopkins Summer Workshop, 2005
Articulatory Features for Robust Visual Speech Recognition, 2004
Cited by 12 (4 self)
Visual information has been shown to improve the performance of speech recognition systems in noisy acoustic environments. However, most audio-visual speech recognizers rely on a clean visual signal. In this paper, we explore a novel approach to visual speech modeling, based on articulatory features, which has potential benefits under visually challenging conditions. The idea is to use a set of parallel SVM classifiers to extract different articulatory attributes from the input images, and then combine their decisions to obtain higher-level units, such as visemes or words. We evaluate our approach in a preliminary experiment on a small audio-visual database, using several image noise conditions, and compare it to the standard viseme-based modeling approach.

