Learning to detect objects in images via a sparse, part-based representation
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2004
Cited by 203 (1 self)
We study the problem of detecting objects in still, grayscale images. Our primary focus is the development of a learning-based approach to the problem that makes use of a sparse, part-based representation. A vocabulary of distinctive object parts is automatically constructed from a set of sample images of the object class of interest; images are then represented using parts from this vocabulary, together with spatial relations observed among the parts. Based on this representation, a learning algorithm is used to automatically learn to detect instances of the object class in new images. The approach can be applied to any object with distinguishable parts in a relatively fixed spatial configuration; it is evaluated here on difficult sets of real-world images containing side views of cars, and is seen to successfully detect objects in varying conditions amidst background clutter and mild occlusion. In evaluating object detection approaches, several important methodological issues arise that have not been satisfactorily addressed in previous work. A secondary focus of this paper is to highlight these issues and to develop rigorous evaluation standards for the object detection problem. A critical evaluation of our approach under the proposed standards is presented.
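To make the idea of a sparse, part-based representation concrete, here is a minimal Python sketch. It assumes part detections are already available as hypothetical `(part_id, x, y)` tuples and encodes an image window as the set of active binary features: one per vocabulary part that occurs, plus one per pair of parts with a coarsely quantized distance and direction. The function name `encode_parts`, the distance bin edges, and the four direction bins are illustrative assumptions, not the paper's exact feature definition.

```python
import math
from bisect import bisect_right
from itertools import combinations


def encode_parts(detections, vocab_size, dist_edges, n_dir_bins=4):
    """Return sorted indices of the binary features active in one image window.

    detections: hypothetical list of (part_id, x, y) with 0 <= part_id < vocab_size.
    dist_edges: increasing thresholds used to bucket pairwise part distances.
    """
    active = set()
    # Features 0 .. vocab_size - 1: "part k occurs somewhere in the window".
    for part_id, _, _ in detections:
        active.add(part_id)

    n_dist_bins = len(dist_edges) + 1
    offset = vocab_size  # relation features are numbered after the part features
    # Sorting visits each pair of detections in a canonical order.
    for (pa, xa, ya), (pb, xb, yb) in combinations(sorted(detections), 2):
        dx, dy = xb - xa, yb - ya
        dist_bin = bisect_right(dist_edges, math.hypot(dx, dy))
        angle = math.atan2(dy, dx) % (2 * math.pi)
        dir_bin = int(angle / (2 * math.pi / n_dir_bins)) % n_dir_bins
        # Relation feature: (part a, part b, distance bin, direction bin).
        pair_index = (pa * vocab_size + pb) * n_dist_bins + dist_bin
        active.add(offset + pair_index * n_dir_bins + dir_bin)
    return sorted(active)


# Two parts 10 pixels apart horizontally; distance bins: <5, 5 to 20, >=20.
print(encode_parts([(0, 0, 0), (1, 10, 0)], vocab_size=2, dist_edges=[5, 20]))
# prints [0, 1, 18]
```

Because only a handful of parts fire in any window, the returned index list stays short even though the full feature space has `vocab_size + vocab_size**2 * n_dist_bins * n_dir_bins` dimensions; a sparse linear classifier could then score a window by summing the weights of its active features.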
A theory of object recognition: computations and circuits in the feedforward path of the ventral stream in primate visual cortex
2005
A limit to the speed of processing in ultra-rapid visual categorization of novel natural scenes
Journal of Cognitive Neuroscience, 2001
Cited by 37 (9 self)
The processing required to decide whether a briefly flashed natural scene contains an animal can be achieved in 150 msec (Thorpe, Fize, & Marlot, 1996). Here we report that extensive training with a subset of photographs over a 3-week period failed to increase the speed of the processing underlying such rapid visual categorizations: completely novel scenes could be categorized just as fast as highly familiar ones. Such data imply that the visual system processes new stimuli at a speed, and with a number of stages, that cannot be compressed. This rapid processing mode was seen with a wide range of complex visual images, challenging the idea that short reaction times can only be seen with simple visual stimuli and implying that highly automatic feed-forward mechanisms underlie a far greater proportion of the sophisticated image analysis needed for everyday vision than is generally assumed.
Both humans and monkeys are able to categorize natural images accurately and very rapidly (Fabre-Thorpe, Richard, & Thorpe, 1998; Thorpe, Fize, & Marlot, 1996). The nature of the underlying mechanisms is currently ...
Minimizing Binding Errors Using Learned Conjunctive Features
2000
Cited by 29 (2 self)
In this article, we describe our work testing a simple analytical model that captures several trade-offs governing the performance of visual recognition systems based on spatially invariant conjunctive features. In addition, we introduce a supervised greedy algorithm for feature learning that grows a visual representation in such a way as to minimize false-positive recognition errors. Finally, we consider some of the surprising properties of "good" representations and the implications of our results for more realistic visual recognition problems.
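The abstract does not spell out the greedy algorithm, so the following is only a generic Python sketch of the idea under stated assumptions: each image is modeled as a set of hypothetical feature tokens, a candidate conjunctive feature is a `frozenset` of tokens that counts as present when all of its tokens occur, and the target is (falsely) recognized in a distractor whenever every selected feature is present. This greedy forward selection, with the hypothetical name `greedy_select`, keeps only candidates present in every positive image and repeatedly adds the one that rejects the most remaining false positives.

```python
def greedy_select(candidates, positives, negatives, max_features):
    """Greedily grow a set of conjunctive features that minimizes false positives.

    candidates: list of frozensets (each a conjunction of feature tokens).
    positives, negatives: lists of sets of tokens observed in each image.
    Returns (selected features, negatives that still match all of them).
    """
    # A feature is usable only if it never causes a miss on a positive image.
    usable = [c for c in candidates if all(c <= img for img in positives)]
    selected = []
    false_pos = list(negatives)  # with no features selected, every negative matches
    while false_pos and len(selected) < max_features:
        best, best_fp = None, false_pos
        for cand in usable:
            if cand in selected:
                continue
            # Negatives that would still be accepted if cand were added.
            still_fp = [img for img in false_pos if cand <= img]
            if len(still_fp) < len(best_fp):
                best, best_fp = cand, still_fp
        if best is None:  # no remaining feature reduces the false positives
            break
        selected.append(best)
        false_pos = best_fp
    return selected, false_pos


positives = [{"a", "b", "c"}, {"a", "b", "d"}]
negatives = [{"a", "b"}, {"a", "c"}, {"b", "d"}]
candidates = [frozenset("a"), frozenset("b"), frozenset("ab"), frozenset("c")]
chosen, leftover = greedy_select(candidates, positives, negatives, max_features=3)
print(chosen, leftover)
```

In this toy example the conjunction `{"a", "b"}` is chosen first, and the distractor `{"a", "b"}` can never be rejected by any usable candidate, a small illustration of how a representation built only from spatially invariant features can confuse images that share the same parts.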
Cue-invariant activation in object-related areas of the human occipital lobe
Neuron 21:191–202, 1998
Cited by 29 (1 self)
... performing a specific visual task (for an exposition of this issue, see Ungerleider and Haxby, 1994; DeYoe et al., 1994; Goodale et al., 1994). Single neuron recordings in the macaque provide evidence both for segregation of visual cues into different channels (Livingstone and Hubel, 1988; DeYoe et al., 1994) and for convergence of several primary cues even at the level of single cortical neurons (Sary et al., 1993). Similarly, studies of anatomical connections indicate the existence of parallel specialized cortical streams (Young, 1992) but also show substantial interstream communication both between areas (reviewed by Felle...
†Diagnostic Imaging Department, The Chaim Sheba Medical Center, Tel Hashomer 52621, Israel
‡School of Cognitive and Computing Sciences, University of Sussex, Falmer, Brighton BN1 9QH
Computational Theories of Object Recognition
Trends in Cognitive Sciences, 1997
Cited by 24 (5 self)
This paper examines four current theoretical approaches to the representation and recognition of visual objects: structural descriptions, geometric constraints, multidimensional feature spaces, and shape-space approximation. The strengths and weaknesses of the theories are considered, with a special focus on their approach to categorization, a computationally challenging task that is not widely addressed in computer vision (where the emphasis is instead on generalizing recognition across changes of viewpoint).
EMPATH: A Neural Network that Categorizes Facial Expressions
Journal of Cognitive Neuroscience, 2002
Cited by 24 (7 self)
There are two competing theories of facial expression recognition. Some researchers have suggested that it is an example of “categorical perception.” In this view, expression categories are considered to be discrete entities with sharp boundaries, and discrimination of nearby pairs of expressive faces is enhanced near those boundaries. Other researchers, however, suggest that facial expression perception is more graded and that facial expressions are best thought of as points in a continuous, low-dimensional space, where, for instance, “surprise” expressions lie between “happiness” and “fear” expressions due to their perceptual similarity. In this article, we show that a simple yet biologically plausible neural network model, trained to classify facial expressions into six basic emotions, predicts data used to support both of these theories. Without any parameter tuning, the model matches a variety of psychological data on categorization, similarity, reaction times, discrimination, and recognition difficulty, both qualitatively and quantitatively. We thus explain many of the seemingly complex psychological phenomena related to facial expression perception as natural consequences of the tasks’ implementations in the brain.
Can Face Recognition Really Be Dissociated From Object Recognition?
Journal of Cognitive Neuroscience, 1999
Predictability, Complexity, and Learning
2001
Cited by 17 (2 self)
We define predictive information Ipred(T) as the mutual information between the past and the future of a time series. Three qualitatively different behaviors are found in the limit of large observation times T: Ipred(T) can remain finite, grow logarithmically, or grow as a fractional power law. If the time series allows us to learn a model with a finite number of parameters, then Ipred(T) grows logarithmically with a coefficient that counts the dimensionality of the model space. In contrast, power-law growth is associated, for example, with the learning of infinite parameter (or nonparametric) models such as continuous functions with smoothness constraints. There are connections between the predictive information and measures of complexity that have been defined both in learning theory and the analysis of physical systems through statistical mechanics and dynamical systems theory. Furthermore, in the same way that entropy provides the unique measure of available information consistent with some simple and plausible conditions, we argue that the divergent part of Ipred(T) provides the unique measure for the complexity of dynamics underlying a time series. Finally, we discuss how these ideas may be useful in problems in physics, statistics, and biology.
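The definition of Ipred(T) can be illustrated with a naive plug-in estimator. This Python sketch assumes a discrete-valued sequence and, as a simplification, uses past and future windows of the same length T; it counts empirical frequencies of (past, future) window pairs and computes their mutual information in bits. Plug-in estimates like this are biased upward when the number of distinct windows is large relative to the data, so the sketch only shows the quantity being measured and does not reproduce the asymptotic analysis in the abstract.

```python
import math
from collections import Counter


def predictive_information(seq, T):
    """Plug-in estimate, in bits, of the mutual information between the
    length-T window before each position and the length-T window after it."""
    pairs = [(tuple(seq[i - T:i]), tuple(seq[i:i + T]))
             for i in range(T, len(seq) - T + 1)]
    n = len(pairs)
    joint = Counter(pairs)
    past = Counter(p for p, _ in pairs)
    future = Counter(f for _, f in pairs)
    # I = sum p(x, y) * log2[p(x, y) / (p(x) p(y))], with each p = count / n.
    return sum((c / n) * math.log2(c * n / (past[p] * future[f]))
               for (p, f), c in joint.items())


periodic = [0, 1] * 500  # the past fully determines the future
constant = [0] * 1000    # there is nothing to predict
print(predictive_information(periodic, 1), predictive_information(constant, 3))
```

For the period-2 sequence the estimate stays close to 1 bit as T grows, a simple instance of the regime in which Ipred(T) remains finite; the constant sequence gives 0 bits.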
Learning Representative Local Features For Face Detection
Cited by 14 (0 self)
This paper describes a face detection approach based on learning local features. The key idea is that local features, each manifested as a collection of pixels in a local region, are learnt from the training set rather than arbitrarily defined. The learning procedure consists of two steps. First, a modified version of NMF (Non-negative Matrix Factorization), namely local NMF (LNMF), is applied to obtain an overcomplete set of local features. Second, a learning algorithm based on AdaBoost is used to select a small number of these local features, yielding extremely efficient classifiers. Experiments show that the face detection performance is comparable to that of state-of-the-art face detection systems.
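The second step, selecting a few local features with AdaBoost, can be sketched with textbook discrete AdaBoost over single-feature threshold stumps. This is a generic illustration, not the paper's exact boosting variant: it assumes each image has already been reduced to a vector of local-feature coefficients (for example, projections onto LNMF bases, which are not computed here) with labels +1 for faces and -1 for non-faces. The helper names `stump`, `train_adaboost`, and `predict` are hypothetical. The set of feature indices used by the chosen stumps is the small subset of local features the classifier ends up relying on.

```python
import math


def stump(x, j, thr, pol):
    """Weak classifier on feature j: predicts pol if x[j] >= thr, else -pol."""
    return pol if x[j] >= thr else -pol


def train_adaboost(X, y, rounds):
    """Discrete AdaBoost with exhaustive search over threshold stumps.

    X: list of feature vectors, y: labels in {-1, +1}.
    Returns a list of (feature index, threshold, polarity, alpha).
    """
    n, d = len(X), len(X[0])
    w = [1.0 / n] * n  # uniform initial sample weights
    model = []
    for _ in range(rounds):
        best = None
        for j in range(d):
            for thr in sorted({x[j] for x in X}):
                for pol in (1, -1):
                    err = sum(wi for xi, yi, wi in zip(X, y, w)
                              if stump(xi, j, thr, pol) != yi)
                    if best is None or err < best[0]:
                        best = (err, j, thr, pol)
        err, j, thr, pol = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) for perfect stumps
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((j, thr, pol, alpha))
        # Raise the weights of misclassified samples, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump(xi, j, thr, pol))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model, x):
    """Sign of the alpha-weighted vote of the selected stumps."""
    score = sum(alpha * stump(x, j, thr, pol) for j, thr, pol, alpha in model)
    return 1 if score >= 0 else -1


# Toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.1, 5.0], [0.2, 3.0], [0.9, 4.0], [0.8, 6.0]]
y = [-1, -1, 1, 1]
model = train_adaboost(X, y, rounds=3)
print(sorted({j for j, _, _, _ in model}), [predict(model, x) for x in X])
# prints [0] [-1, -1, 1, 1]
```

The exhaustive stump search is quadratic in the number of samples per feature and is meant only for readability; practical detectors typically sort each feature once and sweep thresholds incrementally.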

