Results 1–10 of 202
Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope
 International Journal of Computer Vision
, 2001
Abstract

Cited by 1287 (81 self)
In this paper, we propose a computational model of the recognition of real-world scenes that bypasses the segmentation and the processing of individual objects or regions. The procedure is based on a very low dimensional representation of the scene, that we term the Spatial Envelope. We propose a set of perceptual dimensions (naturalness, openness, roughness, expansion, ruggedness) that represent the dominant spatial structure of a scene. Then, we show that these dimensions may be reliably estimated using spectral and coarsely localized information. The model generates a multidimensional space in which scenes sharing membership in semantic categories (e.g., streets, highways, coasts) are projected close together. The performance of the spatial envelope model shows that specific information about object shape or identity is not a requirement for scene categorization and that modeling a holistic representation of the scene is informative about its probable semantic category.
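The coarsely localized spectral representation the abstract describes can be sketched in a few lines. This is a simplified stand-in, not the authors' model: the published descriptor uses a Gabor filter bank, and the function name `gist_like_descriptor`, the 4x4 grid, and the frequency bands here are illustrative choices.

```python
import numpy as np

def gist_like_descriptor(img, grid=4, bands=((0.0, 0.25), (0.25, 0.5))):
    """Coarse spectral descriptor in the spirit of the Spatial Envelope:
    per-cell energy in a few spatial-frequency bands (a sketch, not the
    authors' Gabor-based implementation)."""
    h, w = img.shape
    gh, gw = h // grid, w // grid
    feats = []
    for i in range(grid):
        for j in range(grid):
            block = img[i * gh:(i + 1) * gh, j * gw:(j + 1) * gw]
            spec = np.abs(np.fft.fft2(block - block.mean()))  # zero-mean FFT magnitude
            fy = np.fft.fftfreq(gh)[:, None]
            fx = np.fft.fftfreq(gw)[None, :]
            r = np.hypot(fy, fx)                   # radial frequency, cycles/pixel
            for lo, hi in bands:
                feats.append(spec[(r >= lo) & (r < hi)].mean())
    return np.asarray(feats)                       # grid*grid*len(bands) numbers
```

A scene is then represented by this short vector, and nearby vectors tend to share a semantic category.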
Sparse coding with an overcomplete basis set: a strategy employed by V1
 Vision Research
, 1997
Abstract

Cited by 954 (12 self)
The spatial receptive fields of simple cells in mammalian striate cortex have been reasonably well described physiologically and can be characterized as being localized, oriented, and bandpass, comparable with the basis functions of wavelet transforms. Previously, we have shown that these receptive field properties may be accounted for in terms of a strategy for producing a sparse distribution of output activity in response to natural images. Here, in addition to describing this work in a more expansive fashion, we examine the neurobiological implications of sparse coding. Of particular interest is the case when the code is overcomplete, i.e., when the number of code elements is greater than the effective dimensionality of the input space. Because the basis functions are nonorthogonal and not linearly independent of each other, sparsifying the code will recruit only those basis functions necessary for representing a given input, and so the input-output function will deviate from being purely linear. These deviations from linearity provide a potential explanation for the weak forms of nonlinearity observed in the response properties of cortical simple cells, and they further make predictions about the expected interactions among units in ...
Classifying Facial Actions
 IEEE Transactions on Pattern Analysis and Machine Intelligence
, 1999
Abstract

Cited by 340 (36 self)
The Facial Action Coding System (FACS) [23] is an objective method for quantifying facial movement in terms of component actions. This system is widely used in behavioral investigations of emotion, cognitive processes, and social interaction. The coding is presently performed by highly trained human experts. This paper explores and compares techniques for automatically recognizing facial actions in sequences of images. These techniques include analysis of facial motion through estimation of optical flow; holistic spatial analysis, such as principal component analysis, independent component analysis, local feature analysis, and linear discriminant analysis; and methods based on the outputs of local filters, such as Gabor wavelet representations and local principal components. Performance of these systems is compared to naive and expert human subjects. Best performances were obtained using the Gabor wavelet representation and the independent component representation, both of which achieved 96 percent accuracy for classifying 12 facial actions of the upper and lower face. The results provide converging evidence for the importance of using local filters, high spatial frequencies, and statistical independence for classifying facial actions.
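Since the Gabor-wavelet representation gave the best performance, a minimal example of building one such local filter may help. The function name and parameter values are illustrative, not taken from the paper.

```python
import numpy as np

def gabor_kernel(size=21, wavelength=6.0, theta=0.0, sigma=4.0):
    """Zero-mean even Gabor filter: a Gaussian-windowed cosine grating.
    A bank of these at several orientations and wavelengths yields the
    kind of local, high-spatial-frequency features the paper favors."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)   # coordinate along the grating
    g = np.exp(-(x**2 + y**2) / (2.0 * sigma**2)) * np.cos(2.0 * np.pi * xr / wavelength)
    return g - g.mean()                          # remove the DC response
```

Convolving a face image with filters like this one, at multiple orientations and scales, produces the feature vectors fed to the classifier.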
Face recognition by independent component analysis
 IEEE Transactions on Neural Networks
, 2002
Abstract

Cited by 333 (5 self)
A number of current face recognition algorithms use face representations found by unsupervised statistical methods. Typically these methods find a set of basis images and represent faces as a linear combination of those images. Principal component analysis (PCA) is a popular example of such methods. The basis images found by PCA depend only on pairwise relationships between pixels in the image database. In a task such as face recognition, in which important information may be contained in the high-order relationships among pixels, it seems reasonable to expect that better basis images may be found by methods sensitive to these high-order statistics. Independent component analysis (ICA), a generalization of PCA, is one such method. We used a version of ICA derived from the principle of optimal information transfer through sigmoidal neurons. ICA was performed on face images in the FERET database under two different architectures, one which treated the images as random variables and the pixels as outcomes, and a second which treated the pixels as random variables and the images as outcomes. The first architecture found spatially local basis images for the faces. The second architecture produced a factorial face code. Both ICA representations were superior to representations based on PCA for recognizing faces across days and changes in expression. A classifier that combined the two ICA representations gave the best performance. Index Terms: Eigenfaces, face recognition, independent component analysis (ICA), principal component analysis (PCA), unsupervised learning.
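The two architectures differ only in the orientation of the data matrix handed to ICA. The sketch below uses symmetric FastICA with a tanh nonlinearity as a standard stand-in for the infomax algorithm the paper used; both aim to recover independent components, but this is not the authors' implementation.

```python
import numpy as np

def whiten(X):
    """PCA whitening: rows of X are random variables, columns are observations."""
    Xc = X - X.mean(axis=1, keepdims=True)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    K = (U / s).T * np.sqrt(Xc.shape[1])         # whitening matrix
    return K @ Xc, K

def fastica(X, n_iter=200, seed=0):
    """Symmetric FastICA with tanh nonlinearity (a generic substitute for
    the paper's infomax ICA)."""
    Z, _ = whiten(X)
    n, m = Z.shape
    rng = np.random.default_rng(seed)
    W = np.linalg.qr(rng.standard_normal((n, n)))[0]
    for _ in range(n_iter):
        G = np.tanh(W @ Z)
        W_new = G @ Z.T / m - np.diag((1 - G**2).mean(axis=1)) @ W
        U, _, Vt = np.linalg.svd(W_new)
        W = U @ Vt                               # symmetric decorrelation
    return W @ Z
```

Architecture I corresponds to passing the matrix with images as rows (images as random variables, pixels as outcomes); Architecture II to passing its transpose.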
Nonlinear Neurons in the Low-Noise Limit: A Factorial Code Maximizes Information Transfer
, 1994
Abstract

Cited by 163 (18 self)
We investigate the consequences of maximizing information transfer in a simple neural network (one input layer, one output layer), focussing on the case of nonlinear transfer functions. We assume that both receptive fields (synaptic efficacies) and transfer functions can be adapted to the environment. The main result is that, for bounded and invertible transfer functions, in the case of a vanishing additive output noise, and no input noise, maximization of information (Linsker's infomax principle) leads to a factorial code, hence to the same solution as required by the redundancy reduction principle of Barlow. We show also that this result is valid for linear, more generally unbounded, transfer functions, provided optimization is performed under an additive constraint, that is, a constraint which can be written as a sum of terms, each one specific to one output neuron. Finally, we study the effect of a nonzero input noise. We find that, at first order in the input noise, assumed to be small ...
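The zero-noise result has a concrete one-neuron reading: the infomax-optimal invertible transfer function equals the input's cumulative distribution, so the output is uniform on (0, 1), the maximum-entropy bounded code. The sketch below assumes a standard normal input; the function name is illustrative.

```python
import numpy as np
from math import erf, sqrt

def optimal_transfer(x):
    """Zero-noise infomax for one output unit: the transfer function is the
    input CDF (here, standard normal), making the output uniform on (0, 1)
    and its marginal entropy maximal."""
    return np.array([0.5 * (1.0 + erf(v / sqrt(2.0))) for v in x])
```

With several outputs, making each marginal uniform and the outputs jointly independent is exactly the factorial code the abstract refers to.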
Point-to-point connectivity between neuromorphic chips using address-events
 IEEE Trans. Circuits Syst. II
, 2000
Abstract

Cited by 128 (19 self)
I discuss connectivity between neuromorphic chips, which use the timing of fixed-height, fixed-width pulses to encode information. Address-events (log2(N)-bit packets that uniquely identify one of N neurons) are used to transmit these pulses in real time on a random-access, time-multiplexed communication channel. Activity is assumed to consist of neuronal ensembles: spikes clustered in space and in time. I quantify trade-offs faced in allocating bandwidth, granting access, and queuing, as well as throughput requirements, and conclude that an arbitered channel design is the best choice. I implement the arbitered channel with a formal design methodology for asynchronous digital VLSI CMOS systems, after introducing the reader to this top-down synthesis technique. Following the evolution of three generations of designs, I show how the overhead of arbitrating, encoding, and decoding can be reduced in area (from N to √N) by organizing neurons into rows and columns, and reduced in time (from log2(N) to 2) by exploiting locality in the arbiter tree and in the row-column architecture, and clustered activity. Throughput is boosted by pipelining and by reading spikes in parallel. Simple techniques that reduce crosstalk in these mixed analog-digital systems are described.
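The row-column organization behind the area savings can be illustrated with a toy address-event encoding: a spike is sent as a single word holding the neuron's row and column addresses. This is a generic sketch, not any specific chip's packet format.

```python
from math import ceil, log2

def encode_event(row, col, n_rows, n_cols):
    """Pack a spike from neuron (row, col) into one address-event word.
    Splitting the address into row and column fields is what lets the
    encoder and arbiter grow as sqrt(N) rather than N."""
    row_bits = ceil(log2(n_rows))
    col_bits = ceil(log2(n_cols))
    return (row << col_bits) | col, row_bits + col_bits

def decode_event(word, n_cols):
    """Recover (row, col) from an address-event word."""
    col_bits = ceil(log2(n_cols))
    return word >> col_bits, word & ((1 << col_bits) - 1)

# e.g. a 64x64 array: 4096 neurons addressed in 12 bits
word, bits = encode_event(5, 9, 64, 64)
```

Timing is implicit: because events are transmitted as they occur, the channel itself carries the spike times.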
Responses of Neurons in Primary and Inferior Temporal Visual Cortices to Natural Scenes
, 1997
Abstract

Cited by 125 (6 self)
It has been suggested that visual representations are optimised to transmit the maximum information about the images encountered in everyday life (Uttley, 1973; Linsker, 1988; Barlow, 1989). This simple assumption has proven sufficient to account for the characteristics of large monopolar cells in the fly (Srinivasan et al., 1982; van Hateren, 1992; Laughlin, 1981), the temporal characteristics of retinal ganglion cells (Dong & Atick, 1995), human spatial frequency thresholds (Atick & Redlich, 1992; van Hateren, 1993), and the psychophysics of orientation perception for short presentation times (Baddeley & Hancock, 1991). Maximisation of information is a powerful theoretical principle that leads to testable predictions about the firing patterns of neurons. However, to generate specific predictions we must make some assumptions about the nature of the neural code and the type of constraint that limits its information-carrying capacity. To appl ...
Origins of Scaling in Natural Images
, 1997
Abstract

Cited by 110 (3 self)
One of the most robust qualities of our visual world is the scale-invariance of natural images. Not only has scaling been found in different visual environments, but the phenomenon also appears to be calibration independent. This paper proposes a simple property of natural images which explains this robustness: They are collages of regions corresponding to statistically independent "objects". Evidence is provided for these objects having a power-law distribution of sizes within images, from which follows scaling in natural images. It is commonly suggested that scaling instead results from edges, each with power spectrum 1/k². This hypothesis is refuted by example.
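The 1/k² spectrum of a single edge, the hypothesis the paper argues against as an explanation of natural-image scaling, is easy to check numerically. The sketch below averages the power spectrum of a unit step placed at random positions; all parameter values are arbitrary.

```python
import numpy as np

def mean_step_spectrum(n=1024, trials=200, seed=0):
    """Average power spectrum of a 1-D unit step edge at random positions:
    a numerical check of the textbook 1/k^2 falloff of an isolated edge."""
    rng = np.random.default_rng(seed)
    acc = np.zeros(n // 2)
    for _ in range(trials):
        x = np.zeros(n)
        x[:rng.integers(1, n)] = 1.0               # step at a random position
        acc += np.abs(np.fft.fft(x)[:n // 2]) ** 2
    return acc / trials

spec = mean_step_spectrum()
k = np.arange(1, 65)
slope = np.polyfit(np.log(k), np.log(spec[k]), 1)[0]   # expect a slope near -2
```

The paper's point is that whole images scale for a different reason: a power-law distribution of independent object sizes, not edges alone.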
Ecological statistics of Gestalt laws for the perceptual organization of contours
, 2002
Abstract

Cited by 99 (6 self)
Although numerous studies have measured the strength of visual grouping cues for controlled psychophysical stimuli, little is known about the statistical utility of these various cues for natural images. In this study, we conducted experiments in which human participants traced perceived contours in natural images. These contours are automatically mapped to sequences of discrete tangent elements detected in the image. By examining relational properties between pairs of successive tangents on these traced curves, and between randomly selected pairs of tangents, we are able to estimate the likelihood distributions required to construct an optimal Bayesian model for contour grouping. We employed this novel methodology to investigate the inferential power of three classical Gestalt cues for contour grouping: proximity, good continuation, and luminance similarity. The study yielded a number of important results: (1) these cues, when appropriately defined, are approximately uncorrelated, suggesting a simple factorial model for statistical inference; (2) moderate image-to-image variation of the statistics indicates the utility of general probabilistic models for perceptual organization; (3) these cues differ greatly in their inferential power, proximity being by far the most powerful; and (4) statistical modeling of the proximity cue indicates a scale-invariant power law in close agreement with prior psychophysics.
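The factorial model in result (1) has a simple operational form: because the cues are approximately uncorrelated, the posterior odds that two tangents belong to the same contour is the prior odds times one likelihood ratio per cue. The numeric ratios below are illustrative only, not the paper's measured statistics.

```python
def grouping_probability(prior_odds, likelihood_ratios):
    """Posterior probability that two tangents lie on the same contour
    under a factorial cue model: multiply the prior odds by each cue's
    likelihood ratio, then convert odds to probability."""
    odds = prior_odds
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1.0 + odds)

# proximity as a strong cue, good continuation moderate, similarity neutral
p = grouping_probability(1.0, [8.0, 1.5, 1.0])   # odds 12, probability 12/13
```

Result (3) then says the first factor (proximity) dominates the product in practice.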