Results 1-10 of 494
Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope
International Journal of Computer Vision, 2001
Cited by 666 (62 self)
In this paper, we propose a computational model of the recognition of real-world scenes that bypasses the segmentation and the processing of individual objects or regions. The procedure is based on a very low-dimensional representation of the scene, which we term the Spatial Envelope. We propose a set of perceptual dimensions (naturalness, openness, roughness, expansion, ruggedness) that represent the dominant spatial structure of a scene. Then, we show that these dimensions may be reliably estimated using spectral and coarsely localized information. The model generates a multidimensional space in which scenes sharing membership in semantic categories (e.g., streets, highways, coasts) are projected close together. The performance of the spatial envelope model shows that specific information about object shape or identity is not a requirement for scene categorization and that modeling a holistic representation of the scene informs about its probable semantic category.
Sparse coding with an overcomplete basis set: a strategy employed by V1
Vision Research, 1997
Cited by 591 (7 self)
The spatial receptive fields of simple cells in mammalian striate cortex have been reasonably well described physiologically and can be characterized as being localized, oriented, and bandpass, comparable with the basis functions of wavelet transforms. Previously, we have shown that these receptive field properties may be accounted for in terms of a strategy for producing a sparse distribution of output activity in response to natural images. Here, in addition to describing this work in a more expansive fashion, we examine the neurobiological implications of sparse coding. Of particular interest is the case when the code is overcomplete, i.e., when the number of code elements is greater than the effective dimensionality of the input space. Because the basis functions are non-orthogonal and not linearly independent of each other, sparsifying the code will recruit only those basis functions necessary for representing a given input, and so the input-output function will deviate from being purely linear. These deviations from linearity provide a potential explanation for the weak forms of nonlinearity observed in the response properties of cortical simple cells, and they further make predictions about the expected interactions among units in ...
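The recruitment effect described in this abstract can be illustrated with a small sketch. It is not the authors' algorithm: it swaps in an L1 penalty minimized by iterative soft thresholding (ISTA), and the three-atom dictionary `ATOMS`, the penalty weight `lam`, and the step size are all assumptions chosen for illustration.

```python
import math

# Hypothetical overcomplete dictionary: three unit-norm atoms in a 2-D input space.
ATOMS = [
    (1.0, 0.0),
    (0.0, 1.0),
    (1.0 / math.sqrt(2.0), 1.0 / math.sqrt(2.0)),
]


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (proximal operator of the L1 norm)."""
    return math.copysign(max(abs(value) - threshold, 0.0), value)


def sparse_code(x, atoms, lam=0.1, step=0.5, iters=500):
    """Minimize 0.5 * ||x - sum_k a_k * atom_k||^2 + lam * ||a||_1 with ISTA.

    step=0.5 assumes the largest eigenvalue of the dictionary Gram matrix
    is 2, which holds for ATOMS above; other dictionaries need their own step.
    """
    dim = len(x)
    coeffs = [0.0] * len(atoms)
    for _ in range(iters):
        # Reconstruct the input from the current code and form the residual.
        recon = [sum(c * atom[i] for c, atom in zip(coeffs, atoms)) for i in range(dim)]
        resid = [x[i] - recon[i] for i in range(dim)]
        # Gradient step on the squared error, then shrinkage for sparsity.
        coeffs = [
            soft_threshold(
                c + step * sum(atom[i] * resid[i] for i in range(dim)),
                step * lam,
            )
            for c, atom in zip(coeffs, atoms)
        ]
    return coeffs


# An input aligned with the third atom recruits only that atom.
code = sparse_code([1.0, 1.0], ATOMS)
```

For this input the two axis-aligned atoms end at zero and only the diagonal atom stays active, even though a purely linear (pseudo-inverse) code would spread activity across all three. The mapping from input to code is therefore nonlinear.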
Distortion invariant object recognition in the dynamic link architecture
IEEE Transactions on Computers, 1993
Cited by 491 (54 self)
We present an object recognition system based ...
Feature detection with automatic scale selection
International Journal of Computer Vision, 1998
Cited by 488 (26 self)
The fact that objects in the world appear in different ways depending on the scale of observation has important implications if one aims at describing them. It shows that the notion of scale is of utmost importance when processing unknown measurement data by automatic methods. In their seminal works, Witkin (1983) and Koenderink (1984) proposed to approach this problem by representing image structures at different scales in a so-called scale-space representation. Traditional scale-space theory building on this work, however, does not address the problem of how to select local appropriate scales for further analysis. This article proposes a systematic methodology for dealing with this problem. A framework is proposed for generating hypotheses about interesting scale levels in image data, based on a general principle stating that local extrema over scales of different combinations of γ-normalized derivatives are likely candidates to correspond to interesting structures. Specifically, it is shown how this idea can be used as a major mechanism in algorithms for automatic scale selection, which ...
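A minimal sketch of the scale-selection principle, under strong simplifying assumptions: the image is a single unit-mass 2-D Gaussian blob of variance `t0`, the differential expression is the Laplacian, and the response at the blob centre is evaluated in closed form rather than by filtering an image. The function names and the scale grid are hypothetical.

```python
import math


def normalized_laplacian_at_centre(t, t0, gamma=1.0):
    """Magnitude of the gamma-normalized Laplacian at the centre of a Gaussian blob.

    Smoothing a unit-mass 2-D Gaussian of variance t0 with a Gaussian kernel of
    variance t gives a Gaussian of variance t0 + t. Its Laplacian at the centre
    is -1 / (pi * (t0 + t)**2). Gamma-normalization multiplies a second-order
    derivative by t**gamma.
    """
    return (t ** gamma) / (math.pi * (t0 + t) ** 2)


def select_scale(t0, scales, gamma=1.0):
    """Return the scale (variance) at which the normalized response peaks."""
    return max(scales, key=lambda t: normalized_laplacian_at_centre(t, t0, gamma))


# Candidate scales t = 0.25, 0.5, ..., 20.0 (variances of the smoothing kernel).
scales = [0.25 * k for k in range(1, 81)]
selected = select_scale(4.0, scales)
```

With `gamma = 1` the selected scale equals the blob variance, so the extremum over scales recovers the size of the structure. Without the normalization factor the response would decrease monotonically with `t`, and no interior extremum would exist.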
The "Independent Components" of Natural Scenes are Edge Filters
1997
Cited by 477 (27 self)
It has previously been suggested that neurons with line and edge selectivities found in primary visual cortex of cats and monkeys form a sparse, distributed representation of natural scenes, and it has been reasoned that such responses should emerge from an unsupervised learning algorithm that attempts to find a factorial code of independent visual features. We show here that a new unsupervised learning algorithm based on information maximization, a nonlinear "infomax" network, when applied to an ensemble of natural scenes produces sets of visual filters that are localized and oriented. Some of these filters are Gabor-like and resemble those produced by the sparseness-maximization network. In addition, the outputs of these filters are as independent as possible, since this infomax network performs Independent Components Analysis (ICA) for sparse (super-Gaussian) component distributions. We compare the resulting ICA filters and their associated basis functions with other decorrelating filters produced by Principal Components Analysis (PCA) and zero-phase whitening filters (ZCA). The ICA filters have more sparsely distributed (kurtotic) outputs on natural scenes. They also resemble the receptive fields of simple cells in visual cortex, which suggests that these neurons form a natural, information-theoretic ...
Image denoising using a scale mixture of Gaussians in the wavelet domain
IEEE Trans Image Processing, 2003
Cited by 350 (18 self)
We describe a method for removing noise from digital images, based on a statistical model of the coefficients of an overcomplete multiscale oriented basis. Neighborhoods of coefficients at adjacent positions and scales are modeled as the product of two independent random variables: a Gaussian vector and a hidden positive scalar multiplier. The latter modulates the local variance of the coefficients in the neighborhood, and is thus able to account for the empirically observed correlation between the coefficient amplitudes. Under this model, the Bayesian least squares estimate of each coefficient reduces to a weighted average of the local linear estimates over all possible values of the hidden multiplier variable. We demonstrate through simulations with images contaminated by additive white Gaussian noise that the performance of this method substantially surpasses that of previously published methods, both visually and in terms of mean squared error.
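The estimator described in this abstract can be written out as follows. The notation is an assumption, not taken from the listing: y is the observed noisy neighborhood, x the clean coefficients, u a zero-mean Gaussian vector with covariance C_u, z the hidden positive multiplier, and w zero-mean Gaussian noise with covariance C_w.

```latex
\begin{align}
  \mathbf{y} &= \mathbf{x} + \mathbf{w} = \sqrt{z}\,\mathbf{u} + \mathbf{w}, \\
  \mathbb{E}[\mathbf{x} \mid \mathbf{y}, z]
    &= z\,\mathbf{C}_u \left( z\,\mathbf{C}_u + \mathbf{C}_w \right)^{-1} \mathbf{y}, \\
  \mathbb{E}[\mathbf{x} \mid \mathbf{y}]
    &= \int_0^\infty p(z \mid \mathbf{y})\, \mathbb{E}[\mathbf{x} \mid \mathbf{y}, z]\, dz .
\end{align}
```

Conditioned on z, the model is jointly Gaussian, so the inner expectation is a local linear (Wiener) estimate. The outer integral is the weighted average over multiplier values that the abstract refers to.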
Nonnegative matrix factorization with sparseness constraints
Jour. of, 2004
"... www.cs.helsinki.fi/patrik.hoyer ..."
Independent Component Filters Of Natural Images Compared With Simple Cells In Primary Visual Cortex
1998
Cited by 273 (0 self)
In this article we investigate to what extent the statistical properties of natural images can be used to understand the variation of receptive field properties of simple cells in the mammalian primary visual cortex. The receptive fields of simple cells have been studied extensively (e.g., Hubel & Wiesel 1968, DeValois et al. 1982a, DeAngelis et al. 1993): they are localised in space and time, have bandpass characteristics in the spatial and temporal frequency domains, are oriented, and are often sensitive to the direction of motion of a stimulus. Here we will concentrate on the spatial properties of simple cells. Several hypotheses as to the function of these cells have been proposed. As the cells preferentially respond to oriented edges or lines, they can be viewed as edge or line detectors. Their joint localisation in both the spatial domain and the spatial frequency domain has led to the suggestion that they mimic Gabor filters, minimising uncertainty in both domains (Daugman 1980, Marcelja 1980). More recently, the match between the operations performed by simple cells and the wavelet transform has attracted attention (e.g., Field 1993). The approaches based on Gabor filters and wavelets basically consider processing by the visual cortex as a general image processing strategy, relatively independent of detailed assumptions about image statistics. On the other hand, the edge and line detector hypothesis is based on the intuitive notion that edges and lines are both abundant and important in images. This theme of relating simple cell properties with the statistics of natural images was explored extensively by Field (1987, 1994). He proposed that the cells are optimized specifically for coding natural images. He argued that one possibility for such a code, sparse coding...
Statistics of Natural Images and Models
Cited by 198 (5 self)
Large calibrated datasets of 'random' natural images have recently become available. These make possible precise and intensive statistical studies of the local nature of images. We report results ranging from the simplest single-pixel intensity to the joint distribution of 3 Haar wavelet responses. Some of these statistics shed light on old issues such as the near scale-invariance of image statistics and some are entirely new. We fit mathematical models to some of the statistics and explain others in terms of local image features.
Contextual Priming for Object Detection
IJCV, 2003
Cited by 194 (19 self)
There is general consensus that context can be a rich source of information about an object's identity, location and scale. In fact, the structure of many real-world scenes is governed by strong configurational rules akin to those that apply to a single object. Here we introduce a simple framework for modeling the relationship between context and object properties based on the correlation between the statistics of low-level features across the entire scene and the objects that it contains. The resulting scheme serves as an effective procedure for object priming, context-driven focus of attention and automatic scale selection on real-world scenes.