Shiftable Multi-scale Transforms
1992
Cited by 365 (34 self)
Orthogonal wavelet transforms have recently become a popular representation for multiscale signal and image analysis. One of the major drawbacks of these representations is their lack of translation invariance: the content of wavelet subbands is unstable under translations of the input signal. Wavelet transforms are also unstable with respect to dilations of the input signal, and in two dimensions, rotations of the input signal. We formalize these problems by defining a type of translation invariance that we call "shiftability". In the spatial domain, shiftability corresponds to a lack of aliasing; thus, the conditions under which the property holds are specified by the sampling theorem. Shiftability may also be considered in the context of other domains, particularly orientation and scale. We explore "jointly shiftable" transforms that are simultaneously shiftable in more than one domain. Two examples of jointly shiftable transforms are designed and implemented: a one-dimensional tran...
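A minimal sketch of the translation problem described above. It assumes a one-level orthonormal Haar transform, the simplest orthogonal wavelet; the abstract does not prescribe this filter. Shifting a short pulse by a single sample moves all of its energy into the detail subband, even though the input has only been translated.

```python
import math

def haar_level(x):
    """One level of the orthonormal Haar transform (len(x) must be even).

    Returns (approximation, detail) subbands, each half the input length.
    """
    s = 1.0 / math.sqrt(2.0)
    approx = [s * (x[2 * k] + x[2 * k + 1]) for k in range(len(x) // 2)]
    detail = [s * (x[2 * k] - x[2 * k + 1]) for k in range(len(x) // 2)]
    return approx, detail

def energy(band):
    """Sum of squared coefficients in a subband."""
    return sum(c * c for c in band)

# A short pulse, and the same pulse translated by one sample.
signal = [0, 0, 1, 1, 0, 0, 0, 0]
shifted = signal[-1:] + signal[:-1]  # circular shift right by one sample

_, d0 = haar_level(signal)
_, d1 = haar_level(shifted)
print(energy(d0), energy(d1))  # detail energy jumps from 0 to about 1
```

The pulse aligned with the Haar pairs produces no detail response at all, while the one-sample shift straddles a pair boundary. This is the kind of subband instability the paper formalizes as a lack of shiftability.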
Contour and Texture Analysis for Image Segmentation
2001
Cited by 233 (27 self)
This paper provides an algorithm for partitioning grayscale images into disjoint regions of coherent brightness and texture. Natural images contain both textured and untextured regions, so the cues of contour and texture differences are exploited simultaneously. Contours are treated in the intervening contour framework, while texture is analyzed using textons. Each of these cues has a domain of applicability, so to facilitate cue combination we introduce a gating operator based on the texturedness of the neighborhood at a pixel. Having obtained a local measure of how likely two nearby pixels are to belong to the same region, we use the spectral graph theoretic framework of normalized cuts to find partitions of the image into regions of coherent texture and brightness. Experimental results on a wide range of images are shown.
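The normalized-cut criterion can be illustrated on a toy affinity graph. This sketch uses the standard definition Ncut(A, B) = cut(A, B)/assoc(A, V) + cut(A, B)/assoc(B, V). Unlike the paper, which uses a spectral graph-theoretic relaxation, it simply brute-forces every bipartition. The four-node weight matrix `W` is invented for illustration, and `ncut` and `best_bipartition` are hypothetical helper names.

```python
from itertools import combinations

def ncut(W, A):
    """Normalized cut value of the bipartition (A, rest of V) for affinity matrix W."""
    n = len(W)
    B = [j for j in range(n) if j not in A]
    cut = sum(W[i][j] for i in A for j in B)
    assoc_a = sum(W[i][j] for i in A for j in range(n))
    assoc_b = sum(W[i][j] for i in B for j in range(n))
    return cut / assoc_a + cut / assoc_b

def best_bipartition(W):
    """Exhaustive search (exponential in n, so only sensible for tiny graphs)."""
    n = len(W)
    best = None
    # Keep node 0 in A so each bipartition is visited once.
    for size in range(1, n):
        for rest in combinations(range(1, n), size - 1):
            A = {0, *rest}
            value = ncut(W, A)
            if best is None or value < best[0]:
                best = (value, A)
    return best

# Hypothetical affinities: pixels 0-1 and 2-3 are similar, with a weak 1-2 link.
W = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.1, 0.0],
    [0.0, 0.1, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
value, A = best_bipartition(W)
print(sorted(A), round(value, 4))
```

Normalizing by each side's total association penalizes cutting off tiny, isolated groups. Here the weak link between nodes 1 and 2 is cut rather than peeling off a single node.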
The steerable pyramid: A flexible architecture for multi-scale derivative computation
1995
Cited by 177 (24 self)
We describe an architecture for efficient and accurate linear decomposition of an image into scale and orientation subbands. The basis functions of this decomposition are directional derivative operators of any desired order. We describe the construction and implementation of the transform.¹

Differential algorithms are used in a wide variety of image processing problems. For example, gradient measurements are used as a first stage of many edge detection, depth-from-stereo, and optical flow algorithms. Higher-order derivatives have also been found useful in these applications. Extraction of these derivative quantities may be viewed as a decomposition of a signal via the terms of a local Taylor series expansion [1]. Another widespread tool in signal and image processing is multi-scale decomposition. Apart from the advantages of decomposing signals into information at different scales, the typical recursive form of these algorithms leads to large improvements in computational efficiency. Many authors have combined multi-scale decompositions with differential measurements (e.g., [2, 3]). In these cases, a multi-scale pyramid is constructed, and then differential operators (typically, differences of neighboring pixels) are applied to the subbands of the pyramid. Since both the pyramid decomposition and the derivative operation are linear and shift-invariant, we may combine them into a single operation. The advantage of doing so is that the resulting derivatives may be more accurate (see [4]). In this paper, we propose a simple, efficient decomposition architecture for combining these two operations. The decomposition is the latest incarnation of ...

¹ Source code and filter kernels for implementation of the steerable pyramid are available via anonymous ftp from ftp.cis.upenn.edu:pub/eero/steerpyr.tar.Z
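Because both the pyramid filter and the derivative operator are linear and shift-invariant, the two operations can be merged into one precomputed kernel. This follows from the associativity of convolution. The following is a minimal 1-D sketch. It assumes a binomial smoothing kernel and a central-difference derivative, both chosen only for illustration, and it omits the subsampling a real pyramid would perform.

```python
def convolve(x, h):
    """Full discrete convolution of two finite sequences."""
    out = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            out[i + j] += xi * hj
    return out

smooth = [0.25, 0.5, 0.25]   # binomial low-pass filter (assumed)
deriv = [0.5, 0.0, -0.5]     # central difference (assumed)
signal = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0, 36.0]

two_stage = convolve(convolve(signal, smooth), deriv)  # filter, then differentiate
combined = convolve(signal, convolve(smooth, deriv))   # one merged kernel
assert all(abs(a - b) < 1e-12 for a, b in zip(two_stage, combined))
print(convolve(smooth, deriv))  # the merged derivative-of-smoothing kernel
```

The merged kernel, [0.125, 0.25, 0.0, -0.25, -0.125], is a smoothed derivative filter. Designing such combined filters directly, instead of differencing pyramid samples, is the opportunity for improved accuracy that the abstract points to.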
An Active Vision Architecture based on Iconic Representations
Artificial Intelligence, 1995
Cited by 116 (12 self)
Active vision systems have the capability of continuously interacting with the environment. The rapidly changing environment of such systems means that it is attractive to replace static representations with visual routines that compute information on demand. Such routines place a premium on image data structures that are easily computed and used. The purpose of this paper is to propose a general active vision architecture based on efficiently computable iconic representations. This architecture employs two primary visual routines, one for identifying the visual image near the fovea (object identification), and another for locating a stored prototype on the retina (object location). This design allows complex visual behaviors to be obtained by composing these two routines with different parameters. The iconic representations consist of high-dimensional feature vectors obtained from the responses of an ensemble of Gaussian derivative spatial filters at a number of orientations and...
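The following is a rough sketch of the iconic representation and the object-location routine. It assumes a small bank of first-derivative-of-Gaussian filters at four orientations and two scales, giving 8-D vectors. The paper's ensemble is larger, and its exact orders and scales are truncated in the abstract. Location is done here by exhaustive nearest-neighbour search in Euclidean distance. The random test image and the names `gaussian_deriv_kernel`, `feature_vector`, and `locate` are hypothetical.

```python
import math
import random

def gaussian_deriv_kernel(sigma, theta):
    """First derivative of a 2-D Gaussian along direction theta (radians).

    Returns (radius, kernel) with kernel[i][j] sampled at y = i - r, x = j - r.
    """
    r = math.ceil(3 * sigma)
    c, s = math.cos(theta), math.sin(theta)
    kernel = [[-(x * c + y * s) / sigma**2 * math.exp(-(x * x + y * y) / (2 * sigma**2))
               for x in range(-r, r + 1)]
              for y in range(-r, r + 1)]
    return r, kernel

# Hypothetical filter bank: 4 orientations x 2 scales -> 8-D feature vectors.
BANK = [gaussian_deriv_kernel(sigma, k * math.pi / 4)
        for sigma in (1.0, 1.5) for k in range(4)]
MARGIN = max(r for r, _ in BANK)

def feature_vector(img, row, col):
    """Filter responses (correlations) centred at pixel (row, col)."""
    return [sum(k[i][j] * img[row + i - r][col + j - r]
                for i in range(2 * r + 1) for j in range(2 * r + 1))
            for r, k in BANK]

def locate(img, prototype):
    """Object location: the pixel whose feature vector is nearest the prototype."""
    candidates = [(row, col)
                  for row in range(MARGIN, len(img) - MARGIN)
                  for col in range(MARGIN, len(img[0]) - MARGIN)]
    return min(candidates,
               key=lambda rc: math.dist(feature_vector(img, *rc), prototype))

random.seed(0)
image = [[random.random() for _ in range(24)] for _ in range(24)]
prototype = feature_vector(image, 12, 9)  # "store" the appearance at one pixel
print(locate(image, prototype))           # the stored pixel is recovered
```

Object identification is the dual operation in this sketch. It would compare the vector at the fovea against a library of stored prototypes instead of scanning image positions.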
Deformable Kernels for Early Vision
IEEE Transactions on Pattern Analysis and Machine Intelligence, 1991
Cited by 112 (8 self)
Early vision algorithms often have a first stage of linear filtering that 'extracts' from the image information at multiple scales of resolution and multiple orientations. A common difficulty in the design and implementation of such schemes is that one feels compelled to discretize coarsely the space of scales and orientations in order to reduce computation and storage costs. This discretization produces anisotropies due to a loss of translation-, rotation-, and scaling-invariance, which makes early vision algorithms less precise and more difficult to design. This need not be so: one can compute and store efficiently the response of families of linear filters defined on a continuum of orientations and scales. A technique is presented that allows one (1) to compute the best approximation of a given family using linear combinations of a small number of 'basis' functions; (2) to describe all finite-dimensional families, i.e. the families of filters for which a finite-dimensional representation is p...
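One common way to see how many 'basis' functions a continuous filter family needs is to sample the family at several parameter values and inspect the singular values of the resulting matrix. Here these are obtained as eigenvalues of its Gram matrix via cyclic Jacobi rotations. This is a sketch under stated assumptions: the family is rotated second derivatives of a Gaussian (σ = 1.5, 11×11 grid) sampled at eight orientations, and the abstract does not specify which family or decomposition the paper uses. `g2_kernel` and `jacobi_eigenvalues` are hypothetical helpers.

```python
import math

def g2_kernel(theta, sigma=1.5, r=5):
    """Second derivative of a 2-D Gaussian along direction theta, flattened."""
    c, s = math.cos(theta), math.sin(theta)
    return [((x * c + y * s) ** 2 / sigma**4 - 1 / sigma**2)
            * math.exp(-(x * x + y * y) / (2 * sigma**2))
            for y in range(-r, r + 1) for x in range(-r, r + 1)]

def jacobi_eigenvalues(a, sweeps=30):
    """Eigenvalues of a real symmetric matrix by cyclic Jacobi rotations."""
    a = [row[:] for row in a]
    n = len(a)
    for _ in range(sweeps):
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-300:
                    continue
                # Rotation angle chosen so that the new a[p][q] is zero.
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # A <- A P (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)

family = [g2_kernel(k * math.pi / 8) for k in range(8)]
gram = [[sum(u * v for u, v in zip(f, g)) for g in family] for f in family]
spectrum = jacobi_eigenvalues(gram)
print([round(lam / spectrum[0], 6) for lam in spectrum])
```

Only three eigenvalues are non-negligible, so this family is finite-dimensional: three basis kernels reproduce every orientation. A family without that property would show a slowly decaying spectrum. Truncating it after a few terms would then give an approximation rather than an exact representation.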
Representing local structure using tensors
Computer Vision Laboratory, Linkoping University, 1989
Cited by 92 (25 self)
The fundamental problem of finding a suitable representation of the orientation of 3D surfaces is considered. A representation is regarded as suitable if it meets three basic requirements: uniqueness, uniformity, and polar separability. A suitable tensor representation is given. At the heart of the problem lies the fact that orientation can only be defined mod 180°, i.e., a 180° rotation of a line or a plane amounts to no change at all. For this reason, representing a plane by its normal vector leads to ambiguity, and such a representation is consequently not suitable. The ambiguity can be eliminated by establishing a mapping between R³ and a higher-dimensional tensor space. The uniqueness requirement implies a mapping that maps every pair of 3D vectors x and −x onto the same tensor T. Uniformity implies that the mapping implicitly carries a definition of distance between 3D planes (and lines) that is rotation invariant and monotone with the angle between the planes. Polar separability means that the norm of the representing tensor T is rotation invariant. One way to describe the mapping is that it maps a 3D sphere into 6D in such a way that the surface is uni...
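The three requirements can be checked numerically for one simple candidate mapping, the scaled outer product T = x xᵀ / ‖x‖. Treat this form as an assumption for illustration, since the abstract is truncated before it states the mapping. A symmetric 3×3 tensor has six independent components, which matches the 6D target space mentioned above. For unit vectors at angle φ, the Frobenius distance between their tensors works out to √2·sin φ, which grows monotonically for φ between 0° and 90°.

```python
import math

def orientation_tensor(x):
    """Map a nonzero 3-D vector to the symmetric tensor x x^T / |x| (assumed form)."""
    n = math.sqrt(sum(c * c for c in x))
    return [[xi * xj / n for xj in x] for xi in x]

def frobenius(t):
    """Frobenius norm of a matrix given as a list of rows."""
    return math.sqrt(sum(v * v for row in t for v in row))

def tensor_distance(t1, t2):
    """Frobenius distance between two tensors."""
    return frobenius([[a - b for a, b in zip(r1, r2)] for r1, r2 in zip(t1, t2)])

x = [1.0, 2.0, 2.0]  # |x| = 3
neg_x = [-c for c in x]
T = orientation_tensor(x)

# Uniqueness: x and -x map to the same tensor.
assert T == orientation_tensor(neg_x)
# Polar separability: the tensor norm equals |x|, whatever the direction.
print(frobenius(T))  # close to 3.0
# Uniformity: distance grows monotonically with the angle between orientations.
for deg in (0, 30, 60, 90):
    phi = math.radians(deg)
    a = orientation_tensor([1.0, 0.0, 0.0])
    b = orientation_tensor([math.cos(phi), math.sin(phi), 0.0])
    print(deg, round(tensor_distance(a, b), 4))
```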
Television control by hand gestures
International Workshop on Automatic Face and Gesture Recognition, 1995
Cited by 73 (3 self)
We study how a viewer can control a television set remotely by hand gestures. We address two fundamental issues of gesture–based human–computer interaction: (1) How can one communicate a rich set of commands without extensive user training and memorization of gestures? (2) How can the computer recognize the commands in a complicated visual environment? We made a prototype of this system using a computer workstation and a television. The graphical overlays appear on the computer screen, although they could be mixed with the video to appear on the television. The computer controls the television set through serial port commands to an electronically controlled remote control. We describe knowledge we gained from building the prototype.
Textons, contours and regions: Cue integration in image segmentation
International Conference on Computer Vision, 1999
Cited by 68 (6 self)
This paper makes two contributions. It provides (1) an operational definition of textons, the putative elementary units of texture perception, and (2) an algorithm for partitioning the image into disjoint regions of coherent brightness and texture, where boundaries of regions are defined by peaks in contour orientation energy and differences in texton densities across the contour. Julesz introduced the term texton, analogous to a phoneme in speech recognition, but did not provide an operational definition for gray-level images. Here we re-invent textons as frequently co-occurring combinations of oriented linear filter outputs. These can be learned using a K-means approach. By mapping each pixel to its nearest texton, the image can be analyzed into texton channels, each of which is a point set where discrete techniques such as Voronoi diagrams become applicable. Local histograms of texton frequencies can be used with a χ² test for significant differences to find texture boundaries. Natural images contain both textured and untextured regions, so we combine this cue with that of the presence of peaks of contour energy derived from outputs of odd- and even-symmetric oriented Gaussian derivative filters. Each of these cues has a domain of applicability, so to facilitate cue combination we introduce a gating operator based on a statistical test for isotropy of Delaunay neighbors. Having obtained a local measure of how likely two nearby pixels are to belong to the same region, we use the spectral graph theoretic framework of normalized cuts to find partitions of the image into regions of coherent texture and brightness. Experimental results on a wide range of images are shown.
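The texton steps in this abstract can be sketched in a few lines: cluster filter-response vectors with K-means, map each pixel to its nearest texton, and compare local texton histograms with a χ² statistic. The sketch rests on several assumptions. Toy 2-D "filter responses" stand in for real oriented filter outputs. Initial centers are simply the first k points. The χ² statistic uses the common form ½ Σ (hᵢ − gᵢ)² / (hᵢ + gᵢ), and no significance threshold is applied. All function names are hypothetical.

```python
import math

def nearest(centers, p):
    """Index of the closest center, i.e. the texton label of response vector p."""
    return min(range(len(centers)), key=lambda i: math.dist(centers[i], p))

def kmeans(points, k, iters=20):
    """Plain Lloyd iterations; initial centers are the first k points (assumed)."""
    centers = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(centers, p)].append(p)
        for i, g in enumerate(groups):
            if g:  # keep the old center if a cluster empties
                centers[i] = [sum(c) / len(g) for c in zip(*g)]
    return centers

def histogram(labels, k):
    """Normalized histogram of texton labels."""
    counts = [0] * k
    for lab in labels:
        counts[lab] += 1
    return [c / len(labels) for c in counts]

def chi2(h, g):
    """Chi-squared distance between two normalized histograms."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(h, g) if a + b > 0)

# Hypothetical filter responses from pixels on either side of a texture boundary.
left = [(1.0, 0.1), (0.9, 0.0), (1.1, 0.2), (1.0, 0.0)]
right = [(0.0, 1.0), (0.1, 0.9), (0.2, 1.1), (0.0, 1.0)]
textons = kmeans([left[0], right[0]] + left[1:] + right[1:], k=2)
h_left = histogram([nearest(textons, p) for p in left], 2)
h_right = histogram([nearest(textons, p) for p in right], 2)
print(chi2(h_left, h_left), chi2(h_left, h_right))  # 0.0 for identical, 1.0 for disjoint
```

With this normalization the statistic ranges from 0, for identical histograms, to 1, for histograms with disjoint support. A large value across a candidate contour therefore signals a texture boundary.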
Steerable-Scalable Kernels for Edge Detection and Junction Analysis
Image and Vision Computing, 1992
Cited by 64 (0 self)
Families of kernels that are useful in a variety of early vision algorithms may be obtained by rotating and scaling a 'template' kernel over a continuum of angles and scales. Such a multi-scale, multi-orientation family may be approximated by linear interpolation of a discrete, finite set of appropriate 'basis' kernels. A scheme for generating such a basis together with the appropriate interpolation weights is described. Unlike previous schemes by Perona and by Simoncelli et al., it is guaranteed to generate the most parsimonious basis. Additionally, it is shown how to exploit two symmetries in edge-detection kernels to reduce storage and computational costs and to generate end-stop- and junction-tuned filters simultaneously at no extra cost.
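The following is a concrete instance of interpolating a rotated kernel from a finite basis with explicit weights. The second derivative of a Gaussian can be steered exactly from three copies at 0°, 60°, and 120° using the weights k_j(θ) = (1 + 2 cos 2(θ − θ_j)) / 3. This is a classic closed-form case from the steerable-filter literature, not the paper's most-parsimonious numerical scheme, and it covers rotation only, not scale. The kernel size, σ, and the test angle below are arbitrary choices.

```python
import math

def g2_kernel(theta, sigma=1.0, r=3):
    """Second directional derivative of a 2-D Gaussian along theta, flattened."""
    c, s = math.cos(theta), math.sin(theta)
    return [((x * c + y * s) ** 2 / sigma**4 - 1 / sigma**2)
            * math.exp(-(x * x + y * y) / (2 * sigma**2))
            for y in range(-r, r + 1) for x in range(-r, r + 1)]

BASIS_ANGLES = [0.0, math.pi / 3, 2 * math.pi / 3]
BASIS = [g2_kernel(a) for a in BASIS_ANGLES]

def steer(theta):
    """Interpolate the kernel at angle theta from the three basis kernels."""
    weights = [(1 + 2 * math.cos(2 * (theta - a))) / 3 for a in BASIS_ANGLES]
    return [sum(w * b[i] for w, b in zip(weights, BASIS)) for i in range(len(BASIS[0]))]

theta = math.radians(25)
err = max(abs(u - v) for u, v in zip(steer(theta), g2_kernel(theta)))
print(err)  # floating-point round-off only
```

The interpolation is exact because every rotated copy can be written as a constant term plus cos 2θ and sin 2θ terms, and the three weights reproduce exactly those three functions of θ.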
Distributed Representation and Analysis of Visual Motion
1993
Cited by 58 (3 self)
This thesis describes some new approaches to the representation and analysis of visual motion, as perceived by a biological or machine visual system. We begin by discussing the computation of image motion fields, the projection of motion in the three-dimensional world onto the two-dimensional image plane. This computation is notoriously difficult, and a wide variety of approaches has been developed for use in image processing, machine vision, and biological modeling. We show that a large number of the basic techniques are quite similar in nature, differing primarily in conceptual motivation, and that each of them fails to handle a set of situations that occur commonly in natural scenery. The central theme of the thesis is that the failure of these algorithms is due primarily to the use of vector fields as a representation for visual motion. We argue that the translational vector field representation is inherently impoverished and error-prone. Furthermore, there is evidence that a ...
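Many basic motion estimators rest on the gradient constraint I_x·v + I_t ≈ 0. The following is a minimal 1-D least-squares sketch of that constraint. It is a generic textbook estimator assumed here for illustration, not the distributed representation the thesis proposes. The sinusoidal test pattern, its frequency, and the true velocity are invented values, and `estimate_velocity` is a hypothetical name.

```python
import math

def estimate_velocity(frame0, frame1, xs):
    """Least-squares velocity from the 1-D gradient constraint Ix * v + It = 0.

    frame0, frame1: the signal at two successive times, as functions of position.
    """
    num = den = 0.0
    for x in xs:
        ix = (frame0(x + 1) - frame0(x - 1)) / 2  # central-difference gradient
        it = frame1(x) - frame0(x)                # temporal difference
        num += ix * it
        den += ix * ix
    return -num / den

OMEGA = 2 * math.pi / 40  # spatial frequency of the test pattern (assumed)
TRUE_V = 0.5              # translation in pixels per frame (assumed)

def frame0(x):
    return math.sin(OMEGA * x)

def frame1(x):
    return math.sin(OMEGA * (x - TRUE_V))  # frame0 translated by TRUE_V

v_hat = estimate_velocity(frame0, frame1, range(400))
print(v_hat)  # close to TRUE_V; the small bias comes from the finite differences
```

The estimator returns exactly one velocity per region. That single-vector output is the representational choice the thesis argues breaks down in commonly occurring situations.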

