Results 1 - 10 of 24
Geometric Context from a Single Image
- In ICCV, 2005
Cited by 111 (27 self)
Many computer vision algorithms limit their performance by ignoring the underlying 3D geometric structure in the image. We show that we can estimate the coarse geometric properties of a scene by learning appearance-based models of geometric classes, even in cluttered natural scenes. Geometric classes describe the 3D orientation of an image region with respect to the camera. We provide a multiple-hypothesis framework for robustly estimating scene structure from a single image and obtaining confidences for each geometric label. These confidences can then be used to improve the performance of many other applications. We provide a thorough quantitative evaluation of our algorithm on a set of outdoor images and demonstrate its usefulness in two applications: object detection and automatic single-view reconstruction.
Putting objects in perspective
- In CVPR, 2006
Cited by 106 (10 self)
Image understanding requires not only individually estimating elements of the visual world but also capturing the interplay among them. In this paper, we provide a framework for placing local object detection in the context of the overall 3D scene by modeling the interdependence of objects, surface orientations, and camera viewpoint. Most object detection methods consider all scales and locations in the image as equally likely. We show that with probabilistic estimates of 3D geometry, both in terms of surfaces and world coordinates, we can put objects into perspective and model the scale and location variance in the image. Our approach reflects the cyclical nature of the problem by allowing probabilistic object hypotheses to refine geometry and vice-versa. Our framework allows painless substitution of almost any object detector and is easily extended to include other aspects of image understanding. Our results confirm the benefits of our integrated approach.
A Stochastic Grammar of Images
- Foundations and Trends in Computer Graphics and Vision, 2006
Cited by 38 (8 self)
This exploratory paper quests for a stochastic and context sensitive grammar of images. The grammar should achieve the following four objectives and thus serves as a unified framework of representation, learning, and recognition for a large number of object categories. (i) The grammar represents both the hierarchical decompositions from scenes, to objects, parts, primitives and pixels by terminal and non-terminal nodes and the contexts for spatial and functional relations by horizontal links between the nodes. It formulates each object category as the set of all possible valid configurations produced by the grammar. (ii) The grammar is embodied in a simple And–Or graph representation where each Or-node points to alternative sub-configurations and an And-node is decomposed into a number of components. This representation supports recursive top-down/bottom-up procedures for image parsing under the Bayesian framework and makes it convenient to scale
ELVIS: Eigenvectors for Land Vehicle Image System
- 1995
Cited by 22 (2 self)
ELVIS (Eigenvectors for Land Vehicle Image System) is a road-following system designed to drive the CMU Navlabs. It is based on ALVINN, the neural network road-following system built by Dean Pomerleau at CMU. ALVINN provided the motivation for creating ELVIS: although ALVINN is successful, it is not entirely clear why the system works. ELVIS is an attempt to more fully understand ALVINN and to determine whether it is possible to design a system that can rival ALVINN using the same input and output, but without using a neural network. Like ALVINN, ELVIS observes the road through a video camera and observes human steering response through encoders mounted on the steering column. After a few minutes of observing the human trainer, ELVIS can take control. ELVIS learns the eigenvectors of the image and steering training set via principal component analysis. These eigenvectors roughly correspond to the primary features of the image set and their correlations to steering. Road-following is th...
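The eigenvector idea in the ELVIS abstract can be illustrated with a minimal sketch. It is not the ELVIS implementation: the abstract does not give the image size, the number of eigenvectors kept, or how steering is read out, so the toy data, the use of a single leading eigenvector found by power iteration, and the hypothetical helpers `leading_eigenvector` and `predict_steering` are all assumptions made for illustration.

```python
import math
import random


def leading_eigenvector(rows, iterations=200, seed=0):
    """Return (mean, unit vector) for the largest-variance direction of rows.

    Each row is an image flattened to pixels with the steering value appended.
    Power iteration is an assumed stand-in for a full PCA solver.
    """
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
    for _ in range(iterations):
        # Apply the covariance matrix without forming it: C v = X^T (X v) / n.
        proj = [sum(c[j] * v[j] for j in range(d)) for c in centered]
        w = [sum(centered[i][j] * proj[i] for i in range(n)) / n
             for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return mean, v


def predict_steering(image, mean, component):
    """Estimate steering from the image part of one joint eigenvector."""
    d = len(image)
    comp_img, comp_steer = component[:d], component[d]
    centered = [p - m for p, m in zip(image, mean[:d])]
    energy = sum(c * c for c in comp_img)
    if energy == 0.0:
        return mean[d]
    coeff = sum(x * c for x, c in zip(centered, comp_img)) / energy
    return mean[d] + coeff * comp_steer


# Toy training set (hypothetical): three 3-pixel "images" in which one bright
# pixel marks the road position, each followed by the recorded steering value.
TRAINING = [
    [1.0, 0.0, 0.0, -1.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
]
MEAN, COMPONENT = leading_eigenvector(TRAINING)
```

Calling `predict_steering([1.0, 0.0, 0.0], MEAN, COMPONENT)` projects the new image onto the image part of the eigenvector and reads the steering value off the same eigenvector's last entry, which is one simple way to use the image-to-steering correlation the abstract mentions.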
Fast and Robust Segmentation of Natural Color Scenes
- In Proceedings of the 3rd Asian Conference on Computer Vision, 1997
Cited by 22 (2 self)
This paper describes our entire color segmentation system, called CSC (Color Structure Code), in detail. In Section 2 we introduce the hexagonal, hierarchical island structure on which our method is based. Section 3 describes the actual segmentation method. In Section 4 the new color similarity measure is presented. Section 5 discusses the complexity of our approach. The system is very fast and thus applicable to real-world problems. Finally, we present some results and conclusions in Section 6.
3-d vision techniques for autonomous vehicles
- 1988
Cited by 16 (0 self)
... those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the funding agencies.
Image Interpretation Using Bayesian Networks
- IEEE Transactions on Pattern Analysis and Machine Intelligence, 1996
Cited by 16 (1 self)
The problem of image interpretation is one of inference with the help of domain knowledge. In this correspondence, we formulate the problem as the maximum a posteriori (MAP) estimate of a properly defined probability distribution function. We show that a Bayesian network can be used to represent this p.d.f. as well as the domain knowledge needed for interpretation. The Bayesian network may be relaxed to obtain the set of optimum interpretations.
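As a rough illustration of MAP interpretation over a Bayesian network, the sketch below labels two stacked image regions by exhaustively enumerating all labelings. Everything in it is assumed for illustration: the two-region network, the label and color vocabularies, and every probability value are hypothetical, and brute-force enumeration stands in for the relaxation procedure the abstract mentions, which is not described there.

```python
from itertools import product

LABELS = ("sky", "road")

# Hypothetical domain knowledge: P(top label) and P(bottom label | top label).
PRIOR_TOP = {"sky": 0.7, "road": 0.3}
BOTTOM_GIVEN_TOP = {
    "sky": {"sky": 0.4, "road": 0.6},
    "road": {"sky": 0.05, "road": 0.95},
}

# Hypothetical appearance model: P(observed color | region label).
COLOR_GIVEN_LABEL = {
    "sky": {"blue": 0.8, "gray": 0.2},
    "road": {"blue": 0.1, "gray": 0.9},
}


def joint(top, bottom, top_color, bottom_color):
    """Joint probability of a labeling and the observed region colors."""
    return (PRIOR_TOP[top]
            * BOTTOM_GIVEN_TOP[top][bottom]
            * COLOR_GIVEN_LABEL[top][top_color]
            * COLOR_GIVEN_LABEL[bottom][bottom_color])


def map_labels(top_color, bottom_color):
    """Return the (top, bottom) labeling that maximizes the posterior.

    The evidence term is the same for every labeling, so maximizing the
    joint probability is equivalent to maximizing the posterior.
    """
    return max(product(LABELS, repeat=2),
               key=lambda lab: joint(lab[0], lab[1], top_color, bottom_color))
```

For example, `map_labels("blue", "gray")` compares the four possible labelings and returns the one with the highest joint probability under these made-up tables.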
FPGA Implementation of an Adaptable-Size Neural Network
- In Proceedings of the International Conference on Artificial Neural Networks ICANN96, 1996
Cited by 15 (7 self)
Artificial neural networks achieve fast parallel processing via massively parallel non-linear computational elements. Most neural network models base their ability to adapt to problems on changing the strength of the interconnections between computational elements according to a given learning algorithm. However, constrained interconnection structures may limit such ability. Field programmable hardware devices allow the implementation of neural networks with in-circuit structure adaptation. This paper describes an FPGA implementation of the FAST (Flexible Adaptable-Size Topology) architecture, a neural network that dynamically changes its size. Since initial experiments indicated a good performance on pattern clustering tasks, we have applied our dynamic-structure FAST neural network to an image segmentation and recognition problem.
Feature Representation and Signal Classification in Fluorescence In-Situ Hybridization Image Analysis
- IEEE Trans. Syst. Man Cybernet. A, 2001
Cited by 12 (7 self)
Fast and accurate analysis of fluorescence in-situ hybridization (FISH) images for signal counting will depend mainly upon two components: a classifier to discriminate between artifacts and valid signals of several fluorophores (colors), and well discriminating features to represent the signals. Our previous work has focused on the first component. To investigate the second component, we evaluate candidate feature sets by illustrating the probability density functions (pdfs) and scatter plots for the features. The analysis provides first insight into dependencies between features, indicates the relative importance of members of a feature set, and helps in identifying sources of potential classification errors. Class separability yielded by different feature subsets is evaluated using the accuracy of several neural network (NN)-based classification strategies, some of them hierarchical, as well as using a feature selection technique making use of a scatter criterion. The complete analysis recommends several intensity and hue features for representing FISH signals. Represented by these features, around 90% of valid signals and artifacts of two fluorophores are correctly classified using the NN. Although applied to cytogenetics, the paper presents a comprehensive, unifying methodology of qualitative and quantitative evaluation of pattern feature representation essential for accurate image classification. This methodology is applicable to many other real-world pattern recognition problems.
Bottom-Up/Top-Down Image Parsing with Attribute Grammar
Cited by 8 (2 self)
This paper presents a simple attribute graph grammar as a generative representation for man-made scenes such as buildings, hallways, kitchens, and living rooms and studies an effective top-down/bottom-up inference algorithm for parsing images in the process of maximizing a Bayesian posterior probability or equivalently minimizing a description length (MDL). This simple grammar has one class of primitives as its terminal nodes, i.e., the projection of planar rectangles in 3-space into the image plane, and six production rules for the spatial layout of the rectangular surfaces. All of the terminal and nonterminal nodes in the grammar are described by attributes for their geometric properties and image appearance. Each production rule is associated with some equations that constrain the attributes of a parent node and those of its children. Given an input image, the inference algorithm computes (or constructs) a parse graph, which includes a parse tree for the hierarchical decomposition and a number of spatial constraints. In the inference algorithm, the bottom-up step detects an excessive number of rectangles as weighted candidates, which are sorted in a certain order and activate top-down predictions of occluded or missing components through the grammar rules. The whole procedure is, in spirit, similar to the data-driven Markov chain Monte Carlo paradigm [39], [33], except that a greedy algorithm is adopted for simplicity. In the experiment, we show that the grammar and top-down inference can largely improve the performance of bottom-up detection. Index Terms: attribute graph grammar, bottom-up/top-down, image parsing, primal sketch, generative model.
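The equivalence this abstract states between maximizing a Bayesian posterior and minimizing a description length can be written out in one line. The symbols are assumptions introduced here, not the paper's notation: W stands for a candidate parse graph and I for the input image.

```latex
\begin{aligned}
W^{*} &= \arg\max_{W} \; p(W \mid I) \\
      &= \arg\max_{W} \; p(I \mid W)\, p(W) \\
      &= \arg\min_{W} \; \bigl[ -\log p(I \mid W) - \log p(W) \bigr]
\end{aligned}
```

The second line drops p(I), which does not depend on W. In the third line, each negative log-probability can be read as a code length, so the bracketed sum is the length of describing the parse graph plus the length of describing the image given that parse graph.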

