R.: Holistic scene understanding for 3D object detection with RGBD cameras
- In: ICCV (2013)
Cited by 21 (4 self)
In this paper, we tackle the problem of indoor scene understanding using RGBD data. Towards this goal, we propose a holistic approach that exploits 2D segmentation, 3D geometry, as well as contextual relations between scenes and objects. Specifically, we extend the CPMC [3] framework to 3D in order to generate candidate cuboids, and develop a conditional random field to integrate information from different sources to classify the cuboids. With this formulation, scene classification and 3D object recognition are coupled and can be jointly solved through probabilistic inference. We test the effectiveness of our approach on the challenging NYU v2 dataset. The experimental results demonstrate that through effective evidence integration and holistic reasoning, our approach achieves substantial improvement over the state-of-the-art.
Fusing Time-of-Flight Depth and Color for Real-Time Segmentation and Tracking
Cited by 20 (1 self)
We present an improved framework for real-time segmentation and tracking by fusing depth and RGB color data. We are able to solve common problems seen in tracking and segmentation of RGB images, such as occlusions, fast motion, and objects of similar color. Our proposed real-time mean shift based algorithm outperforms the current state of the art and is significantly better in difficult scenarios.
Using stereo for object recognition
- Accepted to appear in the Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2010
Cited by 15 (1 self)
There has been significant progress recently in object recognition research, but many of the current approaches still fail for object classes with few distinctive features, and in settings with significant clutter and viewpoint variance. One such setting is visual search in mobile robotics, where tasks such as finding a mug or stapler require robust recognition. The focus of this paper is on integrating stereo vision with appearance-based recognition to increase accuracy and efficiency. We propose a model that utilizes a chamfer-type silhouette classifier weighted by a prior on scale, which is robust to missing stereo depth information. Our approach is validated on a set of challenging indoor scenes containing mugs and shoes, where we find that priors remove a significant number of false positives, improving the average precision by 0.2 on each dataset. We also experiment with an additional classifier by Felzenszwalb et al. [1] to demonstrate the approach’s robustness.
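The idea of weighting a classifier score by a prior on scale can be sketched as follows. This is only an illustration under stated assumptions, not the authors' model: it assumes a pinhole camera, a Gaussian-shaped prior on metric object height, and a neutral weight of 1.0 when stereo depth is missing. All function names and the default mean and sigma are hypothetical.

```python
import math


def metric_height(pixel_height, depth_m, focal_px):
    """Pinhole-camera estimate of real-world height (metres) from image height."""
    return pixel_height * depth_m / focal_px


def scale_prior(pixel_height, depth_m, focal_px, mean_m, sigma_m):
    """Gaussian-shaped weight in (0, 1]; neutral (1.0) when depth is missing.

    Assumption: missing stereo depth is signalled by depth_m being None.
    """
    if depth_m is None:
        return 1.0
    h = metric_height(pixel_height, depth_m, focal_px)
    return math.exp(-0.5 * ((h - mean_m) / sigma_m) ** 2)


def weighted_score(classifier_score, pixel_height, depth_m, focal_px,
                   mean_m=0.10, sigma_m=0.03):
    """Multiply an appearance score by the scale prior.

    mean_m and sigma_m are hypothetical values for a mug-sized object.
    """
    prior = scale_prior(pixel_height, depth_m, focal_px, mean_m, sigma_m)
    return classifier_score * prior
```

A detection whose implied metric size is far from the expected size is suppressed, while a detection with no depth reading keeps its appearance score unchanged.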
Co-inference for Multi-modal Scene Analysis
Cited by 13 (0 self)
We address the problem of understanding scenes from multiple sources of sensor data (e.g., a camera and a laser scanner) in the case where there is no one-to-one correspondence across modalities (e.g., pixels and 3-D points). This is an important scenario that frequently arises in practice not only when two different types of sensors are used, but also when the sensors are not co-located and have different sampling rates. Previous work has addressed this problem by restricting interpretation to a single representation in one of the domains, with augmented features that attempt to encode the information from the other modalities. Instead, we propose to analyze all modalities simultaneously while propagating information across domains during the inference procedure. In addition to the immediate benefit of generating a complete interpretation in all of the modalities, we demonstrate that this co-inference approach also improves performance over the canonical approach.
Structure Discovery in Multi-modal Data: a Region-based Approach
Cited by 13 (3 self)
The ability of a perception system to discern what is important in a scene and what is not is an invaluable asset, with multiple applications in object recognition, people detection and SLAM, among others. In this paper, we aim to analyze all sensory data available to separate a scene into a few physically meaningful parts, which we term structure, while discarding background clutter. In particular, we consider the combination of image and range data, and base our decision on both appearance and 3D shape. Our main contribution is the development of a framework to perform scene segmentation that preserves physical objects using multi-modal data. We combine image and range data using a novel mid-level fusion technique based on the concept of regions that avoids any pixel-level correspondences between data sources. We associate groups of pixels with 3D points into multi-modal regions that we term regionlets, and measure the structure-ness of each regionlet using simple, bottom-up cues from image and range features. We show that the highest-ranked regionlets correspond to the most prominent objects in the scene. We verify the validity of our approach on 105 scenes of household environments.
Using Manipulation Primitives for Brick Sorting in Clutter
Cited by 11 (1 self)
This paper explores the idea of manipulation-aided perception and grasping in the context of sorting small objects on a tabletop. We present a robust pipeline that combines perception and manipulation to accurately sort Duplo bricks by color and size. The pipeline uses two simple motion primitives to manipulate the scene in ways that help the robot to improve its perception. This results in the ability to sort cluttered piles of Duplo bricks accurately. We present experimental results on the PR2 robot comparing brick sorting without the aid of manipulation to sorting with manipulation primitives that show the benefits of the latter, particularly as the degree of clutter in the environment increases.
M.: Improving the Kinect by cross-modal stereo
- In: 22nd British Machine Vision Conference (BMVC) (2011)
Cited by 10 (0 self)
The introduction of the Microsoft Kinect sensor has stirred significant interest in the robotics community. While originally developed as a gaming interface, a high-quality depth sensor and affordable price have made it a popular choice for robotic perception. Its active sensing strategy is very well suited to produce robust and high-frame-rate depth maps for human pose estimation. But the shift to the robotics domain surfaced applications under a wider set of operating conditions than it was originally designed for. We see the sensor fail completely on transparent and specular surfaces, which are very common in everyday household objects. As these items are of great interest in home robotics and assistive technologies, we have investigated methods to reduce and sometimes even eliminate these effects without any modification of the hardware. In particular, we complement the depth estimate within the Kinect by a cross-modal stereo path that we obtain from disparity matching between the included IR and RGB sensors of the Kinect. We investigate how the RGB channels can be combined optimally in order to mimic the image response of the IR sensor by an early fusion scheme of weighted channels as well as a late fusion scheme that computes stereo matches between the different channels independently. We show a strong improvement in the reliability of the depth estimate as well as improved performance on an object segmentation task in a tabletop scenario.
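The early fusion scheme mentioned above, a weighted combination of RGB channels into a single IR-like image, might look roughly like this. It is a minimal sketch: the function name and the default weights are hypothetical placeholders, and the abstract does not state the weights the authors found to be optimal.

```python
def fuse_channels(rgb_image, weights=(0.5, 0.3, 0.2)):
    """Combine R, G, B into one channel by a normalized weighted sum.

    rgb_image: list of rows, each row a list of (r, g, b) tuples.
    weights:   hypothetical per-channel weights; they are normalized to
               sum to 1 so the output stays in the input intensity range.
    """
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    wr, wg, wb = (w / total for w in weights)
    return [[wr * r + wg * g + wb * b for (r, g, b) in row]
            for row in rgb_image]
```

The fused single-channel image would then be passed to an ordinary stereo matcher together with the IR image. A late fusion variant would instead match each channel against the IR image separately and combine the resulting disparities.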
Scalable learning for object detection with GPU hardware
- In: Intelligent Robots and Systems, 2009
Cited by 9 (3 self)
We consider the problem of robotic object detection of such objects as mugs, cups, and staplers in indoor environments. While object detection has made significant progress in recent years, many current approaches involve extremely complex algorithms, and are prohibitively slow when applied to large-scale robotic settings. In this paper, we describe an object detection system that is designed to scale gracefully to large data sets and leverages upward trends in computational power (as exemplified by Graphics Processing Unit (GPU) technology) and memory. We show that our GPU-based detector is up to 90 times faster than a well-optimized software version and can be easily trained on millions of examples. Using inexpensive off-the-shelf hardware, it can recognize multiple object types reliably in just a few seconds per frame.
A unified framework for planning and execution-monitoring of mobile robots
Cited by 7 (4 self)
We present an original integration of high-level planning and execution with incoming perceptual information from vision, SLAM, topological map segmentation and dialogue. The task of the robot system, implementing the integrated model, is to explore unknown areas and report detected objects to an operator by speaking loudly. The knowledge base of the planner maintains a graph-based representation of the metric map that is dynamically constructed via an unsupervised topological segmentation method, and augmented with information about the type and position of detected objects within the map, such as cars or containers. From this knowledge, the cognitive robot can infer strategies, thereby generating parametric plans that are instantiated from the perceptual processes. Finally, a model-based approach for the execution and control of the robot system is proposed to monitor, concurrently, the low-level status of the system and the execution of the activities, in order to achieve the goal instructed by the operator.
High-Resolution Depth Maps Based on TOF-Stereo Fusion
Cited by 7 (1 self)
The combination of range sensors with color cameras can be very useful for robot navigation, semantic perception, manipulation, and telepresence. Several methods of combining range- and color-data have been investigated and successfully used in various robotic applications. Most of these systems suffer from the problems of noise in the range-data and resolution mismatch between the range sensor and the color cameras, since the resolution of current range sensors is much less than the resolution of color cameras. High-resolution depth maps can be obtained using stereo matching, but this often fails to construct accurate depth maps of weakly/repetitively textured scenes, or if the scene exhibits complex self-occlusions. Range sensors provide coarse depth information regardless of presence/absence of texture. The use of a calibrated system, composed of a time-of-flight (TOF) camera and of a stereoscopic camera pair, allows data fusion thus overcoming the weaknesses of both individual sensors. We propose a novel TOF-stereo fusion method based on an efficient seed-growing algorithm which uses the TOF data projected onto the stereo image pair as an initial set of correspondences. These initial “seeds” are then propagated based on a Bayesian model which combines an image similarity score with rough depth priors computed from the low-resolution range data. The overall result is a dense and accurate depth map at the resolution of the color cameras at hand. We show that the proposed algorithm outperforms 2D image-based stereo algorithms and that the results are of higher resolution than off-the-shelf color-range sensors, e.g., Kinect. Moreover, the algorithm potentially exhibits real-time performance on a single CPU.
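One way to picture how a TOF depth reading becomes an initial stereo correspondence is the standard depth-to-disparity relation for a rectified pair, d = f * B / Z. The sketch below is an illustration only, not the paper's seed-growing algorithm: it assumes a rectified stereo pair and a TOF point already expressed in the left camera's frame, and the function names and the margin parameter are hypothetical.

```python
def depth_to_disparity(depth_m, focal_px, baseline_m):
    """Disparity in pixels for a point at depth_m in a rectified stereo pair."""
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    return focal_px * baseline_m / depth_m


def seed_search_range(depth_m, focal_px, baseline_m, margin_px=2.0):
    """Disparity interval around a TOF seed in which to look for a match.

    margin_px is a hypothetical tolerance standing in for a rough depth
    prior derived from the noisy, low-resolution range measurement.
    """
    d = depth_to_disparity(depth_m, focal_px, baseline_m)
    return (max(0.0, d - margin_px), d + margin_px)
```

For example, with a focal length of 800 pixels and a 0.1 m baseline, a point 2 m away maps to a disparity of about 40 pixels, so a matcher seeded from that point would only need to search a narrow band around 40 rather than the full disparity range.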