Learning to detect a salient object
In: Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), 2007
Cited by 32 (1 self)
We study visual attention by detecting a salient object in an input image. We formulate salient object detection as an image segmentation problem, where we separate the salient object from the image background. We propose a set of novel features, including multi-scale contrast, center-surround histogram, and color spatial distribution, to describe a salient object locally, regionally, and globally. A Conditional Random Field is learned to effectively combine these features for salient object detection. We also constructed a large image database containing tens of thousands of images carefully labeled by multiple users. To our knowledge, it is the first large image database for quantitative evaluation of visual attention algorithms. We validate our approach on this image database, which is publicly available with this paper.
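As a rough illustration of a center-surround histogram feature, the sketch below compares a center-window histogram with a surround-window histogram using the chi-square distance. The choice of distance, the function name, and the example histograms are assumptions made for illustration; the abstract does not say how the two histograms are compared.

```python
def chi_square_distance(hist_a, hist_b):
    """Chi-square distance between two histograms defined over the same bins."""
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must have the same number of bins")
    total = 0.0
    for a, b in zip(hist_a, hist_b):
        if a + b > 0:  # skip bins that are empty in both histograms
            total += (a - b) ** 2 / (a + b)
    return 0.5 * total


# Hypothetical normalized color histograms (assumed values, not from the paper).
center = [0.7, 0.2, 0.1, 0.0]
surround = [0.1, 0.2, 0.3, 0.4]
contrast = chi_square_distance(center, surround)  # larger means more distinct
```

A window whose histogram matches its surround scores 0, so only regions that differ from their neighborhood receive a high value.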
Space-time video montage
In CVPR'06, 2006
Cited by 15 (0 self)
Conventional video summarization methods focus predominantly on summarizing videos along the time axis, such as building a movie trailer. The resulting video trailer tends to retain much empty space in the background of the video frames while discarding much informative video content due to size limits. In this paper, we propose a novel space-time video summarization method, which we call space-time video montage. The method simultaneously analyzes both the spatial and temporal information distribution in a video sequence, and extracts the visually informative space-time portions of the input videos. The informative video portions are represented in volumetric layers. The layers are then packed together in a small output video volume such that the total amount of visual information in the video volume is maximized. To achieve the packing process, we develop a new algorithm based upon the first-fit and graph cut optimization techniques. Since our method is able to cut off spatially and temporally less informative portions, it is able to generate much more compact yet highly informative output videos. The effectiveness of our method is validated by extensive experiments over a wide variety of videos.
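For readers unfamiliar with first-fit, here is a minimal sketch of the classical one-dimensional first-fit heuristic. It is not the paper's packing algorithm, which works on volumetric layers and combines first-fit with graph cut; the function name, item sizes, and capacity are hypothetical.

```python
def first_fit(sizes, capacity):
    """Put each item in the first bin with enough room; open a new bin if none fits."""
    bins = []  # each bin is a list of the item sizes placed in it
    for size in sizes:
        if size > capacity:
            raise ValueError("item larger than bin capacity")
        for current in bins:
            if sum(current) + size <= capacity:
                current.append(size)
                break
        else:
            # No existing bin had room, so start a new one.
            bins.append([size])
    return bins


# Hypothetical item sizes packed into bins of capacity 10.
packed = first_fit([4, 8, 1, 4, 2, 1], capacity=10)
```

The heuristic is greedy: items are placed in arrival order and never moved afterwards, which keeps it simple but does not guarantee an optimal packing.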
Differences of monkey and human overt attention under natural conditions
2006
Cited by 12 (2 self)
Rhesus monkeys are widely used as animal models of human attention. Such research rests upon the assumption that similar mechanisms underlie attention in both species. Here, we directly compare the influence of low-level stimulus features on overt attention in monkeys and humans under natural conditions. We recorded eye movements in humans and rhesus monkeys during free viewing of natural images. We find that intrinsic low-level features, such as luminance-contrast, texture-contrast, and saliency as predicted by a standard model, are elevated at fixation points in the majority of images. These correlative effects are not significantly different between species. However, local image modifications affect the two species differently: moderate modifications, which are in the range of natural fluctuations, attract overt attention significantly more strongly in monkeys than in humans. In addition, humans show a higher inter-individual consistency than monkeys regarding which locations they fixate, in spite of the similarity for intrinsic low-level features. Taken together, these data demonstrate that, under natural conditions, low-level stimulus features affect attention in monkeys and humans differently.
Static and Space-time Visual Saliency Detection by Self-Resemblance
Cited by 12 (2 self)
We present a novel unified framework for both static and space-time saliency detection. Our method is a bottom-up approach and computes so-called local regression kernels (i.e., local descriptors) from the given image (or video), which measure the likeness of a pixel (or voxel) to its surroundings. Visual saliency is then computed using this "self-resemblance" measure. The framework results in a saliency map where each pixel (or voxel) indicates the statistical likelihood of saliency of a feature matrix given its surrounding feature matrices. As a similarity measure, matrix cosine similarity (a generalization of cosine similarity) is employed. State-of-the-art performance is demonstrated on commonly used human eye fixation data (static scenes [5] and dynamic scenes [16]) and some psychological patterns.
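Matrix cosine similarity is usually taken to be the Frobenius inner product of two matrices divided by the product of their Frobenius norms. The sketch below implements only that similarity, assuming this definition; it does not reproduce the paper's local regression kernels or its saliency formula, and the example matrices are made up.

```python
import math


def matrix_cosine_similarity(a, b):
    """Frobenius inner product of a and b over the product of their Frobenius norms."""
    inner = sum(x * y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))
    norm_a = math.sqrt(sum(x * x for row in a for x in row))
    norm_b = math.sqrt(sum(y * y for row in b for y in row))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("similarity is undefined for an all-zero matrix")
    return inner / (norm_a * norm_b)


# Hypothetical 2x2 feature matrices (assumed values for illustration only).
center = [[1.0, 2.0], [3.0, 4.0]]
scaled = [[2.0, 4.0], [6.0, 8.0]]       # same direction as center
unrelated = [[4.0, -3.0], [2.0, -1.0]]  # orthogonal to center

high = matrix_cosine_similarity(center, scaled)
low = matrix_cosine_similarity(center, unrelated)
```

Because the measure ignores overall scale, a feature matrix that is a scaled copy of its neighbor counts as fully similar, while an orthogonal one scores zero.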
A rarity-based visual attention map - application to texture description
Proc. IEEE ICIP, 2006
Cited by 8 (7 self)
This paper describes a simple, "pre-cortical" visual attention model, which does not take image directions into account. We compute rarity-based saliency maps and then describe the relation between texture and visual attention. Finally, we decompose the image into several textures with different regularities. Our purpose is to compress textures into images using small repeating patterns. Index Terms: visual attention, texture, saliency
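One simple way to express rarity is self-information: a pixel value that occurs with relative frequency p gets the score -log2(p), so uncommon values score higher. The sketch below assumes this formulation and a flat list of gray levels; the abstract does not give the model's actual formula, and the function name and data are hypothetical.

```python
import math
from collections import Counter


def rarity_map(pixels):
    """Score each pixel by -log2 of the relative frequency of its value."""
    counts = Counter(pixels)
    total = len(pixels)
    return [-math.log2(counts[p] / total) for p in pixels]


# Hypothetical 8-pixel image: seven background pixels and one outlier.
image = [10, 10, 10, 10, 10, 10, 10, 200]
saliency = rarity_map(image)  # the outlier occurs once in eight pixels: 3 bits
```

The score depends only on how often a value occurs, not on where it occurs, which matches the abstract's statement that image directions are not taken into account.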
Visual Saliency Model for Robot Cameras
Cited by 8 (2 self)
Recent years have seen an explosion of research on the computational modeling of human visual attention in task-free conditions, i.e., given an image, predict where humans are likely to look. This area of research could potentially provide general-purpose mechanisms for robots to orient their cameras in open-ended conditions. One difficulty is that most current models of visual saliency are computationally very expensive and not suited to the real-time implementations needed for robotic applications. Here we propose a very fast approximation to a Bayesian model of visual saliency recently proposed in the literature. The approximation can run in real time on current computers at very little computational cost, leaving plenty of CPU cycles for other tasks. We empirically evaluate the potential usefulness of the visual saliency model to control saccades of a camera in social robotics situations. We found that this simple general-purpose saliency model doubled the success rate of the camera: it captured images of people 70% of the time, compared to a 35% success rate when the camera was controlled using an open-loop scheme. After 3 saccades (camera movements), the robot was 96% likely to capture at least one person. The results suggest that visual saliency models may provide a useful front end for camera control in robotics applications.
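The reported figures can be checked with a little arithmetic. The sketch below confirms the doubling (70% versus 35%) and, under the assumption that saccades succeed independently with probability 0.70 each, estimates the chance of capturing at least one person in three saccades. The independence assumption is ours, not the paper's, so the estimate is only a plausibility check against the reported 96%.

```python
saliency_rate = 0.70   # success rate with the saliency model (from the abstract)
open_loop_rate = 0.35  # success rate with the open-loop scheme (from the abstract)

improvement = saliency_rate / open_loop_rate  # ratio of the two success rates


def at_least_one_success(p, n):
    """Probability of one or more successes in n independent trials."""
    return 1.0 - (1.0 - p) ** n


# Assumes independent saccades, which the abstract does not state.
estimate = at_least_one_success(saliency_rate, 3)  # 1 - 0.3**3
```

Under independence the estimate is about 97.3%, slightly above the measured 96%, which would be consistent with consecutive saccades not being fully independent.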
Spatiotemporal Saliency in Dynamic Scenes
2010
Cited by 7 (1 self)
A spatiotemporal saliency algorithm based on a center-surround framework is proposed. The algorithm is inspired by biological mechanisms of motion-based perceptual grouping and extends a discriminant formulation of center-surround saliency previously proposed for static imagery. Under this formulation, the saliency of a location is equated to the power of a predefined set of features to discriminate between the visual stimuli in a center and a surround window, centered at that location. The features are spatiotemporal video patches and are modeled as dynamic textures, to achieve a principled joint characterization of the spatial and temporal components of saliency. The combination of discriminant center-surround saliency with the modeling power of dynamic textures yields a robust, versatile, and fully unsupervised spatiotemporal saliency algorithm, applicable to scenes with highly dynamic backgrounds and moving cameras. The related problem of background subtraction is treated as the complement of saliency detection, by classifying nonsalient (with respect to appearance and motion dynamics) points in the visual field as background. The algorithm is tested for background subtraction on challenging sequences, and shown to substantially outperform various state-of-the-art techniques. Quantitatively, its average error rate is almost half that of the closest competitor.
Detecting Subdimensional Motifs: An Efficient Algorithm for Generalized Multivariate Pattern Discovery
Cited by 6 (0 self)
Discovering recurring patterns in time series data is a fundamental problem for temporal data mining. This paper addresses the problem of locating subdimensional motifs in real-valued, multivariate time series, which requires the simultaneous discovery of sets of recurring patterns along with the corresponding relevant dimensions. While many approaches to motif discovery have been developed, most are restricted to categorical data, univariate time series, or multivariate data in which the temporal patterns span all of the dimensions. In this paper, we present an expected linear-time algorithm that addresses a generalization of multivariate pattern discovery in which each motif may span only a subset of the dimensions. To validate our algorithm, we discuss its theoretical properties and empirically evaluate it using several data sets including synthetic data and motion capture data collected by an on-body inertial sensor.
A Three-Level Computational Attention Model
Proceedings of ICVS Workshop on Computational Attention & Applications (WCAA)
Cited by 4 (4 self)
This article presents a biologically motivated three-level computational attention model architecture based on rarity and the information theory framework. It mainly focuses on a low-level step, which aims to quickly highlight important areas, and a middle-level step, which analyses the behaviour of the detected areas. Their application to both still images and videos provides results to be used by the third, high-level step.

