Results 1  10
of
87
Fast random walk with restart and its applications
 In ICDM ’06: Proceedings of the 6th IEEE International Conference on Data Mining
, 2006
"... How closely related are two nodes in a graph? How to compute this score quickly, on huge, diskresident, real graphs? Random walk with restart (RWR) provides a good relevance score between two nodes in a weighted graph, and it has been successfully used in numerous settings, like automatic captionin ..."
Abstract

Cited by 96 (15 self)
 Add to MetaCart
How closely related are two nodes in a graph? How to compute this score quickly, on huge, diskresident, real graphs? Random walk with restart (RWR) provides a good relevance score between two nodes in a weighted graph, and it has been successfully used in numerous settings, like automatic captioning of images, generalizations to the “connection subgraphs”, personalized PageRank, and many more. However, the straightforward implementations of RWR do not scale for large graphs, requiring either quadratic space and cubic precomputation time, or slow response time on queries. We propose fast solutions to this problem. The heart of our approach is to exploit two important properties shared by many real graphs: (a) linear correlations and (b) blockwise, communitylike structure. We exploit the linearity by using lowrank matrix approximation, and the community structure by graph partitioning, followed by the ShermanMorrison lemma for matrix inversion. Experimental results on the Corel image and the DBLP dabasets demonstrate that our proposed methods achieve significant savings over the straightforward implementations: they can save several orders of magnitude in precomputation and storage cost, and they achieve up to 150x speed up with 90%+ quality preservation. 1
Tagprop: Discriminative metric learning in nearest neighbor models for image autoannotation
 In ICCV
, 2009
"... Image autoannotation is an important open problem in computer vision. For this task we propose TagProp, a discriminatively trained nearest neighbor model. Tags of test images are predicted using a weighted nearestneighbor model to exploit labeled training images. Neighbor weights are based on neig ..."
Abstract

Cited by 73 (13 self)
 Add to MetaCart
Image autoannotation is an important open problem in computer vision. For this task we propose TagProp, a discriminatively trained nearest neighbor model. Tags of test images are predicted using a weighted nearestneighbor model to exploit labeled training images. Neighbor weights are based on neighbor rank or distance. TagProp allows the integration of metric learning by directly maximizing the loglikelihood of the tag predictions in the training set. In this manner, we can optimally combine a collection of image similarity metrics that cover different aspects of image content, such as local shape descriptors, or global color histograms. We also introduce a word specific sigmoidal modulation of the weighted neighbor tag predictions to boost the recall of rare words. We investigate the performance of different variants of our model and compare to existing work. We present experimental results for three challenging data sets. On all three, TagProp makes a marked improvement as compared to the current stateoftheart. 1.
PEGASUS: A PetaScale Graph Mining System Implementation and Observations
 IEEE INTERNATIONAL CONFERENCE ON DATA MINING
, 2009
"... Abstract—In this paper, we describe PEGASUS, an open source Peta Graph Mining library which performs typical graph mining tasks such as computing the diameter of the graph, computing the radius of each node and finding the connected components. As the size of graphs reaches several Giga, Tera or P ..."
Abstract

Cited by 64 (21 self)
 Add to MetaCart
Abstract—In this paper, we describe PEGASUS, an open source Peta Graph Mining library which performs typical graph mining tasks such as computing the diameter of the graph, computing the radius of each node and finding the connected components. As the size of graphs reaches several Giga, Tera or Petabytes, the necessity for such a library grows too. To the best of our knowledge, PEGASUS is the first such library, implemented on the top of the HADOOP platform, the open source version of MAPREDUCE. Many graph mining operations (PageRank, spectral clustering, diameter estimation, connected components etc.) are essentially a repeated matrixvector multiplication. In this paper we describe a very important primitive for PEGASUS, called GIMV (Generalized Iterated MatrixVector multiplication). GIMV is highly optimized, achieving (a) good scaleup on the number of available machines (b) linear running time on the number of edges, and (c) more than 5 times faster performance over the nonoptimized version of GIMV. Our experiments ran on M45, one of the top 50 supercomputers in the world. We report our findings on several real graphs, including one of the largest publicly available Web Graphs, thanks to Yahoo!, with ≈ 6,7 billion edges. KeywordsPEGASUS; graph mining; hadoop I.
Centerpiece subgraphs: Problem definition and fast solutions
 In KDD
, 2006
"... Given Q nodes in a social network (say, authorship network), how can we find the node/author that is the centerpiece, and has direct or indirect connections to all, or most of them? For example, this node could be the common advisor, or someone who started the research area that the Q nodes belong t ..."
Abstract

Cited by 52 (17 self)
 Add to MetaCart
Given Q nodes in a social network (say, authorship network), how can we find the node/author that is the centerpiece, and has direct or indirect connections to all, or most of them? For example, this node could be the common advisor, or someone who started the research area that the Q nodes belong to. Isomorphic scenarios appear in law enforcement (find the mastermind criminal, connected to all current suspects), gene regulatory networks (find the protein that participates in pathways with all or most of the given Q proteins), viral marketing and many more. Connection subgraphs is an important first step, handling the case of Q=2 query nodes. Then, the connection subgraph algorithm finds the b intermediate nodes, that provide a good connection between the two original query nodes. Here we generalize the challenge in multiple dimensions: First, we allow more than two query nodes. Second, we allow a whole family of queries, ranging from ’OR ’ to ’AND’, with ’softAND ’ inbetween. Finally, we design and compare a fast approximation, and study the quality/speed tradeoff. We also present experiments on the DBLP dataset. The experiments confirm that our proposed method naturally deals with multisource queries and that the resulting subgraphs agree with our intuition. Wallclock timing results on the DBLP dataset show that our proposed approximation achieve good accuracy for about 6: 1 speedup. This material is based upon work supported by the
Unsupervised modeling of object categories using link analysis techniques
 In CVPR
, 2008
"... We propose an approach for learning visual models of object categories in an unsupervised manner in which we first build a largescale complex network which captures the interactions of all unit visual features across the entire training set and we infer information, such as which features are in wh ..."
Abstract

Cited by 33 (6 self)
 Add to MetaCart
We propose an approach for learning visual models of object categories in an unsupervised manner in which we first build a largescale complex network which captures the interactions of all unit visual features across the entire training set and we infer information, such as which features are in which categories, directly from the graph by using link analysis techniques. The link analysis techniques are based on wellestablished graph mining techniques used in diverse applications such as WWW, bioinformatics, and social networks. The techniques operate directly on the patterns of connections between features in the graph rather than on statistical properties, e.g., from clustering in feature space. We argue that the resulting techniques are simpler, and we show that they perform similarly or better compared to state of the art techniques on common data sets. We also show results on more challenging data sets than those that have been used in prior work on unsupervised modeling.
Fast DirectionAware Proximity for Graph Mining
, 2007
"... In this paper we study asymmetric proximity measures on directed graphs, which quantify the relationships between two nodes or two groups of nodes. The measures are useful in several graph mining tasks, including clustering, link prediction and connection subgraph discovery. Our proximity measure is ..."
Abstract

Cited by 30 (7 self)
 Add to MetaCart
In this paper we study asymmetric proximity measures on directed graphs, which quantify the relationships between two nodes or two groups of nodes. The measures are useful in several graph mining tasks, including clustering, link prediction and connection subgraph discovery. Our proximity measure is based on the concept of escape probability. This way, we strive to summarize the multiple facets of nodesproximity, while avoiding some of the pitfalls to which alternative proximity measures are susceptible. A unique feature of the measures is accounting for the underlying directional information. We put a special emphasis on computational efficiency, and develop fast solutions that are applicable in several settings. Our experimental study shows the usefulness of our proposed directionaware proximity method for several applications, and that our algorithms achieve a significant speedup (up to 50,000x) over straightforward implementations.
Fast besteffort pattern matching in large attributed graphs
 In KDD
, 2007
"... We focus on large graphs where nodes have attributes, such as a social network where the nodes are labelled with each person’s job title. In such a setting, we want to find subgraphs that match a user query pattern. For example, a ‘star ’ query would be, “find a CEO who has strong interactions with ..."
Abstract

Cited by 25 (12 self)
 Add to MetaCart
We focus on large graphs where nodes have attributes, such as a social network where the nodes are labelled with each person’s job title. In such a setting, we want to find subgraphs that match a user query pattern. For example, a ‘star ’ query would be, “find a CEO who has strong interactions with a Manager, a Lawyer, and an Accountant, or another structure as close to that as possible”. Similarly, a ‘loop ’ query could help spot a money laundering ring. Traditional SQLbased methods, as well as more recent graph indexing methods, will return no answer when an exact match does not exist. Our method can find exact, as well as nearmatches, and it will present them to the user in our proposed ‘goodness ’ order. For example, our method tolerates indirect paths between, say, the ‘CEO ’ and the ‘Accountant ’ of the above sample query, when direct paths do not exist. Its second feature is scalability. In general, if the query has nq nodes and the data graph has n nodes, the problem needs polynomial time complexity O(n nq), which is prohibitive. Our GRay (“Graph XRay”) method finds highquality subgraphs in time linear on the size of the data graph. Experimental results on the DLBP authorpublication graph (with 356K nodes and 1.9M edges) illustrate both the effectiveness and scalability of our approach. The results agree with our intuition, and the speed is excellent. It takes 4 seconds on average for a 4node query on the DBLP graph.
Automatic image annotation using group sparsity
 In CVPR
, 2010
"... Automatically assigning relevant text keywords to images is an important problem. Many algorithms have been proposed in the past decade and achieved good performance. Efforts have focused upon model representations of keywords, but properties of features have not been well investigated. In most case ..."
Abstract

Cited by 21 (9 self)
 Add to MetaCart
Automatically assigning relevant text keywords to images is an important problem. Many algorithms have been proposed in the past decade and achieved good performance. Efforts have focused upon model representations of keywords, but properties of features have not been well investigated. In most cases, a group of features is preselected, yet important feature properties are not well used to select features. In this paper, we introduce a regularization based feature selection algorithm to leverage both the sparsity and clustering properties of features, and incorporate it into the image annotation task. A novel approach is also proposed to iteratively obtain similar and dissimilar pairs from both the keyword similarity and the relevance feedback. Thus keyword similarity is modeled in the annotation framework. Numerous experiments are designed to compare the performance between features, feature combinations and regularization based feature selection methods applied on the image annotation task, which gives insight into the properties of features in the image annotation task. The experimental results demonstrate that the group sparsity based method is more accurate and stable than others. 1.
Unsupervised Detection of Regions of Interest Using Iterative Link Analysis
"... This paper proposes a fast and scalable alternating optimization technique to detect regions of interest (ROIs) in cluttered Web images without labels. The proposed approach discovers highly probable regions of object instances by iteratively repeating the following two functions: (1) choose the exe ..."
Abstract

Cited by 19 (3 self)
 Add to MetaCart
This paper proposes a fast and scalable alternating optimization technique to detect regions of interest (ROIs) in cluttered Web images without labels. The proposed approach discovers highly probable regions of object instances by iteratively repeating the following two functions: (1) choose the exemplar set (i.e. a small number of highly ranked reference ROIs) across the dataset and (2) refine the ROIs of each image with respect to the exemplar set. These two subproblems are formulated as ranking in two different similarity networks of ROI hypotheses by link analysis. The experiments with the PASCAL 06 dataset show that our unsupervised localization performance is better than one of stateoftheart techniques and comparable to supervised methods. Also, we test the scalability of our approach with five objects in Flickr dataset consisting of more than 200K images. 1
An experimental investigation of graph kernels on a collaborative recommendation task
 Proceedings of the 6th International Conference on Data Mining (ICDM 2006
, 2006
"... This paper presents a survey as well as a systematic empirical comparison of seven graph kernels and two related similarity matrices (simply referred to as graph kernels), namely the exponential diffusion kernel, the Laplacian exponential diffusion kernel, the von Neumann diffusion kernel, the regul ..."
Abstract

Cited by 19 (6 self)
 Add to MetaCart
This paper presents a survey as well as a systematic empirical comparison of seven graph kernels and two related similarity matrices (simply referred to as graph kernels), namely the exponential diffusion kernel, the Laplacian exponential diffusion kernel, the von Neumann diffusion kernel, the regularized Laplacian kernel, the commutetime kernel, the randomwalkwithrestart similarity matrix, and finally, three graph kernels introduced in this paper: the regularized commutetime kernel, the Markov diffusion kernel, and the crossentropy diffusion matrix. The kernelonagraph approach is simple and intuitive. It is illustrated by applying the nine graph kernels to a collaborativerecommendation task and to a semisupervised classification task, both on several databases. The graph methods compute proximity measures between nodes that help study the structure of the graph. Our comparisons suggest that the regularized commutetime and the Markov diffusion kernels perform best, closely followed by the regularized Laplacian kernel. 1