Fast random walk with restart and its applications
In ICDM ’06: Proceedings of the 6th IEEE International Conference on Data Mining, 2006
Cited by 65 (12 self)
How closely related are two nodes in a graph? How to compute this score quickly, on huge, disk-resident, real graphs? Random walk with restart (RWR) provides a good relevance score between two nodes in a weighted graph, and it has been successfully used in numerous settings, like automatic captioning of images, generalizations to the “connection subgraphs”, personalized PageRank, and many more. However, the straightforward implementations of RWR do not scale for large graphs, requiring either quadratic space and cubic pre-computation time, or slow response time on queries. We propose fast solutions to this problem. The heart of our approach is to exploit two important properties shared by many real graphs: (a) linear correlations and (b) blockwise, community-like structure. We exploit the linearity by using low-rank matrix approximation, and the community structure by graph partitioning, followed by the Sherman-Morrison lemma for matrix inversion. Experimental results on the Corel image and the DBLP datasets demonstrate that our proposed methods achieve significant savings over the straightforward implementations: they can save several orders of magnitude in pre-computation and storage cost, and they achieve up to 150x speedup with 90%+ quality preservation.
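To make the "straightforward implementation" mentioned in the abstract above concrete, here is a minimal sketch of RWR by repeated iteration, i.e. the variant that needs no pre-computation but answers each query slowly. It is not the paper's fast method. The function names, the `restart` parameter, its default value, and the toy graph are assumptions made for illustration only.

```python
def column_normalize(adj):
    """Scale each column of a square adjacency matrix so it sums to 1."""
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    return [
        [adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


def rwr_scores(adj, seed, restart=0.15, tol=1e-12, max_iter=10_000):
    """Iterate r <- (1 - restart) * W r + restart * e until r stops changing.

    W is the column-normalized adjacency matrix and e is the indicator
    vector of the seed (query) node. `restart` is the probability of
    jumping back to the seed at each step (assumed value).
    """
    w = column_normalize(adj)
    n = len(w)
    e = [1.0 if i == seed else 0.0 for i in range(n)]
    r = e[:]
    for _ in range(max_iter):
        r_next = [
            (1.0 - restart) * sum(w[i][j] * r[j] for j in range(n)) + restart * e[i]
            for i in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(r_next, r)) < tol:
            return r_next
        r = r_next
    return r


# Toy undirected path graph 0 - 1 - 2 - 3 (hypothetical example data).
path_graph = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = rwr_scores(path_graph, seed=0)
```

In this toy graph the scores of nodes 1, 2 and 3 decrease with their distance from the seed node 0, which matches the intuition of a relevance score. Each query repeats the whole iteration; the alternative is to invert the matrix once up front, which gives the quadratic-space, cubic-time pre-computation the abstract refers to.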
Center-piece subgraphs: Problem definition and fast solutions
In KDD, 2006
Cited by 38 (11 self)
Given Q nodes in a social network (say, authorship network), how can we find the node/author that is the centerpiece, and has direct or indirect connections to all, or most of them? For example, this node could be the common advisor, or someone who started the research area that the Q nodes belong to. Isomorphic scenarios appear in law enforcement (find the master-mind criminal, connected to all current suspects), gene regulatory networks (find the protein that participates in pathways with all or most of the given Q proteins), viral marketing and many more. Connection subgraphs are an important first step, handling the case of Q=2 query nodes. There, the connection subgraph algorithm finds the b intermediate nodes that provide a good connection between the two original query nodes. Here we generalize the challenge in multiple dimensions: First, we allow more than two query nodes. Second, we allow a whole family of queries, ranging from ’OR’ to ’AND’, with ’softAND’ in-between. Finally, we design and compare a fast approximation, and study the quality/speed trade-off. We also present experiments on the DBLP dataset. The experiments confirm that our proposed method naturally deals with multi-source queries and that the resulting subgraphs agree with our intuition. Wall-clock timing results on the DBLP dataset show that our proposed approximation achieves good accuracy for about 6:1 speedup.
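The abstract above names the query family (’OR’, ’softAND’, ’AND’) without giving formulas. One plausible reading, assumed here and not taken from the abstract, is to treat each query node's relevance score for a candidate node as an independent probability and ask how likely it is that at least k of the Q query nodes "reach" the candidate. Under that assumption k = 1 behaves like OR and k = Q like AND. The function names and sample scores below are hypothetical.

```python
def at_least_k(probs, k):
    """Probability that at least k of the independent events in probs occur."""
    # dist[j] holds P(exactly j of the events seen so far occurred).
    dist = [1.0] + [0.0] * len(probs)
    for p in probs:
        # Update from high j to low j so dist[j - 1] is still the old value.
        for j in range(len(probs), 0, -1):
            dist[j] = dist[j] * (1.0 - p) + dist[j - 1] * p
        dist[0] *= 1.0 - p
    return sum(dist[k:])


def combine_scores(per_query_scores, k):
    """Combine one score row per query node into a single score per candidate."""
    n = len(per_query_scores[0])
    return [
        at_least_k([row[j] for row in per_query_scores], k)
        for j in range(n)
    ]


# Two query nodes, two candidate nodes (made-up scores for illustration).
example_scores = [
    [0.5, 0.2],
    [0.4, 0.0],
]
or_scores = combine_scores(example_scores, k=1)   # at least one query node
and_scores = combine_scores(example_scores, k=2)  # all query nodes
```

With these numbers the second candidate scores zero under AND because one query node never reaches it, while OR still gives it credit. Intermediate values of k sit between the two extremes.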
DBconnect: Mining Research Community on DBLP Data
2007
Cited by 5 (1 self)
Extracting information from large collections of structured, semi-structured or even unstructured data can be a considerable challenge when much of the hidden information is implicit within relationships among entities within the data. Social networks are such data collections in which relationships play a vital role in the knowledge these networks can convey. A bibliographic database is an essential tool for the research community, yet finding and making use of relationships comprised within such a social network is difficult. In this paper we introduce DBconnect, a prototype that exploits the social network coded within the DBLP database by drawing on a new random walk approach to reveal interesting knowledge about the research community and even recommend collaborations.
Focusing Keywords to Automatically Extracted Image Segments Using Self-Organising Maps
Volume 210 of Studies in Fuzziness and Soft Computing, 2006
Cited by 2 (2 self)
the input data is a collection of images that are annotated with a given keyword, such as “car”. The problem is to attribute the annotation to specific parts of the images. There exists plenty of suitable input data readily
QMAS: Querying, Mining And Summarization of Multi-modal Databases
2010
the same co-authors. Work performed during Mr. Cordeiro’s visit to Carnegie Mellon University. This material is based upon work
Mining and Querying Multimedia Data
2011
for the degree of Doctor of Philosophy. The emerging popularity of multimedia data, as digital representation of text, image, video and countless other milieus, with prodigious volumes and wild diversity, exhibits the phenomenal impact of modern technologies in reforming the way information is accessed, disseminated, digested and retained. This has iteratively ignited the data-driven perspective of research and development, to characterize perspicuous patterns, crystallize informative insights, and realize elevated experience for end-users, where innovations in a spectrum of areas of computer science, including databases, distributed systems, machine learning, vision, speech and natural languages, have been incessantly absorbed and integrated to elicit the extent and efficacy of contemporary and future multimedia applications and solutions. Under the theme of pattern mining and similarity querying, this manuscript presents a number of pieces of research concerning multimedia data, to address an array of practical tasks encompassing automatic annotation, outlier detection,
Fast Algorithms for Querying and Mining Large Graphs
2009
Graphs appear in a wide range of settings and have posed a wealth of fascinating problems. In this thesis, we focus on two types of tasks according to the interaction with users: (1) querying (e.g., given a social network, how to measure the closeness between two persons? how to track it over time?) and (2) mining (e.g., how to identify abnormal behaviors of computer networks? In the case of virus attacks, which nodes are the best to immunize?). The task of querying includes three sub-tasks. In the first one, we found that many complex user-specific patterns on large graphs can be answered by means of proximity measurement. In other words, proximity allows us to query large graphs on the atomic level. We support our claim by conducting three case studies (connection subgraphs, user feedback, and gateway), all of which (despite their diversity) rely on the proximity

