Results 1 - 5 of 5
MinHash Fingerprints for Graph Kernels: A Tradeoff among Accuracy, Efficiency, and Compression
Cited by 1 (0 self)
Abstract. Graph databases that emerge from several relevant scenarios (e.g., social networks, the Web) require powerful data management algorithms and techniques. A fundamental operation in graph data management is computing the similarity between two graphs. However, due to the large scale and high dimensionality of real graph databases, computing graph similarity becomes a challenging problem in real settings. Many graph data management tasks, such as graph mining, classification, and retrieval, can be contextualized in the framework of graph kernels. A graph kernel is, roughly speaking, a function that computes the similarity between graph structures as a means to enable the application of linear methods to graph data. Nevertheless, large databases usually require the use of compact representations of graphs known as graph fingerprints (or signatures). Graph fingerprinting techniques provide a solution that is a tradeoff among accuracy, efficiency, and compression in graph kernels. In this paper, we study the problem of generating fingerprints for graph kernels. We introduce a graph fingerprinting technique based on the minhashing scheme, which is a powerful strategy for computing the similarity between large sets of items using a small amount of data. An algorithm for the generation of graph fingerprints as vectors of minhash values is presented and integrated into the framework of graph kernels. Results show that graph fingerprinting achieves efficiency gains of up to one order of magnitude with up to 97% space savings when compared against the complete set of graph substructures. Moreover, the proposed technique is up to 9 times more accurate than a baseline method.
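To illustrate the minhashing idea the abstract refers to, here is a minimal Python sketch. It is not the paper's algorithm: it assumes each graph has already been reduced to a set of substructure labels (the strings used below are hypothetical), and the names `minhash_fingerprint`, `estimate_jaccard`, `exact_jaccard`, and the `num_hashes` parameter are invented for this example. The fingerprint is a fixed-length vector of per-seed minimum hash values, and the fraction of positions where two fingerprints agree estimates the Jaccard similarity of the underlying sets.

```python
import hashlib


def _hash(item: str, seed: int) -> int:
    """Deterministic 64-bit hash of an item under a given seed."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_fingerprint(substructures, num_hashes=64):
    """Return a vector of num_hashes minhash values for a set of labels.

    Assumption: a graph is represented by the set of its substructure
    labels (for example, labeled paths encoded as strings).
    """
    items = set(substructures)
    if not items:
        raise ValueError("cannot fingerprint an empty set")
    return [min(_hash(x, seed) for x in items) for seed in range(num_hashes)]


def estimate_jaccard(fp_a, fp_b):
    """Estimate Jaccard similarity as the fraction of matching positions."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    return sum(a == b for a, b in zip(fp_a, fp_b)) / len(fp_a)


def exact_jaccard(a, b):
    """Exact Jaccard similarity of two non-empty collections, for comparison."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    g1 = {"C-C", "C-O", "C-C-O", "C-N"}
    g2 = {"C-C", "C-O", "C-C-O", "O-H"}
    print("exact:    ", exact_jaccard(g1, g2))
    print("estimated:", estimate_jaccard(minhash_fingerprint(g1),
                                         minhash_fingerprint(g2)))
```

The fingerprint length is fixed by `num_hashes` regardless of how many substructures a graph has, which is where the space saving comes from. Longer fingerprints reduce the variance of the estimate at the cost of storage.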
Efficient and Scalable Graph Similarity Joins in MapReduce
Along with the emergence of massive graph-modeled data, it is of great importance to investigate graph similarity joins due to their wide applications for multiple purposes, including data cleaning and near-duplicate detection. This paper considers graph similarity joins with edit distance constraints, which return pairs of graphs such that their edit distances are no larger than a given threshold. Leveraging the MapReduce programming model, we propose MGSJoin, a scalable algorithm following the filtering-verification framework for efficient graph similarity joins. It relies on counting overlapping graph signatures for filtering out non-promising candidates. With the potential issue of too many key-value pairs in the filtering phase, spectral Bloom filters are introduced to reduce the number of key-value pairs. Furthermore, we integrate the multi-way join strategy to boost the verification, where a MapReduce-based method is proposed for graph edit distance (GED) calculation. The superior efficiency and scalability of the proposed algorithms are demonstrated by extensive experimental results.
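The filtering step described above, counting overlapping signatures to discard unpromising pairs, can be sketched in a few lines of single-process Python. This is an illustration of the general idea only, not MGSJoin itself: the function name `candidate_pairs`, the `min_overlap` threshold, and the string signatures are assumptions made for this example, each graph is simplified to a set of distinct signatures, and the abstract does not say how the paper derives its threshold from the edit distance bound. The grouping by signature mimics a map phase, and the per-pair counting mimics a reduce phase.

```python
from collections import Counter, defaultdict
from itertools import combinations


def candidate_pairs(signatures_by_graph, min_overlap):
    """Return graph-id pairs sharing at least min_overlap distinct signatures.

    signatures_by_graph maps a graph id to an iterable of signatures.
    min_overlap is a hypothetical threshold supplied by the caller.
    """
    # "Map" step: group graph ids by the signatures they contain.
    index = defaultdict(set)
    for graph_id, signatures in signatures_by_graph.items():
        for signature in set(signatures):
            index[signature].add(graph_id)

    # "Reduce" step: count how many signatures each pair of graphs shares.
    overlap = Counter()
    for graph_ids in index.values():
        for a, b in combinations(sorted(graph_ids), 2):
            overlap[(a, b)] += 1

    # Only pairs that pass the count filter go on to verification.
    return {pair for pair, count in overlap.items() if count >= min_overlap}


if __name__ == "__main__":
    graphs = {
        "g1": ["ab", "bc", "cd"],
        "g2": ["ab", "bc", "xx"],
        "g3": ["yy", "zz"],
    }
    print(candidate_pairs(graphs, min_overlap=2))
```

Pairs that survive the filter would still need an exact edit distance check. The filter only reduces how many such expensive verifications are run.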
Vertex Similarity - A Basic Framework for Matching Geometric Graphs
Abstract. Solutions to the graph matching problem play an important role in many application domains, such as chemistry, proteomics, or image processing. Especially in these domains, graphs have geometric properties that describe the positions of the vertices in some 2- or 3-dimensional space. Several exact and approximate approaches have been proposed to address the problem of matching graphs, which is known to be NP-hard in general. For this, most approaches depend on the concept of vertex similarity to iteratively increase the matching quality. In this paper, we study the vertex similarity problem for geometric graphs. We formally define such a problem and prove that it is NP-hard. For geometric graphs in 2D, we propose an approximate solution with polynomial runtime. For this, we utilize techniques underlying attributed cyclic string matching and customized edit operations that consider spatial properties and labeling information. In our evaluations, we show that our approach outperforms existing vertex similarity approaches in terms of classification accuracy and matching quality.
Efficient and Scalable Graph Similarity Joins in MapReduce
Along with the emergence of massive graph-modeled data, it is of great importance to investigate graph similarity joins due to their wide applications for multiple purposes, including data cleaning and near-duplicate detection. This paper considers graph similarity joins with edit distance constraints, which return pairs of graphs such that their edit distances are no larger than a given threshold. Leveraging the MapReduce programming model, we propose MGSJoin, a scalable algorithm following the filtering-verification framework for efficient graph similarity joins. It relies on counting overlapping graph signatures for filtering out non-promising candidates. With the potential issue of too many key-value pairs in the filtering phase, spectral Bloom filters are introduced to reduce the number of key-value pairs. Furthermore, we integrate the multi-way join strategy to boost the verification, where a MapReduce-based method is proposed for graph edit distance (GED) calculation. The superior efficiency and scalability of the proposed algorithms are demonstrated by extensive experimental results.
N S I
This volume is a printed version of a work that appears in the Synthesis Digital Library of Engineering and Computer Science. Synthesis Lectures provide concise, original presentations of important research and development topics, published quickly, in digital and print formats. For more information visit www.morganclaypool.com