GAIA: Graph Classification Using Evolutionary Computation
"... Discriminative subgraphs are widely used to define the feature space for graph classification in large graph databases. Several scalable approaches have been proposed to mine discriminative subgraphs. However, their intensive computation needs prevent them from mining large databases. We propose an ..."
Abstract
-
Cited by 12 (1 self)
- Add to MetaCart
(Show Context)
Discriminative subgraphs are widely used to define the feature space for graph classification in large graph databases. Several scalable approaches have been proposed to mine discriminative subgraphs. However, their intensive computation needs prevent them from mining large databases. We propose an efficient method, GAIA, for mining discriminative subgraphs for graph classification in large databases. Our method employs a novel subgraph encoding approach to support an arbitrary subgraph pattern exploration order and explores the subgraph pattern space in a process resembling biological evolution. In this manner, GAIA is able to find discriminative subgraph patterns much faster than other algorithms. Additionally, we take advantage of parallel computing to further improve the quality of the resulting patterns. In the end, we employ sequential coverage to generate association rules as graph classifiers using patterns mined by GAIA. Extensive experiments have been performed to analyze the performance of GAIA and to compare it with two other state-of-the-art approaches. GAIA outperforms the other approaches both in terms of classification accuracy and runtime efficiency.
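The sequential-coverage step mentioned at the end of the abstract lends itself to a short illustration. The sketch below greedily picks the pattern with the highest rule confidence among not-yet-covered graphs, emits an association rule, and removes the graphs that rule covers. The data layout (patterns represented by their occurrence sets) and the scoring formula are illustrative assumptions, not GAIA's actual implementation.

```python
# A minimal sketch of sequential coverage: turn mined subgraph patterns
# into an ordered rule list for classification. Subgraph matching is
# abstracted away; each pattern is given as the set of graph ids it
# occurs in (an assumed representation for this sketch).

def sequential_coverage(patterns, pos_ids, neg_ids):
    """patterns: dict pattern_id -> set of graph ids containing the pattern.
    Returns an ordered rule list [(pattern_id, predicted_label), ...]."""
    pos_ids, neg_ids = set(pos_ids), set(neg_ids)
    uncovered = pos_ids | neg_ids
    rules = []
    while uncovered:
        best, best_score = None, 0.0
        for pid, occ in patterns.items():
            hits = occ & uncovered
            if not hits:
                continue
            # confidence of the better of "pattern => +" / "pattern => -"
            pos_hits = len(hits & pos_ids)
            score = max(pos_hits, len(hits) - pos_hits) / len(hits)
            if score > best_score:
                best, best_score = pid, score
        if best is None:  # remaining graphs match no pattern
            break
        hits = patterns[best] & uncovered
        label = '+' if 2 * len(hits & pos_ids) >= len(hits) else '-'
        rules.append((best, label))
        uncovered -= hits  # remove covered graphs and continue
    return rules
```

At prediction time, such a rule list is typically scanned in order and the first matching rule's label is returned, with a default label for graphs matching no rule.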
Compression of Weighted Graphs
"... We propose to compress weighted graphs (networks), motivated by the observation that large networks of social, biological, or other relations can be complex to handle and visualize. In the process also known as graph simplification, nodes and (unweighted) edges are grouped to supernodes and superedg ..."
Abstract
-
Cited by 9 (1 self)
- Add to MetaCart
(Show Context)
We propose to compress weighted graphs (networks), motivated by the observation that large networks of social, biological, or other relations can be complex to handle and visualize. In the process also known as graph simplification, nodes and (unweighted) edges are grouped to supernodes and superedges, respectively, to obtain a smaller graph. We propose models and algorithms for weighted graphs. The interpretation (i.e. decompression) of a compressed, weighted graph is that a pair of original nodes is connected by an edge if their supernodes are connected by one, and that the weight of an edge is approximated to be the weight of the superedge. The compression problem now consists of choosing supernodes, superedges, and superedge weights so that the approximation error is minimized while the amount of compression is maximized. In this paper, we formulate this task as the ‘simple weighted graph compression problem’. We then propose a much wider class of tasks under the name of ‘generalized weighted graph compression problem’. The generalized task extends the optimization to preserve longer-range connectivities between nodes, not just individual edge weights. We study the properties of these problems and propose a range of algorithms to solve them, with different balances between complexity and quality of the result. We evaluate the problems and algorithms experimentally on real networks. The results indicate that weighted graphs can be compressed efficiently with relatively little compression error.
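The decompression semantics described above admit a compact illustration: each original node pair inherits the weight of its supernodes' superedge, and compression quality is the deviation from the original weights. The sketch below, using an assumed dict-based layout, scores a candidate compression by squared error over the original edges; for brevity it does not penalize spurious edges that decompression may introduce between non-adjacent nodes.

```python
# A minimal sketch of the approximation error for a compressed weighted
# graph: every original edge is reconstructed with its superedge's
# weight, and the error is the squared deviation from the true weight.

def approximation_error(orig_weights, node2super, super_weights):
    """orig_weights: dict (u, v) -> weight of the original edge.
    node2super: dict node -> its supernode.
    super_weights: dict (S, T) -> superedge weight, keys sorted, S == T
    allowed for edges inside one supernode.
    Returns the sum of squared errors over the original edges."""
    err = 0.0
    for (u, v), w in orig_weights.items():
        s, t = sorted((node2super[u], node2super[v]))
        w_hat = super_weights.get((s, t), 0.0)  # no superedge => weight 0
        err += (w - w_hat) ** 2
    return err
```

A compression algorithm in this setting searches over groupings (node2super) and superedge weights to trade this error against the size of the compressed graph.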
Efficient Graph Similarity Joins with Edit Distance Constraints
"... Abstract—Graphs are widely used to model complicated data semantics in many applications in bioinformatics, chemistry, social networks, pattern recognition, etc. A recent trend is to tolerate noise arising from various sources, such as erroneous data entry, and find similarity matches. In this paper ..."
Abstract
-
Cited by 5 (1 self)
- Add to MetaCart
(Show Context)
Graphs are widely used to model complicated data semantics in many applications in bioinformatics, chemistry, social networks, pattern recognition, etc. A recent trend is to tolerate noise arising from various sources, such as erroneous data entry, and find similarity matches. In this paper, we study the graph similarity join problem that returns pairs of graphs such that their edit distances are no larger than a threshold. Inspired by the ...
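Although the abstract is truncated, it pins down the join semantics: return all pairs of graphs whose edit distance is at most a threshold. As a baseline only (the paper's contribution lies in filtering techniques that avoid most of these expensive distance computations), a naive nested-loop join might look like the sketch below, using networkx's exact graph edit distance routine.

```python
# A brute-force sketch of a graph similarity join with an edit-distance
# threshold tau. This baseline illustrates the join's output, not the
# paper's filtering-based algorithm.
import itertools
import networkx as nx

def similarity_join(graphs, tau):
    """graphs: list of networkx graphs; tau: edit-distance threshold.
    Returns index pairs (i, j) with GED(G_i, G_j) <= tau."""
    result = []
    for i, j in itertools.combinations(range(len(graphs)), 2):
        # upper_bound lets the search abandon edit paths costlier than tau;
        # it returns None when no path within the bound exists
        d = nx.graph_edit_distance(graphs[i], graphs[j], upper_bound=tau)
        if d is not None and d <= tau:
            result.append((i, j))
    return result
```

Since exact GED is itself NP-hard, the nested loop above scales poorly, which is precisely why filter-and-verify approaches such as the one in this paper matter.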
Graph classification: a diversified discriminative feature selection approach
In Proc. of CIKM, 2012
"... A graph models complex structural relationships among object-s, and has been prevalently used in a wide range of applications. Building an automated graph classification model becomes very important for predicting unknown graphs or understanding com-plex structures between different classes. The gra ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
(Show Context)
A graph models complex structural relationships among objects, and has been prevalently used in a wide range of applications. Building an automated graph classification model becomes very important for predicting unknown graphs or understanding complex structures between different classes. The graph classification framework being widely used consists of two steps, namely, feature selection and classification. The key issue is how to select important subgraph features from a graph database with a large number of graphs including positive graphs and negative graphs. Given the features selected, a generic classification approach can be used to build a classification model. In this paper, we focus on feature selection. We identify two main issues with the most widely used feature selection approach, which is based on a discriminative score to select frequent subgraph features, and introduce a new diversified discriminative score to select features that have a higher diversity. We analyze the properties of the newly proposed diversified discriminative score, and conduct extensive performance studies to demonstrate that such a diversified discriminative score makes positive/negative graphs separable and leads to a higher classification accuracy.
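As a rough illustration of the two ideas contrasted in the abstract, the sketch below scores a subgraph feature by how differently it occurs in positive versus negative graphs, and additionally discounts features that mostly re-cover graphs already covered by selected ones. Both formulas are illustrative stand-ins, not the paper's actual score definitions.

```python
# A minimal sketch of greedy diversified feature selection for graph
# classification. Features are given by their occurrence sets (an
# assumed representation); the redundancy penalty is what makes the
# selection "diversified".

def select_features(features, pos_ids, neg_ids, k, redundancy_weight=0.5):
    """features: dict feature_id -> set of graph ids containing it.
    Greedily picks k features, trading discrimination against redundancy."""
    pos_ids, neg_ids = set(pos_ids), set(neg_ids)
    covered, chosen = set(), []
    for _ in range(k):
        def score(fid):
            occ = features[fid]
            # discrimination: gap between positive and negative frequency
            disc = abs(len(occ & pos_ids) / max(len(pos_ids), 1)
                       - len(occ & neg_ids) / max(len(neg_ids), 1))
            # redundancy: fraction of occurrences already covered
            overlap = len(occ & covered) / max(len(occ), 1)
            return disc - redundancy_weight * overlap
        best = max((f for f in features if f not in chosen),
                   key=score, default=None)
        if best is None:
            break
        chosen.append(best)
        covered |= features[best]
    return chosen
```

With redundancy_weight set to zero this degenerates into the plain discriminative-score selection that the paper argues against.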
Graph Compression
"... Abstract. Graphs form the foundation of many real-world datasets. At the same time, the size of graphs presents a big obstacle to understand the essential information they contain. In this report, I mainly review the framework in article [1] for compressing large graphs. It can be used to improve vi ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
(Show Context)
Graphs form the foundation of many real-world datasets. At the same time, the size of graphs presents a big obstacle to understanding the essential information they contain. In this report, I mainly review the framework in article [1] for compressing large graphs. It can be used to improve visualization, to understand the high-level structure of the graph, or as a pre-processing step for other data mining algorithms. The compression model consists of a graph summary and a set of edge corrections. This framework can produce either lossless or lossy compressed graph representations. Combined with the Minimum Description Length (MDL) principle, it can produce a compact summary. The performance of this framework is evaluated on multiple real graph datasets.
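The summary-plus-corrections model can be made concrete with a toy cost function: reconstruct the graph implied by the summary, diff it against the original, and charge one unit per superedge and per correction. Counting edges rather than encoded bits is a simplification of the MDL cost used by the reviewed framework, adopted here only to show the shape of the computation.

```python
# A minimal sketch of an MDL-style cost for summary-plus-corrections
# graph compression: cost = size of summary + number of edge corrections
# needed to restore the original graph exactly.

def reconstruct(summary_edges, node2super):
    """Expand superedges back into original node pairs."""
    groups = {}
    for v, s in node2super.items():
        groups.setdefault(s, set()).add(v)
    edges = set()
    for s, t in summary_edges:          # s == t covers within-group edges
        for u in groups[s]:
            for v in groups[t]:
                if u != v:
                    edges.add(frozenset((u, v)))
    return edges

def mdl_cost(orig_edges, summary_edges, node2super):
    orig = {frozenset(e) for e in orig_edges}
    approx = reconstruct(summary_edges, node2super)
    corrections = orig ^ approx         # edges to add back plus edges to drop
    return len(summary_edges) + len(corrections)
```

A lossless compressor keeps all corrections; a lossy variant drops some of them, trading reconstruction fidelity for a smaller description.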
Efficient Processing of Which-Edge Questions on Shortest Path Queries?
"... Abstract. In this paper, we formulate a novel problem called Which-Edge ques-tion on shortest path queries. Specifically, this problem aims to find k edges that minimize the total distance for a given set of shortest path queries on a graph. This problem has important applications in logistics, urba ..."
Abstract
- Add to MetaCart
(Show Context)
In this paper, we formulate a novel problem called the Which-Edge question on shortest path queries. Specifically, this problem aims to find k edges that minimize the total distance for a given set of shortest path queries on a graph. This problem has important applications in logistics, urban planning, and network planning. We show the NP-hardness of the problem, as well as present efficient algorithms that compute highly accurate results in practice. Experimental evaluations are carried out on real datasets and the results show that our algorithms are scalable and return high-quality solutions.
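Because the problem is NP-hard, a brute-force baseline is only feasible on tiny instances, but it makes the objective precise. The sketch below reads the problem as choosing k new edges to add to the graph (one plausible reading of the abstract; the candidate-edge setup and the assumption of a connected graph are mine), then measures the total shortest-path distance over the query workload.

```python
# An exhaustive baseline for the Which-Edge objective: among candidate
# new edges, pick the k whose addition minimizes the total shortest-path
# distance over a set of (source, target) queries. Assumes G stays
# connected for every query pair.
import itertools
import networkx as nx

def total_query_distance(G, queries):
    return sum(nx.shortest_path_length(G, s, t, weight='weight')
               for s, t in queries)

def best_k_edges(G, candidates, k, queries):
    """candidates: list of (u, v, weight) edges not yet in G."""
    best_set, best_cost = None, float('inf')
    for combo in itertools.combinations(candidates, k):
        H = G.copy()
        H.add_weighted_edges_from(combo)   # try this k-edge addition
        cost = total_query_distance(H, queries)
        if cost < best_cost:
            best_set, best_cost = set(combo), cost
    return best_set, best_cost
```

The paper's algorithms avoid this combinatorial enumeration while returning near-optimal edge sets in practice.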