Partial Least Squares Regression for Graph Mining
Cited by 26 (6 self)
Attributed graphs are increasingly common in many application domains such as chemistry, biology and text processing. A central issue in graph mining is how to collect informative subgraph patterns for a given learning task. We propose an iterative mining method based on partial least squares regression (PLS). To apply PLS to graph data, a sparse version of PLS is developed first and then combined with a weighted pattern mining algorithm. The mining algorithm is called iteratively with different weight vectors, creating one latent component per mining call. Our method, graph PLS, is efficient and easy to implement, because the weight vector is updated with elementary matrix calculations. In experiments, our graph PLS algorithm showed competitive prediction accuracies on many chemical datasets, and its efficiency was significantly superior to graph boosting (gBoost) and the naive method based on frequent graph mining.
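The iteration described in this abstract (one latent component per call, with a weight vector updated by simple matrix arithmetic) can be illustrated with a small sketch. This is not the authors' graph PLS: it assumes the pattern indicator matrix `X` is already materialized, whereas the abstract's method obtains the heavily weighted patterns from a weighted mining call. The function name `sparse_pls_sketch` and the `top_k` parameter are hypothetical.

```python
from math import sqrt


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def sparse_pls_sketch(X, y, n_components=2, top_k=1):
    """Extract up to n_components sparse latent components from X (n x d).

    X[i][j] is 1 if graph i contains pattern j, else 0 (assumed precomputed).
    Returns the pattern indices chosen per component and the final residual.
    """
    n, d = len(X), len(X[0])
    mean_y = sum(y) / n
    residual = [v - mean_y for v in y]  # centered response
    selected_per_component, scores = [], []
    for _ in range(n_components):
        # Weight of pattern j: covariance-like gain with the current residual.
        gains = [sum(residual[i] * X[i][j] for i in range(n)) for j in range(d)]
        # Sparsity: keep only the top_k patterns by absolute gain
        # (a stand-in for the weighted mining call).
        top = sorted(range(d), key=lambda j: -abs(gains[j]))[:top_k]
        w = [gains[j] if j in top else 0.0 for j in range(d)]
        # Latent score vector t = Xw, centered and orthogonalized.
        t = [dot(row, w) for row in X]
        mean_t = sum(t) / n
        t = [v - mean_t for v in t]
        for s in scores:
            c = dot(t, s)
            t = [tv - c * sv for tv, sv in zip(t, s)]
        norm = sqrt(dot(t, t))
        if norm < 1e-12:  # nothing left to explain
            break
        t = [v / norm for v in t]
        # Deflate the residual along the new component.
        c = dot(residual, t)
        residual = [rv - c * tv for rv, tv in zip(residual, t)]
        selected_per_component.append(top)
        scores.append(t)
    return selected_per_component, residual
```

On a toy matrix where one pattern separates the two response values exactly, a single component drives the residual to zero and the loop stops early.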
Near-optimal supervised feature selection among frequent subgraphs
In SIAM Int’l Conf. on Data Mining, 2009
Cited by 18 (8 self)
Graph classification is an increasingly important step in numerous application domains, such as function prediction of molecules and proteins, computerised scene analysis, and anomaly detection in program flows. Among the various approaches proposed in the literature, graph classification based on frequent subgraphs is a popular branch: graphs are represented as (usually binary) vectors, with components indicating whether a graph contains a particular subgraph that is frequent across the dataset. On large graphs, however, one faces the enormous problem that the number of these frequent subgraphs may grow exponentially with the size of the graphs, but only a few of them possess enough discriminative power to make them
Direct mining of discriminative and essential frequent patterns via model-based search tree
In KDD, 2008
Cited by 17 (6 self)
Frequent patterns provide solutions to datasets that do not have well-structured feature vectors. However, frequent pattern mining is non-trivial since the number of unique patterns is exponential but many are non-discriminative and correlated. Currently, frequent pattern mining is performed in two sequential steps: enumerating a set of frequent patterns, followed by feature selection. Although many methods have been proposed in the past few years on how to perform each separate step efficiently, there is still limited success in eventually finding highly compact and discriminative patterns. The culprit is the inherent nature of this widely adopted two-step approach. This paper discusses these problems and proposes a new and different method. It builds a decision tree that partitions the data onto different
Output Space Sampling for Graph Patterns
2009
Cited by 11 (4 self)
Recent interest in graph pattern mining has shifted from finding all frequent subgraphs to obtaining a small subset of frequent subgraphs that are representative, discriminative or significant. The main motivation is to cope with the scalability problem that graph mining algorithms suffer from when mining databases of large graphs. Another motivation is to obtain a succinct output set that is informative and useful. In the same spirit, researchers have also proposed sampling-based algorithms that sample the output space of the frequent patterns to obtain representative subgraphs. In this work, we propose a generic sampling framework based on the Metropolis-Hastings algorithm to sample the output space of frequent subgraphs. Our experiments on various sampling strategies show the versatility, utility and efficiency of the proposed sampling approach.
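As a rough illustration of Metropolis-Hastings sampling over a pattern space, the sketch below walks a small state graph whose nodes stand in for patterns and whose edges link neighboring patterns. It is a generic sketch under stated assumptions, not the framework from the paper: the `neighbors` map, the `target` function and the state names are hypothetical, and the proposal is taken to be uniform over a state's neighbors.

```python
import random


def metropolis_hastings_walk(neighbors, target, start, n_steps, seed=0):
    """Random walk whose visit frequencies approach target(s) / sum(target).

    neighbors: dict mapping each state to a list of adjacent states
               (assumed symmetric: if b is in neighbors[a], a is in neighbors[b]).
    target:    function giving an unnormalized positive weight per state.
    """
    rng = random.Random(seed)
    counts = {s: 0 for s in neighbors}
    state = start
    for _ in range(n_steps):
        proposal = rng.choice(neighbors[state])
        # Hastings ratio for a uniform-over-neighbors proposal:
        # q(s -> s') = 1 / deg(s), so the degree terms correct the bias
        # toward high-degree states.
        ratio = (target(proposal) * len(neighbors[state])) / (
            target(state) * len(neighbors[proposal])
        )
        if rng.random() < min(1.0, ratio):
            state = proposal
        counts[state] += 1
    return counts


# Toy pattern space: a chain of four patterns, sampled uniformly.
toy_neighbors = {
    "p1": ["p2"],
    "p2": ["p1", "p3"],
    "p3": ["p2", "p4"],
    "p4": ["p3"],
}
toy_counts = metropolis_hastings_walk(toy_neighbors, lambda s: 1.0, "p1", 20000)
```

Changing `target` (for example, to weight patterns by support or by a discriminative score) changes which patterns are visited most often without enumerating the full space.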
Multi-label feature selection for graph classification
In Proceedings of the 10th IEEE International Conference on Data Mining, 2010
Cited by 10 (4 self)
The classification of graph data has become an important and active research topic in the last decade, with a wide variety of real-world applications, e.g. drug activity prediction and kinase inhibitor discovery. Current research on graph classification focuses on single-label settings. However, in many applications, each graph can be assigned a set of multiple labels simultaneously. Extracting good features using multiple labels of the graphs becomes an important step before graph classification. In this paper, we study the problem of multi-label feature selection for graph classification and propose a novel solution, called gMLC, to efficiently search for optimal subgraph features for graph objects with multiple labels. Unlike existing feature selection methods in vector spaces, which assume the feature set is given, we perform multi-label feature selection for graph data progressively, together with the subgraph feature mining process. We derive an evaluation criterion, named gHSIC, to estimate the dependence between subgraph features and multiple labels of graphs. Then a branch-and-bound algorithm is proposed to efficiently search for optimal subgraph features by judiciously pruning the subgraph search space using multiple labels. Empirical studies on real-world tasks demonstrate that our feature selection approach can effectively boost multi-label graph classification performance and is more efficient by pruning the subgraph search space using multiple labels. Keywords: feature selection; graph classification; multi-label learning.
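The abstract does not spell out gHSIC, so the following is only a generic HSIC-style dependence score, assuming a linear kernel on the label vectors. For a pattern with indicator vector f, label matrix Y and centering matrix H, it computes f^T H (Y Y^T) H f, which equals the squared norm of Y^T H f. The function name `hsic_style_score` is hypothetical.

```python
def hsic_style_score(feature, labels):
    """Dependence between one binary subgraph feature and multiple labels.

    feature: list of 0/1 values, feature[i] = 1 if graph i contains the pattern.
    labels:  list of label vectors, labels[i][k] = 1 if graph i has label k.
    Returns f^T H L H f with L = Y Y^T (linear label kernel, an assumption).
    """
    n = len(feature)
    mean_f = sum(feature) / n
    centered = [f - mean_f for f in feature]  # H f
    n_labels = len(labels[0])
    score = 0.0
    for k in range(n_labels):
        # k-th entry of Y^T H f
        proj = sum(centered[i] * labels[i][k] for i in range(n))
        score += proj * proj
    return score


toy_labels = [[1, 0], [1, 0], [0, 1], [0, 1]]
aligned = hsic_style_score([1, 1, 0, 0], toy_labels)      # tracks the labels
constant = hsic_style_score([1, 1, 1, 1], toy_labels)     # present everywhere
unrelated = hsic_style_score([1, 0, 1, 0], toy_labels)    # independent of labels
```

A feature that occurs in every graph, or one that is balanced across label groups, scores zero, while a feature that tracks the labels scores highest. That ordering is what a search over subgraph features would exploit.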
GBASE: A Scalable and General Graph Management System
Cited by 7 (4 self)
Graphs appear in numerous applications including cybersecurity, the Internet, social networks, protein networks, recommendation systems, and many more. Graphs with millions or even billions of nodes and edges are commonplace. How can such large graphs be stored efficiently? What are the core operations and queries on those graphs? How can graph queries be answered quickly? We propose GBASE, a scalable and general graph management and mining system. The key novelties lie in 1) our storage and compression scheme for a parallel setting and 2) the carefully chosen graph operations and their efficient implementation. We designed and implemented an instance of GBASE using MAPREDUCE/HADOOP. GBASE provides a parallel indexing mechanism for graph mining operations that both saves storage space and accelerates queries. We ran numerous experiments on real graphs, spanning billions of nodes and edges, and we show that our proposed GBASE is indeed fast, scalable and nimble, with significant savings in space and time.
GAIA: Graph Classification Using Evolutionary Computation
Cited by 6 (1 self)
Discriminative subgraphs are widely used to define the feature space for graph classification in large graph databases. Several scalable approaches have been proposed to mine discriminative subgraphs. However, their intensive computation needs prevent them from mining large databases. We propose an efficient method, GAIA, for mining discriminative subgraphs for graph classification in large databases. Our method employs a novel subgraph encoding approach to support an arbitrary subgraph pattern exploration order and explores the subgraph pattern space in a process resembling biological evolution. In this manner, GAIA is able to find discriminative subgraph patterns much faster than other algorithms. Additionally, we take advantage of parallel computing to further improve the quality of the resulting patterns. Finally, we employ sequential coverage to generate association rules as graph classifiers using patterns mined by GAIA. Extensive experiments have been performed to analyze the performance of GAIA and to compare it with two other state-of-the-art approaches. GAIA outperforms the other approaches in terms of both classification accuracy and runtime efficiency.
Structure of neighborhoods in a large social network
Proc. IEEE Conf. on Soc. Computing, 2009
Cited by 5 (1 self)
We present here a method for analyzing the neighborhoods of all the vertices in a large graph. We first give an algorithm for characterizing a simple undirected graph that relies on enumeration of small induced subgraphs. We go a step further in this direction by identifying not only subgraphs but also the positions occupied by the different vertices of the graph, and are thus able to compute the roles played by the vertices of the graph. We apply this method to the neighborhood of each vertex in a mobile phone graph with 2.7M vertices and 6M edges. We analyze how the contacts of each person are connected to each other and the positions they occupy in the neighborhood network. Then we compare the intensity of their communications (duration and frequency) to their positions, finding that the two are not independent. We finally interpret and explain the results using social studies on phone communications.
Semi-Supervised Feature Selection for Graph Classification
Cited by 5 (4 self)
The problem of graph classification has attracted great interest in the last decade. Current research on graph classification assumes the existence of large amounts of labeled training graphs. However, in many applications, the labels of graph data are very expensive or difficult to obtain, while there are often copious amounts of unlabeled graph data available. In this paper, we study the problem of semi-supervised feature selection for graph classification and propose a novel solution, called gSSC, to efficiently search for optimal subgraph features with labeled and unlabeled graphs. Unlike existing feature selection methods in vector spaces, which assume the feature set is given, we perform semi-supervised feature selection for graph data progressively, together with the subgraph feature mining process. We derive a feature evaluation criterion, named gSemi, to estimate the usefulness of subgraph features based upon both labeled and unlabeled graphs. Then we propose a branch-and-bound algorithm to efficiently search for optimal subgraph features by judiciously pruning the subgraph search space. Empirical studies on several real-world tasks demonstrate that our semi-supervised feature selection approach can effectively boost graph classification performance and is very efficient by pruning the subgraph search space using both labeled and unlabeled graphs.
Graph Classification Based on Pattern Co-occurrence
Cited by 4 (2 self)
Subgraph patterns are widely used in graph classification, but their effectiveness is often hampered by the large number of patterns or the lack of discriminative power among individual patterns. We introduce a novel classification method based on pattern co-occurrence to derive graph classification rules. Our method employs a pattern exploration order such that the complementary discriminative patterns are examined first. Patterns are grouped into co-occurrence rules during the pattern exploration, leading to an integrated process of pattern mining and classifier learning. By taking advantage of co-occurrence information, our method can generate strong features by assembling weak features. Unlike previous methods that invoke the pattern mining process repeatedly, our method performs pattern mining only once. In addition, our method produces a more interpretable classifier and shows better or competitive classification effectiveness in terms of accuracy and execution time.