Dual active feature and sample selection for graph classification
in KDD, 2011
Cited by 6 (4 self)
Graph classification has become an important and active research topic in the last decade. Current research on graph classification focuses on mining discriminative subgraph features under supervised settings. The basic assumption is that a large number of labeled graphs are available. However, labeling graph data is quite expensive and time-consuming for many real-world applications. In order to reduce the labeling cost for graph data, we address the problem of how to select the most important graph to query for the label. This problem is challenging and different from conventional active learning problems because there is no predefined feature vector. Moreover, the subgraph enumeration problem is NP-hard. The active sample selection problem and the feature selection problem are correlated for graph data. Before we can solve the active sample selection problem, we need to find a set of optimal subgraph features. To address this challenge, we demonstrate how one can simultaneously estimate the usefulness of a query graph and a set of subgraph features. The idea is to maximize the dependency between subgraph features and graph labels using an active learning framework. We propose a branch-and-bound algorithm to search for the optimal query graph and optimal features simultaneously. Empirical studies on nine real-world tasks demonstrate that the proposed method can obtain better accuracy on graph data than alternative approaches.
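The abstract states that the method maximizes the dependency between subgraph features and graph labels but does not name the dependency measure here. As a minimal sketch, assuming an HSIC-style criterion with linear kernels over binary subgraph-indicator vectors (one entry per labeled graph), candidate subgraph features could be scored and ranked as below. The function names and the toy feature names `g_ab` and `g_cd` are hypothetical, and the branch-and-bound search and query-graph selection are not shown.

```python
from typing import Dict, List, Sequence


def linear_hsic_score(feature: Sequence[int], labels: Sequence[int]) -> float:
    """Empirical HSIC between one binary subgraph-indicator vector and the labels.

    Assumption: linear kernels K = f f^T and L = y y^T. Then
    tr(K H L H) / (n - 1)^2 reduces to (f_c . y_c)^2 / (n - 1)^2,
    where f_c and y_c are the mean-centered vectors.
    """
    n = len(feature)
    if n != len(labels) or n < 2:
        raise ValueError("need at least two graphs and one label per graph")
    mean_f = sum(feature) / n
    mean_y = sum(labels) / n
    cross = sum((f - mean_f) * (y - mean_y) for f, y in zip(feature, labels))
    return cross ** 2 / (n - 1) ** 2


def top_k_features(indicators: Dict[str, Sequence[int]],
                   labels: Sequence[int], k: int) -> List[str]:
    """Rank candidate subgraph features by label dependency and keep the best k."""
    ranked = sorted(indicators,
                    key=lambda name: linear_hsic_score(indicators[name], labels),
                    reverse=True)
    return ranked[:k]


labels = [1, 1, -1, -1]
indicators = {
    "g_ab": [1, 1, 0, 0],  # hypothetical subgraph present only in the positive graphs
    "g_cd": [1, 0, 1, 0],  # hypothetical subgraph independent of the labels
}
print(top_k_features(indicators, labels, 1))  # ['g_ab']
```

Under this assumption a feature scores highest when its presence pattern tracks the labels, and a feature uncorrelated with the labels scores zero.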
LTS: Discriminative Subgraph Mining by Learning from Search History
Cited by 6 (0 self)
Abstract — Discriminative subgraphs can be used to characterize complex graphs, construct graph classifiers and generate graph indices. The search space for discriminative subgraphs is usually prohibitively large. Most measurements of interestingness of discriminative subgraphs are neither monotonic nor anti-monotonic with respect to subgraph frequencies. Therefore, branch-and-bound algorithms are unable to mine discriminative subgraphs efficiently. We discover that the search history of discriminative subgraph mining is very useful in computing empirical upper bounds of discrimination scores of subgraphs. We propose a novel discriminative subgraph mining method, LTS (Learning To Search), which begins with a greedy algorithm that first samples the search space through subgraph probing and then explores the search space in a branch-and-bound fashion, leveraging the search history of these samples. Extensive experiments have been performed to analyze the performance gain from taking search history into account and to demonstrate that LTS can significantly improve performance compared with state-of-the-art discriminative subgraph mining algorithms.
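The abstract does not give enough detail to reproduce how LTS estimates its empirical upper bounds from search history. The sketch below shows only the generic best-first branch-and-bound skeleton it refers to, with the bound passed in as a function so that a history-based estimate could be plugged in. The toy tree (tuples grown from hypothetical step values) stands in for subgraph growth; with a valid bound the search is exact, while an empirical bound, as in LTS, may prune the true optimum.

```python
import heapq
from typing import Callable, Iterable, List, Tuple, TypeVar

N = TypeVar("N")


def branch_and_bound(
    root: N,
    children: Callable[[N], Iterable[N]],
    score: Callable[[N], float],
    upper_bound: Callable[[N], float],
) -> Tuple[N, float]:
    """Best-first branch-and-bound over a pattern-growth tree.

    upper_bound(node) is assumed to bound the score of node and all its
    descendants. If it is only an estimate, the result may be suboptimal.
    """
    best_node, best_score = root, score(root)
    tie = 0  # insertion counter so heapq never has to compare nodes directly
    heap = [(-upper_bound(root), tie, root)]
    while heap:
        neg_bound, _, node = heapq.heappop(heap)
        if -neg_bound <= best_score:
            break  # the largest remaining bound cannot beat the incumbent
        for child in children(node):
            child_score = score(child)
            if child_score > best_score:
                best_node, best_score = child, child_score
            child_bound = upper_bound(child)
            if child_bound > best_score:  # otherwise prune this subtree
                tie += 1
                heapq.heappush(heap, (-child_bound, tie, child))
    return best_node, best_score


STEPS = (1, -3, 2)  # hypothetical extension values standing in for subgraph growth
MAX_LEN = 3


def grow(node: Tuple[int, ...]) -> List[Tuple[int, ...]]:
    return [node + (s,) for s in STEPS] if len(node) < MAX_LEN else []


def total(node: Tuple[int, ...]) -> float:
    return float(sum(node))


def bound(node: Tuple[int, ...]) -> float:
    # Valid bound: each remaining extension adds at most max(STEPS).
    return total(node) + max(STEPS) * (MAX_LEN - len(node))


print(branch_and_bound((), grow, total, bound))  # ((2, 2, 2), 6.0)
```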
Positive and Unlabeled Learning for Graph Classification
Cited by 2 (2 self)
Abstract—The problem of graph classification has drawn much attention in the last decade. Conventional approaches to graph classification focus on mining discriminative subgraph features under supervised settings. The feature selection strategies strictly follow the assumption that both positive and negative graphs exist. However, in many real-world applications, negative graph examples are not available. In this paper we study the problem of how to select useful subgraph features and perform graph classification based only on positive and unlabeled graphs. This problem is challenging and different from previous work on positive and unlabeled (PU) learning, because there are no predefined features in graph data. Moreover, the subgraph enumeration problem is NP-hard. We need to identify a subset of unlabeled graphs that are most likely to be negative graphs. However, the negative graph selection problem and the subgraph feature selection problem are correlated: before the reliable negative graphs can be identified, we need a set of useful subgraph features. In order to address this problem, we first derive an evaluation criterion to estimate the dependency between subgraph features and class labels based on a set of estimated negative graphs. In order to build accurate models for the PU learning problem on graph data, we propose an integrated approach that concurrently selects the discriminative features and the negative graphs in an iterative manner. Experimental results illustrate the effectiveness and efficiency of the proposed method.
Keywords: graph classification; positive and unlabeled data; feature selection
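The abstract describes alternating between feature selection and reliable-negative selection but does not spell out its evaluation criterion. The sketch below illustrates only that alternating structure, under stated assumptions: subgraph features are already enumerated into binary indicator vectors, a feature is scored by the frequency gap between positives and the current estimated negatives (a simple heuristic swapped in for the paper's criterion), and the unlabeled graphs with the least overlap with the positive centroid are taken as negatives. All names are hypothetical.

```python
from typing import List, Sequence, Tuple

Row = Sequence[int]  # binary subgraph-indicator vector for one graph (assumed precomputed)


def frequency(rows: List[Row], j: int) -> float:
    return sum(r[j] for r in rows) / len(rows) if rows else 0.0


def pu_select(positives: List[Row], unlabeled: List[Row], k: int, n_neg: int,
              max_iter: int = 10) -> Tuple[List[int], List[int]]:
    """Alternate between picking k features and picking n_neg reliable negatives.

    Feature score (heuristic): |freq in positives - freq in estimated negatives|.
    Negative score: overlap with the positive centroid on the selected features;
    the lowest-overlap unlabeled graphs are treated as negatives.
    """
    n_features = len(positives[0])
    negatives = list(range(len(unlabeled)))  # start: every unlabeled graph is a candidate negative
    selected: List[int] = []
    for _ in range(max_iter):
        neg_rows = [unlabeled[i] for i in negatives]
        gaps = [abs(frequency(positives, j) - frequency(neg_rows, j))
                for j in range(n_features)]
        selected = sorted(range(n_features), key=lambda j: gaps[j], reverse=True)[:k]
        centroid = {j: frequency(positives, j) for j in selected}

        def overlap(row: Row) -> float:
            return sum(centroid[j] * row[j] for j in selected)

        ranked = sorted(range(len(unlabeled)), key=lambda i: (overlap(unlabeled[i]), i))
        new_negatives = sorted(ranked[:n_neg])
        if new_negatives == negatives:
            break  # converged: the negative set stopped changing
        negatives = new_negatives
    return sorted(selected), negatives


positives = [[1, 0, 1], [1, 0, 0]]
unlabeled = [[1, 0, 1], [0, 1, 0], [0, 1, 1], [1, 0, 0]]
print(pu_select(positives, unlabeled, k=1, n_neg=2))  # ([0], [1, 2])
```

In this toy run, feature 0 separates the positives, and the two unlabeled graphs lacking it are chosen as reliable negatives.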
Discriminative Feature Selection for Uncertain Graph Classification
Cited by 1 (0 self)
Mining discriminative features for graph data has attracted much attention in recent years due to its important role in constructing graph classifiers, generating graph indices, etc. Most measurements of interestingness of discriminative subgraph features are defined on certain graphs, where the structure of graph objects is certain and the binary edges within each graph represent the “presence” of linkages among the nodes. In many real-world applications, however, the linkage structure of the graphs is inherently uncertain. Therefore, existing measurements of interestingness based upon certain graphs are unable to capture the structural uncertainty in these applications effectively. In this paper, we study the problem of discriminative subgraph feature selection from uncertain graphs. This problem is challenging and different from conventional subgraph mining problems because both the structure of the graph objects and the discrimination score of each subgraph feature are uncertain. To address these challenges, we propose a novel discriminative subgraph feature selection method, Dug, which can find discriminative subgraph features in uncertain graphs based upon different statistical measures, including expectation, median, mode and ϕ-probability. We first compute the probability distribution of the discrimination scores for each subgraph feature based on dynamic programming. Then a branch-and-bound algorithm is proposed to search for discriminative subgraphs efficiently. Extensive experiments on various neuroimaging applications (i.e., Alzheimer's Disease, ADHD and HIV) have been performed to analyze the gain in performance by taking into account structural uncertainties in identifying discriminative subgraph features for graph classification.
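The dynamic-programming step can be illustrated with a simpler building block than the paper's full discrimination-score distribution. As a sketch, assume each graph's edge uncertainty has already been reduced to one probability that the graph contains a given subgraph, and that graphs are independent. The number of graphs containing the subgraph then follows a Poisson-binomial distribution, computable in O(n^2) by DP, from which the four statistics can be read off. Here ϕ-probability is taken, as an assumption, to mean P(count ≥ ϕ); the function names are hypothetical.

```python
from typing import Dict, List, Sequence


def count_distribution(probs: Sequence[float]) -> List[float]:
    """dist[c] = P(exactly c graphs contain the subgraph).

    probs[i] is the (assumed) probability that graph i contains the subgraph;
    graphs are assumed independent, giving a Poisson-binomial distribution.
    """
    dist = [1.0]  # before any graph is processed, the count is 0 with probability 1
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for c, mass in enumerate(dist):
            nxt[c] += mass * (1.0 - p)  # this graph does not contain the subgraph
            nxt[c + 1] += mass * p      # this graph contains it
        dist = nxt
    return dist


def summarize(dist: Sequence[float], phi: int) -> Dict[str, float]:
    """Expectation, median, mode, and P(count >= phi) of a discrete distribution."""
    expectation = sum(c * m for c, m in enumerate(dist))
    mode = max(range(len(dist)), key=lambda c: dist[c])
    cumulative, median = 0.0, len(dist) - 1
    for c, m in enumerate(dist):
        cumulative += m
        if cumulative >= 0.5:
            median = c
            break
    return {"expectation": expectation, "median": median, "mode": mode,
            "phi_probability": sum(dist[phi:])}


dist = count_distribution([0.5, 0.5])
print(dist)                   # [0.25, 0.5, 0.25]
print(summarize(dist, phi=1))
```

A discrimination score that combines positive and negative counts would need a joint distribution over both counts; that extension is not shown.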
Efficient Breadth-First Search on Large Graphs with Skewed Degree Distributions
Cited by 1 (0 self)
Many recent large-scale, data-intensive applications increasingly demand efficient graph databases. Distributed graph algorithms, as a core part of practical graph databases, have a wide range of important applications, but have rarely been studied in sufficient detail. These problems are challenging because real graphs are usually extremely large, and the intrinsic character of graph data, which lacks locality, causes unbalanced computation and communication workloads. In this paper, we explore distributed breadth-first search (BFS) algorithms with regard to large-scale applications. We propose DPC (Degree-based Partitioning and Communication), a scalable and efficient distributed BFS algorithm that achieves high scalability and performance through novel techniques for balancing computation and communication. In an experimental study, we compare our algorithm with two state-of-the-art algorithms under the Graph500 benchmark with a variety of settings. The results show that our algorithm significantly outperforms the existing algorithms under all the settings.
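The abstract does not describe how DPC partitions vertices or balances communication. The single-process simulation below only illustrates the general idea of treating high-degree vertices differently in a level-synchronous BFS: it assumes, hypothetically, that low-degree vertices are owned by worker `u % workers` and that the edge list of any vertex above a degree threshold is dealt round-robin across all workers, so no single worker scans a whole hub. It counts per-worker edge scans as a rough workload proxy; communication is not modeled.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Adjacency = Dict[int, List[int]]


def partition_edges(adj: Adjacency, workers: int,
                    degree_threshold: int) -> List[Dict[int, List[int]]]:
    """Assign adjacency entries to workers (hypothetical degree-based scheme).

    Low-degree vertex u: all of its edges go to worker u % workers.
    High-degree vertex u: its edges are dealt round-robin across all workers.
    """
    parts: List[Dict[int, List[int]]] = [defaultdict(list) for _ in range(workers)]
    for u, nbrs in adj.items():
        if len(nbrs) > degree_threshold:
            for i, v in enumerate(nbrs):
                parts[i % workers][u].append(v)
        else:
            parts[u % workers][u].extend(nbrs)
    return parts


def level_synchronous_bfs(parts: List[Dict[int, List[int]]],
                          source: int) -> Tuple[Dict[int, int], List[int]]:
    """Return BFS levels and the number of edges each worker scanned."""
    level = {source: 0}
    frontier: Set[int] = {source}
    work = [0] * len(parts)
    depth = 0
    while frontier:
        discovered: Set[int] = set()
        for w, part in enumerate(parts):  # a real system would run workers in parallel
            for u in frontier:
                for v in part.get(u, ()):
                    work[w] += 1
                    if v not in level:
                        discovered.add(v)
        depth += 1
        for v in discovered:
            level[v] = depth
        frontier = discovered
    return level, work


# A hub (vertex 0) with six neighbors, plus one extra edge 1-7.
adj = {0: [1, 2, 3, 4, 5, 6], 1: [0, 7], 2: [0], 3: [0], 4: [0], 5: [0], 6: [0], 7: [1]}
parts = partition_edges(adj, workers=3, degree_threshold=2)
levels, work = level_synchronous_bfs(parts, source=0)
print(levels[7], work)  # 2 [4, 6, 4]
```

Splitting the hub's six edges two per worker keeps the first BFS level balanced; without the split, one worker would scan all six.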
Semi-supervised Clustering of Graph Objects: A Subgraph Mining Approach
Cited by 1 (0 self)
Abstract. Semi-supervised clustering, which aims to improve clustering performance with limited supervision, has recently received a lot of attention in the literature. Most existing semi-supervised clustering studies assume that the data is represented in a vector space, e.g., text and relational data. When the data objects have complex structures, e.g., proteins and chemical compounds, those semi-supervised clustering methods are not directly applicable to clustering such graph objects. In this paper, we study the problem of semi-supervised clustering of data objects that are represented as graphs. The supervision information is in the form of pairwise must-link and cannot-link constraints. As there is no predefined feature set for the graph objects, we propose to use discriminative subgraph patterns as the features. We design an objective function that incorporates the constraints to guide the subgraph feature mining and selection process. We derive an upper bound of the objective function, based on which a branch-and-bound algorithm is proposed to speed up subgraph mining. We also introduce a redundancy measure into the feature selection process in order to reduce the redundancy in the feature set. Once the graph objects are represented in the vector space of the discriminative subgraph features, we use semi-supervised kernel K-means to cluster all graph objects. Experimental results on real-world protein datasets demonstrate that the constraint information can effectively guide the feature selection and clustering process and achieve satisfactory clustering performance.
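The abstract does not state the objective function or the redundancy measure. As an illustrative sketch under assumed definitions, a binary subgraph feature can be scored by how many cannot-link pairs it separates minus how many must-link pairs it separates, and a greedy selector can subtract a weighted redundancy penalty against features already chosen. Redundancy is assumed here to be the fraction of graphs on which two features induce the same split, counting a feature and its complement as fully redundant. The bound-based pruning and the kernel K-means step are not shown, and all names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[int, int]


def constraint_score(f: Sequence[int], must: List[Pair], cannot: List[Pair]) -> int:
    """Reward separating cannot-link pairs, penalize separating must-link pairs."""
    split_cannot = sum(1 for i, j in cannot if f[i] != f[j])
    split_must = sum(1 for i, j in must if f[i] != f[j])
    return split_cannot - split_must


def redundancy(f: Sequence[int], g: Sequence[int]) -> float:
    """Share of graphs on which f and g induce the same split (complements count as identical)."""
    agree = sum(1 for a, b in zip(f, g) if a == b) / len(f)
    return max(agree, 1.0 - agree)


def greedy_select(features: Dict[str, Sequence[int]], must: List[Pair],
                  cannot: List[Pair], k: int, weight: float = 0.5) -> List[str]:
    """Greedily pick k features maximizing constraint score minus weighted redundancy."""
    chosen: List[str] = []
    remaining = sorted(features)  # deterministic tie-breaking by name

    def gain(name: str) -> float:
        penalty = max((redundancy(features[name], features[c]) for c in chosen),
                      default=0.0)
        return constraint_score(features[name], must, cannot) - weight * penalty

    while remaining and len(chosen) < k:
        best = max(remaining, key=gain)
        chosen.append(best)
        remaining.remove(best)
    return chosen


must = [(0, 1), (2, 3)]    # graphs 0,1 and 2,3 should share a cluster
cannot = [(0, 2), (1, 3)]  # these pairs should be separated
features = {"a": [1, 1, 0, 0], "b": [1, 0, 1, 0], "c": [0, 0, 1, 1]}
print([constraint_score(features[n], must, cannot) for n in sorted(features)])  # [2, -2, 2]
print(greedy_select(features, must, cannot, k=1))  # ['a']
```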
Classifying Graphs Using Theoretical Metrics: A Study of Feasibility
Cited by 1 (0 self)
Abstract. Graph classification has become an increasingly important research topic in recent years due to its wide applications. However, the problem of how to classify graphs based on their implicit properties has not yet been studied. To address it, this paper first conducts an extensive study of existing graph-theoretical metrics and also proposes various novel metrics to discover implicit graph properties. We then apply feature selection techniques to discover a subset of discriminative metrics by considering domain knowledge. Two classifiers are proposed to classify the graphs based on this subset of features. The feasibility of graph classification based on the proposed graph metrics and techniques has been studied experimentally.
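The idea of classifying graphs by their metrics amounts to turning each graph into a fixed-length vector of graph-level statistics. As a minimal sketch, the function below computes three standard metrics for an undirected simple graph given as a dict of neighbor sets: density, average degree, and average local clustering coefficient. These are common textbook metrics chosen for illustration, not the paper's proposed metrics, and the function name is hypothetical.

```python
from itertools import combinations
from typing import Dict, Set


def graph_metrics(adj: Dict[int, Set[int]]) -> Dict[str, float]:
    """Density, average degree, and average local clustering coefficient."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2  # each undirected edge is stored twice
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    avg_degree = 2 * m / n if n else 0.0
    total_clustering = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue  # local clustering is taken as 0 for degree < 2
        links = sum(1 for v, w in combinations(nbrs, 2) if w in adj[v])
        total_clustering += links / (k * (k - 1) / 2)
    avg_clustering = total_clustering / n if n else 0.0
    return {"density": density, "avg_degree": avg_degree,
            "avg_clustering": avg_clustering}


triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
path = {0: {1}, 1: {0, 2}, 2: {1}}
print(graph_metrics(triangle))  # {'density': 1.0, 'avg_degree': 2.0, 'avg_clustering': 1.0}
print(graph_metrics(path))
```

Stacking such dictionaries, one per graph, gives a feature matrix on which feature selection and any standard classifier can operate.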
Is Frequent Pattern Mining useful in building predictive models?
Abstract. Recent studies of pattern mining have paid more attention to discovering patterns that are interesting, significant, discriminative and so forth than to patterns that are simply frequent. Does this imply that frequent patterns are no longer useful? In this paper we carry out a survey of frequent pattern mining and, using an empirical study, show to what extent frequent pattern mining is useful in building predictive models.
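One common way frequent patterns feed a predictive model is to mine itemsets above a minimum support and then encode each transaction as a binary vector over those patterns. The sketch below does this by brute force for itemsets up to size 2, with no Apriori pruning, so it is only suitable for toy data. It illustrates the general pattern-as-feature idea, not the survey's experimental protocol, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import List, Sequence, Tuple

Pattern = Tuple[str, ...]


def frequent_patterns(transactions: Sequence[Sequence[str]], min_support: int,
                      max_size: int = 2) -> List[Pattern]:
    """Brute-force frequent itemsets up to max_size (no Apriori pruning)."""
    patterns: List[Pattern] = []
    for size in range(1, max_size + 1):
        counts: Counter = Counter()
        for t in transactions:
            counts.update(combinations(sorted(set(t)), size))
        patterns.extend(p for p, c in counts.items() if c >= min_support)
    return sorted(patterns)


def encode(transactions: Sequence[Sequence[str]],
           patterns: List[Pattern]) -> List[List[int]]:
    """One binary feature per pattern: 1 if the transaction contains all its items."""
    return [[int(set(p) <= set(t)) for p in patterns] for t in transactions]


transactions = [["a", "b"], ["a", "c"], ["a", "b", "c"]]
patterns = frequent_patterns(transactions, min_support=2)
print(patterns)                        # [('a',), ('a', 'b'), ('a', 'c'), ('b',), ('c',)]
print(encode(transactions, patterns))  # [[1, 1, 0, 1, 0], [1, 0, 1, 0, 1], [1, 1, 1, 1, 1]]
```

The resulting rows can be passed, together with class labels, to any standard classifier.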