GREW—A Scalable Frequent Subgraph Discovery Algorithm
- in Fourth IEEE International Conference on Data Mining (ICDM 2004). 2004
, 2003
Cited by 21 (0 self)
Existing algorithms that mine graph datasets to discover patterns corresponding to frequently occurring subgraphs can operate efficiently on graphs that are sparse, contain a large number of relatively small connected components, have vertices with low and bounded degrees, and contain well-labeled vertices and edges. However, a number of applications lead to graphs that do not share these characteristics, and for these graphs the existing algorithms become highly unscalable. In this paper we propose a heuristic algorithm called GREW to overcome the limitations of existing complete or heuristic frequent subgraph discovery algorithms. GREW is designed to operate on a large graph and to find patterns corresponding to connected subgraphs that have a large number of vertex-disjoint embeddings. Our experimental evaluation shows that GREW is efficient, can scale to very large graphs, and finds non-trivial patterns that cover large portions of the input graph and the lattice of frequent patterns.
Constructing a decision tree for graph structured data
- IN: PROC. MGTS 2003, HTTP://WWW.AR.SANKEN.OSAKA-U.AC.JP/MGTS-2003CFP.HTML
, 2003
Cited by 5 (1 self)
Decision tree Graph-Based Induction (DT-GBI) is proposed, which constructs a decision tree for graph-structured data. Substructures (patterns) are extracted at each node of a decision tree by stepwise pair expansion (pairwise chunking) in GBI and are used as attributes for testing. Since attributes (features) are constructed while a classifier is being constructed, DT-GBI can be conceived as a method for feature construction. The predictive accuracy of a decision tree is affected by which attributes (patterns) are used and how they are constructed. A beam search is employed to extract sufficiently discriminative patterns within the greedy search framework. Pessimistic pruning is incorporated to avoid overfitting to the training data. Experiments using a DNA dataset were conducted to see the effect of the beam width, the number of chunkings at each node of a decision tree, and the pruning. The results indicate that DT-GBI, which does not use any prior domain knowledge, can construct a decision tree that is comparable to other classifiers constructed using domain knowledge.
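The pairwise chunking step mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that pairs are ranked purely by frequency (the abstract does not state the selection criterion), that nodes are integer ids with string labels, and that overlapping occurrences are skipped greedily. The function names `most_frequent_pair` and `chunk_pair` are hypothetical.

```python
from collections import Counter


def most_frequent_pair(labels, edges):
    """Return the most frequent unordered label pair over all edges, with its count."""
    counts = Counter(tuple(sorted((labels[u], labels[v]))) for u, v in edges)
    return counts.most_common(1)[0]


def chunk_pair(labels, edges, pair):
    """Merge non-overlapping occurrences of `pair` into single chunked nodes.

    Assumption: occurrences are taken greedily in edge order, and an edge is
    skipped if either endpoint already belongs to a chunk.
    """
    merged_into = {}  # absorbed node -> node that represents the chunk
    used = set()
    for u, v in edges:
        if u in used or v in used:
            continue
        if tuple(sorted((labels[u], labels[v]))) == pair:
            used.update((u, v))
            merged_into[v] = u
    # Keep labels of surviving nodes, then relabel the chunk representatives.
    new_labels = {n: lab for n, lab in labels.items() if n not in merged_into}
    for rep in merged_into.values():
        new_labels[rep] = "(" + "-".join(pair) + ")"
    # Redirect edges to chunk representatives and drop edges inside a chunk.
    new_edges = set()
    for u, v in edges:
        a, b = merged_into.get(u, u), merged_into.get(v, v)
        if a != b:
            new_edges.add((min(a, b), max(a, b)))
    return new_labels, sorted(new_edges)


labels = {0: "a", 1: "b", 2: "a", 3: "b", 4: "c"}
edges = [(0, 1), (2, 3), (1, 4)]
pair, count = most_frequent_pair(labels, edges)
new_labels, new_edges = chunk_pair(labels, edges, pair)
```

Repeating these two steps on the rewritten graph grows larger patterns one pair at a time, which is the sense in which the expansion is stepwise.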
Constructing Decision Trees for Graph-Structured Data by Chunkingless Graph-Based Induction
- Advances in Knowledge Discovery and Data Mining, Lecture Notes in Computer Science, Volume 3918
, 2006
Cited by 4 (1 self)
A decision tree is an effective means of data classification from which one can obtain rules that are easy to understand. However, decision trees cannot be conventionally constructed for data that are not explicitly expressed with attribute-value pairs, such as graph-structured data. We have proposed a novel algorithm, named Chunkingless Graph-Based Induction (Cl-GBI), for extracting typical patterns from graph-structured data. Cl-GBI is an improved version of Graph-Based Induction (GBI), which employs stepwise pair expansion (pairwise chunking) to extract typical patterns from graph-structured data, and it can find overlapping patterns that cannot be found by GBI. In this paper, we further propose an algorithm for constructing decision trees for graph-structured data using Cl-GBI. This decision tree construction algorithm, called Decision Tree Chunkingless Graph-Based Induction (DT-ClGBI), can construct a decision tree from a graph-structured dataset while simultaneously constructing attributes useful for classification using Cl-GBI internally. Since patterns (subgraphs) extracted by Cl-GBI are considered as attributes of a graph, and their existence or non-existence is used as the attribute value in DT-ClGBI, DT-ClGBI can be conceived as a tree generator equipped with feature construction capability. Experiments conducted on both synthetic and real-world graph-structured datasets show the usefulness and effectiveness of the algorithm.
Mining Discriminative Patterns from Graph Structured Data with Constrained Search
Cited by 3 (0 self)
A graph mining method, Chunkingless Graph-Based Induction (Cl-GBI), finds typical patterns that appear in graph-structured data by an operation called chunkingless pairwise expansion, which generates pseudo-nodes from selected pairs of nodes in the data. Cl-GBI can extract overlapping subgraphs, but it requires more time and space. As a result, Cl-GBI may fail to extract patterns large enough to describe the characteristics of the data within a limited time and with given computational resources. In such a case, the extracted patterns may be of little interest to domain experts. To mine more discriminative patterns that cannot be extracted by the current Cl-GBI, we introduce a search algorithm guided by domain knowledge or the interests of domain experts. We further show experimentally, using both synthetic and real-world datasets, that the proposed method can efficiently extract more discriminative patterns.
A Survey on Assorted Approaches to Graph Data Mining
Cited by 3 (0 self)
Graph mining has become a popular area of research in recent years because of its numerous applications in a wide variety of practical fields, including computational biology, sociology, software bug localization, keyword search, and computer networking. Different applications result in graphs of different sizes and complexities. Graph mining is an important tool to transform the graphical data into graphical information. We investigate recurring patterns in real-world graphs, to gain a deeper understanding of their structure. We can extract normal and abnormal subgraphs thereby detecting suspicious nodes and outliers in the existing graphs. In this paper we present a survey of various approaches to mine the graphs. These are used to extract patterns, trends, classes, and clusters from graphs.
Faster Computation of the Direct Product Kernel for Graph Classification
The direct product kernel, introduced by Gärtner et al. for graph classification, is based on defining a feature for every possible label sequence in a labelled graph and counting how many label sequences in two given graphs are identical. Although the direct product kernel has achieved promising results in terms of accuracy, the kernel computation is not feasible for large graphs. This is because computing the direct product kernel for two graphs essentially requires either inverting or diagonalizing the adjacency matrix of the direct product of these two graphs. For two graphs with adjacency matrices of sizes m and n, the adjacency matrix of their direct product graph can be of size mn in the worst case. As both matrix inversion and matrix diagonalization in the general case are O(n^3), computing the direct product kernel is O((mn)^3). Our survey of data sets in graph classification indicates that most graphs have adjacency matrices
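To make the size argument in this abstract concrete, the following Python sketch builds the direct product adjacency matrix for two small vertex-labelled graphs and evaluates a walk-based kernel on it. It rests on several assumptions not taken from the paper: only vertex labels are matched (edge labels are ignored), the weights form a geometric series with a hypothetical `decay` parameter, and the series is truncated at `max_len` steps instead of using the matrix inverse or diagonalization the abstract refers to.

```python
from itertools import product


def direct_product_adjacency(labels1, adj1, labels2, adj2):
    """Adjacency matrix of the direct product of two vertex-labelled graphs.

    Product vertices are pairs (u, v) with equal labels, so there are at most
    m * n of them. Two product vertices are adjacent when both components are
    adjacent in their own graphs.
    """
    pairs = [
        (u, v)
        for u, v in product(range(len(labels1)), range(len(labels2)))
        if labels1[u] == labels2[v]
    ]
    size = len(pairs)
    a = [[0] * size for _ in range(size)]
    for i, (u, v) in enumerate(pairs):
        for j, (x, y) in enumerate(pairs):
            if adj1[u][x] and adj2[v][y]:
                a[i][j] = 1
    return a


def truncated_walk_kernel(a, decay, max_len):
    """Sum of all entries of sum_{k=0..max_len} decay^k * A^k."""
    size = len(a)
    # `power` holds decay^k * A^k, starting from the identity matrix (k = 0).
    power = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    total = 0.0
    for _ in range(max_len + 1):
        total += sum(map(sum, power))
        power = [
            [decay * sum(power[i][k] * a[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)
        ]
    return total


# Two identical graphs: a single edge between a vertex labelled "a" and one labelled "b".
labels = ["a", "b"]
adj = [[0, 1], [1, 0]]
product_adj = direct_product_adjacency(labels, adj, labels, adj)
kernel_value = truncated_walk_kernel(product_adj, decay=0.5, max_len=2)
```

Each multiplication in the loop is cubic in the size of the product matrix in this naive form, which illustrates why the cost grows so quickly once that matrix approaches size mn.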
Graph Clustering based on Structural Similarity of Fragments
Resources available over the Web are often used in combination to meet a specific need of a user. Since resource combinations can be represented as graphs in terms of the relations among the resources, locating desirable resource combinations can be formulated as locating the corresponding graph. This paper describes a graph clustering method based on structural similarity of fragments (currently connected subgraphs are considered) in graph structured data. A fragment is characterized based on the connectivity (degree) of a node in the fragment. A fragment spectrum of a graph is created based on the frequency distribution of fragments. Thus, the representation of a graph is transformed into a fragment spectrum in terms of the properties of fragments in the graph. Graphs are then clustered with respect to the transformed spectra by applying a standard clustering method. We also devise a criterion to determine the number of clusters by defining a pseudo-entropy of cluster. Preliminary experiments with synthesized data were conducted and the results are reported.
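One way to picture a fragment spectrum is the small Python sketch below. It is only an illustration under strong assumptions that the abstract does not confirm: each edge is treated as a two-node fragment, a fragment is characterized by the sorted degrees of its two endpoints, and the spectrum is the relative frequency of each such degree pair. The function name `fragment_spectrum` is hypothetical.

```python
from collections import Counter


def fragment_spectrum(edges):
    """Relative frequency of edge fragments, keyed by sorted endpoint degrees."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Assumption: a fragment is one edge, described by the degrees of its endpoints.
    counts = Counter(tuple(sorted((degree[u], degree[v]))) for u, v in edges)
    total = sum(counts.values())
    return {key: count / total for key, count in counts.items()}


# A path a-b-c-d: two end edges with degrees (1, 2) and one middle edge with (2, 2).
spectrum = fragment_spectrum([("a", "b"), ("b", "c"), ("c", "d")])
```

Spectra built this way are fixed-form frequency tables, so graphs of different sizes can be compared and passed to a standard clustering method once the keys are aligned.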