An efficient algorithm for discovering frequent subgraphs
 IEEE Transactions on Knowledge and Data Engineering
, 2002
Cited by 88 (9 self)
Abstract: Over the years, frequent itemset discovery algorithms have been used to find interesting patterns in various application areas. However, as data mining techniques are increasingly applied to nontraditional domains, existing frequent pattern discovery approaches cannot be used. This is because the transaction framework that is assumed by these algorithms cannot be used to effectively model the datasets in these domains. An alternate way of modeling the objects in these datasets is to represent them using graphs. Within that model, one way of formulating the frequent pattern discovery problem is as that of discovering subgraphs that occur frequently over the entire set of graphs. In this paper we present a computationally efficient algorithm, called FSG, for finding all frequent subgraphs in large graph datasets. We experimentally evaluate the performance of FSG using a variety of real and synthetic datasets. Our results show that despite the underlying complexity associated with frequent subgraph discovery, FSG is effective in finding all frequently occurring subgraphs in datasets containing over 200,000 graph transactions and scales linearly with respect to the size of the dataset. Index Terms: Data mining, scientific datasets, frequent pattern discovery, chemical compound datasets.
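To make the notion of a subgraph that occurs "frequently over the entire set of graphs" concrete, the following minimal sketch counts support for the smallest possible patterns, single labeled edges, across a set of graph transactions. It is an illustration only and not the FSG algorithm itself; the function names, the tuple-based graph encoding, and the `min_support` parameter are assumptions made for this example.

```python
from collections import Counter

def edge_key(label_u, edge_label, label_v):
    # Undirected edge: order the endpoint labels so (C, O) and (O, C) coincide.
    a, b = sorted((label_u, label_v))
    return (a, edge_label, b)

def frequent_edges(transactions, min_support):
    """Return single-edge patterns found in at least min_support of the graphs.

    transactions: list of (vertex_labels, edges) pairs, where vertex_labels
    maps vertex id -> label and edges is a list of (u, v, edge_label).
    min_support: fraction of transactions, between 0 and 1 (hypothetical knob).
    """
    counts = Counter()
    for vertex_labels, edges in transactions:
        # A transaction supports a pattern at most once, however often it occurs.
        seen = {edge_key(vertex_labels[u], lbl, vertex_labels[v])
                for u, v, lbl in edges}
        counts.update(seen)
    threshold = min_support * len(transactions)
    return {pattern: c for pattern, c in counts.items() if c >= threshold}

graphs = [
    ({0: "C", 1: "O", 2: "C"}, [(0, 1, "single"), (1, 2, "single")]),
    ({0: "C", 1: "N"}, [(0, 1, "double")]),
    ({0: "O", 1: "C"}, [(0, 1, "single")]),
]
print(frequent_edges(graphs, min_support=0.5))
```

Larger patterns require subgraph isomorphism tests, which this sketch deliberately leaves out.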
Finding frequent patterns in a large sparse graph
 SIAM Data Mining Conference
, 2004
Cited by 83 (4 self)
This paper presents two algorithms based on the horizontal and vertical pattern discovery paradigms that find the connected subgraphs that have a sufficient number of edge-disjoint embeddings in a single large undirected labeled sparse graph. These algorithms use three different methods to determine the number of edge-disjoint embeddings of a subgraph, based on approximate and exact maximum independent set computations, and use that number to prune infrequent subgraphs. Experimental evaluation on real datasets from various domains shows that both algorithms achieve good performance, scale well to sparse input graphs with more than 100,000 vertices, and significantly outperform a previously developed algorithm.
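The link between edge-disjoint embeddings and independent sets can be sketched as follows: treat each embedding as a vertex of an overlap graph, connect two embeddings when they share an edge, and pick an independent set. The sketch below uses a simple greedy minimum-degree heuristic, which yields a lower bound on the true maximum; it is an assumed stand-in and not one of the three methods from the paper. The function name and input encoding are hypothetical.

```python
def greedy_disjoint_embeddings(embeddings):
    """Pick pairwise edge-disjoint embeddings with a greedy heuristic.

    embeddings: list of embeddings, each a list of undirected edges (u, v).
    Returns the indices of the chosen embeddings (a maximal, not necessarily
    maximum, independent set of the overlap graph).
    """
    # Normalise each edge to a frozenset so (u, v) and (v, u) compare equal.
    edge_sets = [frozenset(frozenset(e) for e in emb) for emb in embeddings]
    n = len(edge_sets)

    # Overlap graph: embeddings i and j are adjacent if they share an edge.
    neighbors = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if edge_sets[i] & edge_sets[j]:
                neighbors[i].add(j)
                neighbors[j].add(i)

    remaining = set(range(n))
    chosen = []
    while remaining:
        # Take the embedding that conflicts with the fewest remaining ones.
        i = min(remaining, key=lambda k: (len(neighbors[k] & remaining), k))
        chosen.append(i)
        remaining -= neighbors[i] | {i}
    return chosen

# Two-edge paths embedded in the path graph a-b-c-d-e.
embeddings = [
    [("a", "b"), ("b", "c")],
    [("b", "c"), ("c", "d")],
    [("c", "d"), ("d", "e")],
]
print(len(greedy_disjoint_embeddings(embeddings)))
```

A pattern would then be kept only if this count reaches the chosen frequency threshold.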
Discovering Frequent Geometric Subgraphs
 In IEEE Intl. Conference on Data Mining ’02
, 2002
Cited by 28 (0 self)
As data mining techniques are being increasingly applied to nontraditional domains, existing approaches for finding frequent itemsets cannot be used, as they cannot model the requirements of these domains. An alternate way of modeling the objects in these data sets is to use a graph to model the database objects. Within that model, the problem of finding frequent patterns becomes that of discovering subgraphs that occur frequently over the entire set of graphs. In this paper we present a computationally efficient algorithm for finding frequent geometric subgraphs in a large collection of geometric graphs. Our algorithm is able to discover geometric subgraphs that can be rotation, scaling and translation invariant, and it can accommodate inherent errors on the coordinates of the vertices. We evaluated the performance of the algorithm using a large database of over 20,000 real two-dimensional chemical structures, and our experimental results show that our algorithm requires relatively little time, can accommodate low support values, and scales linearly with the number of transactions.
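One simple way to see what rotation, scaling and translation invariance with coordinate tolerance can mean is to compare point sets through their normalised pairwise distances. The sketch below is an assumption-laden illustration and not the paper's method: it ignores vertex and edge labels, and equal signatures are a necessary but not sufficient condition for two shapes to match (mirror images, for example, also agree). The names `shape_signature`, `roughly_same_shape` and the `tol` parameter are hypothetical.

```python
import math
from itertools import combinations

def shape_signature(points):
    """Sorted pairwise distances divided by the largest distance.

    Distances are unchanged by rotation and translation; dividing by the
    largest one removes the effect of uniform scaling.
    """
    dists = sorted(math.dist(p, q) for p, q in combinations(points, 2))
    longest = dists[-1] if dists else 0.0
    if longest == 0.0:
        return [0.0] * len(dists)
    return [d / longest for d in dists]

def roughly_same_shape(a, b, tol=1e-2):
    # tol absorbs small errors in the vertex coordinates.
    sa, sb = shape_signature(a), shape_signature(b)
    return len(sa) == len(sb) and all(abs(x - y) <= tol for x, y in zip(sa, sb))

triangle = [(0, 0), (1, 0), (0, 1)]
# Same triangle rotated by 90 degrees, scaled by 2, shifted, slightly perturbed.
moved = [(5.01, 5), (5, 7), (3, 5)]
other = [(0, 0), (1, 0), (0, 3)]
print(roughly_same_shape(triangle, moved), roughly_same_shape(triangle, other))
```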
GREW—A Scalable Frequent Subgraph Discovery Algorithm
 in Fourth IEEE International Conference on Data Mining (ICDM 2004). 2004
, 2003
Cited by 11 (0 self)
Existing algorithms that mine graph datasets to discover patterns corresponding to frequently occurring subgraphs can operate efficiently on graphs that are sparse, contain a large number of relatively small connected components, have vertices with low and bounded degrees, and contain well-labeled vertices and edges. However, there are a number of applications that lead to graphs that do not share these characteristics, for which these algorithms become highly unscalable. In this paper we propose a heuristic algorithm called GREW to overcome the limitations of existing complete or heuristic frequent subgraph discovery algorithms. GREW is designed to operate on a large graph and to find patterns corresponding to connected subgraphs that have a large number of vertex-disjoint embeddings. Our experimental evaluation shows that GREW is efficient, can scale to very large graphs, and finds nontrivial patterns that cover large portions of the input graph and the lattice of frequent patterns.
GADDI: Distance index based subgraph matching in biological networks
 In Proceedings of the 12th International Conference on Extending Database Technology (EDBT ’09)
, 2009
Cited by 10 (1 self)
Currently, a huge amount of biological data can be naturally represented by graphs, e.g., protein interaction networks, gene regulatory networks, etc. The need for indexing large graphs is an urgent research problem of great practical importance. The main challenge is size. Each graph may contain thousands of vertices (or more). Most of the previous work focuses on indexing a set of small or medium-sized database graphs (with only tens of vertices) and finding whether a query graph occurs in any of these. In this paper, we are interested in finding all the matches of a query graph in a given large graph of thousands of vertices, which is a very important task in many biological applications. This increases the complexity significantly. We propose a novel distance measurement which reintroduces the idea of frequent substructures in a single large graph. We devise a novel structure-distance-based approach (GADDI) to efficiently find matches of the query graph. GADDI is further optimized by the use of a dynamic matching scheme to minimize redundant calculations. Last but not least, a number of real and synthetic data sets are used to evaluate the efficiency and scalability of our proposed method.
Mining Edge-disjoint Patterns in Graph-relational Data
Cited by 2 (1 self)
Diverse types of data are associated with proteins, including network and categorical data. While graph mining techniques have long focused on data with no more than one label per node, generalizations have recently been developed. We show that existing generalizations are not well suited to typical biological networks and are likely to return few or no results on protein regulatory networks. They are, furthermore, ill-suited to graphs that are dense or show the small-world property, which are typical features of biological networks. A graph-relational edge-disjoint instance mining algorithm (GREDI) is presented that resolves these problems. Our algorithm treats bipartite edges separately and only constrains unipartite edges to be disjoint. We introduce a new pattern constraint that recovers the downward closure property. The algorithm uses a search lattice traversal strategy that allows more effective mining of graphs that cannot be considered sparse due to hubs. Effectiveness is demonstrated for a real biological example. While existing techniques return few or no patterns, GREDI is able to extract many patterns.
Edgar: the embedding-based graph miner
 Proceedings of the International Workshop on Mining and Learning with Graphs (MLG 2006)
, 2006
Cited by 1 (1 self)
Abstract. In this paper we present the novel graph mining algorithm Edgar, which is based on the well-known gSpan algorithm. The need for another subgraph miner results from procedural abstraction (an important technique to reduce code size in embedded systems). Assembler code is represented as a data flow graph, and subgraph mining on this graph returns frequent code fragments that can be extracted into procedures. When mining for procedural abstraction, it is not the number of data flow graphs in which a fragment occurs that is important but the number of all the non-overlapping occurrences in all graphs. Several changes in the mining process have therefore become necessary. As traditional pruning strategies are inappropriate, Edgar uses a new embedding-based frequency and, on average, saves 160% more instructions than classical approaches.
Tracking Hidden Groups Using Communications
 LAB, COMPUTER SCIENCE DEPARTMENT, UNIVERSITY OF GEORGIA
, 2005
Cited by 1 (0 self)
We address the problem of tracking a group of agents based on their communications over a network when the network devices used for communication (e.g., phones for telephony, IP addresses for the Internet) change continually. We present a system design and describe our work on its key modules. Our methods are based on detecting frequent patterns in graphs and on visual exploration of large amounts of raw and processed data using a zooming interface.
Real-Time Traffic-Data Analysis
Abstract: We describe a method for real-time monitoring of data generated by traffic sensors. We provide a system architecture and discuss three key components: (1) a streaming query processor that is used to reduce the volume of data; (2) a pattern-matching module that is used to detect when a developing traffic situation resembles one flagged earlier; and (3) an interface that efficiently displays the stream of sensor data in a user-configurable manner.
Pattern Discovery
Despite the wealth of research on frequent graph pattern mining, efficiently mining the complete set of patterns that satisfy given constraints still poses a huge challenge to existing algorithms, mainly due to the inherent bottleneck in the mining paradigm. In essence, mining requests with explicitly specified constraints cannot be handled in a way that is direct and precise. In this paper, we propose a direct mining framework to solve the problem and illustrate our ideas in the context of a particular type of constrained frequent patterns, the “skinny” patterns, which are graph patterns with a long backbone from which short twigs branch out. These patterns, which we formally define as l-long δ-skinny patterns, are able to reveal insightful spatial and temporal trajectory patterns in mobile data mining, information diffusion, adoption propagation, and many others.