Results 11 - 20 of 130
Graph indexing based on discriminative frequent structure analysis
 ACM Transactions on Database Systems
Cited by 32 (4 self)
Graphs have become increasingly important in modelling complicated structures and schemaless data such as chemical compounds, proteins, and XML documents. Given a graph query, it is desirable to retrieve graphs quickly from a large database via indices. In this paper, we investigate the issues of indexing graphs and propose a novel indexing model based on discriminative frequent structures that are identified through a graph mining process. We show that the compact index built under this model can achieve better performance in processing graph queries. Since discriminative frequent structures capture the intrinsic characteristics of the data, they are relatively stable to database updates, thus facilitating sampling-based feature extraction and incremental index maintenance. Our approach not only provides an elegant solution to the graph indexing problem, but also demonstrates how database indexing and query processing can benefit from data mining, especially frequent pattern mining. Furthermore, the concepts developed here can be generalized and applied to indexing sequences, trees, and other complicated structures as well.
A Survey of Frequent Subgraph Mining Algorithms
 The Knowledge Engineering Review
, 2004
Cited by 29 (1 self)
Graph mining is an important research area within the domain of data mining. The field of study concentrates on the identification of frequent subgraphs within graph data sets. The research goals are directed at: (i) effective mechanisms for generating candidate subgraphs (without generating duplicates) and (ii) how best to process the generated candidate subgraphs so as to identify the desired frequent subgraphs in a way that is computationally efficient and procedurally effective. This paper presents a survey of current research in the field of frequent subgraph mining, and proposed solutions to address the main research issues.
Mining Temporally Evolving Graphs
, 2004
Cited by 28 (0 self)
Web mining has been explored to a vast degree, and different techniques have been proposed for a variety of applications, including Web search, Web classification, and Web personalization. Most research on Web mining has taken a 'data-centric' point of view. The focus has been primarily on developing measures and applications based on data collected from the content, structure, and usage of the Web up to a particular time instance. In this project we examine another dimension of Web mining, namely the temporal dimension. Web data has been evolving over time, reflecting ongoing trends. These changes in the data along the temporal dimension reveal new kinds of information. This information has not captured the attention of the Web mining research community to a large extent. In this paper, we highlight the significance of studying the evolving nature of Web graphs. We classify the approach to such problems at three levels of analysis: single node, subgraphs, and whole graphs. We provide a framework for approaching problems of this kind and identify interesting problems at each level. Our experiments verify the significance of such analysis and also point to future directions in this area. The approach we take is generic and can be applied to other domains where data can be modeled as a graph, such as network intrusion detection or social networks.
Support Computation for Mining Frequent Subgraphs in a Single Graph
Cited by 26 (0 self)
Defining the support (or frequency) of a subgraph is trivial when a database of graphs is given: it is simply the number of graphs in the database that contain the subgraph. However, if the input is one large graph, it is surprisingly difficult to find an appropriate support definition. In this paper we study the core problem, namely overlapping embeddings of the subgraph, in detail and suggest a definition that relies on the nonexistence of equivalent ancestor embeddings in order to guarantee that the resulting support is anti-monotone. We prove this property and describe a method to compute the support defined in this way.
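The database-setting definition in this abstract (support is the number of database graphs that contain the subgraph) can be sketched as follows. This is a minimal illustration and not code from the paper: the graph representation, the helper names `make_graph`, `contains`, and `support`, and the toy data are all assumptions. The brute-force containment test assumes simple graphs without self-loops and is only practical for very small graphs. The sketch does not cover the single-graph case that the paper actually studies.

```python
from itertools import permutations


def make_graph(labels, edges):
    """Build a graph as (vertex -> label dict, set of undirected edges)."""
    return labels, {frozenset(e) for e in edges}


def contains(graph, pattern):
    """Return True if `pattern` occurs in `graph` as a (non-induced) subgraph.

    Tries every injective assignment of pattern vertices to graph vertices,
    so the cost grows factorially; this is for illustration only.
    """
    g_labels, g_edges = graph
    p_labels, p_edges = pattern
    p_nodes = list(p_labels)
    for image in permutations(g_labels, len(p_nodes)):
        mapping = dict(zip(p_nodes, image))
        # Vertex labels must agree under the mapping.
        if any(p_labels[u] != g_labels[mapping[u]] for u in p_nodes):
            continue
        # Every pattern edge must map onto an edge of the graph.
        if all(frozenset(mapping[u] for u in e) in g_edges for e in p_edges):
            return True
    return False


def support(database, pattern):
    """Number of graphs in `database` that contain `pattern`."""
    return sum(1 for g in database if contains(g, pattern))


# Hypothetical toy database of three small labeled graphs.
database = [
    make_graph({0: "C", 1: "O", 2: "C"}, [(0, 1), (1, 2)]),
    make_graph({0: "C", 1: "C"}, [(0, 1)]),
    make_graph({0: "C", 1: "O"}, [(0, 1)]),
]
# Pattern: a single edge between a "C" vertex and an "O" vertex.
c_o_edge = make_graph({"a": "C", "b": "O"}, [("a", "b")])

print(support(database, c_o_edge))
```

Each database graph contributes at most one to the count here, no matter how many times the pattern occurs inside it. That convention has no direct counterpart when the input is one large graph, which is why overlapping embeddings become the central difficulty.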
GADDI: Distance index based subgraph matching in biological networks
 In Proceedings of the 12th International Conference on Extending Database Technology (EDBT'09)
, 2009
Cited by 25 (2 self)
Currently, a huge amount of biological data can be naturally represented by graphs, e.g., protein interaction networks, gene regulatory networks, etc. Indexing large graphs is an urgent research problem of great practical importance. The main challenge is size: each graph may contain thousands of vertices or more. Most of the previous work focuses on indexing a set of small or medium-sized database graphs (with only tens of vertices) and finding whether a query graph occurs in any of these. In this paper, we are interested in finding all the matches of a query graph in a given large graph of thousands of vertices, which is a very important task in many biological applications. This increases the complexity significantly. We propose a novel distance measurement which reintroduces the idea of frequent substructures in a single large graph. We devise a novel structure-distance-based approach (GADDI) to efficiently find matches of the query graph. GADDI is further optimized by the use of a dynamic matching scheme to minimize redundant calculations. Last but not least, a number of real and synthetic data sets are used to evaluate the efficiency and scalability of our proposed method.
gApprox: Mining Frequent Approximate Patterns from a Massive Network
Cited by 24 (1 self)
Recently, a large number of graphs with massive sizes and complex structures have arisen in many new applications, such as biological networks, social networks, and the Web, demanding powerful data mining methods. Due to inherent noise or data diversity, it is crucial to address the issue of approximation if one wants to mine patterns that are potentially interesting with tolerable variations. In this paper, we investigate the problem of mining frequent approximate patterns from a massive network and propose a method called gApprox. gApprox not only finds approximate network patterns, which is the key for many knowledge discovery applications on structural data, but also enriches the library of graph mining methodologies by introducing several novel techniques such as: (1) a complete and redundancy-free strategy to explore the new pattern space faced by gApprox; and (2) transforming "frequent in an approximate sense" into an anti-monotonic constraint so that it can be pushed deep into the mining process. Systematic empirical studies on both real and synthetic data sets show that frequent approximate patterns mined from the worm protein-protein interaction network are biologically interesting and that gApprox is both effective and efficient.
MARGIN: Maximal Frequent Subgraph Mining
Cited by 23 (1 self)
The exponential number of possible subgraphs makes the problem of frequent subgraph mining a challenge. Maximal frequent subgraph mining has triggered much interest since the set of maximal frequent subgraphs is much smaller than the set of frequent subgraphs. We propose an algorithm that mines the maximal frequent subgraphs while pruning the lattice space considerably. This reduces the number of isomorphism computations, which are the kernel of all frequent subgraph mining problems. Experimental results validate the utility of the proposed technique.
GREW—A Scalable Frequent Subgraph Discovery Algorithm
 in Fourth IEEE International Conference on Data Mining (ICDM 2004). 2004
, 2003
Cited by 21 (0 self)
Existing algorithms that mine graph datasets to discover patterns corresponding to frequently occurring subgraphs can operate efficiently on graphs that are sparse, contain a large number of relatively small connected components, have vertices with low and bounded degrees, and contain well-labeled vertices and edges. However, there are a number of applications that lead to graphs that do not share these characteristics, for which these algorithms become highly unscalable. In this paper we propose a heuristic algorithm called GREW to overcome the limitations of existing complete or heuristic frequent subgraph discovery algorithms. GREW is designed to operate on a large graph and to find patterns corresponding to connected subgraphs that have a large number of vertex-disjoint embeddings. Our experimental evaluation shows that GREW is efficient, can scale to very large graphs, and finds nontrivial patterns that cover large portions of the input graph and the lattice of frequent patterns.
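To make the notion of vertex-disjoint embeddings concrete, the sketch below greedily keeps embeddings that share no vertex with any embedding already kept. It is an assumption-laden illustration and not GREW's actual procedure: the function name `greedy_vertex_disjoint`, the representation of an embedding as a tuple of vertex ids, and the sample data are hypothetical. A greedy pass gives only a lower bound on the largest vertex-disjoint set, and its result depends on the input order.

```python
def greedy_vertex_disjoint(embeddings):
    """Select embeddings that pairwise share no vertex, in input order."""
    used = set()      # vertices already covered by a chosen embedding
    chosen = []
    for emb in embeddings:
        vertices = set(emb)
        if used.isdisjoint(vertices):
            chosen.append(emb)
            used |= vertices
    return chosen


# Hypothetical embeddings of a single-edge pattern in one large graph,
# each given as the tuple of graph vertices it occupies.
embeddings = [(1, 2), (2, 3), (3, 4), (5, 6)]
disjoint = greedy_vertex_disjoint(embeddings)
print(disjoint)
```

Here `(2, 3)` is skipped because it shares vertex 2 with `(1, 2)`, while `(3, 4)` and `(5, 6)` are kept.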
Subdue: compression-based frequent pattern discovery in graph data
 Proceedings of the 1st international workshop on open
, 2005
Cited by 20 (0 self)
A majority of the existing algorithms which mine graph datasets target complete, frequent subgraph discovery. We describe the graph-based data mining system Subdue, which focuses on the discovery of subgraphs which are not only frequent but also compress the graph dataset, using a heuristic algorithm. The rationale behind the use of a compression-based methodology for frequent pattern discovery is to produce a smaller number of highly interesting patterns rather than to generate a large number of patterns from which interesting patterns need to be identified. We perform an experimental comparison of Subdue with the graph mining systems gSpan and FSG on the Chemical Toxicity and the Chemical Compounds datasets that are provided with gSpan. We present results on the performance of the Subdue system on the Mutagenesis and the KDD 2003 Citation Graph datasets. An analysis of the results indicates that Subdue can efficiently discover best-compressing frequent patterns which are fewer in number but can be of higher interest.
Molecular and cellular approaches for the detection of protein-protein interactions: latest techniques and current limitations
 Plant J
, 2008
Cited by 20 (3 self)
Homotypic and heterotypic protein interactions are crucial for all levels of cellular function, including architecture, regulation, metabolism, and signaling. Therefore, protein interaction maps represent essential components of post-genomic toolkits needed for understanding biological processes at a systems level. Over the past decade, a wide variety of methods have been developed to detect, analyze, and quantify protein interactions, including surface plasmon resonance spectroscopy, NMR, yeast two-hybrid screens, peptide tagging combined with mass spectrometry, and fluorescence-based technologies. Fluorescence techniques range from colocalization of tags, which may be limited by the optical resolution of the microscope, to fluorescence resonance energy transfer-based methods that have molecular resolution and can also report on the dynamics and localization of the interactions within a cell. Proteins interact via highly evolved complementary surfaces with affinities that can vary over many orders of magnitude. Some of the techniques described in this review, such as surface plasmon resonance, provide detailed information on physical properties of these interactions, while others, such as two-hybrid techniques and mass spectrometry, are amenable to high-throughput analysis using robotics. In addition to providing an overview of these methods, this review emphasizes techniques that can be applied to determine interactions involving membrane proteins, including the split ubiquitin system and fluorescence-based technologies for characterizing hits obtained with high-throughput approaches. Mass spectrometry-based methods are covered by a review by