Results 1-10 of 36
On mining cross-graph quasi-cliques
In KDD, 2005
Cited by 60 (5 self)
Joint mining of multiple data sets can often discover interesting, novel, and reliable patterns which cannot be obtained solely from any single source. For example, in cross-market customer segmentation, a group of customers who behave similarly in multiple markets should be considered as a more coherent and more reliable cluster than clusters found in a single market. As another example, in bioinformatics, by joint mining of gene expression data and protein interaction data, we can find clusters of genes which show coherent expression patterns and also produce interacting proteins. Such clusters may be potential pathways. In this paper, we investigate a novel data mining problem, mining cross-graph quasi-cliques, which is generalized from several interesting applications such as cross-market customer segmentation and joint mining of gene expression data and protein interaction data. We build a general model for mining cross-graph quasi-cliques, show why the complete set of cross-graph quasi-cliques cannot be found by previous data mining methods, and study the complexity of the problem. While the problem is difficult, we develop an efficient algorithm, Crochet, which exploits several interesting and effective techniques and heuristics to efficaciously mine cross-graph quasi-cliques. A systematic performance study is reported on both synthetic and real data sets. We demonstrate some interesting and meaningful cross-graph quasi-cliques in bioinformatics. The experimental results also show that algorithm Crochet is efficient and scalable.
Truss Decomposition in Massive Networks
Cited by 21 (5 self)
The k-truss is a type of cohesive subgraph proposed recently for the study of networks. While the problem of computing most cohesive subgraphs is NP-hard, there exists a polynomial-time algorithm for computing the k-truss. Compared with the k-core, which is also efficient to compute, the k-truss represents the “core” of a k-core that keeps the key information of, while filtering out less important information from, the k-core. However, existing algorithms for computing the k-truss are inefficient for handling today’s massive networks. We first improve the existing in-memory algorithm for computing the k-truss in networks of moderate size. Then, we propose two I/O-efficient algorithms to handle massive networks that cannot fit in main memory. Our experiments on real datasets verify the efficiency of our algorithms and the value of the k-truss.
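To make the notion concrete, here is a minimal in-memory sketch in Python. It assumes the standard definition, which the abstract itself does not spell out: the k-truss is the largest subgraph in which every edge lies in at least k - 2 triangles of that subgraph. The function name `k_truss` and the edge-list input are illustrative choices. This is plain repeated peeling, not the improved in-memory or I/O-efficient algorithms of the paper.

```python
def k_truss(edges, k):
    """Return the edges (as frozensets) of the k-truss of an undirected simple graph.

    Assumed definition: every surviving edge must lie in at least k - 2
    triangles formed by surviving edges.
    """
    # Build adjacency sets, ignoring self-loops.
    adj = {}
    for u, v in edges:
        if u != v:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)

    # Peel: keep deleting edges with too little triangle support
    # until a full pass removes nothing.
    changed = True
    while changed:
        changed = False
        for u in list(adj):
            for v in list(adj[u]):
                support = len(adj[u] & adj[v])  # common neighbours = triangles on (u, v)
                if support < k - 2:
                    adj[u].discard(v)
                    adj[v].discard(u)
                    changed = True

    return {frozenset((u, v)) for u in adj for v in adj[u]}
```

For example, on a 4-clique with one pendant edge attached, `k_truss(edges, 4)` keeps the six clique edges and drops the pendant edge, because the pendant edge lies in no triangle.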
BAG: a graph theoretic sequence clustering algorithm
Int. J. Data Mining and Bioinformatics, 2003
Cited by 21 (9 self)
Recently developed sequence clustering algorithms based on graph theory have been successful in clustering a large number of sequences into families of sequences of specific categories. In this paper, we present a new sequence clustering algorithm, BAG, based on graph theory. Our algorithm clusters sequences using two properties of graphs: biconnected components and articulation points. As computation of biconnected components and articulation points is efficient, linear in relation to the number of vertices and edges, our algorithms are well suited for comparing a large number of proteins from multiple genomes. Our experiments with protein sequences from multiple genomes show that our algorithms generate families of high quality. For example, our algorithm correctly classified 3,306 predicted proteins from E. coli and H. influenzae into 1,427 families without human intervention. We also discuss the importance of large-scale sequence comparisons from our experience in clustering many different genomes, including Arabidopsis thaliana.
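The linear-time claim refers to the classic depth-first-search computation of articulation points. The sketch below illustrates that generic textbook technique in Python; it is not the BAG algorithm itself. The function name `articulation_points` and the dict-of-neighbours input are assumptions made for illustration, and the graph is assumed to be simple and undirected.

```python
def articulation_points(adj):
    """Return the set of articulation points of an undirected simple graph.

    adj maps each vertex to an iterable of its neighbours. Uses one DFS
    with discovery times and low-links, so it runs in O(V + E).
    Recursive: very deep graphs would need an iterative rewrite.
    """
    disc, low, cut = {}, {}, set()
    counter = [0]

    def dfs(u, parent):
        disc[u] = low[u] = counter[0]
        counter[0] += 1
        children = 0
        for v in adj[u]:
            if v == parent:
                continue
            if v in disc:
                # Back edge: u can reach an earlier-discovered vertex.
                low[u] = min(low[u], disc[v])
            else:
                children += 1
                dfs(v, u)
                low[u] = min(low[u], low[v])
                # Non-root u separates v's subtree if that subtree
                # cannot reach above u.
                if parent is not None and low[v] >= disc[u]:
                    cut.add(u)
        # A DFS root is a cut vertex only if it has two or more DFS children.
        if parent is None and children > 1:
            cut.add(u)

    for u in adj:
        if u not in disc:
            dfs(u, None)
    return cut
```

On two triangles that share a single vertex, the shared vertex is the only articulation point: removing it splits the graph into two pieces, each of which belongs to a different biconnected component.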
Out-of-core coherent closed quasi-clique mining from large dense graph databases
ACM TODS
Cited by 20 (1 self)
Due to the ability of graphs to represent more generic and more complicated relationships among different objects, graph mining has played a significant role in data mining, attracting increasing attention in the data mining community. In addition, frequent coherent subgraphs can provide valuable knowledge about the underlying internal structure of a graph database, and mining frequently occurring coherent subgraphs from large dense graph databases has witnessed several applications and received considerable attention in the graph mining community recently. In this article, we study how to efficiently mine the complete set of coherent closed quasi-cliques from large dense graph databases, which is an especially challenging task due to the fact that the downward-closure property no longer holds. By fully exploring some properties of quasi-cliques, we propose several novel optimization techniques which can prune the unpromising and redundant sub-search spaces effectively. Meanwhile, we devise an efficient closure checking scheme to facilitate the discovery of closed quasi-cliques only. Since large databases cannot be held in main memory, we also design an ...
A preliminary conference version of this article entitled “Coherent Closed Quasi-Clique Discovery ...
A SURVEY OF ALGORITHMS FOR DENSE SUBGRAPH DISCOVERY
Cited by 18 (1 self)
In this chapter, we present a survey of algorithms for dense subgraph discovery. The problem of dense subgraph discovery is closely related to clustering, though the two problems also have a number of differences. For example, the problem of clustering is largely concerned with finding a fixed partition in the data, whereas the problem of dense subgraph discovery defines these dense components in a much more flexible way. The problem of dense subgraph discovery may either be defined over single or multiple graphs. We explore both cases. In the latter case, the problem is also closely related to the problem of frequent subgraph discovery. This chapter will discuss and organize the literature on this topic effectively in order to make it much more accessible to the reader.
Mining frequent cross-graph quasi-cliques
ACM Trans. on Knowledge Discovery from Data, 2009
Cited by 18 (0 self)
Joint mining of multiple datasets can often discover interesting, novel, and reliable patterns which cannot be obtained solely from any single source. For example, in bioinformatics, jointly mining multiple gene expression datasets obtained by different labs or during various biological processes may overcome the heavy noise in the data. Moreover, by joint mining of gene expression data and protein-protein interaction data, we may discover clusters of genes which show coherent expression patterns and also produce interacting proteins. Such clusters may be potential pathways. In this article, we investigate a novel data mining problem, mining frequent cross-graph quasi-cliques, which is generalized from several interesting applications in bioinformatics, cross-market customer segmentation, social network analysis, and Web mining. In a graph, a set of vertices S is a γ-quasi-clique (0 < γ ≤ 1) if each vertex v in S directly connects to at least γ · (|S| − 1) other vertices in S. Given a set of graphs G1, ..., Gn and a parameter min_sup (0 < min_sup ≤ 1), a set of vertices S is a frequent cross-graph quasi-clique if S is a γ-quasi-clique in at least min_sup · n graphs, and there does not exist a proper superset of S having the property. We build a general model, show why the complete set of frequent cross-graph quasi-cliques cannot be found by previous data mining methods, and study the complexity of the problem. While the problem is difficult, we develop practical algorithms which exploit several interesting and effective techniques and heuristics to efficaciously mine frequent cross-graph quasi-cliques. A systematic performance study is reported on both synthetic and real data sets. We demonstrate some interesting and meaningful frequent cross-graph quasi-cliques in bioinformatics. The experimental results also show that our algorithms are efficient and scalable.
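The two definitions in this abstract translate directly into a membership check. The Python sketch below is illustrative only: the names `is_quasi_clique` and `is_frequent`, the dict-of-sets graph representation, and the small floating-point tolerance `EPS` are all assumptions, not part of the paper. It tests the degree and frequency conditions for one given vertex set; it neither checks maximality (the absence of a qualifying proper superset) nor performs the search the paper's algorithms carry out.

```python
EPS = 1e-9  # assumed tolerance, so that e.g. 0.7 * 10 is not treated as more than 7


def is_quasi_clique(adj, S, gamma):
    """True if every vertex of S has at least gamma * (|S| - 1) neighbours inside S.

    adj maps each vertex to the set of its neighbours (undirected graph).
    """
    S = set(S)
    need = gamma * (len(S) - 1)
    for v in S:
        inside = len(adj.get(v, set()) & (S - {v}))
        if inside < need - EPS:
            return False
    return True


def is_frequent(graphs, S, gamma, min_sup):
    """True if S is a gamma-quasi-clique in at least min_sup * n of the n graphs.

    Checks the frequency condition only, not maximality.
    """
    support = sum(1 for adj in graphs if is_quasi_clique(adj, S, gamma))
    return support >= min_sup * len(graphs) - EPS
```

For instance, with γ = 1 the check reduces to an ordinary clique test, and a triangle that appears in only one of two graphs is frequent for min_sup = 0.5 but not for min_sup = 1.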
Effective Pruning Techniques for Mining Quasi-Cliques
Cited by 15 (0 self)
Many real-world datasets, such as biological networks and social networks, can be modeled as graphs. It is interesting to discover densely connected subgraphs from these graphs, as such subgraphs represent groups of objects sharing some common properties. Several algorithms have been proposed to mine quasi-cliques from undirected graphs, but they have not fully utilized the minimum degree constraint for pruning. In this paper, we propose an efficient algorithm called Quick to find maximal quasi-cliques from undirected graphs. The Quick algorithm uses several effective pruning techniques based on the degree of the vertices to prune unqualified vertices as early as possible, and these pruning techniques can be integrated into existing algorithms to improve their performance as well. Our experimental results show that Quick is orders of magnitude faster than previous work on mining quasi-cliques.
Mobility Performance of Macrocell-Assisted Small Cells in Manhattan Model
Vehicular Technology Conference (VTC Spring), 2014 IEEE 79th, 2014
Cited by 11 (0 self)
Recent research efforts have made notable progress in improving the performance of (exhaustive) maximal clique enumeration (MCE). However, existing algorithms still suffer from exploring the huge search space of MCE. Furthermore, their results are often undesirable as many of the returned maximal cliques have large overlapping parts. This redundancy leads to problems in both computational efficiency and usefulness of MCE. In this paper, we aim at providing a concise and complete summary of the set of maximal cliques, which is useful to many applications. We propose the notion of τ-visible MCE to achieve this goal and design algorithms to realize the notion. Based on the refined output space, we further consider applications including an efficient computation of the top-k results with diversity and an interactive clique exploration process. Our experimental results demonstrate that our approach is capable of producing output of high usability and our algorithms achieve superior efficiency over classic MCE algorithms.
R.: On effectively finding maximal quasi-cliques in graphs
Proc. 2nd Learning and Intelligent Optimization Workshop, LION 2
Cited by 9 (2 self)
The problem of finding a maximum clique in a graph is prototypical for many clustering and similarity problems; however, in many real-world scenarios, the classical problem of finding a complete subgraph needs to be relaxed to finding an almost complete subgraph, a so-called quasi-clique. In this work, we demonstrate how two previously existing definitions of quasi-cliques can be unified and how the resulting, more general quasi-clique finding problem can be solved by extending two state-of-the-art stochastic local search algorithms for the classical maximum clique problem. Preliminary results for these algorithms applied to both artificial and real-world problem instances demonstrate the usefulness of the new quasi-clique definition and the effectiveness of our algorithms.
A graph-based clustering method for a large set of sequences using a graph partitioning algorithm
Genome Informatics, 2001
Cited by 8 (0 self)
A graph-based clustering method is proposed to cluster protein sequences into families, which automatically improves the clusters of the conventional single linkage clustering method. Our approach formulates the sequence clustering problem as a kind of graph partitioning problem in a weighted linkage graph, in which vertices correspond to sequences, and edges correspond to similarities higher than a given threshold and are weighted by those similarities. The effectiveness of our method is shown in comparison with InterPro families in all mouse proteins in SWISS-PROT. The resulting clusters match InterPro families much better than those of the single linkage clustering method. 77% of proteins in InterPro families are classified into appropriate clusters.
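The single linkage baseline mentioned here can be pictured as taking connected components of the thresholded linkage graph. The Python sketch below shows only that baseline, under stated assumptions: pairwise similarity scores arrive as a dict keyed by sequence pairs, and the function name `single_linkage_clusters` is hypothetical. It ignores the edge weights once the threshold is applied and does not attempt the graph partitioning step that the paper adds on top.

```python
def single_linkage_clusters(similarities, threshold):
    """Cluster items by linking every pair whose similarity exceeds threshold.

    similarities: dict mapping (a, b) pairs to a numeric score.
    Returns the connected components of the resulting graph as a list of sets.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in similarities.items():
        root_a, root_b = find(a), find(b)  # also registers unlinked items
        if score > threshold:
            parent[root_a] = root_b

    clusters = {}
    for x in list(parent):
        clusters.setdefault(find(x), set()).add(x)
    # Sort for a deterministic order (assumes items are mutually comparable).
    return sorted(clusters.values(), key=lambda c: sorted(c))
```

A single strong link is enough to merge two groups under this scheme, which is the chaining weakness that motivates refining such clusters with a partitioning step.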