Coherent closed quasi-clique discovery from large dense graph databases
- In KDD, 2006
Cited by 16 (0 self)
Abstract
Frequent coherent subgraphs can provide valuable knowledge about the underlying internal structure of a graph database, and mining frequently occurring coherent subgraphs from large dense graph databases has found several applications and recently received considerable attention in the graph mining community. In this paper, we study how to efficiently mine the complete set of coherent closed quasi-cliques from large dense graph databases, which is an especially challenging task because the downward-closure property no longer holds. By fully exploring some properties of quasi-cliques, we propose several novel optimization techniques, which can prune the unpromising and redundant sub-search spaces effectively. Meanwhile, we devise an efficient closure checking scheme to facilitate the discovery of only closed quasi-cliques. We also develop a coherent closed quasi-clique mining algorithm, Cocain. A thorough performance study shows that Cocain is very efficient and scalable for large dense graph databases.
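The abstract does not define a quasi-clique, so the following is only a minimal sketch under an assumed, commonly used degree-based definition: a vertex set of size k is a gamma-quasi-clique if every member is adjacent to at least ceil(gamma * (k - 1)) of the other members. The function name, the toy graph, and the value gamma = 0.6 are hypothetical. The example illustrates why downward closure fails: a set can satisfy the condition while one of its subsets does not.

```python
from math import ceil


def is_quasi_clique(adj, vertices, gamma):
    """Check the assumed degree-based gamma-quasi-clique condition.

    adj maps each vertex to the set of its neighbours (undirected graph).
    """
    members = set(vertices)
    k = len(members)
    if k == 0:
        return False
    # Minimum number of neighbours each member needs inside the set.
    required = ceil(gamma * (k - 1))
    return all(len(adj[v] & (members - {v})) >= required for v in members)


# Hypothetical toy graph: a 4-cycle a-b-d-c-a plus the chord b-c (a-d is missing).
adj = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c"},
}

whole = is_quasi_clique(adj, {"a", "b", "c", "d"}, 0.6)
subset = is_quasi_clique(adj, {"a", "c", "d"}, 0.6)
print(whole, subset)
```

Here the four-vertex set passes the check while its subset {a, c, d} fails it, so a miner cannot prune a search branch just because a smaller set is not a quasi-clique.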
Visual exploration of complex time-varying graphs
- 2006
Cited by 16 (1 self)
Abstract
Many graph drawing and visualization algorithms, such as force-directed layout and line-dot rendering, work very well on relatively small and sparse graphs. However, they often produce extremely tangled results and exhibit impractical running times for highly non-planar graphs with large edge density. And very few graph layout algorithms support dynamic time-varying graphs; applying them independently to each frame produces distracting temporally incoherent visualizations. We have developed a new visualization technique based on a novel approach to hierarchically structuring dense graphs via stratification. Using this structure, we formulate a hierarchical force-directed layout algorithm that is both efficient and produces quality graph layouts. The stratification of the graph also allows us to present views of the data that abstract away many small details of its structure. Rather than displaying all edges and nodes at once, resulting in a convoluted rendering, we present an interactive tool that filters edges and nodes using the graph hierarchy and allows users to drill down into the graph for details. Our layout algorithm also accommodates time-varying graphs in a natural way, producing a temporally coherent animation that can be used to analyze and extract trends from dynamic graph data. For example, we demonstrate the use of our method to explore financial correlation data for the U.S. stock market in the period from 1990 to 2005. The user can easily analyze the time-varying correlation graph of the market, uncovering information such as market sector trends, representative stocks for portfolio construction, and the interrelationship of stocks over time.
Clique-detection Models in Computational Biochemistry and Genomics
- European Journal of Operational Research, 2005
Cited by 12 (0 self)
Abstract
Many important problems arising in computational biochemistry and genomics have been formulated in terms of underlying combinatorial optimization models. In particular, a number have been formulated as clique-detection models. The proposed article includes an introduction to the underlying biochemistry and genomic aspects of the problems as well as to the graph-theoretic aspects of the solution approaches. Each subsequent section describes a particular type of problem, gives an example to show how the graph model can be derived, summarizes recent progress, and discusses challenges associated with solving the associated graph-theoretic models. Clique detection models include prescribing (a) a maximal clique, (b) a maximum clique, (c) a maximum weighted clique, or (d) all maximal cliques in a graph. The particular types of biochemistry and genomics problems that can be represented by a clique detection model include integration of genome mapping data, nonoverlapping local alignments, matching and comparing molecular structures, and protein docking.
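The clique-detection variants listed in this abstract can be illustrated on a small graph. The sketch below is not taken from the article: it uses the standard Bron-Kerbosch algorithm with pivoting to enumerate all maximal cliques, then derives a maximum clique and a maximum weighted clique from that enumeration. The graph and the vertex weights are hypothetical, and the weights are assumed positive so that a maximum weighted clique is always one of the maximal cliques.

```python
def maximal_cliques(adj):
    """Yield every maximal clique of an undirected graph (Bron-Kerbosch with pivoting).

    adj maps each vertex to the set of its neighbours.
    """
    def expand(r, p, x):
        if not p and not x:
            yield frozenset(r)
            return
        # Pivot on the vertex with the most neighbours among the candidates.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(adj), set())


# Hypothetical graph: triangle 1-2-3 with a path 3-4-5 attached.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5}, 5: {4}}
# Hypothetical positive vertex weights.
weight = {1: 1, 2: 1, 3: 1, 4: 5, 5: 5}

cliques = list(maximal_cliques(adj))                               # (d) all maximal cliques
maximum = max(cliques, key=len)                                    # (b) a maximum clique
heaviest = max(cliques, key=lambda c: sum(weight[v] for v in c))   # (c) a maximum weighted clique
print(sorted(sorted(c) for c in cliques), sorted(maximum), sorted(heaviest))
```

Any single element of `cliques` answers variant (a), a maximal clique. Enumerating every maximal clique is exponential in the worst case, so this approach only suits small or sparse instances.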
Statistical Analysis of Financial Networks
- 2005
Cited by 10 (0 self)
Abstract
Massive datasets arise in a broad spectrum of scientific, engineering and commercial applications. In many practically important cases, a massive dataset can be represented as a very large graph with certain attributes associated with its vertices and edges. Studying the structure of this graph is essential for understanding the structural properties of the application it represents. Well-known examples of applying this approach are the Internet graph, the Web graph, and the Call graph. It turns out that the degree distributions of all these graphs can be described by the power-law model. Here we consider another important application -- a network representation of the stock market. Stock markets generate huge amounts of data, which can be used for constructing the market graph reflecting the market behavior. We conduct a statistical analysis of this graph and show that it also follows the power-law model. Moreover, we detect cliques and independent sets in this graph. These special formations have a clear practical interpretation, and their analysis allows one to apply a new data mining technique of classifying financial instruments based on stock price data, which provides a deeper insight into the internal structure of the stock market.
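The abstract does not say how the market graph is built from price data. A minimal sketch, assuming one plausible construction: each stock is a vertex, and two stocks are joined when the Pearson correlation of their return series meets a threshold. The return series, the stock names, and the threshold of 0.5 are all hypothetical, and constant series (zero variance) are not handled.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def market_graph(returns, threshold):
    """Join two stocks when their return correlation is at least `threshold` (assumed rule)."""
    adj = {name: set() for name in returns}
    for a, b in combinations(returns, 2):
        if pearson(returns[a], returns[b]) >= threshold:
            adj[a].add(b)
            adj[b].add(a)
    return adj


# Hypothetical return series: A and B move together, C moves against both.
returns = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [2.0, 4.0, 6.0, 8.5],
    "C": [4.0, 3.0, 2.0, 1.0],
}

adj = market_graph(returns, 0.5)
# Degree histogram: how many vertices have each degree.
degree_counts = Counter(len(neighbours) for neighbours in adj.values())
print(adj, dict(degree_counts))
```

On real data, the degree histogram is the quantity one would compare against a power-law model, and cliques in `adj` would correspond to groups of mutually correlated stocks.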
CONTEST: A controllable test matrix toolbox for MATLAB
- ACM Trans. Math. Software, 2008
Cited by 8 (2 self)
Abstract
Networks describing connectivity structures arise across a vast range of application areas. Examples where it has proved useful to record data include interactions between genes [Kauffman 1969], proteins [de Silva and Stumpf 2005], cortical regions [Kamper et al. 2002; Sporns and Zwi 2004], internet nodes [Faloutsos et al. 1999], web pages [Broder et al. 2000; Page et al. 1998], countries [Fagiolo 2007], co-authors [Newman 2004], telephones [Abello et al. 1998], assets on the stock market [Boginski
Network models of massive datasets
- Computer Science and Information Systems, 2004
Cited by 1 (0 self)
Abstract
We give a brief overview of the methodology of modeling massive datasets arising in various applications as networks. This approach is often useful for extracting non-trivial information from the datasets by applying standard graph-theoretic techniques. We also point out that graphs representing datasets coming from diverse practical fields have a similar power-law structure, which indicates that the global organization and evolution of massive datasets arising in various spheres of life nowadays follow similar natural principles.

