Results 1-10 of 22
Denser than the Densest Subgraph: Extracting Optimal Quasi-Cliques with Quality Guarantees
Cited by 17 (8 self)
Finding dense subgraphs is an important graph-mining task with many applications. Given that the direct optimization of edge density is not meaningful, as even a single edge achieves maximum density, research has focused on optimizing alternative density functions. A very popular one among such functions is the average degree, whose maximization leads to the well-known densest-subgraph notion. Surprisingly enough, however, densest subgraphs are typically large graphs, with small edge density and large diameter. In this paper, we define a novel density function, which gives subgraphs of much higher quality than densest subgraphs: the graphs found by our method are compact, dense, and with smaller diameter. We show that the proposed function can be derived from a general framework, which includes other important density functions as subcases and for which we show interesting general theoretical properties. To optimize the proposed function we provide an additive approximation algorithm and a local-search heuristic. Both algorithms are very efficient and scale well to large graphs. We evaluate our algorithms on real and synthetic datasets, and we also devise several application studies as variants of our original problem. When compared with the method that finds the subgraph of the largest average degree, our algorithms return denser subgraphs with smaller diameter. Finally, we discuss new interesting research directions that our problem leaves open.
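The contrast this abstract draws between edge density and average degree can be illustrated with a small sketch. It is not the paper's proposed density function, which the abstract does not spell out; the toy graph `EDGES` and the helper names are assumptions made for illustration only.

```python
def induced_edge_count(edges, nodes):
    """Count the edges that have both endpoints inside `nodes`."""
    members = set(nodes)
    return sum(1 for u, v in edges if u in members and v in members)


def edge_density(edges, nodes):
    """Fraction of possible vertex pairs in `nodes` that are joined by an edge."""
    n = len(set(nodes))
    if n < 2:
        return 0.0
    return induced_edge_count(edges, nodes) / (n * (n - 1) / 2)


def average_degree(edges, nodes):
    """Average degree of the subgraph induced by `nodes` (2 * edges / vertices)."""
    n = len(set(nodes))
    if n == 0:
        return 0.0
    return 2 * induced_edge_count(edges, nodes) / n


# Hypothetical toy graph: a 4-clique on {0, 1, 2, 3} with a tail 3-4-5.
EDGES = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]

if __name__ == "__main__":
    candidates = {
        "single edge": [0, 1],
        "4-clique": [0, 1, 2, 3],
        "whole graph": [0, 1, 2, 3, 4, 5],
    }
    for label, nodes in candidates.items():
        print(label, edge_density(EDGES, nodes), average_degree(EDGES, nodes))
```

On this toy graph the single edge already reaches edge density 1.0, which is why maximizing edge density alone is uninformative, while average degree separates the 4-clique from the single edge.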
Localized motif discovery in gene regulatory sequences, 2010
Cited by 10 (0 self)
Motivation: Discovery of nucleotide motifs that are localized with respect to a certain biological landmark is important in several applications, such as in regulatory sequences flanking the transcription start site, in the neighborhood of known transcription factor binding sites, and in transcription factor binding regions discovered by massively parallel sequencing (ChIP-Seq). Results: We report an algorithm called LocalMotif to discover such localized motifs. The algorithm is based on a novel scoring function, called spatial confinement score, which can determine the exact interval of localization of a motif. This score is combined with other existing scoring measures including over-representation and relative entropy to determine the overall prominence of the motif. The approach successfully discovers biologically relevant motifs and their intervals of localization in scenarios where the motifs cannot be discovered by general motif finding tools. It is especially useful for discovering multiple co-localized motifs in a set of regulatory sequences, such as those identified by ChIP-Seq. Availability and Implementation: The LocalMotif software is available at
Finding the Hierarchy of Dense Subgraphs using Nucleus Decompositions
Cited by 3 (1 self)
Finding dense substructures in a graph is a fundamental graph mining operation, with applications in bioinformatics, social networks, and visualization to name a few. Yet most standard formulations of this problem (like clique, quasi-clique, k-densest subgraph) are NP-hard. Furthermore, the goal is rarely to find the “true optimum”, but to identify many (if not all) dense substructures, understand their distribution in the graph, and ideally determine relationships among them. Current dense subgraph finding algorithms usually optimize some objective, and only find a few such subgraphs without providing any structural relations. We define the nucleus decomposition of a graph, which represents the graph as a forest of nuclei. Each nucleus is a subgraph where smaller cliques are present in many larger cliques. The forest of nuclei is a hierarchy by containment, where the edge density increases as we proceed towards leaf nuclei. Sibling nuclei can have limited intersections, which enables discovering overlapping dense subgraphs. With the right parameters, the nucleus decomposition generalizes the classic notions of k-cores and k-truss decompositions. We give provably efficient algorithms for nucleus decompositions, and empirically evaluate their behavior in a variety of real graphs. The tree of nuclei consistently gives a global, hierarchical snapshot of dense substructures, and outputs dense subgraphs of higher quality than other state-of-the-art solutions. Our algorithm can process graphs with tens of millions of edges in less than an hour. ∗Work done while the author was interning at Sandia National Laboratories, Livermore, CA.
The K-clique Densest Subgraph Problem
Cited by 2 (1 self)
Numerous graph mining applications rely on detecting subgraphs which are large near-cliques. Since formulations that are geared towards finding large near-cliques are NP-hard and frequently inapproximable due to connections with the Maximum Clique problem, the poly-time solvable densest subgraph problem which maximizes the average degree over all possible subgraphs “lies at the core of large scale data mining” [10]. However, frequently the densest subgraph problem fails in detecting large near-cliques in networks. In this work, we introduce the k-clique densest subgraph problem, k ≥ 2. This generalizes the well studied densest subgraph problem which is obtained as a special case for k = 2. For k = 3 we obtain a novel formulation which we refer to as the triangle densest subgraph problem: given a graph G(V,E), find a subset of vertices S∗ such that τ(S∗) = max S⊆V t(S)
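The objective in this abstract is cut off in the listing, so the following is only a sketch under a stated assumption: that the k-clique density of a vertex set S is the number of k-cliques induced by S divided by |S|, by analogy with the average-degree objective for k = 2. The brute-force search is exponential and only meant for tiny graphs; all function names and the toy graph are hypothetical.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {vertex: set of neighbours}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def clique_count(adj, nodes, k):
    """Number of k-vertex subsets of `nodes` whose members are pairwise adjacent."""
    return sum(
        1
        for group in combinations(sorted(nodes), k)
        if all(v in adj[u] for u, v in combinations(group, 2))
    )


def k_clique_density(adj, nodes, k):
    """ASSUMED objective: induced k-cliques divided by the number of vertices."""
    return clique_count(adj, nodes, k) / len(nodes) if nodes else 0.0


def brute_force_densest(adj, k):
    """Try every non-empty vertex subset (exponential; tiny graphs only)."""
    best_set, best_value = (), -1.0
    vertices = sorted(adj)
    for size in range(1, len(vertices) + 1):
        for subset in combinations(vertices, size):
            value = k_clique_density(adj, subset, k)
            if value > best_value:
                best_set, best_value = subset, value
    return best_set, best_value


# Hypothetical toy graph: a 4-clique on {0..3} next to a triangle-free
# complete bipartite graph between {4, 5, 6} and {7, 8, 9, 10}.
CLIQUE = [(u, v) for u, v in combinations(range(4), 2)]
BIPARTITE = [(u, v) for u in (4, 5, 6) for v in (7, 8, 9, 10)]
ADJ = build_adjacency(CLIQUE + BIPARTITE)

if __name__ == "__main__":
    print(brute_force_densest(ADJ, 2))  # edge-based objective
    print(brute_force_densest(ADJ, 3))  # triangle-based objective
```

On this toy graph the edge-based objective (k = 2) prefers the larger bipartite part, while the triangle-based objective (k = 3) picks the 4-clique, which matches the abstract's point that the plain densest subgraph can miss near-cliques.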
Integrative Discovery of Multifaceted Sequence Patterns by Frame-Relayed Search and Hybrid PSO-ANN
Cited by 1 (1 self)
Abstract: For de novo pattern mining in genomic sequences, the main issues are constructing pattern definition model (PDM) and mining sequence patterns (MSP). The representations of PDMs and the discovery of patterns are functionally dependent; the performances thus depend on the adopted PDMs. The popular PDMs provide only descriptive patterns; they lack multifaceted considerations. Many of existing MSP methods are tied up with the exclusively devised PDMs, and the specialized and sophisticated models make the mined results hard to be reused. In this research, an integrative pattern mining system is proposed, which consists of a computation-oriented PDM (COPDM) and general-purpose MSP (GPMSP) methods. The COPDM defines four computational concerns (CCs) as facets of MSP: expression (E), location (L), range (R) and weight (W), which are integrated into a frame-relayed pattern model (FRPM). The GPMSP develops a frame-relayed search strategy to resolve the ELRCCs firstly, with the aids of critical-parameter automating (CPA) procedure; and then the WCC is determined by hybridizing particle swarm optimization (PSO) and artificial neural network (ANN). The proposed FRPM and GPMSP had been implemented and applied to 22,448 human introns; from the results, all the well-known patterns were recovered and some new ones were also discovered. Furthermore, the
Finding Subgraphs with Maximum Total Density and Limited Overlap
Cited by 1 (0 self)
Finding dense subgraphs in large graphs is a key primitive in a variety of real-world application domains, encompassing social network analytics, event detection, biology, and finance. In most such applications, one typically aims at finding several (possibly overlapping) dense subgraphs which might correspond to communities in social networks or interesting events. While a large amount of work is devoted to finding a single densest subgraph, perhaps surprisingly, the problem of finding several dense subgraphs with limited overlap has not been studied in a principled way, to the best of our knowledge. In this work we define and study a natural generalization of the densest subgraph problem, where the main goal is to find at most k subgraphs with maximum total aggregate density, while satisfying an upper bound on the pairwise Jaccard coefficient between the sets of nodes of the subgraphs. After showing that such a problem is NP-hard, we devise an efficient algorithm that comes with provable guarantees in some cases of interest, as well as an efficient practical heuristic. Our extensive evaluation on large real-world graphs confirms the efficiency and effectiveness of our algorithms.
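The constraint and objective described in this abstract can be checked for a given candidate solution with a short sketch. It only evaluates feasibility and total density; it is not the paper's algorithm. The density of a subgraph is assumed here to be induced edges divided by vertices, and the names `alpha`, `respects_overlap`, and `total_density` are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two node sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def respects_overlap(subgraphs, alpha):
    """True if every pair of node sets has Jaccard coefficient at most `alpha`."""
    return all(jaccard(a, b) <= alpha for a, b in combinations(subgraphs, 2))


def density(edges, nodes):
    """ASSUMED density: induced edges divided by number of vertices."""
    members = set(nodes)
    if not members:
        return 0.0
    inside = sum(1 for u, v in edges if u in members and v in members)
    return inside / len(members)


def total_density(edges, subgraphs):
    """Sum of the densities of all chosen subgraphs."""
    return sum(density(edges, nodes) for nodes in subgraphs)


# Hypothetical toy graph: two triangles sharing the edge (2, 3).
EDGES = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4)]
CANDIDATE = [{1, 2, 3}, {2, 3, 4}]

if __name__ == "__main__":
    print(jaccard(*CANDIDATE))
    print(respects_overlap(CANDIDATE, alpha=0.5))
    print(total_density(EDGES, CANDIDATE))
```

A tighter bound such as `alpha=0.4` would reject this candidate pair, since the two triangles share two of their four distinct nodes.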
unknown title, 2006
A graph-based motif detection algorithm models complex nucleotide dependencies in transcription factor binding sites
Fast Hierarchy Construction for Dense Subgraphs
Discovering dense subgraphs and understanding the relations among them is a fundamental problem in graph mining. We want to not only identify dense subgraphs, but also build a hierarchy among them (e.g., larger but sparser subgraphs formed by two smaller dense subgraphs). Peeling algorithms (k-core, k-truss, and nucleus decomposition) have been effective to locate many dense subgraphs. However, constructing a hierarchical representation of density structure, even correctly computing the connected k-cores and k-trusses, have been mostly overlooked. Keeping track of connected components during peeling requires an additional traversal operation, which is as expensive as the peeling process. In this paper, we start with a thorough survey and point to nuances in problem formulations that lead to significant differences in runtimes. We then propose efficient and generic algorithms to construct the hierarchy of dense subgraphs for k-core, k-truss, or any nucleus decomposition. Our algorithms leverage the disjoint-set forest data structure to efficiently construct the hierarchy during traversal. Furthermore, we introduce a new idea to avoid traversal. We construct the subgraphs while visiting neighborhoods in the peeling process, and build the relations to previously constructed subgraphs. We also consider an existing idea to find the k-core hierarchy and adapt for our objectives efficiently. Experiments on different types of large scale real-world networks show significant speedups over naive algorithms and existing alternatives. Our algorithms also outperform the hypothetical limits of any possible traversal-based solution.
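The two ingredients named in this abstract, peeling and a disjoint-set forest, can be combined in a minimal sketch that finds the connected k-cores for one value of k. This is a textbook-style illustration, not the paper's hierarchy-construction algorithm; the simple quadratic peeling loop and all names are assumptions chosen for brevity.

```python
def core_numbers(adj):
    """Peel vertices in order of current degree; returns {vertex: core number}."""
    degree = {v: len(adj[v]) for v in adj}
    remaining = set(adj)
    core, k = {}, 0
    while remaining:
        v = min(remaining, key=degree.__getitem__)  # lowest remaining degree
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core


class DisjointSet:
    """Disjoint-set forest with path halving."""

    def __init__(self, items):
        self.parent = {x: x for x in items}

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def connected_k_cores(adj, k):
    """Connected components of the subgraph induced by vertices with core >= k."""
    core = core_numbers(adj)
    kept = [v for v in adj if core[v] >= k]
    forest = DisjointSet(kept)
    for v in kept:
        for u in adj[v]:
            if u in forest.parent:
                forest.union(u, v)
    groups = {}
    for v in kept:
        groups.setdefault(forest.find(v), []).append(v)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical toy graph: two 4-cliques bridged by vertex 4.
EDGES = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5),
         (5, 6), (5, 7), (5, 8), (6, 7), (6, 8), (7, 8)]
ADJ = {}
for a, b in EDGES:
    ADJ.setdefault(a, set()).add(b)
    ADJ.setdefault(b, set()).add(a)

if __name__ == "__main__":
    print(connected_k_cores(ADJ, 2))
    print(connected_k_cores(ADJ, 3))
```

In the toy graph the 2-core is a single connected component, while the 3-core splits into the two 4-cliques; this containment relation is the kind of structure a hierarchy of dense subgraphs records.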