On finding dense subgraphs
 In ICALP ’09
, 2009
Cited by 38 (2 self)
Abstract. Given an undirected graph G = (V, E), the density of a subgraph on vertex set S is defined as $d(S) = |E(S)|/|S|$, where E(S) is the set of edges in the subgraph induced by the nodes in S. Finding subgraphs of maximum density is a very well-studied problem. One can also generalize this notion to directed graphs. For a directed graph, one notion of density, given by Kannan and Vinay [12], is as follows: given subsets S and T of vertices, the density of the subgraph
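The undirected density definition in this abstract can be illustrated with a minimal sketch. The function name `density`, the edge-list representation, and the example graph are assumptions made for illustration and do not come from the paper.

```python
from itertools import combinations


def density(edges, S):
    """Return d(S) = |E(S)| / |S| for an undirected graph given as an edge list.

    E(S) is the set of edges with both endpoints in S (the induced subgraph).
    Self-loops are ignored and parallel edges are counted once (an assumption).
    """
    S = set(S)
    if not S:
        raise ValueError("S must be non-empty")
    induced = {frozenset((u, v)) for u, v in edges if u in S and v in S and u != v}
    return len(induced) / len(S)


# Hypothetical example: a 4-clique on {1, 2, 3, 4} with a pendant vertex 5.
edges = list(combinations(range(1, 5), 2)) + [(4, 5)]

print(density(edges, {1, 2, 3, 4}))     # 6 edges / 4 nodes
print(density(edges, {1, 2, 3, 4, 5}))  # 7 edges / 5 nodes
```

In this example the clique alone is denser than the whole graph, which is the kind of vertex set a maximum-density search would prefer.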
Dense Subgraph Maintenance under Streaming Edge Weight Updates for Real-time Story Identification
, 2012
Cited by 23 (0 self)
Recent years have witnessed an unprecedented proliferation of social media. People around the globe author, every day, millions of blog posts, microblog posts, social network status updates, etc. This rich stream of information can be used to identify, on an ongoing basis, emerging stories and events that capture popular attention. Stories can be identified via groups of tightly-coupled real-world entities, namely the people, locations, products, etc., that are involved in the story. The sheer scale and rapid evolution of the data involved necessitate highly efficient techniques for identifying important stories at every point in time. The main challenge in real-time story identification is the maintenance of dense subgraphs (corresponding to groups of tightly-coupled entities) under streaming edge weight updates (resulting from a stream of user-generated content). This is the first work to study the efficient maintenance of dense subgraphs under such streaming edge weight updates. For a wide range of definitions of density, we derive theoretical results regarding the magnitude of change that a single edge weight update can cause. Based on these, we propose a novel algorithm, DYNDENS, which outperforms adaptations of existing techniques to this setting, and yields meaningful results. Our approach is validated by a thorough experimental evaluation on large-scale real and synthetic datasets.
Dense Subgraphs with Restrictions and Applications to Gene Annotation Graphs
Cited by 19 (5 self)
Abstract. In this paper, we focus on finding complex annotation patterns representing novel and interesting hypotheses from gene annotation data. We define a generalization of the densest subgraph problem by adding an additional distance restriction (defined by a separate metric) to the nodes of the subgraph. We show that while this generalization makes the problem NP-hard for arbitrary metrics, when the metric comes from the distance metric of a tree, or an interval graph, the problem can be solved optimally in polynomial time. We also show that the densest subgraph problem with a specified subset of vertices that have to be included in the solution can be solved optimally in polynomial time. In addition, we consider other extensions when not just one solution needs to be found, but we wish to list all subgraphs of almost maximum density as well. We apply this method to a dataset of genes and their annotations obtained from The Arabidopsis Information Resource (TAIR). A user evaluation confirms that the patterns found in the distance-restricted densest subgraph for a dataset of photomorphogenesis genes are indeed validated in the literature; a control dataset validates that these are not random patterns. Interestingly, the complex annotation patterns potentially lead to new and as yet unknown hypotheses. We perform experiments to determine the properties of the dense subgraphs, as we vary parameters, including the number of genes and the distance.
gPrune: A Constraint Pushing Framework for Graph Pattern Mining
Cited by 19 (2 self)
Abstract. In graph mining applications, there has been an increasingly strong urge for imposing user-specified constraints on the mining results. However, unlike most traditional itemset constraints, structural constraints, such as density and diameter of a graph, are very hard to push deep into the mining process. In this paper, we give the first comprehensive study on the pruning properties of both traditional and structural constraints, aiming to reduce not only the pattern search space but the data search space as well. A new general framework, called gPrune, is proposed to incorporate all the constraints in such a way that they recursively reinforce each other through the entire mining process. A new concept, Pattern-inseparable Data-antimonotonicity, is proposed to handle the structural constraints unique to the context of graphs, which, combined with known pruning properties, provides a comprehensive and unified classification framework for structural constraints. The exploration of these antimonotonicities in the context of graph pattern mining is a significant extension to the known classification of constraints, and deepens our understanding of the pruning properties of structural graph constraints.
A SURVEY OF ALGORITHMS FOR DENSE SUBGRAPH DISCOVERY
Cited by 18 (1 self)
In this chapter, we present a survey of algorithms for dense subgraph discovery. The problem of dense subgraph discovery is closely related to clustering, though the two problems also have a number of differences. For example, the problem of clustering is largely concerned with finding a fixed partition in the data, whereas the problem of dense subgraph discovery defines these dense components in a much more flexible way. The problem of dense subgraph discovery may be defined over either single or multiple graphs. We explore both cases. In the latter case, the problem is also closely related to the problem of frequent subgraph discovery. This chapter will discuss and organize the literature on this topic in order to make it much more accessible to the reader.
When Clusters Meet Partitions: A New Density Objective for Circuit Decomposition
 In Proc. European Design and Test Conf
, 1994
Cited by 17 (1 self)
Recent research on multiway partitioning has focused on the minimum cut [20, 26, 27] or generalized ratio cut [28, 29, 5] cost metrics. At the same time, clustering research has focused on such objectives as kl connectivity [12], DS metric [6], or clique-finding [8]. In this paper, we make the basic observation that cut objectives in partitioning, and density objectives in clustering, are fundamentally incompatible. Moreover, for multiway decomposition applications (e.g., decomposing a system onto multiple FPGA chips), the two approaches fail to smoothly "meet in the middle". We present a new measure of multiway circuit decomposition, based on a sum-of-densities objective. Here, the density of a subgraph is the ratio of the number of edges to the number of nodes in the subgraph. In that we feel that this is a natural measure of circuit decomposition (indeed, arguably more natural than ratio cut for a variety of applications), our new objective can perhaps be viewed in the same sp...
Densest Subgraph in Streaming and MapReduce
Cited by 16 (3 self)
The problem of finding locally dense components of a graph is an important primitive in data analysis, with wide-ranging applications from community mining to spam detection and the discovery of biological network modules. In this paper we present new algorithms for finding the densest subgraph in the streaming model. For any $\epsilon > 0$, our algorithms make $O(\log_{1+\epsilon} n)$ passes over the input and find a subgraph whose density is guaranteed to be within a factor $2(1 + \epsilon)$ of the optimum. Our algorithms are also easily parallelizable and we illustrate this by realizing them in the MapReduce model. In addition we perform extensive experimental evaluation on massive real-world graphs showing the performance and scalability of our algorithms in practice.
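The abstract states the pass bound and the approximation factor but not the procedure itself. One standard way to obtain a guarantee of this shape is threshold peeling: in each pass, drop every vertex whose degree is at most 2(1 + ε) times the current density, and keep the densest intermediate vertex set. The sketch below assumes that approach; the function name `peel_densest`, the in-memory adjacency sets, and the example graph are illustrative choices, not details taken from the paper.

```python
from itertools import combinations


def peel_densest(edges, eps=0.1):
    """Threshold-peeling sketch for an approximately densest subgraph.

    Density is |E(S)| / |S|. Each loop iteration plays the role of one pass.
    Returns (best vertex set, its density, number of passes).
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops (assumption)
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    alive = set(adj)
    deg = {u: len(adj[u]) for u in adj}   # degree within the alive set
    m = sum(deg.values()) // 2            # edges within the alive set
    best, best_density, passes = set(), 0.0, 0

    while alive:
        rho = m / len(alive)
        if rho > best_density:
            best, best_density = set(alive), rho
        # Average degree is 2 * rho, so at least one vertex is always removed.
        threshold = 2 * (1 + eps) * rho
        remove = {u for u in alive if deg[u] <= threshold}
        for u in remove:
            for w in adj[u]:
                if w in alive and w not in remove:
                    deg[w] -= 1
        alive -= remove
        m = sum(deg[u] for u in alive) // 2
        passes += 1

    return best, best_density, passes


# Hypothetical example: a 4-clique with a short tail attached to vertex 4.
edges = list(combinations(range(1, 5), 2)) + [(4, 5), (5, 6)]
best, best_density, passes = peel_densest(edges, eps=0.1)
print(sorted(best), best_density, passes)
```

A larger `eps` removes more vertices per pass, which reduces the number of passes at the cost of a looser density guarantee.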
Denser than the Densest Subgraph: Extracting Optimal Quasi-Cliques with Quality Guarantees
Cited by 15 (8 self)
Finding dense subgraphs is an important graph-mining task with many applications. Given that the direct optimization of edge density is not meaningful, as even a single edge achieves maximum density, research has focused on optimizing alternative density functions. A very popular one among such functions is the average degree, whose maximization leads to the well-known densest-subgraph notion. Surprisingly enough, however, densest subgraphs are typically large graphs, with small edge density and large diameter. In this paper, we define a novel density function, which gives subgraphs of much higher quality than densest subgraphs: the graphs found by our method are compact, dense, and with smaller diameter. We show that the proposed function can be derived from a general framework, which includes other important density functions as subcases and for which we show interesting general theoretical properties. To optimize the proposed function we provide an additive approximation algorithm and a local-search heuristic. Both algorithms are very efficient and scale well to large graphs. We evaluate our algorithms on real and synthetic datasets, and we also devise several application studies as variants of our original problem. When compared with the method that finds the subgraph of the largest average degree, our algorithms return denser subgraphs with smaller diameter. Finally, we discuss new interesting research directions that our problem leaves open.
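The contrast this abstract draws between edge density and average degree can be made concrete with a small sketch. It assumes the usual definitions, edge density as |E| divided by the number of vertex pairs and average degree as 2|E|/|V|; the function names and the three example graphs are illustrative and are not taken from the paper.

```python
from itertools import combinations, product


def edge_density(edges):
    """|E| divided by the number of vertex pairs; assumes distinct edges, no self-loops."""
    nodes = {u for e in edges for u in e}
    n = len(nodes)
    return len(edges) / (n * (n - 1) / 2)


def average_degree(edges):
    """2|E| / |V| over the vertices that appear in the edge list."""
    nodes = {u for e in edges for u in e}
    return 2 * len(edges) / len(nodes)


single_edge = [(0, 1)]
k4 = list(combinations(range(4), 2))   # complete graph on 4 vertices
k33 = list(product("abc", "xyz"))      # complete bipartite graph K_{3,3}

for name, g in [("single edge", single_edge), ("K4", k4), ("K3,3", k33)]:
    print(name, edge_density(g), average_degree(g))
```

A single edge already reaches edge density 1, which is why maximizing edge density alone is uninformative. K4 and K3,3 share the same average degree, yet K3,3 is larger and has a lower edge density, a small instance of the effect the abstract describes for densest subgraphs.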
A New Conceptual Clustering Framework
 MACHINE LEARNING
, 2004
Cited by 14 (2 self)
We propose a new formulation of the conceptual clustering problem where the goal is to explicitly output a collection of simple and meaningful conjunctions of attributes that define the clusters. The formulation differs from previous approaches since the clusters discovered may overlap and also may not cover all the points. In addition, a point may be assigned to a cluster description even if it only satisfies most, and not necessarily all, of the attributes in the conjunction. Connections between this conceptual clustering problem and the maximum edge biclique problem are made. Simple, randomized algorithms are given that discover a collection of approximate conjunctive cluster descriptions in sublinear time.
A local algorithm for finding dense subgraphs
 In Proc. 19th Annual ACM-SIAM Symposium on Discrete Algorithms
, 2008
Cited by 14 (3 self)
We present a local algorithm for finding dense subgraphs of bipartite graphs, according to the definition of density proposed by Kannan and Vinay. Our algorithm takes as input a bipartite graph with a specified starting vertex, and attempts to find a dense subgraph near that vertex. We prove that for any subgraph $S$ with $k$ vertices and density $\theta$, there are a significant number of starting vertices within $S$ for which our algorithm produces a subgraph $S'$ with density $\Omega(\theta / \log n)$ on at most $O(\Delta k^2)$ vertices, where $\Delta$ is the maximum degree. The running time of the algorithm is $O(\Delta k^2)$, independent of the number of vertices in the graph.