Results 1-10 of 33
On finding dense subgraphs
In ICALP ’09, 2009
Cited by 15 (2 self)
Abstract. Given an undirected graph G = (V, E), the density of a subgraph on vertex set S is defined as d(S) = |E(S)| / |S|, where E(S) is the set of edges in the subgraph induced by the nodes in S. Finding subgraphs of maximum density is a very well-studied problem. One can also generalize this notion to directed graphs. For a directed graph, one notion of density, given by Kannan and Vinay [12], is as follows: given subsets S and T of vertices, the density of the subgraph ...
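As an illustration of the undirected definition in this abstract, the following minimal sketch computes the density of an induced subgraph as the number of induced edges divided by the number of vertices in S. The function names `induced_edges` and `density` are hypothetical, the graph is assumed to be given as a plain edge list, and this is not the algorithm from the paper; it only evaluates the objective for a given vertex set.

```python
from typing import Hashable, Iterable, Set, Tuple, FrozenSet

Edge = Tuple[Hashable, Hashable]


def induced_edges(edges: Iterable[Edge], S: Set[Hashable]) -> Set[FrozenSet]:
    """Return the undirected edges with both endpoints in S (self-loops ignored)."""
    return {
        frozenset((u, v))
        for u, v in edges
        if u != v and u in S and v in S
    }


def density(edges: Iterable[Edge], S: Iterable[Hashable]) -> float:
    """Density of the subgraph induced by S: induced edge count over vertex count."""
    S = set(S)
    if not S:
        raise ValueError("density is undefined for an empty vertex set")
    return len(induced_edges(edges, S)) / len(S)


if __name__ == "__main__":
    # A triangle on {1, 2, 3} with a pendant vertex 4 attached to 3.
    graph = [(1, 2), (2, 3), (1, 3), (3, 4)]
    print(density(graph, {1, 2, 3}))
    print(density(graph, {3, 4}))
```

Storing each edge as a `frozenset` means that an edge listed in both orientations is counted once, which matches the undirected setting assumed here.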
When Clusters Meet Partitions: A New Density Objective for Circuit Decomposition
In Proc. European Design and Test Conf., 1994
Cited by 14 (1 self)
Recent research on multiway partitioning has focused on the minimum cut [20, 26, 27] or generalized ratio cut [28, 29, 5] cost metrics. At the same time, clustering research has focused on such objectives as (k, l)-connectivity [12], DS metric [6], or clique-finding [8]. In this paper, we make the basic observation that cut objectives in partitioning, and density objectives in clustering, are fundamentally incompatible. Moreover, for multiway decomposition applications (e.g., decomposing a system onto multiple FPGA chips), the two approaches fail to smoothly "meet in the middle". We present a new measure of multiway circuit decomposition, based on a sum-of-densities objective. Here, the density of a subgraph is the ratio of the number of edges to the number of nodes in the subgraph. In that we feel that this is a natural measure of circuit decomposition (indeed, arguably more natural than ratio cut for a variety of applications), our new objective can perhaps be viewed in the same sp...
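The density measure stated in this abstract (edges divided by nodes) and a sum over the blocks of a decomposition can be sketched as follows. This is only an assumed reading of the objective: the circuit is simplified to an ordinary graph given as an edge list rather than a netlist hypergraph, the names `block_density` and `sum_of_densities` are hypothetical, and the code scores a given decomposition rather than searching for a good one.

```python
from typing import Hashable, Iterable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def block_density(edges: Iterable[Edge], block: Set[Hashable]) -> float:
    """Edges with both endpoints inside the block, divided by the block size."""
    if not block:
        raise ValueError("empty block has no density")
    inside = {frozenset((u, v)) for u, v in edges if u != v and u in block and v in block}
    return len(inside) / len(block)


def sum_of_densities(edges: Iterable[Edge], blocks: List[Set[Hashable]]) -> float:
    """Score a multiway decomposition as the sum of its block densities."""
    edges = list(edges)  # allow the edge iterable to be reused for every block
    return sum(block_density(edges, block) for block in blocks)


if __name__ == "__main__":
    # Two triangles joined by a single bridge edge (3, 4).
    graph = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
    print(sum_of_densities(graph, [{1, 2, 3}, {4, 5, 6}]))
    print(sum_of_densities(graph, [{1, 2, 3, 4}, {5, 6}]))
```

In this toy graph the decomposition that keeps each triangle intact scores higher than the one that splits a triangle, which is the kind of behavior a density-based objective is meant to reward.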
Dense Subgraphs with Restrictions and Applications to Gene Annotation Graphs
Cited by 13 (3 self)
Abstract. In this paper, we focus on finding complex annotation patterns representing novel and interesting hypotheses from gene annotation data. We define a generalization of the densest subgraph problem by adding an additional distance restriction (defined by a separate metric) to the nodes of the subgraph. We show that while this generalization makes the problem NP-hard for arbitrary metrics, when the metric comes from the distance metric of a tree, or an interval graph, the problem can be solved optimally in polynomial time. We also show that the densest subgraph problem with a specified subset of vertices that have to be included in the solution can be solved optimally in polynomial time. In addition, we consider other extensions where not just one solution needs to be found, but we wish to list all subgraphs of almost maximum density as well. We apply this method to a dataset of genes and their annotations obtained from The Arabidopsis Information Resource (TAIR). A user evaluation confirms that the patterns found in the distance-restricted densest subgraph for a dataset of photomorphogenesis genes are indeed validated in the literature; a control dataset validates that these are not random patterns. Interestingly, the complex annotation patterns potentially lead to new and as yet unknown hypotheses. We perform experiments to determine the properties of the dense subgraphs as we vary parameters, including the number of genes and the distance.
A New Conceptual Clustering Framework
Machine Learning, 2004
Cited by 12 (2 self)
We propose a new formulation of the conceptual clustering problem where the goal is to explicitly output a collection of simple and meaningful conjunctions of attributes that define the clusters. The formulation differs from previous approaches since the clusters discovered may overlap and also may not cover all the points. In addition, a point may be assigned to a cluster description even if it only satisfies most, and not necessarily all, of the attributes in the conjunction. Connections between this conceptual clustering problem and the maximum edge biclique problem are made. Simple, randomized algorithms are given that discover a collection of approximate conjunctive cluster descriptions in sublinear time.
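The relaxed membership rule described in this abstract, where a point may belong to a cluster description if it satisfies most of the attributes in the conjunction, can be illustrated with a small sketch. The tolerance parameter `eps` and the names `satisfies_most` and `cluster_members` are assumptions introduced here, since the abstract does not quantify "most", and this is not the randomized sublinear-time algorithm from the paper.

```python
from typing import Dict, Hashable, Iterable, List, Set


def satisfies_most(point_attrs: Iterable[Hashable],
                   conjunction: Iterable[Hashable],
                   eps: float = 0.25) -> bool:
    """True if the point misses at most an eps fraction of the conjunction's attributes.

    eps is a hypothetical tolerance; eps = 0 requires every attribute to hold.
    """
    conjunction = set(conjunction)
    if not conjunction:
        return True  # an empty conjunction is trivially satisfied
    missing = len(conjunction - set(point_attrs))
    return missing <= eps * len(conjunction)


def cluster_members(points: Dict[str, Set[Hashable]],
                    conjunction: Iterable[Hashable],
                    eps: float = 0.25) -> List[str]:
    """Names of all points assigned to the conjunctive description, sorted by name."""
    conjunction = set(conjunction)
    return sorted(name for name, attrs in points.items()
                  if satisfies_most(attrs, conjunction, eps))


if __name__ == "__main__":
    points = {
        "a": {"red", "round", "small"},
        "b": {"red", "round"},
        "c": {"blue"},
    }
    description = {"red", "round", "small", "shiny"}
    print(cluster_members(points, description, eps=0.25))
    print(cluster_members(points, description, eps=0.5))
```

Because membership is decided per description, the same point can appear under several conjunctions and some points can appear under none, which mirrors the overlap and partial-coverage properties the abstract mentions.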
On Finding Large Conjunctive Clusters
Cited by 10 (0 self)
We propose a new formulation of the clustering problem that differs from previous work in several aspects. First, the goal is to explicitly output a collection of simple and meaningful conjunctive descriptions of the clusters. Second, the clusters might overlap, i.e., a point can belong to multiple clusters. Third, the clusters might not cover all points, i.e., not every point is clustered. Finally, we allow a point to be assigned to a conjunctive cluster description even if it does not completely satisfy all of the attributes, but rather only satisfies most. A convenient way to view...
gPrune: A Constraint Pushing Framework for Graph Pattern Mining
Cited by 9 (1 self)
Abstract. In graph mining applications, there has been an increasingly strong urge for imposing user-specified constraints on the mining results. However, unlike most traditional itemset constraints, structural constraints, such as density and diameter of a graph, are very hard to push deep into the mining process. In this paper, we give the first comprehensive study on the pruning properties of both traditional and structural constraints, aiming to reduce not only the pattern search space but the data search space as well. A new general framework, called gPrune, is proposed to incorporate all the constraints in such a way that they recursively reinforce each other through the entire mining process. A new concept, Pattern-inseparable Data-antimonotonicity, is proposed to handle the structural constraints unique to the context of graphs, which, combined with known pruning properties, provides a comprehensive and unified classification framework for structural constraints. The exploration of these antimonotonicities in the context of graph pattern mining is a significant extension to the known classification of constraints, and deepens our understanding of the pruning properties of structural graph constraints.
A local algorithm for finding dense subgraphs
In Proc. 19th Annual ACM-SIAM Symposium on Discrete Algorithms, 2008
Cited by 8 (2 self)
We present a local algorithm for finding dense subgraphs of bipartite graphs, according to the definition of density proposed by Kannan and Vinay. Our algorithm takes as input a bipartite graph with a specified starting vertex, and attempts to find a dense subgraph near that vertex. We prove that for any subgraph S with k vertices and density θ, there are a significant number of starting vertices within S for which our algorithm produces a subgraph S′ with density Ω(θ / log n) on at most O(∆k²) vertices, where ∆ is the maximum degree. The running time of the algorithm is O(∆k²), independent of the number of vertices in the graph.
Experimental evaluation of a parametric flow algorithm
2006
Cited by 7 (3 self)
We study a practical implementation of the parametric flow algorithm of Gallo, Grigoriadis, and Tarjan. We describe an efficient implementation of the algorithm and compare it with a simpler algorithm.
k-edge Subgraph Problems
Discrete Applied Mathematics and Combinatorial Operations Research and Computer Science, 1996
Cited by 7 (1 self)
We study here a problem on graphs that involves finding a subgraph of maximum node weights spanning up to k edges. We interpret the concept of "spanning" to mean that at least one endpoint of the edge is in the subgraph in which we seek to maximize the total weight of the nodes. We discuss the complexity of this problem and other related problems with different concepts of "spanning" and show that most of these variants are NP-complete. For the problem defined, we demonstrate a factor-3 approximation algorithm with complexity O(kn) for a graph on n nodes. For the unweighted version of the problem in a graph on m edges, we describe a factor-2 approximation algorithm of greedy type, with complexity O(n + m). For trees and forests we present a polynomial-time algorithm applicable to our problem and also to a problem seeking to maximize (minimize) the weight of a subtree on k nodes. Department of Mechanical Engineering, The University of Texas at Austin. This research has been...
Link Prediction for Annotation Graphs using Graph Summarization
Cited by 6 (1 self)
Abstract. Annotation graph datasets are a natural representation of scientific knowledge. They are common in the life sciences where genes or proteins are annotated with controlled vocabulary terms (CV terms) from ontologies. The W3C Linking Open Data (LOD) initiative and semantic Web technologies are playing a leading role in making such datasets widely available. Scientists can mine these datasets to discover patterns of annotation. While ontology alignment and integration across datasets has been explored in the context of the semantic Web, there is no current approach to mine such patterns in annotation graph datasets. In this paper, we propose a novel approach for link prediction; it is a preliminary task when discovering more complex patterns. Our prediction is based on a complementary methodology of graph summarization (GS) and dense subgraphs (DSG). GS can exploit and summarize knowledge captured within the ontologies and in the annotation patterns. DSG uses the ontology structure, in particular the distance between CV terms, to filter the graph, and to find promising subgraphs. We develop a scoring function based on multiple heuristics to rank the predictions. We perform an extensive evaluation on Arabidopsis thaliana genes.