Efficient Identification of Web Communities
In Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2000
Cited by 290 (13 self)
We define a community on the web as a set of sites that have more links (in either direction) to members of the community than to nonmembers. Members of such a community can be efficiently identified in a maximum flow / minimum cut framework, where the source is composed of known members, and the sink consists of well-known nonmembers. A focused crawler that crawls to a fixed depth can approximate community membership by augmenting the graph induced by the crawl with links to a virtual sink node. The effectiveness of the approximation algorithm is demonstrated with several crawl results that identify hubs, authorities, web rings, and other link topologies that are useful but not easily categorized. Applications of our approach include focused crawlers and search engines, automatic population of portal categories, and improved filtering.
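The max-flow / min-cut idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes unit capacity per link, treats links as undirected, and uses a plain BFS augmenting-path max flow. The names `community_by_min_cut`, `_source`, and `_sink` are hypothetical.

```python
from collections import defaultdict, deque

def community_by_min_cut(edges, seeds, non_members):
    """Return the nodes on the source side of a minimum cut that
    separates the seed members from the known non-members.
    Assumption: every link has capacity 1 in both directions."""
    cap = defaultdict(lambda: defaultdict(int))
    for u, v in edges:
        cap[u][v] += 1
        cap[v][u] += 1

    S, T = "_source", "_sink"      # hypothetical virtual node names
    big = len(edges) + 1           # exceeds any cut made of real links
    for s in seeds:
        cap[S][s] += big
        cap[s][S] += 0             # create the reverse residual entry
    for t in non_members:
        cap[t][T] += big
        cap[T][t] += 0

    def reachable_from_source():
        """BFS over edges with remaining residual capacity."""
        parent = {S: None}
        queue = deque([S])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    while True:
        parent = reachable_from_source()
        if T not in parent:
            # No augmenting path left: the reachable set is the source side.
            return set(parent) - {S}
        # Find the bottleneck along the path, then push flow along it.
        bottleneck, v = float("inf"), T
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = T
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
```

For two triangles joined by a single link, seeding with one node of the first triangle and naming one node of the second as a non-member returns the first triangle, since the joining link is the cheapest cut.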
CLICK and EXPANDER: a system for clustering and visualizing gene expression data
Bioinformatics, 2003
Cited by 101 (6 self)
Motivation: Microarrays have become a central tool in biological research. Their applications range from functional annotation to tissue classification and genetic network inference. A key step in the analysis of gene expression data is the identification of groups of genes that manifest similar expression patterns. This translates to the algorithmic problem of clustering genes based on their expression patterns. Results: We present a novel clustering algorithm, called CLICK, and its applications to gene expression analysis. The algorithm utilizes graph-theoretic and statistical techniques to identify tight groups (kernels) of highly similar elements, which are likely to belong to the same true cluster. Several heuristic procedures are then used to expand the kernels into the full clusters. We report on the application of CLICK to a variety of gene expression data sets. In all those applications it outperformed extant algorithms according to several common figures of merit. We also point out that CLICK can be successfully used for the identification of common regulatory motifs in the upstream regions of co-regulated genes. Furthermore, we demonstrate how CLICK can be used to accurately classify tissue samples into disease types, based on their expression profiles. Finally, we present a new Java-based graphical tool, called EXPANDER, for gene expression analysis and visualization, which incorporates CLICK and several other popular clustering algorithms.
A Randomized Fully Polynomial Time Approximation Scheme for the All Terminal Network Reliability Problem
1997
Cited by 86 (2 self)
The classic all-terminal network reliability problem posits a graph, each of whose edges fails (disappears) independently with some given probability. The goal is to determine the probability that the network becomes disconnected due to edge failures. The practical applications of this question to communication networks are obvious, and the problem has therefore been the subject of a great deal of study. Since it is #P-complete, and thus believed hard to solve exactly, a great deal of research has been devoted to estimating the failure probability. A comprehensive survey can be found in [Col87]. The first author recently presented an algorithm for approximating the probability of network disconnection under random edge failures. In this paper, we report on our experience implementing this algorithm. Our implementation shows that the algorithm is practical on networks of moderate size, and indeed works better than the theoretical bounds predict. Part of this improvement arises from heuristic modifications to the theoretical algorithm, while another part suggests that the theoretical running time analysis of the algorithm might not be tight. Based on our observation of the implementation, we were able to devise analytic explanations of at least some of the improved performance. As one example, we formally prove the accuracy of a simple heuristic approximation for the reliability. We also discuss other questions raised by the implementation which might be susceptible to analysis.
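The quantity being estimated can be made concrete with a naive Monte Carlo sketch. This is not the approximation scheme the abstract refers to, and it gives no accuracy guarantee when the failure probability is very small. It only simulates independent edge failures and counts how often the graph disconnects. The function names and the `trials` and `seed` parameters are assumptions for illustration.

```python
import random

def is_connected(n, edges):
    """Union-find connectivity check on nodes 0..n-1."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    components = n
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            components -= 1
    return components == 1

def estimate_failure_probability(n, edges, p, trials=10000, seed=0):
    """Fraction of trials in which the surviving edges leave the
    graph disconnected, when each edge fails independently with
    probability p."""
    rng = random.Random(seed)
    disconnected = 0
    for _ in range(trials):
        surviving = [e for e in edges if rng.random() >= p]
        if not is_connected(n, surviving):
            disconnected += 1
    return disconnected / trials
```

For a two-node graph with a single edge, the exact answer is p itself, which makes a convenient sanity check for the estimator.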
Graph clustering and minimum cut trees
Internet Mathematics, 2004
Cited by 68 (4 self)
In this paper, we introduce simple graph clustering methods based on minimum cuts within the graph. The clustering methods are general enough to apply to any kind of graph but are well suited for graphs where the link structure implies a notion of reference, similarity, or endorsement, such as web and citation graphs. We show that the quality of the produced clusters is bounded by strong minimum cut and expansion criteria. We also develop a framework for hierarchical clustering and present applications to real-world data. We conclude that the clustering algorithms satisfy strong theoretical criteria and perform well in practice.
Mining closed relational graphs with connectivity constraints
Proc. KDD'05
Cited by 48 (9 self)
Relational graphs are widely used in modeling large-scale networks such as biological networks and social networks. In this kind of graph, connectivity becomes critical in identifying highly associated groups and clusters. In this paper, we investigate the issues of mining closed frequent graphs with connectivity constraints in massive relational graphs where each graph has around 10K nodes and 1M edges. We adopt the concept of edge connectivity and apply results from graph theory to speed up the mining process. Two approaches are developed to handle different mining requests: CloseCut, a pattern-growth approach, and Splat, a pattern-reduction approach. We have applied these methods to biological datasets and found the discovered patterns interesting.
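Edge connectivity, the constraint this abstract builds on, is the smallest number of edges whose removal disconnects a graph. The brute-force sketch below only illustrates that definition on tiny graphs. It is exponential in the number of nodes and is unrelated to the CloseCut or Splat methods. The function name is hypothetical, and returning 0 for graphs with fewer than two nodes is a convention chosen here.

```python
from itertools import combinations

def edge_connectivity(nodes, edges):
    """Minimum number of edges crossing any split of the nodes into
    two non-empty sides (brute force, small graphs only)."""
    nodes = list(nodes)
    if len(nodes) < 2:
        return 0                      # convention assumed for this sketch
    first, rest = nodes[0], nodes[1:]
    best = len(edges)
    # Fix `first` on one side so each split is examined once; the side
    # never includes all of `rest`, so the other side is never empty.
    for k in range(len(rest)):
        for extra in combinations(rest, k):
            side = {first, *extra}
            crossing = sum((u in side) != (v in side) for u, v in edges)
            best = min(best, crossing)
    return best
```

A triangle has edge connectivity 2, while two triangles joined by one edge have edge connectivity 1, which is why a connectivity threshold can separate tightly knit groups from loosely attached ones.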
Towards A Discipline Of Experimental Algorithmics
Cited by 34 (7 self)
The last 20 years have seen enormous progress in the design of algorithms, but very little of it has been put into practice, even within academia; indeed, the gap between theory and practice has continuously widened over these years. Moreover, many of the recently developed algorithms are very hard to characterize theoretically and, as initially described, suffer from large running-time coefficients. Thus the algorithms and data structures community needs to return to implementation as the standard of value; we call such an approach Experimental Algorithmics. Experimental Algorithmics studies algorithms and data structures by joining experimental studies with the more traditional theoretical analyses. Experimentation with algorithms and data structures is proving indispensable in the assessment of heuristics for hard problems, in the design of test cases, in the characterization of asymptotic behavior of complex algorithms, in the comparison of competing designs for tractabl...
Minimal Surfaces for Stereo
In European Conference on Computer Vision (ECCV 02), 2002
Cited by 28 (0 self)
Determining shape from stereo has often been posed as a global minimization problem. Once formulated, the minimization problems are then solved with a variety of algorithmic approaches. These approaches include techniques such as dynamic programming, min-cut, and alpha-expansion. In this paper we show how an algorithmic technique that constructs a discrete spatial minimal cost surface can be brought to bear on stereo global minimization problems. This problem can then be reduced to a single min-cut problem. We use this approach to solve a new global minimization problem that naturally arises when solving for three-camera (trinocular) stereo. Our formulation treats the three cameras symmetrically, while imposing a natural occlusion cost and uniqueness constraint.
Approximation algorithms for single-source unsplittable flow
SIAM Journal on Computing, 2002
Cited by 26 (4 self)
In the single-source unsplittable flow problem, commodities must be routed simultaneously from a common source vertex to certain sinks in a given graph with edge capacities. The demand of each commodity must be routed along a single path so that the total flow through any edge is at most its capacity. This problem was introduced by Kleinberg [1996a] and generalizes several NP-complete problems. A cost value per unit of flow may also be defined for every edge. In this paper, we implement the 2-approximation algorithm of Dinitz, Garg, and Goemans [1999] for congestion, which is the best known, and the (3, 1)-approximation algorithm of Skutella [2002] for congestion and cost, which is the best known bicriteria approximation. We study experimentally the quality of approximation achieved by the algorithms and the effect of heuristics on their performance. We also compare these algorithms against the previous best ones by Kolliopoulos and Stein [1999].
Graph Sparsification in the Semi-streaming Model
2009
Cited by 21 (3 self)
Analyzing massive data sets has been one of the key motivations for studying streaming algorithms. In recent years, there has been significant progress in analysing distributions in a streaming setting, but the progress on graph problems has been limited. A main reason for this has been the existence of linear space lower bounds for even simple problems such as determining the connectedness of a graph. However, in many new scenarios that arise from social and other interaction networks, the number of vertices is significantly less than the number of edges. This has led to the formulation of the semi-streaming model where we assume that the space is (near) linear in the number of vertices (but not necessarily the edges), and the edges appear in an arbitrary (and possibly adversarial) order. However, there has been limited progress in analysing graph algorithms in this model. In this paper we focus on graph sparsification, which is one of the major building blocks in a variety of graph algorithms. Further, there has been a long history of (non-streaming) sampling algorithms that provide sparse graph approximations, and it is a natural question to ask: since the end result of the sparse approximation is a small (linear) space structure, can we achieve that using a small space, and in addition using a single pass over the data? The question is interesting from the standpoint of both theory and practice and we answer the question in the affirmative, by providing a one-pass Õ(n/ε²) space algorithm that produces a sparsification that approximates each cut to a (1 + ε) factor. We also show that Ω(n log(1/ε)) space is necessary for a one-pass streaming algorithm to approximate the min-cut, improving upon the Ω(n) lower bound that arises from lower bounds for testing connectivity.
Optimal and efficient speculation-based partial redundancy elimination
In 1st IEEE/ACM International Symposium on Code Generation and Optimization, 2003
Cited by 13 (3 self)
Existing profile-guided partial redundancy elimination (PRE) methods use speculation to enable the removal of partial redundancies along more frequently executed paths at the expense of introducing additional expression evaluations along less frequently executed paths. While being capable of minimizing the number of expression evaluations in some cases, they are, in general, not computationally optimal in achieving this objective. In addition, the experimental results for their effectiveness are mostly missing. This work addresses the following three problems: (1) Is the computational optimality of speculative PRE solvable in polynomial time? (2) Is edge profiling (less costly than path profiling) sufficient to guarantee the computational optimality? (3) Is the optimal algorithm (if one exists) lightweight enough to be used efficiently in a dynamic compiler? In this paper, we provide positive answers to the first two problems and promising results to the third. We present an algorithm that analyzes edge insertion points based on an edge profile. Our algorithm guarantees optimally that the total number of computations for an expression in the transformed code is always minimized with respect to the edge profile given. This implies that edge profiling, which is less costly than path profiling, is sufficient to guarantee this optimality. The key in the development of our algorithm lies in the removal of some nonessential edges (and consequently, all resulting nonessential nodes) from a flow graph so that the problem of finding an optimal code motion is reduced to one of finding a minimal cut in the reduced (flow) graph thus obtained. We have implemented our algorithm in Intel's Open Runtime Platform (ORP). Our preliminary results over a number of Java benchmarks show that our algorithm is lightweight and can potentially be a practical component in a dynamic compiler. As a result, our algorithm can also be profitably employed in a profile-guided static compiler, in which compilation cost can often be sacrificed for code efficiency.