Results 1–10 of 35
Efficient Identification of Web Communities
 In Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
, 2000
Abstract

Cited by 228 (12 self)
We define a community on the web as a set of sites that have more links (in either direction) to members of the community than to non-members. Members of such a community can be efficiently identified in a maximum-flow/minimum-cut framework, where the source is composed of known members, and the sink consists of well-known non-members. A focused crawler that crawls to a fixed depth can approximate community membership by augmenting the graph induced by the crawl with links to a virtual sink node. The effectiveness of the approximation algorithm is demonstrated with several crawl results that identify hubs, authorities, web rings, and other link topologies that are useful but not easily categorized. Applications of our approach include focused crawlers and search engines, automatic population of portal categories, and improved filtering.
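The s-t cut formulation can be made concrete with a small sketch (a minimal Edmonds-Karp implementation on an invented toy crawl; node names and capacities are illustrative, not the authors' code or data):

```python
from collections import defaultdict, deque

def min_cut_source_side(cap, s, t):
    """Edmonds-Karp max flow; returns the source side of a minimum s-t cut."""
    flow = defaultdict(lambda: defaultdict(int))

    def augmenting_path():
        parent = {s: None}
        q = deque([s])
        while q:
            u = q.popleft()
            for v in cap[u]:
                if v not in parent and cap[u][v] - flow[u][v] > 0:
                    parent[v] = u
                    if v == t:
                        return parent
                    q.append(v)
        return None

    while (parent := augmenting_path()) is not None:
        path, v = [], t
        while v != s:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][v] - flow[u][v] for u, v in path)
        for u, v in path:
            flow[u][v] += push
            flow[v][u] -= push
    side, q = {s}, deque([s])   # nodes still reachable in the residual graph
    while q:
        u = q.popleft()
        for v in cap[u]:
            if v not in side and cap[u][v] - flow[u][v] > 0:
                side.add(v)
                q.append(v)
    return side

INF = 10 ** 9
cap = defaultdict(lambda: defaultdict(int))
# hypothetical crawl: sites a, b, c link densely to each other; x, y are outside
for u, v in [("a", "b"), ("b", "c"), ("a", "c"), ("a", "x"), ("x", "y")]:
    cap[u][v] += 1   # links count in either direction,
    cap[v][u] += 1   # so each becomes a bidirected unit-capacity edge
cap["S"]["a"] = INF  # source: a is a known member
cap["y"]["T"] = INF  # sink: y is a known non-member
community = sorted(min_cut_source_side(cap, "S", "T") - {"S"})
print(community)     # -> ['a', 'b', 'c']
```

Cutting the single bridge a-x is cheaper than severing the dense links among a, b, c, so the residual source side recovers the whole community from one seed.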
CLICK and EXPANDER: a system for clustering and visualizing gene expression data
 Bioinformatics
, 2003
Abstract

Cited by 58 (6 self)
Motivation: Microarrays have become a central tool in biological research. Their applications range from functional annotation to tissue classification and genetic network inference. A key step in the analysis of gene expression data is the identification of groups of genes that manifest similar expression patterns. This translates to the algorithmic problem of clustering genes based on their expression patterns. Results: We present a novel clustering algorithm, called CLICK, and its applications to gene expression analysis. The algorithm utilizes graph-theoretic and statistical techniques to identify tight groups (kernels) of highly similar elements, which are likely to belong to the same true cluster. Several heuristic procedures are then used to expand the kernels into the full clusters. We report on the application of CLICK to a variety of gene expression data sets. In all those applications it outperformed extant algorithms according to several common figures of merit. We also point out that CLICK can be successfully used for the identification of common regulatory motifs in the upstream regions of co-regulated genes. Furthermore, we demonstrate how CLICK can be used to accurately classify tissue samples into disease types, based on their expression profiles. Finally, we present a new Java-based graphical tool, called EXPANDER, for gene expression analysis and visualization, which incorporates CLICK and several other popular clustering algorithms.
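CLICK's kernel identification is statistical and graph-theoretic; the following stand-in only mirrors the two-phase shape (tight kernels, then heuristic expansion) on made-up expression profiles, with Pearson correlation as the similarity measure and thresholds chosen arbitrarily:

```python
import math

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def two_phase_cluster(profiles, kernel_sim=0.9, join_sim=0.5):
    """Phase 1: kernels = connected components of the high-similarity graph.
    Phase 2: heuristically attach leftover genes to their closest kernel."""
    genes = list(profiles)
    parent = {g: g for g in genes}

    def find(g):                     # union-find with path halving
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            if pearson(profiles[g], profiles[h]) >= kernel_sim:
                parent[find(g)] = find(h)
    groups = {}
    for g in genes:
        groups.setdefault(find(g), []).append(g)
    clusters = [m for m in groups.values() if len(m) > 1]   # the kernels
    leftovers = [m[0] for m in groups.values() if len(m) == 1]
    for g in leftovers:              # expansion by average similarity
        best, score = None, join_sim
        for c in clusters:
            s = sum(pearson(profiles[g], profiles[h]) for h in c) / len(c)
            if s > score:
                best, score = c, s
        if best is not None:
            best.append(g)
        else:
            clusters.append([g])
    return clusters

# made-up expression profiles over four conditions
profiles = {"g1": [1, 2, 3, 4], "g2": [1.1, 2, 3.1, 4],
            "g3": [4, 3, 2, 1], "g4": [4, 3.1, 2, 1.1],
            "g5": [1, 3, 2, 4]}   # noisy; only loosely follows g1/g2
clusters = two_phase_cluster(profiles)
print([sorted(c) for c in clusters])   # -> [['g1', 'g2', 'g5'], ['g3', 'g4']]
```

g5 is too noisy to enter a kernel but is absorbed in the expansion phase, which is the division of labour the abstract describes.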
Graph clustering and minimum cut trees
 Internet Mathematics
, 2004
Abstract

Cited by 53 (3 self)
In this paper, we introduce simple graph clustering methods based on minimum cuts within the graph. The clustering methods are general enough to apply to any kind of graph but are well suited for graphs where the link structure implies a notion of reference, similarity, or endorsement, such as web and citation graphs. We show that the quality of the produced clusters is bounded by strong minimum cut and expansion criteria. We also develop a framework for hierarchical clustering and present applications to real-world data. We conclude that the clustering algorithms satisfy strong theoretical criteria and perform well in practice.
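The artificial-sink construction at the heart of this style of cut clustering can be sketched as follows (plain-Python Edmonds-Karp; the graph and the α value are invented for illustration):

```python
from collections import defaultdict, deque

def min_cut_source_side(cap, s, t):
    """Edmonds-Karp max flow; returns the source side of a minimum s-t cut."""
    flow = defaultdict(lambda: defaultdict(int))

    def augmenting_path():
        parent = {s: None}
        q = deque([s])
        while q:
            u = q.popleft()
            for v in cap[u]:
                if v not in parent and cap[u][v] - flow[u][v] > 0:
                    parent[v] = u
                    if v == t:
                        return parent
                    q.append(v)
        return None

    while (parent := augmenting_path()) is not None:
        path, v = [], t
        while v != s:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][v] - flow[u][v] for u, v in path)
        for u, v in path:
            flow[u][v] += push
            flow[v][u] -= push
    side, q = {s}, deque([s])
    while q:
        u = q.popleft()
        for v in cap[u]:
            if v not in side and cap[u][v] - flow[u][v] > 0:
                side.add(v)
                q.append(v)
    return side

def cut_clusters(weights, alpha):
    """Attach an artificial sink t* to every node with capacity alpha, then
    repeatedly carve off the min-cut community of an unassigned node."""
    cap = defaultdict(lambda: defaultdict(int))
    nodes = sorted({n for e in weights for n in e})
    for (u, v), wt in weights.items():
        cap[u][v] += wt
        cap[v][u] += wt
    for n in nodes:
        cap[n]["t*"] += alpha
    clusters, assigned = [], set()
    for n in nodes:
        if n not in assigned:
            side = min_cut_source_side(cap, n, "t*") - {"t*"}
            clusters.append(sorted(side))
            assigned |= side
    return clusters

# two triangles joined by one edge; integer weights keep the flow exact
w = {("a", "b"): 10, ("b", "c"): 10, ("a", "c"): 10, ("c", "d"): 10,
     ("d", "e"): 10, ("e", "f"): 10, ("d", "f"): 10}
print(cut_clusters(w, alpha=4))   # -> [['a', 'b', 'c'], ['d', 'e', 'f']]
```

Larger α yields smaller, tighter clusters; smaller α merges them, which is how the hierarchical variant arises.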
Towards A Discipline Of Experimental Algorithmics
Abstract

Cited by 36 (8 self)
The last 20 years have seen enormous progress in the design of algorithms, but very little of it has been put into practice, even within academia; indeed, the gap between theory and practice has continuously widened over these years. Moreover, many of the recently developed algorithms are very hard to characterize theoretically and, as initially described, suffer from large running-time coefficients. Thus the algorithms and data structures community needs to return to implementation as the standard of value; we call such an approach Experimental Algorithmics. Experimental Algorithmics studies algorithms and data structures by joining experimental studies with the more traditional theoretical analyses. Experimentation with algorithms and data structures is proving indispensable in the assessment of heuristics for hard problems, in the design of test cases, in the characterization of asymptotic behavior of complex algorithms, in the comparison of competing designs for tractable problems.
Minimal Surfaces for Stereo
 In European Conference on Computer Vision (ECCV 02)
, 2002
Abstract

Cited by 24 (0 self)
Determining shape from stereo has often been posed as a global minimization problem. Once formulated, the minimization problems are then solved with a variety of algorithmic approaches. These approaches include techniques such as dynamic programming, min-cut, and alpha-expansion. In this paper we show how an algorithmic technique that constructs a discrete spatial minimal cost surface can be brought to bear on stereo global minimization problems. This problem can then be reduced to a single min-cut problem. We use this approach to solve a new global minimization problem that naturally arises when solving for three-camera (trinocular) stereo. Our formulation treats the three cameras symmetrically, while imposing a natural occlusion cost and uniqueness constraint.
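The idea of solving a global minimization with a single min-cut is easiest to see in the classic binary first-order case (not the paper's trinocular construction): each pixel becomes a graph node between a source and a sink, data costs become terminal edges, smoothness costs become neighbour edges, and one cut gives the globally optimal labeling. A toy 1-D sketch with invented data, costs scaled to integers:

```python
from collections import defaultdict, deque

def min_cut_source_side(cap, s, t):
    """Edmonds-Karp max flow; returns the source side of a minimum s-t cut."""
    flow = defaultdict(lambda: defaultdict(int))

    def augmenting_path():
        parent = {s: None}
        q = deque([s])
        while q:
            u = q.popleft()
            for v in cap[u]:
                if v not in parent and cap[u][v] - flow[u][v] > 0:
                    parent[v] = u
                    if v == t:
                        return parent
                    q.append(v)
        return None

    while (parent := augmenting_path()) is not None:
        path, v = [], t
        while v != s:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][v] - flow[u][v] for u, v in path)
        for u, v in path:
            flow[u][v] += push
            flow[v][u] -= push
    side, q = {s}, deque([s])
    while q:
        u = q.popleft()
        for v in cap[u]:
            if v not in side and cap[u][v] - flow[u][v] > 0:
                side.add(v)
                q.append(v)
    return side

# noisy 1-D signal; choose a binary label per pixel minimising
#   E(x) = sum_p 10*|y_p - x_p| + lam * (number of adjacent label changes)
y = [0.9, 0.8, 0.2, 0.9, 1.0]
lam = 4
cap = defaultdict(lambda: defaultdict(int))
for p, obs in enumerate(y):
    cap["s"][p] = round(10 * obs)        # paid if p ends up with label 0
    cap[p]["t"] = round(10 * (1 - obs))  # paid if p ends up with label 1
for p in range(len(y) - 1):              # smoothness between neighbours
    cap[p][p + 1] += lam
    cap[p + 1][p] += lam
side = min_cut_source_side(cap, "s", "t")
labels = [1 if p in side else 0 for p in range(len(y))]
print(labels)   # the outlier at index 2 is smoothed away -> [1, 1, 1, 1, 1]
```

Every cut corresponds to exactly one labeling and its cost equals the energy, so the minimum cut is the global minimizer; the paper's contribution is a surface formulation with the same single-cut property for trinocular stereo.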
Approximation algorithms for single-source unsplittable flow
 SIAM Journal on Computing
, 2002
Abstract

Cited by 23 (4 self)
In the single-source unsplittable flow problem, commodities must be routed simultaneously from a common source vertex to certain sinks in a given graph with edge capacities. The demand of each commodity must be routed along a single path so that the total flow through any edge is at most its capacity. This problem was introduced by Kleinberg [1996a] and generalizes several NP-complete problems. A cost value per unit of flow may also be defined for every edge. In this paper, we implement the 2-approximation algorithm of Dinitz, Garg, and Goemans [1999] for congestion, which is the best known, and the (3, 1)-approximation algorithm of Skutella [2002] for congestion and cost, which is the best known bicriteria approximation. We study experimentally the quality of approximation achieved by the algorithms and the effect of heuristics on their performance. We also compare these algorithms against the previous best ones by Kolliopoulos and Stein [1999].
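To make the objective concrete, here is a naive greedy baseline on an invented instance (NOT the Dinitz-Garg-Goemans 2-approximation): each demand is committed to a single path chosen by a bottleneck variant of Dijkstra's algorithm, and congestion is the maximum edge utilisation:

```python
import heapq

def route_unsplittable(edges, source, demands):
    """Greedy heuristic: route each demand, largest first, along the path that
    minimises the resulting maximum edge utilisation."""
    load = {e: 0.0 for e in edges}
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    paths = {}
    for sink, d in sorted(demands.items(), key=lambda kv: -kv[1]):
        best = {source: 0.0}     # bottleneck Dijkstra: cost = max utilisation
        prev = {}
        pq = [(0.0, source)]
        while pq:
            cost, u = heapq.heappop(pq)
            if cost > best.get(u, float("inf")):
                continue         # stale heap entry
            for v in adj.get(u, []):
                c = max(cost, (load[(u, v)] + d) / edges[(u, v)])
                if c < best.get(v, float("inf")):
                    best[v], prev[v] = c, u
                    heapq.heappush(pq, (c, v))
        path, v = [sink], sink   # commit the whole demand to one path
        while v != source:
            v = prev[v]
            path.append(v)
        path.reverse()
        for u, v in zip(path, path[1:]):
            load[(u, v)] += d
        paths[sink] = path
    congestion = max(load[e] / edges[e] for e in edges)
    return paths, congestion

# toy instance: edge capacities, two commodities of demand 6 from s
edges = {("s", "a"): 12, ("s", "b"): 12, ("a", "t1"): 8,
         ("b", "t1"): 4, ("a", "t2"): 8, ("b", "t2"): 4}
paths, congestion = route_unsplittable(edges, "s", {"t1": 6, "t2": 6})
print(paths, congestion)   # both demands squeeze through a; congestion 1.0
```

Unlike fractional flow, each commodity occupies one path in full, which is exactly what makes the problem hard and an approximation guarantee valuable.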
Optimal and efficient speculationbased partial redundancy elimination
 In 1st IEEE/ACM International Symposium on Code Generation and Optimization
, 2003
Abstract

Cited by 12 (3 self)
Existing profile-guided partial redundancy elimination (PRE) methods use speculation to enable the removal of partial redundancies along more frequently executed paths at the expense of introducing additional expression evaluations along less frequently executed paths. While being capable of minimizing the number of expression evaluations in some cases, they are, in general, not computationally optimal in achieving this objective. In addition, the experimental results for their effectiveness are mostly missing. This work addresses the following three problems: (1) Is the computational optimality of speculative PRE solvable in polynomial time? (2) Is edge profiling — less costly than path profiling — sufficient to guarantee the computational optimality? (3) Is the optimal algorithm (if one exists) lightweight enough to be used efficiently in a dynamic compiler? In this paper, we provide positive answers to the first two problems and promising results to the third. We present an algorithm that analyzes edge insertion points based on an edge profile. Our algorithm guarantees that the total number of computations for an expression in the transformed code is minimized with respect to the given edge profile. This implies that edge profiling, which is less costly than path profiling, is sufficient to guarantee this optimality. The key in the development of our algorithm lies in the removal of some nonessential edges (and consequently, all resulting nonessential nodes) from a flow graph so that the problem of finding an optimal code motion is reduced to one of finding a minimal cut in the reduced (flow) graph thus obtained. We have implemented our algorithm in Intel's Open Runtime Platform (ORP). Our preliminary results over a number of Java benchmarks show that our algorithm is lightweight and can be potentially a practical component in a dynamic compiler.
As a result, our algorithm can also be profitably employed in a profileguided static compiler, in which compilation cost can often be sacrificed for code efficiency.
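A deliberately simplified toy can illustrate the reduction (this is not the paper's algorithm, which first prunes nonessential edges and nodes): given an edge profile, choose insertion edges that sever every entry-to-use path at minimum total execution frequency, i.e. a minimum cut. Brute force over bipartitions is fine at this scale; the graph and frequencies are invented:

```python
from itertools import combinations

def cheapest_insertion_cut(freq, entry, use):
    """Brute-force minimum cut: over all bipartitions with entry on one side
    and the use point on the other, pick the one whose crossing edges carry
    the least total execution frequency."""
    inner = sorted({n for e in freq for n in e} - {entry, use})
    best_cost, best_cut = float("inf"), None
    for r in range(len(inner) + 1):
        for chosen in combinations(inner, r):
            s_side = set(chosen) | {entry}
            cut = [(u, v) for (u, v) in freq if u in s_side and v not in s_side]
            cost = sum(freq[e] for e in cut)
            if cost < best_cost:
                best_cost, best_cut = cost, cut
    return best_cost, best_cut

# reduced flow graph for one expression, annotated with an edge profile;
# the cold branch is omitted because the expression is already available
# there, so only the remaining entry-to-use paths need to be severed
freq = {("entry", "cond"): 100, ("cond", "hot"): 90,
        ("hot", "join"): 90, ("join", "use"): 100}
cost, cut = cheapest_insertion_cut(freq, "entry", "use")
print(cost, cut)   # -> 90 [('cond', 'hot')]
```

Inserting the evaluation on the hot edge costs 90 dynamic evaluations versus 100 at the use point, and every such count is available from an edge profile alone, which is the intuition behind the paper's second result.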
Graph and Hashing Algorithms for Modern Architectures: Design and Performance
 Proc. 2nd Workshop on Algorithm Engineering (WAE 98), Max-Planck-Institut für Informatik, 1998, in TR MPI-I-98-1-019
, 1998
Abstract

Cited by 12 (0 self)
 Add to MetaCart
We study the effects of caches on basic graph and hashing algorithms and show how cache effects influence the best solutions to these problems. We study the performance of basic data structures for storing lists of values and use these results to design and evaluate algorithms for hashing, Breadth-First Search (BFS) and Depth-First Search (DFS). For the basic ...
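The layout idea behind cache-conscious graph search can be sketched by comparing pointer-heavy adjacency lists with a flat compressed-sparse-row (CSR) packing (illustrative only; in pure Python the interpreter overhead hides much of the cache effect that matters in compiled code):

```python
import random
import time
from array import array
from collections import deque

def to_csr(adj):
    """Pack adjacency lists into two flat arrays (compressed sparse row):
    neighbours sit contiguously in memory instead of behind per-node pointers."""
    n = len(adj)
    offsets = array("l", [0] * (n + 1))
    for u in range(n):
        offsets[u + 1] = offsets[u] + len(adj[u])
    targets = array("l", (v for u in range(n) for v in adj[u]))
    return offsets, targets

def bfs_lists(adj, s):
    seen = bytearray(len(adj))
    seen[s] = 1
    order, q = [s], deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if not seen[v]:
                seen[v] = 1
                order.append(v)
                q.append(v)
    return order

def bfs_csr(offsets, targets, s):
    n = len(offsets) - 1
    seen = bytearray(n)
    seen[s] = 1
    order, q = [s], deque([s])
    while q:
        u = q.popleft()
        for i in range(offsets[u], offsets[u + 1]):
            v = targets[i]
            if not seen[v]:
                seen[v] = 1
                order.append(v)
                q.append(v)
    return order

rng = random.Random(0)
n, deg = 2000, 8
adj = {u: [rng.randrange(n) for _ in range(deg)] for u in range(n)}
offsets, targets = to_csr(adj)
t0 = time.perf_counter(); a = bfs_lists(adj, 0); t1 = time.perf_counter()
b = bfs_csr(offsets, targets, 0); t2 = time.perf_counter()
print(a == b, f"lists {t1 - t0:.4f}s  csr {t2 - t1:.4f}s")
```

Both traversals visit nodes in the same order; the difference is purely in memory layout, which is exactly the kind of factor the paper argues determines the best practical solution.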
Algorithms and Experiments: The New (and Old) Methodology
 J. Univ. Comput. Sci
, 2001
Abstract

Cited by 9 (4 self)
The last twenty years have seen enormous progress in the design of algorithms, but little of it has been put into practice. Because many recently developed algorithms are hard to characterize theoretically and have large running-time coefficients, the gap between theory and practice has widened over these years. Experimentation is indispensable in the assessment of heuristics for hard problems, in the characterization of asymptotic behavior of complex algorithms, and in the comparison of competing designs for tractable problems. Implementation, although perhaps not rigorous experimentation, was characteristic of early work in algorithms and data structures. Donald Knuth has throughout insisted on testing every algorithm and conducting analyses that can predict behavior on actual data; more recently, Jon Bentley has vividly illustrated the difficulty of implementation and the value of testing. Numerical analysts have long understood the need for standardized test suites to ensure robustness, precision and efficiency of numerical libraries. It is only recently, however, that the algorithms community has shown signs of returning to implementation and testing as an integral part of algorithm development. The emerging disciplines of experimental algorithmics and algorithm engineering have revived and are extending many of the approaches used by computing pioneers such as Floyd and Knuth and are placing on a formal basis many of Bentley's observations. We reflect on these issues, looking back at the last thirty years of algorithm development and forward to new challenges: designing cache-aware algorithms, algorithms for mixed models of computation, algorithms for external memory, and algorithms for scientific research.