Results 1 - 10 of 27
HyPursuit: A hierarchical network search engine that exploits content-link hypertext clustering
- Proceedings of the Seventh ACM Conference on Hypertext, 1996
Cited by 88 (2 self)
HyPursuit is a new hierarchical network search engine that clusters hypertext documents to structure a given information space for browsing and search activities. Our content-link clustering algorithm is based on the semantic information embedded in hyperlink structures and document contents. HyPursuit admits multiple, coexisting cluster hierarchies based on different principles for grouping documents, such as the Library of Congress catalog scheme and automatically created hypertext clusters. HyPursuit's abstraction functions summarize cluster contents to support scalable query processing. The abstraction functions satisfy system resource limitations with controlled information loss. The result of query processing operations on a cluster summary approximates the result of performing the operations on the entire information space. We constructed a prototype system comprising 100 leaf World Wide Web sites and a hierarchy of 42 servers that route queries to the leaf sites. Experience with our system suggests that abstraction functions based on hypertext clustering can be used to construct meaningful and scalable cluster hierarchies. We are also encouraged by preliminary results on clustering based on both document contents and hyperlink structures.
A new approach to the minimum cut problem
- Journal of the ACM, 1996
Cited by 83 (8 self)
This paper presents a new approach to finding minimum cuts in undirected graphs. The fundamental principle is simple: the edges in a graph’s minimum cut form an extremely small fraction of the graph’s edges. Using this idea, we give a randomized, strongly polynomial algorithm that finds the minimum cut in an arbitrarily weighted undirected graph with high probability. The algorithm runs in $O(n^2 \log^3 n)$ time, a significant improvement over the previous $\tilde{O}(mn)$ time bounds based on maximum flows. It is simple and intuitive and uses no complex data structures. Our algorithm can be parallelized to run in RNC with $n^2$ processors; this gives the first proof that the minimum cut problem can be solved in RNC. The algorithm does more than find a single minimum cut; it finds all of them. With minor modifications, our algorithm solves two other problems of interest. Our algorithm finds all cuts with value within a multiplicative factor of $\alpha$ of the minimum cut’s in expected $\tilde{O}(n^{2\alpha})$ time, or in RNC with $n^{2\alpha}$ processors. The problem of finding a minimum multiway cut of a graph into $r$ pieces is solved in expected $\tilde{O}(n^{2(r-1)})$ time, or in RNC with $n^{2(r-1)}$ processors. The “trace” of the algorithm’s execution on these two problems forms a new compact data structure for representing all small cuts and all multiway cuts in a graph. This data structure can be efficiently transformed into the ...
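The principle stated in this abstract (minimum-cut edges are a small fraction of all edges, so a randomly chosen edge rarely belongs to the minimum cut) can be illustrated with a minimal sketch of plain random contraction. This is not the recursive algorithm that achieves the time bound quoted above. It assumes an unweighted multigraph, repeats independent trials, and uses hypothetical names (`contract_once`, `min_cut_estimate`). Processing edges in a random order and merging endpoints is used here as a stand-in for repeatedly contracting a uniformly random remaining edge.

```python
import random


def contract_once(n, edges, rng):
    """One trial of random contraction on an unweighted multigraph.

    Vertices are 0..n-1 and edges is a list of (u, v) pairs.
    Returns the number of edges crossing the 2-way cut that remains.
    """
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    order = edges[:]
    rng.shuffle(order)  # random edge order stands in for random edge picks
    components = n
    for u, v in order:
        if components == 2:
            break
        ru, rv = find(u), find(v)
        if ru != rv:  # skip self-loops created by earlier contractions
            parent[ru] = rv
            components -= 1
    return sum(1 for u, v in edges if find(u) != find(v))


def min_cut_estimate(n, edges, trials, seed=0):
    """Smallest cut seen over several independent contraction trials."""
    rng = random.Random(seed)
    return min(contract_once(n, edges, rng) for _ in range(trials))


# Two triangles joined by a single bridge edge (2, 3).
two_triangles = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
estimate = min_cut_estimate(6, two_triangles, trials=200)
```

Every trial returns the value of some valid cut, so the estimate never falls below the true minimum. More trials only raise the chance that the minimum itself is hit. Weighted graphs would need edges sampled in proportion to their weight, which this sketch leaves out.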
Minimum Cuts in Near-Linear Time
- In Proceedings of the 28th ACM Symposium on Theory of Computing, 1996
Cited by 63 (11 self)
We significantly improve known time bounds for solving the minimum cut problem on undirected graphs. We use a "semi-duality" between minimum cuts and maximum spanning tree packings combined with our previously developed random sampling techniques. We give a randomized (Monte Carlo) algorithm that finds a minimum cut in an $m$-edge, $n$-vertex graph with high probability in $O(m \log^3 n)$ time. We also give a simpler randomized algorithm that finds all minimum cuts with high probability in $O(n^2 \log n)$ time. This variant has an optimal RNC parallelization. Both variants improve on the previous best time bound of $O(n^2 \log^3 n)$. Other applications of the tree-packing approach are new, nearly tight bounds on the number of near minimum cuts a graph may have and a new data structure for representing them in a space-efficient manner. 1 Introduction The minimum cut problem has been studied for many years as a fundamental graph optimization problem with numerous applications. Initially, th...
An NC Algorithm for Minimum Cuts
- In Proceedings of the 25th Annual ACM Symposium on Theory of Computing
Cited by 39 (4 self)
We show that the minimum cut problem for weighted undirected graphs can be solved in NC using three separate and independently interesting results. The first is an $(m^2/n)$-processor NC algorithm for finding a $(2 + \epsilon)$-approximation to the minimum cut. The second is a randomized reduction from the minimum cut problem to the problem of obtaining a $(2 + \epsilon)$-approximation to the minimum cut. This reduction involves a natural combinatorial Set-Isolation Problem that can be solved easily in RNC. The third result is a derandomization of this RNC solution that requires a combination of two widely used tools: pairwise independence and random walks on expanders. We believe that the set-isolation approach will prove useful in other derandomization problems. The techniques extend to two related problems: we describe NC algorithms finding minimum $k$-way cuts for any constant $k$ and finding all cuts of value within any constant factor of the minimum. Another application of these techni...
Experimental Study of Minimum Cut Algorithms
- Proceedings of the Eighth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 1997
Cited by 36 (3 self)
Recently, several new algorithms have been developed for the minimum cut problem. These algorithms are very different from the earlier ones and from each other, and they substantially improve worst-case time bounds for the problem. We conduct an experimental evaluation of the relative performance of these algorithms. In the process, we develop heuristics and data structures that substantially improve the practical performance of the algorithms. We also develop problem families for testing minimum cut algorithms. Our work leads to a better understanding of the practical performance of minimum cut algorithms and produces very efficient codes for the problem.
A Scalable Self-organizing Map Algorithm for Textual Classification: A Neural Network Approach to Thesaurus Generation
- Communication Cognition and Artificial Intelligence, Spring 1998
Cited by 23 (5 self)
The rapid proliferation of textual and multimedia online databases, digital libraries, Internet servers, and intranet services has turned researchers' and practitioners' dream of creating an information-rich society into a nightmare of info-gluts. Many researchers believe that turning an info-glut into a useful digital library requires automated techniques for organizing and categorizing large-scale information. This paper presents research in which we sought to develop a scalable textual classification and categorization system based on Kohonen's self-organizing feature map (SOM) algorithm. In our paper, we show how self-organization can be used for automatic thesaurus generation. Our proposed data structure and algorithm took advantage of the sparsity of coordinates in the document input vectors and reduced the SOM computational complexity by several orders of magnitude. The proposed Scalable SOM (SSOM) algorithm makes large-scale textual categorization tasks a possibility. A...
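The abstract does not say how the sparsity of the document vectors is exploited, so the following is only a sketch of one common approach, not necessarily the authors' data structure. When searching for the winning SOM node, the squared distance can be expanded as $\|w\|^2 - 2\,w \cdot x + \|x\|^2$. With $\|w\|^2$ cached per node, only the nonzero coordinates of the input $x$ need to be visited. The names `sparse_sq_distance` and `best_matching_unit` are hypothetical, as is the choice of a dict for the sparse input.

```python
def sparse_sq_distance(x, w, w_sq_norm):
    """Squared Euclidean distance between a sparse input and a dense weight.

    x is a dict mapping coordinate index -> nonzero value,
    w is a dense list of weights, and w_sq_norm is the cached ||w||^2.
    Only the nonzero coordinates of x are visited.
    """
    dot = 0.0
    x_sq = 0.0
    for i, xi in x.items():
        dot += w[i] * xi
        x_sq += xi * xi
    return w_sq_norm - 2.0 * dot + x_sq


def best_matching_unit(x, weights, sq_norms):
    """Index of the node whose weight vector is closest to sparse input x."""
    return min(
        range(len(weights)),
        key=lambda k: sparse_sq_distance(x, weights[k], sq_norms[k]),
    )


# Three nodes over a four-term vocabulary; the document uses terms 0 and 3.
weights = [[0.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 0.0, 0.0]]
sq_norms = [sum(v * v for v in w) for w in weights]
doc = {0: 1.0, 3: 1.0}
winner = best_matching_unit(doc, weights, sq_norms)
```

The sketch covers only the winner search. A training step that updates the weights would also have to keep the cached norms in `sq_norms` up to date.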
Approximating Layout Problems on Random Geometric Graphs
- Journal of Algorithms, 2001
Cited by 20 (10 self)
In this paper, we study the approximability of several layout problems on a family of random geometric graphs. Vertices of random geometric graphs are randomly distributed on the unit square and are connected by edges whenever they are closer than some given parameter. The layout problems that we consider are: Bandwidth, Minimum Linear Arrangement, Minimum Cut Width, Minimum Sum Cut, Vertex Separation and Edge Bisection. We first prove that some of these problems remain NP-complete even for geometric graphs. Afterwards, we compute lower bounds that hold, almost surely, for random geometric graphs. Then, we present two heuristics that, almost surely, turn out to be constant approximation algorithms for our layout problems on random geometric graphs. In fact, for the Bandwidth and Vertex Separation problems, these heuristics are asymptotically optimal. Finally, we use the theoretical results to empirically compare these and other well-known heuristics.
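To make the setting concrete, here is a minimal sketch of generating a random geometric graph and scoring a linear layout under two of the measures listed, Bandwidth and Minimum Linear Arrangement. The abstract does not fix a distance measure, so Euclidean distance is an assumption. The function names are hypothetical. Ordering vertices by x-coordinate is only an illustrative layout, not necessarily one of the paper's two heuristics.

```python
import random


def random_geometric_graph(n, radius, seed=0):
    """Place n points uniformly in the unit square and join close pairs.

    Two vertices are adjacent when their Euclidean distance is below radius.
    """
    rng = random.Random(seed)
    points = [(rng.random(), rng.random()) for _ in range(n)]
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            dx = points[i][0] - points[j][0]
            dy = points[i][1] - points[j][1]
            if dx * dx + dy * dy < radius * radius:
                edges.append((i, j))
    return points, edges


def bandwidth(layout, edges):
    """Largest distance, in layout positions, spanned by any edge."""
    pos = {v: k for k, v in enumerate(layout)}
    return max((abs(pos[u] - pos[v]) for u, v in edges), default=0)


def linear_arrangement_cost(layout, edges):
    """Sum over all edges of the distance between endpoint positions."""
    pos = {v: k for k, v in enumerate(layout)}
    return sum(abs(pos[u] - pos[v]) for u, v in edges)


points, edges = random_geometric_graph(50, 0.2)
# Illustrative layout: order vertices by their x-coordinate.
layout = sorted(range(len(points)), key=lambda v: points[v][0])
scores = (bandwidth(layout, edges), linear_arrangement_cost(layout, edges))
```

A layout is simply an ordering of the vertices, so other heuristics can be compared by passing a different ordering to the same scoring functions.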
On the use of information retrieval techniques for the automatic construction of hypertext
- Information Processing and Management, 1997
Cited by 17 (4 self)
The first part of the paper briefly introduces what automatic authoring of a hypertext for information retrieval means. The most difficult part of the automatic construction of a hypertext is the creation of links connecting documents or document fragments that are semantically related. Because of this, to many researchers it seemed natural to use IR techniques for this purpose, since IR has always dealt with the construction of relationships between mutually relevant objects. The second part of the paper presents a survey of some attempts toward the automatic construction of hypertexts for information retrieval. This part identifies and compares the scope, advantages and limitations of different approaches. The aim of this survey is to point out the main and most successful current lines of research.
Visualization of search results in document retrieval systems, General Examination Report
- University of Washington, SIGTRS Bulletin, 1998
Cited by 13 (1 self)
Traditional information retrieval systems present search results as a ranked list of documents, ordered by their estimated relevance to the query. Visualization of search results is emerging as a powerful tool for presenting more information to the user in a way that is both intuitive and easy to interpret. This paper describes the various visualization techniques and presents a novel classification of these methods. Next, it discusses several important issues concerning these techniques: how they are evaluated, how they scale to large document sets, whether they can be combined, and whether we will see them in practice on the Web anytime soon.

