Results 1-10 of 181
Authoritative Sources in a Hyperlinked Environment
Journal of the ACM, 1999
Cited by 3632 (12 self)
Abstract
The network structure of a hyperlinked environment can be a rich source of information about the content of the environment, provided we have effective means for understanding it. We develop a set of algorithmic tools for extracting information from the link structures of such environments, and report on experiments that demonstrate their effectiveness in a variety of contexts on the World Wide Web. The central issue we address within our framework is the distillation of broad search topics through the discovery of “authoritative” information sources on such topics. We propose and test an algorithmic formulation of the notion of authority, based on the relationship between a set of relevant authoritative pages and the set of “hub pages” that join them together in the link structure. Our formulation has connections to the eigenvectors of certain matrices associated with the link graph; these connections in turn motivate additional heuristics for link-based analysis.
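The hub/authority relationship in this abstract (commonly known as HITS) can be illustrated with a short power-iteration sketch. It assumes the link graph is already given as a list of directed (source, target) edges over integer page ids, uses a fixed iteration count, and omits the construction of the query-focused subgraph; the function name `hits` is our own, and this is not presented as the paper's exact procedure.

```python
import math

def hits(edges, n, iterations=50):
    """Iterate hub and authority scores on a directed graph with n nodes.

    edges: list of (src, dst) pairs meaning page src links to page dst.
    Returns (hub, auth), each normalized to unit Euclidean length.
    """
    hub = [1.0] * n
    auth = [1.0] * n
    for _ in range(iterations):
        # Authority update: a page is a good authority if good hubs point to it.
        new_auth = [0.0] * n
        for src, dst in edges:
            new_auth[dst] += hub[src]
        # Hub update: a page is a good hub if it points to good authorities.
        new_hub = [0.0] * n
        for src, dst in edges:
            new_hub[src] += new_auth[dst]
        # Normalize so the scores stay bounded across iterations.
        a_norm = math.sqrt(sum(x * x for x in new_auth)) or 1.0
        h_norm = math.sqrt(sum(x * x for x in new_hub)) or 1.0
        auth = [x / a_norm for x in new_auth]
        hub = [x / h_norm for x in new_hub]
    return hub, auth
```

With edges `[(0, 2), (1, 2), (1, 3), (0, 3), (4, 2)]`, page 2 (linked from three pages) gets the highest authority score, and pages 0 and 1 (each linking to both 2 and 3) tie as the strongest hubs. The iteration is the power method for the principal eigenvectors of A^T A (authorities) and A A^T (hubs), which is the eigenvector connection the abstract mentions.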
Consistency of spectral clustering
2004
Cited by 572 (15 self)
Abstract
Consistency is a key property of statistical algorithms when the data are drawn from some underlying probability distribution. Surprisingly, despite decades of work, little is known about the consistency of most clustering algorithms. In this paper we investigate the consistency of a popular family of spectral clustering algorithms, which cluster the data with the help of eigenvectors of graph Laplacian matrices. We show that one of the two major classes of spectral clustering (normalized clustering) converges under some very general conditions, while the other (unnormalized) is consistent only under strong additional assumptions, which, as we demonstrate, are not always satisfied in real data. We conclude that our analysis provides strong evidence for the superiority of normalized spectral clustering in practical applications. We believe that the methods used in our analysis will provide a basis for future exploration of Laplacian-based methods in a statistical setting.
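For concreteness, here is a minimal sketch of what a normalized spectral bipartition computes; it does not reproduce the paper's consistency analysis. It assumes a small symmetric weight matrix with no isolated vertices, finds the second eigenvector of the normalized adjacency D^{-1/2} A D^{-1/2} (equivalently, the second-smallest eigenvector of the normalized Laplacian) by shifted power iteration with deflation, and splits vertices by sign. The function name and the deterministic starting vector are illustrative assumptions; a real implementation would use a sparse eigensolver.

```python
import math

def normalized_spectral_bipartition(adj, iterations=500):
    """Split a graph in two using the normalized Laplacian's second eigenvector.

    adj: symmetric list-of-lists weight matrix with no isolated vertices.
    Returns a list of 0/1 cluster labels.
    """
    n = len(adj)
    deg = [sum(row) for row in adj]
    inv_sqrt = [1.0 / math.sqrt(d) for d in deg]
    # The top eigenvector of N = D^{-1/2} A D^{-1/2} is proportional to sqrt(deg).
    top = [math.sqrt(d) for d in deg]
    top_norm = math.sqrt(sum(t * t for t in top))
    top = [t / top_norm for t in top]

    def project_out(x):
        # Remove the component along the trivial top eigenvector.
        dot = sum(a * b for a, b in zip(x, top))
        return [a - dot * b for a, b in zip(x, top)]

    # Deterministic, non-constant start (assumed to overlap the second eigenvector).
    x = project_out([float(i) for i in range(n)])
    for _ in range(iterations):
        nx = [inv_sqrt[i] * sum(adj[i][j] * inv_sqrt[j] * x[j] for j in range(n))
              for i in range(n)]
        # Apply (I + N) / 2: its eigenvalues lie in [0, 1], so power iteration
        # targets the largest eigenvalues of N, not those near -1.
        x = project_out([(a + b) / 2.0 for a, b in zip(x, nx)])
        norm = math.sqrt(sum(a * a for a in x))
        x = [a / norm for a in x]
    # D^{-1/2} x (the random-walk Laplacian eigenvector) has the same signs as x.
    return [0 if v < 0 else 1 for v in x]
```

On two triangles joined by a single edge, the sign split recovers the two triangles.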
Co-clustering documents and words using Bipartite Spectral Graph Partitioning
2001
New spectral methods for ratio cut partition and clustering
IEEE Trans. on Computer-Aided Design, 1992
Cited by 296 (17 self)
Abstract
Partitioning of circuit netlists is important in many phases of VLSI design, ranging from layout to testing and hardware simulation. The ratio cut objective function [29] has received much attention since it naturally captures both min-cut and equipartition, the two traditional goals of partitioning. In this paper, we show that the second-smallest eigenvalue of a matrix derived from the netlist gives a provably good approximation of the optimal ratio cut partition cost. We also demonstrate that fast Lanczos-type methods for the sparse symmetric eigenvalue problem are a robust basis for computing heuristic ratio cuts based on the eigenvector of this second eigenvalue. Effective clustering methods are an immediate by-product of the second eigenvector computation, and are very successful on the “difficult” input classes proposed in the CAD literature. Finally, we discuss the very natural intersection graph
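The ratio cut objective and the step from an eigenvector to a heuristic cut can be sketched as follows, using the ratio cut cost cut(U, W) / (|U| * |W|). The sketch assumes a vertex ordering is already available (for example, vertices sorted by their entries in the second eigenvector) and does not implement the Lanczos computation; `ratio_cut_cost` and `best_sweep_cut` are hypothetical names, and sweeping over prefixes of the ordering is one common way to turn an ordering into a cut, not necessarily the paper's exact procedure.

```python
import math

def ratio_cut_cost(adj, side):
    """Ratio cut of a bipartition: cut(U, W) / (|U| * |W|).

    adj: symmetric weight matrix; side: list of booleans, True for vertices in U.
    """
    n = len(adj)
    size_u = sum(side)
    size_w = n - size_u
    if size_u == 0 or size_w == 0:
        return math.inf  # not a proper bipartition
    # Each crossing edge is counted once: i in U, j in W.
    cut = sum(adj[i][j] for i in range(n) for j in range(n)
              if side[i] and not side[j])
    return cut / (size_u * size_w)

def best_sweep_cut(adj, order):
    """Split `order` into every prefix U and suffix W; return the cheapest cut."""
    n = len(adj)
    best_cost, best_k = math.inf, None
    for k in range(1, n):
        side = [False] * n
        for v in order[:k]:
            side[v] = True
        cost = ratio_cut_cost(adj, side)
        if cost < best_cost:
            best_cost, best_k = cost, k
    return best_cost, set(order[:best_k])
```

On two triangles joined by one edge, with the ordering 0 to 5, the sweep picks U = {0, 1, 2} at cost 1/9: one crossing edge divided by 3 * 3. The |U| * |W| denominator is what lets the objective capture equipartition as well as min-cut.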
A Min-Max Cut Algorithm for Graph Partitioning and Data Clustering
2001
Cited by 213 (15 self)
Abstract
An important application of graph partitioning is data clustering using a graph model: the pairwise similarities between all data objects form a weighted graph adjacency matrix that contains all the information necessary for clustering. Here we propose a new algorithm for graph partitioning with an objective function that follows the min-max clustering principle. The relaxed version of the optimization of the min-max cut objective function leads to the Fiedler vector in spectral graph partitioning. Theoretical analyses of min-max cut indicate that it leads to balanced partitions, and lower bounds are derived. The min-max cut algorithm is tested on newsgroup datasets and is found to outperform other currently popular partitioning/clustering methods. The linkage-based refinements in the algorithm further improve the quality of clustering substantially. We also demonstrate that the linearized search order based on linkage differential is better than that based on the Fiedler vector, providing another effective partition method.
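To make the min-max principle concrete, the sketch below scores a given two-way partition with Mcut(A, B) = s(A, B) / s(A, A) + s(A, B) / s(B, B), where s(X, Y) sums edge weights w_ij over i in X and j in Y. This convention for s and the helper names are our assumptions; the sketch only evaluates a partition and does not implement the paper's spectral relaxation or its linkage-based refinements.

```python
import math

def weight_between(adj, xs, ys):
    """s(X, Y): total weight over ordered pairs (i, j) with i in X and j in Y."""
    return sum(adj[i][j] for i in xs for j in ys)

def min_max_cut(adj, a):
    """Score the partition (A, complement of A); lower is better."""
    b = [v for v in range(len(adj)) if v not in a]
    cut = weight_between(adj, a, b)       # similarity between clusters (minimize)
    within_a = weight_between(adj, a, a)  # similarity inside A (maximize)
    within_b = weight_between(adj, b, b)  # similarity inside B (maximize)
    if within_a == 0 or within_b == 0:
        return math.inf  # a side with no internal edges is maximally penalized
    return cut / within_a + cut / within_b
```

On two triangles joined by a single edge, the balanced split A = {0, 1, 2} scores 1/6 + 1/6 = 1/3, while A = {0, 1} scores 2/2 + 2/8 = 1.25. Because a small, sparse side makes its s(A, A) term small, the objective penalizes lopsided splits, which is the balancing behavior the abstract describes.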
Spectral partitioning works: planar graphs and finite element meshes
In: Proceedings of the 37th Annual Symposium on Foundations of Computer Science, 1996
Cited by 201 (10 self)
Abstract
Spectral partitioning methods use the Fiedler vector (the eigenvector of the second-smallest eigenvalue of the Laplacian matrix) to find a small separator of a graph. These methods are important components of many scientific numerical algorithms and have been demonstrated by experiment to work extremely well. In this paper, we show that spectral partitioning methods work well on bounded-degree planar graphs and finite element meshes, the classes of graphs to which they are usually applied. While naive spectral bisection does not necessarily work, we prove that spectral partitioning techniques can be used to produce separators whose ratio of vertices removed to edges cut is O(√n) for bounded-degree planar graphs and two-dimensional meshes and O(n^(1/d)) for well-shaped d-dimensional meshes. The heart of our analysis is an upper bound on the second-smallest eigenvalues of the Laplacian matrices of these graphs: we prove a bound of O(1/n) for bounded-degree planar graphs and O(1/n^(2/d)) for well-shaped d-dimensional meshes.
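As a sanity check on the O(1/n) eigenvalue bound, here is a standard worked example (ours, not taken from the paper). The Laplacian of the path P_m has eigenvalues 2 - 2cos(πk/m) for k = 0, 1, ..., m-1, and the eigenvalues of the grid graph P_m □ P_m (the Cartesian product, a bounded-degree planar graph with n = m^2 vertices) are sums of two path eigenvalues. Using 1 - cos x ≤ x^2/2:

$$
\lambda_2(P_m) = 2\left(1 - \cos\frac{\pi}{m}\right) \le \frac{\pi^2}{m^2},
\qquad
\lambda_2(P_m \,\square\, P_m) = \lambda_2(P_m) + 0 \le \frac{\pi^2}{m^2} = \frac{\pi^2}{n}.
$$

Since 2(1 - cos(π/m)) behaves like π^2/m^2 for large m, the grid's second-smallest eigenvalue is Θ(1/n), so the O(1/n) rate for bounded-degree planar graphs is attained by this family up to constants.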
Clustering categorical data: An approach based on dynamical systems
1998
Cited by 180 (1 self)
Abstract
We describe a novel approach for clustering collections of sets, and its application to the analysis and mining of categorical data. By “categorical data,” we mean tables with fields that cannot be naturally ordered by a metric, e.g., the names of producers of automobiles, or the names of products offered by a manufacturer. Our approach is based on an iterative method for assigning and propagating weights on the categorical values in a table; this facilitates a type of similarity measure arising from the co-occurrence of values in the dataset. Our techniques can be studied analytically in terms of certain types of nonlinear dynamical systems. We discuss experiments on a variety of tables of synthetic and real data; we find that our iterative methods converge quickly to prominently correlated values of various categorical fields.
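The weight-propagation idea can be sketched as a simplified iteration in the spirit of the abstract. Each (field, value) pair carries a weight; on every pass, a value's new weight is the sum, over the rows containing it, of the current weights of the other values in those rows, and weights are then normalized within each field. The plain-sum combiner, the per-field normalization, and the name `propagate_weights` are our assumptions. With a sum combiner this update is linear, so the sketch does not capture the nonlinear dynamics the paper analyzes; it only shows how co-occurrence drives the weights.

```python
import math
from collections import defaultdict

def propagate_weights(rows, iterations=20):
    """Iteratively reweight categorical values by co-occurrence.

    rows: list of equal-length tuples of categorical values.
    Returns {(field_index, value): weight}, normalized to unit length per field.
    """
    n_fields = len(rows[0])
    weights = {(f, row[f]): 1.0 for row in rows for f in range(n_fields)}
    for _ in range(iterations):
        new = defaultdict(float)
        for row in rows:
            for f in range(n_fields):
                # Sum combiner (assumption): add the current weights of the
                # other values that appear in the same row.
                new[(f, row[f])] += sum(weights[(g, row[g])]
                                        for g in range(n_fields) if g != f)
        # Normalize each field separately so the weights stay bounded.
        for f in range(n_fields):
            keys = [k for k in new if k[0] == f]
            norm = math.sqrt(sum(new[k] ** 2 for k in keys)) or 1.0
            for k in keys:
                new[k] /= norm
        weights = dict(new)
    return weights
```

For the rows ("a", "x"), ("a", "x"), ("a", "y"), ("b", "z"), the weights concentrate on "a" in the first field and on "x", then "y", in the second, since those values co-occur most often; the isolated pair "b"/"z" fades toward zero.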