Results 1–10 of 523
On Spectral Clustering: Analysis and an algorithm
 ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS
, 2001
Abstract

Cited by 1102 (14 self)
Despite many empirical successes of spectral clustering methods (algorithms that cluster points using eigenvectors of matrices derived from the distances between the points), there are several unresolved issues. First, there is a wide variety of algorithms that use the eigenvectors in slightly different ways. Second, many of these algorithms have no proof that they will actually compute a reasonable clustering. In this paper, we present a simple spectral clustering algorithm that can be implemented using a few lines of Matlab. Using tools from matrix perturbation theory, we analyze the algorithm and give conditions under which it can be expected to do well. We also show surprisingly good experimental results on a number of challenging clustering problems.
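The recipe this abstract refers to (normalized affinity matrix, top-k eigenvectors, k-means on the embedded rows) can be sketched in a few lines of NumPy; this is an illustrative sketch in the Ng–Jordan–Weiss style, not the paper's exact Matlab code, and the Gaussian bandwidth `sigma` and the farthest-first k-means seeding are assumptions made here for a self-contained example.

```python
import numpy as np

def spectral_clustering(X, k, sigma=1.0, n_iter=50):
    # Gaussian affinity matrix from pairwise squared distances, zero diagonal.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    A = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    # Symmetrically normalized affinity D^{-1/2} A D^{-1/2}.
    dinv = 1.0 / np.sqrt(np.maximum(A.sum(1), 1e-12))
    L = A * dinv[:, None] * dinv[None, :]
    # Embed each point as a row of the top-k eigenvectors, renormalized.
    _, V = np.linalg.eigh(L)
    Y = V[:, -k:]
    Y = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    # k-means on the rows, with deterministic farthest-first seeding.
    idx = [0]
    for _ in range(1, k):
        dist = ((Y[:, None, :] - Y[idx][None, :, :]) ** 2).sum(-1).min(1)
        idx.append(int(dist.argmax()))
    C = Y[idx]
    for _ in range(n_iter):
        labels = ((Y[:, None, :] - C[None, :, :]) ** 2).sum(-1).argmin(1)
        C = np.array([Y[labels == j].mean(0) if (labels == j).any() else C[j]
                      for j in range(k)])
    return labels
```

On two well-separated blobs the embedded rows collapse to nearly orthogonal points, so even plain k-means recovers the clusters.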
Consistency of spectral clustering
, 2004
Abstract

Cited by 286 (15 self)
Consistency is a key property of statistical algorithms when the data is drawn from some underlying probability distribution. Surprisingly, despite decades of work, little is known about the consistency of most clustering algorithms. In this paper we investigate the consistency of a popular family of spectral clustering algorithms, which cluster the data with the help of eigenvectors of graph Laplacian matrices. We show that one of the two major classes of spectral clustering (normalized clustering) converges under some very general conditions, while the other (unnormalized clustering) is only consistent under strong additional assumptions which, as we demonstrate, are not always satisfied in real data. We conclude that our analysis provides strong evidence for the superiority of normalized spectral clustering in practical applications. We believe that the methods used in our analysis will provide a basis for future exploration of Laplacian-based methods in a statistical setting.
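The two Laplacian variants contrasted here can be written down directly; a minimal sketch, assuming a symmetric nonnegative weight matrix `W` with no isolated vertices (the symmetric normalization shown is one common variant):

```python
import numpy as np

def laplacians(W):
    d = W.sum(axis=1)
    D = np.diag(d)
    L_unnorm = D - W                              # unnormalized graph Laplacian
    Dm12 = np.diag(1.0 / np.sqrt(d))
    L_sym = np.eye(len(W)) - Dm12 @ W @ Dm12      # normalized (symmetric) Laplacian
    return L_unnorm, L_sym
```

Both matrices are positive semidefinite; the normalized one has its spectrum in [0, 2], which is part of why its eigenvectors behave more stably as the sample grows.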
Single View Metrology
, 1999
Abstract

Cited by 164 (3 self)
We describe how 3D affine measurements may be computed from a single perspective view of a scene given only minimal geometric information determined from the image. This minimal information is typically the vanishing line of a reference plane and a vanishing point for a direction not parallel to the plane. It is shown that affine scene structure may then be determined from the image without knowledge of either the camera's internal calibration (e.g. focal length) or the explicit relation between camera and world (pose). In particular, we show how to (i) compute the distance between planes parallel to the reference plane (up to a common scale factor); (ii) compute area and length ratios on any plane parallel to the reference plane; (iii) determine the camera's (viewer's) location. Simple geometric derivations are given for these results. We also develop an algebraic representation which unifies the three types of measurement and, amongst other advantages, permits a first order error pr...
Fast Monte Carlo Algorithms for Matrices II: Computing a Low-Rank Approximation to a Matrix
 SIAM Journal on Computing
, 2004
Abstract

Cited by 142 (17 self)
matrix A. It is often of interest to find a low-rank approximation to A, i.e., an approximation D to the matrix A of rank not greater than a specified rank k, where k is much smaller than m and n. Methods such as the Singular Value Decomposition (SVD) may be used to find an approximation to A which is the best in a well-defined sense. These methods require memory and time which are superlinear in m and n; for many applications in which the data sets are very large this is prohibitive. Two simple and intuitive algorithms are presented which, when given an m × n matrix A, compute a description of a low-rank approximation D* to A, and which are qualitatively faster than the SVD. Both algorithms have provable bounds for the error matrix A − D*. For any matrix X, let ‖X‖_F and ‖X‖_2 denote its Frobenius norm and its spectral norm, respectively. In the first algorithm, c = O(1) columns of A are randomly chosen. If the m × c matrix C consists of those c columns of A (after appropriate rescaling), then it is shown that from CᵀC approximations to the top singular values and corresponding singular vectors may be computed. From the computed singular vectors a description D* of the matrix A may be computed such that rank(D*) ≤ k and such that ‖A − D*‖²_ξ ≤ min_{D: rank(D) ≤ k} ‖A − D‖²_ξ + poly(k, 1/c) ‖A‖²_F holds with high probability for both ξ = 2, F. This algorithm may be implemented without storing the matrix A in Random Access Memory (RAM), provided it can make two passes over the matrix stored in external memory and use O(m + n) additional RAM. The second algorithm is similar, except that it further approximates the matrix C by randomly sampling r = O(1) rows of C to form an r × c matrix W. Thus it has additional error, but it can be implemented in three passes over the matrix using only constant ...
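The column-sampling step of the first algorithm can be sketched as follows. This is a simplified illustration, assuming sampling probabilities proportional to squared column norms and projecting A onto the top-k left singular vectors of the sampled-and-rescaled matrix; it is not the paper's exact procedure or error analysis.

```python
import numpy as np

def low_rank_approx(A, k, c, seed=0):
    rng = np.random.default_rng(seed)
    # Sample c columns with probability proportional to squared column norms.
    p = (A ** 2).sum(axis=0)
    p = p / p.sum()
    idx = rng.choice(A.shape[1], size=c, replace=True, p=p)
    # Rescale so that C C^T is an unbiased estimator of A A^T.
    C = A[:, idx] / np.sqrt(c * p[idx])
    # Top-k left singular vectors of the small sampled matrix.
    U, _, _ = np.linalg.svd(C, full_matrices=False)
    Uk = U[:, :k]
    # Project A onto their span: a rank-<=k approximation.
    return Uk @ (Uk.T @ A)
```

When A is exactly rank k and the sampled columns span its column space, the projection recovers A up to floating-point error; in general the sampling only needs to capture that space approximately.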
On the Nyström Method for Approximating a Gram Matrix for Improved Kernel-Based Learning
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2005
Abstract

Cited by 108 (7 self)
A problem for many kernel-based methods is that the amount of computation required to find the solution scales as O(n³), where n is the number of training examples. We develop and analyze an algorithm to compute an easily interpretable low-rank approximation to an n × n Gram matrix G such that computations of interest may be performed more rapidly. The approximation is of the form G̃_k = C W_k⁺ Cᵀ, where C is a matrix consisting of a small number c of columns of G and W_k is the best rank-k approximation to W, the matrix formed by the intersection between those c columns of G and the corresponding c rows of G. An important aspect of the algorithm is the probability distribution used to randomly sample the columns; we will use a judiciously chosen and data-dependent nonuniform probability distribution. Let ‖·‖_2 and ‖·‖_F denote the spectral norm and the Frobenius norm, respectively, of a matrix, and let G_k be the best rank-k approximation to G. We prove that by choosing O(k/ε⁴) columns, ‖G − G̃_k‖_ξ ≤ ‖G − G_k‖_ξ + ε Σᵢ G²ᵢᵢ holds both in expectation and with high probability, for both ξ = 2, F, and for all k: 0 ≤ k ≤ rank(W). This approximation can be computed using O(n) additional space and time, after making two passes over the data from external storage. The relationships between this algorithm, other related matrix decompositions, and the Nyström method from integral equation theory are discussed.
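A minimal sketch of the Nyström construction G̃_k = C W_k⁺ Cᵀ described above; uniform column sampling is used here for brevity, whereas the algorithm in the paper uses a judiciously chosen data-dependent distribution.

```python
import numpy as np

def nystrom(G, c, k, seed=0):
    rng = np.random.default_rng(seed)
    # Uniformly sample c columns (a simplification of the paper's sampling).
    idx = rng.choice(G.shape[0], size=c, replace=False)
    C = G[:, idx]                    # the sampled columns of G
    W = G[np.ix_(idx, idx)]          # their intersection with the matching rows
    # Pseudoinverse of the best rank-k approximation to W (W is symmetric PSD
    # when G is a Gram matrix), via its eigendecomposition.
    w, V = np.linalg.eigh(W)
    order = np.argsort(w)[::-1][:k]
    w_k, V_k = w[order], V[:, order]
    keep = w_k > 1e-10 * w_k.max()   # drop numerically zero eigenvalues
    Wk_pinv = V_k[:, keep] @ np.diag(1.0 / w_k[keep]) @ V_k[:, keep].T
    return C @ Wk_pinv @ C.T
```

When rank(W) equals rank(G) the reconstruction is exact, which is the easy sanity check below; the interesting regime is c ≪ n with only approximate rank capture.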
Three-Dimensional Face Recognition
, 2005
Abstract

Cited by 103 (22 self)
An expression-invariant 3D face recognition approach is presented. Our basic assumption is that facial expressions can be modelled as isometries of the facial surface. This allows us to construct expression-invariant representations of faces using the bending-invariant canonical forms approach. The result is an efficient and accurate face recognition algorithm, robust to facial expressions, that can distinguish between identical twins (the first two authors). We demonstrate a prototype system based on the proposed algorithm and compare its performance to classical face recognition methods. The numerical methods employed by our approach do not require the facial surface explicitly. The surface gradient field, or the surface metric, is sufficient for constructing the expression-invariant representation of any given face. This allows us to perform the 3D face recognition task while avoiding the surface reconstruction stage.
Subspace Linear Discriminant Analysis for Face Recognition
, 1999
Abstract

Cited by 98 (8 self)
In this paper we describe a holistic face recognition method based on subspace Linear Discriminant Analysis (LDA). The method consists of two steps: first we project the face image from the original vector space to a face subspace via Principal Component Analysis, where the subspace dimension is carefully chosen, and then use LDA to obtain a linear classifier in the subspace. The criterion we use to choose the subspace dimension enables us to generate class-separable features via LDA. In addition, we employ a weighted distance metric guided by the LDA eigenvalues to improve the performance of the subspace LDA method. Finally, the improved performance of the subspace LDA approach is demonstrated through experiments using the FERET dataset for face recognition/verification, a large mugshot dataset for person verification, and the MPEG-7 dataset.
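The two-step pipeline (PCA projection, then Fisher LDA in the subspace) can be sketched as follows. The dimensions, the scatter-matrix formulation, and the plain generalized-eigenproblem solve are illustrative assumptions; the paper's specific criterion for choosing the subspace dimension and its weighted distance metric are not reproduced here.

```python
import numpy as np

def subspace_lda(X, y, n_pca):
    # Step 1: PCA on mean-centered data, keeping n_pca components.
    mu = X.mean(0)
    Xc = X - mu
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_pca].T                       # d x n_pca projection matrix
    Z = Xc @ P
    # Step 2: Fisher LDA in the PCA subspace.
    classes = np.unique(y)
    Sw = np.zeros((n_pca, n_pca))          # within-class scatter
    Sb = np.zeros((n_pca, n_pca))          # between-class scatter
    m = Z.mean(0)
    for c in classes:
        Zc = Z[y == c]
        mc = Zc.mean(0)
        Sw += (Zc - mc).T @ (Zc - mc)
        Sb += len(Zc) * np.outer(mc - m, mc - m)
    # Directions maximizing between- over within-class scatter:
    # solve the generalized eigenproblem Sb v = lambda Sw v.
    w, V = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(w.real)[::-1][:len(classes) - 1]
    return mu, P, V[:, order].real
```

Doing PCA first keeps the within-class scatter matrix well-conditioned, which is the usual motivation for the subspace step when there are far more pixels than training images.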
On the Early History of the Singular Value Decomposition
, 1992
Abstract

Cited by 82 (1 self)
This paper surveys the contributions of the five mathematicians who were responsible for establishing the existence of the singular value decomposition and developing its theory: Eugenio Beltrami (1835–1899), Camille Jordan (1838–1921), James Joseph Sylvester (1814–1897), Erhard Schmidt (1876–1959), and Hermann Weyl (1885–1955).
A Chernoff Bound For Random Walks On Expander Graphs
 SIAM J. Comput
, 1998
Abstract

Cited by 80 (0 self)
We consider a finite random walk on a weighted graph G; we show that the fraction of time spent in a set of vertices A converges to the stationary probability π(A) with error probability exponentially small in the length of the random walk and the square of the size of the deviation from π(A). The exponential bound is in terms of the expansion of G and improves previous results of [D. Aldous, Probab. Engrg. Inform. Sci., 1 (1987), pp. 33–46], [L. Lovasz and M. Simonovits, Random Structures & Algorithms, 4 (1993), pp. 359–412], [M. Ajtai, J. Komlos, and E. Szemeredi, Deterministic simulation of logspace, in Proc. 19th ACM Symp. on Theory of Computing, 1987]. We show that taking the sample average from one trajectory gives a more efficien...
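The quantity the bound controls is easy to simulate: run a walk on a small weighted graph and compare the fraction of time spent in a vertex set A with the stationary probability π(A) = Σ_{i∈A} d(i) / Σ_j d(j), where d(i) is the weighted degree. The graph and walk length below are illustrative.

```python
import numpy as np

def time_in_set(W, A, steps, seed=0):
    rng = np.random.default_rng(seed)
    P = W / W.sum(1)[:, None]      # transition matrix of the weighted walk
    v, hits = 0, 0
    for _ in range(steps):
        v = rng.choice(len(W), p=P[v])
        hits += v in A
    return hits / steps

# A small weighted graph (symmetric weight matrix, illustrative).
W = np.array([[0, 2, 1, 0],
              [2, 0, 1, 1],
              [1, 1, 0, 2],
              [0, 1, 2, 0]], float)
A = {0, 1}
pi_A = W.sum(1)[[0, 1]].sum() / W.sum()   # stationary mass of A = 7/14
freq = time_in_set(W, A, steps=20000)
```

The theorem says the probability of a deviation of a given size decays exponentially in the walk length, so for a walk of this length the empirical frequency should sit close to π(A).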
Spectral Analysis of Internet Topologies
, 2003
Abstract

Cited by 74 (6 self)
We perform spectral analysis of the Internet topology at the AS level, by adapting the standard spectral filtering method of examining the eigenvectors corresponding to the largest eigenvalues of matrices related to the adjacency matrix of the topology. We observe that the method suggests clusters of ASes with natural semantic proximity, such as geography or business interests. We examine how these clustering properties vary in the core and in the edge of the network, as well as across geographic areas, over time, and between real and synthetic data. We observe that these clustering properties may be suggestive of traffic patterns and thus have direct impact on the link stress of the network. Finally, we use the weights of the eigenvector corresponding to the first eigenvalue to obtain an alternative hierarchical ranking of the ASes.
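The eigenvector-based ranking mentioned at the end of the abstract can be sketched in a few lines: rank the nodes by their weights in the eigenvector of the largest eigenvalue of the adjacency matrix. The toy star graph here is an illustrative stand-in, not AS-level data.

```python
import numpy as np

def spectral_rank(Adj):
    # Eigenvector of the largest adjacency eigenvalue (Perron vector for a
    # connected nonnegative symmetric matrix), sign-fixed to be nonnegative.
    _, V = np.linalg.eigh(Adj)
    v1 = V[:, -1]
    v1 = v1 if v1.sum() >= 0 else -v1
    return np.argsort(-v1)                 # nodes, largest weight first

# A star graph: node 0 connected to all others, so it should rank first.
Adj = np.zeros((5, 5))
Adj[0, 1:] = Adj[1:, 0] = 1.0
ranking = spectral_rank(Adj)
```

For the star, the leading eigenvector puts its largest weight on the hub, so the hub heads the ranking; on real topologies the same weights induce the alternative AS hierarchy the abstract describes.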