Consistency of spectral clustering
2004
Cited by 289 (15 self)
Consistency is a key property of statistical algorithms, when the data is drawn from some underlying probability distribution. Surprisingly, despite decades of work, little is known about consistency of most clustering algorithms. In this paper we investigate consistency of a popular family of spectral clustering algorithms, which cluster the data with the help of eigenvectors of graph Laplacian matrices. We show that one of the two major classes of spectral clustering (normalized clustering) converges under some very general conditions, while the other (unnormalized) is only consistent under strong additional assumptions, which, as we demonstrate, are not always satisfied in real data. We conclude that our analysis provides strong evidence for the superiority of normalized spectral clustering in practical applications. We believe that methods used in our analysis will provide a basis for future exploration of Laplacian-based methods in a statistical setting.
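For orientation, the two families of graph Laplacians contrasted in this abstract are commonly written as follows. This is a sketch using standard textbook notation, not notation taken from the abstract: W = (w_ij) is assumed to be a symmetric similarity matrix on n data points and D its diagonal degree matrix; the paper's exact normalization may differ.

```latex
\begin{align*}
D &= \operatorname{diag}(d_1, \dots, d_n), \qquad d_i = \sum_{j=1}^{n} w_{ij} \\
L &= D - W && \text{(unnormalized Laplacian)} \\
L_{\mathrm{sym}} &= D^{-1/2} L \, D^{-1/2} = I - D^{-1/2} W D^{-1/2} && \text{(normalized Laplacian)}
\end{align*}
```

Spectral clustering then partitions the data using the eigenvectors belonging to the smallest eigenvalues of L or of the normalized Laplacian.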
Markov Chain Monte Carlo Estimation of Exponential Random Graph Models
Journal of Social Structure, 2002
Cited by 104 (15 self)
This paper is about estimating the parameters of the exponential random graph model, also known as the p* model, using frequentist Markov chain Monte Carlo (MCMC) methods. The exponential random graph model is simulated using Gibbs or Metropolis-Hastings sampling. The estimation procedures considered are based on the Robbins-Monro algorithm for approximating a solution to the likelihood equation.
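The idea of solving a likelihood equation by Robbins-Monro stochastic approximation can be sketched on the simplest possible case. The example below is an illustration under stated assumptions, not the paper's procedure: it uses an edge-only exponential random graph model on an undirected graph, where the likelihood equation says that the expected edge count under the parameter theta equals the observed edge count. The function names, the 1/k gain sequence, and the default values are hypothetical choices made for this sketch.

```python
import math
import random


def simulate_edge_count(theta, n_dyads, rng):
    """Draw a graph from the edge-only model and return its edge count.

    For this independent-dyad model, one Gibbs sweep resamples every dyad
    with probability logistic(theta), which is already an exact draw.
    """
    p = 1.0 / (1.0 + math.exp(-theta))
    return sum(1 for _ in range(n_dyads) if rng.random() < p)


def robbins_monro_edge_parameter(observed_edges, n_nodes,
                                 iterations=5000, gain=0.1, seed=0):
    """Approximate the root of E_theta[edges] = observed_edges.

    Assumption for this sketch: a plain decreasing gain sequence gain / k.
    """
    rng = random.Random(seed)
    n_dyads = n_nodes * (n_nodes - 1) // 2
    theta = 0.0
    for k in range(1, iterations + 1):
        simulated = simulate_edge_count(theta, n_dyads, rng)
        # Move theta against the gap between simulated and observed statistics.
        theta -= (gain / k) * (simulated - observed_edges)
    return theta


if __name__ == "__main__":
    # 15 observed edges among 45 dyads: the exact solution is log(0.5).
    print(robbins_monro_edge_parameter(observed_edges=15, n_nodes=10))
```

In this toy model the solution is available in closed form (the logit of the observed density), which makes it easy to check the iteration. For models with dependent dyads the simulation step would need a proper Gibbs or Metropolis-Hastings run instead of a single sweep.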
Finite Markov Chains and Algorithmic Applications
In London Mathematical Society Student Texts, 2001
The Principal Components Analysis of a Graph, and its Relationships to Spectral Clustering
Proceedings of the 15th European Conference on Machine Learning (ECML 2004). Lecture Notes in Artificial Intelligence, 2004
Cited by 68 (16 self)
This work presents a novel procedure for computing (1) distances between nodes of a weighted, undirected graph, called the Euclidean Commute Time Distance (ECTD), and (2) a subspace projection of the nodes of the graph that preserves as much variance as possible in terms of the ECTD, that is, a principal components analysis of the graph. It is based on a Markov-chain model of random walk through the graph. The model assigns transition probabilities to the links between nodes, so that a random walker can jump from node to node. A quantity called the average commute time gives the average time taken by a random walker to reach node j when starting from node i and then come back to node i. The square root of this quantity, the ECTD, is a distance measure between any two nodes, and has the nice property of decreasing when the number of paths connecting two nodes increases and when the "length" of any path decreases. The ECTD can be computed from the pseudoinverse of the Laplacian matrix of the graph, which is a kernel. We finally define the Principal Components Analysis (PCA) of a graph as the subspace projection that preserves as much variance as possible, in terms of the ECTD. This graph PCA has some interesting links with spectral graph theory, in particular spectral clustering.
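The computation of the ECTD from the Laplacian pseudoinverse can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation. It assumes a connected graph given as a symmetric weight matrix, and it uses the standard identity that the average commute time between i and j equals the graph volume (sum of all node degrees) times (l+_ii + l+_jj - 2 l+_ij), where l+ denotes entries of the pseudoinverse. The pseudoinverse is obtained through the identity L+ = (L + J/n)^-1 - J/n, with J the all-ones matrix, which holds for connected graphs. The function names are hypothetical.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix; the graph may be disconnected")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ectd_matrix(weights):
    """Return the matrix of Euclidean Commute Time Distances.

    Assumes `weights` is a symmetric, non-negative weight matrix of a
    connected graph.
    """
    n = len(weights)
    degrees = [sum(row) for row in weights]
    volume = sum(degrees)
    # Laplacian L = D - W, shifted by J/n so that it becomes invertible.
    shifted = [[(degrees[i] if i == j else 0.0) - weights[i][j] + 1.0 / n
                for j in range(n)] for i in range(n)]
    inverse = invert(shifted)
    lplus = [[inverse[i][j] - 1.0 / n for j in range(n)] for i in range(n)]

    def commute_time(i, j):
        return volume * (lplus[i][i] + lplus[j][j] - 2.0 * lplus[i][j])

    # max(..., 0.0) guards against tiny negative values from rounding.
    return [[math.sqrt(max(commute_time(i, j), 0.0)) for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    path_graph = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
    for row in ectd_matrix(path_graph):
        print(row)
```

On the three-node path graph used in the usage example, a random walker needs on average 8 steps to go from one end to the other and back, so the ECTD between the end nodes is the square root of 8.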
What is this Page Known for? Computing Web Page Reputations
In Proceedings of the Ninth International World Wide Web Conference, 2000
Cited by 61 (3 self)
The textual content of the Web enriched with the hyperlink structure surrounding it can be a useful source of information for querying and searching. This paper presents a search process where the input is the URL of a page, and the output is a ranked set of topics on which the page has a reputation. For example, if the input is www.gamelan.com, then a possible output is "Java." We propose several algorithmic formulations of the notion of reputation using simple random walk models of Web browsing behaviour. We give preliminary test results on the effectiveness of these algorithms. Keywords: Reputation Ranking, Searching, Random Walks, PageRank, Hubs and Authorities. 1 Introduction The idea of exploiting the "reputation" of a Web page when searching has attracted research attention recently and even been incorporated into some search engines [15, 5, 11, 2, 3]. The idea is that pages with good reputations should be given preferential treatment when reporting the results of a se...
Quantized consensus
2007
Cited by 57 (0 self)
We study the distributed averaging problem on arbitrary connected graphs, with the additional constraint that the value at each node is an integer. This discretized distributed averaging problem models several problems of interest, such as averaging in a network with finite capacity channels and load balancing in a processor network. We describe simple randomized distributed algorithms which achieve consensus to the extent that the discrete nature of the problem permits. We give bounds on the convergence time of these algorithms for fully connected networks and linear networks.
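A randomized algorithm of the kind this abstract describes can be sketched as follows. This is one simple update rule consistent with the abstract, and not necessarily the exact rule analyzed in the paper: at each step a random edge is chosen and its two endpoints replace their integer values by the floor and ceiling of their average. The sum is preserved, and when the two values differ by exactly one they are swapped so that the leftover unit keeps moving through the network. The function name and parameters are hypothetical.

```python
import random


def quantized_gossip(values, edges, steps, seed=0):
    """Run randomized pairwise integer averaging on a connected graph.

    values: initial integer value at each node
    edges:  list of (i, j) node pairs
    """
    rng = random.Random(seed)
    x = list(values)
    for _ in range(steps):
        i, j = rng.choice(edges)
        total = x[i] + x[j]
        low, high = total // 2, total - total // 2
        # The node that held the smaller value receives the larger share,
        # which amounts to a swap when the two values differ by one.
        if x[i] <= x[j]:
            x[i], x[j] = high, low
        else:
            x[i], x[j] = low, high
    return x


if __name__ == "__main__":
    # Linear network of five nodes; the values sum to 26.
    line_edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
    print(quantized_gossip([0, 10, 3, 7, 6], line_edges, steps=10000))
```

Because 26 is not divisible by 5, exact agreement is impossible. The best the nodes can reach is a state in which all values differ by at most one, which is the sense of consensus "to the extent that the discrete nature of the problem permits".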
Extracting macroscopic dynamics: model problems and algorithms
Nonlinearity, 2004
Cited by 51 (8 self)
In many applications, the primary objective of numerical simulation of time-evolving systems is the prediction of macroscopic, or coarse-grained, quantities. A representative example is the prediction of biomolecular conformations from molecular dynamics. In recent years a number of new algorithmic approaches have been introduced to extract effective, lower-dimensional, models for the macroscopic dynamics; the starting point is the full, detailed, evolution equations. In many cases the effective low-dimensional dynamics may be stochastic, even when the original starting point is deterministic. This review surveys a number of these new approaches to the problem of extracting effective dynamics, highlighting similarities and differences between them. The importance of model problems for the evaluation of these new approaches is stressed, and a number of model problems are described. When the macroscopic dynamics is stochastic, these model problems are either obtained through a clear separation of timescales, leading to a stochastic effect of the fast dynamics on the slow dynamics, or by considering high-dimensional ordinary differential equations which, when projected onto a low-dimensional subspace, exhibit stochastic behaviour through the presence of a broad frequency spectrum. Models whose stochastic microscopic behaviour leads to deterministic macroscopic dynamics are also introduced. The algorithms we overview include SVD-based methods for nonlinear problems, model reduction for linear control systems, optimal prediction techniques, asymptotics-based mode elimination, coarse time-stepping methods and transfer-operator based methodologies.
Models for Longitudinal Network Data
Models and Methods in Social Network Analysis, 2005
Cited by 34 (6 self)
This chapter treats statistical methods for network evolution. It is argued that it is most fruitful to consider models where network evolution is represented as the result of many (usually non-observed) small changes occurring between the consecutively observed networks. Accordingly, the focus is on models where a continuous-time network evolution is assumed although the observations are made at discrete time points (two or more). Three models are considered in detail, all based on the assumption that the observed networks are outcomes of a Markov process evolving in continuous time. The independent arcs model is a trivial baseline model. The reciprocity model expresses effects of reciprocity, but lacks other structural effects. The actor-oriented model is based on a model of actors changing their outgoing ties as a consequence of myopic stochastic optimization of an objective function. This framework offers the flexibility to represent a variety of network effects. An estimation algorithm is treated, based on a Markov chain Monte Carlo implementation of the method of moments.