Asymptotic normality of the maximum-likelihood estimator for general hidden Markov models
 Ann. Statist
, 1998
SPECTRAL CLUSTERING AND THE HIGH-DIMENSIONAL STOCHASTIC BLOCKMODEL
 SUBMITTED TO THE ANNALS OF STATISTICS
Cited by 98 (7 self)
Networks or graphs can easily represent a diverse set of data sources that are characterized by interacting units or actors. Social networks, representing people who communicate with each other, are one example. Communities or clusters of highly connected actors form an essential feature in the structure of several empirical networks. Spectral clustering is a popular and computationally feasible method to discover these communities. The Stochastic Blockmodel (Holland, Laskey and Leinhardt, 1983) is a social network model with well-defined communities; each node is a member of one community. For a network generated from the Stochastic Blockmodel, we bound the number of nodes “misclustered” by spectral clustering. The asymptotic results in this paper are the first clustering results that allow the number of clusters in the model to grow with the number of nodes, hence the name high-dimensional. In order to study spectral clustering under the Stochastic Blockmodel, we first show that under the more general latent space model, the eigenvectors of the normalized graph Laplacian asymptotically converge to the eigenvectors of a “population” normalized graph Laplacian. Aside from the implication for spectral clustering, this provides insight into a graph visualization technique. Our method of studying the eigenvectors of random matrices is original.
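The setting described in this abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' implementation: it samples a two-block Stochastic Blockmodel, forms the normalized matrix D^{-1/2} A D^{-1/2}, embeds nodes with the k eigenvectors of largest absolute eigenvalue, and runs a basic k-means. The function names, the block probabilities, the farthest-point initialization, and the choice of eigenvectors are all assumptions made for illustration.

```python
import numpy as np

def sample_sbm(labels, B, rng):
    """Sample a symmetric 0/1 adjacency matrix (no self-loops) from a blockmodel."""
    P = B[np.ix_(labels, labels)]                   # edge probability for each node pair
    upper = np.triu(rng.random(P.shape) < P, k=1)   # independent edges above the diagonal
    return (upper | upper.T).astype(float)

def spectral_clustering(A, k, n_iter=50):
    """Cluster nodes using eigenvectors of the normalized matrix D^{-1/2} A D^{-1/2}."""
    deg = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))  # guard against isolated nodes
    L = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L)
    top = np.argsort(-np.abs(vals))[:k]             # k largest eigenvalues in absolute value
    X = vecs[:, top]                                # one k-dimensional point per node

    # Basic k-means (Lloyd's algorithm) with a deterministic farthest-point start.
    centers = [X[0]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dist.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = X[assign == j].mean(axis=0)
    return assign

# Illustrative parameters (assumed): two equal blocks, dense within, sparse between.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 40)
B = np.array([[0.60, 0.05],
              [0.05, 0.60]])
A = sample_sbm(labels, B, rng)
assign = spectral_clustering(A, k=2)

# Count "misclustered" nodes up to a swap of the two cluster names.
errors = int(min(np.sum(assign != labels), np.sum(assign == labels)))
print("misclustered nodes:", errors)
```

With blocks this well separated, the misclustered count should be small; lowering the gap between the within-block and between-block probabilities makes the problem harder.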
Spectral Clustering of Graphs with General Degrees in the Extended Planted Partition Model
Cited by 41 (0 self)
In this paper, we examine a spectral clustering algorithm for similarity graphs drawn from a simple random graph model, where nodes are allowed to have varying degrees, and we provide theoretical bounds on its performance. The random graph model we study is the Extended Planted Partition (EPP) model, a variant of the classical planted partition model. The standard approach to spectral clustering of graphs is to compute the bottom k singular vectors or eigenvectors of a suitable graph Laplacian, project the nodes of the graph onto these vectors, and then use an iterative clustering algorithm on the projected nodes. However, a challenge in applying this approach to graphs generated from the EPP model is that unnormalized Laplacians do not work, and normalized Laplacians do not concentrate well when the graph has a number of low-degree nodes. We resolve this issue by introducing the notion of a degree-corrected graph Laplacian. For graphs with many low-degree nodes, degree correction has a regularizing effect on the Laplacian. Our spectral clustering algorithm projects the nodes in the graph onto the bottom k right singular vectors of the degree-corrected random-walk Laplacian, and clusters the nodes in this subspace. We show guarantees on the performance of this algorithm, demonstrating that it outputs the correct partition under a wide range of parameter values. Unlike some previous work, our algorithm does not require access to any generative parameters of the model.
Matrix estimation by universal singular value thresholding
, 2012
Cited by 25 (0 self)
Consider the problem of estimating the entries of a large matrix, when the observed entries are noisy versions of a small random fraction of the original entries. This problem has received widespread attention in recent times, especially after the pioneering works of Emmanuel Candès and collaborators. This paper introduces a simple estimation procedure, called Universal Singular Value Thresholding (USVT), that works for any matrix that has ‘a little bit of structure’. Surprisingly, this simple estimator achieves the minimax error rate up to a constant factor. The method is applied to solve problems related to low rank matrix estimation, blockmodels, distance matrix completion, latent space models, positive definite matrix completion, graphon estimation, and generalized Bradley–Terry models for pairwise comparison.
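A singular value thresholding estimator of this kind can be sketched in a few lines. The abstract does not give the procedure's details, so the specifics here are assumptions: entries are taken to lie in [-1, 1], missing entries are filled with zero, the threshold has the form (2 + eta) * sqrt(n * p_hat) with n the larger matrix dimension and p_hat the observed fraction, and the result is rescaled by 1 / p_hat and clipped. The function name `usvt` and the test matrix are illustrative only.

```python
import numpy as np

def usvt(X, mask, eta=0.01):
    """Threshold singular values of the zero-filled matrix, rescale, and clip.

    X    : matrix of values, assumed to lie in [-1, 1]
    mask : boolean matrix, True where an entry of X was observed
    eta  : small positive constant in the (assumed) threshold
    """
    m, n = X.shape
    p_hat = mask.mean()                              # estimated fraction of observed entries
    Y = np.where(mask, X, 0.0)                       # unobserved entries set to zero
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    keep = s >= (2.0 + eta) * np.sqrt(max(m, n) * p_hat)
    W = (U[:, keep] * s[keep]) @ Vt[keep, :] / p_hat # low-rank part, rescaled
    return np.clip(W, -1.0, 1.0)

# Illustrative example (assumed): a rank-one matrix with half of its entries observed.
rng = np.random.default_rng(0)
n = 200
u = rng.uniform(0.5, 1.0, size=n)
v = rng.uniform(0.5, 1.0, size=n)
M = np.outer(u, v)                                   # entries in [0.25, 1]
mask = rng.random((n, n)) < 0.5
M_hat = usvt(M, mask)

mse_usvt = float(np.mean((M_hat - M) ** 2))
# Baseline for comparison: zero-filled observations rescaled by 1 / p_hat.
mse_naive = float(np.mean((np.where(mask, M, 0.0) / mask.mean() - M) ** 2))
print("USVT mean squared error: ", mse_usvt)
print("naive mean squared error:", mse_naive)
```

The comparison with the rescaled zero-filled matrix is only a sanity check that thresholding removes most of the sampling noise in this easy low-rank case.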
Consistent adjacency-spectral partitioning for the stochastic block model when the model parameters are unknown
, 2014
Review of statistical network analysis: models, algorithms, and software
 STATISTICAL ANALYSIS AND DATA MINING
, 2012
Universally consistent latent position estimation and vertex classification for random dot product graphs
 IEEE Transactions on Pattern Analysis and Machine Intelligence (Accepted)
, 2013
Cited by 12 (9 self)
In this work we show that, using the eigendecomposition of the adjacency matrix, we can consistently estimate latent positions for random dot product graphs provided the latent positions are i.i.d. from some distribution. If class labels are observed for a number of vertices tending to infinity, then we show that the remaining vertices can be classified with error converging to Bayes optimal using the k-nearest-neighbors classification rule. We evaluate the proposed methods on simulated data and a graph derived from Wikipedia.
Co-clustering for directed graphs; the stochastic co-blockmodel and a spectral algorithm
, 2012
Cited by 12 (1 self)
Communities of highly connected actors form an essential feature in the structure of several empirical directed and undirected networks. However, compared to the amount of research on clustering for undirected graphs, there is relatively little understanding of clustering in directed networks. This paper extends the spectral clustering algorithm to directed networks in a way that co-clusters or bi-clusters the rows and columns of a graph Laplacian. Co-clustering leverages the increased complexity of asymmetric relationships to gain new insight into the structure of the directed network. To understand this algorithm and to study its asymptotic properties in a canonical setting, we propose the Stochastic Co-Blockmodel to encode co-clustering structure. This is the first statistical model of co-clustering and it is derived using the concept of stochastic equivalence that motivated the original Stochastic Blockmodel. Although directed spectral clustering is not derived from the Stochastic Co-Blockmodel, we show that, asymptotically, the algorithm can estimate the blocks in a high-dimensional asymptotic setting in which the number of blocks grows with the number of nodes. The algorithm, model, and asymptotic results can all be extended to bipartite graphs.
Supplement to “Consistency of community detection in networks under degree-corrected stochastic block models.” DOI:10.1214/12-AOS1036SUPP
 Department of Statistics George Mason University 4400 University Drive, MS 4A7
, 2012
Classification and estimation in the stochastic blockmodel based on the empirical degrees
 Electronic Journal of Statistics
, 2012
Cited by 11 (0 self)
The Stochastic Block Model (Holland et al., 1983) is a mixture model for heterogeneous network data. Unlike the usual statistical framework, new nodes give additional information about the previous ones in this model. Thereby, conditionally on the node class, the distribution of the degrees concentrates at points. We show under a mild assumption that classification, estimation and model selection can actually be achieved with no more than the empirical degree data. We provide an algorithm able to process very large networks and consistent estimators based on it. In particular, we prove a bound on the probability of misclassification of at least one node, including when the number of classes grows.
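The idea that degrees alone can separate the classes can be illustrated as follows. The abstract does not describe the algorithm, so this is only one plausible sketch under stated assumptions: normalized degrees are sorted and split at the q - 1 largest gaps between consecutive values. The function name `classify_by_degree_gaps`, the block probabilities, and the requirement that classes have distinct expected degrees are assumptions for illustration.

```python
import numpy as np

def sample_sbm(labels, B, rng):
    """Sample a symmetric 0/1 adjacency matrix (no self-loops) from a blockmodel."""
    P = B[np.ix_(labels, labels)]
    upper = np.triu(rng.random(P.shape) < P, k=1)
    return (upper | upper.T).astype(float)

def classify_by_degree_gaps(A, q):
    """Split nodes into q classes at the q - 1 largest gaps in sorted normalized degrees."""
    n = A.shape[0]
    deg = A.sum(axis=1) / (n - 1)                    # normalized empirical degrees
    order = np.argsort(deg)
    gaps = np.diff(deg[order])                       # gaps between consecutive sorted degrees
    # Sorted positions i such that a class boundary lies between positions i and i + 1.
    cuts = np.sort(np.argsort(gaps)[len(gaps) - (q - 1):])
    labels = np.empty(n, dtype=int)
    # A node at sorted position j gets the number of boundaries strictly below j.
    labels[order] = np.searchsorted(cuts, np.arange(n), side="left")
    return labels

# Illustrative parameters (assumed): the two classes have clearly different expected degrees.
rng = np.random.default_rng(0)
true_labels = np.repeat([0, 1], 150)
B = np.array([[0.8, 0.4],
              [0.4, 0.1]])
A = sample_sbm(true_labels, B, rng)
pred = classify_by_degree_gaps(A, q=2)

# Predicted classes are numbered by increasing degree, so compare up to relabeling.
errors = int(min(np.sum(pred != true_labels), np.sum(pred == true_labels)))
print("misclassified nodes:", errors)
```

This approach uses only the row sums of the adjacency matrix, which is why it scales to large networks, but it cannot separate classes whose expected degrees coincide.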