Results 1–10 of 29
Asymptotic normality of the maximum-likelihood estimator for general hidden Markov models
Ann. Statist., 1998
PSEUDO-LIKELIHOOD METHODS FOR COMMUNITY DETECTION IN LARGE SPARSE NETWORKS
Submitted to the Annals of Statistics, 2013
Cited by 16 (2 self)
Many algorithms have been proposed for fitting network models with communities but most of them do not scale well to large networks, and often fail on sparse networks. Here we propose a new fast pseudolikelihood method for fitting the stochastic block model for networks, as well as a variant that allows for an arbitrary degree distribution by conditioning on degrees. We show that the algorithms perform well under a range of settings, including on very sparse networks, and illustrate on the example of a network of political blogs. We also propose spectral clustering with perturbations, a method of independent interest, which works well on sparse networks where regular spectral clustering fails, and use it to provide an initial value for pseudolikelihood. We prove that pseudolikelihood provides consistent estimates of the communities under a mild condition on the starting value, for the case of a block model with two balanced communities.
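The "spectral clustering with perturbations" initializer lends itself to a compact sketch. The following is a minimal numpy illustration, not the authors' implementation: it lifts every entry of a sparse two-block SBM adjacency matrix by a small constant before taking eigenvectors, then splits communities by the sign of the second eigenvector. The choice of the constant tau and the sign-split rule are simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p_in, p_out = 200, 0.10, 0.02           # sparse SBM, two balanced communities
z = np.repeat([0, 1], n // 2)              # true labels
P = np.where(z[:, None] == z[None, :], p_in, p_out)
A = rng.random((n, n)) < P
A = np.triu(A, 1)
A = (A + A.T).astype(float)                # symmetric 0/1 adjacency matrix

tau = A.sum() / n                          # average degree (one simple choice)
A_pert = A + tau / n                       # perturbation: lift every entry

vals, vecs = np.linalg.eigh(A_pert)
v2 = vecs[:, -2]                           # eigenvector of 2nd-largest eigenvalue
labels = (v2 > 0).astype(int)              # split communities by sign

agree = max((labels == z).mean(), (labels != z).mean())  # up to label swap
```

On this regime regular spectral clustering on the raw sparse adjacency is fragile, while the lifted matrix typically recovers most labels, which is what makes it a usable starting value for pseudo-likelihood.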
Universally consistent vertex classification for latent positions graphs
The Annals of Statistics, 2013
Convex recovery from interferometric measurements
arXiv preprint arXiv:1307.6864, 2013
Cited by 10 (0 self)
This note formulates a deterministic recovery result for vectors x from quadratic measurements of the form (Ax)i(Ax)j for some left-invertible A. Recovery is exact, or stable in the noisy case, when the couples (i, j) are chosen as edges of a well-connected graph. One possible way of obtaining the solution is as a feasible point of a simple semidefinite program. Furthermore, we show how the proportionality constant in the error estimate depends on the spectral gap of a data-weighted graph Laplacian. Such quadratic measurements have found applications in phase retrieval, angular synchronization, and more recently interferometric waveform inversion.
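For intuition: when every pair (i, j) is observed (the complete graph) and there is no noise, the measurements assemble into the rank-one matrix bbᵀ with b = Ax, and b can be read off from the leading eigenpair up to a global sign. The numpy sketch below covers only that fully observed special case; the paper's actual recovery uses a semidefinite program over a sparse, well-connected edge set.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 8, 5
A = rng.standard_normal((m, n))            # generic tall A is left-invertible
x = rng.standard_normal(n)
b = A @ x
M = np.outer(b, b)                         # all products (Ax)_i (Ax)_j

vals, vecs = np.linalg.eigh(M)             # M is exactly rank one: M = b b^T
b_hat = np.sqrt(vals[-1]) * vecs[:, -1]    # leading eigenpair recovers b ...
if b_hat @ b < 0:                          # ... up to a global sign, which the
    b_hat = -b_hat                         #     measurements cannot determine
x_hat, *_ = np.linalg.lstsq(A, b_hat, rcond=None)  # undo the left-invertible A
err = np.linalg.norm(x_hat - x)
```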
Network histograms and universality of blockmodel approximation
arXiv preprint arXiv:1312.5306, 2013
Cited by 10 (0 self)
In this article we introduce the network histogram: a statistical summary of network interactions, to be used as a tool for exploratory data analysis. A network histogram is obtained by fitting a stochastic blockmodel to a single observation of a network dataset. Blocks of edges play the role of histogram bins, and community sizes that of histogram bandwidths or bin sizes. Just as standard histograms allow for varying bandwidths, different blockmodel estimates can all be considered valid representations of an underlying probability model, subject to bandwidth constraints. Here we provide methods for automatic bandwidth selection, by which the network histogram approximates the generating mechanism that gives rise to exchangeable random graphs. This makes the blockmodel a universal network representation for unlabeled graphs. With this insight, we discuss the interpretation of network communities in light of the fact that many different community assignments can all give an equally valid representation of such a network. To demonstrate the fidelity-versus-interpretability tradeoff inherent in considering different numbers and sizes of communities, we analyze two publicly available networks (political weblogs and student friendships) and discuss how to interpret the network histogram when additional information related to node and edge labeling is present.
Key words: community detection, exchangeable random graphs, graphons, nonparametric statistics, statistical network analysis, stochastic blockmodels
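Once a community assignment is fixed, the histogram itself is just the matrix of empirical edge densities between blocks, the analogue of bin heights. A minimal numpy sketch under simplifying assumptions (the assignment z is taken as given rather than estimated, three blocks, and no automatic bandwidth selection):

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 90, 3
z = rng.integers(0, k, size=n)             # hypothetical community assignment
P = np.array([[0.30, 0.05, 0.05],          # ground-truth block probabilities
              [0.05, 0.25, 0.05],
              [0.05, 0.05, 0.20]])
A = rng.random((n, n)) < P[z][:, z]
A = np.triu(A, 1)
A = (A + A.T).astype(float)                # symmetric 0/1 adjacency matrix

# histogram estimate: empirical edge density within each block pair ("bin")
P_hat = np.zeros((k, k))
for a in range(k):
    for b in range(k):
        na, nb = (z == a).sum(), (z == b).sum()
        n_pairs = na * nb - (na if a == b else 0)   # ordered pairs, no self-loops
        P_hat[a, b] = A[np.ix_(z == a, z == b)].sum() / max(n_pairs, 1)
```

Varying k plays the role of varying the bandwidth: larger blocks give a coarser, smoother summary, smaller blocks a finer but noisier one.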
1-bit matrix completion
CoRR
Cited by 5 (1 self)
In this paper we develop a theory of matrix completion for the extreme case of noisy 1-bit observations. Instead of observing a subset of the real-valued entries of a matrix M, we obtain a small number of binary (1-bit) measurements generated according to a probability distribution determined by the real-valued entries of M. The central question we ask is whether or not it is possible to obtain an accurate estimate of M from this data. In general this would seem impossible, but we show that the maximum likelihood estimate under a suitable constraint returns an accurate estimate of M when ‖M‖∞ ≤ α and rank(M) ≤ r. If the log-likelihood is a concave function (e.g., the logistic or probit observation models), then we can obtain this maximum likelihood estimate by optimizing a convex program. In addition, we show that if instead of recovering M we simply wish to obtain an estimate of the distribution generating the 1-bit measurements, then we can eliminate the requirement that ‖M‖∞ ≤ α. For both cases, we provide lower bounds showing that these estimates are near-optimal. We conclude with a suite of experiments that verify the implications of our theorems and illustrate some of the practical applications of 1-bit matrix completion. In particular, we compare our program to standard matrix completion methods on movie rating data in which users submit ratings from 1 to 5. In order to use our program, we quantize this data to a single bit, but we allow the standard matrix completion program to have access to the original ratings (from 1 to 5). Surprisingly, the approach based on binary data performs significantly better.
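Under the logistic observation model, each entry is observed as +1 with probability σ(M_ij). The sketch below is not the paper's convex program: it is a hypothetical factored gradient-ascent stand-in for rank-constrained maximum likelihood, included only to make the observation model and the likelihood concrete. Dimensions, rank, step size, and iteration count are arbitrary illustrative choices.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def loglik(M, Y, mask):
    # sum over observed entries of log sigmoid(Y_ij * M_ij), computed stably
    return -(mask * np.logaddexp(0.0, -Y * M)).sum()

rng = np.random.default_rng(3)
d1, d2, r = 30, 30, 2
M_true = rng.standard_normal((d1, r)) @ rng.standard_normal((r, d2)) / np.sqrt(r)
mask = rng.random((d1, d2)) < 0.8                  # which entries are observed
Y = np.where(rng.random((d1, d2)) < sigmoid(M_true), 1.0, -1.0)  # 1-bit data

U = 0.1 * rng.standard_normal((d1, r))             # rank-r factors, M ~ U V^T
V = 0.1 * rng.standard_normal((d2, r))
ll_start = loglik(U @ V.T, Y, mask)
lr = 0.02
for _ in range(400):
    G = mask * Y * sigmoid(-Y * (U @ V.T))         # gradient of loglik w.r.t. M
    U, V = U + lr * (G @ V), V + lr * (G.T @ U)    # ascent step on the factors
ll_end = loglik(U @ V.T, Y, mask)
```

The ‖M‖∞ ≤ α constraint from the abstract is omitted here; in the paper it is what rules out degenerate solutions and drives the error bounds.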
HIGH-DIMENSIONAL ESTIMATION WITH GEOMETRIC CONSTRAINTS
Cited by 5 (2 self)
Consider measuring a vector x ∈ Rn through inner products with several measurement vectors a1, a2, ..., am. It is common in both signal processing and statistics to assume the linear response model yi = 〈ai, x〉 + εi, where εi is a noise term. However, in practice the precise relationship between the signal x and the observations yi may not follow the linear model, and in some cases it may not even be known. To address this challenge, in this paper we propose a general model in which it is only assumed that each observation yi may depend on ai only through 〈ai, x〉. We do not assume that the dependence is known. This is a form of the semiparametric single-index model, and it includes the linear model as well as many forms of the generalized linear model as special cases. We further assume that the signal x has some structure, and we formulate this as a general assumption that x belongs to some known (but arbitrary) feasible set K ⊆ Rn. We carefully detail the benefit of using the signal structure to improve estimation. The theory is based on the mean width of K, a geometric parameter which can be used to understand its effective dimension in estimation problems. We determine a simple, efficient two-step procedure for estimating the signal based on this model: a linear estimation followed by metric projection onto K. We give general conditions under which the estimator is minimax optimal up to a constant. This leads to the intriguing conclusion that in the high-noise regime, an unknown nonlinearity in the observations does not significantly reduce one's ability to determine the signal, even when the nonlinearity may be noninvertible. Our results may be specialized to understand the effect of nonlinearities in compressed sensing.
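The two-step procedure is easy to sketch when K is the set of s-sparse vectors, for which the metric projection is hard thresholding. The numpy illustration below uses f(t) = sign(t) as the (nominally unknown) nonlinearity, i.e. 1-bit observations; the signal, dimensions, and sparsity level are illustrative assumptions, and since the scale of x is not identifiable under an unknown nonlinearity, only directions are compared.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, s = 500, 2000, 5
x = np.zeros(n)
x[:s] = 1.0 / np.sqrt(s)                   # unknown s-sparse unit-norm signal
A = rng.standard_normal((m, n))            # Gaussian measurement vectors a_i
y = np.sign(A @ x)                         # unknown nonlinearity: f(t) = sign(t)

x_lin = A.T @ y / m                        # step 1: linear estimation
idx = np.argsort(np.abs(x_lin))[-s:]       # step 2: metric projection onto
x_hat = np.zeros(n)                        #   K = {s-sparse vectors}, i.e.
x_hat[idx] = x_lin[idx]                    #   keep the s largest entries
x_hat /= np.linalg.norm(x_hat)             # scale is unidentifiable; compare
corr = abs(x_hat @ x)                      # directions via the inner product
```

Note that the estimator never uses the form of f: the same two lines would be run whether the observations were linear, logistic, or 1-bit, which is the point of the model.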