Detection of an Anomalous Cluster in a Network
, 2010
Cited by 14 (2 self)
We consider the problem of detecting whether or not, in a given sensor network, there is a cluster of sensors which exhibit an “unusual behavior.” Formally, suppose we are given a set of nodes and attach a random variable to each node. We observe a realization of this process and want to decide between the following two hypotheses: under the null, the variables are i.i.d. standard normal; under the alternative, there is a cluster of variables that are i.i.d. normal with positive mean and unit variance, while the rest are i.i.d. standard normal. We also address surveillance settings where each sensor in the network collects information over time. The resulting model is similar, now with a time series attached to each node. We again observe the process over time and want to decide between the null, where all the variables are i.i.d. standard normal, and the alternative, where there is an emerging cluster of i.i.d. normal variables with positive mean and unit variance. The growth models used to represent the emerging cluster are quite general, and in particular include cellular automata used in modelling epidemics. In both settings, we consider classes of clusters that are quite general, for which we obtain a lower bound on their respective minimax detection rate, and show that some form of scan statistic, by far the most popular method in practice, achieves that same rate within a logarithmic factor. Our results are not limited to the normal location model, but generalize to any one-parameter exponential family when the anomalous clusters are large enough.
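As an illustration of the scan statistic mentioned in this abstract, the following is a minimal sketch under strong simplifying assumptions: the nodes are arranged on a line, and the class of candidate clusters is the set of contiguous intervals. The abstract itself considers much more general cluster classes; the function name `scan_statistic` and the toy data are hypothetical and not taken from the paper.

```python
import math

def scan_statistic(x, min_len=1):
    """Return the largest standardized interval sum and the interval achieving it.

    Assumption (not from the paper): clusters are contiguous intervals
    [i, j) of a line of nodes. Under the null of i.i.d. standard normal
    variables, each standardized sum is itself standard normal.
    """
    n = len(x)
    # Prefix sums let us compute any interval sum in O(1).
    prefix = [0.0]
    for v in x:
        prefix.append(prefix[-1] + v)
    best, best_interval = -math.inf, None
    for i in range(n):
        for j in range(i + min_len, n + 1):
            z = (prefix[j] - prefix[i]) / math.sqrt(j - i)
            if z > best:
                best, best_interval = z, (i, j)
    return best, best_interval

# Toy, noise-free data with an elevated-mean cluster on nodes 2..5.
x = [0.0, 0.0, 2.0, 2.0, 2.0, 2.0, 0.0, 0.0]
stat, interval = scan_statistic(x)
print(stat, interval)
```

In practice the statistic would be compared with a threshold calibrated under the null (for example by simulation); the abstract does not specify one, so none is assumed here.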
Minimax Localization of Structural Information in Large Noisy Matrices
Cited by 4 (3 self)
We consider the problem of identifying a sparse set of relevant columns and rows in a large data matrix with highly corrupted entries. This problem of identifying groups from a collection of bipartite variables such as proteins and drugs, biological species and gene sequences, malware and signatures, etc., is commonly referred to as biclustering or co-clustering. Despite its great practical relevance, and although several ad hoc methods are available for biclustering, theoretical analysis of the problem is largely nonexistent. The problem we consider is also closely related to structured multiple hypothesis testing, an area of statistics that has recently witnessed a flurry of activity. We make the following contributions: 1. We prove lower bounds on the minimum signal strength needed for successful recovery of a bicluster as a function of the noise variance, the size of the matrix and the size of the bicluster of interest. 2. We show that a combinatorial procedure based on the scan statistic achieves this optimal limit. 3. We characterize the SNR required by several computationally tractable procedures for biclustering, including elementwise thresholding, column/row average thresholding and a convex relaxation approach to sparse singular vector decomposition.
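Two of the computationally tractable procedures named in this abstract, elementwise thresholding and column/row average thresholding, can be sketched as follows. This is only an illustrative reading of those names: the function names, the thresholds and the toy matrix are assumptions, and the abstract does not say how the thresholds should be chosen.

```python
def elementwise_threshold(a, tau):
    """Return the (row, col) positions whose entry exceeds tau."""
    return [(i, j) for i, row in enumerate(a) for j, v in enumerate(row) if v > tau]

def row_col_average_threshold(a, tau_row, tau_col):
    """Return the rows and columns whose average exceeds the given thresholds."""
    n_rows, n_cols = len(a), len(a[0])
    row_means = [sum(row) / n_cols for row in a]
    col_means = [sum(a[i][j] for i in range(n_rows)) / n_rows for j in range(n_cols)]
    rows = [i for i, m in enumerate(row_means) if m > tau_row]
    cols = [j for j, m in enumerate(col_means) if m > tau_col]
    return rows, cols

# Toy matrix (hypothetical): a bicluster on rows {0, 1} and columns {2, 3}.
a = [
    [0.1, -0.2, 3.0, 2.9],
    [-0.1, 0.2, 3.1, 3.0],
    [0.0, 0.1, -0.1, 0.2],
    [0.2, -0.1, 0.1, 0.0],
]
cells = elementwise_threshold(a, tau=2.0)
rows, cols = row_col_average_threshold(a, tau_row=1.0, tau_col=1.0)
print(cells, rows, cols)
```

Averaging over a row or column pools evidence across many entries, which is why it can succeed at signal levels where thresholding individual entries would not; the abstract quantifies this trade-off in terms of the required SNR.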
Detecting Faint Curved Edges in Noisy Images
Cited by 2 (0 self)
A fundamental question for edge detection is how faint an edge can be and still be detected. In this paper we offer a formalism to study this question and subsequently introduce a hierarchical edge detection algorithm designed to detect faint curved edges in noisy images. In our formalism we view edge detection as a search in a space of feasible curves, and derive expressions to characterize the behavior of the optimal detection threshold as a function of curve length and the combinatorics of the search space. We then present an algorithm that efficiently searches for edges through a very large set of curves by hierarchically constructing difference filters that match the curves traced by the sought edges. We demonstrate the utility of our algorithm in simulations and in applications to challenging real images.
DETECTION OF CORRELATIONS
 SUBMITTED TO THE ANNALS OF STATISTICS
Cited by 1 (0 self)
We consider the hypothesis testing problem of deciding whether an observed high-dimensional vector has independent normal components or, alternatively, if it has a small subset of correlated components. The correlated components may have a certain combinatorial structure known to the statistician. We establish upper and lower bounds for the worst-case (minimax) risk in terms of the size of the correlated subset, the level of correlation, and the structure of the class of possibly correlated sets. We show that some simple tests have near-optimal performance in many cases, while the generalized likelihood ratio test is suboptimal in some important cases.
Clustering Based on Pairwise Distances When the Data is of Mixed Dimensions
, 909
In the context of clustering, we consider a generative model in a Euclidean ambient space with clusters of different shapes, dimensions, sizes and densities. In an asymptotic setting where the number of points becomes large, we obtain theoretical guarantees for a few emblematic methods based on pairwise distances: a simple algorithm based on the extraction of connected components in a neighborhood graph; the spectral clustering method of Ng, Jordan and Weiss; and hierarchical clustering with single linkage. The methods are shown to enjoy some near-optimal properties in terms of separation between clusters and robustness to outliers. The local scaling method of Zelnik-Manor and Perona is shown to lead to a near-optimal choice for the scale in the first two methods. We also provide a lower bound on the spectral gap to consistently choose the correct number of clusters in the spectral method.
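The first method listed in this abstract, extracting connected components of a neighborhood graph, can be sketched roughly as below. The sketch assumes a fixed-radius neighborhood graph (two points are linked when their Euclidean distance is at most `radius`); the abstract does not state the exact graph construction, and the function name, the radius and the sample points are hypothetical.

```python
import math

def neighborhood_graph_clusters(points, radius):
    """Label points by connected component of the fixed-radius neighborhood graph."""
    n = len(points)
    parent = list(range(n))  # union-find forest over point indices

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair of points that are within `radius` of each other.
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) <= radius:
                parent[find(i)] = find(j)

    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]

points = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (5.0, 5.0), (5.5, 5.0)]
labels = neighborhood_graph_clusters(points, radius=0.6)
print(labels)
```

Note that the first and third points are farther apart than the radius yet land in the same cluster because they are chained through the second point; this chaining behavior is shared with single-linkage clustering.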
DRAFT
, 2013
This thesis addresses statistical estimation and testing of signals over a graph when measurements are noisy and high-dimensional. Graph-structured patterns appear in applications as diverse as sensor networks, virology in human networks, congestion in internet routers, and advertising in social networks. We will develop asymptotic guarantees of the performance of statistical estimators and tests, stating conditions for consistency in terms of properties of the graph (e.g. graph spectra). The goal of this thesis is to demonstrate theoretically that by exploiting the graph structure one can achieve statistical consistency in extremely noisy conditions. We begin with the study of a projection estimator called Laplacian eigenmaps, and find that eigenvalue concentration plays a central role in the ability to estimate graph-structured patterns. We continue with the study of the edge lasso, a least squares procedure with total variation penalty, and determine combinatorial conditions under which changepoints (edges across which the underlying signal changes) on the graph are recovered. We will shift focus to testing for anomalous activations in the graph, using relaxations of the scan statistic: the spectral scan statistic and the graph ellipsoid scan statistic. We will also show how one can form a decomposition of the graph from a spanning tree, which will lead to a test for activity in the graph. This will lead to the construction of a spanning tree wavelet basis, which can be used to localize activations on the graph. April 25, 2013
Probability Theory and Related Fields manuscript No. (will be inserted by the editor)
, 2009
Randomized polynuclear growth with a columnar defect. Last modified: 03.02.09. Abstract: We study a variant of polynuclear growth where the level boundaries perform continuous-time, discrete-space random walks, and study how its asymptotic behavior is affected by the presence of a columnar defect on the line. We prove that there is a nontrivial phase transition in the strength of the perturbation, above which the law of large numbers for the height function is modified. Keywords: Polynuclear growth · interacting random walks · zero-temperature Glauber dynamics · polymer pinning. Mathematics Subject Classification (2000): 60K35 · 60K37. 1 Preliminary considerations and statement of the results. Rigorous study of growth processes, and constant attempts to give mathematical rigor to Kardar-Parisi-Zhang theory and its predictions, led to a very rich flow of results. Remarkable, but yet very limited, progress was achieved in the last few years for 1+1 dimensional models by using a broad spectrum of techniques and arguments from the theory of random matrices, first passage percolation and interacting particle systems. An important role in the successful application and interpretation of obtained results was played by the fact that some properties of
Optimal Detection For Sparse Mixtures
, 2012
Detection of sparse signals arises in a wide range of modern scientific studies. The focus so far has been mainly on Gaussian mixture models. In this paper, we consider the detection problem under a general sparse mixture model and obtain an explicit expression for the detection boundary. It is shown that the fundamental limit of detection is governed by the behavior of the log-likelihood ratio evaluated at an appropriate quantile of the null distribution. We also establish the adaptive optimality of the higher criticism procedure across all sparse mixtures satisfying certain mild regularity conditions. In particular, the general results obtained in this paper recover and extend in a unified manner the previously known results on sparse detection far beyond the conventional Gaussian model and other exponential families.
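For readers unfamiliar with the higher criticism procedure referred to in this abstract, here is a minimal sketch of one common p-value form of the statistic: the maximum over the smallest p-values of sqrt(n) (i/n - p_(i)) / sqrt(p_(i) (1 - p_(i))). The abstract does not state which variant the paper analyzes, so the restriction to the smallest `alpha0` fraction of p-values, the function name and the toy inputs are all assumptions.

```python
import math

def higher_criticism(pvalues, alpha0=0.5):
    """Higher criticism statistic computed from a list of p-values.

    Assumption: the maximum is taken over the smallest alpha0 fraction of
    the sorted p-values, and p-values equal to 0 or 1 are skipped to avoid
    division by zero.
    """
    n = len(pvalues)
    p = sorted(pvalues)
    k = max(1, int(alpha0 * n))
    best = -math.inf
    for i in range(1, k + 1):
        pi = p[i - 1]
        if 0.0 < pi < 1.0:
            hc = math.sqrt(n) * (i / n - pi) / math.sqrt(pi * (1.0 - pi))
            best = max(best, hc)
    return best

n = 20
# Evenly spread p-values, as one would roughly expect under the null.
null_p = [(i - 0.5) / n for i in range(1, n + 1)]
# Same p-values, but with three very small ones mimicking a sparse signal.
signal_p = [1e-6, 2e-6, 3e-6] + null_p[3:]
hc_null = higher_criticism(null_p)
hc_signal = higher_criticism(signal_p)
print(hc_null, hc_signal)
```

A handful of unusually small p-values is enough to push the statistic far above its null-like value, which is the intuition behind its use for sparse mixtures.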
Finding and Leveraging Structure in Learning Problems
, 2012
"... for the degree of Doctor of Philosophy. ..."
DRAFT Contents
, 2013
1.1 Sparse high-dimensional learning
1.1.1 Learning generative models for protein fold families
1.1.2 Sparse additive functional and kernel CCA