Detection of an Anomalous Cluster in a Network
, 2010
Cited by 4 (3 self)
We consider the problem of detecting whether or not in a given sensor network, there is a cluster of sensors which exhibit an “unusual behavior.” Formally, suppose we are given a set of nodes and attach a random variable to each node. We observe a realization of this process and want to decide between the following two hypotheses: under the null, the variables are i.i.d. standard normal; under the alternative, there is a cluster of variables that are i.i.d. normal with positive mean and unit variance, while the rest are i.i.d. standard normal. We also address surveillance settings where each sensor in the network collects information over time. The resulting model is similar, now with a time series attached to each node. We again observe the process over time and want to decide between the null, where all the variables are i.i.d. standard normal; and the alternative, where there is an emerging cluster of i.i.d. normal variables with positive mean and unit variance. The growth models used to represent the emerging cluster are quite general, and in particular include cellular automata used in modelling epidemics. In both settings, we consider classes of clusters that are quite general, for which we obtain a lower bound on their respective minimax detection rate, and show that some form of scan statistic, by far the most popular method in practice, achieves that same rate within a logarithmic factor. Our results are not limited to the normal location model, but generalize to any one-parameter exponential family when the anomalous clusters are large enough.
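The scan statistic mentioned in this abstract can be illustrated with a minimal sketch. It assumes the simplest possible setting, which is not taken from the paper: the nodes lie on a line, and the class of candidate clusters is the set of contiguous intervals. The function names `scan_statistic` and `demo`, and all parameter values, are hypothetical choices for illustration only.

```python
import math
import random


def scan_statistic(x, min_len=1, max_len=None):
    """Return (max standardized window sum, (start, end)) over contiguous windows.

    Under the null (i.i.d. standard normal), each standardized sum
    sum(x[start:end]) / sqrt(end - start) is itself standard normal.
    """
    n = len(x)
    if max_len is None:
        max_len = n
    # prefix[i] holds the sum of x[0:i], so any window sum is one subtraction.
    prefix = [0.0]
    for value in x:
        prefix.append(prefix[-1] + value)
    best, best_window = -math.inf, None
    for length in range(min_len, max_len + 1):
        for start in range(0, n - length + 1):
            z = (prefix[start + length] - prefix[start]) / math.sqrt(length)
            if z > best:
                best, best_window = z, (start, start + length)
    return best, best_window


def demo(seed=0, n=200, start=80, length=20, mu=1.5):
    """Scan one null sample and the same sample with a raised-mean interval."""
    rng = random.Random(seed)
    null = [rng.gauss(0.0, 1.0) for _ in range(n)]
    alt = list(null)
    for i in range(start, start + length):
        alt[i] += mu  # anomalous cluster: positive mean, unit variance
    return scan_statistic(null), scan_statistic(alt)
```

A test would reject the null when the returned maximum exceeds a critical value calibrated under the null, for example by simulation. This brute-force loop costs quadratic time in the number of nodes and is meant only to show the shape of the statistic.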
Detection of Correlations (submitted to the Annals of Statistics)
We consider the hypothesis testing problem of deciding whether an observed high-dimensional vector has independent normal components or, alternatively, if it has a small subset of correlated components. The correlated components may have a certain combinatorial structure known to the statistician. We establish upper and lower bounds for the worst-case (minimax) risk in terms of the size of the correlated subset, the level of correlation, and the structure of the class of possibly correlated sets. We show that some simple tests have near-optimal performance in many cases, while the generalized likelihood ratio test is suboptimal in some important cases.
Minimax Localization of Structural Information in Large Noisy Matrices
We consider the problem of identifying a sparse set of relevant columns and rows in a large data matrix with highly corrupted entries. This problem of identifying groups from a collection of bipartite variables, such as proteins and drugs, biological species and gene sequences, or malware and signatures, is commonly referred to as biclustering or co-clustering. Despite its great practical relevance, and although several ad-hoc methods are available for biclustering, theoretical analysis of the problem is largely non-existent. The problem we consider is also closely related to structured multiple hypothesis testing, an area of statistics that has recently witnessed a flurry of activity. We make the following contributions: (1) We prove lower bounds on the minimum signal strength needed for successful recovery of a bicluster as a function of the noise variance, size of the matrix and bicluster of interest. (2) We show that a combinatorial procedure based on the scan statistic achieves this optimal limit. (3) We characterize the SNR required by several computationally tractable procedures for biclustering, including element-wise thresholding, column/row average thresholding and a convex relaxation approach to sparse singular vector decomposition.
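Two of the computationally tractable procedures named in this abstract, element-wise thresholding and column/row average thresholding, can be sketched as follows. This is an illustration under assumptions, not the paper's procedure: the function names are hypothetical, the bicluster is assumed to have a raised mean, and the threshold is left as a free parameter because the abstract does not state how it is chosen.

```python
def elementwise_thresholding(matrix, threshold):
    """Return the (row, column) positions whose entry exceeds the threshold."""
    return [
        (i, j)
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
        if value > threshold
    ]


def average_thresholding(matrix, threshold):
    """Return (rows, columns) whose average entry exceeds the threshold."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    rows = [i for i, row in enumerate(matrix) if sum(row) / n_cols > threshold]
    cols = [
        j
        for j in range(n_cols)
        if sum(matrix[i][j] for i in range(n_rows)) / n_rows > threshold
    ]
    return rows, cols


# Noise-free toy matrix (assumed for illustration): rows {0, 1} and
# columns {2, 3} form the bicluster, every other entry is zero.
toy = [
    [0.0, 0.0, 4.0, 4.0],
    [0.0, 0.0, 4.0, 4.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
selected_rows, selected_cols = average_thresholding(toy, threshold=1.0)
```

Averaging pools the signal across a whole row or column before thresholding, whereas the element-wise rule looks at each entry on its own.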
Probability Theory and Related Fields manuscript No. (will be inserted by the editor)
, 2009
Randomized polynuclear growth with a columnar defect. Last modified: 03.02.09.
Abstract. We study a variant of poly-nuclear growth where the level boundaries perform continuous-time, discrete-space random walks, and examine how its asymptotic behavior is affected by the presence of a columnar defect on the line. We prove that there is a non-trivial phase transition in the strength of the perturbation, above which the law of large numbers for the height function is modified.
Keywords: Poly-nuclear growth · interacting random walks · zero-temperature Glauber dynamics · polymer pinning
Mathematics Subject Classification (2000): 60K35 · 60K37
1 Preliminary considerations and statement of the results. Rigorous study of growth processes, and constant attempts to give mathematical rigor to Kardar-Parisi-Zhang theory and its predictions, led to a very rich flow of results. Remarkable, yet very limited, progress was achieved in the last few years for 1+1 dimensional models by using a broad spectrum of techniques and arguments from the theory of random matrices, first passage percolation and interacting particle systems. An important role in the successful application and interpretation of obtained results was played by the fact that some properties of ...
Clustering Based on Pairwise Distances When the Data is of Mixed Dimensions
, 909
Abstract. In the context of clustering, we consider a generative model in a Euclidean ambient space with clusters of different shapes, dimensions, sizes and densities. In an asymptotic setting where the number of points becomes large, we obtain theoretical guarantees for a few emblematic methods based on pairwise distances: a simple algorithm based on the extraction of connected components in a neighborhood graph; the spectral clustering method of Ng, Jordan and Weiss; and hierarchical clustering with single linkage. The methods are shown to enjoy some near-optimal properties in terms of separation between clusters and robustness to outliers. The local scaling method of Zelnik-Manor and Perona is shown to lead to a near-optimal choice for the scale in the first two methods. We also provide a lower bound on the spectral gap to consistently choose the correct number of clusters in the spectral method.
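The first method listed in this abstract, extracting connected components of a neighborhood graph, can be sketched in a few lines. The sketch assumes a fixed-radius neighborhood graph, where two points are joined when their Euclidean distance is at most `radius`. The function name and the radius value are hypothetical, and the abstract's local scaling choice of the scale is not implemented here.

```python
import math
from collections import deque


def neighborhood_graph_clusters(points, radius):
    """Label each point by its connected component in the radius-neighborhood graph."""
    n = len(points)
    labels = [-1] * n  # -1 marks a point that has not been visited yet
    current = 0
    for seed in range(n):
        if labels[seed] != -1:
            continue
        # Breadth-first search from an unvisited point collects its whole component.
        labels[seed] = current
        queue = deque([seed])
        while queue:
            i = queue.popleft()
            for j in range(n):
                if labels[j] == -1 and math.dist(points[i], points[j]) <= radius:
                    labels[j] = current
                    queue.append(j)
        current += 1
    return labels


# Toy data (assumed for illustration): a short chain near the origin
# and a separate pair of points far away.
toy_points = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (5.0, 5.0), (5.4, 5.0)]
toy_labels = neighborhood_graph_clusters(toy_points, radius=0.6)
```

The components obtained this way coincide with the clusters produced by cutting a single-linkage dendrogram at height `radius`. The search above compares all pairs of points, so it costs quadratic time.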

