Results 1–10 of 42
Innovated higher criticism for detecting sparse signals in correlated noise
Ann. Statist., 2010
Cited by 42 (9 self)
Higher Criticism is a method for detecting signals that are both sparse and weak. Although first proposed in cases where the noise variables are independent, Higher Criticism also has reasonable performance in settings where those variables are correlated. In this paper we show that, by exploiting the nature of the correlation, performance can be improved by using a modified approach which exploits the potential advantages that correlation has to offer. Indeed, it turns out that the case of independent noise is the most difficult of all, from a statistical viewpoint, and that more accurate signal detection (for a given level of signal sparsity and strength) can be obtained when correlation is present. We characterize the advantages of correlation by showing how to incorporate them into the definition of an optimal detection boundary. The boundary has particularly attractive properties when correlation decays at a polynomial rate or the correlation matrix is Toeplitz.
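As a concrete reference point for the abstract above, the standard (non-innovated) Higher Criticism statistic of Donoho and Jin can be sketched in a few lines. The function name and the truncation fraction `alpha0` are illustrative choices, not taken from the paper, which modifies this statistic to exploit correlation.

```python
import numpy as np

def higher_criticism(pvalues, alpha0=0.5):
    """Standard Higher Criticism statistic (Donoho-Jin style sketch).

    Compares the sorted p-values against their expected values under
    the global null and returns the largest standardized discrepancy
    over the smallest alpha0-fraction of p-values.  Large values
    indicate a sparse set of non-null effects.
    """
    p = np.sort(np.asarray(pvalues, dtype=float))
    n = len(p)
    i = np.arange(1, n + 1)
    # Standardized gap between empirical and expected p-value quantiles.
    hc = np.sqrt(n) * (i / n - p) / np.sqrt(p * (1 - p) + 1e-12)
    k = max(1, int(alpha0 * n))  # restrict to the smallest p-values
    return hc[:k].max()
```

A handful of very small p-values mixed into an otherwise uniform sample drives the statistic up sharply, which is exactly the sparse-and-weak regime the abstract describes.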
Foundations of a Multiway Spectral Clustering Framework for Hybrid Linear Modeling
2009
Cited by 37 (10 self)
The problem of Hybrid Linear Modeling (HLM) is to model and segment data using a mixture of affine subspaces. Different strategies have been proposed to solve this problem; however, rigorous analysis justifying their performance is missing. This paper suggests the Theoretical Spectral Curvature Clustering (TSCC) algorithm for solving the HLM problem and provides careful analysis to justify it. The TSCC algorithm is practically a combination of Govindu’s multiway spectral clustering framework (CVPR 2005) and Ng et al.’s spectral clustering algorithm (NIPS 2001). The main result of this paper states that if the given data is sampled from a mixture of distributions concentrated around affine subspaces, then with high sampling probability the TSCC algorithm segments the underlying clusters well. The goodness of clustering depends on the within-cluster errors, the between-cluster interactions, and a tuning parameter applied by TSCC. The proof also provides new insights for the analysis of Ng et al. (NIPS 2001). Keywords: Hybrid linear modeling · d-flats clustering · Multiway clustering · Spectral clustering · Polar curvature · Perturbation analysis · Concentration inequalities. Communicated by Albert Cohen. This work was supported by NSF grant #0612608.
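Since TSCC builds on the Ng–Jordan–Weiss spectral clustering step, a minimal sketch of that step may be useful for orientation. The Gaussian affinity, the bandwidth `sigma`, and the use of SciPy's k-means are illustrative assumptions; TSCC itself replaces the pairwise affinity with multiway curvature affinities, which this sketch omits.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def ng_spectral_clustering(X, k, sigma=1.0, seed=0):
    """Minimal Ng-Jordan-Weiss spectral clustering (NIPS 2001) sketch."""
    # Gaussian affinity with zero diagonal.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Symmetrically normalized affinity D^{-1/2} W D^{-1/2}.
    d = W.sum(1)
    Dinv = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    M = Dinv[:, None] * W * Dinv[None, :]
    # Top-k eigenvectors, rows renormalized to the unit sphere.
    vals, vecs = np.linalg.eigh(M)
    U = vecs[:, -k:]
    U /= np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    # k-means on the embedded rows gives the cluster labels.
    _, labels = kmeans2(U, k, minit='++', seed=seed)
    return labels
```

On well-separated clusters the embedded rows concentrate near k points on the unit sphere, so the final k-means step recovers the partition.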
Detection of an Anomalous Cluster in a Network
2010
Cited by 32 (4 self)
We consider the problem of detecting whether, in a given sensor network, there is a cluster of sensors which exhibit an “unusual behavior.” Formally, suppose we are given a set of nodes and attach a random variable to each node. We observe a realization of this process and want to decide between the following two hypotheses: under the null, the variables are i.i.d. standard normal; under the alternative, there is a cluster of variables that are i.i.d. normal with positive mean and unit variance, while the rest are i.i.d. standard normal. We also address surveillance settings where each sensor in the network collects information over time. The resulting model is similar, now with a time series attached to each node. We again observe the process over time and want to decide between the null, where all the variables are i.i.d. standard normal, and the alternative, where there is an emerging cluster of i.i.d. normal variables with positive mean and unit variance. The growth models used to represent the emerging cluster are quite general and, in particular, include cellular automata used in modelling epidemics. In both settings, we consider quite general classes of clusters, for which we obtain a lower bound on their respective minimax detection rates, and show that some form of scan statistic, by far the most popular method in practice, achieves that same rate within a logarithmic factor. Our results are not limited to the normal location model, but generalize to any one-parameter exponential family when the anomalous clusters are large enough.
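The scan statistic mentioned in the abstract can be illustrated in one dimension, where the cluster class is simply the set of discrete intervals. The brute-force version below is a sketch of the idea, not the paper's general-cluster scan.

```python
import numpy as np

def interval_scan(y):
    """Scan statistic over all discrete intervals of a 1-D signal.

    Returns the max over intervals [i, j) of sum(y[i:j]) / sqrt(j - i)
    -- the normalized sum a likelihood ratio test uses for a cluster of
    elevated-mean normals -- together with the best interval itself.
    """
    n = len(y)
    c = np.concatenate([[0.0], np.cumsum(y)])  # prefix sums
    best, arg = -np.inf, (0, 0)
    for i in range(n):
        for j in range(i + 1, n + 1):
            s = (c[j] - c[i]) / np.sqrt(j - i)
            if s > best:
                best, arg = s, (i, j)
    return best, arg
```

Detection then amounts to comparing the returned maximum to a threshold calibrated under the null; for richer cluster classes (e.g. connected subgraphs), enumerating candidates is the hard part.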
A framework for discrete integral transformations II – the 2D discrete Radon transform
Cited by 27 (10 self)
This paper is dedicated to the memory of Professor Moshe Israeli (1940–2007), who passed away on February 18. Computing the Fourier transform of a function in polar coordinates is an important building block in many scientific disciplines and numerical schemes. In this paper we present the pseudo-polar Fourier transform, which samples the Fourier transform on the pseudo-polar grid, also known as the concentric-squares grid. The pseudo-polar grid consists of equally spaced samples along rays, where the rays are equally spaced in slope rather than in angle. The pseudo-polar Fourier transform is shown to be fast (the same complexity as the FFT), stable, invertible, and to require only
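To make the "equally spaced in slope" picture concrete, the sketch below evaluates a 2-D DFT directly at arbitrary frequency points and builds one plausible convention for a concentric-squares grid. The exact grid normalization varies between papers, so treat the grid constants here as illustrative assumptions; the direct summation is O(n²) per point and is only for illustration, whereas the paper's pseudo-polar FFT achieves FFT-like complexity.

```python
import numpy as np

def dtft2(f, wx, wy):
    """Evaluate the 2-D discrete-time Fourier transform of image f at
    arbitrary frequency points (wx, wy) by direct summation.  Slow by
    design: O(n^2) work per frequency point."""
    n0, n1 = f.shape
    u = np.arange(n0)[:, None]
    v = np.arange(n1)[None, :]
    out = np.empty(len(wx), dtype=complex)
    for i, (a, b) in enumerate(zip(wx, wy)):
        out[i] = (f * np.exp(-1j * (a * u + b * v))).sum()
    return out

def pseudo_polar_grid(n):
    """One illustrative convention for a concentric-squares grid:
    equally spaced radii k along rays of equally spaced slopes 2l/n
    (not equally spaced angles).  Only the 'basically horizontal'
    family is generated; the center point is repeated for each slope."""
    ks = np.arange(-n // 2, n // 2 + 1)
    ls = np.arange(-n // 2, n // 2 + 1)
    pts = [((np.pi * k / n), (np.pi * k / n) * (2 * l / n))
           for k in ks for l in ls]
    wx, wy = map(np.array, zip(*pts))
    return wx, wy
```

A unit impulse at the array origin has a flat transform, which gives a quick sanity check on the direct evaluator.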
Comparing Point Clouds
2004
Cited by 25 (9 self)
Point clouds are one of the most primitive and fundamental surface representations. A popular source of point clouds is three-dimensional shape acquisition devices such as laser range scanners. Another important field where point clouds are found is the representation of high-dimensional manifolds by samples. With the increasing popularity and very broad applications of this source of data, it is natural and important to work directly with this representation, without having to go through the intermediate, and sometimes impossible and distorting, step of surface reconstruction. A geometric framework for comparing manifolds given by point clouds is presented in this paper. The underlying theory is based on Gromov–Hausdorff distances, leading to isometry-invariant and completely geometric comparisons. This theory is embedded in a probabilistic setting, as derived from random sampling of manifolds, and then combined with results on matrices of pairwise geodesic distances to lead to a computational implementation of the framework. The theoretical and computational results presented here are complemented with experiments on real three-dimensional shapes.
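The Gromov–Hausdorff distance itself is hard to compute, but simple isometry-invariant lower bounds built from pairwise-distance matrices are easy to sketch. The half-diameter-gap bound below is a standard (and deliberately crude) example of the kind of quantity the framework works with; it is not the paper's full comparison procedure.

```python
import numpy as np

def pairwise_dists(X):
    """Euclidean pairwise-distance matrix of a point cloud X (n x d)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.sqrt(np.maximum(d2, 0.0))

def diameter_lower_bound(X, Y):
    """Half the diameter gap: a simple, isometry-invariant lower bound
    on the Gromov-Hausdorff distance between two point clouds."""
    return 0.5 * abs(pairwise_dists(X).max() - pairwise_dists(Y).max())
```

Because only pairwise distances enter, the bound is unchanged by rigid motions of either cloud, which is the invariance the abstract emphasizes.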
ADAPTIVE MULTISCALE DETECTION OF FILAMENTARY STRUCTURES IN A BACKGROUND OF UNIFORM RANDOM POINTS
2003
Cited by 23 (6 self)
We are given a set of n points that might be uniformly distributed in the unit square [0, 1]². We wish to test whether the set, although mostly consisting of uniformly scattered points, also contains a small fraction of points sampled from some (a priori unknown) curve with C^α-norm bounded by β. An asymptotic detection threshold exists in this problem: for a constant T_−(α, β) > 0, if the number of points sampled from the curve is smaller than T_−(α, β) n^{1/(1+α)}, reliable detection is not possible for large n. We describe a multiscale significant-runs algorithm that can reliably detect concentration of data near a smooth curve, without knowing the smoothness information α or β in advance, provided that the number of points on the curve exceeds T_∗(α, β) n^{1/(1+α)}. This algorithm therefore has an optimal detection threshold, up to a factor T_∗/T_−. At the heart of our approach is an analysis of the data by counting membership in multiscale anisotropic strips.
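A single-scale, brute-force version of the strip-counting idea is easy to sketch: count points inside thin strips over a grid of orientations and offsets, and flag a filament when some strip captures far more points than uniform scatter would explain. The paper's algorithm is multiscale and adaptive; the grid sizes and strip width below are illustrative parameters only.

```python
import numpy as np

def max_strip_count(points, n_angles=32, n_offsets=32, width=0.05):
    """Brute-force, single-scale strip scan over a 2-D point cloud.

    A strip is {x : |<x, n> - offset| <= width / 2} for a unit normal
    n = (cos a, sin a).  Returns the largest number of points captured
    by any strip on a grid of angles and offsets; an unusually large
    count suggests points concentrating near a line."""
    best = 0
    for a in np.linspace(0, np.pi, n_angles, endpoint=False):
        nvec = np.array([np.cos(a), np.sin(a)])
        proj = points @ nvec  # signed distance of each point to the strip axis
        for off in np.linspace(proj.min(), proj.max(), n_offsets):
            best = max(best, int((np.abs(proj - off) <= width / 2).sum()))
    return best
```

A cloud containing points sampled along a line yields a much larger maximal count than a purely uniform cloud of comparable size.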
Searching for a Trail of Evidence in a Maze
2007
Cited by 20 (3 self)
Consider a graph with a set of vertices and oriented edges connecting pairs of vertices. Each vertex is associated with a random variable, and these are assumed to be independent. In this setting, suppose we wish to solve the following hypothesis testing problem: under the null, the random variables have common distribution N(0, 1), while under the alternative there is an unknown path along which the random variables have distribution N(µ, 1), µ > 0, and distribution N(0, 1) away from it. For which values of the mean shift µ can one reliably detect, and for which values is this impossible? This paper develops detection thresholds for two types of common graphs which exhibit different behavior. The first is the usual regular lattice with vertices of the form {(i, j) : 0 ≤ i, −i ≤ j ≤ i and j has the parity of i} and oriented edges (i, j) → (i+1, j+s), where s = ±1. We show that for paths of length m starting at the origin, the hypotheses become distinguishable (in a minimax sense) if µ_m ≫ 1/√(log m), while they are not if µ_m ≪ 1/log m. We derive equivalent results in a Bayesian setting where one assumes that all paths are equally likely; there the asymptotic threshold is µ_m ≈ m^{−1/4}.
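For the lattice just described, the natural scan statistic maximizes the sum of the variables along a path, and a dynamic program computes that maximum exactly despite the exponential number of paths. The array layout below (an (m, 2m−1) array with row i meaningful in columns m−1−i .. m−1+i) is an illustrative encoding, not taken from the paper.

```python
import numpy as np

def best_path_sum(Y):
    """Maximum, over oriented lattice paths started at the origin, of
    the sum of the vertex variables along the path.

    Y is an (m, 2*m - 1) array; row i represents depth i, with column
    m - 1 + j holding the variable at vertex (i, j).  Edges go from
    (i, j) to (i + 1, j - 1) and (i + 1, j + 1)."""
    m, w = Y.shape
    NEG = -np.inf
    dp = np.full(w, NEG)
    dp[m - 1] = Y[0, m - 1]  # the origin (0, 0)
    for i in range(1, m):
        nxt = np.full(w, NEG)
        for j in range(w):
            left = dp[j - 1] if j - 1 >= 0 else NEG
            right = dp[j + 1] if j + 1 < w else NEG
            prev = max(left, right)
            if prev > NEG:  # reachable from depth i - 1
                nxt[j] = prev + Y[i, j]
        dp = nxt
    return dp.max()
```

A generalized-likelihood-ratio detector would compare this maximum to a null-calibrated threshold; the paper's contribution is pinning down exactly where that detection is and is not possible.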
Detecting Highly Oscillatory Signals by Chirplet Path Pursuit
2006
Cited by 14 (2 self)
This paper considers the problem of detecting nonstationary phenomena, and chirps in particular, from very noisy data. Chirps are waveforms of the very general form A(t) exp(iλϕ(t)), where λ is a (large) base frequency, the phase ϕ(t) is time-varying, and the amplitude A(t) is slowly varying. Given a set of noisy measurements, we would like to test whether there is a signal or whether the data are just noise. One application of particular note in conjunction with this problem is the detection of gravitational waves predicted by Einstein’s Theory of General Relativity. We introduce detection strategies which are very sensitive and more flexible than existing feature detectors. The idea is to use structured algorithms which exploit information in the so-called chirplet graph to chain chirplets together adaptively so as to form chirps with polygonal instantaneous frequency. We then search for the path in the graph which provides the best trade-off between complexity and goodness of fit. Underlying our methodology is the idea that while the signal may be extremely weak, so that none of the individual empirical coefficients is statistically significant, one can still detect reliably by combining several coefficients.
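A crude precursor to chirplet path pursuit is a single-chirplet matched filter: correlate the data against linear chirps restricted to a few time intervals and take the best normalized coefficient. The adaptive chaining over the chirplet graph is what the paper adds on top of this; the parametrization below (frequencies, slopes, intervals) is an illustrative assumption.

```python
import numpy as np

def chirplet_statistic(y, t, freqs, slopes, intervals):
    """Max normalized correlation of y with linear chirplets
    exp(i * (2*pi*f*t + pi*s*t**2)) restricted to time intervals.
    A crude single-chirplet detector: no chaining, no graph search."""
    best = 0.0
    for (a, b) in intervals:
        mask = (t >= a) & (t < b)
        m = int(mask.sum())
        if m == 0:
            continue
        ts = t[mask]
        for f in freqs:
            for s in slopes:
                # Conjugate chirplet, normalized so white noise gives O(1).
                c = np.exp(-1j * (2 * np.pi * f * ts + np.pi * s * ts ** 2))
                coef = np.abs((y[mask] * c).sum()) / np.sqrt(m)
                best = max(best, coef)
    return best
```

A pure tone matched by one dictionary element produces a coefficient of order √N, far above the O(1) level of unmatched noise, which is the "combine many weak coefficients" intuition in miniature.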
Change-point detection over graphs with the spectral scan statistic, arXiv preprint arXiv:1206.0773
2012
Cited by 10 (5 self)
We consider the change-point detection problem of deciding, based on noisy measurements, whether an unknown signal over a given graph is constant or is instead piecewise constant over two induced subgraphs of relatively low cut size. We analyze the corresponding generalized likelihood ratio (GLR) statistic and relate it to the problem of finding a sparsest cut in a graph. We develop a tractable relaxation of the GLR statistic based on the combinatorial Laplacian of the graph, which we call the spectral scan statistic, and analyze its properties. We show how its performance as a testing procedure depends directly on the spectrum of the graph, and use this result to explicitly derive its asymptotic properties on a few graph topologies. Finally, we demonstrate both theoretically and by simulations that the spectral scan statistic can outperform naive testing procedures based on edge thresholding and χ² testing.
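The exact spectral scan statistic involves a constrained maximization tied to sparsest cut, but the underlying intuition is easy to illustrate: signals that are piecewise constant over a low-cut partition concentrate their energy on the low eigenvectors of the combinatorial Laplacian L = D − A. The projection statistic below is an illustrative proxy in that spirit, not the paper's statistic.

```python
import numpy as np

def spectral_projection_stat(A, y, k=3):
    """Illustrative proxy (not the paper's spectral scan statistic):
    project y onto the k lowest nonconstant eigenvectors of the
    combinatorial Laplacian L = D - A and return the largest squared
    coefficient.  Piecewise-constant signals over low-cut partitions
    concentrate energy on these eigenvectors."""
    L = np.diag(A.sum(1)) - A
    vals, vecs = np.linalg.eigh(L)  # ascending eigenvalues
    U = vecs[:, 1:k + 1]  # skip the constant (eigenvalue-0) eigenvector
    return float(((U.T @ y) ** 2).max())
```

On a path graph, a signal that jumps once in the middle aligns strongly with the Fiedler vector, while a globally constant signal is orthogonal to every nonconstant eigenvector.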
DETECTION OF CORRELATIONS
 SUBMITTED TO THE ANNALS OF STATISTICS
Cited by 8 (2 self)
We consider the hypothesis testing problem of deciding whether an observed high-dimensional vector has independent normal components or, alternatively, whether it has a small subset of correlated components. The correlated components may have a certain combinatorial structure known to the statistician. We establish upper and lower bounds for the worst-case (minimax) risk in terms of the size of the correlated subset, the level of correlation, and the structure of the class of possibly correlated sets. We show that some simple tests have near-optimal performance in many cases, while the generalized likelihood ratio test is suboptimal in some important cases.
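One simple test in this line of work sums the coordinates over each candidate set and squares the result: under the independent-N(0, 1) null, the normalized square is χ²₁ for each set, while positive within-set correlation inflates its variance. The function below is a hedged sketch over an explicit candidate list, not the paper's exact procedure or its calibration.

```python
import numpy as np

def subset_sum_test(x, candidate_sets):
    """Simple (not necessarily optimal) correlated-subset test statistic:
    for each candidate set S, (sum_{i in S} x_i)^2 / |S| has a chi^2_1
    distribution under the independent-N(0,1) null but is inflated by
    positive within-set correlation; return the max over candidates."""
    return max((x[list(S)].sum() ** 2) / len(S) for S in candidate_sets)
```

In practice the statistic is compared against a threshold adjusted for the number (and overlap structure) of candidate sets, which is where the combinatorial structure in the abstract enters.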