Statistical challenges with high dimensionality: feature selection in knowledge discovery
, 2006
"... ..."
(Show Context)
Estimating the proportion of false null hypotheses among a large number of independently tested hypotheses
 Ann. Statist.
, 2006
"... We consider the problem of estimating the number of false null hypotheses among a very large number of independently tested hypotheses, focusing on the situation in which the proportion of false null hypotheses is very small. We propose a family of methods for establishing lower 100(1 − α) % confide ..."
Abstract

Cited by 60 (4 self)
We consider the problem of estimating the number of false null hypotheses among a very large number of independently tested hypotheses, focusing on the situation in which the proportion of false null hypotheses is very small. We propose a family of methods for establishing lower 100(1 − α)% confidence bounds for this proportion, based on the empirical distribution of the p-values of the tests. Methods in this family are then compared in terms of their ability to estimate the proportion consistently by letting α → 0 as the number of hypothesis tests increases and the proportion decreases. This work is motivated by a signal detection problem that occurs in astronomy.
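The kind of bound the abstract describes can be illustrated with a minimal sketch. The function below is hypothetical (it is not the paper's exact family of methods); it uses the Dvoretzky–Kiefer–Wolfowitz inequality as one simple, valid but conservative bounding function for the empirical p-value distribution:

```python
import numpy as np

def lower_confidence_bound(pvals, alpha=0.05):
    """Conservative 100(1 - alpha)% lower confidence bound on the proportion
    of false null hypotheses, from the empirical distribution of p-values.

    Illustrative sketch only: uses the DKW uniform band as the bounding
    function, a simple member of the kind of family the paper studies.
    """
    p = np.sort(np.asarray(pvals, dtype=float))
    n = p.size
    ecdf = np.arange(1, n + 1) / n                  # F_n at the sorted p-values
    c = np.sqrt(np.log(2.0 / alpha) / (2.0 * n))    # DKW deviation bound
    keep = p < 1.0
    # If F_n(t) exceeds t + c, the excess must come from false nulls.
    bound = (ecdf[keep] - p[keep] - c) / (1.0 - p[keep])
    return max(0.0, float(bound.max())) if bound.size else 0.0
```

With purely uniform p-values the bound stays near zero; a block of very small p-values pushes it up toward the true proportion of false nulls.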
Higher Criticism thresholding: optimal feature selection when useful features are rare and weak
 Proc. Natl. Acad. Sci.
, 2008
"... Motivated by many ambitious modern applications – genomics and proteomics are examples, we consider a twoclass linear classification in highdimensional, lowsample size setting (a.k.a. p n). We consider the case where among a large number of features (dimensions), only a small fraction of them is ..."
Abstract

Cited by 43 (13 self)
Motivated by many ambitious modern applications (genomics and proteomics are examples), we consider two-class linear classification in a high-dimensional, low-sample-size setting (a.k.a. p ≫ n). We consider the case where, among a large number of features (dimensions), only a small fraction is useful. The useful features are unknown to us, and each of them contributes weakly to the classification decision; we call this setting the rare/weak model (RW model [2]). The success of linear classification hinges on how to select a small subset of useful features. We select features by thresholding feature z-scores. The threshold is set by the recent innovation of higher criticism (HC) [1, 2]: let π_i denote the p-value associated with the i-th z-score and π_(i) denote the i-th order statistic of the collection of p-values; the HC threshold (HCT) is the order statistic of the z-scores corresponding to the index i that maximizes √n (i/n − π_(i)) / √(π_(i)(1 − π_(i))). HCT has many interesting features, as follows. Asymptotic optimality in threshold selection: we formalize an asymptotic framework for studying the RW model, considering a sequence of problems with increasingly many features and relatively fewer observations.
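As a concrete illustration, the thresholding rule just described fits in a few lines. The function name and the restriction of the maximization to the smallest half of the p-values are illustrative choices, not taken from the paper:

```python
import numpy as np
from math import erfc, sqrt

def hc_threshold(z, search_frac=0.5):
    """Higher Criticism threshold for feature z-scores (illustrative sketch).

    Maximizes sqrt(n) * (i/n - pi_(i)) / sqrt(pi_(i) * (1 - pi_(i)))
    over the sorted two-sided p-values and returns the |z| value at the
    maximizing index; features with |z| at or above it would be retained.
    """
    z = np.asarray(z, dtype=float)
    n = z.size
    p = np.array([erfc(abs(v) / sqrt(2.0)) for v in z])  # two-sided p-values
    order = np.argsort(p)
    ps = p[order]
    i = np.arange(1, n + 1)
    hc = np.sqrt(n) * (i / n - ps) / np.sqrt(ps * (1.0 - ps))
    k = int(np.argmax(hc[: max(1, int(search_frac * n))]))
    return float(np.abs(z[order[k]]))
```

On data with a few strong z-scores planted among nulls, the returned threshold separates a small retained subset from the bulk.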
Optimal detection of sparse principal components in high dimension
, 2013
"... We perform a finite sample analysis of the detection levels for sparse principal components of a highdimensional covariance matrix. Our minimax optimal test is based on a sparse eigenvalue statistic. Alas, computing this test is known to be NPcomplete in general, and we describe a computationally ..."
Abstract

Cited by 42 (4 self)
We perform a finite-sample analysis of the detection levels for sparse principal components of a high-dimensional covariance matrix. Our minimax optimal test is based on a sparse eigenvalue statistic. Alas, computing this test is known to be NP-complete in general, and we describe a computationally efficient alternative test using convex relaxations. Our relaxation is also proved to detect sparse principal components at near-optimal detection levels, and it performs well on simulated datasets. Moreover, using polynomial-time reductions from theoretical computer science, we bring significant evidence that our results cannot be improved, thus revealing an inherent trade-off between statistical and computational performance.
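The sparse eigenvalue statistic itself is easy to state: the largest eigenvalue over all k × k principal submatrices of the sample covariance. A brute-force sketch (hypothetical naming; feasible only in tiny dimension, which is exactly the hardness the abstract refers to):

```python
import numpy as np
from itertools import combinations

def sparse_eigenvalue(S, k):
    """Largest k-sparse eigenvalue of a symmetric matrix S, by exhaustive
    search over all size-k supports. Exponential in the dimension -- the
    convex relaxation in the paper exists precisely to avoid this."""
    d = S.shape[0]
    best = -np.inf
    for idx in combinations(range(d), k):
        sub = S[np.ix_(idx, idx)]
        best = max(best, float(np.linalg.eigvalsh(sub)[-1]))  # top eigenvalue
    return best
```

On an identity matrix plus a 2-sparse spike of strength θ, the statistic returns 1 + θ at the spike's support, against 1 under the null.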
Near-optimal detection of geometric objects by fast multiscale methods
 IEEE Trans. Inform. Theory
, 2005
"... We construct detectors for “geometric” objects in noisy data. Examples include a detector for presence of a line segment of unknown length, position, and orientation in twodimensional image data with additive white Gaussian noise. We focus on the following two issues. i) The optimal detection thre ..."
Abstract

Cited by 41 (9 self)
We construct detectors for “geometric” objects in noisy data. Examples include a detector for the presence of a line segment of unknown length, position, and orientation in two-dimensional image data with additive white Gaussian noise. We focus on the following two issues. i) The optimal detection threshold, i.e., the signal strength below which no method of detection can be successful for large dataset size. ii) The optimal computational complexity of a near-optimal detector, i.e., the complexity required to detect signals slightly exceeding the detection threshold. We describe a general approach to such problems which covers several classes of geometrically defined signals; for example, with one-dimensional data, signals having elevated mean on an interval, and, in d-dimensional data, signals with elevated mean on a rectangle, a ball, or an ellipsoid. In all these problems, we show that a naive or straightforward approach leads to detector thresholds and algorithms which are asymptotically far from optimal. At the same time, a multiscale geometric analysis of these classes of objects allows us to derive asymptotically optimal detection thresholds and fast algorithms for near-optimal detectors.
Innovated higher criticism for detecting sparse signals in correlated noise
 Ann. Statist
, 2010
"... Higher Criticism is a method for detecting signals that are both sparse and weak. Although first proposed in cases where the noise variables are independent, Higher Criticism also has reasonable performance in settings where those variables are correlated. In this paper we show that, by exploiting t ..."
Abstract

Cited by 41 (8 self)
Higher Criticism is a method for detecting signals that are both sparse and weak. Although first proposed in cases where the noise variables are independent, Higher Criticism also has reasonable performance in settings where those variables are correlated. In this paper we show that performance can be improved further by a modified approach that exploits the nature of the correlation. Indeed, it turns out that the case of independent noise is the most difficult of all, from a statistical viewpoint, and that more accurate signal detection (for a given level of signal sparsity and strength) can be obtained when correlation is present. We characterize the advantages of correlation by showing how to incorporate them into the definition of an optimal detection boundary. The boundary has particularly attractive properties when correlation decays at a polynomial rate or the correlation matrix is Toeplitz.
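The way correlation can be turned into a statistical advantage can be sketched as follows (a hypothetical helper, not the paper's exact construction, which works with banded approximations to the inverse covariance): multiplying the data by the inverse noise covariance amplifies the mean at signal coordinates relative to the noise.

```python
import numpy as np

def innovated_zscores(x, Sigma):
    """Standardized coordinates of Sigma^{-1} x. If x = mu + noise with
    noise ~ N(0, Sigma), then Sigma^{-1} x has covariance Sigma^{-1}, and
    dividing by the square roots of its diagonal gives z-scores whose
    means at signal coordinates are amplified whenever correlation is
    present. Higher Criticism is then applied to these z-scores."""
    t = np.linalg.solve(Sigma, x)                  # Sigma^{-1} x
    omega_diag = np.diag(np.linalg.inv(Sigma))     # variances of the transform
    return t / np.sqrt(omega_diag)
```

For an AR(1) Toeplitz correlation with parameter ρ, the standardized mean at an interior signal coordinate grows from μ to roughly μ√((1 + ρ²)/(1 − ρ²)), which is the effect the abstract's detection boundary quantifies.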
Estimating the null and the proportion of nonnull effects in largescale multiple comparisons
 J. Amer. Statist. Assoc.
, 2007
"... An important issue raised by Efron [7] in the context of largescale multiple comparisons is that in many applications the usual assumption that the null distribution is known is incorrect, and seemingly negligible differences in the null may result in large differences in subsequent studies. This s ..."
Abstract

Cited by 39 (6 self)
An important issue raised by Efron [7] in the context of large-scale multiple comparisons is that in many applications the usual assumption that the null distribution is known is incorrect, and seemingly negligible differences in the null may result in large differences in subsequent studies. This suggests that a careful study of estimation of the null is indispensable. In this paper, we consider the problem of estimating a null normal distribution, and a closely related problem, estimation of the proportion of non-null effects. We develop an approach based on the empirical characteristic function and Fourier analysis. The estimators are shown to be uniformly consistent over a wide class of parameters. Numerical performance of the estimators is investigated using both simulated and real data. In particular, we apply our …
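The basic object behind this approach is the empirical characteristic function; a minimal sketch (the paper's actual estimators build on it in a more refined way):

```python
import numpy as np

def empirical_cf(x, t):
    """Empirical characteristic function phi_n(t) = (1/n) * sum_j exp(i t x_j),
    evaluated at each frequency in t. For N(mu, sigma^2) data it converges to
    exp(i*mu*t - sigma^2 * t^2 / 2), which is what lets Fourier methods read
    off the null mean and variance from the data themselves."""
    x = np.asarray(x, dtype=float)
    t = np.atleast_1d(np.asarray(t, dtype=float))
    return np.exp(1j * np.outer(t, x)).mean(axis=1)
```

For standard normal data, the value at t = 1 should be close to exp(−1/2) ≈ 0.607.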
On combinatorial testing problems
 Ann. Statist.
, 2009
"... We study a class of hypothesis testing problems in which, upon observing the realization of an ndimensional Gaussian vector, one has to decide whether the vector was drawn from a standard normal distribution or, alternatively, whether there is a subset of the components belonging to a certain given ..."
Abstract

Cited by 39 (6 self)
We study a class of hypothesis testing problems in which, upon observing the realization of an n-dimensional Gaussian vector, one has to decide whether the vector was drawn from a standard normal distribution or, alternatively, whether there is a subset of the components, belonging to a certain given class of sets, whose elements have been “contaminated,” that is, have a mean different from zero. We establish some general conditions under which testing is possible and others under which testing is hopeless with a small risk. The combinatorial and geometric structure of the class of sets is shown to play a crucial role. The bounds are illustrated on various examples.
Estimation and confidence sets for sparse normal mixtures
, 2006
"... Estimation and confidence sets for sparse normal mixtures ..."
Abstract

Cited by 35 (17 self)
Detection of an Anomalous Cluster in a Network
, 2010
"... We consider the problem of detecting whether or not in a given sensor network, there is a cluster of sensors which exhibit an “unusual behavior.” Formally, suppose we are given a set of nodes and attach a random variable to each node. We observe a realization of this process and want to decide bet ..."
Abstract

Cited by 30 (4 self)
We consider the problem of detecting whether or not, in a given sensor network, there is a cluster of sensors which exhibit an “unusual behavior.” Formally, suppose we are given a set of nodes and attach a random variable to each node. We observe a realization of this process and want to decide between the following two hypotheses: under the null, the variables are i.i.d. standard normal; under the alternative, there is a cluster of variables that are i.i.d. normal with positive mean and unit variance, while the rest are i.i.d. standard normal. We also address surveillance settings where each sensor in the network collects information over time. The resulting model is similar, now with a time series attached to each node. We again observe the process over time and want to decide between the null, where all the variables are i.i.d. standard normal, and the alternative, where there is an emerging cluster of i.i.d. normal variables with positive mean and unit variance. The growth models used to represent the emerging cluster are quite general, and in particular include cellular automata used in modelling epidemics. In both settings, we consider classes of clusters that are quite general, for which we obtain a lower bound on their respective minimax detection rates, and show that some form of scan statistic, by far the most popular method in practice, achieves that same rate to within a logarithmic factor. Our results are not limited to the normal location model, but generalize to any one-parameter exponential family when the anomalous clusters are large enough.
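In the simplest one-dimensional instance of this setting (nodes on a path, candidate clusters are intervals), the scan statistic the abstract refers to reduces to a maximum of standardized interval sums. A minimal sketch with hypothetical naming:

```python
import numpy as np

def scan_statistic_1d(x, lengths):
    """Scan statistic over intervals of a 1-D node array: the maximum of
    sum(x[i:i+k]) / sqrt(k) over all start positions i and lengths k in
    `lengths`. Under the null the values x are i.i.d. N(0, 1); a large
    value of the statistic suggests an interval with elevated mean."""
    x = np.asarray(x, dtype=float)
    s = np.concatenate(([0.0], np.cumsum(x)))      # prefix sums: O(1) interval sums
    best = -np.inf
    for k in lengths:
        sums = s[k:] - s[:-k]                      # every length-k interval sum
        best = max(best, float(sums.max()) / np.sqrt(k))
    return best
```

Planting a cluster with elevated mean on 30 of 500 nodes visibly separates the statistic's value from its value under the null.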