High-Dimensional Density Estimation via SCA: An Example in the Modelling of Hurricane Tracks
, 907
Cited by 3 (0 self)
We present nonparametric techniques for constructing and verifying density estimates from high-dimensional data whose irregular dependence structure cannot be modelled by parametric multivariate distributions. A low-dimensional representation of the data is critical in such situations because of the curse of dimensionality. Our proposed methodology consists of three main parts: (1) data reparameterization via dimensionality reduction, wherein the data are mapped into a space where standard techniques can be used for density estimation and simulation; (2) inverse mapping, in which simulated points are mapped back to the high-dimensional input space; and (3) verification, in which the quality of the estimate is assessed by comparing simulated samples with the observed data. These approaches are illustrated via an exploration of the spatial variability of tropical cyclones (TCs) in the North Atlantic; each datum in this case is an entire hurricane trajectory. We conclude the paper with a discussion of extending the methods to model the relationship between TC variability and climatic variables.
Key words: dimension reduction, nonparametric density estimation, application to physical sciences
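The three parts listed in this abstract can be illustrated with a short sketch. It is not the authors' method: it assumes plain PCA as a stand-in for the SCA reparameterization, a Gaussian kernel density estimate with a hand-picked bandwidth, and synthetic data in place of hurricane tracks. The function names `fit_pca` and `simulate` are hypothetical.

```python
import numpy as np

def fit_pca(X, k):
    """Return the data mean and the top-k principal directions (as rows)."""
    mean = X.mean(axis=0)
    # Rows of Vt are orthonormal principal directions of the centered data.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def simulate(X, k, n_samples, bandwidth, rng):
    """Reduce, sample from a Gaussian KDE in the reduced space, map back."""
    mean, components = fit_pca(X, k)
    # (1) Reparameterize: coordinates of each datum in the k-dim space.
    Z = (X - mean) @ components.T
    # Sampling from a Gaussian KDE: pick an observed point at random,
    # then perturb it with isotropic Gaussian noise of scale `bandwidth`.
    idx = rng.integers(0, len(Z), size=n_samples)
    Z_new = Z[idx] + bandwidth * rng.standard_normal((n_samples, k))
    # (2) Inverse mapping: linear for PCA (an assumption of this sketch).
    return Z_new @ components + mean

rng = np.random.default_rng(0)
# Synthetic data: 200 points in 10 dimensions lying near a 2-D plane.
latent = rng.standard_normal((200, 2))
basis = rng.standard_normal((2, 10))
X = latent @ basis + 0.01 * rng.standard_normal((200, 10))

X_sim = simulate(X, k=2, n_samples=500, bandwidth=0.1, rng=rng)

# (3) Verification, in its crudest form: compare summary statistics of the
# simulated sample with those of the observed data.
mean_gap = np.abs(X_sim.mean(axis=0) - X.mean(axis=0)).max()
std_gap = np.abs(X_sim.std(axis=0) - X.std(axis=0)).max()
```

A nonlinear reduction would need its own inverse map, which is why the abstract treats inverse mapping as a separate step; with PCA the inverse is just a matrix product.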
Detecting weak but hierarchically-structured patterns in networks
, 2010
Cited by 2 (1 self)
The ability to detect weak distributed activation patterns in networks is critical to several applications, such as identifying the onset of anomalous activity or incipient congestion in the Internet, or faint traces of a biochemical spread by a sensor network. This is a challenging problem since weak distributed patterns can be invisible in per-node statistics as well as in a global network-wide aggregate. Most prior work considers situations in which the activation/non-activation of each node is statistically independent, but this is unrealistic in many problems. In this paper, we consider structured patterns arising from statistical dependencies in the activation process. Our contributions are three-fold. First, we propose a sparsifying transform that succinctly represents structured activation patterns that conform to a hierarchical dependency graph. Second, we establish that the proposed transform facilitates detection of very weak activation patterns that cannot be detected with existing methods. Third, we show that the structure of the hierarchical dependency graph governing the activation process, and hence the network transform, can be learnt from very few (logarithmic in network size) independent snapshots of network activity.
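To see why a transform tied to a hierarchy can make structured patterns sparse, consider the sketch below. It is an illustration only: it uses the ordinary orthonormal Haar transform on a fixed, balanced binary tree over the nodes, whereas the abstract describes a transform adapted to a (possibly learnt) hierarchical dependency graph. The function name `haar_transform` and the 8-node example are hypothetical.

```python
import numpy as np

def haar_transform(x):
    """Orthonormal Haar transform of a vector whose length is a power of two."""
    approx = np.asarray(x, dtype=float)
    n = len(approx)
    if n == 0 or n & (n - 1) != 0:
        raise ValueError("length must be a power of two")
    details = []
    while len(approx) > 1:
        pairs = approx.reshape(-1, 2)
        # Differences capture disagreement between sibling clusters ...
        details.append((pairs[:, 0] - pairs[:, 1]) / np.sqrt(2))
        # ... and normalized sums summarize each merged cluster.
        approx = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2)
    # Coarsest coefficient first, finest details last.
    return np.concatenate([approx] + details[::-1])

# Activation that is constant on one subtree (nodes 0-3) of an 8-node network.
pattern = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=float)
coeffs = haar_transform(pattern)

# Only a couple of coefficients are nonzero, and total energy is preserved.
n_nonzero = int(np.sum(np.abs(coeffs) > 1e-12))
```

Here four active nodes collapse onto two coefficients, so the same energy is concentrated in fewer terms, which is the property a detector can exploit when the per-node signal is weak.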
694 Journal of the American Statistical Association, June 2009 Discussion
I commend Johnstone and Lu for publishing this important article, which has motivated quite a lot of recent work on sparsity and statistical inference in high-dimensional settings. In their article, Johnstone and Lu present two main results.
Using Dimension Reduction Techniques to Model Genetic Relationships for Association Studies
Beyond a few degrees of relationship, pedigrees are rarely known with absolute certainty. This uncertainty is often elevated in population isolates, in which all extant individuals trace their ancestry to a limited number of founders. Cryptic relatedness can have a detrimental impact on nominal false positive rates for genetic association tests. An algorithm overcoming this problem is as follows: first estimate the relatedness of all pairs of individuals assessed for association; then adjust the test for association on the basis of relatedness. Methods exist by which relatedness can be estimated using genotypes obtained as part of a genome-wide association (GWA) study. It is important to recognize that estimates of relationships between pairs of individuals based on genotype information can be very noisy. Treelets are an adaptive approach to dealing with noisy, high-dimensional and unordered data. Treelets simultaneously construct a hierarchical tree and an orthonormal basis that represent the internal structure of the data. We propose to use treelets on estimated relationship data by examining each individual's relationship to everyone else. Noise is removed by identifying the most important features of the basis and then reconstructing the data. We apply these techniques to data from Palau, an Oceanic nation of relatively recent origin in human history. These data are part of an ongoing project to understand the genetic basis of schizophrenia.
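The way treelets build a tree and an orthonormal basis at the same time can be sketched as follows. This is a minimal illustration of the general treelet idea (repeatedly rotate the two most correlated variables so that they become uncorrelated, keep the higher-variance "sum" variable, and retire the other), not the authors' implementation; the function name `treelet_basis`, the synthetic data, and the number of levels are all assumptions.

```python
import numpy as np

def treelet_basis(cov, n_levels):
    """Run n_levels treelet merges on a covariance matrix.

    Returns the orthonormal basis B (columns), the rotated covariance
    C = B.T @ cov @ B, and the list of merges as (kept, retired) indices.
    Assumes all variances are strictly positive.
    """
    p = cov.shape[0]
    if n_levels > p - 1:
        raise ValueError("at most p - 1 merges are possible")
    C = np.array(cov, dtype=float)
    B = np.eye(p)
    active = list(range(p))
    merges = []
    for _ in range(n_levels):
        # Find the most correlated pair among the still-active variables.
        sd = np.sqrt(np.diag(C))
        best, pair = -1.0, None
        for a, i in enumerate(active):
            for j in active[a + 1:]:
                r = abs(C[i, j]) / (sd[i] * sd[j])
                if r > best:
                    best, pair = r, (i, j)
        i, j = pair
        # Jacobi rotation angle that zeroes the covariance between i and j.
        theta = 0.5 * np.arctan2(2.0 * C[i, j], C[i, i] - C[j, j])
        c, s = np.cos(theta), np.sin(theta)
        J = np.eye(p)
        J[i, i], J[j, j] = c, c
        J[i, j], J[j, i] = -s, s
        C = J.T @ C @ J
        B = B @ J
        # Keep the higher-variance (sum) variable; retire the other.
        if C[i, i] < C[j, j]:
            i, j = j, i
        active.remove(j)
        merges.append((i, j))
    return B, C, merges

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
X[:, 1] = X[:, 0] + 0.1 * rng.standard_normal(100)  # variables 0 and 1 are near-duplicates
cov = np.cov(X, rowvar=False)
B, C, merges = treelet_basis(cov, n_levels=3)
```

The list `merges` records the hierarchical tree, while the columns of `B` form the orthonormal basis; denoising in the spirit of the abstract would keep only the most informative columns before reconstructing the data.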
Learning Sparse Representations of High Dimensional Data on Large Scale Dictionaries
Learning sparse representations on data-adaptive dictionaries is a state-of-the-art method for modeling data. But when the dictionary is large and the data dimension is high, it is a computationally challenging problem. We explore three aspects of the problem. First, we derive new, greatly improved screening tests that quickly identify codewords that are guaranteed to have zero weights. Second, we study the properties of random projections in the context of learning sparse representations. Finally, we develop a hierarchical framework that uses incremental random projections and screening to learn, in small stages, a hierarchically structured dictionary for sparse representations. Empirical results show that our framework can learn informative hierarchical sparse representations more efficiently.
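The appeal of random projections for high-dimensional data is that they roughly preserve geometry at a much lower dimension. The sketch below shows this for a single Gaussian projection on synthetic data; it does not reproduce the incremental projections or the screening tests mentioned in the abstract, and the names `random_projection` and `pairwise_distances` as well as the sizes used are assumptions.

```python
import numpy as np

def random_projection(X, m, rng):
    """Project the rows of X from d dimensions down to m with a Gaussian matrix."""
    d = X.shape[1]
    # Entries are N(0, 1/m), so squared lengths are preserved in expectation.
    P = rng.standard_normal((d, m)) / np.sqrt(m)
    return X @ P

def pairwise_distances(A):
    """Euclidean distance between every pair of rows of A."""
    diff = A[:, None, :] - A[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 2000))        # 20 points in 2000 dimensions
Y = random_projection(X, m=256, rng=rng)   # the same points in 256 dimensions

# Ratio of projected to original distance for every distinct pair of points.
upper = np.triu_indices(len(X), k=1)
ratios = pairwise_distances(Y)[upper] / pairwise_distances(X)[upper]
```

Ratios close to 1 indicate that the projected data retain the relative geometry of the original, so computations that depend on distances or inner products can be carried out on the smaller representation at lower cost.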

