Results 1-10 of 15
Detecting weak but hierarchically-structured patterns in networks
, 2010
Abstract

Cited by 12 (4 self)
The ability to detect weak distributed activation patterns in networks is critical to several applications, such as identifying the onset of anomalous activity or incipient congestion in the Internet, or faint traces of a biochemical spread by a sensor network. This is a challenging problem since weak distributed patterns can be invisible in per-node statistics as well as in a global network-wide aggregate. Most prior work considers situations in which the activation/non-activation of each node is statistically independent, but this is unrealistic in many problems. In this paper, we consider structured patterns arising from statistical dependencies in the activation process. Our contributions are threefold. First, we propose a sparsifying transform that succinctly represents structured activation patterns that conform to a hierarchical dependency graph. Second, we establish that the proposed transform facilitates detection of very weak activation patterns that cannot be detected with existing methods. Third, we show that the structure of the hierarchical dependency graph governing the activation process, and hence the network transform, can be learnt from very few (logarithmic in network size) independent snapshots of network activity.
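As a rough illustration of why a hierarchical transform can sparsify such patterns, the sketch below computes unbalanced Haar coefficients of a node signal on a binary hierarchy. The abstract does not spell out the paper's exact transform, so this is a generic Haar-style construction under stated assumptions, not the authors' method; the nested-tuple tree encoding and the name `haar_coefficients` are hypothetical.

```python
import math


def haar_coefficients(tree, x):
    """Unbalanced Haar coefficients of a node signal x on a binary hierarchy.

    tree: nested 2-tuples whose leaves are integer indices into x
          (a hypothetical encoding of the hierarchical dependency graph).
    Returns (scaling_coefficient, detail_coefficients), one detail
    coefficient per internal node of the tree.
    """
    details = []

    def visit(t):
        # Returns (number of leaves, sum of x over those leaves) for subtree t.
        if isinstance(t, int):
            return 1, x[t]
        n_l, s_l = visit(t[0])
        n_r, s_r = visit(t[1])
        # Weighted difference of child means; the weight sqrt(n_l*n_r/(n_l+n_r))
        # gives each basis vector unit norm, so the transform is orthonormal.
        weight = math.sqrt(n_l * n_r / (n_l + n_r))
        details.append(weight * (s_l / n_l - s_r / n_r))
        return n_l + n_r, s_l + s_r

    n, s = visit(tree)
    return s / math.sqrt(n), details
```

For example, on the balanced tree `(((0, 1), (2, 3)), ((4, 5), (6, 7)))` with the signal `[1, 1, 1, 1, 0, 0, 0, 0]` (active on exactly one subtree), only the root's detail coefficient is nonzero, and because the basis is orthonormal the squared coefficients sum to the signal's energy. A pattern aligned with the hierarchy therefore concentrates in a few coefficients, which is the intuition behind detecting it when per-node values are weak.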
High-Dimensional Density Estimation via SCA: An Example in the Modelling of Hurricane Tracks
, 907
Abstract

Cited by 5 (0 self)
We present nonparametric techniques for constructing and verifying density estimates from high-dimensional data whose irregular dependence structure cannot be modelled by parametric multivariate distributions. A low-dimensional representation of the data is critical in such situations because of the curse of dimensionality. Our proposed methodology consists of three main parts: (1) data reparameterization via dimensionality reduction, wherein the data are mapped into a space where standard techniques can be used for density estimation and simulation; (2) inverse mapping, in which simulated points are mapped back to the high-dimensional input space; and (3) verification, in which the quality of the estimate is assessed by comparing simulated samples with the observed data. These approaches are illustrated via an exploration of the spatial variability of tropical cyclones (TCs) in the North Atlantic; each datum in this case is an entire hurricane trajectory. We conclude the paper with a discussion of extending the methods to model the relationship between TC variability and climatic variables. Key words: dimension reduction, nonparametric density estimation, application to physical sciences
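The three parts (reparameterize, map back, verify) can be sketched in pure Python. This is a minimal sketch under loud assumptions: principal component analysis by power iteration is swapped in for SCA purely for brevity, a one-dimensional Gaussian kernel density estimate is assumed for the simulation step, the verification is a deliberately crude comparison of coordinate means, and all helper names (`simulate`, `max_mean_gap`, etc.) are hypothetical.

```python
import math
import random


def mean_vector(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def leading_direction(rows, mu, iters=100):
    """Top principal direction by power iteration (a stand-in for SCA)."""
    centered = [[a - m for a, m in zip(r, mu)] for r in rows]
    v = [1.0] * len(mu)
    for _ in range(iters):
        # Multiply by the (unnormalized) covariance without forming it:
        # w = sum_i (x_i . v) x_i
        w = [0.0] * len(mu)
        for x in centered:
            dot = sum(a * b for a, b in zip(x, v))
            w = [wj + dot * xj for wj, xj in zip(w, x)]
        norm = math.sqrt(sum(a * a for a in w))
        v = [a / norm for a in w]
    return v


def simulate(rows, n_sim, bandwidth, seed=0):
    rng = random.Random(seed)
    mu = mean_vector(rows)
    v = leading_direction(rows, mu)
    # (1) Reparameterize: each datum becomes a 1-D coordinate z.
    z = [sum((a - m) * vj for a, m, vj in zip(r, mu, v)) for r in rows]
    sims = []
    for _ in range(n_sim):
        # Draw from a Gaussian kernel density estimate of z.
        z_new = rng.choice(z) + rng.gauss(0.0, bandwidth)
        # (2) Inverse map: place the draw back in the input space.
        sims.append([m + z_new * vj for m, vj in zip(mu, v)])
    return sims


def max_mean_gap(observed, simulated):
    # (3) Crude verification: largest coordinate-wise difference in means.
    mo, ms = mean_vector(observed), mean_vector(simulated)
    return max(abs(a - b) for a, b in zip(mo, ms))
```

With a linear inverse map, every simulated point lies on the fitted one-dimensional subspace; a nonlinear embedding such as SCA needs a nonlinear inverse map, which is where most of the real difficulty lies.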
Gene expression data classification combining hierarchical representation and efficient feature selection
Journal of Biological Systems
Abstract

Cited by 2 (2 self)
A general framework for microarray data classification is proposed in this paper. It produces precise and reliable classifiers through a two-step approach. In the first step, the original feature set is enhanced by a new set of features called metagenes. These new features are obtained through a hierarchical clustering process on the original data. Two different metagene generation rules have been analyzed, called Treelets clustering and Euclidean clustering. Metagene creation is attractive for several reasons. First, metagenes can improve classification since they broaden the available feature space and capture the common behavior of similar genes, reducing residual measurement noise. Furthermore, by analyzing some of the metagenes chosen for classification with gene set enrichment analysis algorithms, it is shown how metagenes can summarize the behavior of functionally related probe sets. Additionally, metagenes can point out still-undocumented, highly discriminant probe sets that are numerically related to other probes endowed with prior biological information, thereby contributing to the knowledge discovery process. The second step of the framework is feature selection, which applies the Improved Sequential Floating Forward Selection algorithm (IFFS) to properly choose a subset from
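The abstract names an improved variant of sequential floating forward selection (IFFS) but does not describe its improvements. As a baseline sketch only, plain SFFS (not the improved variant) alternates a greedy forward step with conditional backward removals. Here `score` stands in for a hypothetical wrapper criterion, such as the cross-validated accuracy of a classifier trained on the candidate subset, and the function name and return format are illustrative assumptions.

```python
def sffs(features, score, k):
    """Sequential floating forward selection (SFFS) up to subset size k.

    features: list of candidate feature ids (k <= len(features)).
    score: maps a frozenset of features to a number; higher is better.
    Returns a dict: subset size -> (best score, best subset) seen.
    """
    selected = frozenset()
    best = {}
    while len(selected) < k:
        # Forward step: add the feature whose inclusion scores highest.
        f_add = max((f for f in features if f not in selected),
                    key=lambda f: score(selected | {f}))
        selected = selected | {f_add}
        s = score(selected)
        if len(selected) not in best or s > best[len(selected)][0]:
            best[len(selected)] = (s, selected)
        # Floating step: keep removing the least useful feature while the
        # reduced subset beats the best one recorded at that size.
        while len(selected) > 2:
            f_rm = max(selected, key=lambda f: score(selected - {f}))
            reduced = selected - {f_rm}
            s_red = score(reduced)
            if s_red > best[len(reduced)][0]:
                selected = reduced
                best[len(selected)] = (s_red, selected)
            else:
                break
    return best
```

The floating step is what lets the search escape an early greedy choice: if feature 0 scores best alone but features 1 and 2 are far better as a pair, SFFS can reach the size-3 subset and then float back to `{1, 2}` at size 2. Backward moves are accepted only on strict improvement, so the loop terminates.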
Learning Sparse Representations of High Dimensional Data on Large Scale Dictionaries
Abstract

Cited by 1 (0 self)
Learning sparse representations on data-adaptive dictionaries is a state-of-the-art method for modeling data. But when the dictionary is large and the data dimension is high, it is a computationally challenging problem. We explore three aspects of the problem. First, we derive new, greatly improved screening tests that quickly identify codewords that are guaranteed to have zero weights. Second, we study the properties of random projections in the context of learning sparse representations. Finally, we develop a hierarchical framework that uses incremental random projections and screening to learn, in small stages, a hierarchically structured dictionary for sparse representations. Empirical results show that our framework can learn informative hierarchical sparse representations more efficiently.
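The paper's improved screening tests are not given in the abstract. For context, here is a basic sphere test of the kind such work improves on, stated for the lasso, minimize 0.5·||x − Σ w_i b_i||² + λ·Σ |w_i|. The dual optimum θ* is the projection of x/λ onto the set {θ : |b_i·θ| ≤ 1 for all i}; since x/λ_max is feasible, θ* lies in a ball of radius ||x||·(1/λ − 1/λ_max) around x/λ, and any codeword whose correlation with every point of that ball stays below 1 must have zero weight. The function name and list-based vectors are assumptions of this sketch.

```python
import math


def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


def sphere_screen(codewords, x, lam):
    """Indices of codewords that survive a basic sphere (SAFE-type) test.

    Codeword i is discarded (its lasso weight is provably 0) when
        |b_i . x| < lam - ||b_i|| * ||x|| * (lam_max - lam) / lam_max,
    where lam_max = max_i |b_i . x|. For lam >= lam_max all weights are 0.
    """
    corr = [abs(dot(b, x)) for b in codewords]
    lam_max = max(corr)
    if lam >= lam_max:
        return []
    x_norm = math.sqrt(dot(x, x))
    kept = []
    for i, (b, c) in enumerate(zip(codewords, corr)):
        radius_term = math.sqrt(dot(b, b)) * x_norm * (lam_max - lam) / lam_max
        if c >= lam - radius_term:
            kept.append(i)
    return kept
```

For instance, with codewords `[1, 0, 0]`, `[0, 1, 0]`, `[0.6, 0.8, 0]`, signal `[1, 0, 0]` and λ = 0.9, only codeword 0 survives; the lasso solution indeed uses only that codeword (weight 0.1). The test gets weaker as λ falls further below λ_max, which is the gap that sharper screening tests aim to close.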
Discussion (Journal of the American Statistical Association, June 2009, p. 694)
Abstract
I commend Johnstone and Lu for publishing this important article, which has motivated quite a lot of recent work on sparsity and statistical inference in high-dimensional settings. In their article, Johnstone and Lu present two main results.
Using Dimension Reduction Techniques to Model Genetic Relationships for Association Studies
Abstract
Beyond a few degrees of relationship, pedigrees are rarely known with absolute certainty. This uncertainty is often elevated in population isolates, in which all extant individuals trace their ancestry to a limited number of founders. Cryptic relatedness can have a detrimental impact on nominal false positive rates for genetic association tests. An algorithm for overcoming this problem is as follows: first, estimate the relatedness of all pairs of individuals assessed for association; then, adjust the test for association on the basis of relatedness. Methods exist by which relatedness can be estimated using genotypes obtained as part of a genome-wide association (GWA) study. It is important to recognize that relationship estimates between pairs of individuals derived from genotype information can be very noisy. Treelets are an adaptive approach to dealing with noisy, high-dimensional and unordered data. Treelets simultaneously construct a hierarchical tree and an orthonormal basis that represent the internal structure of the data. We propose to use treelets on estimated relationship data by examining each individual's relationship to everyone else. Noise is removed by identifying the most important features of the basis and then reconstructing the data. We apply these techniques to data from Palau, an Oceanic nation of relatively recent origin in human history. These data are part of an ongoing project to understand the genetic basis of schizophrenia.
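A minimal treelet construction along the lines described above: at each level, the two most correlated "sum" variables are decorrelated by a 2x2 Jacobi rotation (a local PCA); the higher-variance output stays a sum variable and the other becomes a "difference" variable that is not merged again. The rows of the accumulated rotation form an orthonormal basis. Working directly on a covariance matrix, using absolute correlation as the similarity, and the name `treelet` are assumptions of this sketch; the denoising step (keeping the most important basis features and reconstructing) is not shown.

```python
import math


def treelet(cov):
    """Treelet construction on a p x p covariance matrix (list of lists).

    Returns (merges, basis, cov_rotated): merges lists (sum_index,
    diff_index) pairs per level; the rows of basis are orthonormal.
    Assumes every variable has nonzero variance.
    """
    p = len(cov)
    C = [row[:] for row in cov]  # work on a copy
    W = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    active = set(range(p))  # indices of current sum variables
    merges = []
    for _ in range(p - 1):
        # Pick the most correlated pair of sum variables.
        i, j = max(
            ((a, b) for a in active for b in active if a < b),
            key=lambda ab: abs(C[ab[0]][ab[1]])
            / math.sqrt(C[ab[0]][ab[0]] * C[ab[1]][ab[1]]),
        )
        # Jacobi angle that zeroes Cov(i, j); rotated variable i gets the
        # larger of the two variances.
        theta = 0.5 * math.atan2(2 * C[i][j], C[i][i] - C[j][j])
        c, s = math.cos(theta), math.sin(theta)
        for M in (C, W):  # rotate rows i and j
            M[i], M[j] = ([c * u + s * v for u, v in zip(M[i], M[j])],
                          [-s * u + c * v for u, v in zip(M[i], M[j])])
        for row in C:  # rotate columns i and j of the covariance
            row[i], row[j] = c * row[i] + s * row[j], -s * row[i] + c * row[j]
        active.discard(j)  # j becomes a difference variable
        merges.append((i, j))
    return merges, W, C
```

For a covariance in which variables 0 and 1 have correlation 0.9 and variable 2 is weakly correlated with both, the first merge joins 0 and 1 (at a 45-degree rotation, since their variances are equal), and the second merges that sum variable with 2. This yields the hierarchical tree and basis that the abstract describes.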
for gene expression
"... clustering combining numerical and biological similarities ..."
Microarray classification with hierarchical data representation and novel feature selection criteria
Abstract
Microarray data classification is a challenging problem due to the high number of variables compared to the small number of available samples. An effective methodology to produce a precise and reliable classifier is proposed in this work as an improvement of the algorithm in [1]. It considers the sample scarcity problem and the lack of data structure typical of microarrays. Both problems are addressed by a two-step approach that applies hierarchical clustering to create new features called metagenes and introduces a novel feature ranking criterion inside the wrapper feature selection task. The classification ability has been evaluated on 4 publicly available datasets from the MicroArray Quality Control study phase II (MAQC), classified according to 7 different endpoints. Overall, the results show that the proposed approach obtains better prediction accuracy than a wide variety of state-of-the-art alternatives. Index Terms: Microarray classification; metagenes; hierarchical representation; Treelets; feature selection; LDA; wrapper.
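A toy version of metagene creation by hierarchical clustering: repeatedly merge the two most correlated current features (genes or earlier metagenes) and record their average profile as a new feature, to be appended to the original feature set. The averaging rule, the use of signed Pearson correlation, and the naming scheme are assumptions of this sketch; the abstracts do not state exactly how metagenes are computed, and the Treelets rule mentioned in the related listing above would use local rotations instead of plain averages.

```python
import math


def pearson(a, b):
    # Signed Pearson correlation; assumes neither profile is constant.
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def build_metagenes(profiles, n_merges):
    """Agglomerate the most correlated features into averaged metagenes.

    profiles: dict name -> list of expression values across samples.
    Returns a dict containing only the new metagenes (name -> profile).
    """
    active = dict(profiles)  # features still available for merging
    metagenes = {}
    for step in range(n_merges):
        names = sorted(active)
        a, b = max(
            ((p, q) for i, p in enumerate(names) for q in names[i + 1:]),
            key=lambda pq: pearson(active[pq[0]], active[pq[1]]),
        )
        # Replace the merged pair by its average profile.
        merged = [(x + y) / 2 for x, y in zip(active.pop(a), active.pop(b))]
        name = f"meta{step}({a},{b})"
        active[name] = merged
        metagenes[name] = merged
    return metagenes
```

With genes `g1 = [1, 2, 3, 4]`, `g2 = [2, 4, 6, 8.5]` and `g3 = [4, 1, 3, 2]`, one merge joins the nearly collinear `g1` and `g2` into `meta0(g1,g2)` with profile `[1.5, 3.0, 4.5, 6.25]`. A wrapper selection step could then choose among genes and metagenes together.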