Results 11–20 of 114
Treelets — An Adaptive Multi-Scale Basis for Sparse Unordered Data
Abstract

Cited by 16 (2 self)
In many modern applications, including analysis of gene expression and text documents, the data are noisy, high-dimensional, and unordered — with no particular meaning to the given order of the variables. Yet, successful learning is often possible due to sparsity: the fact that the data are typically redundant, with underlying structures that can be represented by only a few features. In this paper, we present treelets — a novel construction of multi-scale bases that extends wavelets to non-smooth signals. The method is fully adaptive, as it returns a hierarchical tree and an orthonormal basis which both reflect the internal structure of the data. Treelets are especially well-suited as a dimensionality reduction and feature selection tool prior to regression and classification, in situations where sample sizes are small and the data are sparse, with unknown groupings of correlated or collinear variables. The method is also simple to implement and analyze theoretically. Here we describe a variety of situations where treelets perform better than principal component analysis, as well as some common variable selection and cluster averaging schemes. We illustrate treelets on a blocked covariance model and on several data sets (hyperspectral image data, DNA microarray data, and internet advertisements) with highly complex dependencies between variables.
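The merge-and-rotate step behind treelets can be sketched in a few lines. The following is a minimal illustration only: the function name `treelets` and the brute-force pairwise search are mine, not the authors' implementation.

```python
import numpy as np

def treelets(X, n_levels):
    """Simplified treelet construction (an illustrative sketch, not the
    authors' reference code). At each level the two most correlated active
    coordinates are decorrelated by a local PCA (Jacobi) rotation; the
    low-variance "difference" coordinate is retired, and the rotations
    accumulate into an orthonormal basis that mirrors the merge tree."""
    n, p = X.shape
    C = np.cov(X, rowvar=False)
    B = np.eye(p)              # accumulated orthonormal basis
    active = set(range(p))     # coordinates still eligible for merging
    tree = []
    for _ in range(n_levels):
        # find the most correlated active pair
        d = np.sqrt(np.diag(C))
        best, pair = -1.0, None
        for i in active:
            for j in active:
                if i < j:
                    r = abs(C[i, j]) / (d[i] * d[j] + 1e-12)
                    if r > best:
                        best, pair = r, (i, j)
        i, j = pair
        # Jacobi rotation angle that zeroes C[i, j]
        theta = 0.5 * np.arctan2(2 * C[i, j], C[i, i] - C[j, j])
        R = np.eye(p)
        c, s = np.cos(theta), np.sin(theta)
        R[i, i] = R[j, j] = c
        R[i, j], R[j, i] = -s, s
        C = R.T @ C @ R        # decorrelates the merged pair
        B = B @ R
        # retire the lower-variance "difference" coordinate
        diff = j if C[i, i] >= C[j, j] else i
        active.discard(diff)
        tree.append((i, j, theta))
    return tree, B
```

Because the basis is a product of rotations it stays exactly orthonormal, and stopping at level L yields the multi-scale representation at that depth.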
Consistent Feature Selection for Pattern Recognition in Polynomial Time
Abstract

Cited by 14 (2 self)
We analyze two different feature selection problems: finding a minimal feature set optimal for classification (MINIMAL-OPTIMAL) vs. finding all features relevant to the target variable (ALL-RELEVANT). The latter problem is motivated by recent applications within bioinformatics, particularly gene expression analysis. For both problems, we identify classes of data distributions for which there exist consistent, polynomial-time algorithms. We also prove that ALL-RELEVANT is much harder than MINIMAL-OPTIMAL and propose two consistent, polynomial-time algorithms. We argue that the distribution classes considered are reasonable in many practical cases, so that our results simplify feature selection in a wide range of machine learning tasks.
Geometry of faithfulness assumption in causal inference
 Annals of Statistics
Active learning of causal networks with intervention experiments and optimal
, 2008
Abstract

Cited by 12 (1 self)
Causal discovery from data is important for various scientific investigations. Because we cannot distinguish between the different directed acyclic graphs (DAGs) in a Markov equivalence class learned from observational data, we have to collect further information on causal structures from experiments with external interventions. In this paper, we propose an active learning approach for discovering causal structures in which we first find a Markov equivalence class from observational data, and then orient the undirected edges in every chain component separately via intervention experiments. In the experiments, some variables are manipulated through external interventions. We discuss two kinds of intervention experiments: randomized experiments and quasi-experiments. Furthermore, we give two optimal designs of experiments, a batch-intervention design and a sequential-intervention design, to minimize the number of manipulated variables and the set of candidate structures based on the minimax and the maximum entropy criteria. We show theoretically that structural learning can be done locally in subgraphs of chain components without the need to check illegal v-structures and cycles in the whole network, and that the Markov equivalence subclass obtained after each intervention can still be depicted as a chain graph.
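As an illustration of the batch-intervention idea, the sketch below picks a small set of variables to manipulate so that every undirected edge touches at least one intervened vertex (intervening on v reveals the direction of each edge incident to v). This greedy vertex-cover heuristic is a hypothetical stand-in for the paper's minimax and maximum-entropy optimal designs, not the authors' algorithm.

```python
from collections import Counter

def greedy_batch_design(edges):
    """Greedy approximation of a batch-intervention design: repeatedly
    intervene on the vertex incident to the most still-unoriented edges,
    until every undirected edge has an intervened endpoint."""
    remaining = set(map(frozenset, edges))
    chosen = []
    while remaining:
        deg = Counter(v for e in remaining for v in e)
        v = max(deg, key=deg.get)       # vertex covering the most edges
        chosen.append(v)
        remaining = {e for e in remaining if v not in e}
    return chosen
```

For a star-shaped chain component, a single intervention on the hub suffices, which is the kind of saving in manipulated variables the optimal designs formalize.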
High-dimensional sparse covariance estimation via directed acyclic graphs
, 2009
Abstract

Cited by 9 (1 self)
We present a graph-based technique for estimating sparse covariance matrices and their inverses from high-dimensional data. The method is based on learning a directed acyclic graph (DAG) and estimating the parameters of a multivariate Gaussian distribution based on that DAG. For inferring the underlying DAG we use the PC-algorithm [27], and for estimating the DAG-based covariance matrix and its inverse we use a Cholesky decomposition approach, which provides a positive (semi-)definite sparse estimate. We present a consistency result in the high-dimensional framework and compare our method with the Glasso [12, 8, 2] on simulated and real data.
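The Cholesky-style step can be sketched as follows. This is a hypothetical helper, not the paper's code, and it assumes the DAG is already given (in topological order) rather than learned by the PC-algorithm: regressing each node on its parents yields coefficients A and residual variances D, and the factorization Sigma^{-1} = (I - A)^T D^{-1} (I - A) is positive definite by construction and sparse wherever the DAG is.

```python
import numpy as np

def dag_covariance(X, parents):
    """Estimate a covariance matrix and its inverse from a given DAG
    (illustrative sketch). parents[j] lists the parent indices of node j."""
    n, p = X.shape
    A = np.zeros((p, p))   # A[j, k] = coefficient of parent k in node j's OLS fit
    dvar = np.zeros(p)     # residual variances
    for j in range(p):
        pa = list(parents[j])
        if pa:
            beta, *_ = np.linalg.lstsq(X[:, pa], X[:, j], rcond=None)
            A[j, pa] = beta
            resid = X[:, j] - X[:, pa] @ beta
        else:
            resid = X[:, j]
        dvar[j] = resid.var()
    I = np.eye(p)
    K = (I - A).T @ np.diag(1.0 / dvar) @ (I - A)   # sparse precision matrix
    return np.linalg.inv(K), K
```

Entries of K between nodes that share no edge or common child in the DAG are exactly zero, which is where the sparsity of the estimate comes from.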
HIGH-DIMENSIONAL STRUCTURE ESTIMATION IN ISING MODELS: LOCAL SEPARATION CRITERION
, 2012
Abstract

Cited by 9 (0 self)
We consider the problem of high-dimensional Ising (graphical) model selection. We propose a simple algorithm for structure estimation based on thresholding of the empirical conditional variation distances. We introduce a novel criterion for tractable graph families, where this method is efficient, based on the presence of sparse local separators between node pairs in the underlying graph. For such graphs, the proposed algorithm has a sample complexity of n = Ω(J_min^{-2} log p), where p is the number of variables and J_min is the minimum (absolute) edge potential in the model. We also establish non-asymptotic necessary and sufficient conditions for structure estimation.
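The thresholding scheme can be illustrated as follows. The names `cvd` and `estimate_edges` are hypothetical, and the plug-in estimator and exhaustive separator search are a simplification of the paper's algorithm: for ±1 spins the conditional variation distance reduces to a gap between two conditional probabilities, and an edge is kept only if no small conditioning set drives that gap below the threshold.

```python
import itertools
import numpy as np

def cvd(X, i, j, S):
    """Plug-in conditional variation distance: the largest gap, over
    configurations of the conditioning set S, between P(X_j = 1 | X_i = +1)
    and P(X_j = 1 | X_i = -1). For binary spins this equals the total
    variation distance between the two conditional distributions."""
    best = 0.0
    for s in itertools.product([-1, 1], repeat=len(S)):
        mask = np.all(X[:, S] == np.array(s), axis=1) if S else np.ones(len(X), bool)
        plus = X[mask & (X[:, i] == 1)]
        minus = X[mask & (X[:, i] == -1)]
        if len(plus) and len(minus):
            best = max(best, abs(np.mean(plus[:, j] == 1) - np.mean(minus[:, j] == 1)))
    return best

def estimate_edges(X, eta, tau):
    """Declare an edge (i, j) unless some conditioning set of size <= eta
    drives the conditional variation distance below the threshold tau,
    i.e. unless a sparse local separator for the pair is found."""
    p = X.shape[1]
    edges = set()
    for i, j in itertools.combinations(range(p), 2):
        others = [k for k in range(p) if k not in (i, j)]
        separated = any(
            cvd(X, i, j, list(S)) < tau
            for size in range(eta + 1)
            for S in itertools.combinations(others, size)
        )
        if not separated:
            edges.add((i, j))
    return edges
```

The search over conditioning sets is exponential in eta; the tractability claim in the abstract rests on eta staying small for graph families with sparse local separators.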
Assessing the validity domains of graphical Gaussian models in order to infer relationships among components of complex biological systems
, 2008
Abstract

Cited by 9 (0 self)
The study of the interactions of cellular components is an essential first step toward understanding the structure and dynamics of biological networks, and various methods have recently been developed for this purpose. While most of them combine different types of data and a priori knowledge, methods based on Graphical Gaussian Models are capable of learning the network directly from raw data. They model direct links between variables using full-order partial correlations, i.e. partial correlations between two variables given all the remaining ones. Statistical methods were developed for estimating these links when the number of observations is larger than the number of variables. However, the rapid advance of new technologies that allow genome-wide expression to be measured simultaneously has led to large-scale datasets where the number of variables is far larger than the number of observations. To get around this dimensionality problem, different strategies and new statistical methods have been proposed. In this study we focus on recently published statistical methods. All are based on the fact that the number of direct relationships between variables is very small compared with the number of possible relationships, p(p−1)/2. In the biological context, this assumption is not always satisfied over the whole graph, so it is essential to know precisely how the methods behave with respect to the characteristics of the studied object before applying them. For this purpose, we evaluated the validity domain of each method on wide-ranging simulated datasets. We then illustrate our results using recently published biological data.
Characterization and greedy learning of interventional Markov equivalence classes of directed acyclic graphs
, 2012
Abstract

Cited by 8 (2 self)
The investigation of directed acyclic graphs (DAGs) encoding the same Markov property, that is, the same conditional independence relations of multivariate observational distributions, has a long tradition; many algorithms exist for model selection and structure learning in Markov equivalence classes. In this paper, we extend the notion of Markov equivalence of DAGs to the case of interventional distributions arising from multiple intervention experiments. We show that, under reasonable assumptions on the intervention experiments, interventional Markov equivalence defines a finer partitioning of DAGs than observational Markov equivalence and hence improves the identifiability of causal models. We give a graph-theoretic criterion for two DAGs being Markov equivalent under interventions and show that each interventional Markov equivalence class can, analogously to the observational case, be uniquely represented by a chain graph called the interventional essential graph (also known as the CPDAG in the observational case). These are key insights for deriving a generalization of the Greedy Equivalence Search algorithm aimed at structure learning from interventional data. This new algorithm is evaluated in a simulation study.