Results 1-10 of 14
The group-lasso for generalized linear models: uniqueness of solutions and efficient
2008
Abstract

Cited by 71 (0 self)
The Group-Lasso method for finding important explanatory factors suffers from the potential non-uniqueness of solutions and also from high computational costs. We formulate conditions for the uniqueness of Group-Lasso solutions which lead to an easily implementable test procedure that allows us to identify all potentially active groups. These results are used to derive an efficient algorithm that can deal with input dimensions in the millions and can approximate the solution path efficiently. The derived methods are applied to large-scale learning problems where they exhibit excellent performance and where the testing procedure helps to avoid misinterpretations of the solutions.
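To make the group-lasso objective and the idea of "potentially active groups" concrete, here is a minimal sketch. It assumes the simplest case, a least-squares loss, rather than a general GLM, and it uses plain proximal gradient descent with block soft-thresholding. This is not the authors' active-set algorithm or their exact uniqueness test. The helper kkt_group_scores (a hypothetical name) reports ||X_g^T (y - X b)||_2 for each group g. By the standard optimality conditions, a group whose score is strictly below lambda must be zero at the optimum. Groups whose score equals lambda are the candidates for being active.

```python
import math

def group_soft_threshold(v, t):
    """Block soft-thresholding: the proximal operator of t * ||v||_2."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm <= t:
        return [0.0] * len(v)          # the whole group is switched off
    scale = 1.0 - t / norm
    return [scale * x for x in v]      # the group is shrunk toward zero

def group_lasso_ls(X, y, groups, lam, step, iters=500):
    """Proximal gradient for 0.5*||y - X b||^2 + lam * sum_g ||b_g||_2.

    X is a list of n rows with p floats each; groups is a list of
    disjoint index lists covering 0..p-1; step should be <= 1/L, where
    L is the largest eigenvalue of X^T X.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(iters):
        r = [sum(X[i][j] * b[j] for j in range(p)) - y[i] for i in range(n)]
        grad = [sum(X[i][j] * r[i] for i in range(n)) for j in range(p)]
        z = [b[j] - step * grad[j] for j in range(p)]   # gradient step
        for g in groups:                                  # proximal step
            block = group_soft_threshold([z[j] for j in g], step * lam)
            for j, val in zip(g, block):
                b[j] = val
    return b

def kkt_group_scores(X, y, groups, b):
    """||X_g^T (y - X b)||_2 per group: < lam means inactive, == lam means candidate."""
    n, p = len(X), len(X[0])
    resid = [y[i] - sum(X[i][j] * b[j] for j in range(p)) for i in range(n)]
    corr = [sum(X[i][j] * resid[i] for i in range(n)) for j in range(p)]
    return [math.sqrt(sum(corr[j] ** 2 for j in g)) for g in groups]

# Toy usage with an orthogonal design (X = identity), where the solution is
# simply block soft-thresholding of y, group by group.
X = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
y = [3.0, 4.0, 0.3, 0.4]
groups = [[0, 1], [2, 3]]
b = group_lasso_ls(X, y, groups, lam=1.0, step=1.0)
scores = kkt_group_scores(X, y, groups, b)
```

With lambda = 1, the first group (norm 5) is shrunk to [2.4, 3.2] and scores exactly lambda. The second group (norm 0.5) is set to zero and scores 0.5 < lambda, so it is certifiably inactive.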
Using Markov Blankets for Causal Structure Learning
Abstract

Cited by 2 (0 self)
We show how a generic feature-selection algorithm returning strongly relevant variables can be turned into a causal structure-learning algorithm. We prove this under the Faithfulness assumption for the data distribution. In a causal graph, the strongly relevant variables for a node X are its parents, children, and children’s parents (or spouses), also known as the Markov blanket of X. Identifying the spouses leads to the detection of the V-structure patterns and thus to causal orientations. Repeating the task for all variables yields a valid partially oriented causal graph. We first show an efficient way to identify the spouse links. We then perform several experiments in the continuous domain using the Recursive Feature Elimination feature-selection algorithm with Support Vector Regression and empirically verify the intuition of this direct (but computationally expensive) approach. Within the same framework, we then devise a fast and consistent algorithm, Total Conditioning (TC), and a variant, TCbw, with an explicit backward feature-selection heuristic, for Gaussian data. After running a series of comparative experiments on five artificial networks, we argue that Markov blanket algorithms such as TC/TCbw or Grow-Shrink scale better than the reference PC algorithm and provide higher structural accuracy.
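The following sketch illustrates the graph-theoretic definitions used above. The Markov blanket of a node is its parents, children, and spouses, and a spouse link that is not also a direct edge reveals a V-structure a -> c <- b. It assumes the DAG is already known and is given as a dictionary mapping each node to its set of parents (a hypothetical representation). The variable names are purely illustrative. The paper learns these sets from data. This code only reads them off a given graph.

```python
def markov_blanket(parents, node):
    """Parents, children and spouses (other parents of children) of node."""
    pa = set(parents.get(node, ()))
    children = {c for c, ps in parents.items() if node in ps}
    spouses = {p for c in children for p in parents[c]} - {node}
    return pa | children | spouses

def v_structures(parents):
    """All triples (a, c, b) with a -> c <- b where a and b are non-adjacent."""
    def adjacent(u, v):
        return u in parents.get(v, ()) or v in parents.get(u, ())
    found = set()
    for c, ps in parents.items():
        ps = sorted(ps)
        for i in range(len(ps)):
            for j in range(i + 1, len(ps)):
                a, b = ps[i], ps[j]
                if not adjacent(a, b):
                    found.add((a, c, b))
    return found

# Hypothetical DAG: Smoking -> LungCancer <- Genetics, LungCancer -> Coughing
parents = {
    "Smoking": set(),
    "Genetics": set(),
    "LungCancer": {"Smoking", "Genetics"},
    "Coughing": {"LungCancer"},
}
```

Here Genetics belongs to the Markov blanket of Smoking only as a spouse, with no direct edge between them. That spouse link is exactly what identifies the V-structure Genetics -> LungCancer <- Smoking and orients both of its edges.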
On the Complexity of Discrete Feature Selection for Optimal Classification
Abstract

Cited by 1 (0 self)
N.B.: When citing this work, cite the original article. ©2009 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
Joint Markov Blankets in Feature Sets Extracted from Wavelet Packet Decompositions
Entropy, 2011
Design and Analysis of the Causation and Prediction Challenge
WCCI 2008 workshop on causality
Abstract
We organized for WCCI 2008 a challenge to evaluate causal modeling techniques, focusing on predicting the effect of “interventions” performed by an external agent. Examples of that problem are found in the medical domain to predict the effect of a drug prior to administering it, or in econometrics to predict the effect of a new policy prior to issuing it. We concentrate on a given target variable to be predicted (e.g., health status of a patient) from a number of candidate predictive variables or “features” (e.g., risk factors in the medical domain). Under interventions, variable predictive power and causality are tied together. For instance, both smoking and coughing may be predictive of lung cancer (the target) in the absence of external intervention; however, prohibiting smoking (a possible cause) may prevent lung cancer, but administering a cough medicine to stop coughing (a possible consequence) would not. We propose four tasks from various application domains, each dataset including a training set drawn from a “natural” distribution and three test sets: one from the same distribution as the training set and two corresponding to data drawn when an external agent is manipulating certain variables. The goal is to predict a binary
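The smoking and coughing example can be reproduced with a small simulation. It assumes a hypothetical causal chain Smoking -> LungCancer -> Coughing with made-up probabilities, which are not taken from the challenge data. Observationally, coughing is strongly predictive of cancer. Under an intervention that sets coughing (the do(...) operation), that association disappears. Intervening on smoking, by contrast, changes the cancer rate.

```python
import random

def sample(rng, do_smoke=None, do_cough=None):
    """Draw one (smoke, cancer, cough) record from a hypothetical model.

    Passing do_smoke / do_cough forces that variable to a fixed value,
    cutting it off from its causes (an external intervention).
    """
    smoke = (rng.random() < 0.3) if do_smoke is None else do_smoke
    cancer = rng.random() < (0.2 if smoke else 0.02)
    cough = (rng.random() < (0.8 if cancer else 0.1)) if do_cough is None else do_cough
    return smoke, cancer, cough

def simulate(n=100_000, seed=0, **do):
    rng = random.Random(seed)
    return [sample(rng, **do) for _ in range(n)]

def cancer_rate(records, cond=lambda r: True):
    """Fraction of records with cancer among those satisfying cond."""
    selected = [r for r in records if cond(r)]
    return sum(1 for r in selected if r[1]) / len(selected)

obs = simulate()
p_cancer_given_cough = cancer_rate(obs, lambda r: r[2])          # about 0.39
p_cancer_given_no_cough = cancer_rate(obs, lambda r: not r[2])   # about 0.017
p_cancer_do_no_cough = cancer_rate(simulate(do_cough=False))     # about 0.074
p_cancer_do_no_smoke = cancer_rate(simulate(do_smoke=False))     # about 0.02
```

The approximate values in the comments follow from the assumed probabilities: P(cancer) = 0.3 * 0.2 + 0.7 * 0.02 = 0.074. Suppressing coughing leaves the cancer rate at that baseline, while prohibiting smoking lowers it to 0.02.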
A novel scalable and correct Markov boundary learning algorithm under faithfulness condition
Abstract
In this paper, we propose a novel constraint-based Markov boundary discovery algorithm, called MBOR, that scales up to hundreds of thousands of variables. Its correctness under the faithfulness condition is guaranteed. A thorough empirical evaluation of MBOR’s robustness, efficiency and scalability is provided on synthetic databases involving thousands of variables. Our experimental results show a clear benefit in several situations: large Markov boundaries, weak associations and approximate functional dependencies among the variables.
Statistical Methods for Genomewide Association Studies and Personalized Medicine
2014
Abstract
In genome-wide association studies (GWAS), researchers analyze the genetic variation across the entire human genome, searching for variations that are associated with observable traits or certain diseases. There are several inference challenges in GWAS, including the huge number of genetic markers to test, the weak association between truly associated markers and the traits, and the correlation structure between the genetic markers. This thesis mainly develops statistical methods that are suitable for genome-wide association studies and their clinical translation for personalized medicine. After introducing background and related work in Chapters 1 and 2, we discuss the problem of high-dimensional statistical inference, especially capturing the dependence among multiple hypotheses, which has been underutilized in classical multiple testing procedures. Chapter 3 proposes a feature selection approach based on a unique graphical model that can leverage the correlation structure among the markers. This graphical model-based feature selection approach significantly outperforms the conventional feature selection methods used in GWAS. Chapter 4 reformulates this feature selection approach as a multiple testing procedure that has many elegant
Editor: n/a
Abstract
We show how a generic feature selection algorithm returning strongly relevant variables can be turned into a causal structure learning algorithm. We prove this under the Faithfulness assumption for the data distribution. In a causal graph, the strongly relevant variables for a node X are its parents, children, and children’s parents (or spouses), also known as the Markov blanket of X. Identifying the spouses leads to the detection of the V-structure patterns and thus to causal orientations. Repeating the task for all variables yields a valid partially oriented causal graph. We first show an efficient way to identify the spouse links. We then perform several experiments in the continuous domain using the Recursive Feature Elimination feature selection algorithm with Support Vector Regression and empirically verify the intuition of this direct (but computationally expensive) approach. Within the same framework, we then devise a fast and consistent algorithm, Total Conditioning (TC), and a variant, TCbw, with an explicit backward feature selection heuristic, for Gaussian data. After running a series of comparative experiments on five artificial networks, we argue that Markov blanket algorithms such as TC/TCbw or Grow-Shrink scale better than the reference PC algorithm and achieve better structural accuracy.
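For Gaussian data, the name Total Conditioning suggests the following test: X and Y are placed in each other's Markov blanket when they remain dependent after conditioning on all other variables. The sketch below implements that idea under stated assumptions. The partial correlation given all remaining variables is read from the inverse sample covariance (precision) matrix, and a Fisher z-test with a hypothetical significance level alpha decides dependence. The paper's exact test and thresholds may differ. The data come from a made-up linear Gaussian model A -> C <- B, C -> D.

```python
import math
import numpy as np

def total_conditioning_mb(data, alpha=1e-4):
    """Markov blankets from partial correlations given all other variables.

    data: (n, p) array of samples. Returns {i: set of j in MB(i)}.
    """
    n, p = data.shape
    theta = np.linalg.inv(np.cov(data, rowvar=False))  # precision matrix
    mb = {i: set() for i in range(p)}
    for i in range(p):
        for j in range(i + 1, p):
            # partial correlation of i and j given the other p - 2 variables
            r = -theta[i, j] / math.sqrt(theta[i, i] * theta[j, j])
            # Fisher z statistic with a conditioning set of size p - 2
            z = math.atanh(r) * math.sqrt(n - (p - 2) - 3)
            p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided
            if p_value < alpha:
                mb[i].add(j)
                mb[j].add(i)
    return mb

# Hypothetical linear Gaussian model: A -> C <- B, C -> D (columns 0..3)
rng = np.random.default_rng(0)
n = 2000
A = rng.standard_normal(n)
B = rng.standard_normal(n)
C = A + B + rng.standard_normal(n)
D = C + rng.standard_normal(n)
mb = total_conditioning_mb(np.column_stack([A, B, C, D]))
```

A and B are marginally independent. However, conditioning on their common child C makes them dependent (population partial correlation -0.5), so B is recovered as a spouse of A. D depends on A only through C, so it is excluded from the blanket of A.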
The All Relevant Feature Selection using Random Forest
Abstract
In this paper we examine the application of the random forest classifier to the all-relevant feature selection problem. To this end we first examine two recently proposed all-relevant feature selection algorithms, both of them random forest wrappers, on a series of synthetic data sets of varying size. We show that reasonable accuracy of predictions can be achieved and that heuristic algorithms designed to handle the all-relevant problem have performance close to that of the reference ideal algorithm. Then, we apply one of the algorithms to four families of semi-synthetic data sets to assess how the properties of a particular data set influence the results of feature selection. Finally, we test the procedure using a well-known gene expression data set. The relevance of nearly all previously established important genes was confirmed; moreover, the relevance of several new ones was discovered.