## Exploiting

### BibTeX

@MISC{Huynh-thu_exploiting,

author = {Vân Anh Huynh-thu and Louis Wehenkel and Pierre Geurts},

title = {Exploiting},

year = {}

}

### Abstract

New challenges for feature selection

### Citations

4609 | Classification and Regression Trees - Breiman, Friedman, et al. - 1984
Citation Context ...ning sample of N input-output pairs drawn from some unknown probability distribution. The m input variables are denoted $f_i$, $i = 1, \ldots, m$. Tree-based methods. The basic idea of classification trees (Breiman et al., 1984) is to recursively split the learning sample with binary tests based each on one input variable trying to reduce as much as possible the uncertainty about the output classification in the resulting s... |

1765 | Random forests - Breiman |

768 | Gene selection for cancer classification using support vector machines - Guyon, Weston, et al. - 2002
Citation Context ...mutation based importance measure (Breiman, 2003). On the other hand, we have already carried out some preliminary experiments with variable importances derived from the weights of linear SVM models (Guyon et al., 2002). One problem however with these importances is that their interpretation is dependent on one hand on the scaling of the input variables and on the other hand on the specific margin that can be achie... |

444 | Statistical significance for genomewide studies - Storey, Tibshirani - 2003
Citation Context ...ariables whose importance is greater than $v$ are relevant and our concern is to estimate the expected rate of truly irrelevant features among these variables, the so-called false discovery rate (FDR) (Storey and Tibshirani, 2003). More formally, for a given importance threshold $v$, the FDR is defined as: $\mathrm{FDR}(v) = E\left[\frac{F(v)}{S(v)}\right]$ (2), where $S(v)$ is the number of variables considered relevant at threshold $v$ and $F(v)$ is the num... |

207 | Resampling-Based Multiple Testing: Examples and Methods for P-Value Adjustment, ISBN 0-471-55761-7 - Westfall, Young - 1993 |

143 | A review of feature selection techniques in bioinformatics - Saeys, Inza, et al. - 2007 |

92 | Resampling-based multiple testing for microarray data analysis - Ge, Dudoit, et al. - 2003 |

62 | Microarray expression profiling identifies genes with altered expression in HDL-deficient mice - Callow, Dudoit, et al. - 2000 |

51 | Ranking a random feature for variable and feature selection - Stoppiglia, Dreyfus, et al. |

49 | Automatic learning techniques in power systems - Wehenkel - 1997
Citation Context ...rwise. Variable importance measure. Several variable importance measures have been proposed in the literature for tree-based methods. In this paper, we consider a measure based on information theory (Wehenkel, 1998), which at each test node $n$ computes the total reduction of the class entropy due to the split, defined by: $I(n) = \#S \cdot H_C(S) - \#S_t \cdot H_C(S_t) - \#S_f \cdot H_C(S_f)$ (1), where $S$ denotes the set of samples that reac... |

32 | Manual on Setting Up, Using, and Understanding Random Forests v3.1. http://www.stat.berkeley.edu/users/breiman/RandomForests/cc.home.htm - Breiman
Citation Context ...n measures based on variance reduction (Breiman et al., 1984). It would be interesting also to consider other importance measures such as, for example, Breiman’s permutation based importance measure (Breiman, 2003). On the other hand, we have already carried out some preliminary experiments with variable importances derived from the weights of linear SVM models (Guyon et al., 2002). One problem however with th... |

15 | Determining the number of non-spurious arcs in a learned dag model: Investigation of a bayesian and a frequentist approach - Listgarten, Heckerman |

5 | Feature selection using ensemble based ranking against artificial contrasts - Tuv, Borisov, et al. - 2006
Citation Context ...multivariate context is also an interesting future work direction. Two related approaches to selectively identify relevant features from a ranking have been proposed in (Stoppiglia et al., 2003) and (Tuv et al., 2006). The common idea of both methods is to include random features in the learning sample and then to exploit their rank among the original features to determine a relevance threshold. Stoppiglia et al.... |

1 | 72 tree-based variable importances - Geurts, Ernst, et al. |
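
The node importance quoted in the Wehenkel citation context, the total reduction of class entropy due to a split, can be illustrated with a minimal Python sketch. The excerpt is truncated before it defines every symbol, so the following are assumptions rather than the authors' stated definitions: `S_t` and `S_f` are taken to be the samples for which the node's binary test is true and false, `H_C` is taken to be Shannon entropy in bits, and the function names `class_entropy` and `node_importance` are hypothetical.

```python
import math
from collections import Counter


def class_entropy(labels):
    """Shannon entropy of the class labels, in bits (log base is an assumption)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def node_importance(labels, test_outcomes):
    """Size-weighted entropy reduction: #S*H(S) - #S_t*H(S_t) - #S_f*H(S_f).

    labels        -- class labels of the samples reaching the node (S)
    test_outcomes -- booleans, True where the node's test holds (assumed S_t)
    """
    s_t = [y for y, ok in zip(labels, test_outcomes) if ok]
    s_f = [y for y, ok in zip(labels, test_outcomes) if not ok]
    return (len(labels) * class_entropy(labels)
            - len(s_t) * class_entropy(s_t)
            - len(s_f) * class_entropy(s_f))


labels = ["a", "a", "b", "b"]
perfect = node_importance(labels, [True, True, False, False])
useless = node_importance(labels, [True, False, True, False])
```

Under these assumptions, a test that separates the two classes perfectly removes all of the node's size-weighted entropy, while a test that leaves both subsets with the same class mix as the parent yields zero importance.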