Results 1–10 of 14
LASSOPatternsearch Algorithm with Application to Ophthalmology and Genomic Data
, 2008
Abstract

Cited by 29 (22 self)
The LASSOPatternsearch algorithm is proposed to efficiently identify patterns of multiple dichotomous risk factors for outcomes of interest in demographic and genomic studies. The patterns considered are those that arise naturally from the log-linear expansion of the multivariate Bernoulli density. The method is designed for the case where there is a possibly very large number of candidate patterns but it is believed that only a relatively small number are important. A LASSO is used to greatly reduce the number of candidate patterns, using a novel computational algorithm that can handle an extremely large number of unknowns simultaneously. The patterns surviving the LASSO are further pruned in the framework of (parametric) generalized linear models. A novel tuning procedure based on the GACV for Bernoulli outcomes, modified to act as a model selector, is used at both steps.
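As a concrete illustration of the pattern basis the abstract describes, the sketch below enumerates the patterns arising from the log-linear expansion for a small set of dichotomous risk factors: one term per subset of variables up to a chosen order, active when every factor in the subset is present. This is a toy sketch of the basis construction only, not the authors' algorithm; the function name and example data are ours.

```python
from itertools import combinations

def pattern_basis(x, max_order):
    """Expand a vector of 0/1 risk factors into the pattern basis of the
    log-linear expansion of the multivariate Bernoulli density: one term
    per subset of variables up to max_order, whose value is the product
    of the variables in the subset (1 iff all factors are present)."""
    p = len(x)
    patterns = []
    for order in range(1, max_order + 1):
        for subset in combinations(range(p), order):
            patterns.append((subset, all(x[j] == 1 for j in subset)))
    return patterns

# Example: three dichotomous risk factors, patterns up to second order.
x = [1, 0, 1]
basis = pattern_basis(x, max_order=2)
active = [subset for subset, on in basis if on]
print(active)  # [(0,), (2,), (0, 2)]
```

With many factors the number of candidate patterns grows combinatorially, which is exactly why the LASSO screening step is needed.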
Regression Tree Models for Designed Experiments
 IMS Lecture
Abstract

Cited by 4 (2 self)
Although regression trees were originally designed for large datasets, they can profitably be used on small datasets as well, including those from replicated or unreplicated complete factorial experiments. We show that in the latter situations, regression tree models can provide simpler and more intuitive interpretations of interaction effects as differences between conditional main effects. We present simulation results to verify that the models can yield lower prediction mean squared errors than the traditional techniques. The tree models span a wide range of sophistication, from piecewise constant to piecewise simple and multiple linear, and from least squares to Poisson and logistic regression.
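The reading of an interaction as a difference between conditional main effects can be checked with simple arithmetic on a 2×2 factorial. The cell means below are made up for illustration:

```python
# Cell means of a 2x2 factorial: y[(a, b)], factors A and B at levels 0/1.
y = {(0, 0): 10.0, (1, 0): 12.0, (0, 1): 11.0, (1, 1): 19.0}

# Conditional main effects of A at each level of B.
effect_A_at_B0 = y[(1, 0)] - y[(0, 0)]   # 2.0
effect_A_at_B1 = y[(1, 1)] - y[(0, 1)]   # 8.0

# The classical AB interaction is half the difference of the conditional
# main effects -- the quantity a tree split on B makes directly visible.
interaction_AB = (effect_A_at_B1 - effect_A_at_B0) / 2  # 3.0
print(effect_A_at_B0, effect_A_at_B1, interaction_AB)
```

A regression tree that splits on B and reports the effect of A in each child presents exactly these two conditional main effects.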
On Oblique Random Forests
Abstract

Cited by 3 (1 self)
In his original paper on random forests, Breiman proposed two different decision tree ensembles: one generated from “orthogonal” trees with thresholds on individual features in every split, and one from “oblique” trees separating the feature space by randomly oriented hyperplanes. In spite of a rising interest in the random forest framework, however, ensembles built from orthogonal trees (RF) have gained most, if not all, attention so far. In the present work we propose to employ “oblique” random forests (oRF) built from multivariate trees which explicitly learn optimal split directions at internal nodes using linear discriminative models, rather than using random coefficients as in the original oRF. This oRF outperforms RF, as well as other classifiers, on nearly all data sets but those with discrete factorial features. Learned node models perform distinctively better than random splits. An oRF feature importance score proves preferable to standard RF feature importance scores such as Gini or permutation importance. The topology of the oRF decision space appears to be smoother and better adapted to the data, resulting in improved generalization performance. Overall, the oRF proposed here may be preferred over standard RF on most learning tasks involving numerical and spectral data.
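A single oblique split can be sketched in a few lines: project the samples onto a learned direction and threshold the projection, instead of thresholding one feature. The sketch below uses the difference of class means as a cheap stand-in for the paper's linear discriminative node models; all names and data are illustrative:

```python
import numpy as np

def oblique_split(X, y):
    """One oblique split: threshold a learned linear projection rather
    than a single feature. The direction here is the difference of the
    class means -- a simple stand-in for a linear discriminative model."""
    mu0 = X[y == 0].mean(axis=0)
    mu1 = X[y == 1].mean(axis=0)
    w = mu1 - mu0                       # hyperplane normal
    t = ((mu0 + mu1) / 2) @ w           # threshold at the projected midpoint
    return w, t

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (50, 2)),   # class 0
               rng.normal(2.0, 0.5, (50, 2))])  # class 1
y = np.array([0] * 50 + [1] * 50)
w, t = oblique_split(X, y)
side = (X @ w > t).astype(int)
print((side == y).mean())   # close to 1.0 on this well-separated toy data
```

An oblique forest recurses this idea at every internal node, with each tree seeing a different bootstrap sample and feature subset.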
LASSOPatternsearch Algorithm
, 2008
Abstract

Cited by 1 (0 self)
The LASSOPatternsearch Algorithm and its variant the Grouped LASSOPatternsearch Algorithm are proposed to efficiently identify patterns of multiple dichotomous risk factors for outcomes of interest in demographic and genomic studies. The patterns considered are those that arise naturally from the log-linear expansion of the multivariate Bernoulli density. Both methods are designed for the case where there is a possibly very large number of candidate patterns but it is believed that only a relatively small number are important. In the LASSOPatternsearch Algorithm, a LASSO is used to greatly reduce the number of candidate patterns, using a novel computational algorithm that can handle an extremely large number of unknowns simultaneously. The patterns surviving the LASSO are further pruned in the framework of (parametric) generalized linear models. A novel tuning procedure based on the GACV for Bernoulli outcomes, modified to act as a model selector, is used at both steps. We applied the method to myopia data from the population-based Beaver Dam Eye Study, exposing physiologically interesting interacting risk factors. We then …
Initialising Neural Networks with Prior Knowledge
, 2006
Abstract

Cited by 1 (1 self)
This thesis explores the relationship between two classification models: decision trees and multilayer perceptrons. Decision trees carve up databases into box-shaped regions, and make predictions based on the majority class in each box. They are quick to build and relatively easy to interpret. Multilayer perceptrons (MLPs) are often more accurate than decision trees, because they are able to use soft, curved, arbitrarily oriented decision boundaries. Unfortunately, MLPs typically require a great deal of effort to determine a good number and arrangement of neural units, and then require many passes through the database to determine a good set of connection weights. The cost of creating and training an MLP is thus hundreds of times greater than the cost of creating a decision tree, for perhaps only a small gain in accuracy. The following scheme is proposed for reducing the computational cost of creating and training MLPs. First, build and prune a decision tree to generate prior knowledge of the database. Then, use that knowledge to determine the initial …
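One common way to carry tree structure into an MLP, in the spirit of the scheme sketched in this abstract, is to turn each axis-parallel test "x[f] > t" into a hidden unit whose pre-activation has the same sign as the test. The mapping below is our illustration of that general idea, not the thesis's actual construction:

```python
import numpy as np

# Internal nodes of a (hypothetical) pruned decision tree, each an
# axis-parallel test "x[f] > t". Each test becomes one hidden unit:
# weight vector = unit vector on feature f, bias = -t, so the sign of
# the unit's pre-activation reproduces the split decision.
tree_splits = [(0, 2.5), (1, -1.0)]     # (feature index, threshold)
n_features = 3

W = np.zeros((len(tree_splits), n_features))
b = np.zeros(len(tree_splits))
for i, (f, t) in enumerate(tree_splits):
    W[i, f] = 1.0
    b[i] = -t

x = np.array([3.0, -2.0, 0.0])
pre = W @ x + b                         # [0.5, -1.0]
hidden = np.tanh(pre)                   # soft version of the hard splits
print(pre)
```

Training then starts from boundaries the tree already found, instead of from random weights, so fewer passes through the database are needed.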
Classification and Regression Tree Methods (in Encyclopedia of Statistics in Quality and Reliability)
Abstract

Cited by 1 (0 self)
Keywords: selection bias, unbiased. A classification or regression tree is a prediction model that can be represented as a decision tree. This article discusses the C4.5, CART, CRUISE, GUIDE, and QUEST methods in terms of their algorithms, features, properties, and performance.
LASSOPatternsearch Algorithm with Application to Ophthalmology and Genomic Data
, 2008
Abstract
The LASSOPatternsearch algorithm is proposed as a two-step method to identify clusters or patterns of multiple risk factors for outcomes of interest in demographic and genomic studies. The predictor variables are dichotomous or can be coded as dichotomous. Many diseases are suspected of having multiple interacting risk factors acting in concert, and it is of much interest to uncover higher order interactions or risk patterns when they exist. The patterns considered here are those that arise naturally from the log-linear expansion of the multivariate Bernoulli density. The method is designed for the case where there is a possibly very large number of candidate patterns but it is believed that only a relatively small number are important. A LASSO is used to greatly reduce the number of candidate patterns, using a novel computational algorithm that can handle an extremely large number of unknowns simultaneously. Then the patterns surviving the LASSO are further pruned in the framework of (parametric) generalized linear models. A novel tuning procedure based on the GACV for Bernoulli …
Evolutionary Algorithms in Decision Tree Induction
Abstract
One of the biggest problems that many data analysis techniques have to deal with nowadays is combinatorial optimization, which in the past has led many methods to be set aside. Today, the greater (though still insufficient!) computing power available makes it possible to apply such techniques within certain bounds. Since other research fields like Artificial Intelligence …
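The use of evolutionary search in tree induction can be illustrated with a deliberately tiny example: evolving a single split threshold by mutation and selection. Real systems evolve whole trees with crossover between subtrees; everything below is our toy sketch:

```python
import random

# Toy evolutionary search over decision-tree split thresholds: a
# population of candidate thresholds is mutated and selected by the
# number of correctly separated 1-D points.
random.seed(0)
xs = [0.5, 1.0, 1.5, 4.0, 4.5, 5.0]
ys = [0, 0, 0, 1, 1, 1]

def fitness(t):
    """Number of points the split 'x > t' labels correctly."""
    return sum((x > t) == bool(y) for x, y in zip(xs, ys))

pop = [random.uniform(0, 6) for _ in range(8)]
for _ in range(30):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:4]                               # elitist selection
    pop = parents + [p + random.gauss(0, 0.3) for p in parents]  # mutate

best = max(pop, key=fitness)
print(fitness(best))  # 6: a perfect split between 1.5 and 4.0 is found
```

The appeal over greedy induction is that selection can escape locally optimal splits, at the cost of many more fitness evaluations.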
Stepwise Induction of Logistic Model Trees
Abstract
In statistics, logistic regression is a regression model to predict a binomially distributed response variable. Recent research has investigated the possibility of combining logistic regression with decision tree learners. Following this idea, we propose a novel Logistic Model Tree induction system, SILoRT, which induces trees with two types of nodes: regression nodes, which perform only univariate logistic regression, and splitting nodes, which partition the feature space. The multiple regression model associated with a leaf is then built stepwise by combining univariate logistic regressions along the path from the root to the leaf. Internal regression nodes contribute to the definition of multiple models and have a global effect, while univariate regressions at leaves have only local effects. Experimental results are reported.
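The path-wise combination described above can be sketched with hand-picked (entirely made-up) coefficients: a root regression node contributes a univariate logit with global effect, a splitting node routes the sample, and each leaf adds its own local univariate logit. The leaf's multiple model is the sum of the logits accumulated from the root down:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(x1, x2, x3):
    """Illustrative logistic model tree (coefficients are invented):
    root regression node on x1 (global effect), splitting node on x2,
    leaf regression nodes on x3 (local effect)."""
    logit = -1.0 + 0.8 * x1             # root regression node
    if x2 <= 0.5:                       # splitting node
        logit += 0.5 - 1.2 * x3         # left-leaf univariate model
    else:
        logit += -0.3 + 2.0 * x3        # right-leaf univariate model
    return sigmoid(logit)

p = predict(x1=2.0, x2=0.0, x3=1.0)
# path logit: (-1.0 + 1.6) + (0.5 - 1.2) = -0.1
print(round(p, 4))  # 0.475
```

Because the root's term appears in every leaf's model, changing it shifts all predictions, while a leaf's own term affects only samples routed to that leaf.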
LASSOPatternsearch Algorithm with Application to Ophthalmology Data
, 2008
Abstract
The LASSOPatternsearch is proposed as a two-stage procedure to identify clusters of multiple risk factors for outcomes of interest in large demographic studies, when the predictor variables are dichotomous or take on values in a small finite set. Many diseases are suspected of having multiple interacting risk factors acting in concert, and it is of much interest to uncover higher order interactions when they exist. The method is related to Zhang et al. (2004) except that variable flexibility is sacrificed to allow entertaining models with high as well as low order interactions among multiple predictors. A LASSO is used to select important patterns, being applied conservatively to have a high rate of retention of true patterns, while allowing some noise. Then the patterns selected by the LASSO are tested in the framework of (parametric) generalized linear models to reduce the noise. Notably, the patterns are those that arise naturally from the log-linear expansion of the multivariate Bernoulli density. Separate tuning procedures are proposed for the LASSO step and then the parametric step, and a novel …