Results 1-10 of 30
Model selection and estimation in regression with grouped variables
Journal of the Royal Statistical Society, Series B, 2006
Cited by 510 (7 self)
We consider the problem of selecting grouped variables (factors) for accurate prediction in regression. Such a problem arises naturally in many practical situations with the multifactor ANOVA problem as the most important and well known example. Instead of selecting factors by stepwise backward elimination, we focus on estimation accuracy and consider extensions of the LASSO, the LARS, and the nonnegative garrote for factor selection. The LASSO, the LARS, and the nonnegative garrote are recently proposed regression methods that can be used to select individual variables. We study and propose efficient algorithms for the extensions of these methods for factor selection, and show that these extensions give superior performance to the traditional stepwise backward elimination method in factor selection problems. We study the similarities and the differences among these methods. Simulations and real examples are used to illustrate the methods.
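To illustrate the factor-selection idea described in this abstract, the following is a minimal sketch of a group-penalized least-squares fit solved by proximal gradient descent. It is not the algorithm proposed in the paper. The function names, the square-root-of-group-size weighting, and the fixed step size are assumptions made for this example only.

```python
import numpy as np


def group_soft_threshold(v, t):
    """Shrink the whole vector v toward zero; return zeros if its norm is <= t."""
    norm = np.linalg.norm(v)
    if norm <= t:
        return np.zeros_like(v)
    return (1.0 - t / norm) * v


def group_lasso(X, y, groups, lam, n_iter=500):
    """Proximal gradient sketch for
    0.5 * ||y - X b||^2 + lam * sum_j sqrt(p_j) * ||b_j||_2,
    where `groups` is a list of column-index lists (an assumed input format)."""
    n, p = X.shape
    beta = np.zeros(p)
    # Step size 1/L, with L the squared spectral norm of X.
    step = 1.0 / np.linalg.norm(X, 2) ** 2
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y)
        z = beta - step * grad
        for idx in groups:
            # Each group (factor) is kept or dropped as a unit.
            t = step * lam * np.sqrt(len(idx))
            z[idx] = group_soft_threshold(z[idx], t)
        beta = z
    return beta
```

Because the threshold acts on the norm of each coefficient block, a factor enters or leaves the model with all of its columns at once, which is the behavior that separates grouped selection from selecting individual variables.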
Mixtures of g-priors for Bayesian variable selection
Journal of the American Statistical Association, 2008
Cited by 36 (4 self)
Zellner’s g-prior remains a popular conventional prior for use in Bayesian variable selection, despite several undesirable consistency issues. In this paper, we study mixtures of g-priors as an alternative to default g-priors that resolve many of the problems with the original formulation, while maintaining the computational tractability that has made the g-prior so popular. We present theoretical properties of the mixture g-priors and provide real and simulated examples to compare the mixture formulation with fixed g-priors, Empirical Bayes approaches and other default procedures.
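For orientation, the standard form of Zellner's g-prior and the mixture idea can be written as below. This is a sketch under common textbook assumptions (zero prior mean, known design matrix X, intercept handled separately); the notation is chosen for this example and is not taken from the paper.

```latex
\beta \mid g, \sigma^2 \sim \mathcal{N}\!\left(0,\; g\,\sigma^2 (X^\top X)^{-1}\right),
\qquad
\mathbb{E}[\beta \mid y, g] = \frac{g}{1+g}\,\hat\beta_{\mathrm{OLS}},
\qquad
\pi(\beta \mid \sigma^2) = \int \mathcal{N}\!\left(\beta;\, 0,\; g\,\sigma^2 (X^\top X)^{-1}\right) \pi(g)\, dg .
```

With a fixed g, the posterior mean shrinks the least-squares estimate by the constant factor g/(1+g); placing a prior \pi(g) on g, as in the last expression, lets the data inform the amount of shrinkage.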
LASSO-Patternsearch Algorithm with Application to Ophthalmology and Genomic Data
2008
Cited by 29 (22 self)
The LASSO-Patternsearch algorithm is proposed to efficiently identify patterns of multiple dichotomous risk factors for outcomes of interest in demographic and genomic studies. The patterns considered are those that arise naturally from the log linear expansion of the multivariate Bernoulli density. The method is designed for the case where there is a possibly very large number of candidate patterns but it is believed that only a relatively small number are important. A LASSO is used to greatly reduce the number of candidate patterns, using a novel computational algorithm that can handle an extremely large number of unknowns simultaneously. The patterns surviving the LASSO are further pruned in the framework of (parametric) generalized linear models. A novel tuning procedure based on the GACV for Bernoulli outcomes, modified to act
Efficient empirical Bayes variable selection and estimation in linear models
J. Amer. Statist. Assoc., 2005
Cited by 25 (4 self)
We propose an empirical Bayes method for variable selection and coefficient estimation in linear regression models. The method is based on a particular hierarchical Bayes formulation, and the empirical Bayes estimator is shown to be closely related to the LASSO estimator. Such a connection allows us to take advantage of the recently developed quick LASSO algorithm to compute the empirical Bayes estimate, and provides a new way to select the tuning parameter in the LASSO method. Unlike previous empirical Bayes variable selection methods, which in most practical situations can only be implemented through a greedy stepwise algorithm, our method gives a global solution efficiently. Simulations and real examples show that the proposed method is very competitive in terms of variable selection, estimation accuracy, and computation speed when compared with other variable selection and estimation methods.
Cryptanalysis of the Cellular Message Encryption Algorithm By David Wagner Bruce Schneier John Kelsey i
IEEE/ACM Trans. Comput. Biol. Bioinform., 2005
Cited by 16 (1 self)
We construct a gene-to-gene regulatory network from time-series data of expression levels for the whole genome of the yeast Saccharomyces cerevisiae, in a case where the number of measurements is much smaller than the number of genes in the network. This network is analyzed with respect to present biological knowledge of all genes (according to the Gene Ontology database), and we find some of its large-scale properties to be in accordance with known facts about the organism. The linear modeling employed here has been explored several times, but due to lack of any validation beyond investigating individual genes, it has been seriously questioned with respect to its applicability to biological systems. Our results show the adequacy of the approach and make further investigations of the model meaningful. Index Terms: Biology and genetics, time series analysis, network problems, gene network, network inference, Lasso, yeast, validation, out-degree.
Streamwise Feature Selection
Journal of Machine Learning Research, 2006
Cited by 15 (6 self)
In streamwise feature selection, new features are sequentially considered for addition to a predictive model. When the space of potential features is large, streamwise feature selection offers many advantages over traditional feature selection methods, which assume that all features are known in advance. Features can be generated dynamically, focusing the search for new features on promising subspaces, and overfitting can be controlled by dynamically adjusting the threshold for adding features to the model. In contrast to traditional forward feature selection algorithms such as stepwise regression, in which at each step all possible features are evaluated and the best one is selected, streamwise feature selection only evaluates each feature once, when it is generated. We describe information-investing and α-investing, two adaptive complexity penalty methods for streamwise feature selection which dynamically adjust the threshold on the error reduction required for adding a new feature. These two methods give false discovery rate style guarantees against overfitting. They differ
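The thresholding rule described in this abstract can be sketched roughly as follows. This is a simplified illustration of an α-investing style wealth update, assuming each candidate feature has already been reduced to a p-value for its inclusion; the function name, the default wealth parameters, and the bid rule wealth / (2 * i) are assumptions for this example rather than a transcription of the paper.

```python
def alpha_investing(p_values, w0=0.5, alpha_delta=0.5):
    """Return indices of features accepted by a simple alpha-investing rule.

    p_values: p-values for adding each feature, in arrival order (each feature
    is looked at exactly once).
    w0: initial "wealth" available to spend on tests (assumed default).
    alpha_delta: wealth earned back each time a feature is accepted (assumed default).
    """
    wealth = w0
    selected = []
    for i, p in enumerate(p_values, start=1):
        # Bid a fraction of the current wealth as the test threshold.
        alpha_i = wealth / (2 * i)
        if p < alpha_i:
            selected.append(i - 1)
            wealth += alpha_delta - alpha_i  # reward for a discovery
        else:
            wealth -= alpha_i  # pay for a failed test
    return selected


print(alpha_investing([0.001, 0.9, 0.04, 0.5]))  # prints [0, 2]
```

The threshold tightens as wealth is spent on rejected features and loosens after each acceptance, which is how the acceptance criterion adapts dynamically while the stream is processed.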
When do stepwise algorithms meet subset selection criteria?
2007
Cited by 9 (3 self)
Recent results in homotopy and solution paths demonstrate that certain well-designed greedy algorithms, with a range of values of the algorithmic parameter, can provide solution paths to a sequence of convex optimization problems. On the other hand, in regression many existing criteria in subset selection (including Cp, AIC, BIC, MDL, RIC, etc.) involve optimizing an objective function that contains a counting measure. The two optimization problems are formulated as (P1) and (P0) in the present paper. The latter is generally combinatoric and has been proven to be NP-hard. We study the conditions under which the two optimization problems have common solutions. Hence, in these situations a stepwise algorithm can be used to solve the seemingly unsolvable problem. Our main result is motivated by recent work in sparse representation, while two others emerge from different angles: a direct analysis of sufficiency and necessity and a condition on the mostly correlated covariates. An extreme example connected with least angle regression is of independent interest.
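One common way to write the two problems named in this abstract is shown below. The abstract does not give the formulas, so the symbols y, X, \beta, \lambda_0 and \lambda_1 and the squared-error loss are assumptions made for illustration.

```latex
(P0):\quad \min_{\beta}\ \|y - X\beta\|_2^2 + \lambda_0 \|\beta\|_0 ,
\qquad \|\beta\|_0 = \#\{\, j : \beta_j \neq 0 \,\},
\\[4pt]
(P1):\quad \min_{\beta}\ \|y - X\beta\|_2^2 + \lambda_1 \|\beta\|_1 ,
\qquad \|\beta\|_1 = \sum_j |\beta_j| .
```

In this form, (P0) carries the counting measure used by subset-selection criteria, while (P1) is the convex problem whose solution path greedy homotopy-type algorithms can trace.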
Variable Selection and Model Choice in Geoadditive Regression Models
Cited by 6 (4 self)
Model choice and variable selection are issues of major concern in practical regression analyses. We propose a boosting procedure that facilitates both tasks in a class of complex geoadditive regression models comprising spatial effects, nonparametric effects of continuous covariates, interaction surfaces, random effects, and varying coefficient terms. The major modelling components are penalized splines and their bivariate tensor product extensions. All smooth model terms are represented as the sum of a parametric component and a remaining smooth component with one degree of freedom to obtain a fair comparison between all model terms. A generic representation of the geoadditive model allows us to devise a general boosting algorithm that implements automatic model choice and variable selection. We demonstrate the versatility of our approach with two examples: a geoadditive Poisson regression model for species counts in habitat suitability analyses and a geoadditive logit model for the analysis of forest health. Key words: bivariate smoothing, boosting, functional gradient, penalised splines, random effects, space-varying effects
Bayesian Variable Selection and Data Integration for Biological Regulatory Networks
Cited by 2 (0 self)
• Genes are long sequences of DNA that are transcribed to eventually become a protein
• Near-identical genetic material can lead to many different cell types and species
• A critical aspect of cellular function is how genes are regulated and which genes are regulated together
Shane T. Jensen, March 5, 2008
Gene Regulatory Networks
• Genes are regulated by transcription factor (TF) proteins that bind directly to the DNA sequence near a gene
• The bound protein affects the amount of transcription, thereby affecting the amount of protein produced
• The collection of TFs and their target genes is often called the gene regulatory network
– Goal is to elucidate the regulatory network: which genes are targeted for regulation by a particular TF?