Results 1  10
of
4,929
Regression Shrinkage and Selection Via the Lasso
 Journal of the Royal Statistical Society, Series B
, 1994
"... We propose a new method for estimation in linear models. The "lasso" minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant. Because of the nature of this constraint it tends to produce some coefficients that are exactl ..."
Abstract

Cited by 4055 (51 self)
 Add to MetaCart
We propose a new method for estimation in linear models. The "lasso" minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant. Because of the nature of this constraint it tends to produce some coefficients that are exactly zero and hence gives interpretable models. Our simulation studies suggest that the lasso enjoys some of the favourable properties of both subset selection and ridge regression. It produces interpretable models like subset selection and exhibits the stability of ridge regression. There is also an interesting relationship with recent work in adaptive function estimation by Donoho and Johnstone. The lasso idea is quite general and can be applied in a variety of statistical models: extensions to generalized regression models and treebased models are briefly described. Keywords: regression, subset selection, shrinkage, quadratic programming. 1 Introduction Consider the usual regression situation: we h...
Bagging Predictors
 Machine Learning
, 1996
"... Bagging predictors is a method for generating multiple versions of a predictor and using these to get an aggregated predictor. The aggregation averages over the versions when predicting a numerical outcome and does a plurality vote when predicting a class. The multiple versions are formed by making ..."
Abstract

Cited by 3574 (1 self)
 Add to MetaCart
Bagging predictors is a method for generating multiple versions of a predictor and using these to get an aggregated predictor. The aggregation averages over the versions when predicting a numerical outcome and does a plurality vote when predicting a class. The multiple versions are formed by making bootstrap replicates of the learning set and using these as new learning sets. Tests on real and simulated data sets using classification and regression trees and subset selection in linear regression show that bagging can give substantial gains in accuracy. The vital element is the instability of the prediction method. If perturbing the learning set can cause significant changes in the predictor constructed, then bagging can improve accuracy. 1. Introduction A learning set of L consists of data f(y n ; x n ), n = 1; : : : ; Ng where the y's are either class labels or a numerical response. We have a procedure for using this learning set to form a predictor '(x; L)  if the input is x we ...
A Study of CrossValidation and Bootstrap for Accuracy Estimation and Model Selection
 INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE
, 1995
"... We review accuracy estimation methods and compare the two most common methods: crossvalidation and bootstrap. Recent experimental results on artificial data and theoretical results in restricted settings have shown that for selecting a good classifier from a set of classifiers (model selection), te ..."
Abstract

Cited by 1248 (12 self)
 Add to MetaCart
We review accuracy estimation methods and compare the two most common methods: crossvalidation and bootstrap. Recent experimental results on artificial data and theoretical results in restricted settings have shown that for selecting a good classifier from a set of classifiers (model selection), tenfold crossvalidation may be better than the more expensive leaveoneout crossvalidation. We report on a largescale experiment  over half a million runs of C4.5 and a NaiveBayes algorithm  to estimate the effects of different parameters on these algorithms on realworld datasets. For crossvalidation, we vary the number of folds and whether the folds are stratified or not; for bootstrap, we vary the number of bootstrap samples. Our results indicate that for realword datasets similar to ours, the best method to use for model selection is tenfold stratified cross validation, even if computation power allows using more folds.
Using Bayesian networks to analyze expression data
 Journal of Computational Biology
, 2000
"... DNA hybridization arrays simultaneously measure the expression level for thousands of genes. These measurements provide a “snapshot ” of transcription levels within the cell. A major challenge in computational biology is to uncover, from such measurements, gene/protein interactions and key biologica ..."
Abstract

Cited by 1076 (18 self)
 Add to MetaCart
(Show Context)
DNA hybridization arrays simultaneously measure the expression level for thousands of genes. These measurements provide a “snapshot ” of transcription levels within the cell. A major challenge in computational biology is to uncover, from such measurements, gene/protein interactions and key biological features of cellular systems. In this paper, we propose a new framework for discovering interactions between genes based on multiple expression measurements. This framework builds on the use of Bayesian networks for representing statistical dependencies. A Bayesian network is a graphbased model of joint multivariate probability distributions that captures properties of conditional independence between variables. Such models are attractive for their ability to describe complex stochastic processes and because they provide a clear methodology for learning from (noisy) observations. We start by showing how Bayesian networks can describe interactions between genes. We then describe a method for recovering gene interactions from microarray data using tools for learning Bayesian networks. Finally, we demonstrate this method on the S. cerevisiae cellcycle measurements of Spellman et al. (1998). Key words: gene expression, microarrays, Bayesian methods. 1.
Filtering via simulation: Auxiliary particle filter, The
 Journal of the American Statistical Association
, 1999
"... ..."
How much should we trust differencesindifferences estimates? Quarterly Journal of Economics 119:249–75
, 2004
"... Most papers that employ DifferencesinDifferences estimation (DD) use many years of data and focus on serially correlated outcomes but ignore that the resulting standard errors are inconsistent. To illustrate the severity of this issue, we randomly generate placebo laws in statelevel data on fema ..."
Abstract

Cited by 775 (1 self)
 Add to MetaCart
(Show Context)
Most papers that employ DifferencesinDifferences estimation (DD) use many years of data and focus on serially correlated outcomes but ignore that the resulting standard errors are inconsistent. To illustrate the severity of this issue, we randomly generate placebo laws in statelevel data on female wages from the Current Population Survey. For each law, we use OLS to compute the DD estimate of its “effect ” as well as the standard error of this estimate. These conventional DD standard errors severely understate the standard deviation of the estimators: we find an “effect ” significant at the 5 percent level for up to 45 percent of the placebo interventions. We use Monte Carlo simulations to investigate how well existing methods help solve this problem. Econometric corrections that place a specific parametric form on the timeseries process do not perform well. Bootstrap (taking into account the autocorrelation of the data) works well when the number of states is large enough. Two corrections based on asymptotic approximation of the variancecovariance matrix work well for moderate numbers of states and one correction that collapses the time series information into a “pre ” and “post ” period and explicitly takes into account the effective sample size works well even for small numbers of states.
A direct approach to false discovery rates
, 2002
"... Summary. Multiplehypothesis testing involves guarding against much more complicated errors than singlehypothesis testing. Whereas we typically control the type I error rate for a singlehypothesis test, a compound error rate is controlled for multiplehypothesis tests. For example, controlling the ..."
Abstract

Cited by 749 (14 self)
 Add to MetaCart
(Show Context)
Summary. Multiplehypothesis testing involves guarding against much more complicated errors than singlehypothesis testing. Whereas we typically control the type I error rate for a singlehypothesis test, a compound error rate is controlled for multiplehypothesis tests. For example, controlling the false discovery rate FDR traditionally involves intricate sequential pvalue rejection methods based on the observed data. Whereas a sequential pvalue method fixes the error rate and estimates its corresponding rejection region, we propose the opposite approach—we fix the rejection region and then estimate its corresponding error rate. This new approach offers increased applicability, accuracy and power. We apply the methodology to both the positive false discovery rate pFDR and FDR, and provide evidence for its benefits. It is shown that pFDR is probably the quantity of interest over FDR. Also discussed is the calculation of the qvalue, the pFDR analogue of the pvalue, which eliminates the need to set the error rate beforehand as is traditionally done. Some simple numerical examples are presented that show that this new approach can yield an increase of over eight times in power compared with the Benjamini–Hochberg FDR method.
Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms
, 1998
"... This article reviews five approximate statistical tests for determining whether one learning algorithm outperforms another on a particular learning task. These tests are compared experimentally to determine their probability of incorrectly detecting a difference when no difference exists (type I err ..."
Abstract

Cited by 718 (9 self)
 Add to MetaCart
This article reviews five approximate statistical tests for determining whether one learning algorithm outperforms another on a particular learning task. These tests are compared experimentally to determine their probability of incorrectly detecting a difference when no difference exists (type I error). Two widely used statistical tests are shown to have high probability of type I error in certain situations and should never be used: a test for the difference of two proportions and a paireddifferences t test based on taking several random traintest splits. A third test, a paireddifferences t test based on 10fold crossvalidation, exhibits somewhat elevated probability of type I error. A fourth test, McNemar’s test, is shown to have low type I error. The fifth test is a new test, 5 × 2 cv, based on five iterations of twofold crossvalidation. Experiments show that this test also has acceptable type I error. The article also measures the power (ability to detect algorithm differences when they do exist) of these tests. The crossvalidated t test is the most powerful. The 5×2 cv test is shown to be slightly more powerful than McNemar’s test. The choice of the best test is determined by the computational cost of running the learning algorithm. For algorithms that can be executed only once, McNemar’s test is the only test with acceptable type I error. For algorithms that can be executed 10 times, the 5×2 cv test is recommended, because it is slightly more powerful and because it directly measures variation due to the choice of training set.
An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants
 MACHINE LEARNING
, 1999
"... Methods for voting classification algorithms, such as Bagging and AdaBoost, have been shown to be very successful in improving the accuracy of certain classifiers for artificial and realworld datasets. We review these algorithms and describe a large empirical study comparing several variants in co ..."
Abstract

Cited by 695 (2 self)
 Add to MetaCart
Methods for voting classification algorithms, such as Bagging and AdaBoost, have been shown to be very successful in improving the accuracy of certain classifiers for artificial and realworld datasets. We review these algorithms and describe a large empirical study comparing several variants in conjunction with a decision tree inducer (three variants) and a NaiveBayes inducer.
The purpose of the study is to improve our understanding of why and
when these algorithms, which use perturbation, reweighting, and
combination techniques, affect classification error. We provide a
bias and variance decomposition of the error to show how different
methods and variants influence these two terms. This allowed us to
determine that Bagging reduced variance of unstable methods, while
boosting methods (AdaBoost and Arcx4) reduced both the bias and
variance of unstable methods but increased the variance for NaiveBayes,
which was very stable. We observed that Arcx4 behaves differently
than AdaBoost if reweighting is used instead of resampling,
indicating a fundamental difference. Voting variants, some of which
are introduced in this paper, include: pruning versus no pruning,
use of probabilistic estimates, weight perturbations (Wagging), and
backfitting of data. We found that Bagging improves when
probabilistic estimates in conjunction with nopruning are used, as
well as when the data was backfit. We measure tree sizes and show
an interesting positive correlation between the increase in the
average tree size in AdaBoost trials and its success in reducing the
error. We compare the meansquared error of voting methods to
nonvoting methods and show that the voting methods lead to large
and significant reductions in the meansquared errors. Practical
problems that arise in implementing boosting algorithms are
explored, including numerical instabilities and underflows. We use
scatterplots that graphically show how AdaBoost reweights instances,
emphasizing not only "hard" areas but also outliers and noise.
Mediation in experimental and nonexperimental studies: new procedures and recommendations
 PSYCHOLOGICAL METHODS
, 2002
"... Mediation is said to occur when a causal effect of some variable X on an outcome Y is explained by some intervening variable M. The authors recommend that with small to moderate samples, bootstrap methods (B. Efron & R. Tibshirani, 1993) be used to assess mediation. Bootstrap tests are powerful ..."
Abstract

Cited by 634 (3 self)
 Add to MetaCart
(Show Context)
Mediation is said to occur when a causal effect of some variable X on an outcome Y is explained by some intervening variable M. The authors recommend that with small to moderate samples, bootstrap methods (B. Efron & R. Tibshirani, 1993) be used to assess mediation. Bootstrap tests are powerful because they detect that the sampling distribution of the mediated effect is skewed away from 0. They argue that R. M. Baron and D. A. Kenny’s (1986) recommendation of first testing the X → Y association for statistical significance should not be a requirement when there is a priori belief that the effect size is small or suppression is a possibility. Empirical examples and computer setups for bootstrap analyses are provided. Mediation models of psychological processes are popular because they allow interesting associations to be decomposed into components that reveal possible causal mechanisms. These models are useful for theory development and testing as well as for the identification of possible points of intervention in applied work. Mediation is equally of interest to experimental psychologists as it is to those who study naturally occurring processes through nonexperimental studies. For example, social–cognitive psychologists are interested in showing that the effects of cognitive priming on attitude change are mediated by the accessibility of certain beliefs (Eagly & Chaiken, 1993). Developmental psychologists use longitudinal methods to study how parental unemployment can have adverse effects on child behavior through its intervening effect on quality of parenting (Conger et al., 1990). Mediation analysis is also used in organizational