Results 1–10 of 218
Logistic Regression in Rare Events Data
1999
"... We study rare events data, binary dependent variables with dozens to thousands of times fewer ones (events, such as wars, vetoes, cases of political activism, or epidemiological infections) than zeros (“nonevents”). In many literatures, these variables have proven difficult to explain and predict, a ..."
Abstract

Cited by 152 (4 self)
 Add to MetaCart
We study rare events data, binary dependent variables with dozens to thousands of times fewer ones (events, such as wars, vetoes, cases of political activism, or epidemiological infections) than zeros (“nonevents”). In many literatures, these variables have proven difficult to explain and predict, a problem that seems to have at least two sources. First, popular statistical procedures, such as logistic regression, can sharply underestimate the probability of rare events. We recommend corrections that outperform existing methods and change the estimates of absolute and relative risks by as much as some estimated effects reported in the literature. Second, commonly used data collection strategies are grossly inefficient for rare events data. The fear of collecting data with too few events has led to data collections with huge numbers of observations but relatively few, and poorly measured, explanatory variables, such as in international conflict data with more than a quarter-million dyads, only a few of which are at war. As it turns out, more efficient sampling designs exist for making valid inferences, such as sampling all available events (e.g., wars) and a tiny fraction of nonevents (peace). This enables scholars to save as much as 99% of their (nonfixed) data collection costs or to collect much more meaningful explanatory variables.
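The paper's prior correction for event-based sampling is simple to apply by hand: fit an ordinary logit on the sample, then shift the intercept using the known population event fraction. Below is a minimal R sketch (not the authors' code); the data frame `d` and variables `event`, `x1`, `x2` are hypothetical, and `tau` is assumed known.

```r
# Prior correction for case-control sampling (King and Zeng):
# corrected b0 = b0_hat - log(((1 - tau) / tau) * (ybar / (1 - ybar)))
fit <- glm(event ~ x1 + x2, family = binomial, data = d)

tau  <- 0.002          # assumed population fraction of events
ybar <- mean(d$event)  # fraction of events in the sample

b0_corrected <- coef(fit)["(Intercept)"] -
  log(((1 - tau) / tau) * (ybar / (1 - ybar)))
```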
Assessing Degeneracy in Statistical Models of Social Networks
Journal of the American Statistical Association, 2003
"... discussions. This paper presents recent advances in the statistical modeling of random graphs that have an impact on the empirical study of social networks. Statistical exponential family models (Wasserman and Pattison 1996) are a generalization of the Markov random graph models introduced by Frank ..."
Abstract

Cited by 95 (15 self)
 Add to MetaCart
(Show Context)
This paper presents recent advances in the statistical modeling of random graphs that have an impact on the empirical study of social networks. Statistical exponential family models (Wasserman and Pattison 1996) are a generalization of the Markov random graph models introduced by Frank and Strauss (1986), which in turn are derived from developments in spatial statistics (Besag 1974). These models recognize the complex dependencies within relational data structures. A major barrier to the application of random graph models to social networks has been the lack of a sound statistical theory to evaluate model fit. This problem has at least three aspects: the specification of realistic models, the algorithmic difficulties of the inferential methods, and the assessment of the degree to which the graph structure produced by the models matches that of the data. We discuss these and related issues of model degeneracy and inferential degeneracy for commonly used estimators.
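For readers who want to see the degeneracy diagnostics in practice, here is a minimal sketch (not code from the paper) using the statnet suite's ergm package: fit a Markov specification with a triangle term, then compare statistics of simulated graphs against the observed graph.

```r
# Fit-and-diagnose sketch with the ergm package; the Florentine
# marriage network ships with the package.
library(ergm)
data(florentine)

# A Markov model with a triangle term -- the kind of specification
# whose degeneracy the paper analyzes.
fit <- ergm(flomarriage ~ edges + triangle)
summary(fit)

# Goodness-of-fit: badly degenerate models place most of their mass
# on near-empty or near-complete graphs, visible in these plots.
gof_fit <- gof(fit)
plot(gof_fit)
```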
A Weakly Informative Default Prior Distribution for Logistic and Other Regression Models
"... We propose a new prior distribution for classical (nonhierarchical) logistic regression models, constructed by first scaling all nonbinary variables to have mean 0 and standard deviation 0.5, and then placing independent Studentt prior distributions on the coefficients. As a default choice, we reco ..."
Abstract

Cited by 78 (12 self)
 Add to MetaCart
We propose a new prior distribution for classical (nonhierarchical) logistic regression models, constructed by first scaling all nonbinary variables to have mean 0 and standard deviation 0.5, and then placing independent Student-t prior distributions on the coefficients. As a default choice, we recommend the Cauchy distribution with center 0 and scale 2.5, which in the simplest setting is a longer-tailed version of the distribution attained by assuming one-half additional success and one-half additional failure in a logistic regression. Cross-validation on a corpus of datasets shows the Cauchy class of prior distributions to outperform existing implementations of Gaussian and Laplace priors. We recommend this prior distribution as a default choice for routine applied use. It has the advantage of always giving answers, even when there is complete separation in logistic regression (a common problem, even when the sample size is large and the number of predictors is small), and also automatically applying more shrinkage to higher-order interactions. This can …
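This default is implemented as bayesglm() in the arm R package. A minimal sketch follows, with a hypothetical data frame `d`; since the Cauchy is the Student-t with one degree of freedom, prior.df = 1 with prior.scale = 2.5 reproduces the recommended default.

```r
# Weakly informative default prior via the arm package.
library(arm)

# Cauchy(0, 2.5) prior on the coefficients; inputs should first be
# rescaled as the paper describes (nonbinary variables to mean 0,
# sd 0.5) for the default scale to be appropriate.
fit <- bayesglm(y ~ x1 + x2, family = binomial(link = "logit"),
                data = d, prior.scale = 2.5, prior.df = 1)
display(fit)
```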
Partial least squares: A versatile tool for the analysis of high-dimensional genomic data
Briefings in Bioinformatics, 2007
"... Partial Least Squares (PLS) is a highly efficient statistical regression technique that is well suited for the analysis of highdimensional genomic data. In this paper we review the theory and applications of PLS both under methodological and biological points of view. Focusing on microarray express ..."
Abstract

Cited by 63 (9 self)
 Add to MetaCart
(Show Context)
Partial Least Squares (PLS) is a highly efficient statistical regression technique that is well suited for the analysis of high-dimensional genomic data. In this paper we review the theory and applications of PLS from both methodological and biological points of view. Focusing on microarray expression data, we provide a systematic comparison of the PLS approaches currently employed, and discuss problems as different as tumor classification, identification of relevant genes, survival analysis, and modeling of gene networks.
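As a concrete illustration (not taken from the paper), here is a minimal PLS regression on simulated wide data with the pls R package, one common implementation; the simulated matrix stands in for expression values.

```r
# PLS regression where predictors far outnumber samples (p >> n).
library(pls)

set.seed(1)
n <- 40; p <- 500                            # many more "genes" than samples
X <- matrix(rnorm(n * p), n, p)
y <- drop(X[, 1:5] %*% rep(1, 5)) + rnorm(n) # outcome driven by 5 variables

# Cross-validation guides the choice of the number of components.
fit <- plsr(y ~ X, ncomp = 5, validation = "CV")
summary(fit)   # cross-validated RMSEP per component
```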
Back to the Future: Modeling Time Dependence in Binary Data
Working Paper, 2009
"... Since Beck, Katz, and Tucker (1998), the standard method for modeling time dependence in binary data has been to incorporate time dummies or splined time in logistic regressions. Although we agree with the need for modeling time dependence, we demonstrate that time dummies can induce serious estima ..."
Abstract

Cited by 61 (0 self)
 Add to MetaCart
(Show Context)
Since Beck, Katz, and Tucker (1998), the standard method for modeling time dependence in binary data has been to incorporate time dummies or splined time in logistic regressions. Although we agree with the need for modeling time dependence, we demonstrate that time dummies can induce serious estimation problems due to separation. Splines do not suffer from these problems. However, the complexity of splines has led substantive researchers (1) to use knot values that may be inappropriate for their data and (2) to ignore any substantive discussion concerning the effect of time. We propose a relatively simple alternative: including t, t², and t³ in the regression. This cubic polynomial approximation is trivial to implement, and therefore to interpret, and it avoids problems such as quasi-complete separation. Monte Carlo analysis demonstrates that, for the types of hazards one often sees in substantive research, the polynomial approximation always outperforms time dummies and generally performs as well as splines, or even more flexible auto-smoothing procedures. Due to its simplicity, this method also accommodates non-proportional hazards in a straightforward way. We reanalyze Crowley and Skocpol (2001) using non-proportional hazards and find new empirical support for their theory.
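The proposal amounts to one extra line in the model formula. A minimal sketch with hypothetical variable names, where t counts periods since the last event:

```r
# Cubic time polynomial in place of time dummies or splines.
# `d`, `conflict`, `x1`, `x2`, and `t` are hypothetical names.
fit <- glm(conflict ~ x1 + x2 + t + I(t^2) + I(t^3),
           family = binomial, data = d)
summary(fit)
```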
Differential privacy for statistics: What we know and what we want to learn
In Proceedings of the 33rd International Colloquium on Automata, Languages and Programming, volume 4052 of Lecture Notes in Computer Science
"... Abstract. We motivate and review the definition of differential privacy, survey some results on differentially private statistical estimators, and outline a research agenda. This survey is based on two presentations given by the authors at an NCHS/CDC sponsored workshop on data privacy in May 2008. ..."
Abstract

Cited by 42 (1 self)
 Add to MetaCart
(Show Context)
We motivate and review the definition of differential privacy, survey some results on differentially private statistical estimators, and outline a research agenda. This survey is based on two presentations given by the authors at an NCHS/CDC-sponsored workshop on data privacy in May 2008.
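As a concrete illustration of the basic building block behind the estimators the survey reviews (not code from the paper), here is a minimal sketch of the Laplace mechanism applied to a counting query:

```r
# Laplace mechanism: release f(x) + Laplace(sensitivity / epsilon).
rlaplace <- function(n, scale) {
  # Laplace draws as the difference of two iid exponentials
  rexp(n, 1 / scale) - rexp(n, 1 / scale)
}

private_count <- function(x, epsilon) {
  # A counting query changes by at most 1 when one record changes,
  # so its sensitivity is 1 and the noise scale is 1 / epsilon.
  sum(x) + rlaplace(1, scale = 1 / epsilon)
}

private_count(rbinom(1000, 1, 0.3), epsilon = 0.5)
```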
MLDS: Maximum Likelihood Difference Scaling in R
Journal of Statistical Software, 2008
"... This introduction to the R package MLDS is a modified and updated version of Knoblauch and Maloney (2008) published in the Journal of Statistical Software. The MLDS package in the R programming language can be used to estimate perceptual scales based on the results of psychophysical experiments usin ..."
Abstract

Cited by 31 (8 self)
 Add to MetaCart
This introduction to the R package MLDS is a modified and updated version of Knoblauch and Maloney (2008), published in the Journal of Statistical Software. The MLDS package in the R programming language can be used to estimate perceptual scales based on the results of psychophysical experiments using the method of difference scaling. In a difference scaling experiment, observers compare two suprathreshold differences (a, b) and (c, d) on each trial. The approach is based on a stochastic model of how the observer decides which perceptual difference (or interval), (a, b) or (c, d), is greater, and the parameters of the model are estimated using a maximum likelihood criterion. We also propose a method to test the model by evaluating the self-consistency of the estimated scale. The package includes an example in which an observer judges the differences in correlation between scatterplots. The example may be readily adapted to estimate perceptual scales for arbitrary physical continua.
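A minimal sketch of the package in use, assuming kk1 (one of the correlation-judgment difference-scaling data sets I understand to ship with MLDS) loads as in the paper's example:

```r
# Maximum likelihood difference scaling on a bundled data set.
library(MLDS)
data(kk1)               # trials: response plus four stimulus indices

scale_fit <- mlds(kk1)  # GLM-based maximum likelihood fit
summary(scale_fit)
plot(scale_fit)         # estimated perceptual scale values
```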
The VGAM Package for Categorical Data Analysis
"... Classical categorical regression models such as the multinomial logit and proportional odds models are shown to be readily handled by the vector generalized linear and additive model (VGLM/VGAM) framework. Additionally, there are natural extensions, such as reducedrank VGLMs for dimension reduction ..."
Abstract

Cited by 30 (0 self)
 Add to MetaCart
(Show Context)
Classical categorical regression models such as the multinomial logit and proportional odds models are shown to be readily handled by the vector generalized linear and additive model (VGLM/VGAM) framework. Additionally, there are natural extensions, such as reduced-rank VGLMs for dimension reduction, and allowing covariates that have values specific to each linear/additive predictor, e.g., for consumer choice modeling. This article describes some of the framework behind the VGAM R package, its usage, and implementation details.
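A minimal sketch of the two classical models the abstract names, fitted with VGAM on its bundled pneumo data (coal miners' pneumoconiosis severity by exposure time):

```r
# Proportional odds and multinomial logit as VGLMs.
library(VGAM)

# Proportional odds model for the ordered response categories.
fit_po <- vglm(cbind(normal, mild, severe) ~ let,
               family = propodds, data = pneumo)

# Multinomial logit model, treating the categories as unordered.
fit_mn <- vglm(cbind(normal, mild, severe) ~ let,
               family = multinomial, data = pneumo)
summary(fit_po)
```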
Classification using generalized partial least squares
2005
"... The gpls package includes functions for classification using generalized partial least squares approaches. Both twogroup and multigroup (more than 2 groups) classifications can be done. The basic functionalities are based on and extended from the Iteratively ReWeighted Least Squares (IRWPLS) by Ma ..."
Abstract

Cited by 27 (4 self)
 Add to MetaCart
The gpls package includes functions for classification using generalized partial least squares approaches. Both two-group and multi-group (more than two groups) classifications can be done. The basic functionalities are based on and extended from Iteratively ReWeighted Partial Least Squares (IRWPLS) by Marx (1996). Additionally, Firth’s bias reduction procedure (Firth, 1992a,b, 1993) is …
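A hedged sketch of a two-group fit with gpls on simulated data; the argument names (K.prov for the number of PLS components, br for Firth's bias reduction) follow my reading of the package documentation and should be checked against the installed version.

```r
# Two-group classification via generalized PLS (gpls, Bioconductor).
library(gpls)

set.seed(1)
X <- matrix(rnorm(60 * 30), 60, 30)           # simulated predictors
y <- rbinom(60, 1, plogis(X[, 1] - X[, 2]))   # simulated binary labels

# br = TRUE requests the Firth bias-reduction step the abstract mentions.
fit  <- gpls(X, y, K.prov = 3, br = TRUE)
pred <- predict(fit, X)  # predicted classes/probabilities; field names
                         # may differ across package versions
```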