Logistic Regression in Rare Events Data
1999
Cited by 59 (4 self)
We study rare events data, binary dependent variables with dozens to thousands of times fewer ones (events, such as wars, vetoes, cases of political activism, or epidemiological infections) than zeros (“nonevents”). In many literatures, these variables have proven difficult to explain and predict, a problem that seems to have at least two sources. First, popular statistical procedures, such as logistic regression, can sharply underestimate the probability of rare events. We recommend corrections that outperform existing methods and change the estimates of absolute and relative risks by as much as some estimated effects reported in the literature. Second, commonly used data collection strategies are grossly inefficient for rare events data. The fear of collecting data with too few events has led to data collections with huge numbers of observations but relatively few, and poorly measured, explanatory variables, such as in international conflict data with more than a quarter-million dyads, only a few of which are at war. As it turns out, more efficient sampling designs exist for making valid inferences, such as sampling all available events (e.g., wars) and a tiny fraction of nonevents (peace). This enables scholars to save as much as 99% of their (nonfixed) data collection costs or to collect much more meaningful explanatory
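One widely used adjustment for a sample that keeps all events and only a fraction of nonevents is the prior correction of the logit intercept. The sketch below assumes that correction purely as an illustration; the excerpt above does not say which corrections the authors recommend. The names `tau` (population event fraction, assumed known) and `ybar` (event fraction in the subsample) are chosen here and do not come from the listing.

```python
import math


def prior_corrected_intercept(b0_hat, tau, ybar):
    """Shift a logit intercept fitted on an event-enriched subsample.

    b0_hat: intercept estimated on the subsample
    tau:    assumed fraction of events in the population
    ybar:   fraction of events in the subsample
    """
    if not (0.0 < tau < 1.0 and 0.0 < ybar < 1.0):
        raise ValueError("tau and ybar must lie strictly between 0 and 1")
    return b0_hat - math.log(((1.0 - tau) / tau) * (ybar / (1.0 - ybar)))


def predicted_probability(intercept, slopes, x):
    """Logistic probability for one observation x given fitted coefficients."""
    eta = intercept + sum(b * xi for b, xi in zip(slopes, x))
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical numbers: a balanced subsample (ybar = 0.5) drawn from a
# population in which 1% of observations are events (tau = 0.01).
b0 = prior_corrected_intercept(b0_hat=0.0, tau=0.01, ybar=0.5)
p = predicted_probability(b0, slopes=[], x=[])
```

Under this correction only the intercept moves; the slope coefficients fitted on the subsample are used unchanged.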
Assessing Degeneracy in Statistical Models of Social Networks
Journal of the American Statistical Association, 2003
Cited by 58 (14 self)
This paper presents recent advances in the statistical modeling of random graphs that have an impact on the empirical study of social networks. Statistical exponential family models (Wasserman and Pattison 1996) are a generalization of the Markov random graph models introduced by Frank and Strauss (1986), which in turn are derived from developments in spatial statistics (Besag 1974). These models recognize the complex dependencies within relational data structures. A major barrier to the application of random graph models to social networks has been the lack of a sound statistical theory to evaluate model fit. This problem has at least three aspects: the specification of realistic models, the algorithmic difficulties of the inferential methods, and the assessment of the degree to which the graph structure produced by the models matches that of the data. We discuss these and related issues of the model degeneracy and inferential degeneracy for commonly used estimators.
Partial least squares: A versatile tool for the analysis of high-dimensional genomic data
Briefings in Bioinformatics, 2007
Cited by 25 (6 self)
Partial Least Squares (PLS) is a highly efficient statistical regression technique that is well suited for the analysis of high-dimensional genomic data. In this paper we review the theory and applications of PLS both under methodological and biological points of view. Focusing on microarray expression data we provide a systematic comparison of the PLS approaches currently employed, and discuss problems as different as tumor classification, identification of relevant genes, survival analysis and modeling of gene networks.
Classification using generalized partial least squares
2005
Cited by 22 (3 self)
The gpls package includes functions for classification using generalized partial least squares approaches. Both two-group and multi-group (more than 2 groups) classifications can be done. The basic functionalities are based on and extended from the Iteratively ReWeighted Least Squares (IRWPLS) by Marx (1996). Additionally, Firth’s bias reduction procedure (Firth, 1992a,b, 1993) is
Differential privacy for statistics: What we know and what we want to learn
In Proceedings of the 33rd International Colloquium on Automata, Languages and Programming, volume 4052 of Lecture Notes in Computer Science
Cited by 22 (1 self)
We motivate and review the definition of differential privacy, survey some results on differentially private statistical estimators, and outline a research agenda. This survey is based on two presentations given by the authors at an NCHS/CDC sponsored workshop on data privacy in May 2008.
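As a concrete instance of a differentially private statistical estimator, the sketch below releases a mean with Laplace noise. It is the textbook Laplace mechanism, offered as an assumption-laden illustration rather than a result taken from this survey. It assumes the number of records is public, that neighboring datasets differ in one record's value, and that every value is clamped to a hypothetical range `[lower, upper]`.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_mean(values, lower, upper, epsilon, rng):
    """Release a clamped mean with Laplace noise calibrated to epsilon."""
    if not values or upper <= lower or epsilon <= 0:
        raise ValueError("need data, lower < upper, and epsilon > 0")
    # Clamping bounds the influence of any single record.
    clamped = [min(max(v, lower), upper) for v in values]
    # Changing one record moves the mean by at most (upper - lower) / n.
    sensitivity = (upper - lower) / len(clamped)
    return sum(clamped) / len(clamped) + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(0)  # fixed seed only to make the example repeatable
noisy = private_mean([0.2, 0.4, 0.6], lower=0.0, upper=1.0, epsilon=1.0, rng=rng)
```

Smaller values of `epsilon` give stronger privacy and noisier output; the noise scale shrinks as the number of records grows.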
A Weakly Informative Default Prior Distribution for Logistic and Other Regression Models
Cited by 19 (7 self)
We propose a new prior distribution for classical (nonhierarchical) logistic regression models, constructed by first scaling all nonbinary variables to have mean 0 and standard deviation 0.5, and then placing independent Student-t prior distributions on the coefficients. As a default choice, we recommend the Cauchy distribution with center 0 and scale 2.5, which in the simplest setting is a longer-tailed version of the distribution attained by assuming one-half additional success and one-half additional failure in a logistic regression. Cross-validation on a corpus of datasets shows the Cauchy class of prior distributions to outperform existing implementations of Gaussian and Laplace priors. We recommend this prior distribution as a default choice for routine applied use. It has the advantage of always giving answers, even when there is complete separation in logistic regression (a common problem, even when the sample size is large and the number of predictors is small), and also automatically applying more shrinkage to higher-order interactions. This can
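The two ingredients named in this abstract, rescaling nonbinary inputs to standard deviation 0.5 and a Cauchy(0, 2.5) prior on the coefficients, can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. It assumes a one-predictor model with no intercept, uses the population standard deviation for rescaling, and all function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def rescale_nonbinary(values):
    """Center a nonbinary input at 0 and scale it to standard deviation 0.5."""
    m, s = mean(values), pstdev(values)
    if s == 0:
        raise ValueError("constant input cannot be rescaled")
    return [(v - m) / (2.0 * s) for v in values]


def cauchy_log_prior(coefs, center=0.0, scale=2.5):
    """Sum of independent Cauchy log densities over the coefficients."""
    return sum(
        -math.log(math.pi * scale) - math.log1p(((b - center) / scale) ** 2)
        for b in coefs
    )


def log_sigmoid(z):
    """Numerically stable log of the logistic function."""
    return -math.log1p(math.exp(-z)) if z >= 0 else z - math.log1p(math.exp(z))


def log_likelihood(beta, xs, ys):
    """Log-likelihood of a one-predictor, no-intercept logistic model."""
    return sum(log_sigmoid(beta * x if y == 1 else -beta * x) for x, y in zip(xs, ys))


def log_posterior(beta, xs, ys):
    """Unnormalized log posterior: log-likelihood plus Cauchy log prior."""
    return log_likelihood(beta, xs, ys) + cauchy_log_prior([beta])


# Completely separated toy data: every negative x has y = 0, every positive x has y = 1.
xs = [-1.0, -0.5, 0.5, 1.0]
ys = [0, 0, 1, 1]
```

On these separated data the log-likelihood keeps increasing as `beta` grows, so no finite maximum likelihood estimate exists, while the log posterior eventually decreases because the Cauchy prior penalizes very large coefficients.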
MLDS: Maximum Likelihood Difference Scaling in R
Journal of Statistical Software, 2008
Cited by 15 (8 self)
This introduction to the R package MLDS is a modified and updated version of Knoblauch and Maloney (2008) published in the Journal of Statistical Software. The MLDS package in the R programming language can be used to estimate perceptual scales based on the results of psychophysical experiments using the method of difference scaling. In a difference scaling experiment, observers compare two suprathreshold differences (a,b) and (c,d) on each trial. The approach is based on a stochastic model of how the observer decides which perceptual difference (or interval) (a, b) or (c, d) is greater, and the parameters of the model are estimated using a maximum likelihood criterion. We also propose a method to test the model by evaluating the self-consistency of the estimated scale. The package includes an example in which an observer judges the differences in correlation between scatterplots. The example may be readily adapted to estimate perceptual scales for arbitrary physical continua.
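The decision model described above can be written down as a likelihood. The sketch below is in Python rather than R and does not reproduce the MLDS package's interface. It assumes additive Gaussian judgment noise with standard deviation `sigma` and a comparison of absolute scale differences, neither of which is stated in the abstract; `psi` is a hypothetical list of scale values indexed by stimulus.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_second_pair_larger(psi, quad, sigma):
    """Probability that the observer judges (c, d) to differ more than (a, b)."""
    a, b, c, d = quad
    delta = abs(psi[d] - psi[c]) - abs(psi[b] - psi[a])
    return normal_cdf(delta / sigma)


def negative_log_likelihood(psi, sigma, trials):
    """trials: list of (quad, response); response is 1 if the observer chose (c, d)."""
    nll = 0.0
    for quad, response in trials:
        p = prob_second_pair_larger(psi, quad, sigma)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        nll -= math.log(p if response == 1 else 1.0 - p)
    return nll


# Hypothetical equally spaced scale over five stimuli and two recorded trials.
psi = [0.0, 0.25, 0.5, 0.75, 1.0]
trials = [((0, 1, 2, 3), 1), ((0, 1, 2, 4), 1)]
nll = negative_log_likelihood(psi, sigma=0.2, trials=trials)
```

A maximum likelihood fit would minimize `negative_log_likelihood` over the interior scale values and `sigma`, with the scale endpoints fixed to remove the arbitrary origin and unit.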
Comparison of Maximum Pseudo Likelihood and Maximum Likelihood Estimation of Exponential Family Random Graph Models
2007
Cited by 13 (2 self)
The statistical modeling of social network data is difficult due to the complex dependence structure of the tie variables. Statistical exponential families of distributions provide a flexible way to model such dependence. They enable the statistical characteristics of the network to be encapsulated within an exponential family random graph (ERG) model. For a long time, however, likelihood-based estimation was only feasible for ERG models assuming dyad independence. For more realistic and complex models inference has been based on the pseudolikelihood. Recent advances in computational methods have made likelihood-based inference practical, and comparison of the different estimators possible. In this paper, we compare the bias, standard errors, coverage rates and efficiency of maximum likelihood and maximum pseudolikelihood estimators. We also propose an improved pseudolikelihood estimation method aimed at reducing bias. The comparison is performed using simulated social network data based on two versions of an empirically realistic network model, the first representing Lazega’s law firm data and the second a modified version
Efficient, differentially private point estimators
2008
Cited by 11 (4 self)
Differential privacy is a recent notion of privacy for statistical databases that provides rigorous, meaningful confidentiality guarantees, even in the presence of an attacker with access to arbitrary side information. We show that for a large class of parametric probability models, one can construct a differentially private estimator whose distribution converges to that of the maximum likelihood estimator. In particular, it is efficient and asymptotically unbiased. This result provides (further) compelling evidence that rigorous notions of privacy in statistical databases can be consistent with statistically valid inference.