Results 1 - 3 of 3
Logistic Regression in Rare Events Data
1999
Abstract

Cited by 56 (4 self)
We study rare events data, binary dependent variables with dozens to thousands of times fewer ones (events, such as wars, vetoes, cases of political activism, or epidemiological infections) than zeros (“nonevents”). In many literatures, these variables have proven difficult to explain and predict, a problem that seems to have at least two sources. First, popular statistical procedures, such as logistic regression, can sharply underestimate the probability of rare events. We recommend corrections that outperform existing methods and change the estimates of absolute and relative risks by as much as some estimated effects reported in the literature. Second, commonly used data collection strategies are grossly inefficient for rare events data. The fear of collecting data with too few events has led to data collections with huge numbers of observations but relatively few, and poorly measured, explanatory variables, such as in international conflict data with more than a quarter-million dyads, only a few of which are at war. As it turns out, more efficient sampling designs exist for making valid inferences, such as sampling all available events (e.g., wars) and a tiny fraction of nonevents (peace). This enables scholars to save as much as 99% of their (nonfixed) data collection costs or to collect much more meaningful explanatory ...
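The sampling design described in this abstract (keep every event, keep only a small random fraction of nonevents) can be sketched as follows. This is a minimal illustration, not the paper's own code: the function names, the `(y, x)` row layout, and the seed are assumptions made for the example. The intercept adjustment shown is the standard prior correction for outcome-dependent (case-control) samples, which requires the population event rate `tau` to be known from outside the sample; the abstract does not state which corrections the authors recommend, so treat this as one plausible reading.

```python
import math
import random


def sample_events_and_some_nonevents(rows, nonevent_fraction, seed=0):
    """Keep every event (y == 1) and a random fraction of nonevents (y == 0).

    rows is a list of (y, x) tuples; this layout is an assumption of the sketch.
    """
    rng = random.Random(seed)
    return [row for row in rows if row[0] == 1 or rng.random() < nonevent_fraction]


def prior_corrected_intercept(b0_hat, tau, ybar):
    """Adjust a logit intercept fitted on an outcome-selected sample.

    b0_hat: intercept estimated on the selected sample
    tau:    fraction of events in the population (assumed known)
    ybar:   fraction of events in the selected sample
    Slope coefficients need no adjustment under this kind of selection.
    """
    return b0_hat - math.log(((1 - tau) / tau) * (ybar / (1 - ybar)))
```

For an intercept-only model fitted on a sample that is half events, the fitted intercept is 0; with `tau = 0.01` the corrected intercept is the log odds of 0.01, which recovers the population event rate.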
Estimation in Choice-Based Sampling With Measurement Error and Bootstrap Analysis
J. Economet, 1997
Abstract

Cited by 5 (4 self)
In this paper we discuss the estimation of a logit binary response model. The sampling is choice-based and is done in two stages. We investigate a likelihood-based estimator which reduces to the usual logistic estimator when there is no measurement error and which takes into account the constraints imposed by the structure of the problem. Estimated standard errors obtained by formulae for prospective analysis are asymptotically correct. A robust estimation procedure is proposed and an asymptotic covariance matrix obtained. Several bootstrap methods are applied to this retrospective problem. Numerical results are presented to illustrate useful properties of the methods.
Key words: Binary logit; Bootstrap; Choice-based sampling; Measurement error; Robustness
JEL classification: C13; C25; C35
Correspondence to: C.Y. Wang, Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, MP 1002, Seattle, WA 98104, USA. Email: cywang@mule.fhcrc.org, Fax: (206) 6674142. * The...
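One way a bootstrap can respect a choice-based (retrospective) design is to resample events and nonevents separately, so each replicate keeps the sample sizes fixed by the design. The sketch below is only an illustration under strong simplifying assumptions and is not the estimator or any of the bootstrap methods from the paper: it ignores measurement error and the two-stage sampling, uses a single binary covariate, and takes the continuity-corrected log odds ratio as the statistic (without the 0.5 cell correction this equals the logit slope for one binary covariate). All function names and parameters are hypothetical.

```python
import math
import random


def log_odds_ratio(cases, controls):
    """Log odds ratio of a binary covariate between cases and controls.

    cases and controls are lists of 0/1 covariate values. 0.5 is added to
    each cell (Haldane-Anscombe correction) so empty cells do not break
    the logarithm in small resamples.
    """
    a = sum(cases) + 0.5                     # exposed cases
    b = len(cases) - sum(cases) + 0.5        # unexposed cases
    c = sum(controls) + 0.5                  # exposed controls
    d = len(controls) - sum(controls) + 0.5  # unexposed controls
    return math.log((a * d) / (b * c))


def stratified_bootstrap_se(cases, controls, n_boot=1000, seed=0):
    """Bootstrap standard error, resampling within each outcome stratum."""
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_boot):
        boot_cases = rng.choices(cases, k=len(cases))
        boot_controls = rng.choices(controls, k=len(controls))
        replicates.append(log_odds_ratio(boot_cases, boot_controls))
    mean = sum(replicates) / n_boot
    variance = sum((r - mean) ** 2 for r in replicates) / (n_boot - 1)
    return math.sqrt(variance)
```

Resampling within strata, rather than from the pooled sample, keeps the number of events and nonevents in every replicate equal to the numbers the design actually drew.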
Explaining Rare Events in International Relations
2000
Abstract

Cited by 5 (2 self)
Some of the most important phenomena in international conflict are coded as "rare events data," binary dependent variables with dozens to thousands of times fewer events, such as wars, coups, etc., than "nonevents". Unfortunately, rare events data are difficult to explain and predict, a problem that seems to have at least two sources. First, and most importantly, the data collection strategies used in international conflict are grossly inefficient. The fear of collecting data with too few events has led to data collections with huge numbers of observations but relatively few, and poorly measured, explanatory variables. As it turns out, more efficient sampling designs exist for making valid inferences, such as sampling all available events (e.g., wars) and a tiny fraction of nonevents (peace). This enables scholars to save as much as 99% of their (nonfixed) data collection costs, or to collect much more meaningful explanatory variables. Second, logistic regression, and other commonly ...