Results 1 - 5 of 5
Logistic Regression in Rare Events Data, 1999
Abstract

Cited by 55 (4 self)
We study rare events data, binary dependent variables with dozens to thousands of times fewer ones (events, such as wars, vetoes, cases of political activism, or epidemiological infections) than zeros (“nonevents”). In many literatures, these variables have proven difficult to explain and predict, a problem that seems to have at least two sources. First, popular statistical procedures, such as logistic regression, can sharply underestimate the probability of rare events. We recommend corrections that outperform existing methods and change the estimates of absolute and relative risks by as much as some estimated effects reported in the literature. Second, commonly used data collection strategies are grossly inefficient for rare events data. The fear of collecting data with too few events has led to data collections with huge numbers of observations but relatively few, and poorly measured, explanatory variables, such as in international conflict data with more than a quarter-million dyads, only a few of which are at war. As it turns out, more efficient sampling designs exist for making valid inferences, such as sampling all available events (e.g., wars) and a tiny fraction of nonevents (peace). This enables scholars to save as much as 99% of their (nonfixed) data collection costs or to collect much more meaningful explanatory
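The sampling design described in this abstract (keep every event, keep only a small fraction of nonevents) can be sketched as follows. This is a minimal illustration, not the paper's own procedure: the abstract does not say which corrections it recommends, so the sketch assumes the standard intercept adjustment for outcome-dependent sampling, which subtracts ln[((1 - tau)/tau) * (ybar/(1 - ybar))] from the fitted intercept, where tau is the event fraction in the population and ybar the event fraction in the sample. The simulated data, the function names `fit_logit` and `prior_correct_intercept`, and all parameter values are hypothetical.

```python
import numpy as np

def fit_logit(X, y, n_iter=50):
    """Plain maximum-likelihood logistic regression via Newton-Raphson.
    X must already contain a leading column of ones for the intercept."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1.0 - p)
        grad = X.T @ (y - p)              # score vector
        hess = X.T @ (X * W[:, None])     # Fisher information
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def prior_correct_intercept(beta, y_sample, tau):
    """Adjust the intercept (beta[0]) for sampling on the outcome.
    tau is the population event fraction (assumed known or estimated)."""
    ybar = y_sample.mean()
    corrected = beta.copy()
    corrected[0] -= np.log(((1.0 - tau) / tau) * (ybar / (1.0 - ybar)))
    return corrected

# Hypothetical population: rare events generated from a logit model.
rng = np.random.default_rng(0)
n = 200_000
x = rng.normal(size=n)
true_beta = np.array([-6.0, 1.0])
p = 1.0 / (1.0 + np.exp(-(true_beta[0] + true_beta[1] * x)))
y = rng.binomial(1, p)
tau = y.mean()  # population fraction of events

# Keep all events and five randomly chosen nonevents per event.
events = np.flatnonzero(y == 1)
nonevents = rng.choice(np.flatnonzero(y == 0), size=5 * len(events), replace=False)
idx = np.concatenate([events, nonevents])

Xs = np.column_stack([np.ones(len(idx)), x[idx]])
ys = y[idx].astype(float)

beta_sample = fit_logit(Xs, ys)                             # intercept is too high
beta_corrected = prior_correct_intercept(beta_sample, ys, tau)
print("uncorrected:", beta_sample)
print("corrected:  ", beta_corrected)
```

In this sketch only the intercept needs adjusting; the slope estimated on the subsample is already consistent for the population slope, which is why so few nonevents are needed.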
Explaining Rare Events in International Relations, 2000
Abstract

Cited by 5 (2 self)
Some of the most important phenomena in international conflict are coded as "rare events data," binary dependent variables with dozens to thousands of times fewer events, such as wars, coups, etc., than "nonevents". Unfortunately, rare events data are difficult to explain and predict, a problem that seems to have at least two sources. First, and most importantly, the data collection strategies used in international conflict are grossly inefficient. The fear of collecting data with too few events has led to data collections with huge numbers of observations but relatively few, and poorly measured, explanatory variables. As it turns out, more efficient sampling designs exist for making valid inferences, such as sampling all available events (e.g., wars) and a tiny fraction of nonevents (peace). This enables scholars to save as much as 99% of their (nonfixed) data collection costs, or to collect much more meaningful explanatory variables. Second, logistic regression, and other commonly ...
Usage, 2010
Abstract
Description: Extends the approach proposed by Firth (1993) for bias reduction of MLEs in exponential family models to the multinomial logistic regression model with general covariate types. Modification of the logistic regression score function to remove first-order bias is equivalent to penalizing the likelihood by the Jeffreys prior, and yields penalized maximum likelihood estimates (PLEs) that always exist. Hypothesis testing is conducted via likelihood ratio statistics. Profile confidence intervals (CI) are constructed for the PLEs.
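The equivalence stated in this description can be written out for the simpler binary logistic case. The notation below is an assumption introduced for illustration and is not taken from the entry itself: \ell is the log-likelihood, I(\beta) the Fisher information, U_r the r-th component of the score, \pi_i the fitted probability, and h_i the i-th diagonal element of the hat matrix.

```latex
% Log-likelihood penalized by the Jeffreys prior |I(\beta)|^{1/2}
\ell^{*}(\beta) = \ell(\beta) + \tfrac{1}{2}\,\log\det I(\beta)

% Differentiating gives the modified score
U_r^{*}(\beta) = U_r(\beta)
  + \tfrac{1}{2}\,\operatorname{tr}\!\left[ I(\beta)^{-1}\,
    \frac{\partial I(\beta)}{\partial \beta_r} \right]

% For binary logistic regression this reduces to
U_r^{*}(\beta) = \sum_{i=1}^{n}
  \left\{ y_i - \pi_i + h_i\left(\tfrac{1}{2} - \pi_i\right) \right\} x_{ir}
```

Setting U_r^{*}(\beta) = 0 for every r gives the penalized estimates; the extra term h_i(1/2 - \pi_i) pulls fitted probabilities toward 1/2, which is what keeps the estimates finite.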
Translated by Kimon Friar Contents
Abstract
Discipline is the highest of all virtues. Only so may strength and desire be counterbalanced and the endeavors of man bear fruit. N. KAZANTZAKIS,
Author manuscript, published in "41èmes Journées de Statistique, SFdS, Bordeaux (2009)". Avoiding infinite estimates in logistic regression – theory, solutions, examples, 2009
Abstract
In logistic regression analyses of small or sparse data sets, results obtained by maximum likelihood methods cannot generally be trusted. In such analyses, although the likelihood meets the convergence criterion, at least one parameter may diverge to plus or minus infinity. This situation has been termed 'separation'. Two studies in which separation occurred are given as examples: the first investigated whether primary graft dysfunction of lung transplants is associated with endothelin-1 mRNA expression measured in lung donors and in graft recipients. In the second example, conditional logistic regression was used to analyze a randomized animal experiment in which animals were clustered into sets defined by equal follow-up time. I show that a penalized likelihood approach provides an ideal solution to both examples, and provide comparative analyses including possible alternative approaches. The estimates obtained by the penalized likelihood approach have reduced bias compared to their maximum likelihood counterparts, and inference using penalized profile likelihood is straightforward. Finally, I provide an overview of software that can be used to apply the proposed penalized likelihood approach. [French title: "Avoiding infinite estimates with logistic regression"]
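To make the separation problem concrete, the sketch below fits a penalized logistic regression to a tiny, perfectly separated data set. It assumes that the penalized likelihood approach meant here is the Firth-type (Jeffreys prior) penalty named in the 2010 entry above; the abstract itself does not spell this out. The data, the function name `firth_logit`, and the step-size cap are hypothetical choices made for illustration, and this is not the software the abstract refers to.

```python
import numpy as np

def firth_logit(X, y, n_iter=100, tol=1e-8, max_step=5.0):
    """Binary logistic regression with a Firth-type (Jeffreys prior) penalty.
    X must already contain a leading column of ones for the intercept."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1.0 - p)
        info = X.T @ (X * W[:, None])          # Fisher information
        info_inv = np.linalg.inv(info)
        # Diagonal of the hat matrix: h_i = W_i * x_i' I^{-1} x_i
        h = W * np.einsum("ij,jk,ik->i", X, info_inv, X)
        # Modified score: the h * (0.5 - p) term is the penalty contribution
        score = X.T @ (y - p + h * (0.5 - p))
        step = info_inv @ score
        biggest = np.max(np.abs(step))
        if biggest > max_step:                 # crude safeguard against overshooting
            step *= max_step / biggest
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Perfectly separated toy data: y is 1 exactly when x is positive, so the
# ordinary maximum-likelihood slope would diverge to plus infinity.
x = np.array([-3.0, -2.0, -1.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
X = np.column_stack([np.ones_like(x), x])

beta_firth = firth_logit(X, y)
print("penalized estimates (intercept, slope):", beta_firth)
```

Because the toy data are symmetric around zero, the penalized intercept is zero and only the slope is informative; it stays finite even though no finite maximum-likelihood estimate exists.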