Results 1–10 of 22
Logistic Regression in Rare Events Data
, 1999
Abstract

Cited by 57 (4 self)
We study rare events data, binary dependent variables with dozens to thousands of times fewer ones (events, such as wars, vetoes, cases of political activism, or epidemiological infections) than zeros (“nonevents”). In many literatures, these variables have proven difficult to explain and predict, a problem that seems to have at least two sources. First, popular statistical procedures, such as logistic regression, can sharply underestimate the probability of rare events. We recommend corrections that outperform existing methods and change the estimates of absolute and relative risks by as much as some estimated effects reported in the literature. Second, commonly used data collection strategies are grossly inefficient for rare events data. The fear of collecting data with too few events has led to data collections with huge numbers of observations but relatively few, and poorly measured, explanatory variables, such as in international conflict data with more than a quarter-million dyads, only a few of which are at war. As it turns out, more efficient sampling designs exist for making valid inferences, such as sampling all available events (e.g., wars) and a tiny fraction of nonevents (peace). This enables scholars to save as much as 99% of their (nonfixed) data collection costs or to collect much more meaningful explanatory variables.
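The design and correction this abstract describes can be sketched in a few lines. The snippet below is an illustrative simulation, not the authors' code: it draws a rare-event population with assumed parameter values, keeps all events plus a 5% subsample of nonevents, fits an effectively unpenalized logit, and applies the standard prior correction to the intercept (subtracting the log of the relative sampling odds); slope estimates are unaffected by this kind of sampling.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Population with a rare event: P(y = 1 | x) = logistic(b0 + b1 * x).
# All values below are illustrative assumptions.
b0, b1 = -5.0, 1.0
N = 200_000
x = rng.normal(size=N)
y = rng.binomial(1, 1 / (1 + np.exp(-(b0 + b1 * x))))

tau = y.mean()  # population event fraction (treated as known)

# Case-control design: keep ALL events and a 5% subsample of nonevents.
keep = (y == 1) | (rng.random(N) < 0.05)
xs, ys = x[keep], y[keep]
ybar = ys.mean()  # event fraction in the sample, far above tau

# C=1e6 makes the fit effectively unpenalized maximum likelihood.
fit = LogisticRegression(C=1e6, max_iter=1000).fit(xs.reshape(-1, 1), ys)
b0_raw = fit.intercept_[0]

# Prior correction: only the intercept is distorted by the sampling design;
# subtract ln[((1 - tau)/tau) * (ybar/(1 - ybar))].
b0_corr = b0_raw - np.log(((1 - tau) / tau) * (ybar / (1 - ybar)))

print(b0_raw, b0_corr)  # corrected intercept should sit near the true b0
```

Only a small fraction of the nonevents needs to be collected, which is where the quoted cost savings come from.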
Explaining Rare Events in International Relations
, 2000
Abstract

Cited by 6 (2 self)
Some of the most important phenomena in international conflict are coded as "rare events data," binary dependent variables with dozens to thousands of times fewer events, such as wars, coups, etc., than "nonevents". Unfortunately, rare events data are difficult to explain and predict, a problem that seems to have at least two sources. First, and most importantly, the data collection strategies used in international conflict are grossly inefficient. The fear of collecting data with too few events has led to data collections with huge numbers of observations but relatively few, and poorly measured, explanatory variables. As it turns out, more efficient sampling designs exist for making valid inferences, such as sampling all available events (e.g., wars) and a tiny fraction of nonevents (peace). This enables scholars to save as much as 99% of their (nonfixed) data collection costs, or to collect much more meaningful explanatory variables. Second, logistic regression, and other commonly ...
Stratified Sample Design for Fair Lending Binary Logit Models
, 2000
Abstract

Cited by 2 (0 self)
Logistic regressions are commonly used to assess fair lending across groups of loan applicants. This paper considers estimation of the disparate treatment parameter when the sample is stratified jointly by loan outcome and race covariate. We use Monte Carlo analysis to investigate the finite-sample properties of two estimators of the disparate treatment parameter under six stratified sampling designs and three data generating processes; one estimator is consistent irrespective of sample design while the other is not. Unfortunately the inconsistent estimator is employed inadvertently in fair lending studies. We demonstrate the gains from using the consistent estimator and provide recommendations on sample design. We also study the effect of sample design on the empirical power of a test for statistical significance of the disparate treatment parameter. We recommend adopting a sample design that approximately balances by outcome and racial group when using the estimator that adjusts for the stratification scheme. However, if the standard logit estimator is employed, then our results suggest a sample design that balances by outcome and allocates across racial groups proportionally to the population. Though our study is framed in terms of fair lending applications, our results apply generally to the estimation of logistic regressions that use stratified or choice-based sample designs.
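A minimal simulation of the paper's central point, under assumed parameter values (a sketch of a WESML-style weighted logit, not the authors' estimators): when the sample is stratified jointly by outcome and group, the unweighted logit coefficient on the group dummy is inconsistent, while weighting by inverse stratum sampling rates restores consistency.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Population: loan approval y depends on a score x and a group dummy g;
# beta_g plays the role of the disparate treatment parameter.
N = 400_000
g = rng.binomial(1, 0.2, N)
x = rng.normal(size=N)
beta0, beta_x, beta_g = -2.0, 1.0, -0.5
y = rng.binomial(1, 1 / (1 + np.exp(-(beta0 + beta_x * x + beta_g * g))))

# Sampling rates stratified jointly by (outcome, group): rows = y, cols = g.
rate = np.array([[0.02, 0.10],   # y = 0
                 [0.05, 0.50]])  # y = 1
keep = rng.random(N) < rate[y, g]
ys, xs, gs = y[keep], x[keep], g[keep]
X = np.column_stack([xs, gs])

# The standard logit ignores the design and is inconsistent for beta_g here...
naive = LogisticRegression(C=1e6, max_iter=1000).fit(X, ys)

# ...while weighting each observation by the inverse of its stratum's
# sampling rate (a WESML-style adjustment) recovers it.
w = 1.0 / rate[ys, gs]
wesml = LogisticRegression(C=1e6, max_iter=1000).fit(X, ys, sample_weight=w)

print(naive.coef_[0][1], wesml.coef_[0][1])  # estimates of beta_g
```

The bias of the naive estimator here equals the difference in log sampling-odds between the two groups, which is why outcome-only stratification would have been harmless.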
Forecasting Binary Outcomes
, 2013
Abstract

Cited by 2 (1 self)
Binary events are involved in many economic decision problems. In recent years, considerable progress has been made in diverse disciplines in developing models for forecasting binary outcomes. We distinguish between two types of forecasts for binary events that are generally obtained as the output of regression models: probability forecasts and point forecasts. We summarize specification, estimation, and evaluation of binary response models for the purpose of forecasting in a unified framework which is characterized by the joint distribution of forecasts and actuals, and a general loss function. Analysis of both the skill and the value of probability and point forecasts can be carried out within this framework. Parametric, semiparametric, nonparametric, and Bayesian approaches are covered. The emphasis is on the basic intuitions underlying each methodology, abstracting away from the mathematical details.
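The distinction between probability and point forecasts can be illustrated with a standard result: under an asymmetric loss with false-alarm cost c_fp and miss cost c_fn, the expected-loss-minimizing point forecast thresholds the probability forecast at c_fp/(c_fp + c_fn). The snippet below is a hedged sketch with simulated data and assumed costs, not code from the survey.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)

# Simulated data; the fitted logit supplies the probability forecast.
n = 20_000
x = rng.normal(size=(n, 1))
y = rng.binomial(1, 1 / (1 + np.exp(-(-1.0 + 2.0 * x[:, 0]))))
model = LogisticRegression(C=1e6, max_iter=1000).fit(x, y)
p_hat = model.predict_proba(x)[:, 1]  # probability forecasts

# Point forecast under asymmetric loss: with false-alarm cost c_fp and
# miss cost c_fn (assumed values), expected loss is minimized by
# forecasting 1 iff p_hat exceeds c_fp / (c_fp + c_fn).
c_fp, c_fn = 1.0, 4.0
threshold = c_fp / (c_fp + c_fn)  # 0.2 rather than the default 0.5
point = (p_hat > threshold).astype(int)

def avg_loss(pred):
    return np.mean(c_fp * ((pred == 1) & (y == 0))
                   + c_fn * ((pred == 0) & (y == 1)))

loss_tuned = avg_loss(point)
loss_default = avg_loss((p_hat > 0.5).astype(int))
print(loss_tuned, loss_default)
```

This is the sense in which the probability forecast is the richer object: any loss function can be mapped to a point forecast by rethresholding, with no refitting.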
Patent Trolls on Markets for Technology – An Empirical Analysis of Trolls’ Patent Acquisitions
, 2009
Abstract

Cited by 1 (0 self)
Patent trolls—firms that appropriate profits from innovation by enforcing patents against infringers—are peculiar players on markets for technologies. As buyers of patents, they are solely interested in the exclusion right, not in the underlying knowledge. Similarly, when they sell or license out patents, the transaction does not involve a technology transfer. In this paper, we empirically analyze trolls’ patent acquisitions. We draw on a unique dataset of 753 patents acquired by known patent trolls, which we compare to 1,506 patents acquired by practicing firms. Our findings regarding patent characteristics support recent theoretical propositions about the troll business model. Trolls focus on patents that have a broad scope and that lie in patent thickets. Furthermore, and contrary to common belief, we find that troll patents are of significantly higher quality than those in the control group, a result that suggests sustainability of the troll business in the future. Extrapolating from our findings, we posit that transactions involving patent trolls may only be the tip of the iceberg of “patent-only” transactions, a conjecture with strong implications for the efficiency of markets for technologies. Managerial and policy implications are discussed.
On the robustness of racial discrimination findings in mortgage lending studies
 Biometrika
, 1997
Abstract

Cited by 1 (0 self)
That mortgage lenders have complex underwriting standards, often differing legitimately from one lender to another, implies that any statistical model estimated to approximate these standards, for use in fair lending determinations, must be misspecified. Exploration of the sensitivity of disparate treatment findings from such statistical models is, thus, imperative. We contribute to this goal. This paper examines whether conclusions from several bank-specific studies, undertaken by the Office of the Comptroller of the Currency, are robust to changes in the link function adopted to model the probability of loan approval and to the approach used to approximate the finite sample null distribution for the disparate treatment hypothesis test. We find that discrimination findings are reasonably robust to the range of examined link functions, which supports the current use of the logit link. Based on several features of our results, we advocate regular use of a resampling method to determine p-values.
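The link-robustness exercise can be sketched by refitting the same binary model under several links and comparing the coefficient on the group dummy. The snippet below is an illustrative maximum-likelihood implementation on simulated data (not the OCC studies' models or data); signs are comparable across links, magnitudes are not, since each link has its own scale.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(3)

# Simulated approval data: g is the group dummy whose coefficient plays
# the role of the disparate treatment parameter. Values are assumptions.
n = 5_000
x = rng.normal(size=n)
g = rng.binomial(1, 0.3, n)
eta = 0.5 + 1.0 * x - 0.7 * g
y = rng.binomial(1, 1 / (1 + np.exp(-eta)))  # generated under a logit link
X = np.column_stack([np.ones(n), x, g])

inv_links = {
    "logit": lambda e: 1 / (1 + np.exp(-e)),
    "probit": norm.cdf,
    "cloglog": lambda e: 1 - np.exp(-np.exp(e)),
}

def fit(inv_link):
    # Maximize the Bernoulli likelihood under the given inverse link.
    def nll(b):
        p = np.clip(inv_link(X @ b), 1e-10, 1 - 1e-10)
        return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    return minimize(nll, np.zeros(3), method="BFGS").x

coefs = {name: fit(f) for name, f in inv_links.items()}
for name, b in coefs.items():
    print(name, round(b[2], 3))  # group coefficient under each link
```

A resampling check of the kind the authors advocate would then bootstrap this fit and read off the p-value for the group coefficient from the resampled distribution.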
Inference Methods for the Conditional Logistic Regression Model with Longitudinal Data
, 2007
Abstract

Cited by 1 (0 self)
This paper considers inference methods for case-control logistic regression in longitudinal setups. The motivation is provided by an analysis of plains bison spatial location as a function of habitat heterogeneity. The sampling is done according to a longitudinal matched case-control design in which, at certain time points, exactly one case, the actual location of an animal, is matched to a number of controls, the alternative locations that could have been reached. We develop inference methods for the conditional logistic regression model in this setup, which can be formulated within a generalized estimating equation (GEE) framework. This permits the use of statistical techniques developed for GEE-based inference, such as robust variance estimators and model selection criteria adapted for non-independent data. The performance of the methods is investigated in a simulation study and illustrated with the bison data analysis. Key words: Akaike information criterion (AIC); Case-control logistic regression; Estimating equations; Generalized estimating equations; Quasi-likelihood under independence criterion (QIC); Retrospective sampling; Robust sandwich estimators.
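For a 1:M matched design, the conditional likelihood of the case, given that exactly one event occurs per stratum, is a within-stratum softmax over the alternatives. A self-contained sketch (simulated data and assumed parameters; none of the authors' GEE machinery, robust variances, or QIC) that maximizes this conditional likelihood directly:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)

# 1:M matched design: per stratum (e.g. a time point), one case (the
# location actually used) and M control locations. Sizes and the true
# beta are illustrative assumptions.
S, M, K = 300, 4, 2
beta = np.array([1.0, -0.5])
X = rng.normal(size=(S, M + 1, K))  # covariates of the M + 1 alternatives

# Under the conditional logistic model, the case is drawn with softmax
# probabilities over the alternatives in its stratum.
u = X @ beta
prob = np.exp(u) / np.exp(u).sum(axis=1, keepdims=True)
case = np.array([rng.choice(M + 1, p=p_s) for p_s in prob])

def neg_cond_loglik(b):
    v = X @ b
    # log P(chosen alternative | one case per stratum)
    #   = v_case - log(sum of exp(v) over the stratum)
    lse = np.log(np.exp(v).sum(axis=1))
    return -(v[np.arange(S), case] - lse).sum()

b_hat = minimize(neg_cond_loglik, np.zeros(K), method="BFGS").x
print(b_hat)  # should recover beta approximately
```

Because the stratum intercepts cancel in the conditional likelihood, only within-stratum covariate contrasts identify beta, which is exactly what the matched design supplies.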
Smoothing Spline Density Estimation: Response-Based Sampling
Abstract
In this paper a nonparametric approach to density estimation under response-based sampling is studied. As a variation of the general penalized likelihood density estimation procedure, an asymptotic theory and an algorithm for the calculation of the estimates with automatic smoothing parameters are customized from that in the general setting. Simulations of limited scale are conducted to illustrate the practical performance of the method. Keywords: ANOVA decomposition, conditional likelihood, logistic regression, odds ratio, penalized likelihood, tensor product spline. Consider a probability density f(x, y) on a product domain X × Y, where Y = {1, …, K} is discrete, or categorical. Of interest is the estimation of the conditional density f(y|x), based on samples which are subject to a form of selection bias known as choice-based sampling in the econometrics literature or case-control sampling in the biostatistics literature. In such a setting, ...
Semiparametric efficiency bounds for regression models under generalised case-control sampling
, 2007
On the semiparametric efficiency of the Scott–Wild estimator under choice-based and two-phase sampling