Logistic Regression in Rare Events Data
, 1999
Cited by 33 (4 self)
We study rare events data, binary dependent variables with dozens to thousands of times fewer ones (events, such as wars, vetoes, cases of political activism, or epidemiological infections) than zeros (“nonevents”). In many literatures, these variables have proven difficult to explain and predict, a problem that seems to have at least two sources. First, popular statistical procedures, such as logistic regression, can sharply underestimate the probability of rare events. We recommend corrections that outperform existing methods and change the estimates of absolute and relative risks by as much as some estimated effects reported in the literature. Second, commonly used data collection strategies are grossly inefficient for rare events data. The fear of collecting data with too few events has led to data collections with huge numbers of observations but relatively few, and poorly measured, explanatory variables, such as in international conflict data with more than a quarter-million dyads, only a few of which are at war. As it turns out, more efficient sampling designs exist for making valid inferences, such as sampling all available events (e.g., wars) and a tiny fraction of nonevents (peace). This enables scholars to save as much as 99% of their (nonfixed) data collection costs or to collect much more meaningful explanatory
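The sampling design described in this abstract (keep every event, keep only a small fraction of nonevents) inflates the share of events in the analyzed sample, so a logit intercept fitted on that sample needs an adjustment before it can yield population-scale probabilities. The sketch below illustrates one standard adjustment, an intercept offset based on the population event fraction. Whether this matches the corrections the abstract recommends is not stated in this listing, so treat it as an assumption. The function names, the population size, and the 1% nonevent fraction are all hypothetical.

```python
import math
import random

def sample_all_events(y, nonevent_fraction, seed=0):
    """Return indices of every event (y == 1) plus a random fraction of nonevents."""
    rng = random.Random(seed)
    return [i for i, yi in enumerate(y)
            if yi == 1 or rng.random() < nonevent_fraction]

def logit(p):
    return math.log(p / (1 - p))

def prior_corrected_intercept(b0_hat, ybar, tau):
    """Shift an intercept fitted on the subsample back to the population scale.

    ybar: fraction of events in the subsample
    tau:  fraction of events in the population (assumed known)
    """
    return b0_hat - math.log(((1 - tau) / tau) * (ybar / (1 - ybar)))

# Hypothetical population: 100,000 observations, only 200 of them events.
y = [1] * 200 + [0] * 99_800
tau = sum(y) / len(y)

kept = sample_all_events(y, nonevent_fraction=0.01)
ybar = sum(y[i] for i in kept) / len(kept)

# With no covariates, the logit MLE of the intercept is logit(ybar).
b0_hat = logit(ybar)
b0_corrected = prior_corrected_intercept(b0_hat, ybar, tau)
```

In this intercept-only case the corrected value equals `logit(tau)` up to rounding, which is a quick way to sanity-check the offset. With covariates, the same offset would apply to the intercept only, leaving the slope estimates unchanged.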
Economic Choices
- American Economic Review
, 2001
Cited by 28 (2 self)
ome detail more recent developments in the economic theory of choice, and modifications to this theory that are being forced by experimental evidence from cognitive psychology. I will close with a survey of statistical methods that have developed as part of the research program on economic choice behavior. Science is a cooperative enterprise, and my work on choice behavior reflects not only my own ideas, but the results of exchange and collaboration with many other scholars. First, of course, is my co-laureate James Heckman, who among his many contributions pioneered the important area of dynamic discrete choice analysis. Nine other individuals who played a major role in channeling microeconometrics and choice theory toward their modern forms, and had a particularly important influence on my own work, are Zvi Griliches, L.L. Thurstone, Jacob Marschak, Duncan Luce, Danny Kahneman, Amos Tversky, Moshe Ben-Akiva, Charles Manski, and Kenneth Train. A gallery of their p
Variable selection and Bayesian model averaging in case-control studies
, 1998
Cited by 17 (7 self)
Covariate and confounder selection in case-control studies is most commonly carried out using either a two-step method or a stepwise variable selection method in logistic regression. Inference is then carried out conditionally on the selected model, but this ignores the model uncertainty implicit in the variable selection process, and so underestimates uncertainty about relative risks. We report on a simulation study designed to be similar to actual case-control studies. This shows that p-values computed after variable selection can greatly overstate the strength of conclusions. For example, for our simulated case-control studies with 1,000 subjects, of variables declared to be "significant" with p-values between .01 and .05, only 49% actually were risk factors when stepwise variable selection was used. We propose Bayesian model averaging as a formal way of taking account of model uncertainty in case-control studies. This yields an easily interpreted summary, the posterior probability that a variable is a risk factor, and our simulation study indicates this to be reasonably well calibrated in the situations simulated. The methods are applied and compared
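The summary this abstract highlights, the posterior probability that a variable is a risk factor, can be illustrated with a small sketch. It assumes equal prior weight on every candidate model and approximates each model's posterior weight from its BIC. That approximation is a common shortcut and is an assumption here, not necessarily the procedure used in the paper. The variable names and BIC values are made up for illustration.

```python
import math

def posterior_model_probs(bics):
    """Approximate posterior model probabilities from BIC values (equal model priors)."""
    best = min(bics)  # subtract the minimum for numerical stability
    weights = [math.exp(-0.5 * (b - best)) for b in bics]
    total = sum(weights)
    return [w / total for w in weights]

def inclusion_probs(models, bics, variables):
    """Sum the posterior probabilities of all models that contain each variable."""
    probs = posterior_model_probs(bics)
    return {v: sum(p for m, p in zip(models, probs) if v in m)
            for v in variables}

# Hypothetical candidate logistic regression models and made-up BIC values.
models = [set(), {"smoking"}, {"age"}, {"smoking", "age"}]
bics = [520.0, 505.0, 519.0, 508.0]

pip = inclusion_probs(models, bics, ["smoking", "age"])
```

Unlike a p-value computed after stepwise selection, each entry of `pip` averages over all candidate models, so a variable that appears only in poorly supported models receives a low probability.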
Improving Forecasts of State Failure
, 2000
Cited by 12 (1 self)
We offer the first independent scholarly evaluation of the claims, forecasts, and causal inferences of the State Failure Task Force and their efforts to forecast when states will fail. This task force, set up at the behest of Vice President Gore in 1994, has been led by a group of distinguished academics working as consultants to the U.S. Central Intelligence Agency. State failure refers to the collapse of the authority of the central government to impose order, as in civil wars, revolutionary wars, genocides, politicides, and adverse or disruptive regime transitions. State Failure Task Force reports and publications have received attention in the media, in academia, and from public policy decision-makers. In this paper, we identify several methodological errors in the task force work that cause their reported forecast probabilities of conflict to be too large, their causal inferences to be biased in unpredictable directions, and their claims of forecasting performance to be exaggerate...
Predictors of customer perceived software quality
- in ICSE ’05: Proceedings of the 27th international conference on Software engineering
, 2005
Cited by 10 (3 self)
Predicting software quality as perceived by a customer may allow an organization to adjust deployment to meet the quality expectations of its customers, to allocate the appropriate amount of maintenance resources, and to direct quality improvement efforts to maximize the return on investment. However, customer perceived quality may be affected not simply by the software content and the development process, but also by a number of other factors including deployment issues, amount of usage, software platform, and hardware configurations. We predict customer perceived quality as measured by various service interactions, including software defect reports, requests for assistance, and field technician dispatches, using the aforementioned and other factors for a large telecommunications software system. We employ the non-intrusive data gathering technique of using existing data captured in automated project monitoring and tracking systems as well as customer support and tracking systems. We find that the effects of deployment schedule, hardware configurations, and software platform can increase the probability of observing a software failure by more than 20 times. Furthermore, we find that the factors affect all quality measures in a similar fashion. Our approach can be applied at other organizations, and we suggest methods to independently validate and replicate our results.
Drivers for Customer Perceived Software Quality
- Proc. of 2005 Int'l Conference on Software Engineering (ICSE 2005), Saint Louis
, 2005
Cited by 9 (2 self)
Predicting software quality as perceived by a customer may allow an organization to adjust deployment to meet the quality expectations of its customers, to allocate the appropriate amount of maintenance resources, and to help direct quality improvement efforts to maximize return on investment. However, customer perceived quality may be affected not simply by the software content and the development process, but also by a number of other factors including deployment issues, amount of usage, software platforms, and hardware configurations. We predict customer perceived quality as measured by various service interactions, including software defect reports, requests for assistance, and field technician dispatches, using the aforementioned and other factors for a large software system. We employ the non-intrusive data gathering technique of using existing data captured in automated project monitoring and tracking systems as well as customer support and tracking systems. We find that the effect of deployment schedule, hardware platform, and software configurations can increase the probability of observing failures by more than 20 times. Furthermore, we found that the factors affected all quality measures in a similar fashion. Our theoretical model could be applied at other organizations, and we suggest methods to independently validate and replicate our results.
Explaining Rare Events in International Relations
, 2000
Cited by 3 (2 self)
Some of the most important phenomena in international conflict are coded as "rare events data," binary dependent variables with dozens to thousands of times fewer events, such as wars, coups, etc., than "nonevents". Unfortunately, rare events data are difficult to explain and predict, a problem that seems to have at least two sources. First, and most importantly, the data collection strategies used in international conflict are grossly inefficient. The fear of collecting data with too few events has led to data collections with huge numbers of observations but relatively few, and poorly measured, explanatory variables. As it turns out, more efficient sampling designs exist for making valid inferences, such as sampling all available events (e.g., wars) and a tiny fraction of non-events (peace). This enables scholars to save as much as 99% of their (non-fixed) data collection costs, or to collect much more meaningful explanatory variables. Second, logistic regression, and other commonly ...
Conditional Logistic Analysis of Case-Control Studies With Complex Sampling
, 2001
Cited by 3 (2 self)
this paper, we will show that the finite sampling approach leads to conditional logistic likelihood methods and can accommodate quite general sampling in a very natural way.
2. A STUDY OF RISK FACTORS FOR EARLY CHILDHOOD ASTHMA
The particular goals and available resources should always be considered in the design of an epidemiologic case-control study. Until recently, these design considerations have been limited to how much to stratify and how large a sample to take. Although the standard case-control design will continue to serve for 'general purpose' studies, it can be advantageous, in terms of validity and cost-efficiency, to tailor the study design to exploit particular features of the study setting. As an example which illustrates how study features can be used to advantage, we describe a study of risk factors for early childhood asthma.
[Fig. 2. Finite population model for case-control studies: underlying 'infinite' population, independent realizations, study base, sample controls (and cases), case-control set.]
This study is part of the Children's Health Study currently underway at the University of Southern California Department of Preventive Medicine (Peters et al., 1999). Children from 12 communities and three grade levels were enrolled to participate in a longitudinal study of childhood respiratory health. Information collected at enrolment to the study included whether the student had ever been diagnosed with asthma, exposed to tobacco smoke in utero and during childhood, and other factors that are potentially related to respiratory health. Using this baseline data, it was found that an asthma diagnosis age five or younger was associated with maternal smoking during pregnancy (in uter...
MODEL SELECTION, COVARIANCE SELECTION AND BAYES CLASSIFICATION VIA SHRINKAGE
, 2006
The naive Bayes classifier (NB) has exhibited its “mysterious” but outstanding classification ability in practice, in spite of its often unrealistic conditional independence assumption. This simple assumption implies the adoption of a diagonal structure for the underlying class-specific precision matrices. However, the NB leaves covariate interrelationships unrevealed. In this dissertation, we will extend the NB from the perspectives of covariance modeling and classification. Due to the positive definiteness constraint and the rapidly growing number of parameters with dimension, covariance estimation in a multivariate normal population has been a classic but challenging statistical problem. Sparse shrinkage covariance/precision matrix estimation has been followed as an important principle in covariance/precision matrix modeling. However, many existing models can only shrink the covariance/precision matrix toward a predefined diagonal structure. We model a precision matrix via its Cholesky decomposition in terms of compositional regression coefficient matrix and error precisions. Our approach aims at estimating
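The abstract's description of a precision matrix built from regression coefficients and error precisions can be made concrete with the standard modified Cholesky factorization, in which the precision matrix is written as Tᵀ D⁻¹ T. Here T is unit lower triangular and holds the negated coefficients from regressing each variable on its predecessors, and D holds the error variances. The sketch below assumes that this is the parameterization meant and uses a made-up 3×3 covariance matrix. It illustrates only the factorization, not the shrinkage estimator of the dissertation.

```python
def ldl(sigma):
    """Factor a symmetric positive definite matrix as sigma = L D L^T (L unit lower triangular)."""
    n = len(sigma)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for j in range(n):
        d[j] = sigma[j][j] - sum(L[j][k] ** 2 * d[k] for k in range(j))
        for i in range(j + 1, n):
            s = sum(L[i][k] * L[j][k] * d[k] for k in range(j))
            L[i][j] = (sigma[i][j] - s) / d[j]
    return L, d

def unit_lower_inverse(L):
    """Invert a unit lower triangular matrix by forward substitution."""
    n = len(L)
    T = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for j in range(n):
        for i in range(j + 1, n):
            T[i][j] = -sum(L[i][k] * T[k][j] for k in range(j, i))
    return T

def precision_from_cholesky(T, d):
    """Assemble the precision matrix T^T D^{-1} T from T and the error variances d."""
    n = len(d)
    return [[sum(T[k][a] * T[k][b] / d[k] for k in range(n)) for b in range(n)]
            for a in range(n)]

# Made-up covariance matrix for three ordered variables.
sigma = [[4.0, 2.0, 0.4],
         [2.0, 2.0, 0.5],
         [0.4, 0.5, 1.0]]

L, d = ldl(sigma)
T = unit_lower_inverse(L)   # row j holds the negated regression coefficients of variable j
omega = precision_from_cholesky(T, d)
error_precisions = [1.0 / v for v in d]
```

Because T is unconstrained below the diagonal and the entries of `d` only need to be positive, any choice of coefficients and error precisions yields a positive definite precision matrix. This is presumably what makes the parameterization convenient for shrinkage. A zero in row j of T corresponds to dropping a predecessor from the regression for variable j.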

