Results 1-10 of 16
Logistic Regression in Rare Events Data
, 1999
Abstract

Cited by 56 (4 self)
We study rare events data, binary dependent variables with dozens to thousands of times fewer ones (events, such as wars, vetoes, cases of political activism, or epidemiological infections) than zeros (“nonevents”). In many literatures, these variables have proven difficult to explain and predict, a problem that seems to have at least two sources. First, popular statistical procedures, such as logistic regression, can sharply underestimate the probability of rare events. We recommend corrections that outperform existing methods and change the estimates of absolute and relative risks by as much as some estimated effects reported in the literature. Second, commonly used data collection strategies are grossly inefficient for rare events data. The fear of collecting data with too few events has led to data collections with huge numbers of observations but relatively few, and poorly measured, explanatory variables, such as in international conflict data with more than a quarter-million dyads, only a few of which are at war. As it turns out, more efficient sampling designs exist for making valid inferences, such as sampling all available events (e.g., wars) and a tiny fraction of nonevents (peace). This enables scholars to save as much as 99% of their (non-fixed) data collection costs or to collect much more meaningful explanatory variables.
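The intercept correction for event-enriched samples described above can be sketched as follows. This is the standard prior-correction formula from the rare-events literature: if tau is the known population fraction of events and ybar the (deliberately inflated) fraction in the sample, the sample-based logit intercept is shifted back to the population scale. Function names here are illustrative, not from the paper.

```python
import math

def corrected_intercept(beta0_hat, tau, ybar):
    """Prior correction for a logit intercept fit on an event-enriched
    (choice-based / case-control) sample.

    beta0_hat : intercept from ordinary logistic regression on the sample
    tau       : fraction of events (ones) in the population
    ybar      : fraction of events in the sample
    """
    return beta0_hat - math.log(((1 - tau) / tau) * (ybar / (1 - ybar)))

def predicted_prob(beta0, slopes, x):
    """Event probability under the logit model with the given coefficients."""
    eta = beta0 + sum(b * xi for b, xi in zip(slopes, x))
    return 1.0 / (1.0 + math.exp(-eta))
```

Sampling all wars but only a sliver of peace dyads makes ybar far larger than tau, so the correction shifts the intercept down; in this simple prior correction the slope estimates are left unchanged.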
A Divide-and-Conquer Algorithm for Generating Markov Bases of Multi-way Tables
, 2003
Abstract

Cited by 23 (8 self)
We describe a divide-and-conquer technique for generating a Markov basis that connects all tables of counts having a fixed set of marginal totals.
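For the simplest case of a two-way table with fixed row and column sums, a Markov basis consists of the well-known ±1 "basic moves" on 2x2 subtables; the paper's divide-and-conquer construction for multi-way tables is beyond this sketch, which only illustrates what a single move does.

```python
def basic_move(table, i1, i2, j1, j2, sign=1):
    """Apply a 2x2 basic move to a two-way table of counts: add +sign to
    cells (i1,j1) and (i2,j2), and -sign to (i1,j2) and (i2,j1).
    Row and column totals are unchanged. Returns the new table, or None
    if any cell would go negative (the move leaves the fiber)."""
    new = [row[:] for row in table]
    new[i1][j1] += sign
    new[i2][j2] += sign
    new[i1][j2] -= sign
    new[i2][j1] -= sign
    if min(new[i1][j1], new[i2][j2], new[i1][j2], new[i2][j1]) < 0:
        return None
    return new
```

Repeatedly applying random basic moves walks through every table with the observed margins, which is what a Markov basis guarantees for the two-way case.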
Markov chain Monte Carlo exact tests for incomplete two-way contingency tables
, 2002
Abstract

Cited by 23 (14 self)
We consider testing the quasi-independence hypothesis for two-way contingency tables which contain some structural zero cells. For sparse contingency tables where the large sample...
Statistical notions of data disclosure avoidance and their relationship to traditional statistical methodology: Data swapping and log-linear models
 Proc. Bureau of the Census
, 1996
Abstract

Cited by 12 (3 self)
For most data releases, especially those from censuses, the U.S. Bureau of the Census has either released data at high levels of aggregation or applied a data disclosure avoidance procedure such as data swapping or cell suppression before preparing microdata or tables for release. In this paper, we present a general statistical characterization of the goal of a statistical agency in releasing confidential data subject to the application of disclosure avoidance procedures. We use this characterization to provide a framework for the study of data disclosure avoidance procedures for categorical variables. Consider a sample of n observations on p variables, which may be discrete or continuous. Our general characterization is in terms of the smoothing of a multidimensional empirical distribution function (an ordered version of the data), and sampling from it using bootstrap-like selection. Both the smoothing and the sampling introduce alterations to the data, and thus a bootstrap sample will not necessarily be the same as the original sample; this works to preserve the confidentiality of individuals providing the original data. Two obvious questions are: How well is confidentiality preserved by such a process? Have the smoothing and sampling disguised fundamental relationships among the p variables of interest to others who will work only with the altered data? Rubin (1993) has provided a closely related characterization and approach based on multiple imputation. We explain some of these ideas in greater detail in the context of categorical random variables and compare them to methods in current use for data disclosure avoidance such as data swapping and cell suppression. We also relate this approach to data disclosure avoidance to the statistical analysis associated with the use of log-linear models for cross-classified categorical data.
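One very simple reading of the smooth-then-resample idea for categorical data can be sketched as follows. The smoother here (additive pseudo-counts toward uniform) is a hypothetical choice made for illustration only; it is not the Bureau's procedure or the specific smoother the paper characterizes.

```python
import random

def smoothed_bootstrap(counts, lam=1.0, seed=0):
    """Smooth a vector of category counts toward the uniform distribution
    using additive pseudo-counts (one simple smoother, chosen for
    illustration), then draw a bootstrap sample of the same size n from
    the smoothed probabilities. The released counts need not match the
    originals, which is what protects confidentiality."""
    rng = random.Random(seed)
    n, k = sum(counts), len(counts)
    probs = [(c + lam) / (n + lam * k) for c in counts]
    # cumulative probabilities for inverse-CDF sampling
    cum, total = [], 0.0
    for p in probs:
        total += p
        cum.append(total)
    cum[-1] = 1.0  # guard against floating-point shortfall
    out = [0] * k
    for _ in range(n):
        u = rng.random()
        out[next(i for i, c in enumerate(cum) if u <= c)] += 1
    return out
```

Both the smoothing and the multinomial resampling perturb the data, so the released table generally differs from the confidential one while preserving its broad distributional shape.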
Objective Bayesian analysis of contingency tables
, 2002
Abstract

Cited by 6 (2 self)
The statistical analysis of contingency tables is typically carried out with a hypothesis test. In the Bayesian paradigm, default priors for hypothesis tests are typically improper, and cannot be used. Although such priors are available, and proper, for testing contingency tables, we show that for testing independence they can be greatly improved on by so-called intrinsic priors. We also argue that because there is no realistic situation that corresponds to the case of conditioning on both margins of a contingency table, the proper analysis of an a × b contingency table should only condition on either the table total or on only one of the margins. The posterior probabilities from the intrinsic priors provide reasonable answers in these cases. Examples using simulated and real data are given.
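The intrinsic priors of the paper are not reproduced here; as a simpler illustration of a Bayesian test of independence that conditions only on the table total, the following sketch computes the Bayes factor under uniform Dirichlet priors on the cell and margin probabilities (an assumed stand-in prior, not the paper's). The common multinomial coefficient cancels from the ratio.

```python
from math import lgamma

def log_dirichlet_multinomial(counts, alpha=1.0):
    """log of Gamma(k*alpha)/Gamma(n+k*alpha) * prod Gamma(c+alpha)/Gamma(alpha),
    the Dirichlet marginal of a multinomial likelihood (coefficient omitted)."""
    k, n = len(counts), sum(counts)
    out = lgamma(k * alpha) - lgamma(n + k * alpha)
    for c in counts:
        out += lgamma(c + alpha) - lgamma(alpha)
    return out

def log_bf10(table):
    """log Bayes factor of the saturated (dependence) model against
    independence for an a x b table with the total fixed, under uniform
    Dirichlet priors on cell, row, and column probabilities."""
    cells = [c for row in table for c in row]
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    log_m1 = log_dirichlet_multinomial(cells)
    log_m0 = log_dirichlet_multinomial(rows) + log_dirichlet_multinomial(cols)
    return log_m1 - log_m0
```

A strongly associated table yields a positive log Bayes factor (evidence against independence), while a perfectly balanced table of the same size yields a slightly negative one; intrinsic priors refine this kind of calculation by concentrating the alternative's prior near the null.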
Assessing Robustness of Intrinsic Tests of Independence in Two-Way Contingency Tables
Abstract

Cited by 5 (2 self)
For testing nested hypotheses from a Bayesian standpoint, a desirable condition is that the prior for the alternative model concentrates mass around the smaller, or null, model. For testing independence in contingency tables, the intrinsic priors satisfy this requirement. Furthermore, the degree of concentration of the priors is controlled by a discrete parameter, t, the training sample size, which plays an important role in the resulting answer. In this article we report on the robustness of the tests of independence for small or moderate sample sizes in contingency tables with respect to intrinsic priors with different degrees of concentration around the null. We compare these tests to frequentist tests and other robust Bayes tests. For large sample sizes, robustness is achieved because the intrinsic Bayesian tests are consistent. Examples using real and simulated data are given. Supplemental materials (technical details and data sets) are available online.
Explaining Rare Events in International Relations
, 2000
Abstract

Cited by 5 (2 self)
Some of the most important phenomena in international conflict are coded as "rare events data," binary dependent variables with dozens to thousands of times fewer events, such as wars, coups, etc., than "nonevents". Unfortunately, rare events data are difficult to explain and predict, a problem that seems to have at least two sources. First, and most importantly, the data collection strategies used in international conflict are grossly inefficient. The fear of collecting data with too few events has led to data collections with huge numbers of observations but relatively few, and poorly measured, explanatory variables. As it turns out, more efficient sampling designs exist for making valid inferences, such as sampling all available events (e.g., wars) and a tiny fraction of nonevents (peace). This enables scholars to save as much as 99% of their (non-fixed) data collection costs, or to collect much more meaningful explanatory variables. Second, logistic regression, and other commonly ...
Exact Tests for Two-Way Symmetric Contingency Tables
Abstract

Cited by 2 (0 self)
In this paper, we review exact tests and the computing problems involved. We propose new recursive algorithms for exact goodness-of-fit tests of quasi-independence, quasi-symmetry, linear-by-linear association and some related models. We propose that all computations be carried out using symbolic computation and rational arithmetic in order to calculate the exact p-values accurately, and describe how we implemented our proposals. Two examples are presented.
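The paper's recursive algorithms for quasi-independence and quasi-symmetry models are not reproduced here, but the rational-arithmetic idea can be illustrated on the simplest exact test: Fisher's exact test for a 2x2 table, with every probability kept as an exact fraction so the p-value suffers no rounding.

```python
from fractions import Fraction
from math import comb

def fisher_exact_pvalue(table):
    """Two-sided Fisher exact p-value for a 2x2 table, in exact rational
    arithmetic: enumerate all tables with the observed margins, weight each
    by its hypergeometric probability, and sum the probabilities that are
    no larger than that of the observed table."""
    (a, b), (c, d) = table
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2
    denom = comb(n, c1)

    def prob(x):
        # hypergeometric probability of the table whose top-left cell is x
        return Fraction(comb(r1, x) * comb(r2, c1 - x), denom)

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs)
```

Because every intermediate value is a `Fraction`, the returned p-value is exact, which is precisely the motivation the abstract gives for symbolic computation over floating point.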
Monte Carlo Exact Conditional Tests for Quasi-independence using Gibbs Sampling
, 1994
Abstract

Cited by 1 (1 self)
The hypothesis considered in this paper is the hypothesis of QI (quasi-independence) for the off-diagonal cells of an r × r square table, where the sufficient statistics for the nuisance parameters are x_{i+}, x_{+j} and x_{ii}, for i, j = 1, ..., r.
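The paper's Gibbs sampler for quasi-independence, which also fixes the diagonal cells, is not reproduced here. As a simplified sketch of the same Monte Carlo exact conditional testing idea, the following chain tests ordinary independence: it walks over tables with the observed margins via 2x2 basic moves, with a Metropolis step targeting the conditional (hypergeometric) distribution, and estimates the p-value as the fraction of visited tables whose chi-square statistic is at least the observed one.

```python
import random
from math import lgamma, exp

def chi2_stat(table):
    """Pearson chi-square statistic against independence."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    return sum((table[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
               for i in range(len(rows)) for j in range(len(cols)))

def mc_exact_test(table, iters=20000, seed=1):
    """Monte Carlo estimate of the exact conditional p-value of independence.
    Random walk via 2x2 basic moves; the Metropolis step targets
    P(table) proportional to 1 / prod(n_ij!), the hypergeometric
    distribution of the table given its margins."""
    rng = random.Random(seed)
    t = [row[:] for row in table]
    a, b = len(t), len(t[0])
    obs = chi2_stat(table)
    hits = 0
    for _ in range(iters):
        i1, i2 = rng.sample(range(a), 2)
        j1, j2 = rng.sample(range(b), 2)
        s = rng.choice((-1, 1))
        cells = [(i1, j1, s), (i2, j2, s), (i1, j2, -s), (i2, j1, -s)]
        if all(t[i][j] + d >= 0 for i, j, d in cells):
            # log acceptance ratio over the four changed cells
            logr = sum(lgamma(t[i][j] + 1) - lgamma(t[i][j] + d + 1)
                       for i, j, d in cells)
            if logr >= 0 or rng.random() < exp(logr):
                for i, j, d in cells:
                    t[i][j] += d
        if chi2_stat(t) >= obs - 1e-9:
            hits += 1
    return hits / iters
```

Handling quasi-independence as in the paper would additionally keep the diagonal cells x_{ii} fixed, which restricts the allowable moves; the accept/count loop itself is unchanged.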
Assessing Robustness of Intrinsic Tests of Independence in Two-way Contingency Tables
Abstract

Cited by 1 (1 self)
A condition needed for testing nested hypotheses from a Bayesian viewpoint is that the prior for the alternative model concentrates mass around the smaller, or null, model. For testing independence in contingency tables, the intrinsic priors satisfy this requirement. Further, the degree of concentration of the priors is controlled by a discrete parameter m, the training sample size, which plays an important role in the resulting answer. In this paper we study, for small or moderate sample sizes, robustness of the tests of independence in contingency tables with respect to intrinsic priors with different degrees of concentration around the null. We compare these tests with frequentist tests and the robust Bayes tests of Good and Crook. For large sample sizes robustness is achieved since the intrinsic Bayesian tests are consistent. We also discuss conditioning issues and sampling schemes, and argue that conditioning should be on either one margin or the table total, but not on both margins. Examples using real and simulated data are given.