Logistic Regression in Rare Events Data
, 1999
Cited by 33 (4 self)
We study rare events data, binary dependent variables with dozens to thousands of times fewer ones (events, such as wars, vetoes, cases of political activism, or epidemiological infections) than zeros (“nonevents”). In many literatures, these variables have proven difficult to explain and predict, a problem that seems to have at least two sources. First, popular statistical procedures, such as logistic regression, can sharply underestimate the probability of rare events. We recommend corrections that outperform existing methods and change the estimates of absolute and relative risks by as much as some estimated effects reported in the literature. Second, commonly used data collection strategies are grossly inefficient for rare events data. The fear of collecting data with too few events has led to data collections with huge numbers of observations but relatively few, and poorly measured, explanatory variables, such as in international conflict data with more than a quarter-million dyads, only a few of which are at war. As it turns out, more efficient sampling designs exist for making valid inferences, such as sampling all available events (e.g., wars) and a tiny fraction of nonevents (peace). This enables scholars to save as much as 99% of their (nonfixed) data collection costs or to collect much more meaningful explanatory
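As a rough illustration of how a sample containing all events and only a small fraction of nonevents can still support valid probability estimates: in a logit model, sampling on the outcome distorts only the intercept, and one standard adjustment (often called prior correction) shifts it back using the population event fraction. The sketch below assumes that fraction `tau` is known; the function names and the numbers are illustrative assumptions and are not taken from the abstract.

```python
import math

def prior_corrected_intercept(b0_hat, tau, ybar):
    """Shift a logit intercept fitted on an event-enriched sample.

    b0_hat: intercept estimated on the sampled data
    tau:    assumed fraction of events in the population
    ybar:   fraction of events in the sample
    """
    return b0_hat - math.log(((1 - tau) / tau) * (ybar / (1 - ybar)))

def event_probability(intercept, slope, x):
    """Logistic probability for a single covariate x."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))

# Hypothetical numbers: a balanced sample (half events) drawn from a
# population in which only 1% of observations are events.
b0 = prior_corrected_intercept(b0_hat=0.0, tau=0.01, ybar=0.5)
p = event_probability(b0, slope=0.0, x=0.0)
```

With a zero slope the corrected model returns the population rate rather than the sample rate, which is the behaviour the adjustment is meant to restore. Slope coefficients are left untouched by this particular correction.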
A Divide-and-Conquer Algorithm for Generating Markov Bases of Multi-way Tables
, 2003
Cited by 20 (7 self)
We describe a divide-and-conquer technique for generating a Markov basis that connects all tables of counts having a fixed set of marginal totals
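To make the idea of moves that connect tables with fixed marginal totals concrete, here is a minimal sketch for the simplest case, a two-way table, where the elementary +1/-1 swaps on 2x2 subtables are classically known to connect all tables sharing the same row and column sums. This is not the divide-and-conquer technique of the paper, which targets multi-way tables; the helper names and the starting table are assumptions made for illustration.

```python
import random

def margins(table):
    """Return (row sums, column sums) of a two-way table."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    return rows, cols

def basic_move_step(table, rng):
    """Propose one elementary move; stay put if a cell would go negative."""
    i1, i2 = rng.sample(range(len(table)), 2)
    j1, j2 = rng.sample(range(len(table[0])), 2)
    if table[i1][j2] == 0 or table[i2][j1] == 0:
        return table
    new = [row[:] for row in table]
    new[i1][j1] += 1
    new[i2][j2] += 1
    new[i1][j2] -= 1
    new[i2][j1] -= 1
    return new

rng = random.Random(0)
start = [[3, 1, 2], [0, 4, 1], [2, 2, 5]]
current = start
for _ in range(1000):
    current = basic_move_step(current, rng)
```

Every table visited by the walk has the same margins as `start`, since each move adds and subtracts one count within the same pair of rows and the same pair of columns.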
Markov chain Monte Carlo exact tests for incomplete two-way contingency tables
, 2002
Cited by 20 (12 self)
We consider testing the quasi-independence hypothesis for two-way contingency tables which contain some structural zero cells. For sparse contingency tables where the large sample...
Explaining Rare Events in International Relations
, 2000
Cited by 3 (2 self)
Some of the most important phenomena in international conflict are coded as "rare events data," binary dependent variables with dozens to thousands of times fewer events, such as wars, coups, etc., than "nonevents". Unfortunately, rare events data are difficult to explain and predict, a problem that seems to have at least two sources. First, and most importantly, the data collection strategies used in international conflict are grossly inefficient. The fear of collecting data with too few events has led to data collections with huge numbers of observations but relatively few, and poorly measured, explanatory variables. As it turns out, more efficient sampling designs exist for making valid inferences, such as sampling all available events (e.g., wars) and a tiny fraction of non-events (peace). This enables scholars to save as much as 99% of their (non-fixed) data collection costs, or to collect much more meaningful explanatory variables. Second, logistic regression, and other commonly ...
Exact Tests for Two-Way Symmetric Contingency Tables
Cited by 2 (0 self)
In this paper, we review exact tests and the computing problems involved. We propose new recursive algorithms for exact goodness-of-fit tests of quasi-independence, quasi-symmetry, linear-by-linear association and some related models. We propose that all computations be carried out using symbolic computation and rational arithmetic in order to calculate the exact p-values accurately and describe how we implemented our proposals. Two examples are presented.
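The appeal of rational arithmetic for exact p-values can be seen in a much simpler setting than the models listed above. The sketch below computes a two-sided Fisher exact test for a 2x2 table with Python's `fractions.Fraction`; it is a stand-in chosen for brevity, not the recursive algorithms the paper proposes, and the function name is an assumption.

```python
from fractions import Fraction
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Exact two-sided p-value for the table [[a, b], [c, d]], as a Fraction."""
    r1, r2, c1 = a + b, c + d, a + c
    total = comb(r1 + r2, c1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell,
        # conditional on both sets of marginal totals.
        return Fraction(comb(r1, x) * comb(r2, c1 - x), total)

    lo, hi = max(0, c1 - r2), min(r1, c1)
    p_obs = prob(a)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum((prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs),
               Fraction(0))

p_value = fisher_exact_two_sided(3, 1, 1, 3)
```

Because the comparison `prob(x) <= p_obs` is made on exact fractions, tables that are exactly as probable as the observed one are never misclassified by rounding, which is one practical reason to avoid floating point here.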
Monte Carlo Exact Conditional Tests for Quasi-independence using Gibbs Sampling
, 1994
Cited by 1 (1 self)
... this paper, is the hypothesis of QI for the off-diagonal cells of an $r \times r$ square table, where the sufficient statistics for the nuisance parameters are $x_{i+}$, $x_{+j}$ and $x_{ii}$, for $i, j = 1, \ldots, r$.
Assessing Robustness of Intrinsic Tests of Independence in Twoway Contingency Tables
Cited by 1 (1 self)
A condition needed for testing nested hypotheses from a Bayesian viewpoint is that the prior for the alternative model concentrates mass around the smaller, or null, model. For testing independence in contingency tables, the intrinsic priors satisfy this requirement. Further, the degree of concentration of the priors is controlled by a discrete parameter m, the training sample size, which plays an important role in the resulting answer. In this paper we study, for small or moderate sample sizes, robustness of the tests of independence in contingency tables with respect to intrinsic priors with different degrees of concentration around the null. We compare these tests with frequentist tests and the robust Bayes tests of Good and Crook. For large sample sizes robustness is achieved since the intrinsic Bayesian tests are consistent. We also discuss conditioning issues and sampling schemes, and argue that conditioning should be on either one margin or the table total, but not on both margins. Examples using real and simulated data are given.
Assessing Robustness of Intrinsic Tests of Independence in Two-Way Contingency Tables
Cited by 1 (0 self)
For testing nested hypotheses from a Bayesian standpoint, a desirable condition is that the prior for the alternative model concentrates mass around the smaller, or null, model. For testing independence in contingency tables, the intrinsic priors satisfy this requirement. Furthermore, the degree of concentration of the priors is controlled by a discrete parameter, t, the training sample size, which plays an important role in the resulting answer. In this article we report on the robustness of the tests of independence for small or moderate sample sizes in contingency tables with respect to intrinsic priors with different degrees of concentration around the null. We compare these tests to frequentist tests and other robust Bayes tests. For large sample sizes, robustness is achieved because the intrinsic Bayesian tests are consistent. Examples using real and simulated data are given. Supplemental materials (technical details and data sets) are available online.
Exact confidence regions for species assignment based on DNA markers
Assignment of individuals to correct species or population of origin based on a comparison of allele profiles has in recent years become more accurate due to improvements in DNA marker technology. A method of assessing the error in such assignment problems is presented. The method is based on the exact hypergeometric distributions of contingency tables conditioned on marginal totals. The result is a confidence region of fixed confidence level. This confidence level is calculable exactly in principle, and estimable very accurately by simulation, without knowledge of the true population allele frequencies. Various properties of these techniques are examined through application to several examples of actual DNA marker data and through simulation studies. Methods which may reduce computation time are discussed and illustrated. Résumé: Thanks to improvements in DNA marking techniques, the assignment of individuals to their species or population of origin based on the comparison...
Implications of random cut-points theory for the Mann-Whitney and binomial tests
, 2000
Through random cut-points theory, the author extends inference for ordered categorical data to the unspecified continuum underlying the ordered categories. He shows that a random cut-point Mann-Whitney test yields slightly smaller p-values than the conventional test for most data. However, when at least P% of the data lie in one of the k categories (with P = 80 for k = 2, P = 67 for k = 3, ..., P = 18 for k = 30), he also shows that the conventional test can yield much smaller p-values, and hence misleadingly liberal inference for the underlying continuum. The author derives formulas for exact tests; for k = 2, the Mann-Whitney test is but a binomial test. Résumé: The author shows that using random cut-points theory makes it possible to extend inference for ordered qualitative data to the unknown underlying continuum. He shows that in most cases, the observed significance level of the random cut-point Mann-Whitney test is slightly lower than that of the test...

