Results 1–10 of 97
Two-sided confidence intervals for the single proportion: comparison of seven methods
Stat. Med., 1998
Abstract

Cited by 73 (2 self)
Simple interval estimate methods for proportions exhibit poor coverage and can produce evidently inappropriate intervals. Criteria appropriate to the evaluation of various proposed methods include: closeness of the achieved coverage probability to its nominal value; whether intervals are located too close to or too distant from the middle of the scale; expected interval width; avoidance of aberrations such as limits outside [0, 1] or zero width intervals; and ease of use, whether by tables, software or formulae. Seven methods for the single proportion are evaluated on 96,000 parameter space points. Intervals based on tail areas and the simpler score methods are recommended for use. In each case, methods are available that aim to align either the minimum or the mean coverage with the nominal 1 − α. © 1998 John Wiley & Sons, Ltd.
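The "simpler score methods" recommended here center on the Wilson score interval. As a minimal illustration (not Newcombe's own code; `wilson_interval` is a hypothetical name, and z defaults to the 95% normal quantile):

```python
import math

def wilson_interval(successes, n, z=1.959964):
    """Wilson score confidence interval for a single binomial proportion.
    z is the standard-normal quantile for the nominal level (1.96 for 95%)."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return centre - half, centre + half

# Unlike the Wald interval, the limits always stay inside [0, 1],
# and the interval does not collapse to zero width at 0 successes.
lo, hi = wilson_interval(0, 10)
```

By construction the limits stay within [0, 1] and the interval never has zero width at 0 or n successes — two of the aberrations the paper penalizes.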
Interval estimation for the difference between independent proportions: comparison of eleven methods
Statistics in Medicine, 1998
Abstract

Cited by 30 (2 self)
Several existing unconditional methods for setting confidence intervals for the difference between binomial proportions are evaluated. Computationally simpler methods are prone to a variety of aberrations and poor coverage properties. The closely interrelated methods of Mee and Miettinen and Nurminen perform well but require a computer program. Two new approaches which also avoid aberrations are developed and evaluated. A tail area profile likelihood based method produces the best coverage properties, but is difficult to calculate for large denominators. A method combining Wilson score intervals for the two proportions to be compared also performs well, and is readily implemented irrespective of sample size. © 1998 John Wiley & Sons, Ltd.
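The method "combining Wilson score intervals for the two proportions" can be sketched as the square-and-add construction below; the function names and the 95% default are illustrative assumptions, not the paper's notation:

```python
import math

def wilson(x, n, z=1.959964):
    """Wilson score interval for a single proportion (see the entry above)."""
    p = x / n
    d = 1 + z * z / n
    c = (p + z * z / (2 * n)) / d
    h = (z / d) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return c - h, c + h

def newcombe_diff(x1, n1, x2, n2, z=1.959964):
    """Square-and-add interval for p1 - p2: the distance from each point
    estimate to its own Wilson limit is squared, added across the two
    samples, and square-rooted to give each limit of the difference."""
    p1, p2 = x1 / n1, x2 / n2
    l1, u1 = wilson(x1, n1, z)
    l2, u2 = wilson(x2, n2, z)
    delta = p1 - p2
    lower = delta - math.sqrt((p1 - l1) ** 2 + (u2 - p2) ** 2)
    upper = delta + math.sqrt((u1 - p1) ** 2 + (p2 - l2) ** 2)
    return lower, upper
```

Because each single-sample interval respects [0, 1], the combined limits stay inside [−1, 1] even in boundary cases, which is why the method avoids the aberrations of the simple Wald difference.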
Assessing interactive causal influence
 Psychological Review
Abstract

Cited by 25 (6 self)
The discovery of conjunctive causes—factors that act in concert to produce or prevent an effect—has been explained by purely covariational theories. Such theories assume that concomitant variations in observable events directly license causal inferences, without postulating the existence of unobservable causal relations. This article discusses problems with these theories, proposes a causal-power theory that overcomes the problems, and reports empirical evidence favoring the new theory. Unlike earlier models, the new theory derives (a) the conditions under which covariation implies conjunctive causation and (b) functions relating observable events to unobservable conjunctive causal strength. This psychological theory, which concerns simple cases involving 2 binary candidate causes and a binary effect, raises questions about normative statistics for testing causal hypotheses regarding categorical data resulting from discrete variables.
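For background, the non-interactive building block of this line of theory is Cheng's causal power for a single generative cause: the contrast ΔP rescaled by the headroom left by alternative causes. A sketch of that simple-cause formula only (the article's conjunctive extension is more involved and is not reproduced here):

```python
def causal_power(p_e_given_c, p_e_given_not_c):
    """Cheng-style causal power of one generative cause c for effect e:
    delta-P = P(e|c) - P(e|not c), rescaled by 1 - P(e|not c), i.e. the
    room the effect's base rate leaves for c to reveal its influence."""
    delta_p = p_e_given_c - p_e_given_not_c
    return delta_p / (1 - p_e_given_not_c)
```

The rescaling is what separates this from purely covariational accounts: the same ΔP implies a stronger unobservable power when the effect already occurs often without the candidate cause.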
Discovering Risk of Disease with a Learning Classifier System
Proceedings of the 7th International Conference on Genetic Algorithms (ICGA97), 1998
Abstract

Cited by 13 (1 self)
A learning classifier system, EpiCS, was used to derive a continuous measure of disease risk in a series of 250 individuals. Using the area under the receiver-operating characteristic curve, this measure was compared with the risk estimate derived for the same individuals by logistic regression. Over 20 training-testing trials, risk estimates derived by EpiCS were consistently more accurate (mean area=0.97, SD=0.01) than those derived by logistic regression (mean area=0.89, SD=0.02). The areas for the trials with minimum and maximum classification performance on testing were significantly greater (p=0.019 and p<0.001, respectively) than the area for the logistic regression curve. This investigation demonstrated the ability of a learning classifier system to produce output that is clinically meaningful in diagnostic classification.
1.0 INTRODUCTION This work investigated the use of a learning classifier system to discover a type of knowledge particularly useful to epidemiologic research...
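The comparison metric here, area under the receiver-operating characteristic curve, equals the Mann-Whitney probability that a randomly chosen positive case outscores a randomly chosen negative one. A minimal stdlib sketch (hypothetical data; this is not the EpiCS implementation):

```python
def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic:
    the fraction of positive/negative pairs in which the positive case
    receives the higher score, counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is what makes the 0.97 vs 0.89 comparison in the abstract interpretable across two quite different model classes.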
Why There Is No Statistical Test For Confounding, Why Many Think There Is, And Why They Are Almost Right
1998
Abstract

Cited by 13 (4 self)
this paper is to bring to the attention of investigators several basic limitations of the associational criterion. We will show that the associational criterion does not ensure unbiased effect estimates, nor does it follow from the requirement of unbiasedness. After demonstrating, by examples, the absence of logical connections between the statistical and the causal notions of confounding, we will define a stronger notion of unbiasedness, called stable unbiasedness, relative to which a modified statistical criterion will be shown necessary and sufficient. The necessary part will then yield a practical test for stable unbiasedness which, remarkably, does not require knowledge of all potential confounders in a problem. Finally, we will argue that the prevailing practice of substituting statistical criteria for the effect-based definition of confounding is not entirely misguided, because stable unbiasedness is in fact what investigators have been and should be aiming to achieve, and stable unbiasedness is what statistical criteria can test.
The Cult of Statistical Significance
Abstract

Cited by 11 (2 self)
difficult friend, Ronald A. Fisher (1890–1962), though a genius, was wrong. Fit is not the same thing as importance. Statistical significance is not the same thing as scientific importance or economic sense. But the mistaken equation is made, we find, in 8 or 9 of every 10 articles appearing in the leading journals of science, from economics to medicine. The history of this "standard error" of science involves varied characters and plot twists, but especially R. A. Fisher's canonical translation of "Student's" t. William S. Gosset, aka "Student," who was for most of his life Head Experimental Brewer at Guinness, took an economic approach to the logic of uncertainty. Against Gosset's wishes his friend Fisher erased the consciously economic element, Gosset's "real error." We want to bring it back. For the past eighty-five years it appears that some of the sciences have made a mistake, by basing decisions on statistical "significance." Though it looks at first like a matter of minor statistical detail, it is not. Statistics, magnitudes, coefficients are essential scientific tools. No one can credibly doubt that. And mathematical statistics is a glorious social and practical and...
Tenacious Tortoises: A formalism for argument over rules of inference
Computational Dialectics (ECAI 2000 Workshop), 2000
Abstract

Cited by 10 (7 self)
As multiagent systems proliferate and employ different and more sophisticated formal logics, it is increasingly likely that agents will be reasoning with different rules of inference. Hence, an agent seeking to convince another of some proposition may first have to convince the latter to use a rule of inference which it has not thus far adopted. We define a formalism to represent degrees of acceptability or validity of rules of inference, to enable autonomous agents to undertake dialogue concerning inference rules. Even when they disagree over the acceptability of a rule, two agents may still use the proposed formalism to reason collaboratively.
For Objective Causal Inference, Design Trumps Analysis. Annals of Applied Statistics 2(3):808–840
Abstract

Cited by 10 (0 self)
For obtaining causal inferences that are objective, and therefore have the best chance of revealing scientific truths, carefully designed and executed randomized experiments are generally considered to be the gold standard. Observational studies, in contrast, are generally fraught with problems that compromise any claim for objectivity of the resulting causal inferences. The thesis here is that observational studies have to be carefully designed to approximate randomized experiments, in particular, without examining any final outcome data. Often a candidate data set will have to be rejected as inadequate because of lack of data on key covariates, or because of lack of overlap in the distributions of key covariates between treatment and control groups, often revealed by careful propensity score analyses. Sometimes the template for the approximating randomized experiment will have to be altered, and the use of principal stratification can be helpful in doing this. These issues are discussed and illustrated using the framework of potential outcomes to define causal effects, which greatly clarifies critical issues. 1. Randomized experiments versus observational studies. 1.1. Historical dichotomy between randomized and nonrandomized studies for causal effects. For many years, causal inference based on randomized experiments, as described, for example, in classic texts by Fisher (1935), Kempthorne (1952), Cochran and Cox (1950) and Cox (1958), was an entirely distinct endeavor from causal inference based on observational data sets, described, for example, in texts by Blalock (1964), Kenny (1979), ...
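The design-phase checks described — assessing covariate overlap via propensity scores before any outcomes are examined — can be sketched as follows. The function name, the diagnostics chosen, and the 0.25 standardized-difference rule of thumb are illustrative assumptions, not the paper's procedure:

```python
from statistics import mean, stdev

def overlap_diagnostics(ps_treated, ps_control):
    """Design-phase checks on estimated propensity scores, computed
    without looking at outcomes: the region of common support, the
    standardized mean difference between groups (a common rule of
    thumb flags |SMD| > 0.25), and the fraction of units inside the
    common-support region."""
    lo = max(min(ps_treated), min(ps_control))   # common-support lower edge
    hi = min(max(ps_treated), max(ps_control))   # common-support upper edge
    pooled_sd = ((stdev(ps_treated) ** 2 + stdev(ps_control) ** 2) / 2) ** 0.5
    smd = (mean(ps_treated) - mean(ps_control)) / pooled_sd
    in_support = [p for p in ps_treated + ps_control if lo <= p <= hi]
    return {"support": (lo, hi), "smd": smd,
            "frac_in_support": len(in_support) / (len(ps_treated) + len(ps_control))}
```

A data set with a large SMD or a thin common-support region would, in this spirit, be trimmed or rejected before any outcome data are inspected.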
A Case Study on the Choice, Interpretation and Checking of Multilevel Models for Longitudinal Binary Outcomes
Abstract

Cited by 9 (1 self)
Recent advances in statistical software have led to the rapid diffusion of new methods for modeling longitudinal data. Multilevel (also known as hierarchical or random effects) models for binary outcomes have been generally based on a logistic-normal specification, by analogy with earlier work for normally distributed data. The appropriate application and interpretation of these models remains somewhat unclear, especially when compared with the computationally more straightforward marginal modeling (GEE) approaches. In this paper we pose two interrelated questions. First, what limits should be placed on the interpretation of the coefficients and inferences derived from random effect models involving binary outcomes? Second, what are the minimum diagnostic checks that are required to evaluate whether such random effect models provide appropriate fits to the data? We address these questions by means of an extended case study using data on adolescent smoking from a large cohort study. Bay...
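One concrete interpretation limit behind the first question: in a logistic-normal random-intercept model, subject-specific (conditional) coefficients are attenuated when read as population-averaged (GEE-style) effects. A sketch of the standard approximation from the logistic-normal literature (stated here as background, not as this paper's own analysis):

```python
import math

def marginal_from_conditional(beta, sigma2):
    """Approximate population-averaged (marginal) logistic coefficient
    implied by a conditional coefficient beta in a random-intercept
    model with intercept variance sigma2, using the common approximation
    beta_m ~= beta / sqrt(1 + c^2 * sigma2) with c = 16*sqrt(3)/(15*pi)."""
    c = 16 * math.sqrt(3) / (15 * math.pi)
    return beta / math.sqrt(1 + c * c * sigma2)
```

The larger the random-intercept variance, the further the marginal effect shrinks below the conditional one — so the two model classes answer genuinely different questions about the same data.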
The 28:1 Grant/Sackman legend is misleading, or: How large is interpersonal variation really?
1999
Abstract

Cited by 7 (2 self)
How long do different programmers take to solve the same task? In 1967, Grant and Sackman published their now famous number of 28:1 interpersonal performance differences, which is both incorrect and misleading. This report presents the analysis of a much larger dataset of software engineering work time data with respect to the same question. It corrects the false 28:1 value, proposes more appropriate metrics, presents the results for the larger dataset, and presents results of several further analyses: distribution shapes, effect sizes, and the performance of various significance tests.
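The "more appropriate metrics" contrast quantile-based variability measures with the extreme-driven slowest/fastest ratio. A sketch with an illustrative q90/q10 choice (the report's actual metric definitions may differ):

```python
def ratio_metrics(times):
    """Interpersonal variability metrics for work times. The raw
    slowest/fastest ratio depends entirely on the two most extreme
    individuals; a quantile ratio such as q90/q10 (chosen here purely
    for illustration) is far more robust to outliers."""
    xs = sorted(times)
    n = len(xs)
    def quantile(q):
        # simple nearest-index quantile, adequate for a sketch
        return xs[int(q * (n - 1))]
    return {"max_min": xs[-1] / xs[0],
            "q90_q10": quantile(0.90) / quantile(0.10)}
```

On a sample with one very slow outlier, the max/min ratio explodes while the quantile ratio barely moves, which is exactly the sense in which 28:1 overstates typical interpersonal variation.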