Empirical Bayes Screening for Multi-Item Associations
, 2001
Cited by 46 (0 self)
This paper considers the framework of the so-called "market basket problem", in which a database of transactions is mined for the occurrence of unusually frequent item sets. In our case, "unusually frequent" involves estimates of the frequency of each item set divided by a baseline frequency computed as if items occurred independently. The focus is on obtaining reliable estimates of this measure of interestingness for all item sets, even item sets with relatively low frequencies. For example, in a medical database of patient histories, unusual item sets including the item "patient death" (or other serious adverse event) might hopefully be flagged with as few as 5 or 10 occurrences of the item set, it being unacceptable to require that item sets occur in as many as 0.1% of millions of patient reports before the data mining algorithm detects a signal. Similar considerations apply in fraud detection applications. Thus we abandon the requirement that interesting item sets must contain a re...
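The measure described in this abstract, the observed frequency of an item set divided by the frequency expected if its items occurred independently, can be illustrated with a minimal Python sketch. The function name `independence_ratio` and the toy transactions are hypothetical. The raw ratio computed here is only the starting point: it is not the paper's empirical Bayes estimate, which the abstract says is meant to stay reliable even for item sets with low counts.

```python
from math import prod

def independence_ratio(transactions, itemset):
    """Observed frequency of `itemset` divided by its baseline under independence."""
    n = len(transactions)
    items = set(itemset)
    # Fraction of transactions that contain every item in the set.
    observed = sum(1 for t in transactions if items <= set(t)) / n
    # Baseline: product of the marginal frequencies of the individual items.
    baseline = prod(sum(1 for t in transactions if i in t) / n for i in items)
    return observed / baseline if baseline > 0 else float("nan")

# Toy data, made up for illustration only.
transactions = [
    {"a", "b"}, {"a", "b"}, {"a"}, {"b"},
    {"c"}, {"c"}, {"a", "b", "c"}, {"c"},
]
ratio = independence_ratio(transactions, {"a", "b"})
print(ratio)
```

A ratio above 1 means the item set occurs more often than independence would predict. With very small counts the raw ratio is noisy, which is the problem the abstract sets out to address.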
Some practical guidance for the implementation of propensity score matching
- IZA DISCUSSION PAPER
, 2005
Statistical issues in the design, analysis and interpretation of animal carcinogenicity studies
- Environ Health Persp 58: 385-392
, 1984
Cited by 14 (2 self)
Statistical issues in the design, analysis and interpretation of animal carcinogenicity studies are discussed. In the area of experimental design, issues that must be considered include randomization of animals, sample size considerations, dose selection and allocation of animals to experimental groups, and control of potentially confounding factors. In the analysis of tumor incidence data, survival differences among groups should be taken into account. It is important to try to distinguish between tumors that contribute to the death of the animal and "incidental" tumors discovered at autopsy in an animal dying of an unrelated cause. Life table analyses (appropriate for lethal tumors) and incidental tumor tests (appropriate for nonfatal tumors) are described, and the utilization of these procedures by the National Toxicology Program is discussed. Despite the fact that past interpretations of carcinogenicity data have tended to focus on pairwise comparisons in general and high-dose effects in particular, the importance of trend tests should not be overlooked, since these procedures are more sensitive than pairwise comparisons to the detection of carcinogenic effects. No rigid statistical "decision rule" should be employed in the interpretation of carcinogenicity data. Although the statistical significance of an observed tumor increase is perhaps the single most important piece of evidence used in the evaluation process, a number of biological factors must also be taken into account. The use of historical control data, the false-positive issue and the interpretation of negative trends are also discussed.
A General Model for the Hazard Rate with Covariables and Methods for Sample Size Determination for Cohort Studies
, 1977
Cited by 10 (0 self)
This research is concerned with developing improved methods for analyzing survival data and determining appropriate sample sizes for cohort studies. The model proposed for the hazard function incorporating covariables is a polynomial with different functions of the covariables as coefficients of the various powers of time. This model does not require the assumption that the hazards for different individuals be in constant ratio over time, and it allows for testing whether this assumption is reasonable. The model is parametric, which allows for easy specification of the survival curve and interpretation of results. At the same time, it is general enough so that the form of the hazard is not unduly restricted. Methods for fitting the model to data, testing hypotheses about
Replication and meta-analysis in parapsychology (with discussion)
- Statistical Science
, 1991
Cited by 10 (1 self)
Abstract. Parapsychology, the laboratory study of psychic phenomena, has had its history interwoven with that of statistics. Many of the controversies in parapsychology have focused on statistical issues, and statistical models have played an integral role in the experimental work. Recently, parapsychologists have been using meta-analysis as a tool for synthesizing large bodies of work. This paper presents an overview of the use of statistics in parapsychology and offers a summary of the meta-analyses that have been conducted. It begins with some anecdotal information about the involvement of statistics and statisticians with the early history of parapsychology. Next, it is argued that most nonstatisticians do not appreciate the connection between power and "successful" replication of experimental effects. Returning to parapsychology, a particular experimental regime is examined by summarizing an extended debate over the interpretation of the results. A new set of experiments designed to resolve the debate is then reviewed. Finally,
A framework for regional association rule mining in spatial datasets
- In The 6th IEEE International Conference on Data Mining (ICDM)
, 2006
Cited by 9 (7 self)
The immense explosion of geographically referenced data calls for efficient discovery of spatial knowledge. One critical requirement for spatial data mining is the capability to analyze datasets at different levels of granularity. One of the special challenges for spatial data mining is that information is usually not uniformly distributed in spatial datasets. Consequently, the discovery of regional knowledge is of fundamental importance for spatial data mining. Unfortunately, most of the current data mining techniques are ill-prepared for discovering regional knowledge. For example, when using traditional association rule mining, regional patterns frequently fail to be discovered due to insufficient global confidence and/or support. This raises the questions of how to measure the interestingness of a set of regions and how to search effectively and efficiently for interesting regions. This paper centers on discovering regional association rules in spatial datasets. In particular, we introduce a novel framework to mine regional association rules relying on a given class structure. A reward-based regional discovery methodology is introduced, and a divisive, grid-based supervised clustering algorithm is presented that identifies interesting subregions in spatial datasets. Then, an integrated approach is discussed to systematically mine regional rules. The proposed framework is evaluated in a real-world case study that identifies spatial risk patterns from arsenic in Texas water supply.
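The abstract's point that a regional pattern can fail global support and confidence thresholds can be made concrete with a small Python sketch. It uses the standard definitions of support and confidence for a rule; the helper `support_confidence`, the region names, and the toy records are all hypothetical, and this is not the paper's reward-based clustering method.

```python
def support_confidence(records, antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent over `records`."""
    n = len(records)
    has_a = [r for r in records if antecedent <= r]
    has_both = [r for r in has_a if consequent <= r]
    support = len(has_both) / n if n else 0.0
    confidence = len(has_both) / len(has_a) if has_a else 0.0
    return support, confidence

# Toy data: (region, item set) pairs, made up for illustration only.
data = (
    [("north", {"x", "y"})] * 4 + [("north", {"x"})] * 1
    + [("south", {"x"})] * 10 + [("south", {"z"})] * 5
)
rule = ({"x"}, {"y"})

# The same rule measured over all records and over one region.
global_sc = support_confidence([items for _, items in data], *rule)
north_sc = support_confidence([items for r, items in data if r == "north"], *rule)
print(global_sc, north_sc)
```

In this toy data the rule x -> y holds for most records in the hypothetical "north" region, yet its support and confidence over the whole dataset are much lower, so a single global threshold could miss it.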
Age at first birth, parity and risk of breast cancer in a Swedish population
- Br. J. Cancer
, 1980
Cited by 7 (0 self)
Summary. A case-control study was conducted over a period of 11 months in an area containing one-third of the Swedish population. One thousand and one patients participated, constituting 94% of all women newly diagnosed as having breast cancer within the area. They were compared with 1,001 age-matched, non-hospitalized controls without breast cancer, selected by paired sampling from a population register. The risk of breast cancer was slightly, but significantly, related to parity, the standardized relative risk (SRR) being 1.35 for nulliparous women as compared to ever parous. In the different parity groups a risk significantly lower than that for nulliparous women was found only for women with more than 2 children (SRR = 0.59) but the trend with parity was highly significant (P < 0.001). Age at first birth was not found to be an important risk factor for breast cancer. SRR was lower than for nulliparous women in all groups of women with their first birth before the age of 35 years, but the difference was significant (P < 0.05) only for those with the first birth between 20 and 24 (SRR = 0.69) and 25 and 29 (SRR = 0.69) years of age. The trend with age at first birth (P < 0.05) disappeared after stratification for parity, suggesting that
Three Centuries of Categorical Data Analysis: Log-linear Models and Maximum Likelihood Estimation
Cited by 6 (3 self)
The common view of the history of contingency tables is that it begins in 1900 with the work of Pearson and Yule, but it extends back at least into the 19th century. Moreover it remains an active area of research today. In this paper we give an overview of this history focussing on the development of log-linear models and their estimation via the method of maximum likelihood. S. N. Roy played a crucial role in this development with two papers co-authored with his students S. K. Mitra and Marvin Kastenbaum, at roughly the mid-point temporally in this development. Then we describe a problem that eluded Roy and his students, that of the implications of sampling zeros for the existence of maximum likelihood estimates for log-linear models. Understanding the problem of non-existence is crucial to the analysis of large sparse contingency tables. We introduce some relevant results from the application of algebraic geometry to the study of this statistical problem.
PAIRED COMPARISONS FOR MULTIPLE CHARACTERISTICS: AN ANOCOVA APPROACH
Cited by 4 (2 self)
Abstract. An analysis of covariance model is developed for paired comparisons in situations in which responses (on a preference order) to paired comparisons are obtained on some primary as well as concomitant traits. Along with the general rationality of the proposed test, its asymptotic properties are studied.
Exact inference for categorical data
- In Encyclopedia of Biostatistics (P. Armitage and T. Colton)
, 1998
Cited by 4 (0 self)
Modern statistical methods rely heavily on nonparametric techniques for comparing two or more populations. These techniques generate p-values without making any distributional assumptions about the populations being compared. They rely, however, on asymptotic theory that is valid only if the sample sizes are reasonably large and well balanced across the populations. For small, sparse, skewed, or heavily tied data, the asymptotic theory may not be valid. See Agresti and Yang [5] for some empirical results, and Read and Cressie [31] for a more theoretical discussion. One way to make valid statistical inferences in the presence of small, sparse or unbalanced data is to compute exact p-values and confidence intervals, based on the permutational distribution of the test statistic. This approach was first proposed by R. A. Fisher [11] and has been used extensively for the single 2 × 2 contingency table. Previously, exact tests were rarely attempted for tables of higher dimension than 2 × 2, primarily because of the formidable computing problems involved in their execution. In recent years, however, the easy availability of immense quantities of computing power combined with many new, fast and efficient algorithms for exact permutational inference has revolutionized our thinking about what is computationally feasible. Problems that would previously have taken several hours or even days to solve now take only a few minutes. Exact inference is now a practical proposition and has been incorporated into standard statistical software packages. In the present paper we present a unified framework for exact inference, anchored in the permutation principle. We demonstrate that, for a very broad class of nonparametric problems, such inference can be accomplished by permuting the entries in a contingency table subject to fixed margins. Exact and Monte Carlo algorithms for solving these permutation problems are referenced. We then apply these algorithms to several data sets. Both exact and asymptotic p-values are computed for these data so that one may assess the accuracy of the asymptotic methods. Finally we discuss the availability of software and cite an internet resource for performing exact permutational inference.
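For the single 2 × 2 table mentioned in this abstract, exact inference with fixed margins reduces to summing hypergeometric probabilities. The following is a minimal Python sketch under stated assumptions: the function name `fisher_exact_two_sided` and the example counts are hypothetical, and the two-sided rule used here (sum the probabilities of all tables no more likely than the observed one) is one common convention, not necessarily the one used in the paper.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided exact p-value for the 2x2 table [[a, b], [c, d]] with fixed margins."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell given the margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    # Range of top-left cell values compatible with the fixed margins.
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables whose probability does not exceed the observed one;
    # the small relative tolerance guards against floating-point ties.
    return sum(
        p for p in (prob(x) for x in range(lo, hi + 1))
        if p <= p_obs * (1 + 1e-9)
    )

# Made-up counts for illustration only.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(p_value)
```

Enumerating every table is feasible here because a 2 × 2 table with fixed margins has a single free cell. For larger tables the number of admissible tables grows quickly, which is why the abstract refers to specialized exact and Monte Carlo algorithms.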

