Results 1 - 10
of
153
A Theory Of Inferred Causation
, 1991
"... This paper concerns the empirical basis of causation, and addresses the following issues: 1. the clues that might prompt people to perceive causal relationships in uncontrolled observations. 2. the task of inferring causal models from these clues, and 3. whether the models inferred tell us anything ..."
Abstract
-
Cited by 175 (31 self)
- Add to MetaCart
This paper concerns the empirical basis of causation, and addresses the following issues: 1. the clues that might prompt people to perceive causal relationships in uncontrolled observations. 2. the task of inferring causal models from these clues, and 3. whether the models inferred tell us anything useful about the causal mechanisms that underly the observations. We propose a minimal-model semantics of causation, and show that, contrary to common folklore, genuine causal influences can be distinguished from spurious covariations following standard norms of inductive reasoning. We also establish a sound characterization of the conditions under which such a distinction is possible. We provide an effective algorithm for inferred causation and show that, for a large class of data the algorithm can uncover the direction of causal influences as defined above. Finally, we address the issue of non-temporal causation. 1 Introduction The study of causation is central to the understanding of hum...
Analysis of Variance for Gene Expression Microarray Data
- Journal of Computational Biology
, 2000
"... Spotted cDNA microarrays are emerging as a powerful and cost-effective tool for largescale analysis of gene expression. Microarrays can be used to measure the relative quantities of speci � c mRNAs in two or more tissue samples for thousands of genes simultaneously. While the power of this technolog ..."
Abstract
-
Cited by 140 (4 self)
- Add to MetaCart
Spotted cDNA microarrays are emerging as a powerful and cost-effective tool for largescale analysis of gene expression. Microarrays can be used to measure the relative quantities of speci � c mRNAs in two or more tissue samples for thousands of genes simultaneously. While the power of this technology has been recognized, many open questions remain about appropriate analysis of microarray data. One question is how to make valid estimates of the relative expression for genes that are not biased by ancillary sources of variation. Recognizing that there is inherent “noise ” in microarray data, how does one estimate the error variation associated with an estimated change in expression, i.e., how does one construct the error bars? We demonstrate that ANOVA methods can be used to normalize microarray data and provide estimates of changes in gene expression that are corrected for potential confounding effects. This approach establishes a framework for the general analysis and interpretation of microarray data. Key words: Gene expression microarray, differential expression, analysis of variance, bootstrap.
Using confidence intervals in within-subject designs
- Psychonomic Bulletin & Review
, 1994
"... Wolford, and two anonymous reviewers for very useful comments on earlier drafts of the manuscript. Correspondence may be addressed to ..."
Abstract
-
Cited by 102 (18 self)
- Add to MetaCart
Wolford, and two anonymous reviewers for very useful comments on earlier drafts of the manuscript. Correspondence may be addressed to
Clustering Association Rules
, 1997
"... We consider the problem of clustering two-dimensional association rules in large databases. We present a geometric-based algorithm, BitOp, for performing the clustering, embedded within an association rule clustering system, ARCS. Association rule clustering is useful when the user desires to segmen ..."
Abstract
-
Cited by 99 (0 self)
- Add to MetaCart
We consider the problem of clustering two-dimensional association rules in large databases. We present a geometric-based algorithm, BitOp, for performing the clustering, embedded within an association rule clustering system, ARCS. Association rule clustering is useful when the user desires to segment the data. We measure the quality of the segmentation generated by ARCS using the Minimum Description Length (MDL) principle of encoding the clusters on several databases including noise and errors. Scale-up experiments show that ARCS, using the BitOp algorithm, scales linearly with the amount of data. 1 Introduction Data mining, or the efficient discovery of interesting patterns from large collections of data, has been recognized as an important area of database research. The most commonly sought patterns are association rules as introduced in [AIS93b]. Intuitively, an association rule identifies a frequently occuring pattern of information in a database. Consider a supermarket database w...
Nonparametric estimation of average treatment effects under exogeneity: a review
- Review of Economics and Statistics
, 2004
"... Abstract—Recently there has been a surge in econometric work focusing on estimating average treatment effects under various sets of assumptions. One strand of this literature has developed methods for estimating average treatment effects for a binary treatment under assumptions variously described a ..."
Abstract
-
Cited by 97 (6 self)
- Add to MetaCart
Abstract—Recently there has been a surge in econometric work focusing on estimating average treatment effects under various sets of assumptions. One strand of this literature has developed methods for estimating average treatment effects for a binary treatment under assumptions variously described as exogeneity, unconfoundedness, or selection on observables. The implication of these assumptions is that systematic (for example, average or distributional) differences in outcomes between treated and control units with the same values for the covariates are attributable to the treatment. Recent analysis has considered estimation and inference for average treatment effects under weaker assumptions than typical of the earlier literature by avoiding distributional and functional-form assumptions. Various methods of semiparametric estimation have been proposed, including estimating the unknown regression functions, matching, methods using the propensity score such as weighting and blocking, and combinations of these approaches. In this paper I review the state of this
Assessment and Propagation of Model Uncertainty
, 1995
"... this paper I discuss a Bayesian approach to solving this problem that has long been available in principle but is only now becoming routinely feasible, by virtue of recent computational advances, and examine its implementation in examples that involve forecasting the price of oil and estimating the ..."
Abstract
-
Cited by 79 (0 self)
- Add to MetaCart
this paper I discuss a Bayesian approach to solving this problem that has long been available in principle but is only now becoming routinely feasible, by virtue of recent computational advances, and examine its implementation in examples that involve forecasting the price of oil and estimating the chance of catastrophic failure of the U.S. Space Shuttle.
Nonparametric Permutation Tests for Functional Neuroimaging: A Primer with Examples. Human Brain Mapping
, 2001
"... The statistical analyses of functional mapping experiments usually proceeds at the voxel level, involving the formation and assessment of a statistic image: at each voxel a statistic indicating evidence of the experimental effect of interest, at that voxel, is computed, giving an image of statistics ..."
Abstract
-
Cited by 73 (6 self)
- Add to MetaCart
The statistical analyses of functional mapping experiments usually proceeds at the voxel level, involving the formation and assessment of a statistic image: at each voxel a statistic indicating evidence of the experimental effect of interest, at that voxel, is computed, giving an image of statistics, a statistic
Bootstrapping Cluster Analysis: Assessing the Reliability of Conclusions from Microarray Experiments
- PNAS
, 2000
"... We introduce a general technique for making statistical inference from gene expression microarray data. The approach utilizes an analysis of variance model to achieve normalization and estimate di#erential expression of genes across multiple conditions. Statistical inference is based on two applicat ..."
Abstract
-
Cited by 68 (2 self)
- Add to MetaCart
We introduce a general technique for making statistical inference from gene expression microarray data. The approach utilizes an analysis of variance model to achieve normalization and estimate di#erential expression of genes across multiple conditions. Statistical inference is based on two applications of a randomization technique, bootstrapping. Bootstrapping is used to obtain confidence intervals for di#erential expression estimates from individual genes, and then to assess the stability of results from a cluster analysis. We illustrate the technique with a publicly available data set and draw conclusions about reliability of clustering results in light of variation in the data. The bootstrapping procedure relies on experimental replication. We discuss the implications of replication and good design in microarray experiments. 1 Corresponding author Microarray technology [1] is a revolutionary high-throughput tool for the study of gene expression. The ability to simultaneously stu...
Multiresolution image classification by hierarchical modeling with two dimensional hidden Markov models
- IEEE TRANS. INFORMATION THEORY
, 2000
"... This paper treats a multiresolution hidden Markov model for classifying images. Each image is represented by feature vectors at several resolutions, which are statistically dependent as modeled by the underlying state process, a multiscale Markov mesh. Unknowns in the model are estimated by maximum ..."
Abstract
-
Cited by 39 (8 self)
- Add to MetaCart
This paper treats a multiresolution hidden Markov model for classifying images. Each image is represented by feature vectors at several resolutions, which are statistically dependent as modeled by the underlying state process, a multiscale Markov mesh. Unknowns in the model are estimated by maximum likelihood, in particular by employing the expectation-maximization algorithm. An image is classified by finding the optimal set of states with maximum a posteriori probability. States are then mapped into classes. The multiresolution model enables multiscale information about context to be incorporated into classification. Suboptimal algorithms based on the model provide progressive classification that is much faster than the algorithm based on single-resolution hidden Markov models.
Constrained-Realization Monte-Carlo method for Hypothesis Testing
- Physica D
"... : We compare two theoretically distinct approaches to generating artificial (or "surrogate") data for testing hypotheses about a given data set. The first and more straightforward approach is to fit a single "best" model to the original data, and then to generate surrogate data sets that are "typica ..."
Abstract
-
Cited by 38 (1 self)
- Add to MetaCart
: We compare two theoretically distinct approaches to generating artificial (or "surrogate") data for testing hypotheses about a given data set. The first and more straightforward approach is to fit a single "best" model to the original data, and then to generate surrogate data sets that are "typical realizations" of that model. The second approach concentrates not on the model but directly on the original data; it attempts to constrain the surrogate data sets so that they exactly agree with the original data for a specified set of sample statistics. Examples of these two approaches are provided for two simple cases: a test for deviations from a gaussian distribution, and a test for serial dependence in a time series. Additionally, we consider tests for nonlinearity in time series based on a Fourier transform (FT) method and on more conventional autoregressive moving-average (ARMA) fits to the data. The comparative performance of hypothesis testing schemes based on these two approaches...

