Results 1 - 10 of 26
Statistical Inference and Data Mining
, 1996
Cited by 21 (3 self)
es of probability distributions, estimation, hypothesis testing, model scoring, Gibbs sampling, rational decision making, causal inference, prediction, and model averaging. For a rigorous survey of statistics, the mathematically inclined reader should see [7]. Due to space limitations, we must also ignore a number of interesting topics, including time series analysis and meta-analysis. Probability Distributions The statistical literature contains mathematical characterizations of a wealth of probability distributions, as well as properties of random variables---functions defined on the "events" to which a probability measure assigns values. Important relations among probability distributions include marginalization (summing over a subset of values) and conditionalization (forming a conditional probability measure from a probability measure on a sample space and some event of positive probability). Essential relations among random variable
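The two relations named in the abstract above, marginalization and conditionalization, can be illustrated on a small discrete distribution. The following is a minimal sketch, not taken from the paper: the joint table, the variable names, and the helper functions `marginal` and `conditional` are all hypothetical and chosen only for illustration.

```python
from fractions import Fraction

# Hypothetical joint distribution over (weather, traffic) outcomes.
# The numbers are illustrative assumptions, not data from the paper.
joint = {
    ("rain", "heavy"): Fraction(3, 10),
    ("rain", "light"): Fraction(1, 10),
    ("dry", "heavy"): Fraction(2, 10),
    ("dry", "light"): Fraction(4, 10),
}


def marginal(joint, index):
    """Marginalize: sum joint probabilities over every coordinate except `index`."""
    out = {}
    for outcome, p in joint.items():
        out[outcome[index]] = out.get(outcome[index], 0) + p
    return out


def conditional(joint, event):
    """Conditionalize on `event`, a predicate that must have positive probability."""
    p_event = sum(p for outcome, p in joint.items() if event(outcome))
    if p_event == 0:
        raise ValueError("cannot condition on an event of probability zero")
    # Renormalize the outcomes inside the event so they sum to one.
    return {o: p / p_event for o, p in joint.items() if event(o)}


weather = marginal(joint, 0)
given_rain = conditional(joint, lambda o: o[0] == "rain")
```

Exact fractions are used so that the renormalized probabilities sum to one without rounding error.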
From association to causation: Some remarks on the history of statistics
- Statistical Science
, 1999
Cited by 19 (6 self)
The “numerical method” in medicine goes back to Pierre Louis’ study of pneumonia (1835), and John Snow’s book on the epidemiology of cholera (1855). Snow took advantage of natural experiments and used convergent lines of evidence to demonstrate that cholera is a waterborne infectious disease. More recently, investigators in the social and life sciences have used statistical models and significance tests to deduce cause-and-effect relationships from patterns of association; an early example is Yule’s study on the causes of poverty (1899). In my view, this modeling enterprise has not been successful. Investigators tend to neglect the difficulties in establishing causal relations, and the mathematical complexities obscure rather than clarify the assumptions on which the analysis is based. Formal statistical inference is, by its nature, conditional. If maintained hypotheses A, B, C,... hold, then H can be tested against the data. However, if A, B, C,... remain in doubt, so must inferences about H. Careful scrutiny of maintained hypotheses should therefore be a critical part of empirical work, a principle honored more often in the breach than the observance. Snow’s work on cholera will be contrasted with modern studies that depend on statistical models and tests of significance. The examples may help to clarify the limits of current statistical techniques for making causal inferences from patterns of association.
From association to causation via regression
- Indiana: University of Notre Dame
, 1997
Cited by 15 (6 self)
For nearly a century, investigators in the social sciences have used regression models to deduce cause-and-effect relationships from patterns of association. Path models and automated search procedures are more recent developments. In my view, this enterprise has not been successful. The models tend to neglect the difficulties in establishing causal relations, and the mathematical complexities tend to obscure rather than clarify the assumptions on which the analysis is based. Formal statistical inference is, by its nature, conditional. If maintained hypotheses A, B, C,... hold, then H can be tested against the data. However, if A, B, C,... remain in doubt, so must inferences about H. Careful scrutiny of maintained hypotheses should therefore be a critical part of empirical work-- a principle honored more often in the breach than the observance.
Efficient estimation in the bivariate normal copula model: normal margins are least favorable
- BERNOULLI
, 1997
Cited by 14 (0 self)
Consider semiparametric bivariate copula models in which the family of copula functions is parametrized by a Euclidean parameter θ of interest and in which the two unknown marginal distributions are the (infinite dimensional) nuisance parameters. The efficient score for θ can be characterized in terms of the solutions of two coupled Sturm-Liouville equations. In case the family of copula functions corresponds to the normal distributions with mean 0, variance 1, and correlation θ, the solution of these equations is given, and we thereby show that the Van der Waerden normal scores rank correlation coefficient is asymptotically efficient. We also show that the bivariate normal model with equal variances constitutes the least favorable parametric submodel. Finally, we discuss the interpretation of |θ| in the normal copula model as the maximum (monotone) correlation coefficient.
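For readers unfamiliar with the normal scores rank correlation coefficient mentioned in the abstract above, the following is a minimal sketch of one common form of the statistic: each observation is replaced by the standard normal quantile of rank/(n + 1), and the products of paired scores are normalized by the sum of squared scores. The function names `ranks` and `normal_scores_correlation` are hypothetical, ties are assumed absent, and the exact normalization used in the paper may differ.

```python
from statistics import NormalDist


def ranks(values):
    """Return 1-based ranks of `values`; this sketch assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def normal_scores_correlation(x, y):
    """Normal scores rank correlation of paired samples x and y (no ties)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length, at least 2")
    n = len(x)
    inv = NormalDist().inv_cdf  # standard normal quantile function
    scores_x = [inv(r / (n + 1)) for r in ranks(x)]
    scores_y = [inv(r / (n + 1)) for r in ranks(y)]
    # Normalizing constant: sum of squared scores over all ranks 1..n.
    denom = sum(inv(i / (n + 1)) ** 2 for i in range(1, n + 1))
    return sum(a * b for a, b in zip(scores_x, scores_y)) / denom


x = [1.2, 3.4, 2.2, 5.0, 4.1]
increasing = normal_scores_correlation(x, [v ** 3 for v in x])
decreasing = normal_scores_correlation(x, [-v for v in x])
```

Because the statistic depends on the data only through ranks, any strictly increasing transformation of either sample leaves it unchanged, which is why the cubed copy of `x` behaves like `x` itself.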
The Design Argument
, 2004
Cited by 6 (4 self)
The design argument is one of three main arguments for the existence of God; the others are the ontological argument and the cosmological argument. Unlike the ontological argument, the design argument and the cosmological argument are a posteriori. And whereas the cosmological argument could focus on any present event to get the ball rolling (arguing that it must trace back to a first cause, namely God), design theorists are usually more selective. Design arguments have typically been of two types – organismic and cosmic. Organismic design arguments start with the observation that organisms have features that adapt them to the environments in which they live and that exhibit a kind of delicacy. Consider, for example, the vertebrate eye. This organ helps organisms survive by permitting them to perceive objects in their environment. And were the parts of the eye even slightly different in their shape and assembly, the resulting organ would not allow us to see. Cosmic design arguments begin with an observation concerning features of the entire cosmos – the universe obeys simple laws, it has a kind of stability, its physical features permit life and intelligent life to exist. However, not all design arguments fit into these two neat compartments. Kepler, for example, thought that the face we see when we look at the moon requires explanation in terms of intelligent design. Still, the common thread is that design theorists
Logicist Statistics I. Models and Modeling
- Statistical Science
, 1998
Cited by 5 (0 self)
Abstract. Arguments are presented to support increased emphasis on logical aspects of formal methods of analysis, depending on probability in the sense of R. A. Fisher. Formulating probabilistic models that convey uncertain knowledge of objective phenomena and using such models for inductive reasoning are central activities of individuals that introduce limited but necessary subjectivity into science. Statistical models are classified into overlapping types called here empirical, stochastic and predictive, all drawing on a common mathematical theory of probability, and all facilitating statements with logical and epistemic content. Contexts in which these ideas are intended to apply are discussed via three major examples. Key words and phrases: Logicism and proceduralism; specificity of analysis; formal subjective probability; complementarity; subjective and objective; formal and informal; empirical, stochastic and predictive models; U.S. national census; screening for chronic disease; global climate change.
Exploiting Hidden Meanings Using Bilingual Text
- In A. Gelbukh (Ed.), Lecture Notes in Computer Science 2945: Computational Linguistics and Intelligent Text Processing: Fifth International Conference, CICLing 2004 Proceedings (pp. 283–299)
, 2004
Cited by 5 (0 self)
The last decade has taught computational linguists that high performance on broad-coverage natural language processing tasks is best obtained using supervised learning techniques, which require annotation of large quantities of training data. But annotated text is hard to obtain.
Adapting measures of clumping strength to assess term-term similarity
- Journal of the American Society for Information Science and Technology
, 2003
Cited by 4 (0 self)
Automated information retrieval relies heavily on statistical regularities that emerge as terms are deposited to produce text. This paper examines statistical patterns expected of a pair of terms that are semantically related to each other. Guided by a conceptualization of the text generation process, we derive measures of how tightly two terms are semantically associated. Our main objective is to probe whether such measures yield reasonable results. Specifically, we examine how the tendency of a content-bearing term to clump, as quantified by previously developed measures of term clumping, is influenced by the presence of other terms. This approach allows us to present a toolkit from which a range of measures can be constructed. As an illustration, one of several suggested measures is evaluated on a large text corpus built from an on-line encyclopedia.
Applications of Generalized Method of Moments Estimation
- JOURNAL OF ECONOMIC PERSPECTIVES—VOLUME 15, NUMBER 4—FALL 2001—PAGES 87–100
, 2001
Cited by 2 (0 self)
The method of moments approach to parameter estimation dates back more than 100 years (Stigler, 1986). The notion of a moment is fundamental for describing features of a population. For example, the population mean (or population average), usually denoted $\mu$, is the moment that measures central tendency. If $y$ is a random variable describing the population of interest, we also write the population mean as $E(y)$, the expected value or mean of $y$. (The mean of $y$ is also called the first moment of $y$.) The population variance, usually denoted $\sigma^2$ or $\mathrm{Var}(y)$, is defined as the second moment of $y$ centered about its mean: $\mathrm{Var}(y) = E[(y - \mu)^2]$. The variance, also called the second central moment, is widely used as a measure of spread in a distribution. Since we can rarely obtain information on an entire population, we use a sample from the population to estimate population moments. If $\{y_i : i = 1, \ldots, n\}$ is a sample from a population with mean $\mu$, the method of moments estimator of $\mu$ is just the sample average: $\bar{y} = (y_1 + y_2 + \cdots + y_n)/n$. Under random sampling, $\bar{y}$ is unbiased and consistent for $\mu$ regardless of other features of the underlying population. Further, as long as the population variance is finite, $\bar{y}$ is the best linear unbiased estimator of $\mu$. An unbiased and consistent estimator of $\sigma^2$ also exists and is called the sample variance, usually denoted $s^2$.¹ Method of moments estimation applies in more complicated situations. For example, suppose that in a population with $\mu > 0$, we know that the variance is three times the mean: $\sigma^2 = 3\mu$. The sample average, $\bar{y}$, is unbiased and consistent
¹ See Wooldridge (2000, appendix C) for more discussion of the sample mean and sample variance as method of moments estimators.
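The estimators described in the abstract above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the article's own code: the function name `method_of_moments` is hypothetical, and the simulated population (a gamma distribution with scale 3, chosen because its variance is exactly three times its mean) is an assumption made only to mimic the abstract's example. Dividing the sample variance by 3 to get a second estimate of the mean is likewise an illustrative guess at where the truncated passage is heading.

```python
import random
from statistics import fmean, variance


def method_of_moments(sample):
    """Return (sample mean, sample variance) as estimates of the first two moments."""
    y_bar = fmean(sample)   # estimates the population mean
    s2 = variance(sample)   # unbiased sample variance (divides by n - 1)
    return y_bar, s2


# Assumed population: Gamma(shape=2, scale=3) has mean 6 and variance 18,
# so it satisfies the restriction variance = 3 * mean.
random.seed(0)
sample = [random.gammavariate(2.0, 3.0) for _ in range(5000)]

y_bar, s2 = method_of_moments(sample)

# Under the restriction, both quantities estimate the same mean.
mean_from_first_moment = y_bar
mean_from_second_moment = s2 / 3
```

With one unknown parameter and two moment conditions, the two estimates will generally disagree in any finite sample, so some rule for weighting them is needed to produce a single estimate.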
Risk Identification and Analysis using a Group Support System (GSS)
- HICSS 35, IEEE
, 2002
"... This paper describes the use of a specific Group ..."

