Results 1–10 of 48
The Interplay of Bayesian and Frequentist Analysis
 Statist. Sci.
, 2004
"... Statistics has struggled for nearly a century over the issue of whether the Bayesian or frequentist paradigm is superior. This debate is far from over and, indeed, should continue, since there are fundamental philosophical and pedagogical issues at stake. At the methodological level, however, the fi ..."
Abstract

Cited by 48 (0 self)
 Add to MetaCart
Statistics has struggled for nearly a century over the issue of whether the Bayesian or frequentist paradigm is superior. This debate is far from over and, indeed, should continue, since there are fundamental philosophical and pedagogical issues at stake. At the methodological level, however, the fight has become considerably muted, with the recognition that each approach has a great deal to contribute to statistical practice and each is actually essential for full development of the other approach. In this article, we embark upon a rather idiosyncratic walk through some of these issues. Key words and phrases: Admissibility; Bayesian model checking; conditional frequentist; confidence intervals; consistency; coverage; design; hierarchical models; nonparametric
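The article's point that each paradigm contributes to the other can be made concrete with a coverage check (an illustration, not taken from the article): for one observation x from N(µ, 1) under a flat prior, the central 95% Bayesian credible interval is x ± 1.96, which coincides with the classical confidence interval, and a small simulation confirms its frequentist coverage:

```python
import random

random.seed(0)
mu_true = 2.0          # the "unknown" mean, known only to the simulation
z = 1.959964           # two-sided 95% standard normal quantile
trials = 20000
covered = 0
for _ in range(trials):
    x = random.gauss(mu_true, 1.0)     # one draw from N(mu, 1)
    # Flat prior => posterior is N(x, 1); central 95% credible interval:
    lo, hi = x - z, x + z
    covered += (lo <= mu_true <= hi)
coverage = covered / trials
print(round(coverage, 3))  # close to 0.95: the credible interval has frequentist coverage
```

Here the agreement is exact only for this simple model; in general, matching Bayesian and frequentist answers is precisely the kind of methodological interplay the article surveys.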
Information, Divergence and Risk for Binary Experiments
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2009
"... We unify fdivergences, Bregman divergences, surrogate regret bounds, proper scoring rules, cost curves, ROCcurves and statistical information. We do this by systematically studying integral and variational representations of these various objects and in so doing identify their primitives which all ..."
Abstract

Cited by 37 (8 self)
 Add to MetaCart
We unify f-divergences, Bregman divergences, surrogate regret bounds, proper scoring rules, cost curves, ROC curves and statistical information. We do this by systematically studying integral and variational representations of these various objects, and in so doing identify their primitives, all of which are related to cost-sensitive binary classification. As well as developing relationships between generative and discriminative views of learning, the new machinery leads to tight and more general surrogate regret bounds and generalised Pinsker inequalities relating f-divergences to variational divergence. The new viewpoint also illuminates existing algorithms: it provides a new derivation of Support Vector Machines in terms of divergences and relates Maximum Mean Discrepancy to Fisher Linear Discriminants.
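The generalised Pinsker inequalities mentioned above extend a classical fact that is easy to verify numerically. The sketch below checks the standard Pinsker inequality KL(p‖q) ≥ V(p, q)²/2, with V the variational divergence (sum of absolute differences) and KL in nats, on a small discrete example; it is the classical inequality, not the paper's generalised version:

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats (the f-divergence with f(t) = t log t)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def variational(p, q):
    """Variational divergence: sum of absolute differences (twice the total variation)."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

p = [0.5, 0.3, 0.2]
q = [0.2, 0.4, 0.4]
V = variational(p, q)
# Classical Pinsker inequality: KL(p || q) >= V^2 / 2
assert kl(p, q) >= V * V / 2
```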
Testing the Significance of Attribute Interactions
 In Proc. of the 21st International Conference on Machine Learning (ICML)
, 2004
"... Attribute interactions are the irreducible dependencies between attributes. Interactions underlie feature relevance and selection, the structure of joint probability and classification models: if and only if the attributes interact, they should be connected. While the issue of 2way interactions, es ..."
Abstract

Cited by 26 (3 self)
 Add to MetaCart
(Show Context)
Attribute interactions are the irreducible dependencies between attributes. Interactions underlie feature relevance and selection, and the structure of joint probability and classification models: if and only if the attributes interact, they should be connected. While the issue of 2-way interactions, especially of those between an attribute and the label, has already been addressed, we introduce an operational definition of a generalized n-way interaction by highlighting two models: the reductionistic part-to-whole approximation, where the model of the whole is reconstructed from models of the parts, and the holistic reference model, where the whole is modelled directly.
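A standard textbook illustration of an irreducible n-way dependency (not taken from the paper) is the XOR distribution: any two of the three attributes are independent, yet jointly the third is fully determined, so no part-to-whole model built from pairs can capture it. The interaction information, computed from marginal and joint entropies below, has magnitude 1 bit (its sign depends on which of the two common conventions is used):

```python
import math
from itertools import product
from collections import Counter

def entropy(counts, n):
    """Shannon entropy in bits of an empirical distribution given by counts over n samples."""
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# XOR distribution: a, b uniform independent bits, c = a XOR b (all four triples equally likely).
samples = [(a, b, a ^ b) for a, b in product([0, 1], repeat=2)]
n = len(samples)

def H(*idx):
    """Joint entropy of the attributes at the given positions."""
    return entropy(Counter(tuple(s[i] for i in idx) for s in samples), n)

# Interaction information (one common sign convention):
# I(A;B;C) = H(A)+H(B)+H(C) - H(A,B) - H(A,C) - H(B,C) + H(A,B,C)
ii = H(0) + H(1) + H(2) - H(0, 1) - H(0, 2) - H(1, 2) + H(0, 1, 2)
print(ii)  # -1.0: one bit of purely 3-way interaction, invisible to any pairwise analysis
```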
The epistemology of mathematical and statistical modeling: A quiet methodological revolution
 American Psychologist
, 2010
"... A quiet methodological revolution, a modeling revolution, has occurred over the past several decades, almost without discussion. In contrast, the 20th century ended with contentious argument over the utility of null hypothesis significance testing (NHST). The NHST controversy may have been at leas ..."
Abstract

Cited by 19 (0 self)
 Add to MetaCart
(Show Context)
A quiet methodological revolution, a modeling revolution, has occurred over the past several decades, almost without discussion. In contrast, the 20th century ended with contentious argument over the utility of null hypothesis significance testing (NHST). The NHST controversy may have been at least partially irrelevant, because in certain ways the modeling revolution obviated the NHST argument. I begin with a history of NHST and modeling and their relation to one another. Next, I define and illustrate principles involved in developing and evaluating mathematical models. Following that, I discuss the difference between using statistical procedures within a rule-based framework and building mathematical models from a scientific epistemology. Only the former is treated carefully in most psychology graduate training. The pedagogical implications of this imbalance and the revised pedagogy required to account for the modeling revolution are described. To conclude, I discuss how attention to modeling implies shifting statistical practice in certain progressive ways. The epistemological basis of statistics has moved away from being a set of procedures, applied mechanistically, and moved toward building and evaluating statistical and scientific models.
DEMPSTER-SHAFER INFERENCE WITH WEAK BELIEFS
"... Beliefs specified for predicting an unobserved realization of pivotal variables in the context of the fiducial and DempsterShafer (DS) inference can be weakened for credible inference. We consider predictive random sets for predicting an unobserved random sample from a known distribution, e.g., t ..."
Abstract

Cited by 15 (11 self)
 Add to MetaCart
Beliefs specified for predicting an unobserved realization of pivotal variables in the context of fiducial and Dempster-Shafer (DS) inference can be weakened for credible inference. We consider predictive random sets for predicting an unobserved random sample from a known distribution, e.g., the uniform distribution U(0, 1). More specifically, we choose our beliefs for inference in two steps: (i) define a class of weak beliefs in terms of DS models for predicting an unobserved sample, and (ii) seek a belief within that class to balance the trade-off between credibility and efficiency of the resulting DS inference. We call this approach the Maximal Belief (MB) method. The MB method is illustrated with two examples: (1) inference about µ based on a sample of size n from the Gaussian model N(µ, 1), and (2) inference about the number of outliers (µi ≠ 0) based on the observed data X1, ..., Xn with the model Xi ∼ N(µi, 1) independently. The first example shows that MB-DS analysis does a type of conditional inference. The second example demonstrates that MB posterior probabilities are easy to interpret for hypothesis testing.
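The flavour of predictive random sets can be sketched for a single observation x from N(µ, 1). The code below assumes the symmetric "default" predictive random set S = {u : |u − 1/2| ≤ |U − 1/2|} for the pivot U = Φ(x − µ); it is a minimal sketch of this one construction, not the paper's MB optimization over a class of weak beliefs:

```python
import math

def Phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def plausibility(mu, x):
    """Plausibility of the singleton assertion {mu} given one observation x ~ N(mu, 1).

    Pivot: U = Phi(x - mu) is U(0, 1) when mu is the true mean. Under the
    symmetric predictive random set S = {u : |u - 1/2| <= |U - 1/2|},
    the plausibility of {mu} reduces to the expression below.
    """
    return 1.0 - abs(2.0 * Phi(x - mu) - 1.0)

x = 1.3  # hypothetical observed value
# Plausibility peaks at mu = x and decays symmetrically away from it:
assert abs(plausibility(x, x) - 1.0) < 1e-12
assert abs(plausibility(x - 2, x) - plausibility(x + 2, x)) < 1e-12
```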
Multiple testing in statistical analysis of systems-based information retrieval experiments
 ACM TOIS
"... Highquality reusable test collections and formal statistical hypothesis testing together support a rigorous experimental environment for information retrieval research. But as Armstrong et al. [2009b] recently argued, global analysis of experiments suggests that there has actually been little real ..."
Abstract

Cited by 11 (0 self)
 Add to MetaCart
High-quality reusable test collections and formal statistical hypothesis testing together support a rigorous experimental environment for information retrieval research. But as Armstrong et al. [2009b] recently argued, global analysis of experiments suggests that there has actually been little real improvement in ad hoc retrieval effectiveness over time. We investigate this phenomenon in the context of simultaneous testing of many hypotheses using a fixed set of data. We argue that the most common approaches to significance testing ignore a great deal of information about the world. Taking into account even a fairly small amount of this information can lead to very different conclusions about systems than those that have appeared in the published literature. We demonstrate how to model a set of IR experiments for analysis, both mathematically and practically, and show that doing so can cause p-values from statistical hypothesis tests to increase by orders of magnitude. This has major consequences for the interpretation of experimental results using reusable test collections: it is very difficult to conclude that anything is significant once we have modeled many of the sources of randomness in experimental design and analysis.
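The scale of the effect is visible with even the crudest multiplicity adjustment. The sketch below uses a plain Bonferroni correction on hypothetical p-values (an assumption for illustration only; the paper models the experiments rather than applying Bonferroni):

```python
def bonferroni(pvals):
    """Bonferroni adjustment: multiply each p-value by the number of tests, cap at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]

# Hypothetical raw p-values: one new system compared against 1000 stored runs
# on the same test collection, 999 of the comparisons uninteresting.
raw = [0.0004] + [0.5] * 999
adjusted = bonferroni(raw)
# The lone "significant" result grows by three orders of magnitude (0.0004 -> ~0.4),
# well above any conventional significance threshold.
print(raw[0], "->", adjusted[0])
```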
Switching investments
 Proceedings of the 21st International Conference on Algorithmic Learning Theory (ALT 2010), LNAI 6331
, 2010
"... Abstract. We present a simple online twoway trading algorithm that exploits fluctuations in the unit price of an asset. Rather than analysing worstcase performance under some assumptions, we prove a novel, unconditional performance bound that is parameterised either by the actual dynamics of the ..."
Abstract

Cited by 4 (3 self)
 Add to MetaCart
(Show Context)
We present a simple online two-way trading algorithm that exploits fluctuations in the unit price of an asset. Rather than analysing worst-case performance under some assumptions, we prove a novel, unconditional performance bound that is parameterised either by the actual dynamics of the price of the asset, or by a simplifying model thereof. The algorithm processes T prices in O(T²) time and O(T) space, but if the employed prior density is exponential, the time requirement reduces to O(T). The result translates to the prediction with expert advice framework, and has applications in data compression and hypothesis testing.
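The paper's algorithm is not reproduced here; as a toy illustration of what profiting from two-way trading on fluctuations means, the following threshold rule (hypothetical, not the paper's method and with none of its guarantees) converts cash to the asset on a dip and back to cash on a recovery:

```python
def threshold_trader(prices, r=1.05):
    """Toy two-way trading rule (NOT the paper's algorithm): starting from 1 unit
    of cash, buy when the price falls by factor 1/r from the last trade price,
    and sell once it rises by factor r from it. Returns final wealth."""
    wealth, units, ref = 1.0, 0.0, prices[0]
    for p in prices:
        if units == 0.0 and p <= ref / r:      # price dipped: cash -> asset
            units, wealth, ref = wealth / p, 0.0, p
        elif units > 0.0 and p >= ref * r:     # price recovered: asset -> cash
            wealth, units, ref = units * p, 0.0, p
    return wealth + units * prices[-1]         # mark any open position to market

# An oscillating price yields a profit that buy-and-hold (final wealth 1.0) misses:
prices = [1.0, 0.9, 1.0, 0.9, 1.0]
assert threshold_trader(prices) > 1.0
```

The paper's contribution is precisely to replace such ad hoc thresholds with a bound parameterised by the price path itself.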
Misprescription and misuse of one-tailed tests
"... Abstract Onetailed statistical tests are often used in ecology, animal behaviour and in most other fields in the biological and social sciences. Here we review the frequency of their use in the 1989 and 2005 volumes of two journals (Animal Behaviour and Oecologia), their advantages and disadvantage ..."
Abstract

Cited by 2 (0 self)
 Add to MetaCart
One-tailed statistical tests are often used in ecology, animal behaviour and in most other fields in the biological and social sciences. Here we review the frequency of their use in the 1989 and 2005 volumes of two journals (Animal Behaviour and Oecologia), their advantages and disadvantages, the extensive erroneous advice on them in both older and modern statistics texts, and their utility in certain narrow areas of applied research. Of those articles with data sets susceptible to one-tailed tests, at least 24% in Animal Behaviour and at least 13% in Oecologia used one-tailed tests at least once. They were used 35% more frequently with nonparametric methods than with parametric ones, and about twice as often in 1989 as in 2005. Debate in the psychological literature of the 1950s established the logical criterion that one-tailed tests should be restricted to situations where there is interest only in results in one direction. ‘Interest’ should be defined, however, in terms of collective or societal interest and not by the individual investigator. By this ‘collective interest’ criterion, all uses of one-tailed tests in the journals surveyed seem invalid. In his book Nonparametric Statistics, S. Siegel unrelentingly suggested the use of one-tailed tests whenever the investigator predicts the direction of a result. That work has been a major proximate source of confusion on this issue, but so are most recent statistics textbooks. The utility of one-tailed tests in research aimed at obtaining regulatory approval of new drugs and new pesticides is briefly described, to exemplify the narrow range of research situations where such tests can be appropriate. These situations are characterized by null hypotheses stating that the difference …
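The stakes in the one- vs. two-tailed choice are easy to quantify: when the observed effect lies in the predicted direction, the one-tailed p-value is exactly half the two-tailed one, so the evidential bar is halved. A z-test sketch (illustrative, not from the article):

```python
import math

def Phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def z_test_pvalues(z):
    """One- and two-tailed p-values for an observed z statistic under H0: effect = 0."""
    one_tailed = 1.0 - Phi(z)              # predicting a positive effect
    two_tailed = 2.0 * (1.0 - Phi(abs(z))) # no directional prediction
    return one_tailed, two_tailed

one, two = z_test_pvalues(1.8)
# In the predicted direction, the one-tailed p-value is exactly half the two-tailed one:
assert abs(one - two / 2.0) < 1e-12
# z = 1.8 is "significant" one-tailed but not two-tailed at the 0.05 level:
assert one < 0.05 < two
```

This is exactly the borderline regime in which an after-the-fact switch to a one-tailed test is tempting, which is why the review's ‘collective interest’ criterion matters.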