Results 1–10 of 14
Philosophy and the practice of Bayesian statistics
2010
Abstract

Cited by 17 (6 self)
A substantial school in the philosophy of science identifies Bayesian inference with inductive inference and even rationality as such, and seems to be strengthened by the rise and practical success of Bayesian statistics. We argue that the most successful forms of Bayesian statistics do not actually support that particular philosophy but rather accord much better with sophisticated forms of hypothetico-deductivism. We examine the actual role played by prior distributions in Bayesian models, and the crucial aspects of model checking and model revision, which fall outside the scope of Bayesian confirmation theory. We draw on the literature on the consistency of Bayesian updating and also on our experience of applied work in social science. Clarity about these matters should benefit not just philosophy of science, but also statistical practice. At best, the inductivist view has encouraged researchers to fit and compare models without checking them; at worst, theorists have actively discouraged practitioners from performing model checking because it does not fit into their framework.
When Efficient Model Averaging Out-Performs Boosting and Bagging
17th European Conference on Machine Learning and 10th European Conference on Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD 2006)
2006
Abstract

Cited by 5 (2 self)
Abstract. Bayesian model averaging, also known as the Bayes optimal classifier (BOC), is an ensemble technique used extensively in the statistics literature. However, compared to other ensemble techniques such as bagging and boosting, BOC is less known and rarely used in data mining. This is partly due to model averaging being perceived as inefficient, and because bagging and boosting consistently outperform a single model, which raises the question: “Do we even need BOC in data mining?”. We show that the answer to this question is “yes” by illustrating that several recent efficient model averaging approaches can significantly outperform bagging and boosting in realistic difficult situations such as extensive class label noise, sample selection bias and many-class problems. To our knowledge, the insight that model averaging can outperform bagging and boosting in these situations has not been published in the machine learning, data mining or statistical communities.
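The core averaging scheme behind the Bayes optimal classifier can be sketched as follows. This is a minimal illustration over a small discrete model class, not the paper's efficient approximations; the function name and array shapes are my own.

```python
import numpy as np

def bma_predict(model_probs, prior, likelihoods):
    """Bayes optimal classifier: average the models' class-probability
    predictions, weighted by each model's posterior probability.

    model_probs: (M, C) predicted class probabilities, one row per model
    prior:       (M,)   prior probability of each model
    likelihoods: (M,)   likelihood of the training data under each model
    """
    posterior = prior * likelihoods
    posterior = posterior / posterior.sum()   # normalize: Bayes' rule
    return posterior @ model_probs            # (C,) averaged class distribution

# Two models, two classes: the better-fitting model dominates the average.
probs = np.array([[0.9, 0.1],
                  [0.2, 0.8]])
print(bma_predict(probs, np.array([0.5, 0.5]), np.array([1.0, 3.0])))
```

In contrast to bagging and boosting, which combine models fit to resampled or reweighted data, this weights a fixed model class by how well each member explains the training data.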
Follow the leader if you can, Hedge if you must
 Journal of Machine Learning Research
Abstract

Cited by 3 (1 self)
Follow-the-Leader (FTL) is an intuitive sequential prediction strategy that guarantees constant regret in the stochastic setting, but has poor performance for worst-case data. Other hedging strategies have better worst-case guarantees but may perform much worse than FTL if the data are not maximally adversarial. We introduce the FlipFlop algorithm, which is the first method that provably combines the best of both worlds. As a stepping stone for our analysis, we develop AdaHedge, which is a new way of dynamically tuning the learning rate in Hedge without using the doubling trick. AdaHedge refines a method by Cesa-Bianchi, Mansour, and Stoltz (2007), yielding improved worst-case guarantees. By interleaving AdaHedge and FTL, FlipFlop achieves regret within a constant factor of the FTL regret, without sacrificing AdaHedge’s worst-case guarantees. AdaHedge and FlipFlop do not need to know the range of the losses in advance; moreover, unlike earlier methods, both have the intuitive property that the issued weights are invariant under rescaling and translation of the losses. The losses are also allowed to be negative, in which case they may be interpreted as gains.
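For orientation, the exponential-weights (Hedge) distribution that both AdaHedge and FlipFlop build on can be sketched as below. This is a generic Hedge weight computation assuming a finite set of experts with known cumulative losses; the learning-rate tuning that is the papers' actual contribution is not shown, and the function name is my own.

```python
import numpy as np

def hedge_weights(cumulative_losses, eta):
    """Exponential-weights (Hedge) distribution over experts.

    Expert k gets weight proportional to exp(-eta * L_k), where L_k is
    its cumulative loss so far and eta > 0 is the learning rate.
    """
    # Subtract the minimum loss for numerical stability;
    # the normalized weights are unchanged by this shift.
    shifted = cumulative_losses - np.min(cumulative_losses)
    w = np.exp(-eta * shifted)
    return w / w.sum()

losses = np.array([3.0, 1.0, 2.5])
print(hedge_weights(losses, eta=0.5))    # soft weighting across experts
print(hedge_weights(losses, eta=100.0))  # nearly all mass on the leader
```

FTL corresponds to the limit of a very large learning rate (all mass on the current leader), while small rates hedge more conservatively; tuning eta between these extremes is what AdaHedge and FlipFlop address.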
Safe Learning: bridging the gap between Bayes, MDL and statistical learning theory via empirical convexity
Abstract

Cited by 3 (2 self)
We extend Bayesian MAP and Minimum Description Length (MDL) learning by testing whether the data can be substantially more compressed by a mixture of the MDL/MAP distribution with another element of the model, and adjusting the learning rate if this is the case. While standard Bayes and MDL can fail to converge if the model is wrong, the resulting “safe” estimator continues to achieve good rates with wrong models. Moreover, when applied to classification and regression models as considered in statistical learning theory, the approach achieves optimal rates under, e.g., Tsybakov’s conditions, and reveals new situations in which we can penalize by (−log prior)/n rather than √((−log prior)/n).
The Safe Bayesian: learning the learning rate via the mixability gap
Abstract

Cited by 1 (1 self)
Abstract. Standard Bayesian inference can behave suboptimally if the model is wrong. We present a modification of Bayesian inference which continues to achieve good rates with wrong models. Our method adapts the Bayesian learning rate to the data, picking the rate minimizing the cumulative loss of sequential prediction by posterior randomization. Our results can also be used to adapt the learning rate in a PAC-Bayesian context. The results are based on an extension of an inequality due to T. Zhang and others to dependent random variables.
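The rate-selection idea can be sketched as follows for a discrete model class. This is a simplified illustration only: it scores each candidate learning rate by the cumulative sequential loss of the eta-tempered Bayesian mixture, whereas the paper's actual procedure uses posterior randomization; all names and shapes here are my own.

```python
import numpy as np

def cumulative_mix_loss(model_logliks, prior, eta):
    """Cumulative sequential log loss of an eta-generalized Bayesian
    mixture over a fixed discrete model class (simplified sketch).

    model_logliks: (M, T) log-likelihood of observation t under model m
    prior:         (M,)   prior over the M models
    """
    log_w = np.log(prior)
    total = 0.0
    for t in range(model_logliks.shape[1]):
        # Posterior-weighted predictive loss at time t.
        w = np.exp(log_w - np.max(log_w))
        w /= w.sum()
        total += -np.log(w @ np.exp(model_logliks[:, t]))
        # Generalized (eta-tempered) Bayesian update.
        log_w = log_w + eta * model_logliks[:, t]
    return total

def safe_eta(model_logliks, prior, etas):
    """Pick the candidate learning rate with the smallest cumulative
    sequential loss on the observed data."""
    losses = [cumulative_mix_loss(model_logliks, prior, e) for e in etas]
    return etas[int(np.argmin(losses))]
```

Standard Bayes corresponds to eta = 1; selecting a smaller eta when the data demand it is what restores good convergence rates under misspecification.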
Minimum Message Length and Statistically Consistent Invariant (Objective?) Bayesian Probabilistic Inference – From (Medical) “Evidence”
 SOCIAL EPISTEMOLOGY VOL. 22, NO. 4, OCTOBER–DECEMBER 2008, PP. 433–460
2008
Advance Access publication on June 18, 2008 doi:10.1093/comjnl/bxm117
Abstract
One of the second generation of computer scientists, Chris Wallace completed his tertiary education in 1959 with a Ph.D. in nuclear physics, on cosmic ray showers, under Dr Paul George at Sydney University. Needless to say, computer science was not, at that stage, an established academic discipline. With Max Brennan and John Malos he had designed and built a large automatic data logging system for recording cosmic ray air shower events, and with Max Brennan also developed a complex computer programme for Bayesian analysis of cosmic ray events on the recently installed SILLIAC computer. Appointed lecturer in Physics at Sydney in 1960, he was sent almost immediately to the University of Illinois to copy the design of ILLIAC II, a duplicate of which was to be built at Sydney. ILLIAC II was not in fact completed at that stage and, after an initial less than warm welcome by a department who seemed unsure exactly what this Australian was doing in their midst, his talents were recognized and he was invited to join their staff (under very generous conditions) to assist in ILLIAC II design. He remained there for two years, helping in particular to design the input-output channels and aspects of the advanced control unit (first stage pipeline). In the event, Sydney decided it would be too expensive to build a copy of ILLIAC II, although a successful copy (the Golem) was built in Israel using circuit designs developed by Wallace and Ken Smith. In spite of the considerable financial and academic inducements to remain in America, Wallace returned to Australia after three months spent in England familiarizing himself with the KDF9 computer being purchased by Sydney University to replace SILLIAC. Returning to the School of Physics he joined the Basser
That Simple Device Already Used by Gauss
Abstract
www.cwi.nl/~pdg
From November 1998 until September 1999, Jorma Rissanen and I met on a regular basis. Here I recall some of our stimulating conversations and some of the work that we did together. This work, based almost exclusively on a single page of [12], was left unfinished and has never been published, but it has indirectly had a profound impact on my career.

1 Meet Jorma Rissanen
I first met Jorma in November 1998. I had just obtained my Ph.D. in Amsterdam and started a postdoc at Stanford University. These were exciting times: it was at the height of the dot-com boom, and Stanford was right in the middle of it. Since my thesis was all about the MDL Principle, I had suggested that Jorma and I could meet in person during my stay in California. Jorma replied that he would like to. I was delighted, honored but also a bit worried, since I had been forewarned that Jorma was not your “usual” kind of scientist...