## Learning, Simplicity, Truth, and Misinformation

### BibTeX

@MISC{Kelly_learning_simplicity,

author = {Kevin T. Kelly},

title = {Learning, Simplicity, Truth, and Misinformation},

year = {}

}

### Abstract

Both in learning and in natural science, one faces the problem of selecting among a range of theories, all of which are compatible with the available evidence. The traditional response to this problem has been to select the simplest such theory on the basis of “Ockham’s Razor”. But how can a fixed bias toward simplicity help us find possibly complex truths? I survey the current, textbook answers to this question and find them all to be wishful, circular, or irrelevant. Then I present a new approach based on minimizing the number of reversals of opinion prior to convergence to the truth. According to this alternative approach, Ockham’s razor is a good idea when it seems to be (e.g., in selecting among parametrized models) and is not a good idea when it feels dubious (e.g., in the inference of arbitrary computable functions). Hence, the proposed vindication of Ockham’s razor can be used to separate vindicated applications

In science and learning, one must eventually face up to the problem of choosing among several or even infinitely many theories compatible with all available information. How ought one to choose? The traditional answer is to choose the “simplest” and to invoke

### Citations

2327 | The structure of scientific revolutions - Kuhn - 1962 |

1359 | Information theory and an extension of the maximum likelihood principle - Akaike - 1973 |

Citation context: ...about the kind of connection to the truth model selection methods provide. A familiar technique in model selection is to choose a model that maximizes a score called the Akaike Information Criterion (Akaike 1973, Forster and Sober 1994). Let T[θ] be a theory with n free parameters and let E be a particular sample. Let θ̂ denote the value of θ that maximizes P(E | T[θ̂]), in which case we say that θ̂ i...

1086 | The Logic of Scientific Discovery - Popper - 1934 |

Citation context: ...ers of science have remarked on the desirability of some of these features, have derived one of the features from another, and have concluded that the latter is a reason for seeking the former (e.g., Popper 1968, Glymour 1981, Friedman 1983). It is particularly tempting, for example, to recommend simplicity because simple theories are more testable or to demand explanations and to observe that simple theorie...

608 | An Introduction to Computational Learning Theory - Kearns, Vazirani - 1994 |

Citation context: ...ribution determined by the theory. (Probably Approximately Confusing) Theoreticians in machine learning have coined and carefully examined a concept of probably approximately correct or PAC learning (Kearns and Vazirani 1994). Suppose that all we care about is not being embarrassed by future counterexamples to our inferred classification rule, so we measure “approximate correctness” of a rule by the chance that a single,...

347 | Classical Descriptive Set Theory - Kechris |

279 | Logical Foundations of Probability - Carnap - 1950 |

223 | Systems that Learn: An Introduction to Learning Theory for Cognitive and Computer Scientists, MIT Press - Osherson, Stob, et al. - 1986 |

Citation context: ... on this idea for some time. The basic idea of counting mind-changes is originally due to H. Putnam (1965). It has been studied extensively in the computational learning literature; for a review, cf. (Jain et al. 1999). But in that literature, the focus is on categorizing the complexities of problems rather than on singling out Ockham’s razor as an optimal method. I viewed the matter the same way in (Kelly 1996). ...

124 | The Logic of Reliable Inquiry - Kelly - 1996 |

Citation context: ...n et al. 1999). But in that literature, the focus is on categorizing the complexities of problems rather than on singling out Ockham’s razor as an optimal method. I viewed the matter the same way in (Kelly 1996). Most philosophers of science have read W. Salmon’s (1967) complaint that convergence results don’t constrain scientific behavior in the short run. To address this complaint, Oliver Schulte and I st...

100 | Trial and error predicates and the solution to a problem of Mostowski - Putnam - 1965 |

91 | How to Tell when Simpler, More Unified, or Less Ad Hoc Theories will Provide More Accurate Predictions. British Journal for the Philosophy of Science 45 - Forster, Sober - 1994 |

Citation context: ...d of connection to the truth model selection methods provide. A familiar technique in model selection is to choose a model that maximizes a score called the Akaike Information Criterion (Akaike 1973, Forster and Sober 1994). Let T[θ] be a theory with n free parameters and let E be a particular sample. Let θ̂ denote the value of θ that maximizes P(E | T[θ̂]), in which case we say that θ̂ is the maximum likelihood...

69 | Theory of Probability, Third Edition - Jeffreys - 1998 |

Citation context: ...ble properties. 0.2.2 Begging the Question Bayesian methodology has an easy “explanation” of Ockham’s razor: just put high prior probabilities on simple theories and turn the crank on Bayes’ theorem (Jeffreys 1985). Then if you are forced (contrary to Bayesian ideology) to choose among theories, choosing simpler theories compatible with the data will look like a better policy to you. Of course, this argument i...

63 | Foundations of Space-Time Theories - Friedman - 1983 |

Citation context: ...d on the desirability of some of these features, have derived one of the features from another, and have concluded that the latter is a reason for seeking the former (e.g., Popper 1968, Glymour 1981, Friedman 1983). It is particularly tempting, for example, to recommend simplicity because simple theories are more testable or to demand explanations and to observe that simple theories explain better. But the tru...

59 | The foundations of scientific inference - Salmon - 1967 |

Citation context: ...s true. That’s a fine thing to know, but it falls far short of a recommendation for Ockham’s razor, for just about any alternative, prior bias will “wash out” in the limit of inquiry in the same way (Salmon 1967). The convergence theorems simply say that a given prior bias doesn’t prevent one from arriving at the truth eventually. But to say on that basis that a given bias helps you find the truth is plain...

32 | Minimum Description Length Induction - Vitanyi, Li - 2000 |

20 | Why Probability Does Not Capture the Logic of Scientific Justification - Kelly, Glymour - 2004 |

19 | Fact, Fiction, and Forecast, fourth edition - Goodman - 1983 |

Citation context: ...1 and that these code numbers are fed to the learning agent. Then the “right” prior opinion favors constantly 0 sequences or constantly 1 sequences. Now translate into the evidential language “grue” (Goodman 1983) which means “green up to t and blue thereafter” and “bleen”, which means “blue up to t and green thereafter” and code “grue” by one and “bleen” by zero. Then the rule for assigning simplicity-biased ...

19 | Efficient Convergence Implies Ockham’s Razor - Kelly - 2002 |

19 | “Means-Ends Epistemology”, The British - Schulte - 1999 |

Citation context: ...liver Schulte and I started looking at retraction minimization as a way to severely constrain one’s choice of hypothesis in the short run. Schulte’s thesis work in this is summarized and extended in (Schulte 1999a, 1999b). Schulte has also applied the idea to the inference of conservation laws in particle physics (Schulte 2001). In 2000, I began to extend the idea, based on a variant of the ordinal mind-change...

11 | Inferring Conservation Laws in Particle Physics: A Case study - Schulte - 2000 |

7 | “Why Glymour is a Bayesian,” in Testing Scientific Theories - Rosenkrantz - 1983 |

Citation context: ...P(S|E)/P(C|E) = k. So even though the complex theory could save the data, the simple theory that did so without any ad hoc fiddling ends up being “confirmed” much more sharply by the same data (cf. Rosenkrantz 1983). And who can say that the argument depends on a prior bias toward S? For at the outset, the prior probabilities of S and C are identical. Indeed, it would be a miracle if out of all the possible way...

5 | The Logic of Reliable and Efficient - Schulte - 1999 |

Citation context: ...liver Schulte and I started looking at retraction minimization as a way to severely constrain one’s choice of hypothesis in the short run. Schulte’s thesis work in this is summarized and extended in (Schulte 1999a, 1999b). Schulte has also applied the idea to the inference of conservation laws in particle physics (Schulte 2001). In 2000, I began to extend the idea, based on a variant of the ordinal mind-change...

4 | A universal prior for integers and estimation by minimum description length, The Annals of Statistics - Rissanen - 1983 |

3 | Theory and Evidence, Princeton - Glymour - 1980 |

2 | Justification as Truth-finding Efficiency - unknown authors - 2004 |