## Ockham’s Razor, Truth, and Information (2007)

Citations: 2 (0 self)

### BibTeX

@MISC{Kelly07ockham’srazor,

author = {Kevin T. Kelly},

title = {Ockham’s Razor, Truth, and Information},

year = {2007}

}

### Abstract

In science, one faces the problem of selecting the true theory from a range of alternative theories. The typical response is to select the simplest theory compatible with available evidence, on the authority of “Ockham’s Razor”. But how can a fixed bias toward simplicity help one find possibly complex truths? A short survey of standard answers to this question reveals them to be either wishful, circular, or irrelevant. A new explanation is presented, based on minimizing the reversals of opinion prior to convergence to the truth. According to this alternative approach, Ockham’s razor does not inform one which theory is true but is, nonetheless, the uniquely most efficient strategy for arriving at the true theory, where efficiency is a matter of minimizing reversals of opinion prior to finding the true theory.

### Citations

8980 | Statistical Learning Theory
- Vapnik
- 1998
Citation Context ...terion (AIC)) (1973), cross-validation, and Mallows’ statistic (cf. Wasserman 2003). Structural risk minimization (SRM) is an interesting generalization and extension of the over-fitting perspective (Vapnik 1998). In the SRM approach, one does not merely construct an (approximately) unbiased estimate of risk; one solves for objective, worst-case bounds on the chance that estimated risk differs by a given amou... |

1902 | The Structure of Scientific Revolutions - Kuhn - 1962 |

1682 | An Introduction to Kolmogorov Complexity and its Applications
- Li, Vitányi
- 1997
Citation Context ...ey are more severely testable (Popper 1968, Glymour 1981, Friedman 1983, Mayo 1996), explain better (Kitcher 1981), predict better (Forster and Sober 1994), and provide a compact summary of the data (Li and Vitanyi 1997, Rissanen 1983). However, if the truth happens not to be simple, then the truth does not possess the consequent virtues, either. To infer that the truth is simple because simple worlds and the the... |

1235 | Information theory and an extension of the maximum likelihood principle - Akaike - 1973 |

924 |
The Logic of Scientific Discovery
- Popper
- 1959
Citation Context ...es have attractive aesthetic and methodological virtues. Aesthetically, they are more unified, uniform and symmetrical and are less ad hoc or messy. Methodologically, they are more severely testable (Popper 1968, Glymour 1981, Friedman 1983, Mayo 1996), explain better (Kitcher 1981), predict better (Forster and Sober 1994), and provide a compact summary of the data (Li and Vitanyi 1997, Rissanen 1983). Ho... |

384 |
Knowledge and the Flow of Information
- Dretske
- 1981
Citation Context ...ealizable demand that simplicity should reliably point toward or inform one of the true theoretical structure, a popular, if infeasible, view both in statistics and philosophy (Goldman 1986, Mayo 1996, Dretske 1981). The approach developed below is quite different: insofar as finding the truth makes reversals of opinion unavoidable, they are not only justified but laudable, whereas, insofar as they are avoidabl... |

304 | Knowledge in Flux - Gärdenfors - 1988 |

264 | Logical Foundations of Probability - Carnap - 1950 |

218 |
Systems that Learn: An Introduction to Learning Theory, second edition
- Jain, Osherson, et al.
- 1999
Citation Context ...f information is required. The basic idea of counting mind-changes is originally due to H. Putnam (1965). It has been studied extensively in the computational learning literature; for a review cf. (Jain et al. 1999). But in that literature, the focus is on categorizing the complexities of problems rather than on singling out Ockham’s razor as an optimal strategy. I viewed the matter the same way in (Kelly 1996)... |

207 | The Scientific Image - van Fraassen - 1980 |

159 | Optimal structure identification with greedy search - Chickering |

152 | Estimating the dimension of a model. Annals of Statistics - Schwarz - 1978 |

137 | All of statistics: a concise course in statistical inference - Wasserman |

131 |
Error and the growth of experimental knowledge
- Mayo
- 1996
Citation Context ...gical virtues. Aesthetically, they are more unified, uniform and symmetrical and are less ad hoc or messy. Methodologically, they are more severely testable (Popper 1968, Glymour 1981, Friedman 1983, Mayo 1996), explain better (Kitcher 1981), predict better (Forster and Sober 1994), and provide a compact summary of the data (Li and Vitanyi 1997, Rissanen 1983). However, if the truth happens not to be si... |

120 |
The Logic of Reliable Inquiry
- Kelly
- 1996
Citation Context ...et al. 1999). But in that literature, the focus is on categorizing the complexities of problems rather than on singling out Ockham’s razor as an optimal strategy. I viewed the matter the same way in (Kelly 1996). Schulte (1999a, 1999b) derives short-run constraints on strategies from retraction minimization. (Kelly 2002) extends the idea, based on a variant of the ordinal mind-change account due to (Freival... |

110 |
Epistemology and Cognition
- Goldman
- 1986
Citation Context ...e more symptom of the unrealizable demand that simplicity should reliably point toward or inform one of the true theoretical structure, a popular, if infeasible, view both in statistics and philosophy (Goldman 1986, Mayo 1996, Dretske 1981). The approach developed below is quite different: insofar as finding the truth makes reversals of opinion unavoidable, they are not only justified but laudable, whereas, ins... |

97 | Theory and Evidence - Glymour - 1980 |

97 | Trial and error predicates and the solution to a problem of Mostowski - Putnam - 1965 |

86 |
How to Tell when Simpler, More Unified or Less ad Hoc Theories will Provide More Accurate Predictions, The British Journal for the Philosophy of Science
- Forster, Sober
- 1994
Citation Context ...m and symmetrical and are less ad hoc or messy. Methodologically, they are more severely testable (Popper 1968, Glymour 1981, Friedman 1983, Mayo 1996), explain better (Kitcher 1981), predict better (Forster and Sober 1994), and provide a compact summary of the data (Li and Vitanyi 1997, Rissanen 1983). However, if the truth happens not to be simple, then the truth does not possess the consequent virtues, either. To... |

62 |
Theory of Probability. Third edition
- Jeffreys
- 1961
Citation Context ... true theory. Subjective Bayesians countenance any value whatever for the prior probability p(T), so it is permissible to start with a prior probability distribution biased toward simple theories (Jeffreys 1985). But the mere adoption of such a bias hardly explains how finding the truth is facilitated better by that bias than by any other. A more subtle Bayesian argument seems to avoid the preceding circle.... |

59 |
Foundations of Spacetime Theories
- Friedman
- 1983
Citation Context ...c and methodological virtues. Aesthetically, they are more unified, uniform and symmetrical and are less ad hoc or messy. Methodologically, they are more severely testable (Popper 1968, Glymour 1981, Friedman 1983, Mayo 1996), explain better (Kitcher 1981), predict better (Forster and Sober 1994), and provide a compact summary of the data (Li and Vitanyi 1997, Rissanen 1983). However, if the truth happens n... |

53 |
The Foundations of Scientific Inference
- Salmon
- 1967
Citation Context ...a bias helps one find the truth better than alternative biases. Convergence, alone, cannot answer that question, since if a method converges to the truth, so does every finite variant of that method (Salmon 1967). Hence, mere convergence says nothing about how the interests of truth-finding are particularly furthered by choosing the simplest theory now. But that is what the puzzle of simplicity is about. 3 D... |

53 | A linear non-Gaussian acyclic model for causal discovery
- Shimizu, Hoyer, et al.
- 2006
Citation Context ...ons before the data resolve the issue, risking extra surprises. It is known that in the linear, non-Gaussian case, causal structure can be recovered uniquely if there are no unobserved variables (Shimizu et al. 2006). The same may be true in the non-linear Gaussian case. In the standard cases, it is known that all of the over-identifying constraints follow from conditional independence constraints (Richardson ... |

33 |
Explanatory unification
- Kitcher
- 1981
Citation Context ..., they are more unified, uniform and symmetrical and are less ad hoc or messy. Methodologically, they are more severely testable (Popper 1968, Glymour 1981, Friedman 1983, Mayo 1996), explain better (Kitcher 1981), predict better (Forster and Sober 1994), and provide a compact summary of the data (Li and Vitanyi 1997, Rissanen 1983). However, if the truth happens not to be simple, then the truth does not p... |

30 |
Minimum description length induction
- Vitányi, Li
Citation Context ...at more compressible strings tend to have higher prior probability. It can be shown that under certain conditions the MDL approach approximates Bayesian updating with the universal prior probability (Vitanyi and Li 2000). Algorithmic complexity may help to explicate some slippery but important methodological concepts, such as interest, beauty, or emergence (Adriaans 2007). The focus here, however, is on the putative... |

18 |
Why Probability does not Capture the Logic of Scientific Justification
- Kelly, Glymour
- 2004
Citation Context ...Smith 1993), but that approach does not apply to cases like curve fitting, in which theory complexity is unbounded. Subsequent steps to the present approach may be found in (Kelly 2004, 2006) and in (Kelly and Glymour 2004). So directions to the nearest freeway entrance ramp satisfy all the apparently arcane and paradoxical demands that a successful explanation of Ockham’s razor must satisfy. It remains to explain wh... |

17 | Fact, Fiction, and Forecast, Fourth Edition - Goodman - 1983 |

17 | Efficient Convergence Implies Ockham’s Razor, in - Kelly - 2002 |

17 | Means-Ends Epistemology, The British - Schulte - 1999 |

11 |
The World of Elementary Particles
- Ford
- 1963
Citation Context .... In this example, favoring the answer that corresponds to the fewest effects corresponds to positing the greatest possible number of conserved quantities, which corresponds to physical practice (cf. Ford 1963). In this case, simplicity intuitions are consonant with testability and explanation, but run counter to minimization of free parameters (posited conserved quantities). Discovering causal structure. ... |

11 | Epistemic logic and epistemology: The state of their affairs - van Benthem - 2006 |

10 |
Justification as Truth-finding Efficiency
- Kelly
- 2004
Citation Context ...unt due to (Freivalds and Smith 1993), but that approach does not apply to cases like curve fitting, in which theory complexity is unbounded. Subsequent steps to the present approach may be found in (Kelly 2004, 2006) and in (Kelly and Glymour 2004). So directions to the nearest freeway entrance ramp satisfy all the apparently arcane and paradoxical demands that a successful explanation of Ockham’s razor ... |

10 | Inferring Conservation Laws in Particle Physics: A Case Study
- Schulte
- 2001
Citation Context ..., to the number of effects presented by nature if f is true. Conservation laws. Consider an idealized version of explaining reactions with conservation laws, as in the theory of elementary particles (Schulte 2001, Valdez-Perez 1996). Suppose that there are n observable types of particles, and it is assumed that they interact so as to conserve n distinct quantities. In other words, each particle of type p_i car... |

9 | Mind Change Optimal Learning of Bayes Net Structure - Schulte, Luo, et al. - 2007 |

8 | The Logic of Reliable and Efficient Inquiry - Schulte - 1999 |

6 | Ockham’s Razor, Empirical Complexity, and Truth-finding Efficiency, Theoretical Computer Science 317
- Kelly
- 2007
Citation Context ... the polynomial structure problem, then an obvious definition of the empirical complexity of world w given e is c(w, e) = |S_w| − |S_e|, the number of new effects presented by w after the end of e (cf. Kelly 2007). When Γ ⊂ Ω, as in the causal inference problem (some finite sets of partial correlations correspond to no causal graph), a slightly more general approach is required. The basic idea is that effe... |

6 |
Unifying Scientific Theories
- Morrison
- 2000
Citation Context ...d as a rule of scientific inference, it should help one to select the true theory from among the alternatives. The trouble is that it is far from clear how a fixed bias toward simplicity could do so (Morrison 2000). One wishes that simplicity could somehow indicate or inform one of the true theory, the way a compass needle indicates or informs one about direction. But since Ockham’s razor always points toward ... |

6 |
Why Glymour is a Bayesian, in Testing Scientific Theories
- Rosenkrantz
- 1983
Citation Context ...er a very small range of possible values of θ and p(C(θ)|C) is flattish, the integral assumes a value near zero. So the posterior probability of the simple theory S is sharply greater than that of C (Rosenkrantz 1983). It seems, therefore, that simplicity is “truth conducive”, starting from complete ignorance. The magic evaporates when the focus shifts from theories to ways in which the alternative theories can b... |

3 |
A universal prior for integers and estimation by minimum description length, The Annals of Statistics
- Rissanen
- 1983
Citation Context ...testable (Popper 1968, Glymour 1981, Friedman 1983, Mayo 1996), explain better (Kitcher 1981), predict better (Forster and Sober 1994), and provide a compact summary of the data (Li and Vitanyi 1997, Rissanen 1983). However, if the truth happens not to be simple, then the truth does not possess the consequent virtues, either. To infer that the truth is simple because simple worlds and the theories that desc... |

2 |
The Incompatibility of Naturalism and Scientific Realism, in Naturalism: A Critical Appraisal, edited by
- Koons
- 2000
Citation Context ... across evolutionary time and across domains from subatomic particles to cell metabolism to social policy, the irony of defending Ockham’s razor with such hidden, metaphysical fancies notwithstanding (Koons 2000). Therein lies a concern about the association of information-theoretic terminology with Ockham’s razor, as in the MDL and SRM approaches. When information theory is applied to a telephone line, as o... |

2 |
Ancestral Graph Markov Models, Annals of Statistics 30
- Richardson, Spirtes
- 2002
Citation Context ...et al. 2006). The same may be true in the non-linear Gaussian case. In the standard cases, it is known that all of the over-identifying constraints follow from conditional independence constraints (Richardson and Spirtes 2002). That is known to be false in the linear, non-Gaussian case (Shimizu et al. 2006), so in that case simplicity must be relativized to a wider range of potential effects. Indeed, in the linear, non-Gau... |

1 |
The philosophy of learning, the cooperative computational universe
- Adriaans
- 2007
Citation Context ...g with the universal prior probability (Vitanyi and Li 2000). Algorithmic complexity may help to explicate some slippery but important methodological concepts, such as interest, beauty, or emergence (Adriaans 2007). The focus here, however, is on the putative connection, if any, between data-compression and finding the true theory. Some proponents of the approach (e.g., Rissanen, himself) deny that there is on... |

1 | Systematic Generation of Constituent Models - Valdez-Perez, Zytkow - 1996 |