Results 1–6 of 6
On Universal Prediction and Bayesian Confirmation
 Theoretical Computer Science
, 2007
Abstract

Cited by 22 (13 self)
The Bayesian framework is a well-studied and successful framework for inductive reasoning, which includes hypothesis testing and confirmation, parameter estimation, sequence prediction, classification, and regression. But standard statistical guidelines for choosing the model class and prior are not always available or can fail, in particular in complex situations. Solomonoff completed the Bayesian framework by providing a rigorous, unique, formal, and universal choice for the model class and the prior. I discuss in breadth how and in which sense universal (non-i.i.d.) sequence prediction solves various (philosophical) problems of traditional Bayesian sequence prediction. I show that Solomonoff's model possesses many desirable properties: strong total and future bounds, and weak instantaneous bounds; in contrast to most classical continuous prior densities it has no zero p(oste)rior problem, i.e. it can confirm universal hypotheses; it is reparametrization and regrouping invariant; and it avoids the old-evidence and updating problem. It even performs well ...
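The Bayes mixture at the heart of this framework can be illustrated on a toy model class. A minimal sketch, assuming a hypothetical class of three Bernoulli coins with a uniform prior; Solomonoff's universal choice instead mixes over all computable environments with weights of the form 2^(-K), which is incomputable:

```python
def bayes_mixture_predict(bits, thetas, prior):
    """Posterior-weighted probability that the next bit is 1,
    for a finite class of Bernoulli(theta) hypotheses."""
    # Bayes: multiply each prior weight by the likelihood of the data.
    post = list(prior)
    for b in bits:
        post = [w * (t if b == 1 else 1 - t) for w, t in zip(post, thetas)]
    z = sum(post)
    post = [w / z for w in post]  # normalize to a posterior
    # Mixture prediction: sum_i posterior_i * P_i(next bit = 1)
    return sum(w * t for w, t in zip(post, thetas))

# After observing four 1s, mass concentrates on the theta = 0.9 coin:
p = bayes_mixture_predict([1, 1, 1, 1], thetas=[0.1, 0.5, 0.9], prior=[1/3, 1/3, 1/3])
```

Replacing the uniform prior over three coins with a complexity-weighted prior over all computable predictors is exactly the completion the abstract attributes to Solomonoff.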
A Philosophical Treatise of Universal Induction
 Entropy 2011
Abstract

Cited by 11 (7 self)
Understanding inductive reasoning is a problem that has engaged mankind for thousands of years. This problem is relevant to a wide range of fields and is integral to the philosophy of science. It has been tackled by many great minds ranging from philosophers to scientists to mathematicians, and more recently computer scientists. In this article we argue the case for Solomonoff Induction, a formal inductive framework which combines algorithmic information theory with the Bayesian framework. Although it achieves excellent theoretical results and is based on solid philosophical foundations, the requisite technical knowledge necessary for understanding this framework has caused it to remain largely unknown and unappreciated in the wider scientific community. The main contribution of this article is to convey Solomonoff induction and its related concepts in a generally accessible form with the aim of bridging this current technical gap. In the process we examine the major historical contributions that have led to the formulation of Solomonoff Induction as well as criticisms of Solomonoff and induction in general. In particular we examine how Solomonoff induction addresses many issues that have plagued other inductive systems, such as the black ravens paradox and the confirmation problem, and compare this approach with other recent approaches.
Sublinear algorithms for approximating string compressibility
 In International Workshop on Randomization and Approximation Techniques in Computer Science
, 2007
Abstract

Cited by 6 (5 self)
We raise the question of approximating the compressibility of a string with respect to a fixed compression scheme, in sublinear time. We study this question in detail for two popular lossless compression schemes: run-length encoding (RLE) and Lempel-Ziv (LZ), and present sublinear algorithms for approximating compressibility with respect to both schemes. We also give several lower bounds showing that our algorithms for both schemes cannot be improved significantly. Our investigation of LZ yields results whose interest goes beyond the initial questions we set out to study. In particular, we prove combinatorial structural lemmas that relate the compressibility of a string with respect to Lempel-Ziv to the number of distinct short substrings contained in it. In addition, we show that approximating the compressibility with respect to LZ is related to approximating the support size of a distribution.
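The RLE half of the question admits a simple illustration. The RLE-compressed length is proportional to the number of runs, and a string s has exactly 1 + #{i : s[i] != s[i-1]} runs, so uniformly sampling boundary positions estimates that count while reading far fewer than n characters. A rough sketch of this sampling idea only, not the paper's actual algorithms or guarantees:

```python
import random

def estimate_rle_runs(s, samples=1000, rng=random):
    """Estimate the number of RLE runs in s from `samples` random probes."""
    n = len(s)
    if n < 2:
        return n
    hits = 0
    for _ in range(samples):
        # A position i in [1, n) starts a new run iff s[i] != s[i-1].
        i = rng.randrange(1, n)
        if s[i] != s[i - 1]:
            hits += 1
    # Scale the sampled boundary fraction back up to the n - 1 positions.
    return 1 + hits / samples * (n - 1)
```

The estimate is exact in the two extreme cases (a constant string, or a string that alternates at every position) and concentrates around the true run count in between; the interesting regime, and the subject of the paper's lower bounds, is how few probes suffice for a given accuracy.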
TCS-TR-B-10-43 Master’s Thesis: Virus Data Clustering based on Kolmogorov Complexity
, 2010
Abstract
Influenza viruses are a major cause of morbidity and mortality worldwide, affecting large segments of the human population every year. In June 2009, the World Health Organization declared that influenza due to a new strain of swine-origin H1N1 was responsible for the 2009 influenza pandemic, and on June 11 the WHO moved the alert level to phase 6, marking the first global pandemic since the 1968 Hong Kong influenza. Many data mining methods are used in the biological sciences to analyze viruses. But algorithms designed around domain knowledge tend to have many parameters, and determining how relevant particular features are is often difficult and may require a certain amount of guessing. In this thesis, we introduce a universal data mining method which we call parameter-free data mining. Parameter-free data mining is aimed at scenarios where we are interested not in a particular similarity measure but in the similarity between the objects themselves. The most promising approach to this paradigm is the normalized information distance (NID), which is based on Kolmogorov complexity theory. As the NID cannot be computed, standard compression algorithms such as gzip and bzip2 are used as approximations of the Kolmogorov complexity, yielding the normalized compression distance (NCD) as an approximation of the NID. To demonstrate the usefulness of the NCD for clustering influenza virus data, two kinds of compressors and two clustering algorithms were used, which verified that this approach depends on neither the compression method nor the clustering method chosen.
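The NCD described above is short enough to sketch directly. A minimal illustration using Python's standard zlib as the stand-in compressor (the thesis uses gzip and bzip2; the sequences here are made-up toy data, not real influenza genomes):

```python
import zlib

def ncd(x: bytes, y: bytes, compress=zlib.compress) -> float:
    """Normalized compression distance:
    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C(.) is the compressed length. Similar inputs give values near 0."""
    cx, cy, cxy = len(compress(x)), len(compress(y)), len(compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

genome_a = b"ACGTACGT" * 100                    # hypothetical sequence
genome_b = b"ACGTACGT" * 99 + b"ACGTTCGT"       # nearly identical to genome_a
genome_c = b"TTGGCCAA" * 100                    # a different repeating pattern
```

Because the compressor can reuse the shared structure when compressing the concatenation, ncd(genome_a, genome_b) comes out smaller than ncd(genome_a, genome_c); feeding the resulting pairwise distance matrix to any standard clustering algorithm reproduces the parameter-free pipeline the thesis describes.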
DOI 10.1007/s00453-012-9618-6 Sublinear Algorithms for Approximating String Compressibility
Abstract
We raise the question of approximating the compressibility of a string with respect to a fixed compression scheme, in sublinear time. We study this question in detail for two popular lossless compression schemes: run-length encoding (RLE) and a variant of Lempel-Ziv (LZ77), and present sublinear algorithms for approximating compressibility with respect to both schemes. We also give several lower bounds showing that our algorithms for both schemes cannot be improved significantly. Our investigation of LZ77 yields results whose interest goes beyond the initial questions we set out to study. In particular, we prove combinatorial structural lemmas that relate the compressibility of a string with respect to LZ77 to the number of distinct short substrings contained in it.