How to Use Expert Advice
Journal of the Association for Computing Machinery, 1997
Cited by 317 (66 self)
Abstract:
We analyze algorithms that predict a binary value by combining the predictions of several prediction strategies, called experts. Our analysis is for worst-case situations, i.e., we make no assumptions about the way the sequence of bits to be predicted is generated. We measure the performance of the algorithm by the difference between the expected number of mistakes it makes on the bit sequence and the expected number of mistakes made by the best expert on this sequence, where the expectation is taken with respect to the randomization in the predictions. We show that the minimum achievable difference is on the order of the square root of the number of mistakes of the best expert, and we give efficient algorithms that achieve this. Our upper and lower bounds have matching leading constants in most cases. We then show how this leads to certain kinds of pattern recognition/learning algorithms with performance bounds that improve on the best results currently known in this context. We also compare our analysis to the case in which log loss is used instead of the expected number of mistakes.
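A minimal sketch of the kind of algorithm analyzed here is the exponential-weights (randomized weighted majority) update. The function name and the fixed learning rate `eta` below are illustrative choices, not the paper's tuned construction:

```python
import math

def exponential_weights(expert_preds, outcomes, eta):
    """Randomized prediction with expert advice via multiplicative
    (exponential) weight updates. expert_preds[t][i] is expert i's
    probability that bit t is 1; outcomes[t] is the actual bit.
    Returns (expected mistakes of the algorithm, mistakes of the
    best expert)."""
    n_experts = len(expert_preds[0])
    weights = [1.0] * n_experts
    expert_losses = [0.0] * n_experts
    alg_loss = 0.0
    for preds, y in zip(expert_preds, outcomes):
        total = sum(weights)
        # Algorithm predicts 1 with the weighted average probability.
        p_one = sum(w * p for w, p in zip(weights, preds)) / total
        alg_loss += p_one if y == 0 else 1.0 - p_one
        for i, p in enumerate(preds):
            loss = p if y == 0 else 1.0 - p  # absolute loss |p - y|
            expert_losses[i] += loss
            weights[i] *= math.exp(-eta * loss)  # penalize wrong experts
    return alg_loss, min(expert_losses)
```

With `eta` set roughly on the order of sqrt(2 ln N / L*), where L* is the loss of the best of N experts, this style of update achieves the square-root regret bound stated in the abstract.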
Generalizing Case Frames Using a Thesaurus and the MDL Principle
Computational Linguistics, 1998
Cited by 106 (4 self)
Abstract:
In this paper, we confine ourselves to the former issue, and refer the interested reader to Li and Abe (1996), which deals with the latter issue.
A Transformational Characterization of Equivalent Bayesian Network Structures
1995
Cited by 92 (1 self)
Abstract:
We present a simple characterization of equivalent Bayesian network structures based on local transformations. The significance of the characterization is twofold. First, we are able to easily prove several new invariant properties of theoretical interest for equivalent structures. Second, we use the characterization to derive an efficient algorithm that identifies all of the compelled edges in a structure. Compelled edge identification is of particular importance for learning Bayesian network structures from data because these edges indicate causal relationships when certain assumptions hold.
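The local transformation underlying this characterization is the reversal of a covered edge: an edge x → y whose endpoints satisfy Pa(y) = Pa(x) ∪ {x}. A minimal sketch, assuming a DAG encoded as a dictionary mapping each node to its set of parents (an illustrative encoding, not the paper's):

```python
def is_covered(parents, x, y):
    """Edge x -> y is covered when Pa(y) = Pa(x) ∪ {x}.
    `parents` maps each node to the set of its parents."""
    return x in parents[y] and parents[y] == parents[x] | {x}

def reverse_covered_edge(parents, x, y):
    """Reverse a covered edge x -> y. Reversing a covered edge keeps
    the DAG in the same Markov equivalence class."""
    if not is_covered(parents, x, y):
        raise ValueError("edge is not covered; reversal may change the class")
    new = {v: set(ps) for v, ps in parents.items()}
    new[y].discard(x)  # drop x -> y ...
    new[x].add(y)      # ... and add y -> x
    return new
```

Edges that can never be reversed this way while staying in the equivalence class are exactly the compelled edges the abstract refers to.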
Minimum Description Length Induction, Bayesianism, and Kolmogorov Complexity
IEEE Transactions on Information Theory, 1998
Cited by 67 (7 self)
Abstract:
The relationship between the Bayesian approach and the minimum description length approach is established. We sharpen and clarify the general modeling principles MDL and MML, abstracted as the ideal MDL principle and defined from Bayes's rule by means of Kolmogorov complexity. The basic condition under which the ideal principle should be applied is encapsulated as the Fundamental Inequality, which in broad terms states that the principle is valid when the data are random relative to every contemplated hypothesis, and these hypotheses are in turn random relative to the (universal) prior. Basically, the ideal principle states that the prior probability associated with the hypothesis should be given by the algorithmic universal probability, and that the sum of the log universal probability of the model plus the log of the probability of the data given the model should be minimized. If we restrict the model class to the finite sets, then application of the ideal principle turns into Kolmogorov's mi...
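The two-part flavour of MDL over finite-set models can be made concrete: describe a binary string by (i) the finite set "length-n strings with exactly k ones" and (ii) the string's index within that set. The encoding below is a deliberately crude sketch, not the paper's construction:

```python
import math

def two_part_code_length(bits):
    """Two-part code length (in bits) for a binary string, using the
    finite-set model S_k = {length-n strings with exactly k ones}:
    L(H)   = log2(n + 1) bits to state k, plus
    L(D|H) = log2 C(n, k) bits to index the string within S_k."""
    n, k = len(bits), sum(bits)
    return math.log2(n + 1) + math.log2(math.comb(n, k))
```

A highly regular string (few ones) gets a two-part code far shorter than the literal n bits, while a typical balanced string does not compress, echoing the abstract's condition that the data be random relative to the selected hypothesis.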
The Speed Prior: A New Simplicity Measure Yielding Near-Optimal Computable Predictions
Proceedings of the 15th Annual Conference on Computational Learning Theory (COLT 2002), Lecture Notes in Artificial Intelligence, 2002
Cited by 51 (20 self)
Abstract:
Solomonoff's optimal but noncomputable method for inductive inference assumes that observation sequences x are drawn from a recursive prior distribution p(x). Instead of using the unknown p(x), he predicts using the celebrated universal enumerable prior M(x), which for all x exceeds any recursive p(x), save for a constant factor independent of x. The simplicity measure M(x) naturally implements "Occam's razor" and is closely related to the Kolmogorov complexity of x. However, M assigns high probability to certain data that are extremely hard to compute. This does not match our intuitive notion of simplicity. Here we suggest a more plausible measure derived from the fastest way of computing data. In the absence of contrarian evidence, we assume that the physical world is generated by a computational process, and that any possibly infinite sequence of observations is therefore computable in the limit (this assumption is more radical and stronger than Solomonoff's).
Evolutionary induction of sparse neural trees
Evolutionary Computation, 1997
Cited by 37 (15 self)
Abstract:
This paper is concerned with the automatic induction of parsimonious neural networks. In contrast to other program induction situations, network induction entails parametric learning as well as structural adaptation. We present a novel representation scheme called neural trees that allows efficient learning of both network architectures and parameters by genetic search. A hybrid evolutionary method is developed for neural tree induction that combines genetic programming and the breeder genetic algorithm under the unified framework of the minimum description length principle. The method is successfully applied to the induction of higher order neural trees while still keeping the resulting structures sparse to ensure good generalization performance. Empirical results are provided on two chaotic time series prediction problems of practical interest.
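The minimum description length framework mentioned here amounts to scoring a candidate structure by data-fit bits plus model-description bits, so larger trees must earn their extra parameters. A deliberately simplified sketch; the encoding constants are assumptions, not the paper's scheme:

```python
import math

def mdl_fitness(sum_squared_error, n_samples, n_weights, n_nodes,
                bits_per_weight=16.0):
    """Illustrative MDL-style fitness for structure search (lower is
    better): code length of Gaussian residuals plus a crude code
    length for the model's weights and topology."""
    mse = sum_squared_error / n_samples
    # (n/2) * log2(MSE) is the variable part of the residual code length.
    data_bits = 0.5 * n_samples * math.log2(mse + 1e-12)
    model_bits = bits_per_weight * n_weights + math.log2(n_nodes + 1)
    return data_bits + model_bits
```

In an evolutionary loop, this score replaces raw training error as the fitness, which is what keeps the evolved trees sparse.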
A Markovian extension of Valiant's learning model (Extended Abstract)
In Proceedings of the Thirty-First Symposium on Foundations of Computer Science, 1990
Cited by 29 (0 self)
Abstract:
Formalizing the process of natural induction and justifying its predictive value is not only basic to the phi ...
The New AI: General & Sound & Relevant for Physics
2003
Cited by 15 (9 self)
Abstract:
Most traditional artificial intelligence (AI) systems of the past 50 years are either very limited, or based on heuristics, or both. The new millennium, however, has brought substantial progress in the field of theoretically optimal and practically feasible algorithms for prediction, search, inductive inference based on Occam's razor, problem solving, decision making, and reinforcement learning in environments of a very general type. Since inductive inference is at the heart of all inductive sciences, some of the results are relevant not only for AI and computer science but also for physics, provoking nontraditional predictions based on Zuse's thesis of the computer-generated universe.
The Econometrics of DSGE Models
2009
Cited by 13 (1 self)
Abstract:
In this paper, I review the literature on the formulation and estimation of dynamic stochastic general equilibrium (DSGE) models with a special emphasis on Bayesian methods. First, I discuss the evolution of DSGE models over the last couple of decades. Second, I explain why the profession has decided to estimate these models using Bayesian methods. Third, I briefly introduce some of the techniques required to compute and estimate these models. Fourth, I illustrate the techniques under consideration by estimating a benchmark DSGE model with real and nominal rigidities. I conclude by offering some pointers for future research.
Estimation of Discontinuous Displacement Vector Fields with the Minimum Description Length Criterion
1990
Cited by 9 (0 self)
Abstract:
A new non-iterative approach to determine displacement vector fields with discontinuities is described.