From Laplace To Supernova SN 1987A: Bayesian Inference In Astrophysics
, 1990
Cited by 42 (2 self)
The Bayesian approach to probability theory is presented as an alternative to the currently used long-run relative frequency approach, which does not offer clear, compelling criteria for the design of statistical methods. Bayesian probability theory offers unique and demonstrably optimal solutions to well-posed statistical problems, and is historically the original approach to statistics. The reasons for earlier rejection of Bayesian methods are discussed, and it is noted that the work of Cox, Jaynes, and others answers earlier objections, giving Bayesian inference a firm logical and mathematical foundation as the correct mathematical language for quantifying uncertainty. The Bayesian approaches to parameter estimation and model comparison are outlined and illustrated by application to a simple problem based on the Gaussian distribution. As further illustrations of the Bayesian paradigm, Bayesian solutions to two interesting astrophysical problems are outlined: the measurement of wea...
A maximum entropy model of phonotactics and phonotactic learning
, 2006
Cited by 35 (5 self)
The study of phonotactics (e.g., the ability of English speakers to distinguish possible words like blick from impossible words like *bnick) is a central topic in phonology. We propose a theory of phonotactic grammars and a learning algorithm that constructs such grammars from positive evidence. Our grammars consist of constraints that are assigned numerical weights according to the principle of maximum entropy. Possible words are assessed by these grammars based on the weighted sum of their constraint violations. The learning algorithm yields grammars that can capture both categorical and gradient phonotactic patterns. The algorithm is not provided with any constraints in advance, but uses its own resources to form constraints and weight them. A baseline model, in which Universal Grammar is reduced to a feature set and an SPE-style constraint format, suffices to learn many phonotactic phenomena. In order to learn nonlocal phenomena such as stress and vowel harmony, it is necessary to augment the model with autosegmental tiers and metrical grids. Our results thus offer novel, learning-theoretic support for such representations. We apply the model to English syllable onsets, Shona vowel harmony, quantity-insensitive stress typology, and the full phonotactics of Wargamay, showing that the learned grammars capture the distributional generalizations of these languages and accurately predict the findings of a phonotactic experiment.
A Natural Law of Succession
, 1995
Cited by 33 (4 self)
We present a new solution to multinomial estimation and demonstrate that our solution outperforms standard solutions both in theory and in practice. The novelty of our approach lies in our use of combinatorial priors on strings. I. Natural Strings An alphabet represents the set of logically possible events. In this world, all strings are finite and most are very short. For this basic reason, natural strings do not include all the symbols in the alphabet. This claim is tautological for short strings, but it is also true for long strings. To model this phenomenon, we propose a uniform prior on the cardinalities of all nonempty subsets of the alphabet. Such a prior on an alphabet of size k entails the probability $p_N(x^n \mid n) = \left[ \min(k, n) \binom{k}{q} \binom{n-1}{q-1} \binom{n}{\{n_i\}} \right]^{-1}$ for strings $x^n$ of length n with cardinality q. This probability is not Kolmogorov compatible. To obtain a conditional probability, we must use $p(i \mid x^n, n+1)$ instead of the more o...
An Evolutionary Algorithm for Integer Programming
- Parallel Problem Solving from Nature - PPSN III, Lecture Notes in Computer Science
, 1994
Cited by 25 (4 self)
The mutation distribution of evolutionary algorithms usually is oriented at the type of the search space. Typical examples are binomial distributions for binary strings in genetic algorithms or normal distributions for real valued vectors in evolution strategies and evolutionary programming. This paper is devoted to the construction of a mutation distribution for unbounded integer search spaces. The principle of maximum entropy is used to select a specific distribution from numerous potential candidates. The resulting evolutionary algorithm is tested for five nonlinear integer problems. 1 Introduction Evolutionary algorithms (EAs) represent a class of stochastic optimization algorithms in which principles of organic evolution are regarded as rules in optimization. They are often applied to real parameter optimization problems [2] when specialized techniques are not available or standard methods fail to give satisfactory answers due to multimodality, nondifferentiability or discontin...
The Well-Posed Problem
- Foundations of Physics
, 1973
Cited by 21 (0 self)
distributions obtained from transformation groups, using as our main example the famous paradox of Bertrand. Bertrand's problem (Bertrand, 1889) was stated originally in terms of drawing a straight line "at random" intersecting a circle. It will be helpful to think of this in a more concrete way; presumably, we do no violence to the problem (i.e., it is still just as "random") if we suppose that we are tossing straws onto the circle, without specifying how they are tossed. We therefore formulate the problem as follows. A long straw is tossed at random onto a circle; given that it falls so that it intersects the circle, what is the probability that the chord thus defined is longer than a side of the inscribed equilateral triangle? Since Bertrand proposed it in 1889 this problem has been cited to generations of students to demonstrate that Laplace's "principle of indifference" contains logical inconsistencies. For, there appear to be many ways of defining "equally possibl
Representation Dependence in Probabilistic Inference
- JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH
, 2004
Cited by 18 (1 self)
Non-deductive reasoning systems are often representation dependent: representing the same situation in two different ways may cause such a system to return two different answers. Some have viewed
Set-Based Bayesianism
- IEEE Transactions on Systems, Man, and Cybernetics
, 1992
Cited by 16 (0 self)
Problems for strict and convex Bayesianism are discussed. A set-based Bayesianism generalizing convex Bayesianism and intervalism is proposed. This approach abandons not only the strict Bayesian requirement of a unique real-valued probability function in any decision-making context but also the requirement of convexity for a set-based representation of uncertainty. Levi's E-admissibility decision criterion is retained and is shown to be applicable in the non-convex case. Keywords: Uncertainty, decision-making, maximum entropy, Bayesian methods. 1. Introduction. The reigning philosophy of uncertainty representation is strict Bayesianism. One of its central principles is that an agent must adopt a single, real-valued probability function over the events recognized as relevant to a given problem. Prescriptions for defining such a function for a given agent in a given situation range from the extreme personalism of de Finetti (1964, 1974) and Savage (1972) to the objective Bayesianism of...
The Latent Maximum Entropy Principle
- In Proc. of ISIT
, 2002
Cited by 14 (2 self)
We present an extension to Jaynes' maximum entropy principle that handles latent variables. The principle of latent maximum entropy we propose is different from both Jaynes' maximum entropy principle and maximum likelihood estimation, but often yields better estimates in the presence of hidden variables and limited training data. We first show that solving for a latent maximum entropy model poses a hard nonlinear constrained optimization problem in general. However, we then show that feasible solutions to this problem can be obtained efficiently for the special case of log-linear models, which forms the basis for an efficient approximation to the latent maximum entropy principle. We derive an algorithm that combines expectation-maximization with iterative scaling to produce feasible log-linear solutions. This algorithm can be interpreted as an alternating minimization algorithm in the information divergence, and reveals an intimate connection between the latent maximum entropy and maximum likelihood principles.
On Generalized Entropies and Scale-Space
, 1997
Cited by 13 (3 self)
this paper we show that the generalized entropies are such functionals. It should be noted that this behavior is not seen for the number of critical points: although critical points most often disappear when scale is increased, creation of critical points with increasing scale is a generic event [16, 14, 7]. Secondly, generalized entropy is the basis for the theory of multifractals [11, 18], and it is known that there are very strong algebraic similarities to the fundamental equations of Statistical Mechanics. These are thus well-known functions, and while images are not physical systems in the classical thermodynamic sense, Linear Scale-Space is governed by the Linear Heat Diffusion Equation, and one could thus without great difficulty extend the view of images to be a classical thermodynamic system for which Linear Heat Diffusion is valid. Such a system is an ideal gas. These interpretations of images will be discussed in detail in this chapter. Finally, as will be demonstrated, the generalized entropies offer practical, mathematically well-founded functions to study scaling behaviors of images for scale selection and texture analysis. Related to this work is Vehel et al. [29], where images are studied in the multifractal setting, focusing on certain dimensions; Brink & Pendock [6] and Brink [5] have used the entropy and the closely related Kullback measure to do local thresholding of images. This article is organized as follows. First, Section 2 gives a brief introduction to Linear Scale-Space and linear entropy. Then, in Section 3, we discuss the generalized entropies, how they differ from linear entropy, and what their properties are in Scale-Space. Following this, in Section 4 we will discuss a physical interpretation of images both from the...
Can the Maximum Entropy Principle Be Explained as a Consistency Requirement?
, 1997
Cited by 13 (1 self)
The principle of maximum entropy is a general method to assign values to probability distributions on the basis of partial information. This principle, introduced by Jaynes in 1957, forms an extension of the classical principle of insufficient reason. It has been further generalized, both in mathematical formulation and in intended scope, into the principle of maximum relative entropy or of minimum information. It has been claimed that these principles are singled out as unique methods of statistical inference that agree with certain compelling consistency requirements. This paper reviews these consistency arguments and the surrounding controversy. It is shown that the uniqueness proofs are flawed, or rest on unreasonably strong assumptions. A more general class of inference rules, maximizing the so-called Rényi entropies, is exhibited which also fulfill the reasonable part of the consistency assumptions. 1 Introduction In any application of probability theory to the pro...

