Results 1  10
of
250
A maximum entropy model of phonotactics and phonotactic learning
, 2006
"... The study of phonotactics (e.g., the ability of English speakers to distinguish possible words like blick from impossible words like *bnick) is a central topic in phonology. We propose a theory of phonotactic grammars and a learning algorithm that constructs such grammars from positive evidence. Our ..."
Abstract

Cited by 132 (15 self)
 Add to MetaCart
(Show Context)
The study of phonotactics (e.g., the ability of English speakers to distinguish possible words like blick from impossible words like *bnick) is a central topic in phonology. We propose a theory of phonotactic grammars and a learning algorithm that constructs such grammars from positive evidence. Our grammars consist of constraints that are assigned numerical weights according to the principle of maximum entropy. Possible words are assessed by these grammars based on the weighted sum of their constraint violations. The learning algorithm yields grammars that can capture both categorical and gradient phonotactic patterns. The algorithm is not provided with any constraints in advance, but uses its own resources to form constraints and weight them. A baseline model, in which Universal Grammar is reduced to a feature set and an SPEstyle constraint format, suffices to learn many phonotactic phenomena. In order to learn nonlocal phenomena such as stress and vowel harmony, it is necessary to augment the model with autosegmental tiers and metrical grids. Our results thus offer novel, learningtheoretic support for such representations. We apply the model to English syllable onsets, Shona vowel harmony, quantityinsensitive stress typology, and the full phonotactics of Wargamay, showing that the learned grammars capture the distributional generalizations of these languages and accurately predict the findings of a phonotactic experiment.
From Laplace To Supernova Sn 1987a: Bayesian Inference In Astrophysics
, 1990
"... . The Bayesian approach to probability theory is presented as an alternative to the currently used longrun relative frequency approach, which does not offer clear, compelling criteria for the design of statistical methods. Bayesian probability theory offers unique and demonstrably optimal solutions ..."
Abstract

Cited by 68 (2 self)
 Add to MetaCart
. The Bayesian approach to probability theory is presented as an alternative to the currently used longrun relative frequency approach, which does not offer clear, compelling criteria for the design of statistical methods. Bayesian probability theory offers unique and demonstrably optimal solutions to wellposed statistical problems, and is historically the original approach to statistics. The reasons for earlier rejection of Bayesian methods are discussed, and it is noted that the work of Cox, Jaynes, and others answers earlier objections, giving Bayesian inference a firm logical and mathematical foundation as the correct mathematical language for quantifying uncertainty. The Bayesian approaches to parameter estimation and model comparison are outlined and illustrated by application to a simple problem based on the gaussian distribution. As further illustrations of the Bayesian paradigm, Bayesian solutions to two interesting astrophysical problems are outlined: the measurement of wea...
Bayesian Fundamentalism or Enlightenment? On the explanatory status and theoretical contributions of Bayesian models of cognition
 Behavioral and Brain Sciences
, 2011
"... To be published in Behavioral and Brain Sciences (in press) ..."
Abstract

Cited by 41 (1 self)
 Add to MetaCart
(Show Context)
To be published in Behavioral and Brain Sciences (in press)
A Natural Law of Succession
, 1995
"... Consider the following problem. You are given an alphabet of k distinct symbols and are told that the i th symbol occurred exactly ni times in the past. On the basis of this information alone, you must now estimate the conditional probability that the next symbol will be i. In this report, we presen ..."
Abstract

Cited by 41 (3 self)
 Add to MetaCart
Consider the following problem. You are given an alphabet of k distinct symbols and are told that the i th symbol occurred exactly ni times in the past. On the basis of this information alone, you must now estimate the conditional probability that the next symbol will be i. In this report, we present a new solution to this fundamental problem in statistics and demonstrate that our solution outperforms standard approaches, both in theory and in practice.
The WellPosed Problem
 Foundations of Physics
, 1973
"... distributions obtained from transformation groups, using as our main example the famous paradox of Bertrand. Bertrand's problem (Bertrand, 1889) was stated originally in terms of drawing a straight line "at random" intersecting a circle. It will be helpful to think of this in a more ..."
Abstract

Cited by 37 (0 self)
 Add to MetaCart
(Show Context)
distributions obtained from transformation groups, using as our main example the famous paradox of Bertrand. Bertrand's problem (Bertrand, 1889) was stated originally in terms of drawing a straight line "at random" intersecting a circle. It will be helpful to think of this in a more concrete way; presumably, we do no violence to the problem (i.e., it is still just as "random") if we suppose that we are tossing straws onto the circle, without specifying how they are tossed. We therefore formulate the problem as follows. A long straw is tossed at random onto a circle; given that it falls so that it intersects the circle, what is the probability that the chord thus defined is longer than a side of the inscribed equilateral triangle? Since Bertrand proposed it in 1889 this problem has been cited to generations of students to demonstrate that Laplace's "principle of indifference" contains logical inconsistencies. For, there appear to be many ways of defining "equally possibl
The formal definition of reference priors
 ANN. STATIST
, 2009
"... Reference analysis produces objective Bayesian inference, in the sense that inferential statements depend only on the assumed model and the available data, and the prior distribution used to make an inference is least informative in a certain informationtheoretic sense. Reference priors have been r ..."
Abstract

Cited by 37 (2 self)
 Add to MetaCart
(Show Context)
Reference analysis produces objective Bayesian inference, in the sense that inferential statements depend only on the assumed model and the available data, and the prior distribution used to make an inference is least informative in a certain informationtheoretic sense. Reference priors have been rigorously defined in specific contexts and heuristically defined in general, but a rigorous general definition has been lacking. We produce a rigorous general definition here and then show how an explicit expression for the reference prior can be obtained under very weak regularity conditions. The explicit expression can be used to derive new reference priors both analytically and numerically.
Can the Maximum Entropy Principle Be Explained as a Consistency Requirement?
, 1997
"... The principle of maximumentropy is a general method to assign values to probability distributions on the basis of partial information. This principle, introduced by Jaynes in 1957, forms an extension of the classical principle of insufficient reason. It has been further generalized, both in mathe ..."
Abstract

Cited by 33 (1 self)
 Add to MetaCart
The principle of maximumentropy is a general method to assign values to probability distributions on the basis of partial information. This principle, introduced by Jaynes in 1957, forms an extension of the classical principle of insufficient reason. It has been further generalized, both in mathematical formulation and in intended scope, into the principle of maximum relative entropy or of minimum information. It has been claimed that these principles are singled out as unique methods of statistical inference that agree with certain compelling consistency requirements. This paper reviews these consistency arguments and the surrounding controversy. It is shown that the uniqueness proofs are flawed, or rest on unreasonably strong assumptions. A more general class of 1 inference rules, maximizing the socalled R'enyi entropies, is exhibited which also fulfill the reasonable part of the consistency assumptions. 1 Introduction In any application of probability theory to the pro...
An Evolutionary Algorithm for Integer Programming
 Parallel Problem Solving from Nature  PPSN III, Lecture Notes in Computer Science
, 1994
"... . The mutation distribution of evolutionary algorithms usually is oriented at the type of the search space. Typical examples are binomial distributions for binary strings in genetic algorithms or normal distributions for real valued vectors in evolution strategies and evolutionary programming. This ..."
Abstract

Cited by 31 (4 self)
 Add to MetaCart
(Show Context)
. The mutation distribution of evolutionary algorithms usually is oriented at the type of the search space. Typical examples are binomial distributions for binary strings in genetic algorithms or normal distributions for real valued vectors in evolution strategies and evolutionary programming. This paper is devoted to the construction of a mutation distribution for unbounded integer search spaces. The principle of maximum entropy is used to select a specific distribution from numerous potential candidates. The resulting evolutionary algorithm is tested for five nonlinear integer problems. 1 Introduction Evolutionary algorithms (EAs) represent a class of stochastic optimization algorithms in which principles of organic evolution are regarded as rules in optimization. They are often applied to real parameter optimization problems [2] when specialized techniques are not available or standard methods fail to give satisfactory answers due to multimodality, nondifferentiability or discontin...
Lattice duality: The origin of probability and entropy
 In press: Neurocomputing
, 2005
"... Bayesian probability theory is an inference calculus, which originates from a generalization of inclusion on the Boolean lattice of logical assertions to a degree of inclusion represented by a real number. Dual to this lattice is the distributive lattice of questions constructed from the ordered set ..."
Abstract

Cited by 31 (10 self)
 Add to MetaCart
(Show Context)
Bayesian probability theory is an inference calculus, which originates from a generalization of inclusion on the Boolean lattice of logical assertions to a degree of inclusion represented by a real number. Dual to this lattice is the distributive lattice of questions constructed from the ordered set of downsets of assertions, which forms the foundation of the calculus of inquiry—a generalization of information theory. In this paper we introduce this novel perspective on these spaces in which machine learning is performed and discuss the relationship between these results and several proposed generalizations of information theory in the literature.