Results 1-10 of 127
A maximum entropy model of phonotactics and phonotactic learning
, 2006
Cited by 76 (13 self)
The study of phonotactics (e.g., the ability of English speakers to distinguish possible words like blick from impossible words like *bnick) is a central topic in phonology. We propose a theory of phonotactic grammars and a learning algorithm that constructs such grammars from positive evidence. Our grammars consist of constraints that are assigned numerical weights according to the principle of maximum entropy. Possible words are assessed by these grammars based on the weighted sum of their constraint violations. The learning algorithm yields grammars that can capture both categorical and gradient phonotactic patterns. The algorithm is not provided with any constraints in advance, but uses its own resources to form constraints and weight them. A baseline model, in which Universal Grammar is reduced to a feature set and an SPE-style constraint format, suffices to learn many phonotactic phenomena. In order to learn nonlocal phenomena such as stress and vowel harmony, it is necessary to augment the model with autosegmental tiers and metrical grids. Our results thus offer novel, learning-theoretic support for such representations. We apply the model to English syllable onsets, Shona vowel harmony, quantity-insensitive stress typology, and the full phonotactics of Wargamay, showing that the learned grammars capture the distributional generalizations of these languages and accurately predict the findings of a phonotactic experiment.
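The scoring rule described in this abstract (a weighted sum of constraint violations, converted to a probability under maximum entropy) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the two constraints, their weights, and the three-form candidate set are hypothetical, and the normalization runs over this toy set rather than over all possible strings.

```python
import math

def maxent_probabilities(violations, weights):
    """Map each form to exp(-score) / Z, where score is the weighted sum
    of its constraint violations and Z normalizes over the forms given."""
    scores = {
        form: sum(w * v for w, v in zip(weights, counts))
        for form, counts in violations.items()
    }
    unnormalized = {form: math.exp(-s) for form, s in scores.items()}
    z = sum(unnormalized.values())
    return {form: u / z for form, u in unnormalized.items()}

# Hypothetical constraints, in order: [*#bn, *#bl], with hypothetical weights.
weights = [4.0, 0.5]
violations = {
    "blick": [0, 1],  # violates the lightly weighted constraint once
    "bnick": [1, 0],  # violates the heavily weighted constraint once
    "brick": [0, 0],  # violates neither
}
probs = maxent_probabilities(violations, weights)
```

A higher weighted violation score gives a lower probability, so in this toy setup "bnick" ends up less probable than "blick", which in turn is less probable than "brick". Because the output is a probability rather than a yes/no verdict, the same rule can express both categorical and gradient patterns.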
From Laplace to Supernova SN 1987A: Bayesian Inference in Astrophysics
, 1990
Cited by 51 (2 self)
The Bayesian approach to probability theory is presented as an alternative to the currently used long-run relative frequency approach, which does not offer clear, compelling criteria for the design of statistical methods. Bayesian probability theory offers unique and demonstrably optimal solutions to well-posed statistical problems, and is historically the original approach to statistics. The reasons for earlier rejection of Bayesian methods are discussed, and it is noted that the work of Cox, Jaynes, and others answers earlier objections, giving Bayesian inference a firm logical and mathematical foundation as the correct mathematical language for quantifying uncertainty. The Bayesian approaches to parameter estimation and model comparison are outlined and illustrated by application to a simple problem based on the Gaussian distribution. As further illustrations of the Bayesian paradigm, Bayesian solutions to two interesting astrophysical problems are outlined: the measurement of wea...
A Natural Law of Succession
, 1995
Cited by 35 (3 self)
Consider the following problem. You are given an alphabet of k distinct symbols and are told that the i-th symbol occurred exactly n_i times in the past. On the basis of this information alone, you must now estimate the conditional probability that the next symbol will be i. In this report, we present a new solution to this fundamental problem in statistics and demonstrate that our solution outperforms standard approaches, both in theory and in practice.
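For orientation, the estimation problem stated above can be illustrated with the classical add-one rule usually attributed to Laplace, which is presumably among the standard approaches the report compares against. The sketch below is that baseline only, not the new solution the report proposes; the function name and the sample counts are hypothetical.

```python
from fractions import Fraction

def laplace_estimate(counts):
    """Classical add-one estimate: P(next = i) = (n_i + 1) / (n + k),
    where n is the total count and k is the alphabet size."""
    k = len(counts)
    n = sum(counts)
    return [Fraction(c + 1, n + k) for c in counts]

# Hypothetical counts for an alphabet of k = 3 symbols.
estimates = laplace_estimate([3, 1, 0])
```

The add-one rule gives every symbol a nonzero probability, including one never observed, and the estimates still sum to one.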
An Evolutionary Algorithm for Integer Programming
 Parallel Problem Solving from Nature - PPSN III, Lecture Notes in Computer Science
, 1994
Cited by 26 (4 self)
The mutation distribution of evolutionary algorithms usually is oriented at the type of the search space. Typical examples are binomial distributions for binary strings in genetic algorithms or normal distributions for real-valued vectors in evolution strategies and evolutionary programming. This paper is devoted to the construction of a mutation distribution for unbounded integer search spaces. The principle of maximum entropy is used to select a specific distribution from numerous potential candidates. The resulting evolutionary algorithm is tested for five nonlinear integer problems. 1 Introduction Evolutionary algorithms (EAs) represent a class of stochastic optimization algorithms in which principles of organic evolution are regarded as rules in optimization. They are often applied to real parameter optimization problems [2] when specialized techniques are not available or standard methods fail to give satisfactory answers due to multimodality, nondifferentiability or discontin...
Set-Based Bayesianism
, 1992
Cited by 26 (1 self)
Problems for strict and convex Bayesianism are discussed. A set-based Bayesianism generalizing convex Bayesianism and intervalism is proposed. This approach abandons not only the strict Bayesian requirement of a unique real-valued probability function in any decision-making context but also the requirement of convexity for a set-based representation of uncertainty. Levi's E-admissibility decision criterion is retained and is shown to be applicable in the nonconvex case. Keywords: Uncertainty, decision-making, maximum entropy, Bayesian methods. 1. Introduction. The reigning philosophy of uncertainty representation is strict Bayesianism. One of its central principles is that an agent must adopt a single, real-valued probability function over the events recognized as relevant to a given problem. Prescriptions for defining such a function for a given agent in a given situation range from the extreme personalism of de Finetti (1964, 1974) and Savage (1972) to the objective Bayesianism of...
The Well-Posed Problem
 Foundations of Physics
, 1973
Cited by 25 (0 self)
distributions obtained from transformation groups, using as our main example the famous paradox of Bertrand. Bertrand's problem (Bertrand, 1889) was stated originally in terms of drawing a straight line "at random" intersecting a circle. It will be helpful to think of this in a more concrete way; presumably, we do no violence to the problem (i.e., it is still just as "random") if we suppose that we are tossing straws onto the circle, without specifying how they are tossed. We therefore formulate the problem as follows. A long straw is tossed at random onto a circle; given that it falls so that it intersects the circle, what is the probability that the chord thus defined is longer than a side of the inscribed equilateral triangle? Since Bertrand proposed it in 1889 this problem has been cited to generations of students to demonstrate that Laplace's "principle of indifference" contains logical inconsistencies. For, there appear to be many ways of defining "equally possibl
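The ambiguity at the heart of Bertrand's problem can be made concrete with a small Monte Carlo sketch. It assumes a unit circle and three textbook ways of drawing a chord "at random"; these are the classic illustrations of the paradox, not the straw-tossing analysis the paper itself develops, and all function names are hypothetical.

```python
import math
import random

SIDE = math.sqrt(3.0)  # side of the equilateral triangle inscribed in a unit circle

def chord_random_endpoints(rng):
    """Chord between two points chosen uniformly on the circumference."""
    a = rng.uniform(0.0, 2.0 * math.pi)
    b = rng.uniform(0.0, 2.0 * math.pi)
    return 2.0 * abs(math.sin((a - b) / 2.0))

def chord_random_radius(rng):
    """Chord whose distance from the center is uniform on [0, 1]."""
    d = rng.uniform(0.0, 1.0)
    return 2.0 * math.sqrt(1.0 - d * d)

def chord_random_midpoint(rng):
    """Chord whose midpoint is uniform over the disk's area,
    so the squared distance from the center is uniform on [0, 1]."""
    d_squared = rng.uniform(0.0, 1.0)
    return 2.0 * math.sqrt(1.0 - d_squared)

def estimate(sampler, trials=200_000, seed=0):
    """Fraction of sampled chords longer than the triangle side."""
    rng = random.Random(seed)
    hits = sum(sampler(rng) > SIDE for _ in range(trials))
    return hits / trials

p_endpoints = estimate(chord_random_endpoints)
p_radius = estimate(chord_random_radius)
p_midpoint = estimate(chord_random_midpoint)
```

With enough trials the three estimates approach 1/3, 1/2, and 1/4 respectively, which is why the problem is treated as ill-posed until "at random" is pinned down.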
Decision Making with Belief Functions: Compatibility and Incompatibility with the Sure-Thing Principle
 JOURNAL OF RISK AND UNCERTAINTY, 8:255-271 (1994)
, 1994
Cited by 22 (1 self)
This article studies situations in which information is ambiguous and only part of it can be probabilized. It is shown that the information can be modeled through belief functions if and only if the nonprobabilizable information is subject to the principles of complete ignorance. Next the representability of decisions by belief functions on outcomes is justified by means of a neutrality axiom. The natural weakening of Savage's sure-thing principle to unambiguous events is examined and its implications for decision making are identified.
Representation Dependence in Probabilistic Inference
 JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH
, 2004
Cited by 20 (1 self)
Nondeductive reasoning systems are often representation dependent: representing the same situation in two different ways may cause such a system to return two different answers. Some have viewed
Lattice duality: The origin of probability and entropy
 In press: Neurocomputing
, 2005
Cited by 19 (6 self)
Bayesian probability theory is an inference calculus, which originates from a generalization of inclusion on the Boolean lattice of logical assertions to a degree of inclusion represented by a real number. Dual to this lattice is the distributive lattice of questions constructed from the ordered set of downsets of assertions, which forms the foundation of the calculus of inquiry—a generalization of information theory. In this paper we introduce this novel perspective on these spaces in which machine learning is performed and discuss the relationship between these results and several proposed generalizations of information theory in the literature.
The Latent Maximum Entropy Principle
 In Proc. of ISIT
, 2002
Cited by 17 (3 self)
We present an extension to Jaynes' maximum entropy principle that handles latent variables. The principle of latent maximum entropy we propose is different from both Jaynes' maximum entropy principle and maximum likelihood estimation, but often yields better estimates in the presence of hidden variables and limited training data. We first show that solving for a latent maximum entropy model poses a hard nonlinear constrained optimization problem in general. However, we then show that feasible solutions to this problem can be obtained efficiently for the special case of log-linear models, which forms the basis for an efficient approximation to the latent maximum entropy principle. We derive an algorithm that combines expectation-maximization with iterative scaling to produce feasible log-linear solutions. This algorithm can be interpreted as an alternating minimization algorithm in the information divergence, and reveals an intimate connection between the latent maximum entropy and maximum likelihood principles.