Results 1-10 of 212
A maximum entropy model of phonotactics and phonotactic learning
, 2006
Cited by 125 (15 self)
The study of phonotactics (e.g., the ability of English speakers to distinguish possible words like blick from impossible words like *bnick) is a central topic in phonology. We propose a theory of phonotactic grammars and a learning algorithm that constructs such grammars from positive evidence. Our grammars consist of constraints that are assigned numerical weights according to the principle of maximum entropy. Possible words are assessed by these grammars based on the weighted sum of their constraint violations. The learning algorithm yields grammars that can capture both categorical and gradient phonotactic patterns. The algorithm is not provided with any constraints in advance, but uses its own resources to form constraints and weight them. A baseline model, in which Universal Grammar is reduced to a feature set and an SPE-style constraint format, suffices to learn many phonotactic phenomena. In order to learn nonlocal phenomena such as stress and vowel harmony, it is necessary to augment the model with autosegmental tiers and metrical grids. Our results thus offer novel, learning-theoretic support for such representations. We apply the model to English syllable onsets, Shona vowel harmony, quantity-insensitive stress typology, and the full phonotactics of Wargamay, showing that the learned grammars capture the distributional generalizations of these languages and accurately predict the findings of a phonotactic experiment.
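As a rough illustration of the scoring idea in this abstract (a weighted sum of constraint violations turned into a maximum entropy probability), here is a minimal Python sketch. The constraint weights, the violation counts, and the three-word candidate set are hypothetical and chosen only for illustration; they are not taken from the paper, and normalizing over a small finite set is a simplification of normalizing over all possible words.

```python
import math

def weighted_violations(violations, weights):
    """Weighted sum of constraint violations for one candidate word."""
    return sum(w * v for w, v in zip(weights, violations))

def maxent_probabilities(candidates, weights):
    """Assign each candidate a probability proportional to exp(-score).

    `candidates` maps a word to its list of violation counts, one per
    constraint. Normalization is over this finite toy set only.
    """
    unnormalized = {
        word: math.exp(-weighted_violations(violations, weights))
        for word, violations in candidates.items()
    }
    z = sum(unnormalized.values())
    return {word: value / z for word, value in unnormalized.items()}

# Hypothetical constraints: [penalty for a "bn" onset, penalty for any onset cluster]
weights = [3.0, 0.5]
candidates = {
    "blick": [0, 1],  # onset cluster, but not "bn"
    "bnick": [1, 1],  # violates both hypothetical constraints
    "lick": [0, 0],   # violates neither
}
probs = maxent_probabilities(candidates, weights)
```

Under these assumed weights, a form with more heavily weighted violations receives a lower probability, which is how such a grammar can express gradient as well as near-categorical preferences.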
From Laplace to Supernova SN 1987A: Bayesian Inference in Astrophysics
, 1990
Cited by 61 (2 self)
The Bayesian approach to probability theory is presented as an alternative to the currently used long-run relative frequency approach, which does not offer clear, compelling criteria for the design of statistical methods. Bayesian probability theory offers unique and demonstrably optimal solutions to well-posed statistical problems, and is historically the original approach to statistics. The reasons for earlier rejection of Bayesian methods are discussed, and it is noted that the work of Cox, Jaynes, and others answers earlier objections, giving Bayesian inference a firm logical and mathematical foundation as the correct mathematical language for quantifying uncertainty. The Bayesian approaches to parameter estimation and model comparison are outlined and illustrated by application to a simple problem based on the Gaussian distribution. As further illustrations of the Bayesian paradigm, Bayesian solutions to two interesting astrophysical problems are outlined: the measurement of wea...
A Natural Law of Succession
, 1995
Cited by 39 (3 self)
Consider the following problem. You are given an alphabet of k distinct symbols and are told that the i-th symbol occurred exactly n_i times in the past. On the basis of this information alone, you must now estimate the conditional probability that the next symbol will be i. In this report, we present a new solution to this fundamental problem in statistics and demonstrate that our solution outperforms standard approaches, both in theory and in practice.
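The abstract does not state the new solution, so the sketch below does not attempt it. It shows only Laplace's classical rule of succession, (n_i + 1) / (n + k), as one example of the kind of standard estimator the problem statement invites; treating it as one of the "standard approaches" the authors compare against is an assumption, and the counts are made up for illustration.

```python
from fractions import Fraction

def laplace_estimate(counts, i):
    """Laplace's add-one estimate that the next symbol is symbol i.

    `counts[j]` is n_j, the number of past occurrences of symbol j,
    and k = len(counts) is the alphabet size.
    """
    n = sum(counts)
    k = len(counts)
    return Fraction(counts[i] + 1, n + k)

# Hypothetical counts for an alphabet of k = 3 symbols.
counts = [3, 0, 1]
estimates = [laplace_estimate(counts, i) for i in range(len(counts))]
```

Note that the unseen second symbol still receives nonzero probability, which is the property that distinguishes this family of estimators from raw relative frequencies.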
The Well-Posed Problem
 Foundations of Physics
, 1973
Cited by 32 (0 self)
distributions obtained from transformation groups, using as our main example the famous paradox of Bertrand. Bertrand's problem (Bertrand, 1889) was stated originally in terms of drawing a straight line "at random" intersecting a circle. It will be helpful to think of this in a more concrete way; presumably, we do no violence to the problem (i.e., it is still just as "random") if we suppose that we are tossing straws onto the circle, without specifying how they are tossed. We therefore formulate the problem as follows. A long straw is tossed at random onto a circle; given that it falls so that it intersects the circle, what is the probability that the chord thus defined is longer than a side of the inscribed equilateral triangle? Since Bertrand proposed it in 1889 this problem has been cited to generations of students to demonstrate that Laplace's "principle of indifference" contains logical inconsistencies. For, there appear to be many ways of defining "equally possibl
Bayesian Fundamentalism or Enlightenment? On the explanatory status and theoretical contributions of Bayesian models of cognition
 Behavioral and Brain Sciences
, 2011
Cited by 30 (1 self)
To be published in Behavioral and Brain Sciences (in press)
Set-Based Bayesianism
, 1992
Cited by 30 (0 self)
Problems for strict and convex Bayesianism are discussed. A set-based Bayesianism generalizing convex Bayesianism and intervalism is proposed. This approach abandons not only the strict Bayesian requirement of a unique real-valued probability function in any decision-making context but also the requirement of convexity for a set-based representation of uncertainty. Levi's E-admissibility decision criterion is retained and is shown to be applicable in the nonconvex case. Keywords: Uncertainty, decision-making, maximum entropy, Bayesian methods. 1. Introduction. The reigning philosophy of uncertainty representation is strict Bayesianism. One of its central principles is that an agent must adopt a single, real-valued probability function over the events recognized as relevant to a given problem. Prescriptions for defining such a function for a given agent in a given situation range from the extreme personalism of de Finetti (1964, 1974) and Savage (1972) to the objective Bayesianism of...
An Evolutionary Algorithm for Integer Programming
 Parallel Problem Solving from Nature - PPSN III, Lecture Notes in Computer Science
, 1994
Cited by 30 (4 self)
The mutation distribution of evolutionary algorithms usually is oriented at the type of the search space. Typical examples are binomial distributions for binary strings in genetic algorithms or normal distributions for real-valued vectors in evolution strategies and evolutionary programming. This paper is devoted to the construction of a mutation distribution for unbounded integer search spaces. The principle of maximum entropy is used to select a specific distribution from numerous potential candidates. The resulting evolutionary algorithm is tested for five nonlinear integer problems. 1 Introduction Evolutionary algorithms (EAs) represent a class of stochastic optimization algorithms in which principles of organic evolution are regarded as rules in optimization. They are often applied to real parameter optimization problems [2] when specialized techniques are not available or standard methods fail to give satisfactory answers due to multimodality, nondifferentiability or discontin...
Decision Making with Belief Functions: Compatibility and Incompatibility with the Sure-Thing Principle
 Journal of Risk and Uncertainty, 8:255-271 (1994)
, 1994
Cited by 26 (1 self)
This article studies situations in which information is ambiguous and only part of it can be probabilized. It is shown that the information can be modeled through belief functions if and only if the nonprobabilizable information is subject to the principles of complete ignorance. Next the representability of decisions by belief functions on outcomes is justified by means of a neutrality axiom. The natural weakening of Savage's sure-thing principle to unambiguous events is examined and its implications for decision making are identified.
Can the Maximum Entropy Principle Be Explained as a Consistency Requirement?
, 1997
Cited by 25 (1 self)
The principle of maximum entropy is a general method to assign values to probability distributions on the basis of partial information. This principle, introduced by Jaynes in 1957, forms an extension of the classical principle of insufficient reason. It has been further generalized, both in mathematical formulation and in intended scope, into the principle of maximum relative entropy or of minimum information. It has been claimed that these principles are singled out as unique methods of statistical inference that agree with certain compelling consistency requirements. This paper reviews these consistency arguments and the surrounding controversy. It is shown that the uniqueness proofs are flawed, or rest on unreasonably strong assumptions. A more general class of inference rules, maximizing the so-called Rényi entropies, is exhibited which also fulfill the reasonable part of the consistency assumptions. 1 Introduction In any application of probability theory to the pro...
Application of Bayesian inference to fMRI data analysis
 IEEE Transactions on Medical Imaging
, 1999
Cited by 22 (0 self)
The methods of Bayesian statistics are applied to the analysis of fMRI data. Three specific models are examined. The first is the familiar linear model with white Gaussian noise. In this section, the Jeffreys' Rule for noninformative prior distributions is stated and it is shown how the posterior distribution may be used to infer activation in individual pixels. Next, linear time-invariant (LTI) systems are introduced as an example of statistical models with nonlinear parameters. It is shown that the Bayesian approach can lead to quite complex bimodal distributions of the parameters when the specific case of a delta function response with a spatially varying delay is analyzed. Finally, a linear model with autoregressive noise is discussed as an alternative to that with uncorrelated white Gaussian noise. The analysis isolates those pixels that have significant temporal correlation under the model. It is shown that the number of pixels that have a significantly large autoregression parameter is dependent on the terms used to account for confounding effects. Index Terms: Autoregressive modeling, Bayesian statistics, functional MRI data analysis, linear time-invariant systems.