Results 1–10 of 53
Prior Probabilities
 IEEE Transactions on Systems Science and Cybernetics
, 1968
Abstract

Cited by 251 (4 self)
… the case of location and scale parameters, rate constants, and in Bernoulli trials with unknown probability of success. In realistic problems, both the transformation group analysis and the principle of maximum entropy are needed to determine the prior. The distributions thus found are uniquely determined by the prior information, independently of the choice of parameters. In a certain class of problems, therefore, the prior distributions may now be claimed to be fully as "objective" as the sampling distributions. I. Background of the problem. Since the time of Laplace, applications of probability theory have been hampered by difficulties in the treatment of prior information. In realistic problems of decision or inference, we often have prior information which is highly relevant to the question being asked; to fail to take it into account is to commit the most obvious inconsistency of reasoning and may lead to absurd or dangerously misleading results. As an extreme examp…
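As a small illustration of the maximum-entropy side of the argument (my sketch, not taken from the paper; the die example and target mean are invented): among distributions on a finite set with a prescribed mean, the entropy maximizer is exponential in the constrained quantity, and the rate can be found numerically.

```python
import numpy as np
from scipy.optimize import brentq

# Maximum entropy with a mean constraint (illustrative sketch):
# among distributions p over x with a fixed mean, the entropy maximizer
# is p(x) ∝ exp(-lam * x); solve for the lam matching the target mean.
x = np.arange(1, 7)            # faces of a die, a classic Jaynes-style example
target_mean = 4.5

def mean_given_lam(lam):
    w = np.exp(-lam * x)
    return (w @ x) / w.sum()

lam = brentq(lambda l: mean_given_lam(l) - target_mean, -5.0, 5.0)
p = np.exp(-lam * x)
p /= p.sum()
print(p, float(p @ x))
```

Because the target mean exceeds the uniform mean of 3.5, the solved rate is negative and the distribution tilts toward high faces, as the exponential form requires.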
A bayesian approach for blind separation of sparse sources
 IEEE Transactions on Speech and Audio Processing
, 2005
Abstract

Cited by 67 (10 self)
We present a Bayesian approach for blind separation of linear instantaneous mixtures of sources having a sparse representation in a given basis. The distributions of the coefficients of the sources in the basis are modeled by a Student t distribution, which can be expressed as a scale mixture of Gaussians, and a Gibbs sampler is derived to estimate the sources, the mixing matrix, the input noise variance, and the hyperparameters of the Student t distributions. The method allows for separation of underdetermined (more sources than sensors) noisy mixtures. Results are presented with audio signals using a Modified Discrete Cosine Transform basis and compared with a finite-mixture-of-Gaussians prior approach. These results show the improved sound quality obtained with the Student t prior and the better robustness of the Markov chain Monte Carlo approach to mixing matrices close to singularity.
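The scale-mixture identity the abstract relies on is a standard result (this sketch is mine, not the authors' code): drawing a Gamma-distributed precision and then a Gaussian with that precision yields a Student t marginal.

```python
import numpy as np
from scipy import stats

# Student t as a scale mixture of Gaussians:
# if lam ~ Gamma(shape=nu/2, rate=nu/2) and x | lam ~ Normal(0, 1/lam),
# then marginally x ~ Student-t with nu degrees of freedom.
rng = np.random.default_rng(0)
nu, n = 3.0, 200_000
lam = rng.gamma(shape=nu / 2, scale=2.0 / nu, size=n)  # numpy scale = 1/rate
x = rng.normal(0.0, 1.0, size=n) / np.sqrt(lam)

# The Kolmogorov-Smirnov distance to the exact t(nu) CDF should be tiny.
ks = stats.kstest(x, stats.t(df=nu).cdf)
print(ks.statistic)
```

This identity is what makes the Gibbs sampler tractable: conditioned on the latent precisions, the model is Gaussian.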
Syntactic Measures of Complexity
, 1999
Abstract

Cited by 40 (2 self)
Declaration (p. 15)
Notes of copyright and the ownership of intellectual property rights (p. 15)
The Author (p. 16)
Acknowledgements (p. 16)
1 Introduction (p. 17)
1.1 Background (p. 17)
1.2 The Style of Approach (p. 18)
1.3 Motivation (p. 19)
1.4 Style of Presentation (p. 20)
1.5 Outline of the Thesis (p. 21)
2 Models and Modelling (p. 23)
2.1 Some Types of Models (p. 25)
2.2 Combinations of Models (p. 28)
2.3 Parts of the Modelling Apparatus (p. 33)
2.4 Models in Machine Learning (p. 38)
2.5 The Philosophical Background to the Rest of this Thesis (p. 41)
3 Problems and Properties (p. 44)
3.1 Examples of Common Usage (p. 44)
3.1.1 A case of nails (p. 44)
3.1.2 Writing a thesis (p. 44)
3.1.3 Mathematics (p. 44)
3.1.4 A gas (p. 44)
3.1.5 An ant hill (p. 45)
3.1.6 A car engine (p. 45)
3.1.7 A cell as part of an organism …
When did Bayesian inference become “Bayesian”?
 BAYESIAN ANALYSIS
, 2006
Abstract

Cited by 26 (1 self)
While Bayes’ theorem has a 250-year history, and the method of inverse probability that flowed from it dominated statistical thinking into the twentieth century, the adjective “Bayesian” was not part of the statistical lexicon until relatively recently. This paper provides an overview of key Bayesian developments, beginning with Bayes’ posthumously published 1763 paper and continuing up through approximately 1970, including the period when “Bayesian” emerged as the label of choice for those who advocated Bayesian methods.
Robust Bayesianism: Relation to evidence theory
 J. Advances in Information Fusion
Abstract

Cited by 7 (1 self)
We are interested in understanding the relationship between Bayesian inference and evidence theory. The concept of a set of probability distributions is central both in robust Bayesian analysis and in some versions of Dempster-Shafer evidence theory. We interpret imprecise probabilities as imprecise posteriors obtainable from imprecise likelihoods and priors, both of which are convex sets that can be considered as evidence and represented with, e.g., DS-structures. In Bayesian analysis, likelihoods and priors are combined with Laplace’s parallel composition. The natural and simple robust combination operator makes all pairwise combinations of elements from the two sets representing prior and likelihood. Our proposed combination operator is unique, and it has interesting normative and factual properties. We compare its behavior with other proposed fusion rules and with earlier efforts to reconcile Bayesian analysis and evidence theory. The behavior of the robust rule is consistent with the behavior of Fixsen/Mahler’s modified Dempster’s (MDS) rule, but not with Dempster’s rule. The Bayesian framework is liberal in allowing all significant uncertainty concepts to be modeled and taken care of, and is therefore a viable, but probably not the only, unifying structure that can be economically taught and in which alternative solutions can be modeled, compared, and explained. Manuscript received April 20, 2006; released for publication April
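The robust combination idea the abstract describes can be sketched in a few lines (my illustration; the hypothesis space, candidate priors, and likelihoods are made up): represent the imprecise prior and imprecise likelihood as finite sets of candidate vectors, and keep every pairwise Bayes/Laplace combination as the imprecise posterior.

```python
import numpy as np
from itertools import product

# Laplace's parallel composition: elementwise product, renormalized.
def combine(prior, likelihood):
    post = prior * likelihood
    return post / post.sum()

# Two candidate priors and two candidate likelihoods over two hypotheses.
priors = [np.array([0.5, 0.5]), np.array([0.7, 0.3])]
likelihoods = [np.array([0.9, 0.2]), np.array([0.8, 0.4])]

# The robust rule: all pairwise combinations form the posterior set.
posterior_set = [combine(p, l) for p, l in product(priors, likelihoods)]
print([q.round(3).tolist() for q in posterior_set])
```

The output set of four posteriors is the robust-Bayesian counterpart of a single posterior; its spread reflects the imprecision carried in from both inputs.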
On the principle of maximum entropy and the earthquake frequency-magnitude relation
 Geophys. J. Astr. Soc.
, 1983
Abstract

Cited by 6 (0 self)
Summary. The entropy S for a continuous distribution p(x) is defined by S = −∫ p(x) ln[p(x)/m(x)] dx, where m(x) is the ‘prior distribution’ representing our ‘complete ignorance’ about x. The difficulty in using this definition arises from the problem of ‘arbitrariness’ or ‘subjectiveness’ in assigning the prior distribution m(x). Thus in maximum entropy inference, it is customary to arbitrarily adopt a uniform prior distribution and write the entropy as S = −∫ p(x) ln p(x) dx. This expression, however, is a measure of uncertainty relative to the coordinate x, so that the probability distribution p(x) generated from the principle of maximum entropy depends on the choice of x. Only when the chosen parameter actually has a uniform prior distribution can we expect the generated distribution to conform with the empirical data. For a physical system in which the independent variable x is measured to only limited accuracy, the prior distribution m(x) can be shown to be inversely proportional to the measurement error of x. A parameter with uniform prior distribution, then, is one that can be measured with equal accuracy throughout its range. In this context, the magnitude of an earthquake is such a parameter, because using this parameter in the principle of maximum entropy leads to the empirically determined Gutenberg-Richter frequency-magnitude relation. Other proposed frequency-magnitude relations can also be generated from the principle of maximum entropy by imposing appropriate constraints. However, it is emphasized that such relations are generated from the principle as null hypotheses, to be tested by the empirical regional or global seismicity data.
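The Gutenberg-Richter connection can be made concrete in a short simulation (my sketch; the cutoff magnitude and b-value are invented): constraining only the mean magnitude above a cutoff, maximum entropy yields an exponential density, equivalent to log10 N(≥ m) = a − b·m, and the b-value can be recovered from a catalog with Aki's (1965) maximum-likelihood estimator.

```python
import numpy as np

# Magnitudes above a cutoff m0 with exponential (maximum-entropy) density:
# p(m) ∝ exp(-beta * (m - m0)), with beta = b * ln(10).
rng = np.random.default_rng(1)
m0, b_true = 2.0, 1.0
beta = b_true * np.log(10.0)
mags = m0 + rng.exponential(1.0 / beta, size=50_000)

# Aki's maximum-likelihood estimator: b = log10(e) / (mean(M) - m0).
b_hat = np.log10(np.e) / (mags.mean() - m0)
print(round(b_hat, 3))
```

With 50,000 simulated events the estimate lands close to the true b-value, which is the sense in which the exponential null hypothesis is testable against real seismicity data.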
Robust Bayesianism: Imprecise and Paradoxical Reasoning
, 2004
Abstract

Cited by 6 (1 self)
We are interested in understanding the relationship between Bayesian inference and evidence theory, in particular imprecise and paradoxical reasoning. The concept of a set of probability distributions is central both in robust Bayesian analysis and in some versions of Dempster-Shafer theory. Most of the literature regards these two theories as incomparable. We interpret imprecise probabilities as imprecise posteriors obtainable from imprecise likelihoods and priors, both of which can be considered as evidence and represented with, e.g., DS-structures. The natural and simple robust combination operator makes all pairwise combinations of elements from the two sets. The DS-structures can represent one particular family of imprecise distributions, Choquet capacities. These are not closed under our combination rule, but can be made so by rounding. The proposed combination operator is unique, and has interesting normative and factual properties. We compare its behavior on Zadeh’s example with other proposed fusion rules. We also show how the paradoxical reasoning method appears in the robust framework.
Quantifying parsimony in structural equation modeling
 Multivariate Behavioral Research
, 2006
Abstract

Cited by 5 (1 self)
Fitting propensity (FP) is defined as a model’s average ability to fit diverse data patterns, all else being equal. The relevance of FP to model selection is examined in the context of structural equation modeling (SEM). In SEM it is well known that the number of free model parameters influences FP, but other facets of FP are routinely excluded from consideration. It is shown that models possessing the same number of free parameters but different structures may exhibit different FPs. The consequences of this fact are demonstrated using illustrative examples and models culled from published research. The case is made that further attention should be given to quantifying FP in SEM and considering it in model selection. Practical approaches are suggested. Models are commonly constructed in an attempt to approximate or explain some process of scientific interest that cannot be directly observed. The ability to predict other (or future) data arising from the same latent process is often seen as a mark of a model’s usefulness or quality, and it is commonly assumed that a model’s fit to a given sample provides a good clue to this predictive ability. But it is also recognized that some models are simply better able to fit data than other, more parsimonious models; that is, competing models often differ in terms of their fitting propensity (FP), or average ability to fit data. Consequently, model fit adjusted for FP is often used as a way to distinguish between competing models, taking into account differences in model parsimony. Adjusted fit is traditionally quantified by combining two properties of a model: parsimony and goodness of fit.
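The core claim, that equal parameter counts need not mean equal fitting propensity, can be illustrated outside SEM with a toy construction of mine (the model families and the FP measure here are invented for illustration): two two-parameter families, a line and a tunable sinusoid, scored by their average best-fit residual sum of squares over many random data patterns.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 20)
b_grid = np.linspace(0.1, 60.0, 400)     # grid search over the nonlinear frequency

# Family 1: y = a + b*x (linear in both parameters).
def best_rss_linear(y):
    A = np.column_stack([np.ones_like(x), x])
    res = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
    return float(res @ res)

# Family 2: y = a * sin(b*x) (nonlinear in b; amplitude a is closed-form).
def best_rss_sin(y):
    rss = []
    for b in b_grid:
        s = np.sin(b * x)
        a = (s @ y) / (s @ s)
        rss.append(float(np.sum((y - a * s) ** 2)))
    return min(rss)

# FP proxy: average best-fit RSS over random "data patterns".
ys = rng.normal(size=(500, x.size))
fp_lin = float(np.mean([best_rss_linear(y) for y in ys]))
fp_sin = float(np.mean([best_rss_sin(y) for y in ys]))
print(fp_lin, fp_sin)
```

The oscillatory family sweeps through many more directions of the data space as its frequency varies, so it absorbs more of a random pattern on average despite having the same number of free parameters, which is the FP phenomenon in miniature.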
What is the Problem of Simplicity?
Abstract

Cited by 5 (0 self)
Abstract: The problem of simplicity involves three questions: How is the simplicity of a hypothesis to be measured? How is the use of simplicity as a guide to hypothesis choice to be justified? And how is simplicity related to other desirable features of hypotheses, that is, how is simplicity to be traded off? The present paper explores these three questions from a variety of viewpoints, including Bayesianism, likelihoodism, and the framework of predictive accuracy formulated by Akaike (1973). It may turn out that simplicity has no global justification, its justification varying from problem to problem. Scientists sometimes choose between rival hypotheses on the basis of their simplicity. Non-scientists do the same thing; this is no surprise, given that the methods used in science often reflect patterns of reasoning that are at work in everyday life. When people choose the simpler of two theories, this “choosing” can mean different things. The simpler theory may be chosen because it is aesthetically more pleasing, because it is easier to understand or remember, or because it is easier to test. However, when philosophers talk about the “problem of simplicity,” they usually are thinking about another sort of choosing. The idea is that choosing the simpler theory means regarding it as more plausible than its more complex rival. Philosophers often describe the role of simplicity in hypothesis choice by talking about the problem of curve-fitting. Consider the following experiment. You put a sealed pot on a stove. The pot has a thermometer attached to it as well as a device that measures how much pressure the gas inside exerts on the walls of the pot. You then heat the pot to various temperatures and observe how much pressure there is in the pot. Each temperature reading with its associated pressure reading can be represented as a point in a two-dimensional coordinate system.
The problem is to decide what the general relationship is between temperature and pressure for this system, given the data. Each hypothesis about this general relationship takes the form of a line. Which line is most plausible, given the observations you have made?
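The Akaike framework mentioned above gives one concrete answer, which can be sketched numerically (my toy version of the pot experiment; the numbers and the small unmodeled wiggle are invented): AIC = n·log(RSS/n) + 2k trades goodness of fit against the number of parameters k, and picks the straight line over higher-degree polynomial rivals.

```python
import numpy as np

# Synthetic pressure-temperature data: a linear law plus a small wiggly
# perturbation that no low-degree polynomial can absorb.
n = 50
t = np.linspace(-1.0, 1.0, n)                    # rescaled temperature
p_obs = 2.0 + t + 0.05 * np.sin(40 * t)

def aic(deg):
    coef = np.polyfit(t, p_obs, deg)
    rss = float(np.sum((np.polyval(coef, t) - p_obs) ** 2))
    k = deg + 2                                  # coefficients + noise variance
    return n * np.log(rss / n) + 2 * k

scores = {d: aic(d) for d in range(1, 6)}
best = min(scores, key=scores.get)
print(best, scores)
```

Degrees 2 through 5 buy almost no reduction in RSS here, so their extra-parameter penalty dominates and the simplest curve wins, which is the sense in which predictive accuracy can justify a preference for simplicity.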
Credit Risk Assessment using Statistical and Machine Learning: Basic Methodology and Risk Modeling Applications
, 1997
Abstract

Cited by 3 (0 self)
Abstract. Risk assessment of financial intermediaries is an area of renewed interest due to the financial crises of the 1980s and 90s. An accurate estimation of risk, and its use in corporate or global financial risk models, could be translated into a more efficient use of resources. One important ingredient to accomplish this goal is to find accurate predictors of individual risk in the credit portfolios of institutions. In this context we make a comparative analysis of different statistical and machine learning classification methods on a mortgage loan dataset, with the motivation to understand their limitations and potential. We introduce a specific modeling methodology based on the study of error curves. Using state-of-the-art modeling techniques we built more than 9,000 models as part of the study. The results show that CART decision-tree models provide the best estimation for default, with an average 8.31% error rate for a training sample of 2,000 records. As a result of the error curve analysis for this model, we conclude that if more data were available, approximately 22,000 records, a potential 7.32% error rate could be achieved. Neural networks provided the second best results with an average error of 11.00%. The K-Nearest Neighbor algorithm had an average error rate of 14.95%. These results outperformed the standard probit algorithm, which attained an average error rate of 15.13%.
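The kind of comparison the abstract reports can be sketched on synthetic data (my illustration; the mortgage dataset, the 9,000-model protocol, and the reported error rates are not reproduced, and all settings here are hypothetical): train a CART-style decision tree and a k-nearest-neighbor classifier on a binary "default" problem and compare holdout error rates.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for a 2,000-record loan portfolio with a binary
# default label.
X, y = make_classification(n_samples=2000, n_features=10, n_informative=5,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=0, stratify=y)

models = {
    "CART": DecisionTreeClassifier(max_depth=5, random_state=0),
    "5-NN": KNeighborsClassifier(n_neighbors=5),
}
# Holdout error rate = 1 - accuracy on the test split.
errors = {name: 1.0 - m.fit(X_tr, y_tr).score(X_te, y_te)
          for name, m in models.items()}
print(errors)
```

A fuller study, as in the paper, would repeat this over many train/test splits and sample sizes to trace out error curves rather than rely on a single holdout.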