Results 1–10 of 69
Updating Probabilities
, 2002
Abstract

Cited by 69 (4 self)
As examples such as the Monty Hall puzzle show, applying conditioning to update a probability distribution on a "naive space", which does not take into account the protocol used, can often lead to counterintuitive results. Here we examine why. A criterion known as CAR ("coarsening at random") in the statistical literature characterizes when "naive" conditioning in a naive space works. We show that the CAR condition holds rather infrequently, and we provide a procedural characterization of it, by giving a randomized algorithm that generates all and only distributions for which CAR holds. This substantially extends previous characterizations of CAR. We also consider more generalized notions of update such as Jeffrey conditioning and minimizing relative entropy (MRE). We give a generalization of the CAR condition that characterizes when Jeffrey conditioning leads to appropriate answers, and show that there exist some very simple settings in which MRE essentially never gives the right results. This generalizes and interconnects previous results obtained in the literature on CAR and MRE.
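The failure of naive conditioning in the Monty Hall puzzle is easy to check numerically. The sketch below (an illustration, not from the paper) simulates the standard protocol, in which the host always opens an unchosen door hiding a goat; naive conditioning on "the opened door has a goat" suggests 1/2 for staying or switching, while the protocol-aware simulation gives 1/3 vs. 2/3:

```python
import random

def simulate(trials=100_000, seed=0):
    """Estimate win rates for stay vs. switch under the standard
    Monty Hall protocol: the host always opens an unchosen goat door."""
    rng = random.Random(seed)
    stay_wins = switch_wins = 0
    for _ in range(trials):
        car = rng.randrange(3)
        pick = 0                               # contestant picks door 0
        # host opens a goat door among {1, 2}, never the car
        opened = rng.choice([d for d in (1, 2) if d != car])
        other = ({1, 2} - {opened}).pop()
        stay_wins += (car == pick)
        switch_wins += (car == other)
    return stay_wins / trials, switch_wins / trials

p_stay, p_switch = simulate()
# Naive conditioning on "the opened door hides a goat" suggests 1/2
# each; the protocol-aware answer is 1/3 (stay) vs. 2/3 (switch).
print(f"stay: {p_stay:.3f}, switch: {p_switch:.3f}")
```

The discrepancy is exactly the "naive space" problem the abstract describes: the event observed depends on the host's protocol, not only on where the goat is.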
From Laplace to Supernova SN 1987A: Bayesian Inference in Astrophysics
, 1990
Abstract

Cited by 68 (2 self)
The Bayesian approach to probability theory is presented as an alternative to the currently used long-run relative frequency approach, which does not offer clear, compelling criteria for the design of statistical methods. Bayesian probability theory offers unique and demonstrably optimal solutions to well-posed statistical problems, and is historically the original approach to statistics. The reasons for earlier rejection of Bayesian methods are discussed, and it is noted that the work of Cox, Jaynes, and others answers earlier objections, giving Bayesian inference a firm logical and mathematical foundation as the correct mathematical language for quantifying uncertainty. The Bayesian approaches to parameter estimation and model comparison are outlined and illustrated by application to a simple problem based on the Gaussian distribution. As further illustrations of the Bayesian paradigm, Bayesian solutions to two interesting astrophysical problems are outlined: the measurement of wea...
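As a small worked illustration of the Gaussian parameter-estimation step the abstract mentions, the conjugate posterior for a Gaussian mean with known standard deviation has a closed form; the data and prior below are invented for the example:

```python
import math

def gaussian_mean_posterior(data, sigma, mu0, tau0):
    """Conjugate update for a Gaussian mean with known sigma:
    prior N(mu0, tau0^2) -> posterior N(mu_n, tau_n^2)."""
    n = len(data)
    xbar = sum(data) / n
    prec = 1.0 / tau0**2 + n / sigma**2        # posterior precision
    mu_n = (mu0 / tau0**2 + n * xbar / sigma**2) / prec
    return mu_n, math.sqrt(1.0 / prec)

# With a nearly flat prior, the posterior reproduces the frequentist
# answer xbar and sigma / sqrt(n); the numbers here are invented.
mu_n, tau_n = gaussian_mean_posterior([4.9, 5.1, 5.3, 4.7],
                                      sigma=0.2, mu0=0.0, tau0=100.0)
print(mu_n, tau_n)
```

The posterior mean is a precision-weighted average of the prior mean and the sample mean, which is the sense in which the Bayesian answer contains the frequentist one as a limiting case.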
Conditional limit theorems under Markov conditioning
 IEEE Trans. on Information Theory
, 1987
Abstract

Cited by 39 (1 self)
Let X1, X2, ... be i.i.d. random variables taking values in a finite set X, and consider the conditional joint distribution of the first m elements of the sample X1, ..., Xn, on the condition that X1 = x1 and the sliding-block sample average of a function h(·,·) defined on X² exceeds a threshold α > Eh(X1, X2). For m fixed and n → ∞, this conditional joint distribution is shown to converge to the m-step joint distribution of a Markov chain started in x1 which is closest to X1, X2, ... in Kullback–Leibler information divergence among all Markov chains whose two-dimensional stationary distribution P(·,·) satisfies Σx,y P(x, y) h(x, y) ≥ α, provided some distribution P on X² having equal marginals satisfies this constraint with strict inequality. Similar conditional limit theorems are obtained when X1, X2, ... is an arbitrary finite-order Markov chain and more general conditioning is allowed.
A Global Optimization Technique for Statistical Classifier Design
 IEEE Transactions on Signal Processing
Abstract

Cited by 28 (10 self)
A global optimization method is introduced for the design of statistical classifiers that minimize the rate of misclassification. We first derive the theoretical basis for the method, based on which we develop a novel design algorithm and demonstrate its effectiveness and superior performance in the design of practical classifiers for some of the most popular structures currently in use. The method, grounded in ideas from statistical physics and information theory, extends the deterministic annealing approach for optimization, both to incorporate structural constraints on data assignments to classes and to minimize the probability of error as the cost objective. During the design, data are assigned to classes in probability, so as to minimize the expected classification error given a specified level of randomness, as measured by Shannon's entropy. The constrained optimization is equivalent to a free energy minimization, motivating a deterministic annealing approach in which the entropy...
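A minimal sketch of the deterministic-annealing idea of probabilistic class assignments at a temperature: at high temperature assignments are near-uniform (maximum entropy), and lowering the temperature hardens them toward the nearest-class rule. This toy uses squared-error distortion rather than the paper's misclassification cost, and all names and numbers are illustrative:

```python
import math

def soft_assign(x, centers, T):
    """Gibbs assignment probabilities p(class j | x) at temperature T.

    At high T the entropy term dominates and assignments are almost
    uniform; as T -> 0 they harden to the nearest-center rule. This is
    the annealing mechanism, shown here with squared-error distortion
    instead of the paper's probability-of-error objective.
    """
    d = [(x - c) ** 2 for c in centers]
    m = min(d)                                  # for numerical stability
    w = [math.exp(-(di - m) / T) for di in d]
    s = sum(w)
    return [wi / s for wi in w]

hot = soft_assign(0.9, centers=[0.0, 1.0], T=10.0)   # nearly uniform
cold = soft_assign(0.9, centers=[0.0, 1.0], T=0.01)  # nearly hard
print(hot, cold)
```

Annealing repeats this assignment step while slowly lowering T, so the optimizer tracks the minimum of the free energy rather than dropping into the nearest local minimum of the zero-temperature cost.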
Measuring Marginal Risk Contributions in Credit Portfolios
 Journal of Computational Finance
, 2005
Abstract

Cited by 21 (0 self)
We consider the problem of decomposing the credit risk in a portfolio into a sum of risk contributions associated with individual obligors or transactions. For some standard measures of risk – including value-at-risk and expected shortfall – the total risk can be usefully decomposed into a sum of marginal risk contributions from individual obligors. Each marginal risk contribution is the conditional expected loss from that obligor, conditional on a large loss for the full portfolio. We develop methods for calculating or approximating these conditional expectations. Ordinary Monte Carlo estimation is impractical for this problem because the conditional expectations defining the marginal risk contributions are conditioned on rare events. We develop three techniques to address this difficulty. First, we develop importance sampling estimators specifically designed for conditioning on large losses. Next, we use the analysis underlying the importance sampling technique to develop a hybrid method that combines an approximation with Monte Carlo. Finally, we take this approach a step further and develop a rough but fast approximation that dispenses entirely with Monte Carlo. We develop these methods in the Gaussian copula framework and illustrate their performance in multifactor models.
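To make the conditioning problem concrete, here is the plain Monte Carlo baseline the paper improves on, in a hypothetical homogeneous one-factor Gaussian copula portfolio; every parameter value is invented for illustration:

```python
import random
from statistics import NormalDist

def marginal_contributions(n_obligors=5, trials=50_000, q=0.95, seed=1):
    """Plain Monte Carlo estimate of E[L_i | L >= VaR_q(L)] in a
    one-factor Gaussian copula model.

    Hypothetical homogeneous portfolio: unit exposures, default
    probability p, factor loading rho -- all numbers invented. The
    paper's point is that this baseline is noisy because the
    conditioning event is rare; importance sampling addresses that.
    """
    rng = random.Random(seed)
    p, rho = 0.05, 0.5
    c = NormalDist().inv_cdf(p)                # latent default threshold
    scale = (1.0 - rho**2) ** 0.5
    losses = []
    for _ in range(trials):
        z = rng.gauss(0.0, 1.0)                # common factor
        losses.append([1.0 if rho * z + scale * rng.gauss(0.0, 1.0) < c
                       else 0.0 for _ in range(n_obligors)])
    totals = sorted(sum(l) for l in losses)
    var_q = totals[int(q * trials)]            # empirical VaR_q
    tail = [l for l in losses if sum(l) >= var_q]
    contribs = [sum(l[i] for l in tail) / len(tail)
                for i in range(n_obligors)]
    return var_q, contribs

var_q, contribs = marginal_contributions()
# By symmetry each obligor contributes equally, and the contributions
# sum to the mean tail loss, which is at least VaR_q.
print(var_q, contribs)
```

Only the samples landing in the tail event contribute to each estimate, which is exactly why the paper replaces this estimator with importance sampling tilted toward large losses.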
A New Look at the Entropy for Solving Linear Inverse Problems
 IEEE Transactions on Information Theory
, 1994
Abstract

Cited by 17 (4 self)
Entropy-based methods are widely used for solving inverse problems, especially when the solution is known to be positive. We address here the linear ill-posed and noisy inverse problems y = Ax + n with a more general convex constraint x ∈ C, where C is a convex set. Although projective methods are well adapted to this context, we study here alternative methods which rely heavily on "information-based" criteria. Our goal is to shed light on the role played by entropy in this framework, and to present a new and deeper point of view on entropy, using general tools and results of convex analysis and large deviations theory. We then present a new and broad scheme of entropy-based inversion of linear noisy inverse problems. This scheme was introduced by Navaza in 1985 [48] in connection with a physical model for crystallographic applications, and further studied by Dacunha-Castelle and Gamboa [13]. Important features of this paper are (i) a unified presentation of many well kno...
A Logic for Default Reasoning About Probabilities
, 1998
Abstract

Cited by 12 (4 self)
A logic is defined that allows one to express information about statistical probabilities and about degrees of belief in specific propositions. By interpreting the two types of probabilities in one common probability space, the semantics given are well suited to model the influence of statistical information on the formation of subjective beliefs. Cross-entropy minimization is a key element in these semantics, the use of which is justified by showing that the resulting logic exhibits some very reasonable properties.
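On a finite space, the cross-entropy minimization step has a simple closed form when the constraint fixes the probability of a single event: the minimizer rescales the prior inside and outside the event, coinciding with Jeffrey conditioning. A small sketch with made-up numbers:

```python
def min_cross_entropy_update(prior, event, target):
    """Minimize KL(p || prior) over p with p(event) = target.

    On a finite space the minimizer rescales the prior inside and
    outside the event, i.e. it coincides with Jeffrey conditioning.
    Assumes 0 < prior(event) < 1; all numbers below are invented.
    """
    in_mass = sum(prior[x] for x in event)
    return {x: q * (target / in_mass if x in event
                    else (1.0 - target) / (1.0 - in_mass))
            for x, q in prior.items()}

prior = {"a": 0.5, "b": 0.3, "c": 0.2}
post = min_cross_entropy_update(prior, event={"a", "b"}, target=0.9)
# p(a) = 0.5625, p(b) = 0.3375, p(c) = 0.1 (up to float rounding)
print(post)
```

Relative proportions within the event (and within its complement) are preserved, which is the characteristic property of the cross-entropy minimizer under this kind of constraint.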
The common patterns of nature
, 2009
Abstract

Cited by 10 (3 self)
We typically observe large-scale outcomes that arise from the interactions of many hidden, small-scale processes. Examples include age of disease onset, rates of amino acid substitutions and composition of ecological communities. The macroscopic patterns in each problem often vary around a characteristic shape that can be generated by neutral processes. A neutral generative model assumes that each microscopic process follows unbiased or random stochastic fluctuations: random connections of network nodes; amino acid substitutions with no effect on fitness; species that arise or disappear from communities randomly. These neutral generative models often match common patterns of nature. In this paper, I present the theoretical background by which we can understand why these neutral generative models are so successful. I show where the classic patterns come from, such as the Poisson pattern, the normal or Gaussian pattern and many others. Each classic pattern was often discovered by a simple neutral generative model. The neutral patterns share a special characteristic: they describe the patterns of nature that follow from simple constraints on information. For example, any aggregation of processes that preserves information only about the mean and variance attracts to the Gaussian pattern; any aggregation that preserves information only about the mean attracts to the exponential pattern; any aggregation that preserves information only about the geometric mean attracts to the power law pattern. I present a simple and consistent informational framework of the common patterns of nature based on the method of maximum entropy. This framework shows that each neutral generative model is a special case that helps to discover a particular set of informational constraints; those informational constraints define a much wider domain of nonneutral generative processes that attract to the same neutral pattern.
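The constraint-to-pattern correspondence can be made explicit with a short maximum-entropy derivation (a standard argument, not specific to this paper). Maximizing entropy on [0, ∞) subject to normalization and a fixed mean,

```latex
\max_{p}\; -\int_0^\infty p(x)\,\ln p(x)\,dx
\quad \text{s.t.} \quad
\int_0^\infty p(x)\,dx = 1, \qquad
\int_0^\infty x\,p(x)\,dx = \mu,
```

stationarity of the Lagrangian gives \(-\ln p(x) - 1 - \lambda_0 - \lambda_1 x = 0\), so \(p(x) \propto e^{-\lambda_1 x}\); the constraints then fix \(p(x) = \mu^{-1} e^{-x/\mu}\), the exponential pattern. Adding a second-moment constraint appends a quadratic term to the exponent and yields the Gaussian pattern.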
Network Design and Control Using OnOff and Multilevel Source Traffic Models with LongTailed Distributions
 AT&T Labs
, 1997
Abstract

Cited by 9 (2 self)
A major challenge for designing and controlling emerging high-speed integrated-services communication networks is to develop methods for analyzing more realistic source traffic models that are consistent with recent traffic measurements. We consider the familiar on-off source traffic model, but we allow the on and off times to have long-tailed distributions such as the Pareto and Weibull distributions. We also consider a more general traffic model in which the required bandwidth (arrival rate) as a function of time for each source is represented as the sum of two stochastic processes: (1) a macroscopic (longer-time-scale) level process and (2) a microscopic (shorter-time-scale) within-level variation process. We let the level process be a finite-state semi-Markov process (SMP), allowing general (possibly long-tailed) level holding-time distributions, and we let the within-level variation process be a zero-mean piecewise-stationary process. However, the fine structure of the within-leve...
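The long-tailed on-off source is straightforward to simulate by inverse-transform sampling of Pareto periods. The sketch below (a toy single source, not the paper's SMP level/within-level model; all parameters are illustrative) accumulates the work offered over a horizon:

```python
import random

def pareto_on_off_source(horizon, alpha_on=1.5, alpha_off=1.5,
                         x_min=1.0, rate=1.0, seed=0):
    """Simulate one on-off source whose on and off periods are Pareto.

    Pareto(alpha, x_min) by inverse transform: x_min * U**(-1/alpha)
    with U uniform on (0, 1]. alpha in (1, 2) gives a finite mean but
    infinite variance -- the long-tailed regime the paper analyzes.
    Returns the total work (rate * time on) offered in [0, horizon).
    Parameter values here are illustrative defaults, not the paper's.
    """
    rng = random.Random(seed)
    t, on_time, state_on = 0.0, 0.0, True
    while t < horizon:
        alpha = alpha_on if state_on else alpha_off
        u = 1.0 - rng.random()                 # uniform on (0, 1]
        dur = x_min * u ** (-1.0 / alpha)      # Pareto period length
        if state_on:
            on_time += min(dur, horizon - t)
        t += dur
        state_on = not state_on
    return rate * on_time

work = pareto_on_off_source(horizon=10_000)
# With identical on/off laws, the long-run expected on-fraction is
# E[on] / (E[on] + E[off]) = 1/2.
print(work)
```

Because the period variance is infinite for alpha < 2, sample averages converge slowly, which is the qualitative source of the long-range dependence such models are used to capture.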
Maximum entropy density estimation and modeling geographic distributions of species
, 2007
Abstract

Cited by 7 (0 self)
The maximum entropy (maxent) approach, formally equivalent to maximum likelihood, is a widely used density-estimation method. When input datasets are small, maxent is likely to overfit. Overfitting can be eliminated by various smoothing techniques, such as regularization and constraint relaxation, but theory explaining their properties is often missing or needs to be derived for each case separately. In this dissertation, we propose a unified treatment for a large and general class of smoothing techniques. We provide fully general guarantees on their statistical performance and propose optimization algorithms with complete convergence proofs. As special cases, we can easily derive performance guarantees for many known regularization types, including L1 and L2-squared regularization. Furthermore, our general approach enables us to derive entirely new regularization functions with superior statistical guarantees. The new regularization functions use information about the structure of the feature space, incorporate information about sample selection bias, and combine information across several related density-estimation tasks. We propose algorithms solving a large and general subclass of generalized maxent problems, including all
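A tiny illustration of constraint relaxation via L1 regularization: the regularized maxent dual adds a beta * |lambda| term, which relaxes exact moment matching E_p[f] = Ê[f] to |E_p[f] − Ê[f]| ≤ beta. The one-feature example below solves the dual by crude grid search (real implementations use convex solvers; all numbers are made up):

```python
import math

def regularized_maxent_1d(xs, feat, emp_mean, beta, grid):
    """L1-regularized maxent over a finite set with a single feature.

    Dual objective: log Z(lam) - lam * emp_mean + beta * |lam|.
    beta = 0 recovers exact moment matching; beta > 0 relaxes the
    constraint to |E_p[feat] - emp_mean| <= beta. Grid search is a
    deliberately crude stand-in for a convex solver.
    """
    def dual(lam):
        z = sum(math.exp(lam * feat(x)) for x in xs)
        return math.log(z) - lam * emp_mean + beta * abs(lam)
    lam = min(grid, key=dual)
    z = sum(math.exp(lam * feat(x)) for x in xs)
    p = {x: math.exp(lam * feat(x)) / z for x in xs}
    return lam, p

xs = [0, 1, 2, 3]
grid = [i / 100 for i in range(-500, 501)]
lam0, p0 = regularized_maxent_1d(xs, lambda x: x, emp_mean=2.2, beta=0.0, grid=grid)
lam1, p1 = regularized_maxent_1d(xs, lambda x: x, emp_mean=2.2, beta=0.5, grid=grid)
mean0 = sum(x * p0[x] for x in xs)
mean1 = sum(x * p1[x] for x in xs)
# mean0 matches 2.2; mean1 is pulled back toward the uniform mean 1.5
# (it lands near 2.2 - beta because the optimal lam is positive).
print(mean0, mean1)
```

The regularizer shrinks the dual parameter toward zero, i.e. toward the uniform (maximum-entropy) distribution, which is how the smoothing combats overfitting on small samples.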