Results 1 - 10 of 34
Relative Loss Bounds for On-line Density Estimation with the Exponential Family of Distributions
Machine Learning, 2000
Cited by 83 (10 self)
We consider on-line density estimation with a parameterized density from the exponential family. The on-line algorithm receives one example at a time and maintains a parameter that is essentially an average of the past examples. After receiving an example the algorithm incurs a loss, which is the negative log-likelihood of the example with respect to the past parameter of the algorithm. An off-line algorithm can choose the best parameter based on all the examples. We prove bounds on the additional total loss of the on-line algorithm over the total loss of the best off-line parameter. These relative loss bounds hold for an arbitrary sequence of examples. The goal is to design algorithms with the best possible relative loss bounds. We use a Bregman divergence to derive and analyze each algorithm. These divergences are relative entropies between two exponential distributions. We also use our methods to prove relative loss bounds for linear regression.
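The setting in this abstract can be illustrated with a minimal Python sketch. It rests on assumptions that are not taken from the paper: the density is a unit-variance Gaussian, the parameter is a plain running mean, and `initial_mean` is a hypothetical starting parameter used before any example arrives. The function names are illustrative only.

```python
import math


def gaussian_nll(x, mu):
    """Negative log-likelihood of x under a unit-variance Gaussian with mean mu."""
    return 0.5 * (x - mu) ** 2 + 0.5 * math.log(2 * math.pi)


def online_relative_loss(examples, initial_mean=0.0):
    """Return (on-line loss, best off-line loss, relative loss).

    Assumption: the on-line parameter is the running mean of past examples,
    and initial_mean is the (hypothetical) parameter before any data.
    """
    mean = initial_mean
    total = 0.0
    online_loss = 0.0
    for t, x in enumerate(examples, start=1):
        # Loss is charged against the parameter held BEFORE seeing x.
        online_loss += gaussian_nll(x, mean)
        total += x
        mean = total / t  # update: average of the examples seen so far
    # The best off-line parameter for this density is the overall mean.
    best_mean = total / len(examples)
    offline_loss = sum(gaussian_nll(x, best_mean) for x in examples)
    return online_loss, offline_loss, online_loss - offline_loss
```

For the sequence `[1.0, 1.0, 1.0]` with `initial_mean=0.0`, only the first example is predicted badly, so the relative loss is 0.5.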
Bayesian Tests And Model Diagnostics In Conditionally Independent Hierarchical Models
Journal of the American Statistical Association, 1994
Cited by 14 (1 self)
Consider the conditionally independent hierarchical model (CIHM) where observations $y_i$ are independently distributed from $f(y_i \mid \theta_i)$, the parameters $\theta_i$ are independently distributed from distributions $g(\theta \mid \lambda)$, and the hyperparameters $\lambda$ are distributed according to a distribution $h(\lambda)$. The posterior distribution of all parameters of the CIHM can be efficiently simulated by Markov chain Monte Carlo (MCMC) algorithms. Although these simulation algorithms have facilitated the application of CIHMs, they generally have not addressed the problem of computing quantities useful in model selection. This paper explores how MCMC simulation algorithms and other related computational algorithms can be used to compute Bayes factors that are useful in criticizing a particular CIHM. In the case where the CIHM models a belief that the parameters are exchangeable or lie on a regression surface, the Bayes factor can measure the consistency of the data with the structural prior belief. Bayes factors can ...
Multiscale Poisson data smoothing
J. Roy. Stat. Soc. B, 2006
Cited by 14 (1 self)
This paper introduces a framework for nonlinear, multiscale decompositions of Poisson data with piecewise smooth intensity curves. The key concept is conditioning on the sum of the observations that are involved in the computation of a given coefficient. Within this framework, most classical wavelet thresholding schemes for data with additive, homoscedastic noise apply. Any family of wavelet transforms (orthogonal, biorthogonal, second generation) can be incorporated into this framework. The second contribution is a Bayesian shrinkage with an original prior for coefficients of this decomposition. As such, the method combines the advantages of the Fisz-wavelet transform and (Bayesian) Multiscale Likelihood models, with additional benefits, such as the extendibility towards arbitrary wavelet families.
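The conditioning idea can be illustrated with one level of an unnormalized Haar transform. For two independent Poisson counts with equal intensity, the first count given their sum n is Binomial(n, 1/2), so their difference has conditional variance n, and dividing by sqrt(n) gives unit variance whatever the intensity. The sketch below is a Fisz-style normalization written for illustration only; it is not the paper's decomposition, and `haar_fisz_level` is a hypothetical name.

```python
import math


def haar_fisz_level(counts):
    """One level of a Haar-type split of Poisson counts (illustrative sketch).

    Each adjacent pair (a, b) yields a coarse coefficient a + b and a detail
    coefficient (a - b) / sqrt(a + b). Conditional on the sum and under equal
    intensities, the detail has variance 1. A trailing unpaired count is ignored.
    """
    sums, details = [], []
    for i in range(0, len(counts) - 1, 2):
        a, b = counts[i], counts[i + 1]
        s = a + b
        sums.append(s)
        # With s == 0 both counts are zero, so the detail carries no information.
        details.append((a - b) / math.sqrt(s) if s > 0 else 0.0)
    return sums, details
```

The normalized details can then be thresholded like coefficients with homoscedastic noise, and the `sums` list can be fed back into the same function for the next coarser level.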
Gibbs sampling, exponential families and orthogonal polynomials
Statistical Science, 2008
Cited by 13 (4 self)
We give families of examples where sharp rates of convergence to stationarity of the widely used Gibbs sampler are available. The examples involve standard exponential families and their conjugate priors. In each case, the transition operator is explicitly diagonalizable with classical orthogonal polynomials as eigenfunctions. Key words and phrases: Gibbs sampler, running time analyses, exponential families, conjugate priors, location families, orthogonal polynomials, singular value decomposition.
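One standard exponential family with its conjugate prior is the binomial likelihood with a Beta prior. The sketch below is an illustration of a two-component Gibbs sampler of that kind, not the paper's analysis: it alternates a draw of theta given x from Beta(alpha + x, beta + n - x) with a draw of x given theta from Binomial(n, theta). The function name and parameter values are assumptions.

```python
import random


def gibbs_beta_binomial(n, alpha, beta, steps, seed=0):
    """Two-component Gibbs sampler for the Beta/Binomial pair (illustrative).

    Returns the sequence of x values visited by the chain, started at x = 0.
    """
    rng = random.Random(seed)
    x = 0
    xs = []
    for _ in range(steps):
        # theta | x is Beta(alpha + x, beta + n - x) by conjugacy.
        theta = rng.betavariate(alpha + x, beta + n - x)
        # x | theta is Binomial(n, theta), drawn as a sum of n Bernoulli trials.
        x = sum(1 for _ in range(n) if rng.random() < theta)
        xs.append(x)
    return xs
```

With `alpha = beta = 1` the stationary marginal of x is uniform on 0, ..., n, so a long run such as `gibbs_beta_binomial(10, 1.0, 1.0, 20000)` should average close to 5.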
Infinitely Divisible Laws Associated With Hyperbolic Functions
2000
Cited by 12 (4 self)
The infinitely divisible distributions on $\mathbb{R}_+$ of random variables $C_t$, $S_t$ and $T_t$ with Laplace transforms $\left(\frac{1}{\cosh\sqrt{2\lambda}}\right)^t$, $\left(\frac{\sqrt{2\lambda}}{\sinh\sqrt{2\lambda}}\right)^t$, and $\left(\frac{\tanh\sqrt{2\lambda}}{\sqrt{2\lambda}}\right)^t$ respectively are characterized for various $t > 0$ in a number of different ways: by simple relations between their moments and cumulants, by corresponding relations between the distributions and their Lévy measures, by recursions for their Mellin transforms, and by differential equations satisfied by their Laplace transforms. Some of these results are interpreted probabilistically via known appearances of these distributions for $t = 1$ or $2$ in the description of the laws of various functionals of Brownian motion and Bessel processes, such as the heights and lengths of excursions of a one-dimensional Brownian motion. The distributions of $C_1$ and $S_2$ are also known to appear in the Mellin representations of two important functions in analytic number theory, the Riemann zeta function and ...
Approaches for Optimal Sequential Decision Analysis in Clinical Trials
1997
Cited by 9 (1 self)
In this paper, we offer a fully Bayesian approach to this problem, specifying not only the likelihood and prior distributions, but appropriate loss functions as well. At each data monitoring point, we enumerate the available decisions and investigate the use of backward induction, implemented via Monte Carlo methods, to choose the optimal course of action. We then present a forward sampling algorithm that substantially eases the analytic and computational burdens associated with backward induction, offering the possibility of fully Bayesian optimal sequential monitoring for previously untenable numbers of interim looks. We show that forward sampling can always identify the optimal sequential strategy in the case of a one-parameter exponential family with a conjugate prior and monotone loss functions, as well as the best member of a certain class of strategies when backward induction is infeasible. Finally, we illustrate and compare the forward and backward approaches using data from a recent AIDS clinical trial.
Non-Gaussian conditional linear AR(1) models
Australian and New Zealand Journal of Statistics, 2000
Cited by 8 (3 self)
We give a general formulation of a non-Gaussian conditional linear AR(1) model subsuming most of the non-Gaussian AR(1) models that have appeared in the literature. We derive some general results giving properties for the stationary process mean, variance and correlation structure, and conditions for stationarity. These results highlight similarities and differences with the Gaussian AR(1) model, and unify many separate results appearing in the literature. Examples illustrate the wide range of properties that can appear under the conditional linear autoregressive assumption. These results are used in analysing three real data sets, illustrating general methods of estimation, model diagnostics and model selection. In particular, we show that the theoretical results can be used to develop diagnostics for deciding if a time series can be modelled by some linear autoregressive model, and for selecting among several candidate models.
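As a concrete illustration, assume the conditional linear property means E[X_t | X_{t-1}] = phi * X_{t-1} + lam; the abstract does not spell this out, so the form and the symbols `phi` and `lam` are assumptions here. One familiar non-Gaussian example with this property is a Poisson count model built from binomial thinning, sketched below with hypothetical function names. Taking expectations on both sides gives a stationary mean of lam / (1 - phi) when 0 <= phi < 1.

```python
import math
import random


def poisson_draw(rng, lam):
    """Draw a Poisson(lam) variate by Knuth's multiplication method (small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_inar1(phi, lam, steps, seed=0):
    """Simulate X_t = (binomial thinning of X_{t-1}) + Poisson(lam) innovation.

    Each of the X_{t-1} units survives with probability phi, so the
    conditional mean of X_t is phi * X_{t-1} + lam. The chain starts at 0.
    """
    rng = random.Random(seed)
    x = 0
    path = []
    for _ in range(steps):
        survivors = sum(1 for _ in range(x) if rng.random() < phi)
        x = survivors + poisson_draw(rng, lam)
        path.append(x)
    return path
```

With `phi = 0.5` and `lam = 2.0`, the sample mean of a long path should settle near 2.0 / (1 - 0.5) = 4.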
A fast cross-entropy method for estimating buffer overflows in queueing networks
Management Science, 2004
Cited by 7 (2 self)
In this paper we propose a fast adaptive Importance Sampling method for the efficient simulation of buffer overflow probabilities in queueing networks. The method comprises three stages. First we estimate the minimum Cross-Entropy tilting parameter for a small buffer level; next, we use this as a starting value for the estimation of the optimal tilting parameter for the actual (large) buffer level; finally, the tilting parameter just found is used to estimate the overflow probability of interest. We study various properties of the method in more detail for the M/M/1 queue and conjecture that similar properties also hold for quite general queueing networks. Numerical results support this conjecture and demonstrate the high efficiency of the proposed algorithm.
A Model Reference Adaptive Search Method for Global Optimization
Oper. Res., 2008
Cited by 5 (1 self)
Model reference adaptive search (MRAS) for solving global optimization problems works with a parameterized probabilistic model on the solution space and generates at each iteration a group of candidate solutions. These candidate solutions are then used to update the parameters associated with the probabilistic model in such a way that the future search will be biased toward the region containing high-quality solutions. The parameter updating procedure in MRAS is guided by a sequence of implicit probabilistic models we call reference models. We provide a particular algorithm instantiation of the MRAS method, where the sequence of reference models can be viewed as the generalized probability distribution models for estimation of distribution algorithms (EDAs) with proportional selection scheme. In addition, we show that the model reference framework can also be used to describe the recently proposed cross-entropy (CE) method for optimization and to study its properties. Hence, this paper can also be seen as a study on the effectiveness of combining CE and EDAs. We prove global convergence of the proposed algorithm in both continuous and combinatorial domains, and we carry out numerical studies to illustrate the performance of the algorithm. doi 10.1287/opre.1060.0367, © 2007 INFORMS.
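The sample-then-update loop described in this abstract can be sketched with a basic cross-entropy style optimizer. This is a generic illustration, not the MRAS algorithm or its reference models: it assumes a one-dimensional Gaussian sampling model, refits the mean and standard deviation to the best candidates at each iteration, and uses hypothetical names and default settings throughout.

```python
import random
import statistics


def cross_entropy_maximize(objective, mean, std, iterations=20,
                           samples=200, elite=20, seed=0):
    """Maximize a 1-D objective with a Gaussian sampling model (illustrative).

    Each iteration draws candidates from N(mean, std), keeps the `elite`
    best ones, and refits mean and std to them, which biases later
    sampling toward the region of high-quality solutions.
    """
    rng = random.Random(seed)
    for _ in range(iterations):
        candidates = [rng.gauss(mean, std) for _ in range(samples)]
        candidates.sort(key=objective, reverse=True)  # best candidates first
        best = candidates[:elite]
        mean = statistics.mean(best)
        std = statistics.pstdev(best)
    return mean, std
```

For example, `cross_entropy_maximize(lambda x: -(x - 3.0) ** 2, mean=0.0, std=5.0)` should return a mean close to 3 and a standard deviation that has shrunk toward zero. Practical variants often smooth the parameter update to avoid collapsing too early on multimodal objectives.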

