Results 1 – 10 of 55
Relative Loss Bounds for Online Density Estimation with the Exponential Family of Distributions
 Machine Learning
, 2000
"... We consider online density estimation with a parameterized density from the exponential family. The online algorithm receives one example at a time and maintains a parameter that is essentially an average of the past examples. After receiving an example the algorithm incurs a loss, which is the n ..."
Abstract

Cited by 116 (11 self)
 Add to MetaCart
We consider online density estimation with a parameterized density from the exponential family. The online algorithm receives one example at a time and maintains a parameter that is essentially an average of the past examples. After receiving an example the algorithm incurs a loss, which is the negative loglikelihood of the example with respect to the past parameter of the algorithm. An oline algorithm can choose the best parameter based on all the examples. We prove bounds on the additional total loss of the online algorithm over the total loss of the best oline parameter. These relative loss bounds hold for an arbitrary sequence of examples. The goal is to design algorithms with the best possible relative loss bounds. We use a Bregman divergence to derive and analyze each algorithm. These divergences are relative entropies between two exponential distributions. We also use our methods to prove relative loss bounds for linear regression.
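The protocol in this abstract — a parameter that is an average of the past examples, charged the negative log-likelihood under the previous parameter — can be sketched for the simplest exponential-family member, the Bernoulli. The smoothing constants `a`, `b` and the smoothed offline comparator below are illustrative choices, not the paper's exact algorithm:

```python
import math

def bernoulli_nll(x, p):
    """Negative log-likelihood of a single 0/1 outcome under mean parameter p."""
    return -math.log(p if x == 1 else 1.0 - p)

def online_vs_offline(xs, a=1.0, b=1.0):
    """Online density estimation for the Bernoulli family.

    The online parameter is a smoothed running average of the past examples
    (the add-a/add-b start avoids infinite loss on early rounds); each round
    is charged the negative log-likelihood under the *previous* parameter.
    The offline comparator applies the same smoothed average to all examples.
    Returns (online_total_loss, offline_total_loss).
    """
    online_loss, heads = 0.0, 0
    for t, x in enumerate(xs):
        p = (heads + a) / (t + a + b)        # based on past examples only
        online_loss += bernoulli_nll(x, p)
        heads += x
    p_off = (heads + a) / (len(xs) + a + b)  # fixed parameter in hindsight
    offline_loss = sum(bernoulli_nll(x, p_off) for x in xs)
    return online_loss, offline_loss
```

The gap between the two totals is the kind of quantity the paper's relative loss bounds control (for its own choice of comparator and update).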
Multiscale Poisson data smoothing
 J. Roy. Stat. Soc. B
, 2006
"... This paper introduces a framework for nonlinear, multiscale decompositions of Poisson data with piecewise smooth intensity curves. The key concept is conditioning on the sum of the observations that are involved in the computation of a given coefficient. Within this framework, most classical wavelet ..."
Abstract

Cited by 20 (1 self)
 Add to MetaCart
This paper introduces a framework for nonlinear, multiscale decompositions of Poisson data with piecewise smooth intensity curves. The key concept is conditioning on the sum of the observations that are involved in the computation of a given coefficient. Within this framework, most classical wavelet thresholding schemes for data with additive, homoscedastic noise apply. Any family of wavelet transforms (orthogonal, biorthogonal, second generation) can be incorporated into this framework. The second contribution is a Bayesian shrinkage with an original prior for coefficients of this decomposition. As such, the method combines the advantages of the Fiszwavelet transform and (Bayesian) Multiscale Likelihood models, with additional benefits, such as the extendibility towards arbitrary wavelet families.
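The conditioning idea can be checked numerically: for independent Poisson counts, the first count given the pair's sum is binomial, which no longer depends on the unknown intensities' scale. A minimal simulation sketch (all parameter values are arbitrary):

```python
import math, random

def poisson_draw(lam, rng):
    # Knuth's multiplicative method; adequate for small intensities.
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def conditional_count_demo(lam1=3.0, lam2=3.0, n=6, trials=20000, seed=1):
    """Empirical check that X1 | X1 + X2 = n is Binomial(n, lam1/(lam1+lam2))
    for independent Poisson counts X1, X2. Returns (empirical conditional
    mean of X1 given the sum equals n, theoretical value n*lam1/(lam1+lam2))."""
    rng = random.Random(seed)
    kept = []
    for _ in range(trials):
        x1, x2 = poisson_draw(lam1, rng), poisson_draw(lam2, rng)
        if x1 + x2 == n:
            kept.append(x1)
    return sum(kept) / len(kept), n * lam1 / (lam1 + lam2)
```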
Gibbs sampling, exponential families and orthogonal polynomials
 Statistical Science
, 2008
"... Abstract. We give families of examples where sharp rates of convergence to stationarity of the widely used Gibbs sampler are available. The examples involve standard exponential families and their conjugate priors. In each case, the transition operator is explicitly diagonalizable with classical ort ..."
Abstract

Cited by 19 (6 self)
 Add to MetaCart
Abstract. We give families of examples where sharp rates of convergence to stationarity of the widely used Gibbs sampler are available. The examples involve standard exponential families and their conjugate priors. In each case, the transition operator is explicitly diagonalizable with classical orthogonal polynomials as eigenfunctions. Key words and phrases: Gibbs sampler, running time analyses, exponential families, conjugate priors, location families, orthogonal polynomials, singular value decomposition. 1.
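As a concrete instance of a Gibbs sampler on an exponential family with its conjugate prior, here is a sketch of the two-component Beta-Binomial chain; the parameter values are illustrative, and only the stationary means (not the sharp convergence rates the paper derives) are checked:

```python
import random

def beta_binomial_gibbs(n=10, a=2.0, b=2.0, iters=5000, burn=500, seed=0):
    """Two-component Gibbs sampler for the Beta-Binomial conjugate pair:
    theta ~ Beta(a, b),  x | theta ~ Binomial(n, theta).
    Both conditional draws are exact. Returns Monte Carlo estimates of the
    stationary means of theta and x (a/(a+b) and n*a/(a+b) respectively)."""
    rng = random.Random(seed)
    theta, ts, xs = 0.5, [], []
    for t in range(iters):
        x = sum(1 for _ in range(n) if rng.random() < theta)  # x | theta
        theta = rng.betavariate(x + a, n - x + b)             # theta | x
        if t >= burn:
            ts.append(theta)
            xs.append(x)
    return sum(ts) / len(ts), sum(xs) / len(xs)
```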
Infinitely Divisible Laws Associated With Hyperbolic Functions
, 2000
"... The infinitely divisible distributions on R + of random variables C t , S t and T t with Laplace transforms ` 1 cosh p 2 ' t ; / p 2 sinh p 2 ! t ; and / tanh p 2 p 2 ! t respectively are characterized for various t ? 0 in a number of different ways: by simple relations bet ..."
Abstract

Cited by 18 (5 self)
 Add to MetaCart
The infinitely divisible distributions on R + of random variables C t , S t and T t with Laplace transforms ` 1 cosh p 2 ' t ; / p 2 sinh p 2 ! t ; and / tanh p 2 p 2 ! t respectively are characterized for various t ? 0 in a number of different ways: by simple relations between their moments and cumulants, by corresponding relations between the distributions and their L'evy measures, by recursions for their Mellin transforms, and by differential equations satisfied by their Laplace transforms. Some of these results are interpreted probabilistically via known appearances of these distributions for t = 1 or 2 in the description of the laws of various functionals of Brownian motion and Bessel processes, such as the heights and lengths of excursions of a onedimensional Brownian motion. The distributions of C¹ and S³ are also known to appear in the Mellin representations of two important functions in analytic number theory, the Riemann zeta function and ...
An Iterative Monte Carlo Method for Nonconjugate Bayesian Analysis
 Statistics and Computing
, 1991
"... The Gibbs sampler has been proposed as a general method for Bayesian calculation in Gelfand and Smith (1990). However, the predominance of experience to date resides in applications assuming conjugacy where implementation is reasonably straightforward. This paper describes a tailored approximate rej ..."
Abstract

Cited by 17 (0 self)
 Add to MetaCart
The Gibbs sampler has been proposed as a general method for Bayesian calculation in Gelfand and Smith (1990). However, the predominance of experience to date resides in applications assuming conjugacy where implementation is reasonably straightforward. This paper describes a tailored approximate rejection method approach for implementation of the Gibbs sampler when nonconjugate structure is present. Several challenging applications are presented for illustration.
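The idea of drawing from an awkward full conditional by rejection can be sketched as follows. The target density here is a made-up nonconjugate-looking example, not one from the paper, and the envelope is a standard normal:

```python
import random

def rejection_sample(rng, n=4000):
    """Rejection sampler for the unnormalized target
        f(x) proportional to exp(-x**2 / 2) / (1 + x**2)
    (a hypothetical nonconjugate full conditional). Since f(x) equals the
    N(0,1) density times 1/(1+x^2) up to a constant, a N(0,1) draw x is
    accepted with probability 1/(1+x^2)."""
    out = []
    while len(out) < n:
        x = rng.gauss(0.0, 1.0)
        if rng.random() < 1.0 / (1.0 + x * x):
            out.append(x)
    return out
```

Inside a Gibbs sweep, one such draw per iteration replaces the conjugate update that is unavailable.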
Bayesian Tests And Model Diagnostics In Conditionally Independent Hierarchical Models
 Journal of the American Statistical Association
, 1994
"... Consider the conditionally independent hierarchical model (CIHM) where observations y i are independently distributed from f(y i j` i ), the parameters ` i are independently distributed from distributions g(`j), and the hyperparameters are distributed according to a distribution h(). The posterior ..."
Abstract

Cited by 16 (1 self)
 Add to MetaCart
Consider the conditionally independent hierarchical model (CIHM) where observations y i are independently distributed from f(y i j` i ), the parameters ` i are independently distributed from distributions g(`j), and the hyperparameters are distributed according to a distribution h(). The posterior distribution of all parameters of the CIHM can be efficiently simulated by Monte Carlo Markov Chain (MCMC) algorithms. Although these simulation algorithms have facilitated the application of CIHM's, they generally have not addressed the problem of computing quantities useful in model selection. This paper explores how MCMC simulation algorithms and other related computational algorithms can be used to compute Bayes factors that are useful in criticizing a particular CIHM. In the case where the CIHM models a belief that the parameters are exchangeable or lie on a regression surface, the Bayes factor can measure the consistency of the data with the structural prior belief. Bayes factors can ...
A fast cross-entropy method for estimating buffer overflows in queueing networks
 Management Science
, 2004
"... In this paper we propose a fast adaptive Importance Sampling method for the efficient simulation of buffer overflow probabilities in queueing networks. The method comprises three stages. First we estimate the minimum CrossEntropy tilting parameter for a small buffer level; next, we use this as a st ..."
Abstract

Cited by 11 (5 self)
 Add to MetaCart
In this paper we propose a fast adaptive Importance Sampling method for the efficient simulation of buffer overflow probabilities in queueing networks. The method comprises three stages. First we estimate the minimum CrossEntropy tilting parameter for a small buffer level; next, we use this as a starting value for the estimation of the optimal tilting parameter for the actual (large) buffer level; finally, the tilting parameter just found is used to estimate the overflow probability of interest. We study various properties of the method in more detail for the M/M/1 queue and conjecture that similar properties also hold for quite general queueing networks. Numerical results support this conjecture and demonstrate the high efficiency of the proposed algorithm.
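The staged scheme can be illustrated on a deliberately simple rare event — P(X > γ) for X ~ Exp(1), where the answer e^(−γ) is known — rather than an actual queueing network. The quantile/tilting updates below follow the standard cross-entropy recipe for this toy family, not necessarily the paper's exact parameterization:

```python
import math, random

def ce_overflow_prob(gamma=20.0, n=2000, rho=0.1, seed=7):
    """Staged cross-entropy estimate of the rare event P(X > gamma), X ~ Exp(1).

    A toy stand-in for the buffer-overflow setting: raise the level to a
    sample quantile, update the exponential tilting mean v by the
    likelihood-ratio-weighted elite mean (the closed-form CE update for this
    family), repeat until the target level is reached, then estimate by
    importance sampling under Exp(mean v)."""
    rng = random.Random(seed)
    v, level = 1.0, 0.0
    while level < gamma:
        xs = sorted(rng.expovariate(1.0 / v) for _ in range(n))
        level = min(gamma, xs[int((1 - rho) * n)])       # (1-rho) sample quantile
        elites = [x for x in xs if x >= level]
        ws = [v * math.exp(x / v - x) for x in elites]   # f_1(x) / f_v(x)
        v = sum(w * x for w, x in zip(ws, elites)) / sum(ws)
    total = 0.0
    for _ in range(n):
        x = rng.expovariate(1.0 / v)
        if x > gamma:
            total += v * math.exp(x / v - x)             # importance weight
    return total / n
```

With γ = 20 the true probability is about 2.1e-9, far beyond the reach of naive simulation with a few thousand samples.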
A Model Reference Adaptive Search Method for Global Optimization
 2007 Oper. Res
, 2008
"... informs ® doi 10.1287/opre.1060.0367 © 2007 INFORMS Model reference adaptive search (MRAS) for solving global optimization problems works with a parameterized probabilistic model on the solution space and generates at each iteration a group of candidate solutions. These candidate solutions are then ..."
Abstract

Cited by 11 (5 self)
 Add to MetaCart
informs ® doi 10.1287/opre.1060.0367 © 2007 INFORMS Model reference adaptive search (MRAS) for solving global optimization problems works with a parameterized probabilistic model on the solution space and generates at each iteration a group of candidate solutions. These candidate solutions are then used to update the parameters associated with the probabilistic model in such a way that the future search will be biased toward the region containing highquality solutions. The parameter updating procedure in MRAS is guided by a sequence of implicit probabilistic models we call reference models. We provide a particular algorithm instantiation of the MRAS method, where the sequence of reference models can be viewed as the generalized probability distribution models for estimation of distribution algorithms (EDAs) with proportional selection scheme. In addition, we show that the model reference framework can also be used to describe the recently proposed crossentropy (CE) method for optimization and to study its properties. Hence, this paper can also be seen as a study on the effectiveness of combining CE and EDAs. We prove global convergence of the proposed algorithm in both continuous and combinatorial domains, and we carry out numerical studies to illustrate the performance of the algorithm.
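A minimal sketch of the CE-style loop that MRAS generalizes — sample from a parameterized model, select elites, refit the model so future search is biased toward good regions — using a one-dimensional Gaussian sampling model on a toy objective (all parameter choices here are illustrative, not from the paper):

```python
import random
import statistics

def ce_minimize(f, mu=0.0, sigma=5.0, n=200, elite=20, iters=40, seed=0):
    """Cross-entropy style optimization with a N(mu, sigma) sampling model:
    draw candidates, keep the `elite` best under f, and refit mu and sigma
    to the elites so future sampling concentrates on high-quality regions."""
    rng = random.Random(seed)
    for _ in range(iters):
        xs = sorted((rng.gauss(mu, sigma) for _ in range(n)), key=f)
        top = xs[:elite]
        mu = statistics.fmean(top)
        sigma = statistics.pstdev(top) + 1e-12  # keep a tiny floor on spread
    return mu
```

In the MRAS view, the implicit sequence of elite-set distributions plays the role of the reference models guiding the parameter updates.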
Non-Gaussian conditional linear AR(1) models
 Australian and New Zealand Journal of Statistics
, 2000
"... Abstract: We give a general formulation of a nonGaussian conditional linear AR(1) model subsuming most of the nonGaussian AR(1) models that have appeared in the literature. We derive some general results giving properties for the stationary process mean, variance and correlation structure, and con ..."
Abstract

Cited by 11 (3 self)
 Add to MetaCart
Abstract: We give a general formulation of a nonGaussian conditional linear AR(1) model subsuming most of the nonGaussian AR(1) models that have appeared in the literature. We derive some general results giving properties for the stationary process mean, variance and correlation structure, and conditions for stationarity. These results highlight similarities and differences with the Gaussian AR(1) model, and unify many separate results appearing in the literature. Examples illustrate the wide range of properties that can appear under the conditional linear autoregressive assumption. These results are used in analysing three real data sets, illustrating general methods of estimation, model diagnostics and model selection. In particular, we show that the theoretical results can be used to develop diagnostics for deciding if a time series can be modelled by some linear autoregressive model, and for selecting among several candidate models.
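The "conditional linear" structure can be made concrete with a Poisson-conditional example (a hypothetical instance chosen here for illustration, not one of the paper's data models): the linear conditional mean alone pins down the stationary mean and the AR(1)-style autocorrelation.

```python
import math, random

def poisson_draw(lam, rng):
    # Knuth's multiplicative method; adequate for small intensities.
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def simulate_clar1(a=0.5, c=2.0, n=20000, seed=3):
    """Conditional linear AR(1) with Poisson conditionals:
        X_t | X_{t-1} ~ Poisson(a * X_{t-1} + c).
    The linear conditional mean gives stationary mean c/(1-a) and lag-1
    autocorrelation a. Returns (sample mean, sample lag-1 autocorrelation)."""
    rng = random.Random(seed)
    x = poisson_draw(c / (1 - a), rng)   # start near the stationary mean
    xs = []
    for _ in range(n):
        x = poisson_draw(a * x + c, rng)
        xs.append(x)
    m = sum(xs) / n
    num = sum((xs[t] - m) * (xs[t + 1] - m) for t in range(n - 1))
    den = sum((v - m) ** 2 for v in xs)
    return m, num / den
```

Here the mean and correlation match the Gaussian AR(1), while the marginal distribution (a count distribution) is entirely different — the kind of similarity-with-difference the abstract highlights.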