On Posterior Consistency of Survival Models
- ANN. STATIST
, 1999
Cited by 8 (1 self)
Ghosh and Ramamoorthi (1995) studied the posterior consistency for survival models and showed that the posterior was consistent when the prior on the distribution of survival times was the Dirichlet process prior. In this paper, we study the posterior consistency of survival models with neutral to the right process priors, which include Dirichlet process priors. A set of sufficient conditions for the posterior consistency with neutral to the right process priors is given. Interestingly, not all the neutral to the right process priors have consistent posteriors, but most of the popular priors such as Dirichlet processes, beta processes and gamma processes have consistent posteriors. With a class of priors which includes beta processes, a necessary and sufficient condition for the consistency is also established. An interesting counterintuitive phenomenon is found. Suppose there are two priors centered at the true parameter value with finite variances. Surprisingly, the posterior with s...
NONPARAMETRIC FUNCTIONAL DATA ANALYSIS THROUGH BAYESIAN DENSITY ESTIMATION
, 2007
Cited by 7 (2 self)
In many modern experimental settings, observations are obtained in the form of functions, and interest focuses on inferences on a collection of such functions. Some examples are conductivity-temperature-depth (CTD) data in oceanography, dose-response models in epidemiology and time-course microarray experiments in biology and medicine. In this paper we propose a hierarchical model that allows us to simultaneously estimate multiple curves nonparametrically by using dependent Dirichlet Process mixtures of Gaussians to characterize the joint distribution of predictors and outcomes. Function estimates are then induced through the conditional distribution of the outcome given the predictors. The resulting approach allows for flexible estimation and clustering, while borrowing information across curves. We also show that the function estimates we obtain are consistent on the space of integrable functions. As an illustration, we consider an application to the analysis of CTD data in the North Atlantic.
Characterizing predictable classes of processes
- In Proc. 25th Conference on Uncertainty in Artificial Intelligence (UAI’09)
, 2009
Cited by 5 (3 self)
The problem is sequence prediction in the following setting. A sequence x_1, ..., x_n, ... of discrete-valued observations is generated according to some unknown probabilistic law (measure) µ. After observing each outcome, it is required to give the conditional probabilities of the next observation. The measure µ belongs to an arbitrary class C of stochastic processes. We are interested in predictors ρ whose conditional probabilities converge to the “true” µ-conditional probabilities if any µ ∈ C is chosen to generate the data. We show that if such a predictor exists, then a predictor can also be obtained as a convex combination of countably many elements of C. In other words, it can be obtained as a Bayesian predictor whose prior is concentrated on a countable set. This result is established for two very different measures of performance of prediction, one of which is very strong, namely, total variation, and the other is very weak, namely, prediction in expected average Kullback-Leibler divergence.
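As a rough illustration of the Bayesian predictor described in this abstract, the following is a minimal sketch under strong simplifying assumptions: the class C is replaced by a finite set of i.i.d. Bernoulli measures (a toy stand-in for a countable set), and the function name `mixture_predictor` and its parameters are hypothetical, not taken from the paper. It shows only how a prior-weighted convex combination yields conditional probabilities for the next observation.

```python
def mixture_predictor(thetas, prior, xs):
    """Bayes mixture over i.i.d. Bernoulli(theta) measures (toy assumption).

    thetas: success probabilities, each strictly between 0 and 1
    prior:  positive prior weights summing to 1
    xs:     observed binary sequence (0s and 1s)

    Returns (preds, weights): preds[t] is the mixture's conditional
    probability that the next observation is 1 given xs[:t], and weights
    is the posterior over thetas after seeing all of xs.
    """
    weights = list(prior)
    preds = []
    for x in xs:
        # The predictive probability is a convex combination of the
        # component predictions, weighted by the current posterior.
        preds.append(sum(w * th for w, th in zip(weights, thetas)))
        # Bayes update: reweight each component by its likelihood of x.
        likes = [th if x == 1 else 1.0 - th for th in thetas]
        weights = [w * l for w, l in zip(weights, likes)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return preds, weights


if __name__ == "__main__":
    preds, weights = mixture_predictor([0.2, 0.8], [0.5, 0.5], [1, 1, 1])
    print(preds, weights)
```

With a run of ones, the posterior weight shifts toward the component with the larger success probability, so the predictions move toward that component's conditional probabilities.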
On the Consistency of Bayes Factors for Testing Point Null versus Nonparametric Alternatives
, 1999
Cited by 4 (0 self)
When testing a point null hypothesis versus an alternative that is vaguely specified, a Bayesian test usually proceeds by putting a nonparametric prior on the alternative and then computing a Bayes factor based on the observations. This paper addresses the question of consistency, that is, whether the Bayes factor is correctly indicative of the null or the alternative as sample size increases. We establish several consistency results in the affirmative under fairly general conditions. Consistency of Bayes factors for testing a point null versus a parametric alternative has long been known. The results here can also be viewed as the nonparametric extension of their parametric counterpart.
Dirichlet Process Mixtures of Generalized Linear Models
Cited by 4 (0 self)
We propose Dirichlet Process mixtures of Generalized Linear Models (DP-GLMs), a new method of nonparametric regression that accommodates continuous and categorical inputs and models a response variable locally by a generalized linear model. We give conditions for the existence and asymptotic unbiasedness of the DP-GLM regression mean function estimate; we then give a practical example for when those conditions hold. We evaluate DP-GLM on several data sets, comparing it to modern methods of nonparametric regression including regression trees and Gaussian processes.
Mutual Information, Metric Entropy, and Risk in Estimation of Probability Distributions
, 1996
Cited by 3 (2 self)
Assume $\{P_\theta : \theta \in \Theta\}$ is a set of probability distributions with a common dominating measure on a complete separable metric space $Y$. A state $\theta \in \Theta$ is chosen by Nature. A statistician gets $n$ independent observations $Y_1, \ldots, Y_n$ from $Y$ distributed according to $P_\theta$. For each time $t$ between 1 and $n$, based on the observations $Y_1, \ldots, Y_{t-1}$, the statistician produces an estimated distribution $P_t$ for $P_\theta$, and suffers a loss $L(P_\theta, P_t)$. The cumulative risk for the statistician is the average total loss up to time $n$. Of special interest in information theory, data compression, mathematical finance, computational learning theory and statistical mechanics is the special case when the loss $L(P_\theta, P_t)$ is the relative entropy between the true distribution $P_\theta$ and the estimated distribution $P_t$. Here the cumulative Bayes risk from time 1 to $n$ is the mutual information between the random parameter $\Theta$ and the observations $Y_1, \ldots,$...
Consistency of Posterior Distributions for Neural Networks
- Neural Networks
, 1998
Cited by 3 (1 self)
In this paper we show that the posterior distribution for feedforward neural networks is asymptotically consistent. This paper extends earlier results on universal approximation properties of neural networks to the Bayesian setting. The proof of consistency embeds the problem in a density estimation problem, then uses bounds on the bracketing entropy to show that the posterior is consistent over Hellinger neighborhoods. It then relates this result back to the regression setting. We show consistency in both the setting of the number of hidden nodes growing with the sample size, and in the case where the number of hidden nodes is treated as a parameter. Thus we provide a theoretical justification for using neural networks for nonparametric regression in a Bayesian framework.
Keywords: Bayesian statistics, Asymptotic consistency, Posterior approximation, Nonparametric regression, Sieve Asymptotics, Hellinger distance, Bracketing entropy
The author is indebted to Larry Wasserman for all ...
Asymptotic Behaviour Of Bayes Estimates Under Possibly Incorrect Models
- Annals Statistics
, 1994
Cited by 3 (0 self)
Introduction. The frequentist asymptotic properties of Bayes estimators and of posterior distributions are well known and have been investigated in different directions, see e.g. Bickel and Yahav (1969), Ibragimov and Has'minskii (1981), Strasser (1981) or Lehmann (1983). The interesting generalization to a possibly incorrect model has been treated by Berk (1966), who proved, under regularity conditions, that a.s. the posterior distribution converges weakly toward the Dirac measure at the pseudo-true parameter, assuming its uniqueness. This is the parameter value corresponding to the distribution in the model which is nearest to the true distribution in the sense of the information distance. The result is proven in the general case of possibly non-unique pseudo-true parameters for a corresponding generalization of the above mentioned weak convergence. Unfortunately, many standard models with unbounded parameter space are not covered by his theorems (see Remark 1 in our Section 2
INTRINSIC METHODS IN FILTER STABILITY
Cited by 3 (1 self)
Abstract. The purpose of this article is to survey some intrinsic methods for studying the stability of the nonlinear filter. By ‘intrinsic’ we mean methods which directly exploit the fundamental representation of the filter as a conditional expectation through classical probabilistic techniques such as change of measure, martingale convergence, coupling, etc. Besides their conceptual appeal and the additional insight gained into the filter stability problem, these methods allow one to establish stability of the filter under weaker conditions compared to other methods, e.g., to go beyond strong mixing signals, to reveal connections between filter stability and classical notions of observability, and to discover links to martingale convergence and information theory.

1. Introduction. Consider a pair of random sequences $(X, Y) = (X_n, Y_n)_{n \in \mathbb{Z}_+}$, where the signal component $X_n$ takes values in a Polish space $S$ and the observation component $Y_n$ takes values in $\mathbb{R}^p$ for some $p \ge 1$. The classical filtering problem is to compute the conditional distribution

$$\pi_n(\cdot) = P(X_n \in \cdot \mid \mathcal{F}^Y_{0,n}), \tag{1.1}$$

where $\mathcal{F}^Y_{k,n}$ stands for the σ-algebra of events generated by $Y_m$, $k \le m \le n$ (similarly, we will use below the σ-algebra $\mathcal{F}^X_{k,n}$ generated by $X_m$, $k \le m \le n$). Once $\pi_n$ is found, the optimal mean square estimate of $f(X_n)$ can be calculated as

$$E(f(X_n) \mid \mathcal{F}^Y_{0,n}) = \int f(x)\, \pi_n(dx)$$

for any function $f$ with $E|f(X_n)|^2 < \infty$. If both $X$ and $(X, Y)$ are Markov processes, $\pi_n$ satisfies a recursive filtering equation. Specifically, let $\Lambda$ and $\nu$ denote the transition probability and the initial distribution of $X$, i.e., for $A \in \mathcal{B}(S)$,

$$\nu(A) = P(X_0 \in A), \qquad \Lambda(X_{n-1}, A) = P(X_n \in A \mid \mathcal{F}^X_{0,n-1})$$
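The excerpt stops before the recursive filtering equation is stated. As a hedged illustration only, here is a minimal sketch of the standard predict-then-correct recursion for the special case of a finite state space. The finite state space, the per-state observation likelihood, and the names `filter_step`, `transition`, and `likelihood` are assumptions made for this example and are not taken from the article.

```python
def filter_step(pi_prev, transition, likelihood):
    """One step of a finite-state filter (illustrative assumption).

    pi_prev:    previous filter, pi_prev[i] = P(X_{n-1} = i | Y_0..Y_{n-1})
    transition: transition[i][j] = P(X_n = j | X_{n-1} = i), the role
                played by the transition probability Lambda in the text
    likelihood: likelihood[j] = density of the new observation Y_n given
                X_n = j (hypothetical observation model)

    Returns the updated filter as a list of probabilities over states.
    """
    n = len(pi_prev)
    # Prediction: push the previous filter through the transition kernel.
    predicted = [
        sum(pi_prev[i] * transition[i][j] for i in range(n)) for j in range(n)
    ]
    # Correction: reweight by the observation likelihood and normalize.
    unnormalized = [predicted[j] * likelihood[j] for j in range(n)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


if __name__ == "__main__":
    nu = [0.5, 0.5]  # initial distribution, the role of nu in the text
    transition = [[0.9, 0.1], [0.2, 0.8]]
    print(filter_step(nu, transition, [0.7, 0.1]))
```

Filter stability, in this toy setting, concerns whether two runs of this recursion started from different initial distributions and fed the same observations draw together over time.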

