Results 1-10 of 127
Bayesian Analysis of Mixture Models with an Unknown Number of Components: an alternative to reversible jump methods
1998
Cited by 65 (0 self)
Richardson and Green (1997) present a method of performing a Bayesian analysis of data from a finite mixture distribution with an unknown number of components. Their method is a Markov Chain Monte Carlo (MCMC) approach, which makes use of the "reversible jump" methodology described by Green (1995). We describe an alternative MCMC method which views the parameters of the model as a (marked) point process, extending methods suggested by Ripley (1977) to create a Markov birth-death process with an appropriate stationary distribution. Our method is easy to implement, even in the case of data in more than one dimension, and we illustrate it on both univariate and bivariate data. Keywords: Bayesian analysis, Birth-death process, Markov process, MCMC, Mixture model, Model Choice, Reversible Jump, Spatial point process 1 Introduction Finite mixture models are typically used to model data where each observation is assumed to have arisen from one of k groups, each group being suitably modelle...
Assessing the quality of learned local models
Advances in Neural Information Processing Systems 6, 1994
Cited by 44 (15 self)
An approach is presented to learning high dimensional functions in the case where the learning algorithm can affect the generation of new data. A local modeling algorithm, locally weighted regression, is used to represent the learned function. Architectural parameters of the approach, such as distance metrics, are also localized and become a function of the query point instead of being global. Statistical tests are given for when a local model is good enough and sampling should be moved to a new area. Our methods explicitly deal with the case where prediction accuracy requirements exist during exploration: by gradually shifting a “center of exploration” and controlling the speed of the shift with local prediction accuracy, a goal-directed exploration of state space takes place along the fringes of the current data support until the task goal is achieved. We illustrate this approach with simulation results and results from a real robot learning a complex juggling task.
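To make the local modeling idea in this abstract concrete, here is a minimal sketch of locally weighted regression in one dimension. It is only an illustration under stated assumptions: the Gaussian kernel, the fixed `bandwidth` parameter, and the function name `lwr_predict` are hypothetical choices, not taken from the paper, which localizes the distance metric per query point.

```python
import math


def lwr_predict(xs, ys, query, bandwidth=0.5):
    """Predict y at `query` by a weighted linear fit centred on the query.

    Assumption: a Gaussian kernel with a single global bandwidth stands in
    for the localized distance metric described in the abstract.
    """
    # Weight each training point by its closeness to the query point.
    ws = [math.exp(-0.5 * ((x - query) / bandwidth) ** 2) for x in xs]
    total = sum(ws)
    # Weighted means of inputs and outputs.
    mx = sum(w * x for w, x in zip(ws, xs)) / total
    my = sum(w * y for w, y in zip(ws, ys)) / total
    # Weighted least-squares slope around the weighted means.
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx if sxx > 1e-12 else 0.0
    return my + slope * (query - mx)


if __name__ == "__main__":
    xs = [i * 0.25 for i in range(13)]
    ys = [math.sin(x) for x in xs]
    print(lwr_predict(xs, ys, 1.3))
```

A smaller bandwidth makes the fit more local (lower bias, higher variance); the paper's statistical tests for when a local model is "good enough" are not reproduced here.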
Bayesian density regression
Journal of the Royal Statistical Society B, 2007
Cited by 40 (23 self)
This article considers Bayesian methods for density regression, allowing a random probability distribution to change flexibly with multiple predictors. The conditional response distribution is expressed as a nonparametric mixture of parametric densities, with the mixture distribution changing according to location in the predictor space. A new class of priors for dependent random measures is proposed for the collection of random mixing measures at each location. The conditional prior for the random measure at a given location is expressed as a mixture of a Dirichlet process (DP) distributed innovation measure and neighboring random measures. This specification results in a coherent prior for the joint measure, with the marginal random measure at each location being a finite mixture of DP basis measures. Integrating out the infinite-dimensional collection of mixing measures, we obtain a simple expression for the conditional distribution of the subject-specific random variables, which generalizes the Pólya urn scheme. Properties are considered and a simple Gibbs sampling algorithm is developed for posterior computation. The methods are illustrated using simulated data examples and epidemiologic studies.
A review of dimension reduction techniques
1997
Cited by 32 (4 self)
The problem of dimension reduction is introduced as a way to overcome the curse of the dimensionality when dealing with vector data in high-dimensional spaces and as a modelling tool for such data. It is defined as the search for a low-dimensional manifold that embeds the high-dimensional data. A classification of dimension reduction problems is proposed. A survey of several techniques for dimension reduction is given, including principal component analysis, projection pursuit and projection pursuit regression, principal curves and methods based on topologically continuous maps, such as Kohonen’s maps or the generalised topographic mapping. Neural network implementations for several of these techniques are also reviewed, such as the projection pursuit learning network and the BCM neuron with an objective function. Several appendices complement the mathematical treatment of the main text.
Smoothing Splines Estimators in Functional Linear Regression with Errors-in-Variables
2006
Cited by 28 (1 self)
This work deals with a generalization of the Total Least Squares method in the context of the functional linear model. We first propose a smoothing splines estimator of the functional coefficient of the model without noise in the covariates and we obtain an asymptotic result for this estimator. Then, we adapt this estimator to the case where the covariates are noisy and we also derive an upper bound for the convergence speed. Our estimation procedure is evaluated by means of simulations.
Sparse Kernel Feature Analysis
1999
Cited by 26 (2 self)
Kernel Principal Component Analysis (KPCA) has proven to be a versatile tool for unsupervised learning, however at a high computational cost due to the dense expansions in terms of kernel functions. We overcome this problem by proposing a new class of feature extractors employing ℓ1 norms in coefficient space instead of the reproducing kernel Hilbert space in which KPCA was originally formulated. Moreover, the modified setting allows us to efficiently extract features maximizing criteria other than the variance, much in a projection pursuit fashion.
Bandwidth Selection for Kernel Conditional Density Estimation
Computational Statistics & Data Analysis, 2000
Cited by 23 (2 self)
We consider bandwidth selection for the kernel estimator of conditional density with one explanatory variable. Several bandwidth selection methods are derived ranging from fast rules-of-thumb which assume the underlying densities are known to relatively slow procedures which use the bootstrap. The methods are compared and a practical bandwidth selection strategy which combines the methods is proposed. The methods are compared using two simulation studies and a real data set. Keywords: bandwidth selection; conditioning; density estimation; kernel smoothing. 1 Introduction To motivate the problem, consider the data given in Azzalini and Bowman (1990) on the waiting time between the starts of successive eruptions and the duration of the subsequent eruption for the Old Faithful geyser in Yellowstone National Park, Wyoming. The data were collected continuously from August 1st until August 15th, 1985. There are a total of 299 observations. The times are measured in minutes. Some duration ...
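For readers unfamiliar with the estimator whose bandwidths are being selected, the following is a minimal sketch of a kernel conditional density estimate with one explanatory variable. It assumes the common ratio form (a kernel-weighted average of kernels in y, normalized by the kernel weights in x) with Gaussian kernels; the names `cond_density`, `a` (bandwidth in x) and `b` (bandwidth in y) are hypothetical, and no bandwidth selection rule from the paper is implemented.

```python
import math


def gauss(u):
    """Standard normal density, used as the kernel in both directions."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def cond_density(xs, ys, x, y, a, b):
    """Estimate f(y | x) from paired samples (xs, ys).

    Assumption: Gaussian kernels; `a` smooths in the x direction and
    `b` smooths in the y direction.
    """
    # Kernel weights measuring how close each X_i is to the conditioning value x.
    wx = [gauss((x - xi) / a) for xi in xs]
    denom = sum(wx)
    if denom == 0.0:
        raise ValueError("no data near x; increase bandwidth a")
    # Weighted average of scaled kernels centred on each Y_i.
    num = sum(w * gauss((y - yi) / b) / b for w, yi in zip(wx, ys))
    return num / denom


if __name__ == "__main__":
    # Toy data, not the Old Faithful measurements.
    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [2.0, 1.0, 4.0, 3.0]
    print(cond_density(xs, ys, x=2.5, y=2.0, a=1.0, b=0.5))
```

Both bandwidths matter: `a` controls how many neighbours in x contribute, while `b` controls the smoothness of the resulting density in y, which is why the selection problem in the abstract is two-dimensional.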
Density and Hazard Rate Estimation for Right Censored Data Using Wavelet Methods
1997
Cited by 19 (3 self)
This paper describes a wavelet method for the estimation of density and hazard rate functions from randomly right censored data. We adopt a nonparametric approach in assuming that the density and hazard rate have no specific parametric form. The method is based on dividing the time axis into a dyadic number of intervals and then counting the number of events within each interval. The number of events and the survival function of the observations are then separately smoothed over time via linear wavelet smoothers, and then the hazard rate function estimators are obtained by taking the ratio. We prove that the estimators possess pointwise and global mean square consistency, obtain the best possible asymptotic MISE convergence rate and are also asymptotically normally distributed. We also describe simulation experiments that show these estimators are reasonably reliable in practice. The method is illustrated with two real examples. The first uses survival time data for patients with liver...
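The bin, smooth, and ratio steps described in this abstract can be sketched as follows. This is a rough illustration only: block averaging over adjacent bins (a projection onto a coarser Haar scale) is assumed here as the simplest stand-in for the paper's linear wavelet smoothers, and the names `haar_smooth`, `hazard_estimate`, `J`, and `levels` are hypothetical.

```python
def haar_smooth(values, levels):
    """Average blocks of 2**levels adjacent bins (coarser Haar approximation)."""
    block = 2 ** levels
    out = []
    for start in range(0, len(values), block):
        chunk = values[start:start + block]
        mean = sum(chunk) / len(chunk)
        out.extend([mean] * len(chunk))
    return out


def hazard_estimate(times, observed, t_max, J=4, levels=1):
    """Binned hazard estimate from right-censored data.

    times    : observed times (event or censoring), assumed in [0, t_max)
    observed : 1 if the event was seen, 0 if the time is censored
    J        : the time axis is split into 2**J equal intervals
    levels   : amount of Haar smoothing (0 means no smoothing)
    """
    n = len(times)
    n_bins = 2 ** J
    width = t_max / n_bins
    # Uncensored event counts per bin, scaled to a (sub)density.
    events = [0.0] * n_bins
    for t, d in zip(times, observed):
        if d and 0.0 <= t < t_max:
            k = min(int(t / width), n_bins - 1)
            events[k] += 1.0 / (n * width)
    # Proportion of subjects still under observation at each bin midpoint.
    at_risk = []
    for k in range(n_bins):
        mid = (k + 0.5) * width
        at_risk.append(sum(1 for t in times if t >= mid) / n)
    # Smooth numerator and denominator separately, then take the ratio.
    ev_s = haar_smooth(events, levels)
    ar_s = haar_smooth(at_risk, levels)
    return [e / r if r > 0 else float("nan") for e, r in zip(ev_s, ar_s)]


if __name__ == "__main__":
    times = [0.5, 1.5, 2.5, 3.5]
    observed = [1, 1, 1, 1]
    print(hazard_estimate(times, observed, t_max=4.0, J=2, levels=0))
```

The consistency and asymptotic normality results quoted in the abstract apply to the paper's estimators, not to this simplified version.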
Optimal Design via Curve Fitting of Monte Carlo Experiments
1996
Cited by 19 (4 self)
This paper explores numerical methods for stochastic optimization, with special attention to Bayesian design problems. A common and challenging situation occurs when the objective function (in Bayesian applications the expected utility) is very expensive to evaluate, perhaps because it requires integration over a space of very large dimensionality. Our goal is to explore a class of optimization algorithms designed to gain efficiency in such situations, by exploiting smoothness of the expected utility surface and borrowing information from neighboring design points. The central idea is that of implementing stochastic optimization by curve fitting of Monte Carlo samples. This is done by simulating draws from the joint parameter/sample space and evaluating the observed utilities. Fitting a smooth surface through these simulated points serves as an estimate for the expected utility surface. The optimal design can then be found deterministically. In this paper we introduce a general algorithm for curve-fitting-based optimization, we discuss implementation options, and we present a consistency property for one particular implementation of the algorithm. To illustrate the advantages and limitations of curve-fitting-based optimization, and compare it with some of the alternatives, we consider in detail three important practical applications. The first is an information theoretical stopping rule for a clinical trial. The objective function is based on the expected amount of information acquired about a subvector of parameters of interest. The second is concerned with the timing of examination for the early detection of breast cancer in mass screening programs. It involves a two-dimensional optimization and an objective function embodying a cost-benefit analysis. The third applicat...
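The central idea of this abstract (simulate design/utility pairs, fit a smooth curve, optimize the fitted curve deterministically) can be sketched on a toy problem. Everything specific below is an assumption made for illustration: the one-dimensional design on [0, 1], the quadratic curve, the toy utility in `simulate_utility`, and all function names. The paper's own algorithm and applications are not reproduced.

```python
import random


def solve_linear(a, b):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_quadratic(ds, us):
    """Least-squares fit u ~ c0 + c1*d + c2*d**2 via the normal equations."""
    basis = [[1.0, d, d * d] for d in ds]
    ata = [[sum(row[i] * row[j] for row in basis) for j in range(3)]
           for i in range(3)]
    atu = [sum(row[i] * u for row, u in zip(basis, us)) for i in range(3)]
    return solve_linear(ata, atu)


def simulate_utility(d, rng):
    """Hypothetical toy model: draw theta, return the observed utility of design d."""
    theta = rng.gauss(0.3, 0.2)
    return -(d - theta) ** 2


def optimal_design(n_draws=2000, seed=0):
    rng = random.Random(seed)
    # Simulate designs and one noisy utility per design (no inner averaging).
    ds = [rng.random() for _ in range(n_draws)]
    us = [simulate_utility(d, rng) for d in ds]
    # Fit a smooth curve through the cloud of (design, utility) points.
    c0, c1, c2 = fit_quadratic(ds, us)
    # Maximize the fitted curve deterministically (vertex of the parabola).
    return -c1 / (2.0 * c2)


if __name__ == "__main__":
    print(optimal_design())
```

In this toy model the expected utility is maximized at d = 0.3, so the fitted optimum should land near that value. The point of the design is that each simulated utility is evaluated only once; the smoothing step borrows strength across neighboring designs instead of averaging many draws at each design point.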