Results 1 - 10 of 76
Density Estimation, 1997
Cited by 304 (1 self)
this article. We introduce the classic nonparametric estimator, the histogram, and outline its theoretical properties as well as good practice. We demonstrate how to improve the histogram, leading to our discussion of popular kernel methods. We conclude with a bivariate example, a way of choosing smoothing parameters, and new directions that promise further improvements. Why choose nonparametric over parametric density estimation? Parametric density estimation requires both proper specification of the form of the underlying sampling density, f_θ(x), and estimation of the parameter vector θ. Parametric modeling entails two risks of bias: in estimation of θ and incorrect specification of f_θ. Nonparametric density estimation provides a consistent algorithm for nearly any continuous density and avoids the specification step. Although the cumulative distribution and probability density functions carry the same information, densities are more easily interpreted than distributions, especially in more than one dimension, so our focus on the density is appropriate. Density estimation is broadly applicable for exploring data relationships, presenting data summaries, and constructing sophisticated nonparametric models of biostatistical data. Graphical representation of data is a powerful tool for summarization. Three simple exploratory graphical summaries are the box-and-whiskers plot (or boxplot), the stem-and-leaf plot, and the histogram. Consider the cholesterol levels of 320 males with diagnosed coronary artery disease (Scott et al., 1978). Figure 1 displays a boxplot of these data. The data appear symmetric with a few outliers. The various percentiles displayed in the boxplot do not hint at any unusual feature such as we see in Figure 2 in the right histogram, which show...
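The two estimators named in this abstract, the histogram and the kernel estimator, can be illustrated with a minimal sketch. It is not taken from the article: the function names, the Gaussian kernel choice, the bin origin, and the sample values are all assumptions made for illustration.

```python
import math

def histogram_density(data, x, origin, bin_width):
    """Histogram density estimate at x: (count in x's bin) / (n * bin_width)."""
    k = math.floor((x - origin) / bin_width)
    count = sum(1 for xi in data if math.floor((xi - origin) / bin_width) == k)
    return count / (len(data) * bin_width)

def gaussian_kde(data, x, bandwidth):
    """Kernel density estimate at x using a Gaussian kernel (an assumed choice)."""
    total = 0.0
    for xi in data:
        u = (x - xi) / bandwidth
        total += math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    return total / (len(data) * bandwidth)

# Made-up sample, for illustration only (not the cholesterol data).
sample = [1.0, 1.2, 1.9, 2.1, 2.4, 3.0, 3.3, 4.8]
print(histogram_density(sample, 2.2, origin=0.0, bin_width=1.0))
print(gaussian_kde(sample, 2.2, bandwidth=0.5))
```

Both estimates depend on a smoothing parameter (bin width or bandwidth), which is the quantity the abstract refers to when it mentions choosing smoothing parameters.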
Bayesian Analysis of Mixture Models with an Unknown Number of Components -- an alternative to reversible jump methods, 1998
Cited by 41 (0 self)
Richardson and Green (1997) present a method of performing a Bayesian analysis of data from a finite mixture distribution with an unknown number of components. Their method is a Markov Chain Monte Carlo (MCMC) approach, which makes use of the "reversible jump" methodology described by Green (1995). We describe an alternative MCMC method which views the parameters of the model as a (marked) point process, extending methods suggested by Ripley (1977) to create a Markov birth-death process with an appropriate stationary distribution. Our method is easy to implement, even in the case of data in more than one dimension, and we illustrate it on both univariate and bivariate data.
Keywords: Bayesian analysis, Birth-death process, Markov process, MCMC, Mixture model, Model Choice, Reversible Jump, Spatial point process
1 Introduction
Finite mixture models are typically used to model data where each observation is assumed to have arisen from one of k groups, each group being suitably modelle...
Assessing the quality of learned local models - Advances in Neural Information Processing Systems 6, 1994
Cited by 36 (13 self)
An approach is presented to learning high dimensional functions in the case where the learning algorithm can affect the generation of new data. A local modeling algorithm, locally weighted regression, is used to represent the learned function. Architectural parameters of the approach, such as distance metrics, are also localized and become a function of the query point instead of being global. Statistical tests are given for when a local model is good enough and sampling should be moved to a new area. Our methods explicitly deal with the case where prediction accuracy requirements exist during exploration: By gradually shifting a “center of exploration” and controlling the speed of the shift with local prediction accuracy, a goal-directed exploration of state space takes place along the fringes of the current data support until the task goal is achieved. We illustrate this approach with simulation results and results from a real robot learning a complex juggling task.
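Locally weighted regression, the local modeling algorithm this abstract refers to, can be sketched for one-dimensional inputs as follows. This is an assumed, simplified form: it uses a single global Gaussian bandwidth, whereas the paper localizes such parameters per query point, and the function name `lwr_predict` and the data are hypothetical.

```python
import math

def lwr_predict(xs, ys, query, bandwidth):
    """Fit a weighted straight line around `query` and return its prediction there."""
    # Gaussian weights centred on the query point (assumed weighting function).
    w = [math.exp(-0.5 * ((x - query) / bandwidth) ** 2) for x in xs]
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw   # weighted mean of inputs
    my = sum(wi * y for wi, y in zip(w, ys)) / sw   # weighted mean of outputs
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (query - mx)

# Made-up training data, for illustration only.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 0.9, 4.2, 8.8, 16.1]
print(lwr_predict(xs, ys, query=2.5, bandwidth=1.0))
```

A new local model is fitted for every query, so the training data must be kept in memory; that is the usual trade-off of this family of methods.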
A review of dimension reduction techniques, 1997
Cited by 29 (4 self)
The problem of dimension reduction is introduced as a way to overcome the curse of dimensionality when dealing with vector data in high-dimensional spaces and as a modelling tool for such data. It is defined as the search for a low-dimensional manifold that embeds the high-dimensional data. A classification of dimension reduction problems is proposed. A survey of several techniques for dimension reduction is given, including principal component analysis, projection pursuit and projection pursuit regression, principal curves and methods based on topologically continuous maps, such as Kohonen’s maps or the generalised topographic mapping. Neural network implementations for several of these techniques are also reviewed, such as the projection pursuit learning network and the BCM neuron with an objective function. Several appendices complement the mathematical treatment of the main text.
Bayesian density regression - Journal of the Royal Statistical Society B, 2007
Cited by 27 (17 self)
This article considers Bayesian methods for density regression, allowing a random probability distribution to change flexibly with multiple predictors. The conditional response distribution is expressed as a nonparametric mixture of parametric densities, with the mixture distribution changing according to location in the predictor space. A new class of priors for dependent random measures is proposed for the collection of random mixing measures at each location. The conditional prior for the random measure at a given location is expressed as a mixture of a Dirichlet process (DP) distributed innovation measure and neighboring random measures. This specification results in a coherent prior for the joint measure, with the marginal random measure at each location being a finite mixture of DP basis measures. Integrating out the infinite-dimensional collection of mixing measures, we obtain a simple expression for the conditional distribution of the subject-specific random variables, which generalizes the Pólya urn scheme. Properties are considered and a simple Gibbs sampling algorithm is developed for posterior computation. The methods are illustrated using simulated data examples and epidemiologic studies.
Sparse Kernel Feature Analysis, 1999
Cited by 22 (2 self)
Kernel Principal Component Analysis (KPCA) has proven to be a versatile tool for unsupervised learning, but at a high computational cost due to the dense expansions in terms of kernel functions. We overcome this problem by proposing a new class of feature extractors employing ℓ1 norms in coefficient space instead of the reproducing kernel Hilbert space in which KPCA was originally formulated. Moreover, the modified setting allows us to efficiently extract features maximizing criteria other than the variance, much in a projection pursuit fashion.
Optimal Design via Curve Fitting of Monte Carlo Experiments, 1996
Cited by 15 (4 self)
This paper explores numerical methods for stochastic optimization, with special attention to Bayesian design problems. A common and challenging situation occurs when the objective function (in Bayesian applications the expected utility) is very expensive to evaluate, perhaps because it requires integration over a space of very large dimensionality. Our goal is to explore a class of optimization algorithms designed to gain efficiency in such situations, by exploiting smoothness of the expected utility surface and borrowing information from neighboring design points. The central idea is that of implementing stochastic optimization by curve fitting of Monte Carlo samples. This is done by simulating draws from the joint parameter/sample space and evaluating the observed utilities. Fitting a smooth surface through these simulated points serves as an estimate for the expected utility surface. The optimal design can then be found deterministically. In this paper we introduce a general algorithm for curve-fitting-based optimization, we discuss implementation options, and we present a consistency property for one particular implementation of the algorithm. To illustrate the advantages and limitations of curve-fitting-based optimization, and compare it with some of the alternatives, we consider in detail three important practical applications. The first is an information theoretical stopping rule for a clinical trial. The objective function is based on the expected amount of information acquired about a sub-vector of parameters of interest. The second is concerned with the timing of examination for the early detection of breast cancer in mass screening programs. It involves a two-dimensional optimization and an objective function embodying a cost-benefit analysis. The third applicat...
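The central idea described in this abstract (simulate noisy utilities at random designs, fit a smooth curve, then optimize the fitted curve deterministically) can be sketched in a toy one-dimensional setting. Everything specific here is an assumption for illustration: the utility function, the noise level, the quadratic fitting family, and all function names. The paper itself does not prescribe these choices.

```python
import random

def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0] * 3
    for i in (2, 1, 0):
        x[i] = (m[i][3] - sum(m[i][j] * x[j] for j in range(i + 1, 3))) / m[i][i]
    return x

def fit_quadratic(ds, us):
    """Least-squares fit of u ~ c0 + c1*d + c2*d^2 via the normal equations."""
    s = [sum(d ** k for d in ds) for k in range(5)]
    t = [sum(u * d ** k for d, u in zip(ds, us)) for k in range(3)]
    a = [[s[i + j] for j in range(3)] for i in range(3)]
    return solve3(a, t)

def simulated_utility(d, rng):
    """Hypothetical noisy utility; its expectation is -(d - 0.6)^2."""
    return -(d - 0.6) ** 2 + rng.gauss(0.0, 0.1)

rng = random.Random(0)
designs = [rng.uniform(0.0, 1.0) for _ in range(2000)]   # random design points
utilities = [simulated_utility(d, rng) for d in designs]  # one noisy draw each
c0, c1, c2 = fit_quadratic(designs, utilities)
d_opt = -c1 / (2.0 * c2)  # maximiser of the fitted parabola (valid when c2 < 0)
print(d_opt)
```

Each design point is evaluated only once; the smoothing step is what pools information across neighboring designs, instead of averaging many simulations at every candidate design.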
Density and Hazard Rate Estimation for Right Censored Data Using Wavelet Methods, 1997
Cited by 12 (3 self)
This paper describes a wavelet method for the estimation of density and hazard rate functions from randomly right censored data. We adopt a nonparametric approach in assuming that the density and hazard rate have no specific parametric form. The method is based on dividing the time axis into a dyadic number of intervals and then counting the number of events within each interval. The number of events and the survival function of the observations are then separately smoothed over time via linear wavelet smoothers, and then the hazard rate function estimators are obtained by taking the ratio. We prove that the estimators possess pointwise and global mean square consistency, obtain the best possible asymptotic MISE convergence rate and are also asymptotically normally distributed. We also describe simulation experiments that show these estimators are reasonably reliable in practice. The method is illustrated with two real examples. The first uses survival time data for patients with liver...
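The binning step described in this abstract (divide the time axis into a dyadic number of intervals and count events in each) can be sketched as below. This is only the raw, unsmoothed ratio of event counts to the number at risk; the wavelet smoothing of numerator and denominator that the paper applies is deliberately omitted. The function name, the at-risk convention, and the data are assumptions.

```python
def binned_hazard(times, observed, t_max, n_bins):
    """Raw hazard per bin: events in bin / (number at risk at bin start * bin width).

    times    : follow-up times (event or censoring time)
    observed : True if the event was observed, False if right censored
    n_bins   : number of intervals, e.g. 2**J for a dyadic partition
    """
    width = t_max / n_bins
    hazards = []
    for k in range(n_bins):
        lo, hi = k * width, (k + 1) * width
        at_risk = sum(1 for t in times if t >= lo)
        events = sum(1 for t, d in zip(times, observed) if d and lo <= t < hi)
        hazards.append(events / (at_risk * width) if at_risk else float("nan"))
    return hazards

# Made-up right-censored sample, for illustration only.
times = [1.0, 2.0, 3.0, 4.0]
observed = [True, False, True, True]
print(binned_hazard(times, observed, t_max=4.0, n_bins=2 ** 2))
```

Censored observations contribute to the at-risk count until their censoring time but never to the event count, which is what distinguishes this from a plain histogram of event times.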
Smoothing Splines Estimators in Functional Linear Regression with Errors-in-Variables, 2006
Cited by 12 (0 self)
This work deals with a generalization of the Total Least Squares method in the context of the functional linear model. We first propose a smoothing splines estimator of the functional coefficient of the model without noise in the covariates and we obtain an asymptotic result for this estimator. Then, we adapt this estimator to the case where the covariates are noisy and we also derive an upper bound for the convergence speed. Our estimation procedure is evaluated by means of simulations.
Bandwidth Selection for Kernel Conditional Density Estimation - Computational Statistics & Data Analysis, 2000
Cited by 11 (2 self)
We consider bandwidth selection for the kernel estimator of conditional density with one explanatory variable. Several bandwidth selection methods are derived ranging from fast rules-of-thumb which assume the underlying densities are known to relatively slow procedures which use the bootstrap. The methods are compared and a practical bandwidth selection strategy which combines the methods is proposed. The methods are compared using two simulation studies and a real data set.
Keywords: bandwidth selection; conditioning; density estimation; kernel smoothing.
1 Introduction
To motivate the problem, consider the data given in Azzalini and Bowman (1990) on the waiting time between the starts of successive eruptions and the duration of the subsequent eruption for the Old Faithful geyser in Yellowstone National Park, Wyoming. The data were collected continuously from August 1st until August 15th, 1985. There are a total of 299 observations. The times are measured in minutes. Some duration ...
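A kernel estimator of a conditional density with one explanatory variable is commonly written as a ratio of kernel sums with two bandwidths, one for each variable. The sketch below assumes that standard form with Gaussian kernels; the excerpt does not show the paper's exact estimator, so the formula, the names `a` and `b`, and the data are illustrative assumptions (the values are not the Old Faithful measurements).

```python
import math

def gauss(u):
    """Standard normal density, used as the kernel (an assumed choice)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def conditional_density(xs, ys, x, y, a, b):
    """Kernel estimate of f(y | x) with bandwidth a for x and bandwidth b for y."""
    wx = [gauss((x - xi) / a) for xi in xs]   # how close each X_i is to x
    denom = sum(wx)
    if denom == 0.0:
        raise ValueError("no data near x for this bandwidth")
    num = sum(w * gauss((y - yi) / b) / b for w, yi in zip(wx, ys))
    return num / denom

# Made-up paired observations, for illustration only.
xs = [1.0, 1.5, 2.0, 3.0, 3.5, 4.0]
ys = [50.0, 55.0, 52.0, 80.0, 78.0, 85.0]
print(conditional_density(xs, ys, x=2.0, y=55.0, a=0.5, b=3.0))
```

The two bandwidths play different roles: `a` controls how many neighbors in x contribute, while `b` controls the smoothness of the estimated density in y, which is why selecting them jointly is the problem the abstract addresses.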

