Results 1-10 of 33
SiZer for exploration of structures in curves
 Journal of the American Statistical Association
, 1997
Abstract

Cited by 82 (16 self)
In the use of smoothing methods in data analysis, an important question is often: which observed features are "really there", as opposed to being spurious sampling artifacts? An approach is described that is based on scale-space ideas originally developed in the computer vision literature. Assessment of Significant ZERo crossings of derivatives results in the SiZer map, a graphical device for displaying the significance of features with respect to both location and scale. Here "scale" means "level of resolution", i.e.
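A simplified SiZer-style map can be sketched in Python for a Gaussian kernel density estimate. This illustrates the idea under stated assumptions and is not the authors' procedure. It flags the sign of the estimated density derivative at each (location, bandwidth) pair using a pointwise normal quantile (`z=1.96`) and a plug-in standard error. SiZer proper uses simultaneous quantiles and additional checks. The names `gaussian_kde_derivative` and `sizer_map` are hypothetical.

```python
import math


def gaussian_kde_derivative(x, data, h):
    """Estimate f_h'(x) for a Gaussian KDE, plus a plug-in standard error.

    f_h(x) = (1/n) * sum_i K_h(x - X_i) with K_h(u) = phi(u/h) / h, so
    d/dx K_h(x - X_i) = -(u / h**3) * phi(u / h) where u = x - X_i.
    """
    n = len(data)
    terms = []
    for xi in data:
        u = x - xi
        phi = math.exp(-0.5 * (u / h) ** 2) / math.sqrt(2 * math.pi)
        terms.append(-(u / h ** 3) * phi)
    mean = sum(terms) / n
    # Sample variance of the i.i.d. summands gives the variance of their mean.
    var = sum((t - mean) ** 2 for t in terms) / (n - 1)
    return mean, math.sqrt(var / n)


def sizer_map(data, grid, bandwidths, z=1.96):
    """Return {h: [symbol per grid point]}.

    '+' means significantly increasing, '-' significantly decreasing, and
    '0' not significant. The pointwise quantile z is a simplification.
    """
    result = {}
    for h in bandwidths:
        row = []
        for x in grid:
            est, se = gaussian_kde_derivative(x, data, h)
            if est > z * se:
                row.append('+')
            elif est < -z * se:
                row.append('-')
            else:
                row.append('0')
        result[h] = row
    return result
```

Each bandwidth in the returned map is one scale, and each entry in its row is one location. Reading across bandwidths shows which bumps remain significant as the level of resolution changes.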
On Locally Adaptive Density Estimation
, 1996
Abstract

Cited by 40 (5 self)
In this paper, theoretical and practical aspects of the sample-point adaptive positive kernel density estimator are examined. A closed-form expression for the mean integrated squared error is obtained through the device of preprocessing the data by binning. With this expression, the exact behavior of the optimally adaptive smoothing parameter function is studied for the first time. The approach differs from most earlier techniques in that the bias of the adaptive estimator remains O(h^2) and is not "improved" to the rate O(h^4). A practical algorithm is constructed using a modification of least-squares cross-validation. Simulated and real examples are presented, including comparisons with a fixed-bandwidth estimator and a fully automatic version of Abramson's adaptive estimator. The results are very promising. KEY WORDS: Kernel Function, Variable Bandwidth, Binning, Cross-Validation.
Stephan R. Sain is Research Associate, Department of Statistical Science, Southern Methodist U...
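Abramson's square-root law is the benchmark this abstract compares against, and it can be sketched briefly. A fixed-bandwidth pilot estimate sets each data point's bandwidth as h_i = h * (f_pilot(X_i) / g)^(-1/2), where g is the geometric mean of the pilot values. This minimal Python sketch is not the binned, cross-validated estimator proposed in the paper. The Gaussian kernel, the reuse of `h` as the pilot bandwidth, and the names `abramson_bandwidths` and `sample_point_kde` are assumptions.

```python
import math


def gauss(u):
    """Standard normal density, used as the kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)


def fixed_kde(x, data, h):
    """Fixed-bandwidth kernel density estimate at x."""
    return sum(gauss((x - xi) / h) for xi in data) / (len(data) * h)


def abramson_bandwidths(data, h):
    """Per-point bandwidths h_i = h * (pilot(X_i) / g) ** -0.5.

    Assumption: the pilot estimate reuses the same global bandwidth h.
    """
    pilot = [fixed_kde(xi, data, h) for xi in data]
    # The geometric mean of the pilot values normalises the bandwidths.
    g = math.exp(sum(math.log(p) for p in pilot) / len(pilot))
    return [h * math.sqrt(g / p) for p in pilot]


def sample_point_kde(x, data, hs):
    """Sample-point estimator: each X_i carries its own bandwidth h_i."""
    return sum(gauss((x - xi) / hi) / hi for xi, hi in zip(data, hs)) / len(data)
```

Points in sparse regions, such as isolated tail observations, receive wider kernels than points in dense regions. Because every kernel still integrates to one, the sample-point estimate integrates to one as well.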
Approximate Dirichlet Process Computing in Finite Normal Mixtures: Smoothing and Prior Information
 Journal of Computational and Graphical Statistics
, 2000
Testing monotonicity of regression
 Journal of Computational and Graphical Statistics
, 1998
Abstract

Cited by 24 (0 self)
This article provides a test of monotonicity of a regression function. The test is based on the size of a “critical” bandwidth, the amount of smoothing necessary to force a nonparametric regression estimate to be monotone. It is analogous to Silverman’s test of multimodality in density estimation. Bootstrapping is used to provide a null distribution for the test statistic. The methodology is particularly simple in regression models in which the variance is a specified function of the mean, but we also discuss in detail the homoscedastic case with unknown variance. Simulation evidence indicates the usefulness of the method. Two examples are given.
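The critical-bandwidth idea can be sketched as a bisection search for the smallest bandwidth at which a kernel regression fit is nondecreasing on a grid. Several assumptions are involved. The smoother is a Gaussian-kernel Nadaraya-Watson estimator, since the abstract does not name the smoother. "Monotone" is taken to mean nondecreasing. The bisection presumes that the fit switches from non-monotone to monotone only once as h grows, which need not hold in general. The bootstrap calibration is omitted, and all function names are hypothetical.

```python
import math


def nw_fit(x, xs, ys, h):
    """Nadaraya-Watson estimate at x with a Gaussian kernel (assumed smoother)."""
    expo = [-0.5 * ((x - xi) / h) ** 2 for xi in xs]
    top = max(expo)  # subtract the largest exponent to avoid 0/0 for tiny h
    w = [math.exp(e - top) for e in expo]
    return sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)


def is_increasing(xs, ys, h, grid):
    """True if the fitted curve is nondecreasing across the grid points."""
    fit = [nw_fit(g, xs, ys, h) for g in grid]
    return all(b >= a for a, b in zip(fit, fit[1:]))


def critical_bandwidth(xs, ys, grid, lo=1e-3, hi=10.0, iters=50):
    """Smallest h (to bisection accuracy) giving a nondecreasing fit.

    Requires a monotone fit at `hi`. Bisection further assumes the
    monotone/non-monotone status switches only once between lo and hi.
    """
    if is_increasing(xs, ys, lo, grid):
        return lo
    if not is_increasing(xs, ys, hi, grid):
        raise ValueError("fit is not monotone at the upper bandwidth")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if is_increasing(xs, ys, mid, grid):
            hi = mid
        else:
            lo = mid
    return hi
```

In the test the abstract describes, the observed critical bandwidth would then be compared with its bootstrap distribution under a monotone null model. That step is not shown here.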
Bayesian Model Selection in Finite Mixtures by Marginal Density Decompositions
 Journal of the American Statistical Association
, 2001
On the number of modes of a Gaussian mixture

, 2003
Abstract

Cited by 18 (5 self)
We consider a problem intimately related to the creation of maxima under Gaussian blurring: the number of modes of a Gaussian mixture in D dimensions. To our knowledge, a general answer to this question is not known. We conjecture that if the components of the mixture have the same covariance matrix (or the same covariance matrix up to a scaling factor), then the number of modes cannot exceed the number of components. We demonstrate
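In one dimension, the question can be probed numerically by evaluating a mixture density on a fine grid and counting local maxima. This minimal Python sketch assumes a 1-D mixture, a user-chosen interval `[lo, hi]` that contains all modes, and a grid fine enough to resolve them. The names `mixture_pdf` and `count_modes_1d` are hypothetical, and a grid count is a heuristic check, not a proof.

```python
import math


def mixture_pdf(x, weights, means, sds):
    """Density of a 1-D Gaussian mixture at x."""
    return sum(
        w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
        for w, m, s in zip(weights, means, sds)
    )


def count_modes_1d(weights, means, sds, lo, hi, steps=20000):
    """Count local maxima of the mixture density on a uniform grid.

    Heuristic: modes outside [lo, hi], or narrower than the grid spacing,
    are missed.
    """
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    f = [mixture_pdf(x, weights, means, sds) for x in xs]
    # f[i-1] < f[i] >= f[i+1] counts a flat two-point top only once.
    return sum(1 for i in range(1, steps) if f[i - 1] < f[i] >= f[i + 1])
```

For example, two equal-weight components with unit standard deviation give two modes when their means are 4 apart and one mode when they are 1 apart. This matches the known threshold of 2σ separation for that equal-weight, equal-variance case.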
A nonparametric statistical approach to clustering via mode identification
 Journal of Machine Learning Research
Abstract

Cited by 11 (7 self)
A new clustering approach based on mode identification is developed by applying new optimization techniques to a nonparametric density estimator. A cluster is formed by those sample points that ascend to the same local maximum (mode) of the density function. The path from a point to its associated mode is computed efficiently by an EM-style algorithm, namely the Modal EM (MEM). This method is then extended to hierarchical clustering by recursively locating modes of kernel density estimators with increasing bandwidths. Without model fitting, the mode-based clustering yields a density description for every cluster, a major advantage of mixture-model-based clustering. Moreover, it ensures that every cluster corresponds to a bump of the density. The issue of diagnosing clustering results is also investigated. Specifically, a pairwise separability measure for clusters is defined using the ridgeline between the density bumps of two clusters. The ridgeline is solved for by the Ridgeline EM (REM) algorithm, an extension of MEM. Based on this new measure, a cluster merging procedure is created to enforce strong separation. Experiments on simulated and real data demonstrate that the mode-based clustering approach tends to combine the strengths of linkage and mixture-model-based clustering. In addition, the approach is robust in high dimensions and when clusters deviate substantially from Gaussian distributions. Both of these cases pose difficulty for parametric mixture modeling. A C package on the new algorithms is developed for public access at
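Consider a Gaussian kernel density estimator with a common bandwidth and equal weights. In that case each Modal EM step reduces to a weighted mean. The E-step computes each kernel's posterior weight at the current point, and the M-step moves the point to the weighted average of the data; this is the same update as Gaussian mean shift. The Python sketch below assumes an isotropic Gaussian kernel with bandwidth `sigma` and merges ascent endpoints that lie closer than `merge_tol`. All names are hypothetical. It is not the authors' C package, and it omits the hierarchical and ridgeline (REM) parts.

```python
import math


def mem_ascent(x0, data, sigma, tol=1e-8, max_iter=500):
    """Modal EM ascent for an equal-weight isotropic Gaussian KDE."""
    x = list(x0)
    for _ in range(max_iter):
        # E-step: posterior weight of each kernel component at x (log scale).
        logw = [-0.5 * math.dist(x, xi) ** 2 / sigma ** 2 for xi in data]
        top = max(logw)
        w = [math.exp(lw - top) for lw in logw]
        total = sum(w)
        # M-step: move x to the weighted mean of the data points.
        new = [sum(wi * xi[d] for wi, xi in zip(w, data)) / total
               for d in range(len(x))]
        if math.dist(new, x) < tol:
            return new
        x = new
    return x


def mode_clusters(data, sigma, merge_tol=1e-3):
    """Label each point by the mode its ascent reaches."""
    modes, labels = [], []
    for p in data:
        m = mem_ascent(p, data, sigma)
        for k, existing in enumerate(modes):
            if math.dist(m, existing) < merge_tol:
                labels.append(k)
                break
        else:
            modes.append(m)
            labels.append(len(modes) - 1)
    return labels, modes
```

Re-running `mode_clusters` with increasing `sigma` loosely mimics the hierarchical extension, because nearby modes tend to merge as the bandwidth grows.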
Estimating the Number of Clusters
, 2000
Abstract

Cited by 10 (0 self)
Hartigan (1975) defines the number q of clusters in a d-variate statistical population as the number of connected components of the set {f > c}, where f denotes the underlying density function on R^d and c is a given constant. Some common clustering algorithms treat q as an input that must be given in advance. The authors propose a method for estimating this parameter based on computing the number of connected components of an estimate of {f > c}. This set estimator is constructed as a union of balls with centres at an appropriate subsample, which is selected via a nonparametric density estimator of f. The asymptotic behaviour of the proposed method is analyzed. A simulation study and an example with real data are also included.
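The counting step can be sketched in three parts. First, keep the sample points whose kernel density estimate exceeds `c`. Second, place a ball of radius `r` at each of them. Third, count connected components with union-find; two closed balls of radius r intersect exactly when their centres are at most 2r apart. The Gaussian kernel, the bandwidth `h`, the radius `r`, and the function names are assumptions for illustration. The abstract does not say how the authors choose the subsample or the radius.

```python
import math


def kde(x, data, h):
    """Isotropic Gaussian kernel density estimate at point x (a tuple)."""
    d = len(x)
    norm = len(data) * (2 * math.pi) ** (d / 2) * h ** d
    return sum(math.exp(-0.5 * (math.dist(x, xi) / h) ** 2) for xi in data) / norm


def count_clusters(data, c, h, r):
    """Number of connected components of a union of radius-r balls centred
    at the sample points whose estimated density exceeds c."""
    centres = [p for p in data if kde(p, data, h) > c]
    parent = list(range(len(centres)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(centres)):
        for j in range(i + 1, len(centres)):
            # Two closed balls of radius r meet iff centres are <= 2r apart.
            if math.dist(centres[i], centres[j]) <= 2 * r:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(len(centres))})
```

Raising `c` discards low-density points such as isolated outliers, so the count reflects the connected components of the estimated level set rather than every stray observation.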
Adaptive Kernel Density Estimation
, 1994
Abstract

Cited by 8 (0 self)
The need for improvements over the fixed kernel density estimator in certain situations has been discussed extensively in the literature, particularly in the application of density estimation to mode hunting. Problem densities often exhibit skewness or multimodality with differences in scale for each mode. By varying the bandwidth in some fashion, it is possible to achieve significant improvements over the fixed-bandwidth approach. In general, variable-bandwidth kernel density estimators can be divided into two categories: those that vary the bandwidth with the estimation point (balloon estimators) and those that vary the bandwidth with each data point (sample-point estimators). For univariate balloon estimators, it can be shown that in regions where f is convex (e.g. the tails) there exists a bandwidth for which the bias is exactly zero. Such a bandwidth leads to MSE = O(n^{-1}) for points in the appropriate regions. A global implementation strategy using a local crossv...
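The two families can be written side by side. Here K is a kernel, h(x) is a bandwidth function of the estimation point, h_i is a bandwidth attached to data point X_i, and K_h(u) = K(u/h)/h; this notation is ours, not the paper's. The variance line uses only the fact that, at a fixed x, the balloon estimate is an average of n i.i.d. terms. The last line assumes that the zero-bias bandwidth stays bounded away from zero as n grows, as the stated O(n^{-1}) rate suggests.

```latex
% Balloon estimator: one bandwidth h(x) per estimation point x
\hat f_B(x) = \frac{1}{n\,h(x)} \sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h(x)}\right)

% Sample-point estimator: one bandwidth h_i per data point X_i
\hat f_S(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h_i}\, K\!\left(\frac{x - X_i}{h_i}\right)

% At fixed x, \hat f_B(x) averages n i.i.d. terms K_{h(x)}(x - X_i), hence
\operatorname{Var}\big[\hat f_B(x)\big]
  = \frac{1}{n}\Big( \mathbb{E}\big[K_{h(x)}(x - X)^2\big]
  - \big(\mathbb{E}\big[K_{h(x)}(x - X)\big]\big)^2 \Big)

% With zero bias and h(x) bounded away from zero as n grows:
\operatorname{MSE}\big[\hat f_B(x)\big]
  = \operatorname{Var}\big[\hat f_B(x)\big] = O(n^{-1})
```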