SiZer for exploration of structures in curves
Journal of the American Statistical Association, 1997
Cited by 66 (14 self)
In the use of smoothing methods in data analysis, an important question is often: which observed features are "really there?", as opposed to being spurious sampling artifacts. An approach is described, based on scale space ideas that were originally developed in computer vision literature. Assessment of Significant ZERo crossings of derivatives results in the SiZer map, a graphical device for display of significance of features, with respect to both location and scale. Here "scale" means "level of resolution", i.e.
Deconvoluting kernel density estimators
Statistics, 1990
Cited by 49 (7 self)
This paper considers estimation of a continuous bounded probability density when observations from the density are contaminated by additive measurement errors having a known distribution. Properties of the estimator obtained by deconvolving a kernel estimator of the observed data are investigated. When the kernel used is sufficiently smooth, the deconvolved estimator is shown to be pointwise consistent and bounds on its integrated mean squared error are derived. Very weak assumptions are made on the measurement-error density, thereby permitting a comparison of the effects of different types of measurement error on the deconvolved estimator.
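The abstract gives no formulas, so the following is only a sketch of the usual form of a deconvolving kernel estimator. The notation is an assumption made here: observations $Y_j = X_j + \varepsilon_j$, a kernel $K$ with Fourier transform $\varphi_K$, a bandwidth $h$, and the known characteristic function $\varphi_\varepsilon$ of the measurement error.

```latex
\hat f_X(x) \;=\; \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itx}\,
  \frac{\varphi_K(ht)\,\hat\varphi_Y(t)}{\varphi_\varepsilon(t)}\,dt,
\qquad
\hat\varphi_Y(t) \;=\; \frac{1}{n} \sum_{j=1}^{n} e^{itY_j}.
```

Dividing by $\varphi_\varepsilon(t)$ undoes the convolution with the error density. In this form, the kernel's transform $\varphi_K(ht)$ has to decay fast enough for the integrand to stay integrable, which is one way to read the abstract's smoothness requirement on the kernel.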
The Mode Tree: A Tool for Visualization of Nonparametric Density Features
Journal of Computational and Graphical Statistics, 1993
Cited by 31 (3 self)
Recognition and extraction of features in a nonparametric density estimate is highly dependent on correct calibration. The data-driven choice of bandwidth h in kernel density estimation is a difficult one, compounded by the fact that the globally optimal h is not generally optimal for all values of x. In recognition of this fact, a new type of graphical tool, the mode tree, is proposed. The basic mode tree plot relates the locations of modes in density estimates with the bandwidths of those estimates. Additional information can be included on the plot indicating such factors as the size of modes, how modes split, and the locations of antimodes and bumps. The use of a mode tree in adaptive multimodality investigations is proposed, and an example is given to show the value in using a Normal kernel, as opposed to the biweight or other kernels, in such investigations. Examples of such investigations are provided for Ahrens' chondrite data and van Winkle's Hidalgo stamp data. Finally, the b...
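A minimal sketch of the data behind a basic mode tree, assuming a Gaussian kernel and a simple grid search for local maxima. The function names, the grid size, and the toy sample are illustrative choices made here, not taken from the paper.

```python
import math

def gaussian_kde(x, data, h):
    """Gaussian kernel density estimate at point x with bandwidth h."""
    norm = len(data) * h * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data) / norm

def mode_locations(data, h, grid_size=400):
    """Grid points at which the density estimate has a local maximum."""
    lo, hi = min(data) - 3.0 * h, max(data) + 3.0 * h
    step = (hi - lo) / (grid_size - 1)
    grid = [lo + i * step for i in range(grid_size)]
    dens = [gaussian_kde(x, data, h) for x in grid]
    # "<" on the left and ">=" on the right counts a flat top only once.
    return [grid[i] for i in range(1, grid_size - 1)
            if dens[i - 1] < dens[i] >= dens[i + 1]]

def mode_tree(data, bandwidths):
    """Pairs (h, mode locations), from the largest bandwidth to the smallest."""
    return [(h, mode_locations(data, h))
            for h in sorted(bandwidths, reverse=True)]

# Hypothetical toy sample with two clusters (not data from the paper).
sample = [-2.2, -2.0, -1.9, -1.7, 1.8, 2.0, 2.1, 2.3, 2.4]
tree = mode_tree(sample, [0.4, 3.0])
for h, modes in tree:
    print(h, [round(m, 2) for m in modes])
```

Plotting the mode locations against the bandwidth gives the basic tree. Here the single mode at the large bandwidth splits into two as the bandwidth shrinks.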
The Problem of Regions
1998
Cited by 17 (1 self)
In the problem of regions we wish to know which one of a discrete set of possibilities applies to a continuous parameter vector. This problem arises in the following way: we compute a descriptive statistic from a set of data and notice an interesting feature. We wish to assign a confidence level to that feature. For example, we compute a density estimate and notice that the estimate is bimodal. What confidence do we assign to bimodality? A natural way to measure this confidence is via the bootstrap: we compute our descriptive statistic on a large number of bootstrap samples and record the proportion of times that the feature appears. This proportion seems like a plausible measure of confidence for the feature. We study the construction of such confidence values and examine to what extent they approximate frequentist p-values. We derive more accurate confidence values using both frequentist and objective Bayesian approaches. The methods are illustrated with a number of examples includ...
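The naive bootstrap proportion described above can be sketched as follows. This covers only the simple proportion, not the more accurate confidence values the paper goes on to derive. The Gaussian kernel, the fixed bandwidth, the grid-based mode count, and the toy data are all assumptions made for illustration.

```python
import math
import random

def count_modes(data, h, grid_size=200):
    """Number of local maxima of a Gaussian kernel density estimate on a grid."""
    lo, hi = min(data) - 3.0 * h, max(data) + 3.0 * h
    step = (hi - lo) / (grid_size - 1)
    # The normalising constant is omitted: it does not move the maxima.
    dens = [sum(math.exp(-0.5 * ((lo + i * step - xi) / h) ** 2) for xi in data)
            for i in range(grid_size)]
    return sum(1 for i in range(1, grid_size - 1)
               if dens[i - 1] < dens[i] >= dens[i + 1])

def bootstrap_bimodality(data, h, n_boot=200, seed=0):
    """Proportion of bootstrap resamples whose density estimate has two modes."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_boot):
        resample = [rng.choice(data) for _ in data]  # sample with replacement
        if count_modes(resample, h) == 2:
            hits += 1
    return hits / n_boot

# Hypothetical two-cluster data (not from the paper).
data = ([-2.0 + 0.1 * k for k in range(-3, 4)]
        + [2.0 + 0.1 * k for k in range(-3, 4)])
conf = bootstrap_bimodality(data, h=0.8)
print(conf)
```

The returned proportion is the plausible-looking confidence measure whose relation to frequentist p-values the abstract calls into question.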
Testing monotonicity of regression
Journal of Computational and Graphical Statistics, 1998
Cited by 16 (0 self)
This article provides a test of monotonicity of a regression function. The test is based on the size of a “critical” bandwidth, the amount of smoothing necessary to force a nonparametric regression estimate to be monotone. It is analogous to Silverman’s test of multimodality in density estimation. Bootstrapping is used to provide a null distribution for the test statistic. The methodology is particularly simple in regression models in which the variance is a specified function of the mean, but we also discuss in detail the homoscedastic case with unknown variance. Simulation evidence indicates the usefulness of the method. Two examples are given.
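A rough sketch of the critical-bandwidth idea, under assumptions made here rather than taken from the article. It uses a Nadaraya-Watson estimator with a Gaussian kernel, checks monotonicity on a finite grid, and runs a bisection that presumes the estimate stays monotone once the bandwidth is large enough. The bootstrap null distribution is not shown.

```python
import math

def nw_estimate(x0, xs, ys, h):
    """Nadaraya-Watson estimate at x0 with a Gaussian kernel of bandwidth h."""
    weights = [math.exp(-0.5 * ((x0 - xi) / h) ** 2) for xi in xs]
    return sum(w * yi for w, yi in zip(weights, ys)) / sum(weights)

def is_monotone(xs, ys, h, grid_size=200, tol=1e-12):
    """True if the fitted curve is nondecreasing on an evenly spaced grid."""
    lo, hi = min(xs), max(xs)
    grid = [lo + (hi - lo) * i / (grid_size - 1) for i in range(grid_size)]
    fit = [nw_estimate(g, xs, ys, h) for g in grid]
    return all(b - a >= -tol for a, b in zip(fit, fit[1:]))

def critical_bandwidth(xs, ys, h_lo=0.2, h_hi=20.0, iters=40):
    """Approximate smallest bandwidth in [h_lo, h_hi] giving a monotone fit."""
    if is_monotone(xs, ys, h_lo):
        return h_lo
    if not is_monotone(xs, ys, h_hi):
        raise ValueError("estimate is not monotone even at h_hi")
    for _ in range(iters):
        mid = 0.5 * (h_lo + h_hi)
        if is_monotone(xs, ys, mid):
            h_hi = mid   # monotone: try less smoothing
        else:
            h_lo = mid   # not monotone: more smoothing needed
    return h_hi

# Hypothetical data: an increasing trend with one dip at x = 4.
xs = [float(i) for i in range(10)]
ys_dip = [0.0, 1.0, 2.0, 3.0, 1.5, 2.5, 4.0, 5.0, 6.0, 7.0]
h_crit = critical_bandwidth(xs, ys_dip)
print(round(h_crit, 3))
```

A large critical bandwidth means heavy smoothing is needed to remove the dip, which is evidence against monotonicity. Responses that are already monotone need no extra smoothing and return the lower bound.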
On the number of modes of a Gaussian mixture
2003
Cited by 13 (4 self)
We consider a problem intimately related to the creation of maxima under Gaussian blurring: the number of modes of a Gaussian mixture in D dimensions. To our knowledge, a general answer to this question is not known. We conjecture that if the components of the mixture have the same covariance matrix (or the same covariance matrix up to a scaling factor), then the number of modes cannot exceed the number of components. We demonstrate
Bayesian Model Selection in Finite Mixtures by Marginal Density Decompositions
Journal of the American Statistical Association, 2001
Density Estimation via Hybrid Splines
Journal of Statistical Computation and Simulation
Cited by 6 (6 self)
The Hybrid Spline method (H-spline) is a method of density estimation that combines regression-spline and smoothing-spline methods. Because it uses basis functions (B-splines), this method is much faster than the Smoothing Spline Density Estimation approach (Gu, 1993). Simulations suggest that with more structured data (e.g., several modes) the H-spline method estimates the modes as well as Logspline (Kooperberg and Stone, 1991). The H-spline algorithm is designed to compute a solution to the penalized likelihood problem. The smoothing parameter is updated jointly with the estimate via a cross-validation performance estimate, where the performance is measured by a proxy of the symmetrized Kullback-Leibler. The initial number of knots is determined automatically based on an estimate of the number of modes and the symmetry of the underlying density. The algorithm increases the number of knots by ...
Testing Monotonicity Of Regression
1998
Cited by 6 (0 self)
... this article, we study this problem and construct asymptotically valid tests. Our test statistics are suitable functionals of a stochastic process which may be viewed as a local version of Kendall's tau statistic and have simple natural interpretations. The process involved is a degree-two U-process, as in Nolan and Pollard (1987). The asymptotic behaviour of the test statistics is studied in three major steps: approximation of the U-process by the empirical process defined by the Hájek projection, strong approximation of the empirical process by a Gaussian process, and finally the extreme value theory for stationary Gaussian processes. The paper is organized as follows. In Section 2, we introduce two different types of test statistics. We also formally describe the model and the hypothesis and explain the notation and regularity conditions in this section. In Section 3, we investigate the asymptotic behaviour of the U-process and establish the Gaussian process approximation. Section 4 is devoted to the study of the limiting distribution of the first test statistic using the extreme value theory for stationary Gaussian processes and the results of Section 3. In Section 5, we show that this test is consistent against all alternatives and also determine the minimal rate so that alternatives further apart than this rate can be effectively tested. The second test statistic is studied in Section 6. Technical proofs are presented in Section 7 and the appendix.
Nonparametric Selection of Input Variables for Connectionist Learning
1996
Cited by 5 (0 self)
re. However, for a range of explored problems, the relative ordering of mutual information estimates remains correct, despite inaccuracies in individual estimates. Analysis of forward selection explores the amount of data required to select a certain number of relevant input variables. It is shown that in order to select a certain number of relevant input variables, the amount of required data increases roughly exponentially as more relevant input variables are considered. It is also shown that the chances of forward selection ending up in a local minimum are reduced by bootstrapping the data. Finally, the method is compared to two connectionist methods for input variable selection: Sensitivity Based Pruning and Automatic Relevance Determination. It is shown that the new method outperforms these two when the number of independent, candidate input variables is large. However, the method requires the number of relevant input variables to be relatively small. These results are confirmed o

