On Locally Adaptive Density Estimation
, 1996
Cited by 30 (4 self)
In this paper, theoretical and practical aspects of the sample-point adaptive positive kernel density estimator are examined. A closed-form expression for the mean integrated squared error is obtained through the device of preprocessing the data by binning. With this expression, the exact behavior of the optimally adaptive smoothing parameter function is studied for the first time. The approach differs from most earlier techniques in that bias of the adaptive estimator remains O(h^2) and is not "improved" to the rate O(h^4). A practical algorithm is constructed using a modification of least-squares cross-validation. Simulated and real examples are presented, including comparisons with a fixed bandwidth estimator and a fully automatic version of Abramson's adaptive estimator. The results are very promising. KEY WORDS: Kernel Function, Variable Bandwidth, Binning, Cross-Validation. 1 Stephan R. Sain is Research Associate, Department of Statistical Science, Southern Methodist U...
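As a rough illustration of what a sample-point adaptive estimator looks like, the sketch below gives every data point its own bandwidth. The bandwidth rule shown is the familiar square-root law associated with Abramson, normalised by a geometric mean. It is the comparison method named in the abstract, not the paper's own optimally adaptive procedure, and all function names and parameter values are assumptions made for this example.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, used here as a positive kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def fixed_kde(x, data, h):
    """Ordinary kernel estimate with one global bandwidth h."""
    return sum(gaussian_kernel((x - xi) / h) for xi in data) / (len(data) * h)

def sample_point_kde(x, data, bandwidths):
    """Sample-point adaptive estimate: data point xi carries its own bandwidth hi."""
    total = sum(gaussian_kernel((x - xi) / hi) / hi
                for xi, hi in zip(data, bandwidths))
    return total / len(data)

def abramson_bandwidths(data, h_pilot, h):
    """Square-root law (assumed variant): hi = h * sqrt(g / pilot(xi)),
    where g is the geometric mean of the pilot density values."""
    pilot = [fixed_kde(xi, data, h_pilot) for xi in data]
    g = math.exp(sum(math.log(p) for p in pilot) / len(pilot))
    return [h * math.sqrt(g / p) for p in pilot]

# Hypothetical data with one isolated point on the right.
data = [-1.2, -0.4, 0.0, 0.3, 0.9, 2.5]
bandwidths = abramson_bandwidths(data, h_pilot=0.5, h=0.5)
estimate_at_zero = sample_point_kde(0.0, data, bandwidths)
```

In this toy sample the isolated point at 2.5 receives the widest kernel, which is the intended effect: more smoothing where the data are sparse.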
Fast Algorithms for Mutual Information Based Independent Component Analysis
, 2002
Cited by 10 (2 self)
This paper provides fast algorithms to perform independent component analysis based on the mutual information criterion. The main ingredient is the binning technique and the use of cardinal splines, which allows the fast computation of the density estimator over a regular grid. Using a discretized form of the entropy, the criterion can be evaluated quickly together with its gradient, which can be expressed in terms of the score functions. Both off-line and on-line separation algorithms have been developed. Our density, entropy and score estimators are also of independent interest.
On the asymptotics of penalized splines
, 2007
Cited by 9 (1 self)
The asymptotic behaviour of penalized spline estimators is studied in the univariate case. We use B-splines and a penalty is placed on mth-order differences of the coefficients. The number of knots is assumed to converge to infinity as the sample size increases. We show that penalized splines behave similarly to Nadaraya-Watson kernel estimators with 'equivalent' kernels depending upon m. The equivalent kernels we obtain for penalized splines are the same as those found by Silverman for smoothing splines. The asymptotic distribution of the penalized spline estimator is Gaussian and we give simple expressions for the asymptotic mean and variance. Provided that it is fast enough, the rate at which the number of knots converges to infinity does not affect the asymptotic distribution. The optimal rate of convergence of the penalty parameter is given. Penalized splines are not design-adaptive.
Accuracy of Binned Kernel Functional Approximations
, 1995
Cited by 2 (1 self)
this paper is to study the accuracy of binning approximations used in bandwidth selection algorithms. Since virtually all common rules depend on the computation of a particular type of kernel functional estimator, the problem reduces to the study of the accuracy of binned kernel functional approximations. For simplicity and brevity our study is confined to the density estimation context. However, the conclusions apply to other settings where bandwidth selection is used, such as kernel regression. Binning techniques for fast kernel estimation were first proposed by Silverman (1982), Scott (1985) and Härdle & Scott (1992). Wand (1994) describes the extension of binning ideas to multivariate functional estimation. Studies in the approximation accuracy of binned kernel estimators include Jones and Lotwick (1983) and Hall and Wand (1994). The class of kernel functional estimators studied here was introduced by Hall and Marron (1987) and Jones and Sheather (1991). For access to the large literature on automatic bandwidth selection methods and their relative merits see, for example, Cao, Cuevas & González-Manteiga (1994) and Jones, Sheather & Marron (1995). Section 2 contains the theoretical results required for our investigation. In Section 3 we apply the results to a set of specific problems to develop an understanding of the effect of binning on kernel functional estimation and, therefore, the effect on bandwidth selection algorithms. Conclusions of this study are given in Section 4. 1.2 Notation
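To make the binning approximation discussed above concrete, here is a minimal sketch of linear binning followed by a kernel estimate computed from grid counts only. The grid limits, grid size, bandwidth and data are assumptions chosen for illustration. The double loop over grid points is written out directly for clarity, and a practical implementation would typically replace it with an FFT-based convolution.

```python
import math

def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def linear_bin(data, lo, hi, m):
    """Split each point's unit weight between its two neighbouring grid points."""
    delta = (hi - lo) / (m - 1)
    counts = [0.0] * m
    for x in data:
        pos = (x - lo) / delta
        if pos < 0 or pos > m - 1:
            continue  # points outside the grid are dropped in this sketch
        j = min(int(pos), m - 2)
        frac = pos - j
        counts[j] += 1.0 - frac
        counts[j + 1] += frac
    return counts

def binned_kde(counts, lo, hi, h, n):
    """Kernel estimate at every grid point, using grid counts instead of raw data."""
    m = len(counts)
    delta = (hi - lo) / (m - 1)
    # Kernel weight depends only on the lag |j - k| between grid points.
    weights = [gaussian_kernel(lag * delta / h) / (n * h) for lag in range(m)]
    return [sum(counts[k] * weights[abs(j - k)] for k in range(m))
            for j in range(m)]

# Hypothetical deterministic sample lying inside [-2, 2].
data = [2.0 * math.sin(i) for i in range(200)]
lo, hi, m, h = -5.0, 5.0, 401, 0.4
counts = linear_bin(data, lo, hi, m)
density = binned_kde(counts, lo, hi, h, len(data))
```

Comparing `density` with the exact kernel estimate at the same grid points gives a direct view of the approximation error that studies of binning accuracy quantify.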
Cluster Analysis of Massive Datasets in Astronomy
, 2006
Cited by 1 (0 self)
Clusters of galaxies are a useful proxy to trace the mass distribution of the universe. By measuring the mass of clusters of galaxies at different scales, one can follow the evolution of the mass distribution (Martínez and Saar, 2002). It can be shown that finding galaxy clusters is equivalent to finding density contour clusters (Hartigan, 1975): connected components of the level set S_c ≡ {f > c}, where f is a probability density function. Cuevas et al. (2000, 2001) proposed a nonparametric method for density contour clusters. They attempt to find density contour clusters using the minimal spanning tree. While their algorithm is conceptually simple, it requires intensive computations for large datasets. We propose a more efficient clustering method, based on their algorithm, that uses the Fast Fourier Transform (FFT). The method is applied to a study of galaxy clustering on large astronomical sky survey data.
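The notion of a density contour cluster can be illustrated on a gridded density: cells where the estimate exceeds c form the level set, and its connected components are the clusters. The sketch below labels those components with a breadth-first search and 4-neighbour connectivity. It is an assumed, simplified stand-in for illustration and is neither the minimal spanning tree method nor the FFT-based method mentioned in the abstract. The grid values are made up.

```python
from collections import deque

def level_set_components(density, c):
    """Label connected components of {density > c} on a 2D grid.

    Returns (number_of_components, labels), where labels[r][s] is 0 for
    cells outside the level set and 1, 2, ... for cluster membership.
    """
    rows, cols = len(density), len(density[0])
    labels = [[0] * cols for _ in range(rows)]
    n_components = 0
    for r in range(rows):
        for s in range(cols):
            if density[r][s] > c and labels[r][s] == 0:
                n_components += 1
                labels[r][s] = n_components
                queue = deque([(r, s)])
                while queue:
                    i, j = queue.popleft()
                    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        a, b = i + di, j + dj
                        inside = 0 <= a < rows and 0 <= b < cols
                        if inside and density[a][b] > c and labels[a][b] == 0:
                            labels[a][b] = n_components
                            queue.append((a, b))
    return n_components, labels

# Hypothetical density values on a 3 x 4 grid.
grid = [
    [0.9, 0.8, 0.1, 0.0],
    [0.7, 0.1, 0.1, 0.6],
    [0.0, 0.1, 0.7, 0.9],
]
high_count, high_labels = level_set_components(grid, 0.5)
low_count, low_labels = level_set_components(grid, 0.05)
```

Raising the threshold c splits the single low-level component into two separate clusters in this toy grid, which is the behaviour the level-set definition is meant to capture.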
Logspline Density Estimation for Binned Data
In this paper we consider logspline density estimation for binned data. Rates of convergence are established when the log-density function is assumed to be in a Besov space. An algorithm involving a procedure similar to maximum likelihood, stepwise knot addition, and stepwise knot deletion is proposed for the estimation of the density function based upon binned data. Numerical examples are used to show the finite-sample performance of inference based on the logspline density estimation. Keywords: Besov space, binning, knot selection, MILE, optimal rate of convergence. 1. Introduction This paper proposes a method of density estimation for binned data. Let X_1, ..., X_n be a random sample from a distribution with density f. In some experiments, the data are reported in the form of a histogram. The observed random variables are Y_q = #{X_i : X_i ∈ I_q}, where the I_q are bins. We want to estimate the unknown density function f based on the Y_q's. Flexible exponential families have bee...
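The binned observations Y_q defined above are simply counts of sample points per bin. A small sketch follows, assuming half-open bins I_q = [edges[q], edges[q+1]) and made-up data. The histogram density at the end is only the naive piecewise-constant estimate and not the logspline estimator the paper proposes.

```python
from bisect import bisect_right

def bin_counts(sample, edges):
    """Y_q = number of sample points falling in I_q = [edges[q], edges[q+1])."""
    counts = [0] * (len(edges) - 1)
    for x in sample:
        if edges[0] <= x < edges[-1]:
            counts[bisect_right(edges, x) - 1] += 1
    return counts

def histogram_density(counts, edges):
    """Naive density estimate from binned data: count / (total * bin width)."""
    total = sum(counts)
    return [y / (total * (edges[q + 1] - edges[q]))
            for q, y in enumerate(counts)]

# Hypothetical sample and bin edges; the point 2.0 falls outside the last bin.
sample = [0.1, 0.4, 0.5, 0.9, 1.5, 2.0]
edges = [0.0, 0.5, 1.0, 2.0]
y = bin_counts(sample, edges)
f_hat = histogram_density(y, edges)
```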
On Gauss Quadratures and Partial Cross Validation
In the paper we consider new estimators of expected values Eω(X) of functions of a random variable X. The new estimators are based on Gauss quadrature, a numerical method frequently used to approximate integrals over finite intervals. We apply the new estimators in Partial Cross Validation, a numerical method for finding optimal smoothing parameters in nonparametric curve estimation. We show that Partial Cross Validation can considerably reduce the computational cost of the Generalized Cross Validation method typically used to determine the optimal smoothing parameter.
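For readers unfamiliar with Gauss quadrature, the sketch below applies the standard three-point Gauss-Legendre rule to approximate Eω(X) when X is uniform on a finite interval [a, b]. The uniform distribution and the function names are assumptions chosen to keep the example small. It shows the numerical rule itself, not the estimators or the Partial Cross Validation procedure developed in the paper.

```python
import math

# Three-point Gauss-Legendre rule on [-1, 1]: exact for polynomials up to degree 5.
NODES = (-math.sqrt(3.0 / 5.0), 0.0, math.sqrt(3.0 / 5.0))
WEIGHTS = (5.0 / 9.0, 8.0 / 9.0, 5.0 / 9.0)

def expected_value_uniform(omega, a, b):
    """Approximate E[omega(X)] for X ~ Uniform(a, b) with three function evaluations."""
    mid = 0.5 * (a + b)
    half = 0.5 * (b - a)
    # Map the reference nodes from [-1, 1] to [a, b].
    integral = half * sum(w * omega(mid + half * t)
                          for t, w in zip(NODES, WEIGHTS))
    return integral / (b - a)  # divide by the interval length (uniform density)

fourth_moment = expected_value_uniform(lambda x: x ** 4, 0.0, 1.0)
mean_exp = expected_value_uniform(math.exp, 0.0, 1.0)
```

Only three evaluations of ω are needed here, which hints at why quadrature-based shortcuts can reduce the cost of criteria that would otherwise require many function evaluations.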
On Boundary Effects of Smooth Curve Estimators
, 1994
Many nonparametric smooth curve estimators have a problem with boundary effects. Roughly speaking, the discontinuity of the curves under investigation at their endpoints causes difficulties for this kind of estimator. These estimators are visually disturbing at boundary regions and can become misleading in modeling the data because they are seriously biased there. In applications, boundary regions can be a substantial portion of the entire support. This has been recognized as an important problem and there are many adjustments suggested in the literature. We investigate properties of Shuster's boundary fold method and Rice's modification. Noticing the automatic boundary adaptive property of the local linear smoother recently highlighted by Fan, we further find that it is 100% efficient, i.e., best out of all possible estimators, for estimation at endpoints in a typical minimax sense. This result is important since it shows in one step that the local linear approach is as good as or better than all of the many other approaches proposed in the literature. The problem of
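The boundary problem and a fold-type correction can be sketched for a density supported on [0, ∞). The code below assumes the fold method amounts to reflecting the data about the boundary, which is the usual reading but is not spelled out in the abstract. The data, bandwidth and function names are hypothetical.

```python
import math

def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(x, data, h):
    """Uncorrected kernel estimate; it leaks probability mass below 0."""
    return sum(gaussian_kernel((x - xi) / h) for xi in data) / (len(data) * h)

def reflection_kde(x, data, h):
    """Fold the mass that leaked below the boundary back onto [0, inf)."""
    if x < 0:
        return 0.0
    return kde(x, data, h) + kde(-x, data, h)

# Hypothetical non-negative sample with several points near the boundary.
data = [0.05, 0.2, 0.3, 0.7, 1.1, 1.8]
h = 0.3
plain_at_zero = kde(0.0, data, h)
folded_at_zero = reflection_kde(0.0, data, h)
```

At the boundary the folded estimate is exactly twice the uncorrected one, and the folded estimate integrates to one over the support, whereas the uncorrected estimate loses the mass it places below zero.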
CAD and Knowledge Solutions, Siemens Healthcare, Malvern, USA and
, 2009
The computational complexity of evaluating the kernel density estimate (or its derivatives) at m evaluation points given n sample points scales quadratically as O(nm), making it prohibitively expensive for large data sets. While approximate methods like binning could speed up the computation, they lack precise control over the accuracy of the approximation. There is no straightforward way of choosing the binning parameters a priori in order to achieve a desired approximation error. We propose a novel computationally efficient ε-exact approximation algorithm for the univariate Gaussian kernel based density derivative estimation that reduces the computational complexity from O(nm) to linear O(n + m). The user can specify a desired accuracy ε. The algorithm guarantees that the actual error between the approximation and the original kernel estimate will always be less than ε. We also apply our proposed fast algorithm to speed up automatic bandwidth selection procedures. We compare our method to the best available binning methods in terms of speed and accuracy. Our experimental results show that the proposed method is almost twice as fast as the best binning methods and is around five orders of magnitude more accurate. The software for the proposed method is available online.
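The O(nm) cost referred to above is easy to see in a direct implementation. The sketch below evaluates the first derivative of a univariate Gaussian kernel density estimate by brute force, with one inner loop over all n sample points for each of the m evaluation points. It is the naive baseline only, with assumed names and data, and not the ε-exact fast algorithm the abstract proposes.

```python
import math

def kde(x, data, h):
    """Gaussian kernel density estimate at a single point x."""
    norm = len(data) * h * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data) / norm

def kde_derivative(points, data, h):
    """First derivative of the Gaussian KDE at each evaluation point.

    Uses d/dx K((x - xi)/h) = -(u/h) K(u) with u = (x - xi)/h.
    Cost is O(n * m): n kernel terms for each of the m points.
    """
    norm = len(data) * h * h * math.sqrt(2.0 * math.pi)
    result = []
    for x in points:
        total = 0.0
        for xi in data:
            u = (x - xi) / h
            total += -u * math.exp(-0.5 * u * u)
        result.append(total / norm)
    return result

# Hypothetical sample and evaluation points.
data = [-1.0, -0.3, 0.2, 0.4, 1.5]
points = [-2.0, -0.5, 0.0, 0.8, 2.0]
derivatives = kde_derivative(points, data, h=0.5)
```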

