Results 1–10 of 37
Simplifying mixture models through function approximation
 IEEE Transactions on Neural Networks
, 2010
Abstract

Cited by 14 (2 self)
The finite mixture model is widely used in various statistical learning problems. However, the model obtained may contain a large number of components, making it inefficient in practical applications. In this paper, we propose to simplify the mixture model by first grouping similar components together and then performing local fitting through function approximation. By using the squared loss to measure the distance between mixture models, our algorithm naturally combines the two different tasks of component clustering and model simplification. The proposed method can be used to speed up various algorithms that use mixture models during training (e.g., Bayesian filtering, belief propagation) or testing (e.g., kernel density estimation, SVM testing). Encouraging results are observed in experiments on density estimation, clustering-based image segmentation and simplification of SVM decision functions.
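A convenient property behind the squared-loss criterion above is that the L2 distance between two Gaussian mixtures has a closed form, via the identity ∫N(x; a, u)N(x; b, v)dx = N(a; b, u + v). A minimal 1-D sketch of that computation (not the authors' code; the function names are illustrative):

```python
import numpy as np

def normal_pdf(x, mu, var):
    """N(x; mu, var) for scalars or arrays."""
    return np.exp(-(x - mu) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def mixture_cross_term(w1, mu1, var1, w2, mu2, var2):
    """sum_{i,j} w1_i w2_j * int N(x; mu1_i, var1_i) N(x; mu2_j, var2_j) dx,
    using the identity int N(x; a, u) N(x; b, v) dx = N(a; b, u + v)."""
    total = 0.0
    for wi, mi, vi in zip(w1, mu1, var1):
        for wj, mj, vj in zip(w2, mu2, var2):
            total += wi * wj * normal_pdf(mi, mj, vi + vj)
    return total

def l2_distance_sq(w1, mu1, var1, w2, mu2, var2):
    """Closed-form squared L2 distance between two 1-D Gaussian mixtures,
    expanded as ||p||^2 - 2<p, q> + ||q||^2."""
    return (mixture_cross_term(w1, mu1, var1, w1, mu1, var1)
            - 2.0 * mixture_cross_term(w1, mu1, var1, w2, mu2, var2)
            + mixture_cross_term(w2, mu2, var2, w2, mu2, var2))
```

Because the distance needs no numerical integration, it can be evaluated cheaply when deciding which components to group and how well a simplified mixture fits the original.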
Using sample-based representations under communications constraints
 MIT, Laboratory for Information and Decision Systems
, 2004
Abstract

Cited by 11 (1 self)
In many applications, particularly power-constrained sensor networks, it is important to conserve the amount of data exchanged while maximizing the utility of that data for some inference task. Broadly, this tradeoff has two major cost components: the representation's size (in distributed networks, the communications cost) and the error incurred by its use (the inference cost). We analyze this tradeoff for a particular problem: communicating a particle-based representation (and more generally, a Gaussian mixture or kernel density estimate). We begin by characterizing the exact communication cost of these representations, noting that it is less than might be suggested by traditional communications theory due to the invariance of the representation to reordering. We describe the optimal, lossless encoder when the generating distribution is known, and pose a suboptimal encoder which still benefits from reordering invariance. However, lossless encoding may not be sufficient. We describe one reasonable measure of error for distribution-based messages and its consequences for inference in an acyclic network, and propose a novel density approximation method based on KD-tree multiscale representations which enables the communications cost and a bound on error to be balanced efficiently. We show several empirical examples demonstrating the method's utility in collaborative, distributed signal processing under bandwidth or power constraints.
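The multiscale idea can be illustrated with a toy 1-D version: recursively split the samples KD-tree style and transmit per-node sufficient statistics at a chosen depth, trading message size against fidelity. This is only a sketch under simplifying assumptions (1-D data, median splits, Gaussian node summaries), not the paper's encoder:

```python
import numpy as np

def multiscale_summaries(samples, max_depth, depth=0):
    """Recursively split sorted 1-D samples at the median; at the chosen
    depth (or for singleton blocks) summarize each block as
    (weight, mean, variance). Coarser depths give cheaper messages;
    deeper trees give lower approximation error. Assumes non-empty input."""
    samples = np.sort(np.asarray(samples, dtype=float))
    if depth == max_depth or len(samples) <= 1:
        return [(len(samples), float(samples.mean()), float(samples.var()))]
    mid = len(samples) // 2
    return (multiscale_summaries(samples[:mid], max_depth, depth + 1)
            + multiscale_summaries(samples[mid:], max_depth, depth + 1))
```

Each summary list can be rendered back into a Gaussian mixture on the receiving side, so a sender can pick the shallowest depth whose error bound meets the inference requirement.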
Generalized Cross-entropy Methods with Applications to Rare-event Simulation and Optimization
Novelty Detection Employing an L2 Optimal Nonparametric Density Estimator
Abstract

Cited by 4 (0 self)
This paper considers the application of a recently proposed L2 optimal nonparametric Reduced Set Density Estimator to novelty detection and binary classification and provides empirical comparisons with other forms of density estimation as well
KERNEL CLASSIFICATION VIA INTEGRATED SQUARED ERROR
Abstract

Cited by 3 (3 self)
Nonparametric kernel methods are widely used and proven to be successful in many statistical learning problems. Well-known examples include the kernel density estimate (KDE) for density estimation and the support vector machine (SVM) for classification. We propose a kernel classifier that optimizes an integrated squared error (ISE) criterion based on a "difference of densities" formulation. Our classifier is sparse, like SVMs, and performs comparably to state-of-the-art kernel methods. Furthermore, and unlike SVMs, the ISE criterion does not require the user to set any unknown regularization parameters. As a consequence, classifier training is faster than for support vector methods. Index Terms: kernel methods, integrated squared error, sparse classifiers, quadratic programming, difference of densities
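The "difference of densities" idea can be demonstrated with a plain plug-in rule: estimate each class density with a KDE and classify by the sign of the difference. The paper's classifier instead learns sparse kernel weights through a quadratic program; the sketch below is only the naive version, with illustrative names and a fixed bandwidth:

```python
import numpy as np

def kde(x, samples, h):
    """Gaussian KDE of 1-D `samples`, evaluated at points `x`."""
    x = np.atleast_1d(np.asarray(x, dtype=float))[:, None]
    s = np.asarray(samples, dtype=float)[None, :]
    k = np.exp(-(x - s) ** 2 / (2.0 * h ** 2)) / (h * np.sqrt(2.0 * np.pi))
    return k.mean(axis=1)

def dod_classify(x, pos_samples, neg_samples, h=0.5):
    """Return +1 where the estimated positive-class density exceeds the
    negative-class density, and -1 otherwise."""
    diff = kde(x, pos_samples, h) - kde(x, neg_samples, h)
    return np.where(diff >= 0, 1, -1)
```

Replacing the uniform kernel weights with ISE-optimized sparse weights is what yields the SVM-like sparsity the abstract describes.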
Density estimation with Mercer kernels
 of the Research Institute of Advanced Computer Science
, 2003
Abstract

Cited by 3 (0 self)
We present a new method for density estimation based on Mercer kernels. The density estimate can be understood as the density induced on a data manifold by a mixture of Gaussians fit in a feature space. As usual, the feature space and data manifold are defined with any suitable positive-definite kernel function. We modify the standard EM algorithm for mixtures of Gaussians to infer the parameters of the density. One benefit of the approach is its conceptual simplicity and uniform applicability over many different types of data. Preliminary results are presented for a number of simple problems.
CAKE: Convex Adaptive Kernel Density Estimation
Abstract

Cited by 2 (1 self)
In this paper we present a generalization of kernel density estimation called Convex Adaptive Kernel Density Estimation (CAKE) that replaces single-bandwidth selection with a convex aggregation of kernels at all scales, where the convex aggregation is allowed to vary from one training point to another, treating the fundamental problem of heterogeneous smoothness in a novel way. Learning the CAKE estimator given a training set reduces to solving a single convex quadratic programming problem. We derive rates of convergence of the CAKE-like estimator to the true underlying density under smoothness assumptions on the class, and show that, given a sufficiently large sample, the mean squared error of such estimators is optimal in a minimax sense. We also give a risk bound for the CAKE estimator in terms of its empirical risk. We empirically compare CAKE to other density estimators proposed in the statistics literature for handling heterogeneous smoothness on different synthetic and natural distributions.
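The evaluation side of such an estimator is simple to sketch: each training point contributes a convex combination of Gaussian kernels at several bandwidths. In CAKE those per-point weights are learned by a quadratic program; in this hedged sketch they are simply supplied as an argument:

```python
import numpy as np

def cake_density(x, train, weights, bandwidths):
    """Evaluate a per-point convex aggregation of Gaussian kernels at
    several scales (1-D). weights[i, k] is the weight that training
    point i places on bandwidth k; rows are assumed nonnegative and
    to sum to 1, which keeps the estimate a valid density."""
    x = np.atleast_1d(np.asarray(x, dtype=float))[:, None]   # (m, 1)
    train = np.asarray(train, dtype=float)[None, :]          # (1, n)
    d2 = (x - train) ** 2
    out = np.zeros(x.shape[0])
    for k, h in enumerate(bandwidths):
        kern = np.exp(-d2 / (2.0 * h ** 2)) / (h * np.sqrt(2.0 * np.pi))
        out += kern @ weights[:, k]
    return out / train.shape[1]
```

With all weight mass on a single bandwidth this reduces to an ordinary KDE; letting the rows differ is what adapts the amount of smoothing to each region of the data.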
Performance analysis for L2 kernel classification
Abstract

Cited by 2 (2 self)
We provide statistical performance guarantees for a recently introduced kernel classifier that optimizes the L2 or integrated squared error (ISE) of a difference of densities. The classifier is similar to a support vector machine (SVM) in that it is the solution of a quadratic program and yields a sparse classifier. Unlike SVMs, however, the L2 kernel classifier does not involve a regularization parameter. We prove a distribution-free concentration inequality for a cross-validation based estimate of the ISE, and apply this result to deduce an oracle inequality and consistency of the classifier in the sense of both ISE and probability of error. Our results also specialize to give performance guarantees for an existing method of L2 kernel density estimation.
Maximum-entropy expectation-maximization algorithm for image processing and sensor networks
 in Proc. SPIE Electronic Imaging: Science and Technology Conf. Visual Communications and Image Processing
, 2007
Abstract

Cited by 2 (2 self)
In this paper, we propose a maximum-entropy expectation-maximization (MEEM) algorithm. We use the proposed algorithm for density estimation. The maximum-entropy constraint is imposed for smoothness of the estimated density function. The derivation of the MEEM algorithm requires determination of the covariance matrix in the framework of the maximum-entropy likelihood function, which is difficult to solve analytically. We, therefore, derive the MEEM algorithm by optimizing a lower bound of the maximum-entropy likelihood function. We note that the classical expectation-maximization (EM) algorithm has been employed previously for 2-D density estimation. We propose to extend the use of the classical EM algorithm to image recovery from randomly sampled data and sensor field estimation from randomly scattered sensor networks. We further propose to use our approach in density estimation, image recovery and sensor field estimation. Computer simulation experiments are used to demonstrate the superior performance of the proposed MEEM algorithm in comparison to existing methods. Index Terms: expectation-maximization (EM), Gaussian mixture model (GMM), image reconstruction, kernel density estimation
A COMPLETE GRADIENT CLUSTERING ALGORITHM FORMED WITH KERNEL ESTIMATORS
Abstract

Cited by 1 (0 self)
The aim of this paper is to provide a gradient clustering algorithm in its complete form, suitable for direct use without requiring deeper statistical knowledge. The values of all parameters are calculated effectively using optimizing procedures. Moreover, an illustrative analysis of the meaning of particular parameters is shown, followed by the effects resulting from possible modifications with respect to their primarily assigned optimal values. The proposed algorithm does not demand strict assumptions regarding the desired number of clusters, which allows the obtained number to be better suited to a real data structure. Moreover, a specific feature is the possibility of influencing the proportion between the number of clusters in areas where data elements are dense as opposed to their sparse regions. Finally, by detecting one-element clusters, the algorithm allows the identification of atypical elements, enabling their elimination or possible assignment to bigger clusters, thus increasing the homogeneity of the data set.
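The core gradient step of this family of methods can be illustrated as mean-shift hill-climbing on a Gaussian kernel estimator: each point moves uphill on the estimated density, and points that converge to the same mode form one cluster. This is a generic 1-D illustration, not the authors' complete algorithm, which additionally selects all parameters automatically:

```python
import numpy as np

def mean_shift_modes(points, h=0.5, steps=100):
    """Move each point uphill on the Gaussian KDE built from `points`
    (fixed bandwidth h). Returned values cluster at the density modes;
    a singleton mode corresponds to an atypical (one-element) cluster."""
    pts = np.asarray(points, dtype=float)
    x = pts.copy()
    for _ in range(steps):
        # Gaussian weights of every current position against all data points
        w = np.exp(-(x[:, None] - pts[None, :]) ** 2 / (2.0 * h ** 2))
        # mean-shift update: weighted average of the data
        x = (w @ pts) / w.sum(axis=1)
    return x
```

Grouping the converged positions (e.g., by rounding or a small merge radius) yields the clusters, with the cluster count emerging from the data rather than being fixed in advance.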