Results 1–10 of 77
Estimating Continuous Distributions in Bayesian Classifiers
 In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, 1995
Cited by 314 (2 self)
When modeling a probability distribution with a Bayesian network, we are faced with the problem of how to handle continuous variables. Most previous work has either solved the problem by discretizing, or assumed that the data are generated by a single Gaussian. In this paper we abandon the normality assumption and instead use statistical methods for nonparametric density estimation. For a naive Bayesian classifier, we present experimental results on a variety of natural and artificial domains, comparing two methods of density estimation: assuming normality and modeling each conditional distribution with a single Gaussian; and using nonparametric kernel density estimation. We observe large reductions in error on several natural and artificial data sets, which suggests that kernel estimation is a useful tool for learning Bayesian models.
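A minimal sketch of the comparison this abstract describes: a naive Bayes classifier whose class-conditional densities are either a single Gaussian per class or a kernel density estimate. The synthetic data (a bimodal class versus a narrow unimodal one), equal priors, and the default bandwidth are illustrative choices, not taken from the paper.

```python
import numpy as np
from scipy.stats import norm, gaussian_kde

rng = np.random.default_rng(0)
# Class 0 is deliberately bimodal, so a single Gaussian fits it poorly;
# class 1 is a narrow Gaussian sitting between the two modes.
train0 = np.concatenate([rng.normal(-2, 0.4, 200), rng.normal(2, 0.4, 200)])
train1 = rng.normal(0.0, 0.5, 400)
test0 = np.concatenate([rng.normal(-2, 0.4, 200), rng.normal(2, 0.4, 200)])
test1 = rng.normal(0.0, 0.5, 400)

def gauss_loglik(train, x):
    # Parametric option: one Gaussian per class-conditional density.
    return norm.logpdf(x, train.mean(), train.std(ddof=1))

def kde_loglik(train, x):
    # Nonparametric option: Gaussian kernel density estimate.
    return np.log(gaussian_kde(train)(x))

def accuracy(loglik):
    # Equal priors: predict the class with the larger conditional density.
    c0 = np.mean(loglik(train0, test0) > loglik(train1, test0))
    c1 = np.mean(loglik(train1, test1) > loglik(train0, test1))
    return (c0 + c1) / 2

print("single Gaussian accuracy:", accuracy(gauss_loglik))
print("kernel estimate accuracy:", accuracy(kde_loglik))
```

With a bimodal class-conditional, the kernel variant typically matches or beats the single-Gaussian fit, mirroring the error reductions the abstract reports.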
Toward efficient agnostic learning
 In Proceedings of the Fifth Annual ACM Workshop on Computational Learning Theory, 1992
Cited by 194 (7 self)
In this paper we initiate an investigation of generalizations of the Probably Approximately Correct (PAC) learning model that attempt to significantly weaken the target function assumptions. The ultimate goal in this direction is informally termed agnostic learning, in which we make virtually no assumptions on the target function. The name derives from the fact that as designers of learning algorithms, we give up the belief that Nature (as represented by the target function) has a simple or succinct explanation. We give a number of positive and negative results that provide an initial outline of the possibilities for agnostic learning. Our results include hardness results for the most obvious generalization of the PAC model to an agnostic setting, an efficient and general agnostic learning method based on dynamic programming, relationships between loss functions for agnostic learning, and an algorithm for a learning problem that involves hidden variables.
On the learnability of discrete distributions
 In The 25th Annual ACM Symposium on Theory of Computing, 1994
Cited by 93 (11 self)
We introduce and investigate a new model of learning probability distributions from independent draws. Our model is inspired by the popular Probably Approximately Correct (PAC) model for learning boolean functions from labeled ...
Mutual Information, Metric Entropy, and Cumulative Relative Entropy Risk
 Annals of Statistics, 1996
Cited by 40 (2 self)
Assume {P_θ : θ ∈ Θ} is a set of probability distributions with a common dominating measure on a complete separable metric space Y. A state θ ∈ Θ is chosen by Nature. A statistician gets n independent observations Y_1, …, Y_n from Y distributed according to P_θ. For each time t between 1 and n, based on the observations Y_1, …, Y_{t−1}, the statistician produces an estimated distribution P_t for P_θ, and suffers a loss L(P_θ, P_t). The cumulative risk for the statistician is the average total loss up to time n. Of special interest in information theory, data compression, mathematical finance, computational learning theory and statistical mechanics is the special case when the loss L(P_θ, P_t) is the relative entropy between the true distribution P_θ and the estimated distribution P_t. Here the cumulative Bayes risk from time 1 to n is the mutual information between the random parameter Θ and the observations Y_1, …, Y_n ...
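The identity the abstract states can be written out explicitly; a hedged sketch in LaTeX, with the prior π over Θ and the hat on the posterior-predictive estimators P̂_t as my own notational additions. The second equality is the chain rule for mutual information, which is what makes the cumulative Bayes risk telescope.

```latex
% Cumulative relative entropy (Bayes) risk of the posterior-predictive
% estimators \hat{P}_t equals the mutual information between the
% parameter and the sample; \pi and \hat{P}_t are assumed notation.
\[
  \mathbb{E}_{\theta \sim \pi}\!\left[\, \sum_{t=1}^{n}
    \mathbb{E}\, D\!\left(P_\theta \,\middle\|\, \hat{P}_t\right) \right]
  = I(\Theta;\, Y_1, \dots, Y_n)
  = \sum_{t=1}^{n} I(\Theta;\, Y_t \mid Y_1, \dots, Y_{t-1}).
\]
```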
Probability Density Estimation from Optimally Condensed Data Samples
 IEEE Trans. Pattern Analysis and Machine Intelligence, 2003
Cited by 37 (0 self)
The requirement to reduce the computational cost of evaluating a point probability density estimate when employing a Parzen window estimator is a well-known problem. This paper presents the Reduced Set Density Estimator that provides a kernel-based density estimator which employs a small percentage of the available data sample and is optimal in the L2 sense. While only requiring O(N²) optimization routines to estimate the required kernel weighting coefficients, the proposed method provides similar levels of performance accuracy and sparseness of representation as Support Vector Machine density estimation, which requires O(N³) optimization routines, and which has previously been shown to consistently outperform Gaussian Mixture Models. It is also demonstrated that the proposed density estimator consistently provides superior density estimates for similar levels of data reduction to that provided by the recently proposed Density-Based Multiscale Data Condensation algorithm and, in addition, has comparable computational scaling. The additional advantage of the proposed method is that no extra free parameters are introduced such as regularization, bin width, or condensation ratios, making this method a very simple and straightforward approach to providing a reduced set density estimator with comparable accuracy to that of the full sample Parzen density estimator. Index Terms—Kernel density estimation, Parzen window, data condensation, sparse representation.
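A sketch of the cost problem the abstract starts from: the full Parzen estimate sums a kernel over all N samples per query, while a reduced-set estimate sums over M ≪ N centers. The uniform-weight random subsample below is a crude stand-in only; the paper's RSDE instead solves an O(N²) quadratic program for sparse weights minimizing the L2 distance to the full estimate.

```python
import numpy as np

def parzen(x_eval, centers, h):
    # Parzen window (Gaussian kernel) density estimate at the points x_eval.
    z = (x_eval[:, None] - centers[None, :]) / h
    return (np.exp(-0.5 * z ** 2) / (h * np.sqrt(2 * np.pi))).mean(axis=1)

rng = np.random.default_rng(1)
data = rng.normal(0.0, 1.0, 2000)
h = 1.06 * data.std() * len(data) ** (-0.2)   # rule-of-thumb bandwidth
grid = np.linspace(-3, 3, 121)

full = parzen(grid, data, h)                  # O(N) kernel sums per query

# Stand-in for the reduced set: a uniform-weight random subsample.
subset = rng.choice(data, size=100, replace=False)
reduced = parzen(grid, subset, h)             # O(M) per query, M << N

print("max pointwise gap:", np.abs(full - reduced).max())
```

Even this naive reduction tracks the full estimate closely on smooth densities, which is why optimizing the subset weights (rather than growing M) is the interesting lever.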
Universal smoothing factor selection in density estimation: theory and practice (with discussion)
 Test, 1997
Cited by 23 (10 self)
In earlier work with Gabor Lugosi, we introduced a method to select a smoothing factor for kernel density estimation such that, for all densities in all dimensions, the L1 error of the corresponding kernel estimate is not larger than 3 + ε times the error of the estimate with the optimal smoothing factor, plus a constant times √(log n / n), where n is the sample size, and the constant only depends on the complexity of the kernel used in the estimate. The result is nonasymptotic, that is, the bound is valid for each n. The estimate uses ideas from the minimum distance estimation work of Yatracos. We present a practical implementation of this estimate, report on some comparative results, and highlight some key properties of the new method.
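A 1-D, grid-based sketch of the minimum-distance idea behind this kind of selector: build candidate kernel estimates on one half of the sample, then pick the bandwidth whose estimate agrees best with the other half's empirical measure on the Yatracos sets {f_h > f_h′}. The sample split, candidate list, and grid approximation are illustrative choices, not the paper's exact construction.

```python
import numpy as np

def kde(x_eval, centers, h):
    z = (x_eval[:, None] - centers[None, :]) / h
    return (np.exp(-0.5 * z ** 2) / (h * np.sqrt(2 * np.pi))).mean(axis=1)

rng = np.random.default_rng(2)
data = rng.normal(0.0, 1.0, 600)
fit, held = data[:300], data[300:]        # fit KDEs / empirical measure
hs = [0.05, 0.15, 0.35, 0.8]
grid = np.linspace(-4.0, 4.0, 1601)
dx = grid[1] - grid[0]
dens = {h: kde(grid, fit, h) for h in hs}
# Nearest grid cell of each held-out point, for empirical set masses.
cells = np.clip(np.round((held - grid[0]) / dx).astype(int), 0, len(grid) - 1)

def yatracos_distance(h):
    # sup over sets A = {f_h > f_h2} of |integral of f_h over A minus the
    # empirical mass of A|, both approximated on the grid.
    worst = 0.0
    for h2 in hs:
        if h2 == h:
            continue
        A = dens[h] > dens[h2]
        worst = max(worst, abs(dens[h][A].sum() * dx - A[cells].mean()))
    return worst

best = min(hs, key=yatracos_distance)
print("selected bandwidth:", best)
```

Under- and oversmoothed candidates disagree with the held-out empirical measure on their own Yatracos sets, so the selection lands in the sensible middle of the candidate list.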
Estimating The Square Root Of A Density Via Compactly Supported Wavelets
 1997
Cited by 19 (6 self)
This paper addresses the problem of univariate density estimation in a novel way. Our approach falls in the class of so called projection estimators, introduced by ...
Simplifying mixture models through function approximation
 IEEE Transactions on Neural Networks, 2010
Cited by 14 (2 self)
The finite mixture model is widely used in various statistical learning problems. However, the model obtained may contain a large number of components, making it inefficient in practical applications. In this paper, we propose to simplify the mixture model by first grouping similar components together and then performing local fitting through function approximation. By using the squared loss to measure the distance between mixture models, our algorithm naturally combines the two different tasks of component clustering and model simplification. The proposed method can be used to speed up various algorithms that use mixture models during training (e.g., Bayesian filtering, belief propagation) or testing (e.g., kernel density estimation, SVM testing). Encouraging results are observed in the experiments on density estimation, clustering-based image segmentation and simplification of SVM decision functions.
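A minimal sketch of the "group similar components, then refit" idea in one dimension, using moment matching as a stand-in for the paper's squared-loss local fitting: nearby Gaussian components are grouped greedily by mean, and each group is collapsed to the single Gaussian with the group's total weight, mean, and variance. The components and the 1.0 grouping threshold are illustrative.

```python
# Components of a 1-D Gaussian mixture: (weight, mean, variance).
comps = [(0.2, 0.0, 1.0), (0.2, 0.1, 1.1), (0.3, 5.0, 0.8), (0.3, 5.2, 0.9)]

def merge(group):
    # Moment matching: the single Gaussian preserving the group's
    # total weight, mean, and variance.
    w = sum(c[0] for c in group)
    mu = sum(c[0] * c[1] for c in group) / w
    var = sum(c[0] * (c[2] + c[1] ** 2) for c in group) / w - mu ** 2
    return (w, mu, var)

# Greedy grouping: consecutive components (by mean) closer than 1.0 share a group.
srt = sorted(comps, key=lambda c: c[1])
groups, cur = [], [srt[0]]
for c in srt[1:]:
    if c[1] - cur[-1][1] < 1.0:
        cur.append(c)
    else:
        groups.append(cur)
        cur = [c]
groups.append(cur)

simplified = [merge(g) for g in groups]
print(simplified)   # two components instead of four
```

Moment matching preserves the mixture's first two moments within each group; the paper's squared-loss formulation additionally optimizes the fit between the full and simplified densities.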
A Bayesian Approach to Bandwidth Selection for Multivariate Kernel Density Estimation
 2004
Cited by 12 (6 self)
Kernel density estimation for multivariate data is an important technique that has a wide range of applications. However, it has received significantly less attention than its univariate counterpart. The lower level of interest in multivariate kernel density estimation is mainly due to the increased difficulty in deriving an optimal data-driven bandwidth as the dimension of the data increases. We provide Markov chain Monte Carlo (MCMC) algorithms for estimating optimal bandwidth matrices for multivariate kernel density estimation. Our approach is based on treating the elements of the bandwidth matrix as parameters whose posterior density can be obtained through the likelihood cross-validation criterion. Numerical studies for bivariate data show that the MCMC algorithm generally performs better than the plug-in algorithm under the Kullback-Leibler information criterion, and is as good as the plug-in algorithm under the mean integrated squared error (MISE) criterion. Numerical studies for five-dimensional data show that our algorithm is superior to the normal reference rule. Our MCMC algorithm is the first data-driven bandwidth selector for multivariate kernel density estimation that is applicable to data of any dimension. Keywords: Cross-validation; Kullback-Leibler information; Mean integrated squared errors; ...
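A 1-D sketch of the approach the abstract describes: treat the bandwidth as a parameter, use the leave-one-out cross-validation likelihood as the (pseudo-)posterior, and sample it with random-walk Metropolis. The flat prior on log h, the step size, and the univariate setting are illustrative simplifications of the paper's multivariate bandwidth-matrix treatment.

```python
import numpy as np

rng = np.random.default_rng(3)
data = rng.normal(0.0, 1.0, 200)

def loo_loglik(h):
    # Leave-one-out cross-validation log-likelihood of a Gaussian KDE.
    z = (data[:, None] - data[None, :]) / h
    k = np.exp(-0.5 * z ** 2) / (h * np.sqrt(2 * np.pi))
    np.fill_diagonal(k, 0.0)            # leave each point out of its own fit
    return np.sum(np.log(k.sum(axis=1) / (len(data) - 1)))

# Random-walk Metropolis on log h (flat prior on log h assumed here).
log_h, ll, samples = np.log(0.5), loo_loglik(0.5), []
for _ in range(2000):
    prop = log_h + rng.normal(0, 0.1)
    ll_prop = loo_loglik(np.exp(prop))
    if np.log(rng.uniform()) < ll_prop - ll:
        log_h, ll = prop, ll_prop       # accept the proposed bandwidth
    samples.append(np.exp(log_h))

print("posterior mean bandwidth:", np.mean(samples[500:]))
```

The posterior mean lands near the cross-validation optimum; the payoff of the MCMC formulation in the paper is that the same machinery extends to full bandwidth matrices in higher dimensions, where grid search breaks down.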
Projection Pursuit Discriminant Analysis
 Computational Statistics and Data Analysis, 1993
Cited by 11 (1 self)
... this paper was also carried out in part within the Sonderforschungsbereich 373 at Humboldt University Berlin. The paper was printed using funds made available by the Deutsche Forschungsgemeinschaft.