Results 1 - 10 of 57
Estimating Continuous Distributions in Bayesian Classifiers
In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, 1995
Cited by 243 (2 self)
When modeling a probability distribution with a Bayesian network, we are faced with the problem of how to handle continuous variables. Most previous work has either solved the problem by discretizing, or assumed that the data are generated by a single Gaussian. In this paper we abandon the normality assumption and instead use statistical methods for nonparametric density estimation. For a naive Bayesian classifier, we present experimental results on a variety of natural and artificial domains, comparing two methods of density estimation: assuming normality and modeling each conditional distribution with a single Gaussian; and using nonparametric kernel density estimation. We observe large reductions in error on several natural and artificial data sets, which suggests that kernel estimation is a useful tool for learning Bayesian models. In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann Publishers, San Mateo, 1995.
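The two density-estimation options compared in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy data, and the fixed kernel bandwidth of 0.5 are all assumptions made for illustration, and the abstract does not say how the bandwidth was chosen.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def single_gaussian_density(samples):
    """Fit one Gaussian to the samples (the normality assumption)."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / n
    std = max(math.sqrt(var), 1e-6)  # guard against zero variance
    return lambda x: gaussian_pdf(x, mean, std)


def kernel_density(samples, bandwidth):
    """Gaussian kernel density estimate: the average of one kernel per sample.

    The bandwidth is supplied by the caller; its value is an assumption here.
    """
    n = len(samples)
    return lambda x: sum(gaussian_pdf(x, s, bandwidth) for s in samples) / n


def naive_bayes_predict(x, class_samples, make_density):
    """Pick the class maximizing log prior + sum of per-feature log densities."""
    total = sum(len(rows) for rows in class_samples.values())
    best_label, best_score = None, -math.inf
    for label, rows in class_samples.items():
        score = math.log(len(rows) / total)
        for j, value in enumerate(x):
            density = make_density([row[j] for row in rows])
            score += math.log(density(value) + 1e-300)  # avoid log(0)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: class "a" is bimodal, class "b" is spread around 0.
TRAIN = {
    "a": [(-2.1,), (-1.9,), (-2.0,), (2.0,), (1.9,), (2.1,)],
    "b": [(-4.0,), (-2.0,), (0.0,), (2.0,), (4.0,), (0.0,)],
}

gauss_label = naive_bayes_predict((0.0,), TRAIN, single_gaussian_density)
kde_label = naive_bayes_predict((0.0,), TRAIN, lambda s: kernel_density(s, 0.5))
print(gauss_label, kde_label)
```

On this toy data the single-Gaussian model places the peak of class "a" at 0, where that class has no samples, and so picks "a" for the query 0.0, while the kernel estimate follows the two modes and picks "b". This only illustrates the kind of mismatch the abstract alludes to and says nothing about the paper's actual results.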
Toward efficient agnostic learning
In Proceedings of the Fifth Annual ACM Workshop on Computational Learning Theory, 1992
Cited by 169 (7 self)
In this paper we initiate an investigation of generalizations of the Probably Approximately Correct (PAC) learning model that attempt to significantly weaken the target function assumptions. The ultimate goal in this direction is informally termed agnostic learning, in which we make virtually no assumptions on the target function. The name derives from the fact that as designers of learning algorithms, we give up the belief that Nature (as represented by the target function) has a simple or succinct explanation. We give a number of positive and negative results that provide an initial outline of the possibilities for agnostic learning. Our results include hardness results for the most obvious generalization of the PAC model to an agnostic setting, an efficient and general agnostic learning method based on dynamic programming, relationships between loss functions for agnostic learning, and an algorithm for a learning problem that involves hidden variables.
On the learnability of discrete distributions
In The 25th Annual ACM Symposium on Theory of Computing, 1994
Cited by 78 (10 self)
We introduce and investigate a new model of learning probability distributions from independent draws. Our model is inspired by the popular Probably Approximately Correct (PAC) model for learning boolean functions from labeled
Mutual Information, Metric Entropy, and Cumulative Relative Entropy Risk
Annals of Statistics, 1996
Cited by 30 (2 self)
Assume $\{P_\theta : \theta \in \Theta\}$ is a set of probability distributions with a common dominating measure on a complete separable metric space $Y$. A state $\theta \in \Theta$ is chosen by Nature. A statistician gets $n$ independent observations $Y_1, \ldots, Y_n$ from $Y$ distributed according to $P_\theta$. For each time $t$ between 1 and $n$, based on the observations $Y_1, \ldots, Y_{t-1}$, the statistician produces an estimated distribution $P_t$ for $P_\theta$, and suffers a loss $L(P_\theta, P_t)$. The cumulative risk for the statistician is the average total loss up to time $n$. Of special interest in information theory, data compression, mathematical finance, computational learning theory and statistical mechanics is the special case when the loss $L(P_\theta, P_t)$ is the relative entropy between the true distribution $P_\theta$ and the estimated distribution $P_t$. Here the cumulative Bayes risk from time 1 to $n$ is the mutual information between the random parameter $\Theta$ and the observations $Y_1, \ldots,$...
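For orientation, the relative-entropy loss and the cumulative risk described in this abstract can be written out as below. This is a sketch under assumed notation, not the paper's own display: $p_\theta$ and $p_t$ denote densities with respect to the common dominating measure $\mu$, the symbol $R_n(\theta)$ is introduced here for illustration, and "average total loss" is read as an expectation over the observations.

```latex
% Assumed notation: p_\theta and p_t are the densities of P_\theta and P_t
% with respect to the common dominating measure \mu on Y.
\[
  L(P_\theta, P_t) \;=\; D(P_\theta \,\|\, P_t)
  \;=\; \int_Y p_\theta(y)\,\log\frac{p_\theta(y)}{p_t(y)}\,d\mu(y)
\]
% Cumulative risk up to time n (R_n is a hypothetical name); the expectation
% is over the observations Y_1, ..., Y_{t-1} used to form P_t.
\[
  R_n(\theta) \;=\; \sum_{t=1}^{n} \mathbb{E}_\theta\!\left[\, D(P_\theta \,\|\, P_t) \,\right]
\]
```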
Probability Density Estimation from Optimally Condensed Data Samples
IEEE Trans. Pattern Analysis and Machine Intelligence, 2003
Cited by 26 (0 self)
The requirement to reduce the computational cost of evaluating a point probability density estimate when employing a Parzen window estimator is a well-known problem. This paper presents the Reduced Set Density Estimator that provides a kernel-based density estimator which employs a small percentage of the available data sample and is optimal in the $L_2$ sense. While only requiring $O(N^2)$ optimization routines to estimate the required kernel weighting coefficients, the proposed method provides similar levels of performance accuracy and sparseness of representation as Support Vector Machine density estimation, which requires $O(N^3)$ optimization routines, and which has previously been shown to consistently outperform Gaussian Mixture Models. It is also demonstrated that the proposed density estimator consistently provides superior density estimates for similar levels of data reduction to that provided by the recently proposed Density-Based Multiscale Data Condensation algorithm and, in addition, has comparable computational scaling. The additional advantage of the proposed method is that no extra free parameters are introduced such as regularization, bin width, or condensation ratios, making this method a very simple and straightforward approach to providing a reduced set density estimator with comparable accuracy to that of the full sample Parzen density estimator. Index Terms: Kernel density estimation, Parzen window, data condensation, sparse representation.
Universal smoothing factor selection in density estimation: theory and practice (with discussion)
Test, 1997
Cited by 19 (10 self)
In earlier work with Gabor Lugosi, we introduced a method to select a smoothing factor for kernel density estimation such that, for all densities in all dimensions, the $L_1$ error of the corresponding kernel estimate is not larger than $3+\epsilon$ times the error of the estimate with the optimal smoothing factor plus a constant times $\sqrt{\log n / n}$, where $n$ is the sample size, and the constant only depends on the complexity of the kernel used in the estimate. The result is nonasymptotic, that is, the bound is valid for each $n$. The estimate uses ideas from the minimum distance estimation work of Yatracos. We present a practical implementation of this estimate, report on some comparative results, and highlight some key properties of the new method.
Estimating The Square Root Of A Density Via Compactly Supported Wavelets
1997
Cited by 18 (6 self)
This paper addresses the problem of univariate density estimation in a novel way. Our approach falls in the class of so called projection estimators, introduced by
Simplifying mixture models through function approximation
In NIPS, 2006
Cited by 10 (2 self)
The finite mixture model is a powerful tool in many statistical learning problems. In this paper, we propose a general, structure-preserving approach to reduce its model complexity, which can bring significant computational benefits in many applications. The basic idea is to group the original mixture components into compact clusters, and then minimize an upper bound on the approximation error between the original and simplified models. By adopting the L2 norm as the distance measure between mixture models, we can derive closed-form solutions that are more robust and reliable than using the KL-based distance measure. Moreover, the complexity of our algorithm is only linear in the sample size and dimensionality. Experiments on density estimation and clustering-based image segmentation demonstrate its outstanding performance in terms of both speed and accuracy.
Projection Pursuit Discriminant Analysis
Computational Statistics and Data Analysis, 1993
Cited by 8 (1 self)
... this paper was also carried out in part within the Sonderforschungsbereich 373 at Humboldt University Berlin. The paper was printed using funds made available by the Deutsche Forschungsgemeinschaft.
Nonparametric Density Estimation using Wavelets
1995
Cited by 8 (1 self)
Here the problem of density estimation using wavelets is considered. Nonparametric wavelet density estimators have recently been proposed and seem to outperform classical estimators in representing discontinuities and local oscillations. The purpose of this paper is to give a review of different types of wavelet density estimators proposed in the literature. Properties, comparisons with classical estimators and applications are stressed. Multivariate extensions are considered. Performances of wavelet estimators are analyzed using a family of normal mixture densities and the Old Faithful Geyser dataset. Key words and phrases: Nonparametric Density Estimation, Wavelets. AMS Subject Classification: 62G07, 42A06. 1 Introduction In nonparametric theory, density estimation is perhaps one of the most investigated topics. Let $X_1, \ldots, X_n$ be a sample of size $n$ from an unknown probability density function $f$. The purpose is to estimate $f$ without any assumption on its form. In this pa...

