Results 1 - 5 of 5
On discriminative Bayesian network classifiers and logistic regression
 Machine Learning
Abstract

Cited by 15 (1 self)
Abstract. Discriminative learning of the parameters in the naive Bayes model is known to be equivalent to a logistic regression problem. Here we show that the same fact holds for much more general Bayesian network models, as long as the corresponding network structure satisfies a certain graph-theoretic property. The property holds for naive Bayes but also for more complex structures such as tree-augmented naive Bayes (TAN) as well as for mixed diagnostic-discriminative structures. Our results imply that for networks satisfying our property, the conditional likelihood cannot have local maxima, so the global maximum can be found by simple local optimization methods. We also show that if this property does not hold, then in general the conditional likelihood can have local, non-global maxima. We illustrate our theoretical results by empirical experiments with local optimization in a conditional naive Bayes model. Furthermore, we provide a heuristic strategy for pruning the number of parameters and relevant features in such models. For many data sets, we obtain good results with heavily pruned submodels containing many fewer parameters than the original naive Bayes model.
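The key consequence stated in this abstract is that discriminative training reduces to maximizing a concave conditional log-likelihood, so plain gradient ascent cannot get stuck in a local, non-global maximum. As a minimal illustration of that behavior (a sketch, not the authors' code; all names are illustrative), here is logistic regression fitted by gradient ascent on the conditional log-likelihood:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, steps=2000):
    """Gradient ascent on the conditional log-likelihood.
    The objective is concave, so any optimum found locally is global."""
    n_feat = len(X[0])
    w = [0.0] * n_feat
    b = 0.0
    for _ in range(steps):
        gw = [0.0] * n_feat
        gb = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))
            err = yi - p  # gradient of the log-likelihood w.r.t. the logit
            gb += err
            for j, xj in enumerate(xi):
                gw[j] += err * xj
        w = [wj + lr * gj / len(X) for wj, gj in zip(w, gw)]
        b += lr * gb / len(X)
    return w, b

# Tiny data set with binary features; the first feature determines the label.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 0, 1, 1]
w, b = fit_logistic(X, y)
preds = [1 if sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) > 0.5 else 0
         for x in X]
```

Under the paper's result, the same kind of simple local optimization applies to the conditional likelihood of any Bayesian network structure satisfying the graph-theoretic property.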
Efficient determination of dynamic split points in a decision tree
 In Proceedings of the 1st IEEE International Conference on Data Mining (ICDM 2001)
, 2001
Abstract

Cited by 3 (1 self)
We consider the problem of choosing split points for continuous predictor variables in a decision tree. Previous approaches to this problem typically either (1) discretize the continuous predictor values prior to learning or (2) apply a dynamic method that considers all possible split points for each potential split. In this paper, we describe a number of alternative approaches that generate a small number of candidate split points dynamically with little overhead. We argue that these approaches are preferable to pre-discretization, and provide experimental evidence that they yield probabilistic decision trees with the same prediction accuracy as the traditional dynamic approach. Furthermore, because the time to grow a decision tree is proportional to the number of split points evaluated, our approach is significantly faster than the traditional dynamic approach.
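To make the contrast concrete, here is a small sketch (illustrative code, not the paper's implementation) of the traditional dynamic approach, which evaluates every class-boundary midpoint, alongside a cheap alternative that proposes only a handful of quantile-based candidates:

```python
def all_boundary_splits(values, labels):
    """Traditional dynamic approach: every midpoint between adjacent
    sorted values whose labels differ is a candidate split point."""
    pairs = sorted(zip(values, labels))
    return [(pairs[i][0] + pairs[i + 1][0]) / 2.0
            for i in range(len(pairs) - 1)
            if pairs[i][1] != pairs[i + 1][1]
            and pairs[i][0] != pairs[i + 1][0]]

def quantile_splits(values, k=4):
    """A cheap dynamic alternative in the spirit of the paper: propose
    only k-1 quantile cut points of the values reaching the node."""
    vs = sorted(values)
    return [vs[int(i * len(vs) / k)] for i in range(1, k)]

values = [1.0, 2.0, 3.0, 4.0, 10.0, 11.0, 12.0, 13.0]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
```

On this toy node the boundary method proposes the single midpoint 7.0, while the quantile method proposes a fixed small set regardless of how many distinct values the node contains, which is where the speed advantage comes from.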
On Text-Based Estimation of Document Relevance
, 2004
Abstract

Cited by 3 (3 self)
This work is part of a proactive information retrieval project that aims at estimating relevance from implicit user feedback. The noisy feedback signal needs to be complemented with all available information, and textual content is one of the natural sources. Here we take the first steps by investigating whether this source is useful at all in the challenging setting of estimating the relevance of a new document based on only a few samples with known relevance. It turns out that even sophisticated unsupervised methods such as multinomial PCA (or Latent Dirichlet Allocation) cannot help much. By contrast, feature extraction supervised by relevant auxiliary data may help.
Credal Ensembles of Classifiers
Abstract
We study how to aggregate the probabilistic predictions generated by different SPODE (Super-Parent-One-Dependence Estimator) classifiers. Aggregating such predictions via compression-based weights achieves a slight but consistent improvement in performance over previously existing aggregation methods, including Bayesian Model Averaging and the simple average (the approach adopted by the AODE algorithm). We then turn to the problem of choosing the prior probability distribution over the models, an important issue in any Bayesian ensemble of models. To deal robustly with this choice, the single prior over the models is replaced by a set of priors (a credal set), yielding a credal ensemble of Bayesian classifiers. The credal ensemble recognizes prior-dependent instances, namely instances whose most probable class varies when different priors over the models are considered. When faced with prior-dependent instances, the credal ensemble remains reliable by returning a set of classes rather than a single class. Two credal ensembles of SPODEs are developed; the first generalizes Bayesian Model Averaging and the second the compression-based aggregation. Extensive experiments show that the novel ensembles compare favorably to traditional methods for aggregating SPODEs and also to previous credal classifiers.
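The prior-dependence check at the heart of a credal ensemble can be sketched as follows (an illustrative simplification assuming model likelihoods and per-model class posteriors are already available; the function names are not from the paper):

```python
def credal_predict(model_likelihoods, class_posteriors, priors_set):
    """For each prior over the models in the credal set, form the ensemble
    posterior and take its most probable class; return the set of classes
    obtained. A result with more than one class flags a prior-dependent
    instance, for which the ensemble refuses to commit to a single class."""
    predicted = set()
    for prior in priors_set:
        # Posterior weight of each model is proportional to prior * likelihood.
        w = [p * l for p, l in zip(prior, model_likelihoods)]
        z = sum(w)
        w = [wi / z for wi in w]
        # Mix the per-model class posteriors with the model weights.
        n_classes = len(class_posteriors[0])
        mixed = [sum(w[m] * class_posteriors[m][c] for m in range(len(w)))
                 for c in range(n_classes)]
        predicted.add(max(range(n_classes), key=lambda c: mixed[c]))
    return predicted

# Two models with equal likelihood but conflicting class posteriors:
# under an uncertain prior the instance comes out prior-dependent.
classes = credal_predict([1.0, 1.0],
                         [[0.8, 0.2], [0.3, 0.7]],
                         [[0.9, 0.1], [0.1, 0.9]])
```

When the models agree, every prior in the credal set yields the same argmax and a single class is returned; the set-valued answer appears only on the instances where the prior actually matters.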
Compression-based AODE Classifiers
Abstract
Abstract. We propose the COMP-AODE classifier, which adopts the compression-based approach [1] to average the posterior probabilities computed by different non-naive classifiers (SPODEs). COMP-AODE improves classification performance over the well-known AODE [10] model. COMP-AODE assumes a uniform prior over the SPODEs; we then develop the credal classifier COMP-AODE*, substituting the uniform prior with a set of priors. COMP-AODE* returns more classes when the classification is prior-dependent, namely when the most probable class varies with the prior adopted over the SPODEs. COMP-AODE* achieves higher classification utility than both COMP-AODE and AODE.
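The averaging step can be sketched as follows, reading "compression-based weighting" as down-weighting each SPODE by its training log-loss, i.e. the codelength it assigns to the training labels (a hedged simplification; the function names are illustrative, not the paper's exact scheme):

```python
import math

def compression_weights(log_losses):
    """Weight each SPODE by exp(-codelength): models that compress the
    training labels better (lower log-loss, in nats) get more weight."""
    ws = [math.exp(-l) for l in log_losses]
    z = sum(ws)
    return [w / z for w in ws]

def averaged_posterior(spode_posteriors, weights):
    """Weighted average of the class posteriors produced by each SPODE."""
    n_classes = len(spode_posteriors[0])
    return [sum(w * p[c] for w, p in zip(weights, spode_posteriors))
            for c in range(n_classes)]

# The second model compresses the training labels better (log-loss 0 nats
# vs. ln 2 nats), so it receives twice the weight of the first.
weights = compression_weights([math.log(2.0), 0.0])
posterior = averaged_posterior([[0.9, 0.1], [0.6, 0.4]], weights)
```

Replacing the implicit uniform prior over the SPODEs with a set of priors, as the abstract describes for COMP-AODE*, would turn this single weighted average into a family of averages, one per prior in the set.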