On Discriminative Bayesian Network Classifiers and Logistic Regression
Machine Learning, 2005
Cited by 11 (1 self)
Discriminative learning of the parameters in the naive Bayes model is known to be equivalent to a logistic regression problem. Here we show that the same fact holds for much more general Bayesian network models, as long as the corresponding network structure satisfies a certain graph-theoretic property. The property holds for naive Bayes but also for more complex structures such as tree-augmented naive Bayes (TAN) as well as for mixed diagnostic-discriminative structures. Our results imply that for networks satisfying our property, the conditional likelihood cannot have local maxima so that the global maximum can be found by simple local optimization methods. We also show that if this property does not hold, then in general the conditional likelihood can have local, non-global maxima. We illustrate our theoretical results by empirical experiments with local optimization in a conditional naive Bayes model. Furthermore, we provide a heuristic strategy for pruning the number of parameters and relevant features in such models. For many data sets, we obtain good results with heavily pruned submodels containing many fewer parameters than the original naive Bayes model.
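The starting point of this abstract, that the naive Bayes conditional distribution has the form of a logistic regression model, can be checked numerically. The sketch below covers only that well-known special case with binary features and two classes; it does not reproduce the paper's general result for other network structures. The function names and the parameter values are illustrative assumptions, not taken from the paper.

```python
import itertools
import math


def nb_posterior(x, prior, theta):
    """P(c=1 | x) under naive Bayes with binary features.

    prior is P(c=1); theta[c][i] is P(x_i = 1 | c). Both are assumed inputs.
    """
    joint = []
    for c, pc in ((0, 1.0 - prior), (1, prior)):
        p = pc
        for xi, t in zip(x, theta[c]):
            p *= t if xi == 1 else 1.0 - t
        joint.append(p)
    return joint[1] / (joint[0] + joint[1])


def nb_as_logistic(prior, theta):
    """Map naive Bayes parameters to (bias, weights) of a logistic model."""
    bias = math.log(prior / (1.0 - prior))
    weights = []
    for t0, t1 in zip(theta[0], theta[1]):
        off = math.log((1.0 - t1) / (1.0 - t0))  # log-odds term when x_i = 0
        on = math.log(t1 / t0)                   # log-odds term when x_i = 1
        bias += off
        weights.append(on - off)
    return bias, weights


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Illustrative (made-up) parameters for three binary features.
prior = 0.3
theta = {0: [0.2, 0.5, 0.7], 1: [0.6, 0.4, 0.9]}
bias, weights = nb_as_logistic(prior, theta)

# Largest disagreement between the two forms over all 8 feature vectors.
max_gap = max(
    abs(nb_posterior(x, prior, theta)
        - sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))))
    for x in itertools.product((0, 1), repeat=3)
)
```

Because the log-odds are linear in the features, maximizing conditional likelihood over the weights is a logistic regression fit, which is the setting in which the abstract's statement about the absence of local maxima applies.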
On Text-Based Estimation of Document Relevance
2004
Cited by 3 (3 self)
This work is part of a proactive information retrieval project that aims at estimating relevance from implicit user feedback. The noisy feedback signal needs to be complemented with all available information, and textual content is one of the natural sources. Here we take the first steps by investigating whether this source is at all useful in the challenging setting of estimating the relevance of a new document based on only a few samples with known relevance. It turns out that even sophisticated unsupervised methods like multinomial PCA (or Latent Dirichlet Allocation) cannot help much. By contrast, feature extraction supervised by relevant auxiliary data may help.
Efficient Determination of Dynamic Split Points in a Decision Tree
Cited by 1 (0 self)
We consider the problem of choosing split points for continuous predictor variables in a decision tree. Previous approaches to this problem typically either (1) discretize the continuous predictor values prior to learning or (2) apply a dynamic method that considers all possible split points for each potential split. In this paper, we describe a number of alternative approaches that generate a small number of candidate split points dynamically with little overhead. We argue that these approaches are preferable to pre-discretization, and provide experimental evidence that they yield probabilistic decision trees with the same prediction accuracy as the traditional dynamic approach. Furthermore, because the time to grow a decision tree is proportional to the number of split points evaluated, our approach is significantly faster than the traditional dynamic approach.
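The contrast between evaluating every possible split point and evaluating a small candidate set can be sketched as follows. The abstract does not say how the paper generates its candidates, so the quantile-based rule here is purely a stand-in assumption, and both function names are hypothetical.

```python
def all_split_points(values):
    """Traditional dynamic approach: every midpoint between
    consecutive distinct values of the predictor."""
    distinct = sorted(set(values))
    return [(a + b) / 2.0 for a, b in zip(distinct, distinct[1:])]


def quantile_split_points(values, k):
    """Hypothetical reduced candidate set: at most k midpoints taken
    at evenly spaced quantile positions (not the paper's method)."""
    ordered = sorted(values)
    n = len(ordered)
    candidates = set()
    for j in range(1, k + 1):
        idx = (j * n) // (k + 1)
        # Keep the position only if it separates two different values.
        if 0 < idx < n and ordered[idx - 1] < ordered[idx]:
            candidates.add((ordered[idx - 1] + ordered[idx]) / 2.0)
    return sorted(candidates)


# Illustrative data for one continuous predictor at one tree node.
values = [1, 2, 3, 4, 5, 6, 7, 8]
full = all_split_points(values)
reduced = quantile_split_points(values, 3)
```

With a fixed k, the number of splits scored per node stays bounded while the full set grows with the number of distinct values, which is where the speedup claimed in the abstract would come from.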
Compression-based AODE Classifiers
We propose the COMP-AODE classifier, which adopts the compression-based approach [1] to average the posterior probabilities computed by different non-naive classifiers (SPODEs). COMP-AODE improves classification performance over the well-known AODE [10] model. COMP-AODE assumes a uniform prior over the SPODEs; we then develop the credal classifier COMP-AODE*, substituting the uniform prior by a set of priors. COMP-AODE* returns more classes when the classification is prior-dependent, namely if the most probable class varies with the prior adopted over the SPODEs. COMP-AODE* achieves higher classification utility than both COMP-AODE and AODE.

