Discriminatively Trained Markov Model for Sequence Classification
Cited by 7 (3 self)
In this paper, we propose a discriminative counterpart of the directed Markov Models of order k-1, or MM(k-1), for sequence classification. MM(k-1) models capture dependencies among neighboring elements of a sequence. The parameters of the classifiers are initialized based on the maximum likelihood estimates for their generative counterparts. We derive gradient-based update equations for the parameters of the sequence classifiers in order to maximize the conditional likelihood function. Results of our experiments with data sets drawn from biological sequence classification (specifically protein function and subcellular localization) and text classification applications show that the discriminatively trained sequence classifiers outperform their generative counterparts, confirming the benefits of discriminative training when the primary objective is classification. Our experiments also show that the discriminatively trained MM(k-1) sequence classifiers are competitive with the computationally much more expensive Support Vector Machines trained using k-gram representations of sequences.
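The training recipe in this abstract (initialize from maximum likelihood estimates, then follow the gradient of the conditional likelihood) can be illustrated with a small sketch. This is not the authors' update rule: it assumes a first-order model (k = 2), ignores the first-symbol distribution, and treats the log-probabilities as unconstrained log-linear weights. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter

def bigram_counts(seq):
    # Count adjacent symbol pairs (the 2-grams of the sequence).
    return Counter(zip(seq, seq[1:]))

def init_params(data, alphabet, smooth=1.0):
    """Laplace-smoothed ML estimates of log P(b | a, c) and log P(c)."""
    theta, bias = {}, {}
    for c in sorted({y for _, y in data}):
        seqs = [x for x, y in data if y == c]
        counts = Counter()
        for x in seqs:
            counts.update(bigram_counts(x))
        theta[c] = {}
        for a in alphabet:
            row = sum(counts[(a, b)] for b in alphabet) + smooth * len(alphabet)
            for b in alphabet:
                theta[c][(a, b)] = math.log((counts[(a, b)] + smooth) / row)
        bias[c] = math.log(len(seqs) / len(data))
    return theta, bias

def posterior(x, theta, bias):
    # P(c | x) via a numerically stable softmax over per-class scores.
    feats = bigram_counts(x)
    scores = {c: bias[c] + sum(n * theta[c][k] for k, n in feats.items())
              for c in theta}
    m = max(scores.values())
    z = sum(math.exp(s - m) for s in scores.values())
    return {c: math.exp(s - m) / z for c, s in scores.items()}

def cll(data, theta, bias):
    # Conditional log-likelihood of the labels given the sequences.
    return sum(math.log(posterior(x, theta, bias)[y]) for x, y in data)

def gradient_step(data, theta, bias, lr=0.05):
    # Assumption: unconstrained weights, so the CLL gradient for class c is
    # sum over examples of (1[y == c] - P(c | x)) * bigram count.
    grads = {c: Counter() for c in theta}
    for x, y in data:
        p = posterior(x, theta, bias)
        feats = bigram_counts(x)
        for c in theta:
            err = (1.0 if c == y else 0.0) - p[c]
            for k, n in feats.items():
                grads[c][k] += err * n
    for c in theta:
        for k, g in grads[c].items():
            theta[c][k] += lr * g

# Toy data, invented for illustration only.
data = [("abab", "A"), ("abba", "A"), ("bbaa", "B"), ("aabb", "B")]
theta, bias = init_params(data, "ab")
before = cll(data, theta, bias)
for _ in range(20):
    gradient_step(data, theta, bias)
after = cll(data, theta, bias)
```

On this toy set the conditional log-likelihood rises from its generative starting point, which is the qualitative effect the abstract reports; the paper's own parameterization and step rule may differ.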
Robust Bayesian linear classifier ensembles
- Proc. 16th European Conf. Machine Learning, Lecture Notes in Computer Science, 2005
Cited by 6 (0 self)
Ensemble classifiers combine the classification results of several classifiers. Simple ensemble methods such as uniform averaging over a set of models usually provide an improvement over selecting the single best model. Probabilistic classifiers usually restrict the set of possible models that can be learnt in order to lower computational cost. In these restricted spaces, where incorrect modelling assumptions are possibly made, uniform averaging sometimes performs even better than Bayesian model averaging (BMA). Linear mixtures over sets of models provide a space that includes uniform averaging as a particular case. We develop two algorithms for learning maximum a posteriori weights for linear mixtures, based on expectation maximization and on constrained optimization. We provide a nontrivial example of the utility of these two algorithms by applying them to one-dependence estimators. We develop the conjugate distribution for one-dependence estimators and empirically show that uniform averaging is clearly superior to BMA for this family of models. After that we empirically show that the maximum a posteriori linear mixture weights improve accuracy significantly over uniform aggregation.
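To make the idea of learning linear mixture weights by expectation maximization concrete, here is a minimal sketch. It is not the paper's algorithm: it assumes the base models are fixed, that each supplies a strictly positive probability for the true class of every training example, and that the prior on the weights is a symmetric Dirichlet with parameter `alpha` (an assumed choice; `alpha=1.0` reduces to maximum likelihood). The function name and the toy numbers are hypothetical.

```python
def em_mixture_weights(probs, n_iter=100, alpha=1.0):
    """probs[i][m] is the probability model m gives to the true class of example i.

    Returns mixture weights on the simplex. Assumes alpha >= 1 and that at
    least one model gives positive probability to each example.
    """
    n_models = len(probs[0])
    w = [1.0 / n_models] * n_models  # start from uniform averaging
    for _ in range(n_iter):
        # E-step: accumulate each model's responsibility for each example.
        counts = [alpha - 1.0] * n_models
        for row in probs:
            mix = sum(wm * pm for wm, pm in zip(w, row))
            for m in range(n_models):
                counts[m] += w[m] * row[m] / mix
        # M-step: renormalize the (prior-smoothed) responsibilities.
        total = sum(counts)
        w = [c / total for c in counts]
    return w

# Toy example: model 0 is consistently better than model 1.
weights = em_mixture_weights([[0.9, 0.1], [0.9, 0.1], [0.9, 0.1]])
```

Starting from the uniform weights, the iteration shifts mass toward the model that explains the labels better, which is the sense in which a learned mixture can improve on uniform aggregation.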
Bayesian Network Structure Learning by Recursive Autonomy Identification
- Raanan Yehezkel, Video Analytics Group
Cited by 1 (0 self)
We propose the recursive autonomy identification (RAI) algorithm for constraint-based (CB) Bayesian network structure learning. The RAI algorithm learns the structure by sequential application of conditional independence (CI) tests, edge direction and structure decomposition into autonomous sub-structures. The sequence of operations is performed recursively for each autonomous sub-structure while simultaneously increasing the order of the CI test. While other CB algorithms d-separate structures and then direct the resulting undirected graph, the RAI algorithm combines the two processes from the outset and along the procedure. By this means and due to structure decomposition, learning a structure using RAI requires a smaller number of CI tests of high orders. This reduces the complexity and run-time of the algorithm and increases the accuracy by diminishing the curse-of-dimensionality. When the RAI algorithm learned structures from databases representing synthetic problems, known networks and natural problems, it demonstrated superiority with respect to computational complexity, run-time, structural correctness and classification accuracy over the
Discriminative Scoring of Bayesian Network Classifiers: a Comparative Study
We consider the problem of scoring Bayesian Network Classifiers (BNCs) on the basis of the conditional log-likelihood (CLL). Currently, optimization is usually performed in BN parameter space, but for perfect graphs (such as Naive Bayes, TANs and FANs) a mapping to an equivalent Logistic Regression (LR) model is possible, and optimization can be performed in LR parameter space. We perform an empirical comparison of the efficiency of scoring in BN parameter space, and in LR parameter space using two different mappings. For each parameterization, we study two popular optimization methods: conjugate gradient, and BFGS. Efficiency of scoring is compared on simulated data and data sets from the UCI Machine Learning repository.
Order-based Discriminative Structure Learning for Bayesian Network Classifiers
We introduce a simple empirical order-based greedy heuristic for learning discriminative Bayesian network structures. We propose two metrics for establishing the ordering of N features. They are based on the conditional mutual information. Given an ordering, we can find the discriminative classifier structure with O(N^q) score evaluations (where constant q is the maximum number of parents per node). We present classification results on the UCI repository (Merz, Murphy, & Aha 1997), for a phonetic classification task using the TIMIT database (Lamel, Kassel, & Seneff 1986), and for the MNIST handwritten digit recognition task (LeCun et al. 1998). The discriminative structure found by our new procedures significantly outperforms generatively produced structures, and achieves a classification accuracy on par with the best discriminative (naive greedy) Bayesian network learning approach, but does so with a factor of ∼10 speedup. We also show that the advantages of generative discriminatively structured Bayesian network classifiers still hold in the case of missing features.
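The abstract does not spell out its two ordering metrics, only that they build on conditional mutual information. As background, the quantity itself can be estimated from discrete data as in the sketch below, which computes the plug-in estimate of I(X; Y | Z) in nats from empirical counts. The function name is hypothetical and this is not the paper's scoring procedure.

```python
import math
from collections import Counter

def conditional_mutual_information(xs, ys, zs):
    """Empirical I(X; Y | Z) in nats from three aligned lists of discrete values."""
    n = len(xs)
    n_xyz = Counter(zip(xs, ys, zs))
    n_xz = Counter(zip(xs, zs))
    n_yz = Counter(zip(ys, zs))
    n_z = Counter(zs)
    cmi = 0.0
    for (x, y, z), c in n_xyz.items():
        # p(x,y,z) * log[ p(z) p(x,y,z) / (p(x,z) p(y,z)) ], written with counts.
        cmi += (c / n) * math.log(c * n_z[z] / (n_xz[(x, z)] * n_yz[(y, z)]))
    return cmi

# X determines Y exactly and Z is constant, so the result equals H(X) = log 2.
dependent = conditional_mutual_information([0, 1, 0, 1], [0, 1, 0, 1], [0, 0, 0, 0])
# X and Y are empirically independent given Z, so the result is 0.
independent = conditional_mutual_information([0, 0, 1, 1], [0, 1, 0, 1], [0, 0, 0, 0])
```

In a feature-ordering setting, Z would typically play the role of the class variable (and possibly already selected features), but that mapping is an assumption here rather than something the abstract states.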
Discriminative Learning of Bayesian Networks via Factorized Conditional Log-Likelihood
We propose an efficient and parameter-free scoring criterion, the factorized conditional log-likelihood (f̂CLL), for learning Bayesian network classifiers. The proposed score is an approximation of the conditional log-likelihood criterion. The approximation is devised in order to guarantee decomposability over the network structure, as well as efficient estimation of the optimal parameters, achieving the same time and space complexity as the traditional log-likelihood scoring criterion. The resulting criterion has an information-theoretic interpretation based on interaction information, which exhibits its discriminative nature. To evaluate the performance of the proposed criterion, we present an empirical comparison with state-of-the-art classifiers. Results on a large suite of benchmark data sets from the UCI repository show that f̂CLL-trained classifiers achieve at least as good accuracy as the best compared classifiers, using significantly fewer computational resources.

