Results 1–10 of 11
Robust Bayesian linear classifier ensembles
Proc. 16th European Conf. Machine Learning, Lecture Notes in Computer Science, 2005
Cited by 10 (0 self)
Ensemble classifiers combine the classification results of several classifiers. Simple ensemble methods such as uniform averaging over a set of models usually provide an improvement over selecting the single best model. Usually probabilistic classifiers restrict the set of possible models that can be learnt in order to lower computational complexity costs. In these restricted spaces, where incorrect modelling assumptions are possibly made, uniform averaging sometimes performs even better than Bayesian model averaging. Linear mixtures over sets of models provide a space that includes uniform averaging as a particular case. We develop two algorithms for learning maximum a posteriori weights for linear mixtures, based on expectation maximization and on constrained optimization. We provide a nontrivial example of the utility of these two algorithms by applying them to one-dependence estimators. We develop the conjugate distribution for one-dependence estimators and empirically show that uniform averaging is clearly superior to BMA for this family of models. We then empirically show that the maximum a posteriori linear mixture weights improve accuracy significantly over uniform aggregation.
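As an illustrative aside, the linear-mixture idea in this abstract can be sketched in a few lines, assuming each model exposes class posteriors; the function and toy numbers below are hypothetical, not the authors' EM or constrained-optimization code:

```python
def mix_posteriors(posteriors, weights=None):
    """Linear mixture of per-model class posteriors P_m(c | x).

    posteriors: list of per-model class-probability lists.
    weights: mixture weights summing to 1; None means uniform
    averaging, the special case the abstract singles out.
    """
    m = len(posteriors)
    if weights is None:
        weights = [1.0 / m] * m
    n_classes = len(posteriors[0])
    return [sum(w * p[c] for w, p in zip(weights, posteriors))
            for c in range(n_classes)]

# Two toy models that disagree on a binary problem.
p = [[0.9, 0.1],   # model 1 favours class 0
     [0.4, 0.6]]   # model 2 favours class 1
uniform = mix_posteriors(p)             # uniform averaging
skewed = mix_posteriors(p, [0.2, 0.8])  # non-uniform mixture weights
```

Uniform averaging is recovered exactly when all weights equal 1/m, which is why it sits inside the space the MAP algorithms search over.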
Discriminatively Trained Markov Model for Sequence Classification
Cited by 7 (3 self)
In this paper, we propose a discriminative counterpart of the directed Markov models of order k−1, or MM(k−1), for sequence classification. MM(k−1) models capture dependencies among neighboring elements of a sequence. The parameters of the classifiers are initialized based on the maximum likelihood estimates for their generative counterparts. We derive gradient-based update equations for the parameters of the sequence classifiers in order to maximize the conditional likelihood function. Results of our experiments with data sets drawn from biological sequence classification (specifically protein function and subcellular localization) and text classification applications show that the discriminatively trained sequence classifiers outperform their generative counterparts, confirming the benefits of discriminative training when the primary objective is classification. Our experiments also show that the discriminatively trained MM(k−1) sequence classifiers are competitive with the computationally much more expensive Support Vector Machines trained using k-gram representations of sequences.
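To illustrate the generative starting point described above, here is a toy order-1 Markov model classifier (the MM(1) special case) using Laplace-smoothed maximum-likelihood transition estimates; the class and method names are assumptions of this sketch, and the discriminative gradient step itself is not shown:

```python
from collections import defaultdict
import math

class MarkovClassifier:
    """Toy order-1 Markov model per class: the generative model whose
    ML estimates initialize the discriminative training above."""

    def __init__(self, alphabet):
        self.alphabet = list(alphabet)
        self.counts = {}  # class label -> transition count table

    def fit(self, sequences, labels):
        for seq, y in zip(sequences, labels):
            trans = self.counts.setdefault(y, defaultdict(int))
            for a, b in zip(seq, seq[1:]):
                trans[(a, b)] += 1
        return self

    def log_likelihood(self, seq, y):
        # Laplace-smoothed maximum-likelihood transition estimates.
        trans = self.counts[y]
        totals = defaultdict(int)
        for (a, _), n in trans.items():
            totals[a] += n
        V = len(self.alphabet)
        return sum(math.log((trans[(a, b)] + 1) / (totals[a] + V))
                   for a, b in zip(seq, seq[1:]))

    def predict(self, seq):
        # Maximum class-conditional likelihood (uniform class prior).
        return max(self.counts, key=lambda y: self.log_likelihood(seq, y))

clf = MarkovClassifier("ab").fit(
    ["aaab", "aaaa", "abab", "baba"], ["A", "A", "B", "B"])
```

Discriminative training would then adjust these parameters by gradient ascent on the conditional likelihood rather than stopping at the generative estimates.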
Discriminative Learning of Bayesian Networks via Factorized Conditional Log-Likelihood
Cited by 5 (0 self)
We propose an efficient and parameter-free scoring criterion, the factorized conditional log-likelihood (ˆfCLL), for learning Bayesian network classifiers. The proposed score is an approximation of the conditional log-likelihood criterion. The approximation is devised in order to guarantee decomposability over the network structure, as well as efficient estimation of the optimal parameters, achieving the same time and space complexity as the traditional log-likelihood scoring criterion. The resulting criterion has an information-theoretic interpretation based on interaction information, which exhibits its discriminative nature. To evaluate the performance of the proposed criterion, we present an empirical comparison with state-of-the-art classifiers. Results on a large suite of benchmark data sets from the UCI repository show that ˆfCLL-trained classifiers achieve at least as good accuracy as the best compared classifiers, using significantly less computational resources.
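The quantity that ˆfCLL approximates can be written down directly; this sketch computes only the plain conditional log-likelihood of labels under given class posteriors, not the factorized score itself:

```python
import math

def conditional_log_likelihood(posteriors, labels):
    """CLL(D) = sum_i log P(c_i | x_i): the discriminative criterion
    that the factorized ˆfCLL score approximates decomposably.
    posteriors[i] maps each class label to its predicted probability
    for example i."""
    return sum(math.log(p[y]) for p, y in zip(posteriors, labels))

# Two toy examples: one uncertain prediction, one confident miss.
cll = conditional_log_likelihood(
    [{0: 0.5, 1: 0.5}, {0: 0.9, 1: 0.1}], [0, 1])
```

Unlike the ordinary log-likelihood, this sum does not decompose over the network structure, which is precisely the obstacle the factorized approximation is designed to remove.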
Bayesian Network Structure Learning by Recursive Autonomy Identification
Raanan Yehezkel, Video Analytics Group
Cited by 3 (2 self)
We propose the recursive autonomy identification (RAI) algorithm for constraint-based (CB) Bayesian network structure learning. The RAI algorithm learns the structure by sequential application of conditional independence (CI) tests, edge direction and structure decomposition into autonomous substructures. The sequence of operations is performed recursively for each autonomous substructure while simultaneously increasing the order of the CI test. While other CB algorithms d-separate structures and then direct the resulting undirected graph, the RAI algorithm combines the two processes from the outset and along the procedure. By this means, and due to structure decomposition, learning a structure using RAI requires a smaller number of high-order CI tests. This reduces the complexity and runtime of the algorithm and increases accuracy by diminishing the curse of dimensionality. When the RAI algorithm learned structures from databases representing synthetic problems, known networks and natural problems, it demonstrated superiority with respect to computational complexity, runtime, structural correctness and classification accuracy over the ...
Discriminative Scoring of Bayesian Network Classifiers: a Comparative Study
Cited by 1 (0 self)
We consider the problem of scoring Bayesian Network Classifiers (BNCs) on the basis of the conditional log-likelihood (CLL). Currently, optimization is usually performed in BN parameter space, but for perfect graphs (such as Naive Bayes, TANs and FANs) a mapping to an equivalent Logistic Regression (LR) model is possible, and optimization can be performed in LR parameter space. We perform an empirical comparison of the efficiency of scoring in BN parameter space and in LR parameter space using two different mappings. For each parameterization, we study two popular optimization methods: conjugate gradient and BFGS. Efficiency of scoring is compared on simulated data and data sets from the UCI Machine Learning repository.
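The Naive Bayes to Logistic Regression mapping mentioned above can be illustrated for two binary features: the NB log-odds are an affine function of the inputs, so a matching bias and weight vector reproduce the decision function exactly (the parameter values here are made up for illustration):

```python
import math

# Toy binary Naive Bayes with fixed parameters.
prior = {0: 0.5, 1: 0.5}
# theta[c][j] = P(x_j = 1 | class c) for two binary features.
theta = {0: [0.8, 0.3], 1: [0.4, 0.7]}

def nb_log_odds(x):
    """Naive Bayes log P(1|x) - log P(0|x), computed factor by factor."""
    s = math.log(prior[1] / prior[0])
    for j, xj in enumerate(x):
        p1, p0 = theta[1][j], theta[0][j]
        s += math.log(p1 / p0) if xj else math.log((1 - p1) / (1 - p0))
    return s

# Equivalent LR parameterization: absorb the x_j = 0 factors into the
# bias, and let each weight carry the log-ratio difference.
bias = math.log(prior[1] / prior[0]) + sum(
    math.log((1 - theta[1][j]) / (1 - theta[0][j])) for j in range(2))
w = [math.log(theta[1][j] / theta[0][j])
     - math.log((1 - theta[1][j]) / (1 - theta[0][j]))
     for j in range(2)]

def lr_log_odds(x):
    """Logistic-regression form of the same decision function."""
    return bias + sum(wj * xj for wj, xj in zip(w, x))
```

The two functions agree on every input, which is what makes optimization in LR parameter space a legitimate alternative for these perfect-graph classifiers.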
Order-based Discriminative Structure Learning for Bayesian Network Classifiers
Cited by 1 (1 self)
We introduce a simple empirical order-based greedy heuristic for learning discriminative Bayesian network structures. We propose two metrics for establishing the ordering of N features, both based on conditional mutual information. Given an ordering, we can find the discriminative classifier structure with O(N^q) score evaluations (where the constant q is the maximum number of parents per node). We present classification results on the UCI repository (Merz, Murphy, & Aha 1997), for a phonetic classification task using the TIMIT database (Lamel, Kassel, & Seneff 1986), and for the MNIST handwritten digit recognition task (LeCun et al. 1998). The discriminative structure found by our new procedures significantly outperforms generatively produced structures, and achieves a classification accuracy on par with the best discriminative (naive greedy) Bayesian network learning approach, but does so with a speedup of a factor of ~10. We also show that the advantages of discriminatively structured Bayesian network classifiers still hold in the case of missing features.
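A sketch of the conditional mutual information underlying the proposed ordering metrics; the estimator below is the plain empirical plug-in I(X; Y | C) on toy data, not the paper's specific metrics:

```python
import math
from collections import Counter

def cond_mutual_info(x, y, c):
    """Empirical (plug-in) conditional mutual information I(X; Y | C),
    in nats, from three parallel lists of discrete observations."""
    n = len(c)
    joint = Counter(zip(x, y, c))  # counts of (x, y, c) triples
    xc = Counter(zip(x, c))
    yc = Counter(zip(y, c))
    cc = Counter(c)
    # I(X;Y|C) = sum p(x,y,c) log [ p(x,y,c) p(c) / (p(x,c) p(y,c)) ]
    return sum(
        (nxyc / n) * math.log((nxyc * cc[cv]) / (xc[(xv, cv)] * yc[(yv, cv)]))
        for (xv, yv, cv), nxyc in joint.items())

c = [0] * 4 + [1] * 4
x = [0, 0, 1, 1] * 2
dep = cond_mutual_info(x, x, c)           # Y determined by X given C
ind = cond_mutual_info(x, [0, 1] * 4, c)  # Y independent of X given C
```

Ranking features by such quantities (e.g. conditioned on the class variable) is what fixes the ordering before the O(N^q) parent search runs.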
Generative Prior Knowledge for Discriminative Classification
Cited by 1 (0 self)
We present a novel framework for integrating prior knowledge into discriminative classifiers. Our framework allows discriminative classifiers such as Support Vector Machines (SVMs) to utilize prior knowledge specified in the generative setting. The dual objective of fitting the data and respecting prior knowledge is formulated as a bilevel program, which is solved (approximately) via iterative application of second-order cone programming. To test our approach, we consider the problem of using WordNet (a semantic database of the English language) to improve low-sample classification accuracy of newsgroup categorization. WordNet is viewed as an approximate, but readily available, source of background knowledge, and our framework is capable of utilizing it in a flexible way.
A Hierarchy of Independence Assumptions for Multi-relational Bayes Net Classifiers
Many databases store data in relational format, with different types of entities and information about their attributes and links between the entities. Link-based classification (LBC) is the problem of predicting the class attribute of a target entity given the attributes of entities linked to it. In this paper we propose a new relational Bayes net classifier method for LBC, which assumes that different links of an object are independently drawn from the same distribution, given attribute information from the linked tables. We show that this assumption allows very fast multi-relational Bayes net learning. We define three more independence assumptions for LBC to unify proposals from different researchers in a single novel hierarchy. Our proposed model is at the top and the well-known multi-relational Naive Bayes classifier is at the bottom of this hierarchy. The model at each level of the hierarchy uses a new independence assumption in addition to the assumptions used at the higher levels. In experiments on four benchmark datasets, our proposed link independence model has the best predictive accuracy compared to the hierarchy models and a variety of relational classifiers.
Addressing Missing Values in Kernel-based Multimodal Biometric Fusion using Neutral Point Substitution
In multimodal biometric information fusion, it is common to encounter missing modalities for which matching cannot be performed. As a result, at the match score level, this implies that scores will be missing. We address the multimodal fusion problem involving missing modalities (scores) using support vector machines with the Neutral Point Substitution (NPS) method. The approach starts by processing each modality using a kernel. When a modality is missing, at the kernel level, the missing modality is substituted by one that is unbiased with regard to the classification, called a neutral point. Critically, unlike conventional missing-data substitution methods, explicit calculation of neutral points may be omitted by virtue of their implicit incorporation within the SVM training framework. Experiments based on the publicly available Biosecure DS2 multimodal (scores) data set show that the SVM-NPS approach achieves very good generalization performance compared to the sum rule fusion, especially with severe missing modalities.
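For contrast with the SVM-NPS method, the sum-rule baseline named in this abstract can be sketched in a few lines; representing a missing modality score as None is an assumption of this sketch, not part of the paper:

```python
def sum_rule_fusion(scores):
    """Fixed sum-rule fusion over the available modality match scores:
    missing modalities (encoded here as None) are skipped and the
    remaining scores averaged. This is the baseline the SVM-NPS
    approach is compared against; NPS instead substitutes a
    classification-neutral point inside the SVM kernel."""
    present = [s for s in scores if s is not None]
    return sum(present) / len(present)

full = sum_rule_fusion([0.8, 0.6, 0.7])     # all modalities present
partial = sum_rule_fusion([0.8, None, 0.7])  # one modality missing
```

The baseline simply renormalizes over whatever scores survive, whereas NPS keeps the trained SVM decision boundary intact by construction.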