Results 1–10 of 24
Robust Bayesian Linear Classifier Ensembles
In Machine Learning: ECML 2005, 2005
Abstract

Cited by 18 (0 self)
Abstract. Ensemble classifiers combine the classification results of several classifiers. Simple ensemble methods, such as uniform averaging over a set of models, usually provide an improvement over selecting the single best model. Probabilistic classifiers usually restrict the set of possible models that can be learnt in order to lower computational complexity costs. In these restricted spaces, where incorrect modelling assumptions are possibly made, uniform averaging sometimes performs even better than Bayesian model averaging. Linear mixtures over sets of models provide a space that includes uniform averaging as a particular case. We develop two algorithms for learning maximum a posteriori weights for linear mixtures, based on expectation maximization and on constrained optimization. We provide a nontrivial example of the utility of these two algorithms by applying them to one-dependence estimators. We develop the conjugate distribution for one-dependence estimators and empirically show that uniform averaging is clearly superior to BMA for this family of models. After that, we empirically show that the maximum a posteriori linear mixture weights improve accuracy significantly over uniform aggregation.
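The linear mixture described in the abstract reduces to uniform averaging when all weights are equal. A minimal sketch of the combination step (function name and toy numbers are ours, not the paper's; the MAP weight-learning algorithms are not shown):

```python
import numpy as np

def mixture_predict(probs, weights=None):
    """Combine per-model class posteriors with a linear mixture.

    probs   : array of shape (n_models, n_classes); each row sums to 1.
    weights : mixture weights summing to 1. None means uniform averaging,
              the special case the abstract mentions.
    """
    probs = np.asarray(probs, dtype=float)
    if weights is None:
        weights = np.full(probs.shape[0], 1.0 / probs.shape[0])
    return weights @ probs  # combined posterior, shape (n_classes,)

# Two toy models disagreeing on a binary problem.
p = [[0.9, 0.1], [0.4, 0.6]]
uniform = mixture_predict(p)                        # [0.65, 0.35]
skewed = mixture_predict(p, np.array([0.8, 0.2]))   # [0.8, 0.2]
```

Learning the weights (by EM or constrained optimization, as in the paper) then amounts to searching this mixture space instead of fixing the uniform point.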
Discriminatively Trained Markov Model for Sequence Classification
Abstract

Cited by 12 (3 self)
In this paper, we propose a discriminative counterpart of the directed Markov models of order k−1, or MM(k−1), for sequence classification. MM(k−1) models capture dependencies among neighboring elements of a sequence. The parameters of the classifiers are initialized based on the maximum likelihood estimates for their generative counterparts. We derive gradient-based update equations for the parameters of the sequence classifiers in order to maximize the conditional likelihood function. Results of our experiments with data sets drawn from biological sequence classification (specifically protein function and subcellular localization) and text classification applications show that the discriminatively trained sequence classifiers outperform their generative counterparts, confirming the benefits of discriminative training when the primary objective is classification. Our experiments also show that the discriminatively trained MM(k−1) sequence classifiers are competitive with the computationally much more expensive Support Vector Machines trained using k-gram representations of sequences.
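As an illustration of the generative starting point of this setup, here is a minimal order-(k−1) Markov classifier with Laplace smoothing, whose ML estimates would initialize the discriminative training. Class and data names are hypothetical, and the paper's gradient updates are not shown:

```python
from collections import defaultdict
import math

class MarkovClassifier:
    """Generative order-(k-1) Markov model per class (illustrative sketch)."""

    def __init__(self, k=2, alphabet="ACGT"):
        self.k, self.alphabet = k, alphabet
        # (class, context) -> counts of the symbol following that context
        self.ctx = defaultdict(lambda: defaultdict(int))

    def fit(self, sequences, labels):
        for seq, y in zip(sequences, labels):
            for i in range(self.k - 1, len(seq)):
                self.ctx[(y, seq[i - self.k + 1:i])][seq[i]] += 1
        self.classes = sorted(set(labels))
        return self

    def log_likelihood(self, seq, y):
        ll, V = 0.0, len(self.alphabet)
        for i in range(self.k - 1, len(seq)):
            counts = self.ctx[(y, seq[i - self.k + 1:i])]
            total = sum(counts.values())
            ll += math.log((counts[seq[i]] + 1) / (total + V))  # Laplace smoothing
        return ll

    def predict(self, seq):
        return max(self.classes, key=lambda y: self.log_likelihood(seq, y))
```

Discriminative training would then adjust these smoothed parameters to maximize the conditional likelihood of the labels rather than the joint likelihood of the sequences.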
Bayesian Network Structure Learning by Recursive Autonomy Identification
, 2009
Abstract

Cited by 10 (3 self)
We propose the recursive autonomy identification (RAI) algorithm for constraint-based (CB) Bayesian network structure learning. The RAI algorithm learns the structure by sequential application of conditional independence (CI) tests, edge direction and structure decomposition into autonomous sub-structures. The sequence of operations is performed recursively for each autonomous sub-structure while simultaneously increasing the order of the CI test. While other CB algorithms d-separate structures and then direct the resulting undirected graph, the RAI algorithm combines the two processes from the outset and along the procedure. By this means, and due to structure decomposition, learning a structure using RAI requires a smaller number of CI tests of high orders. This reduces the complexity and runtime of the algorithm and increases the accuracy by diminishing the curse of dimensionality. When the RAI algorithm learned structures from databases representing synthetic problems, known networks and natural problems, it demonstrated superiority with respect to computational complexity, runtime, structural correctness and classification accuracy over the …
Discriminative Learning of Bayesian Networks via Factorized Conditional Log-Likelihood
Abstract

Cited by 9 (0 self)
We propose an efficient and parameter-free scoring criterion, the factorized conditional log-likelihood (ˆfCLL), for learning Bayesian network classifiers. The proposed score is an approximation of the conditional log-likelihood criterion. The approximation is devised in order to guarantee decomposability over the network structure, as well as efficient estimation of the optimal parameters, achieving the same time and space complexity as the traditional log-likelihood scoring criterion. The resulting criterion has an information-theoretic interpretation based on interaction information, which exhibits its discriminative nature. To evaluate the performance of the proposed criterion, we present an empirical comparison with state-of-the-art classifiers. Results on a large suite of benchmark data sets from the UCI repository show that ˆfCLL-trained classifiers achieve at least as good accuracy as the best compared classifiers, using significantly less computational resources.
Efficient Heuristics for Discriminative Structure Learning of Bayesian Network Classifiers
Abstract

Cited by 7 (0 self)
We introduce a simple order-based greedy heuristic for learning discriminative structure within generative Bayesian network classifiers. We propose two methods for establishing an order of N features. They are based on the conditional mutual information and classification rate (i.e., risk), respectively. Given an ordering, we can find a discriminative structure with O(N^(k+1)) score evaluations (where constant k is the tree-width of the sub-graph over the attributes). We present results on 25 data sets from the UCI repository, for phonetic classification using the TIMIT database, for a visual surface inspection task, and for two handwritten digit recognition tasks. We provide classification performance for both discriminative and generative parameter learning on both discriminatively and generatively structured networks. The discriminative structure found by our new procedures significantly outperforms generatively produced structures, and achieves a classification accuracy on par with the best discriminative (greedy) Bayesian network learning approach, but does so with a factor of ~10–40 speedup. We also show that the advantages of generatively trained, discriminatively structured Bayesian network classifiers still hold in the case of missing features, a case where generative classifiers have an advantage over discriminative classifiers.
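As a simplified stand-in for the first ordering method, one can rank features by their (unconditional) mutual information with the class; the paper uses conditional mutual information, and all names below are ours:

```python
import numpy as np

def mutual_information(x, c):
    """I(X; C) in nats, estimated from two discrete label arrays."""
    x, c = np.asarray(x), np.asarray(c)
    mi = 0.0
    for xv in np.unique(x):
        for cv in np.unique(c):
            p_xc = np.mean((x == xv) & (c == cv))  # joint probability estimate
            if p_xc > 0:
                mi += p_xc * np.log(p_xc / (np.mean(x == xv) * np.mean(c == cv)))
    return mi

def feature_order(X, c):
    """Indices of the columns of X, most class-relevant first."""
    scores = [mutual_information(X[:, j], c) for j in range(X.shape[1])]
    return sorted(range(X.shape[1]), key=lambda j: -scores[j])
```

Given such an ordering, the heuristic in the abstract restricts each attribute's candidate parents to attributes earlier in the order, which is what bounds the number of score evaluations.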
Knowledge engineering for Bayesian networks: How common are noisy-MAX distributions in practice?
In Proceedings of ECAI 2006: 17th European Conference on Artificial Intelligence, August 29 – September 1, 2006, Riva del Garda, Italy, 2006
Abstract

Cited by 5 (0 self)
One problem faced in knowledge engineering for Bayesian networks is the exponential growth of the number of parameters in their conditional probability tables (CPTs). The most common practical solution is application of noisy-OR (or their generalization, noisy-MAX) gates, which take advantage of independence of causal interactions and provide a logarithmic reduction of the number of parameters required to specify a CPT. In this paper, we propose an algorithm that fits a noisy-MAX distribution to an existing CPT, and we apply it to search for noisy-MAX gates in three existing practical Bayesian networks. We show that the noisy-MAX gate provides a surprisingly good fit for as many as 50% of CPTs in these networks. The importance of this finding is that it provides an empirical justification for the use of the noisy-MAX gate as a powerful knowledge engineering tool.
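The logarithmic parameter reduction comes from the gate's functional form: for binary variables the noisy-MAX reduces to the noisy-OR, sketched below with made-up link probabilities:

```python
def noisy_or(p_active, leak=0.0):
    """P(effect present | active causes) under a noisy-OR gate.

    p_active : link probabilities of the causes that are currently active.
    leak     : probability the effect occurs with no cause active.
    """
    q = 1.0 - leak           # probability the effect is absent so far
    for p in p_active:
        q *= 1.0 - p         # each active cause independently fails to trigger it
    return 1.0 - q

# A full CPT over n binary causes needs 2**n rows; the gate needs
# only n link probabilities (plus the leak). Two active causes:
noisy_or([0.8, 0.5])   # 1 - 0.2 * 0.5 = 0.9
```

The fitting algorithm in the paper works the other way around: given a full CPT, it searches for the link parameters that best reproduce it.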
Generative Prior Knowledge for Discriminative Classification
Abstract

Cited by 5 (0 self)
We present a novel framework for integrating prior knowledge into discriminative classifiers. Our framework allows discriminative classifiers such as Support Vector Machines (SVMs) to utilize prior knowledge specified in the generative setting. The dual objective of fitting the data and respecting prior knowledge is formulated as a bilevel program, which is solved (approximately) via iterative application of second-order cone programming. To test our approach, we consider the problem of using WordNet (a semantic database of the English language) to improve low-sample classification accuracy of newsgroup categorization. WordNet is viewed as an approximate, but readily available, source of background knowledge, and our framework is capable of utilizing it in a flexible way.
Alleviating Naive Bayes Attribute Independence Assumption by Attribute Weighting
Abstract

Cited by 4 (2 self)
Despite the simplicity of the naive Bayes classifier, it has continued to perform well against more sophisticated newcomers and has remained, therefore, of great interest to the machine learning community. Of numerous approaches to refining the naive Bayes classifier, attribute weighting has received less attention than it warrants. Most approaches, perhaps influenced by attribute weighting in other machine learning algorithms, use weighting to place more emphasis on highly predictive attributes than on those that are less predictive. In this paper, we argue that for naive Bayes attribute weighting should instead be used to alleviate the conditional independence assumption. Based on this premise, we propose a weighted naive Bayes algorithm, called WANBIA, that selects weights to minimize either the negative conditional log-likelihood or the mean squared error objective functions. We perform extensive evaluations and find that WANBIA is a competitive alternative to state-of-the-art classifiers like Random Forest, Logistic Regression and A1DE.
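The weighting scheme can be sketched as per-attribute exponents on the naive Bayes likelihood terms; uniform weights recover the standard classifier. The weight-learning step itself (minimizing negative CLL or MSE) is not shown, and the function name and all numbers are illustrative:

```python
import math

def weighted_nb_scores(priors, likelihoods, weights):
    """Unnormalized class log-posteriors with per-attribute weights w_i:

        log P(c) + sum_i w_i * log P(x_i | c)

    priors      : {class: P(c)}
    likelihoods : {class: [P(x_i | c) for each observed attribute value]}
    weights     : [w_i]; all ones gives plain naive Bayes,
                  w_i < 1 discounts attributes that violate independence.
    """
    scores = {}
    for c, prior in priors.items():
        s = math.log(prior)
        for w, p in zip(weights, likelihoods[c]):
            s += w * math.log(p)
        scores[c] = s
    return scores
```

Down-weighting a redundant attribute (e.g. one duplicated in the data) reduces the double counting that the independence assumption would otherwise cause.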
Discriminative Scoring of Bayesian Network Classifiers: a Comparative Study
Abstract

Cited by 2 (0 self)
We consider the problem of scoring Bayesian Network Classifiers (BNCs) on the basis of the conditional log-likelihood (CLL). Currently, optimization is usually performed in BN parameter space, but for perfect graphs (such as Naive Bayes, TANs and FANs) a mapping to an equivalent Logistic Regression (LR) model is possible, and optimization can be performed in LR parameter space. We perform an empirical comparison of the efficiency of scoring in BN parameter space and in LR parameter space using two different mappings. For each parameterization, we study two popular optimization methods: conjugate gradient and BFGS. Efficiency of scoring is compared on simulated data and data sets from the UCI Machine Learning repository.
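The NB-to-LR mapping can be checked numerically in the simplest case: for binary naive Bayes over binary attributes, the log-odds is linear in the attribute indicators. All parameter values below are made up, and this shows only the mapping, not the paper's two parameterizations:

```python
import math

# Naive Bayes parameters (hypothetical): priors and theta[c][i] = P(x_i = 1 | c).
prior = {0: 0.5, 1: 0.5}
theta = {0: [0.2, 0.7], 1: [0.8, 0.3]}

def nb_log_odds(x):
    """log P(c=1|x) / P(c=0|x) computed directly from the NB parameters."""
    s = math.log(prior[1] / prior[0])
    for i, xi in enumerate(x):
        p1 = theta[1][i] if xi else 1 - theta[1][i]
        p0 = theta[0][i] if xi else 1 - theta[0][i]
        s += math.log(p1 / p0)
    return s

# Equivalent logistic-regression parameterization: bias + w . x.
bias = math.log(prior[1] / prior[0]) + sum(
    math.log((1 - theta[1][i]) / (1 - theta[0][i])) for i in range(2))
w = [math.log(theta[1][i] / theta[0][i])
     - math.log((1 - theta[1][i]) / (1 - theta[0][i])) for i in range(2)]

def lr_log_odds(x):
    return bias + sum(wi * xi for wi, xi in zip(w, x))
```

Because the two log-odds functions coincide, CLL optimization can be carried out over (bias, w) instead of the BN parameters, which is the choice the paper compares empirically.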
Varying levels of complexity in transcription factor binding motifs
 Nucleic Acids Res
, 2015
Abstract

Cited by 2 (1 self)
Binding of transcription factors to DNA is one of the keystones of gene regulation. The existence of statistical dependencies between binding site positions is widely accepted, while their relevance for computational predictions has been debated. Building probabilistic models of binding sites that may capture dependencies is still challenging, since the most successful motif discovery approaches require numerical optimization techniques, which are not suited for selecting dependency structures. To overcome this issue, we propose sparse local inhomogeneous mixture (Slim) models that combine putative dependency structures in a weighted manner, allowing for numerical optimization of dependency structure and model parameters simultaneously. We find that Slim models yield a substantially better prediction performance than previous models on genomic-context protein binding microarray data sets and on ChIP-seq data sets. To elucidate the reasons for the improved performance, we develop dependency logos, which allow for visual inspection of dependency structures within binding sites. We find that the dependency structures discovered by Slim models are highly diverse and highly transcription-factor-specific, which emphasizes the need for flexible dependency models. The observed dependency structures range from broad heterogeneities to sparse dependencies between neighboring and non-neighboring binding site positions.
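For contrast with dependency-aware models, the position-independent baseline that Slim models generalize is the position weight matrix (PWM), which scores each site position separately; the matrix below is invented for illustration:

```python
import math

# A toy PWM over a 2-position site: P(base | position).
# A PWM assumes positions are independent; Slim models add weighted
# dependency structures on top of such per-position terms (not shown).
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]

def pwm_log_score(site):
    """Log-probability of a candidate site under the independence model."""
    return sum(math.log(PWM[i][base]) for i, base in enumerate(site))

pwm_log_score("AG")   # the best-scoring site under this toy PWM
```

Any position dependency (e.g. "G at position 2 only when A is at position 1") is invisible to this score, which is exactly the limitation the Slim models address.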