Results 1–10 of 17
Discriminative Learning of Bayesian Networks via Factorized Conditional Log-Likelihood
"... We propose an efficient and parameterfree scoring criterion, the factorized conditional loglikelihood (ˆfCLL), for learning Bayesian network classifiers. The proposed score is an approximation of the conditional loglikelihood criterion. The approximation is devised in order to guarantee decomposa ..."
Abstract

Cited by 5 (0 self)
We propose an efficient and parameter-free scoring criterion, the factorized conditional log-likelihood (f̂CLL), for learning Bayesian network classifiers. The proposed score is an approximation of the conditional log-likelihood criterion. The approximation is devised to guarantee decomposability over the network structure, as well as efficient estimation of the optimal parameters, achieving the same time and space complexity as the traditional log-likelihood scoring criterion. The resulting criterion has an information-theoretic interpretation based on interaction information, which exhibits its discriminative nature. To evaluate the performance of the proposed criterion, we present an empirical comparison with state-of-the-art classifiers. Results on a large suite of benchmark data sets from the UCI repository show that f̂CLL-trained classifiers achieve accuracy at least as good as that of the best compared classifiers, while using significantly less computational resources.
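The interaction information underlying this criterion, I(A;B;C) = I(A;B|C) − I(A;B), can be estimated directly from counts. Below is a minimal illustrative sketch (not the paper's implementation; the function names are ours):

```python
from collections import Counter
from math import log2

def mi(pairs):
    """Empirical mutual information I(A;B) in bits from a list of (a, b) pairs."""
    n = len(pairs)
    pab = Counter(pairs)
    pa = Counter(a for a, _ in pairs)
    pb = Counter(b for _, b in pairs)
    return sum(c / n * log2((c / n) / ((pa[a] / n) * (pb[b] / n)))
               for (a, b), c in pab.items())

def conditional_mi(triples):
    """Empirical I(A;B|C): MI of A and B within each stratum of C, weighted by P(C)."""
    n = len(triples)
    by_c = {}
    for a, b, c in triples:
        by_c.setdefault(c, []).append((a, b))
    return sum(len(group) / n * mi(group) for group in by_c.values())

def interaction_information(triples):
    """I(A;B;C) = I(A;B|C) - I(A;B); positive values mean that conditioning
    on C strengthens the A-B dependence."""
    return conditional_mi(triples) - mi([(a, b) for a, b, _ in triples])
```

On XOR data (C = A ⊕ B with A, B uniform), A and B are marginally independent but fully dependent given C, so the interaction information is 1 bit.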
Learning Locally Minimax Optimal Bayesian Networks
"... We consider the problem of learning Bayesian network models in a noninformative setting, where the only available information is a set of observational data, and no background knowledge is available. The problem can be divided into two different subtasks: learning the structure of the network (a se ..."
Abstract

Cited by 3 (1 self)
We consider the problem of learning Bayesian network models in a non-informative setting, where the only available information is a set of observational data and no background knowledge is available. The problem can be divided into two subtasks: learning the structure of the network (a set of independence relations), and learning the parameters of the model (which fix the probability distribution within the set of all distributions consistent with the chosen structure). Few theoretical frameworks handle both of these problems together consistently, the Bayesian framework being an exception. In this paper we propose an alternative, information-theoretic framework which sidesteps some of the technical problems facing the Bayesian approach. The framework is based on the minimax-optimal Normalized Maximum Likelihood (NML) distribution, which is motivated by the Minimum Description Length (MDL) principle. The resulting model selection criterion is consistent, and it provides a way to construct highly predictive Bayesian network models. Our empirical tests show that the proposed method compares favorably with alternative approaches in both model selection and prediction tasks.
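For intuition about the NML distribution this criterion builds on: a sequence is scored by its maximized likelihood, renormalized over all sequences of the same length. A toy exact computation for the Bernoulli model (illustrative only; practical Bayesian-network scoring uses far more efficient recursions):

```python
from math import comb, log2

def bern_maxlik(k, n):
    """Maximized Bernoulli likelihood of a length-n binary sequence with k ones."""
    p = k / n
    return (p ** k) * ((1 - p) ** (n - k))  # Python's 0 ** 0 == 1, as needed here

def nml_code_length(k, n):
    """Stochastic complexity -log2 P_NML of a binary sequence with k ones:
    its maximized likelihood, renormalized over all 2^n sequences."""
    normalizer = sum(comb(n, j) * bern_maxlik(j, n) for j in range(n + 1))
    return -log2(bern_maxlik(k, n) / normalizer)
```

The normalizer (the "parametric complexity") grows with n, so the NML code length penalizes model flexibility automatically, without any tunable prior.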
A New Hybrid Method for Bayesian Network Learning With Dependency Constraints
"... Abstract — A Bayes net has qualitative and quantitative aspects: The qualitative aspect is its graphical structure that corresponds to correlations among the variables in the Bayes net. The quantitative aspects are the net parameters. This paper develops a hybrid criterion for learning Bayes net str ..."
Abstract

Cited by 2 (1 self)
A Bayes net has qualitative and quantitative aspects: the qualitative aspect is its graphical structure, which corresponds to correlations among the variables in the Bayes net; the quantitative aspects are the net parameters. This paper develops a hybrid criterion for learning Bayes net structures that is based on both aspects. We combine model selection criteria measuring data fit with correlation information from statistical tests: given a sample d, search for a structure G that maximizes score(G, d) over the set of structures G that satisfy the dependencies detected in d. We rely on the statistical test only to accept conditional dependencies, not conditional independencies. We show how to adapt local search algorithms to accommodate the observed dependencies. Simulation studies with GES search and the BDeu/BIC scores provide evidence that the additional dependency information leads to Bayes nets that better fit the target model in distribution and structure.
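The "accept dependencies only" idea can be mimicked with a permutation test on the G statistic (2N times the empirical mutual information): an edge constraint is added only when dependence is significant. A self-contained sketch under our own naming, not the paper's code:

```python
import random
from collections import Counter
from math import log

def g_statistic(xs, ys):
    """G statistic 2*N*I(X;Y) (natural log) for two paired discrete samples."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return 2 * sum(c * log(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())

def dependent(xs, ys, alpha=0.05, n_perm=200, seed=0):
    """Permutation-style dependency test: report an edge only when the observed
    G exceeds the (1 - alpha) quantile of G under random shuffling of ys."""
    rng = random.Random(seed)
    observed = g_statistic(xs, ys)
    shuffled = list(ys)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if g_statistic(xs, shuffled) >= observed:
            exceed += 1
    return exceed / n_perm < alpha
```

A constrained search in the spirit of the paper would then maximize score(G, d) only over structures that contain every accepted edge.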
Mind Change Optimal Learning: …
, 2007
"... Learning theories play a significant role to machine learning as computability and complexity theories to software engineering. Gold’s language learning paradigm is one cornerstone of modern learning theories. The aim of this thesis is to establish an inductive principle in Gold’s language learning ..."
Abstract
Learning theories play a role in machine learning analogous to that of computability and complexity theories in software engineering. Gold's language learning paradigm is one cornerstone of modern learning theories. The aim of this thesis is to establish an inductive principle in Gold's language learning paradigm to guide the design of machine learning algorithms. We follow the common practice of using the number of mind changes to measure the complexity of Gold's language learning problems, and study efficient learning with respect to mind changes. Our starting point is the idea that a learner that is efficient with respect to mind changes minimizes mind changes not only globally in the entire learning problem, but also locally in subproblems after receiving some evidence. Formalizing this idea leads to the notion of mind change optimality. We characterize the mind change complexity of language collections with Cantor's classic concept of accumulation order. We show that the characteristic property of mind change optimal learners is that they output conjectures (languages) with maximal accumulation order. Therefore, we obtain an inductive principle in Gold's language learning paradigm based on the simple topological concept of accumulation order.
Weather Forecasting with Bayesian Networks: Causal Modelling
"... objective of the weather model was to investigate the methods and effects of forecasting weather for stations in Southern Africa. Furthermore it consisted of three sections namely, Causal Modelling, Dynamic Bayesian Network Learning, and a Visualization System. This report focuses on causal modellin ..."
Abstract
The objective of the weather model was to investigate the methods and effects of forecasting weather for stations in Southern Africa. The project consisted of three sections, namely Causal Modelling, Dynamic Bayesian Network Learning, and a Visualization System; this report focuses on causal modelling. The Naive Bayes, K2, LK2, and Greedy Thick Thinning algorithms were implemented and evaluated. The results show that the Naive Bayes algorithm constructs networks in the shortest time but with the lowest predictive accuracy. The networks produced by the LK2 algorithm forecasted with the highest predictive accuracy when using precipitation data, while the Greedy Thick Thinning algorithm produced the highest-accuracy networks for minimum and maximum temperature data. Keywords: Graph algorithms, Computations on discrete structures.
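Of the algorithms compared above, Naive Bayes is the simplest: every feature is assumed conditionally independent given the class. A minimal discrete sketch with add-one smoothing (illustrative only; the report's implementations are not shown here):

```python
from collections import Counter, defaultdict
from math import log

def train_naive_bayes(rows, labels):
    """Fit a discrete naive Bayes model: class counts plus, for each
    (feature index, class) pair, counts of the observed feature values."""
    class_counts = Counter(labels)
    feat_counts = defaultdict(Counter)
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            feat_counts[(i, y)][v] += 1
    return class_counts, feat_counts

def predict(model, row):
    """Return the class with the highest smoothed log posterior score."""
    class_counts, feat_counts = model
    total = sum(class_counts.values())

    def score(y):
        s = log(class_counts[y] / total)
        for i, v in enumerate(row):
            counts = feat_counts[(i, y)]
            # add-one smoothing, reserving one bucket for unseen values
            s += log((counts[v] + 1) / (class_counts[y] + len(counts) + 1))
        return s

    return max(class_counts, key=score)
```

Training is a single counting pass over the data, which is why Naive Bayes builds its "network" fastest in the comparison above; the cost is the strong independence assumption.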
Adaptive Thresholding in Structure Learning of a Bayesian Network
"... Thresholding a measure in conditional independence (CI) tests using a fixed value enables learning and removing edges as part of learning a Bayesian network structure. However, the learned structure is sensitive to the threshold that is commonly selected: 1) arbitrarily; 2) irrespective of character ..."
Abstract
Thresholding a measure in conditional independence (CI) tests with a fixed value enables learning and removing edges as part of learning a Bayesian network structure. However, the learned structure is sensitive to the threshold, which is commonly selected: 1) arbitrarily; 2) irrespective of characteristics of the domain; and 3) identically for all CI tests. We analyze the impact on mutual information, a CI measure, of factors such as sample size, degree of variable dependence, and the variables' cardinalities. We then suggest adaptively thresholding individual tests based on these factors. We show that adaptive thresholds better distinguish between pairs of dependent variables and pairs of independent variables, and enable learning structures more accurately and quickly than fixed thresholds.
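The asymptotic fact usually exploited in such tests is that, under independence, 2N·MI (in nats) follows a chi-square distribution with (|X|−1)(|Y|−1) degrees of freedom, so a significance-based MI threshold scales with both sample size and cardinalities. A sketch using the standard Wilson–Hilferty quantile approximation (our choice of approximation, not necessarily the paper's):

```python
from math import sqrt

Z95 = 1.6448536  # 95th percentile of the standard normal distribution

def chi2_quantile_95(df):
    """Wilson-Hilferty approximation to the chi-square 95% quantile."""
    return df * (1 - 2 / (9 * df) + Z95 * sqrt(2 / (9 * df))) ** 3

def adaptive_mi_threshold(n_samples, card_x, card_y):
    """MI threshold (in nats) that adapts to sample size and cardinalities:
    under independence, 2*N*MI is asymptotically chi-square distributed with
    (|X| - 1) * (|Y| - 1) degrees of freedom."""
    df = (card_x - 1) * (card_y - 1)
    return chi2_quantile_95(df) / (2 * n_samples)
```

Note how the threshold shrinks as 1/N with more data and rises with variable cardinalities, which is exactly the adaptivity a fixed threshold lacks.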
SparsityBoost: A New Scoring Function for Learning Bayesian Network Structure
"... We give a new consistent scoring function for structure learning of Bayesian networks. In contrast to traditional approaches to scorebased structure learning, such as BDeu or MDL, the complexity penalty that we propose is datadependent and is given by the probability that a conditional independence ..."
Abstract
We give a new consistent scoring function for structure learning of Bayesian networks. In contrast to traditional approaches to score-based structure learning, such as BDeu or MDL, the complexity penalty that we propose is data-dependent and is given by the probability that a conditional independence test correctly shows that an edge cannot exist. What distinguishes this new scoring function from earlier work is that it becomes computationally easier to maximize as the amount of data increases. We prove a polynomial sample complexity result, showing that maximizing this score is guaranteed to correctly learn a structure with no false edges and a distribution close to the generating distribution, whenever there exists a Bayesian network that is a perfect map for the data-generating distribution. Although the new score can be used with any search algorithm, we give empirical results showing that it is particularly effective when used together with a linear programming relaxation approach to Bayesian network structure learning.
Efficient Approximation of the Conditional Relative Entropy with Applications to Discriminative Learning of Bayesian Network Classifiers
, 2013
"... entropy ..."
The IMAP Hybrid Method for Learning Gaussian Bayes Nets
"... Abstract. This paper presents the Imap hybrid algorithm for selecting, given a data sample, a linear Gaussian model whose structure is a directed graph. The algorithm performs a local search for a model that meets the following criteria: (1) The Markov blankets in the model should be consistent wit ..."
Abstract
This paper presents the IMAP hybrid algorithm for selecting, given a data sample, a linear Gaussian model whose structure is a directed graph. The algorithm performs a local search for a model that meets the following criteria: (1) the Markov blankets in the model should be consistent with dependency information from statistical tests; (2) the number of edges should be minimized subject to the first constraint; (3) a given score function should be maximized subject to the first two constraints. Our local search is based on Graph Equivalence Search (GES); we also apply the recently developed SIN statistical testing strategy to help avoid local minima. Simulation studies with GES search and the BIC score provide evidence that for nets with 10 or more variables, the hybrid method selects simpler graphs whose structure is closer to the target graph.
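In the linear Gaussian setting, the dependency information in criterion (1) typically comes from partial-correlation tests. A minimal sketch for a single conditioning variable, using the Fisher z transform (illustrative; this is not the paper's SIN strategy):

```python
from math import log, sqrt

def corr(xs, ys):
    """Pearson correlation of two paired samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def partial_corr(xs, ys, zs):
    """Partial correlation of X and Y given one conditioning variable Z."""
    rxy, rxz, ryz = corr(xs, ys), corr(xs, zs), corr(ys, zs)
    return (rxy - rxz * ryz) / sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

def gaussian_ci(xs, ys, zs, z_crit=1.96):
    """Fisher-z test at the 5% level: True when X and Y look conditionally
    independent given Z (informative only when len(xs) is well above 4)."""
    r = partial_corr(xs, ys, zs)
    stat = sqrt(len(xs) - 1 - 3) * 0.5 * log((1 + r) / (1 - r))
    return abs(stat) < z_crit
```

For example, when X and Y are both noisy copies of Z, they are marginally correlated but their partial correlation given Z vanishes, which is the pattern a Markov-blanket consistency check looks for.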