The max-min hill-climbing Bayesian network structure learning algorithm
 Machine Learning
, 2006
Abstract. We present a new algorithm for Bayesian network structure learning, called Max-Min Hill-Climbing (MMHC). The algorithm combines ideas from local learning, constraint-based, and search-and-score techniques in a principled and effective way. It first reconstructs the skeleton of a Bayesian network and then performs a Bayesian-scoring greedy hill-climbing search to orient the edges. In our extensive empirical evaluation, MMHC outperforms, on average and in terms of various metrics, several prototypical and state-of-the-art algorithms, namely the PC, Sparse Candidate, Three Phase Dependency Analysis, Optimal Reinsertion, Greedy Equivalence Search, and Greedy Search. These are the first empirical results simultaneously comparing most of the major Bayesian network algorithms against each other. MMHC offers certain theoretical advantages, specifically over the Sparse Candidate algorithm, corroborated by our experiments. MMHC and detailed results of our study are publicly available at
Stratified exponential families: Graphical models and model selection
 ANNALS OF STATISTICS
, 2001
Algorithms for Large Scale Markov Blanket Discovery
 In The 16th International FLAIRS Conference, St
, 2003
This paper presents a number of new algorithms for discovering the Markov Blanket of a target variable T from training data. The Markov Blanket can be used for variable selection for classification, for causal discovery, and for Bayesian Network learning. We introduce a low-order polynomial algorithm and several variants that soundly induce the Markov Blanket under certain broad conditions in datasets with thousands of variables and compare them to other state-of-the-art local and global methods with excellent results.
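As a reminder of the object these algorithms search for: when the network structure is already known, the Markov Blanket of T is the set of T's parents, children, and the other parents of its children. The sketch below only illustrates that definition on a known DAG. It is not the discovery procedure from the paper, which works from data, and the function name and edge-list representation are assumptions made for illustration.

```python
def markov_blanket(edges, target):
    """Markov Blanket of `target` in a known DAG.

    `edges` is an iterable of (parent, child) pairs (hypothetical
    representation chosen for this example).
    """
    edges = list(edges)
    parents = {u for u, v in edges if v == target}
    children = {v for u, v in edges if u == target}
    # Spouses: other parents of the target's children.
    spouses = {u for u, v in edges if v in children and u != target}
    return parents | children | spouses


# A -> T -> C <- S, and C -> D
example_edges = [("A", "T"), ("T", "C"), ("S", "C"), ("C", "D")]
blanket = markov_blanket(example_edges, "T")
```

Here D is excluded because it is a grandchild of T, while S is included because it shares the child C with T.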
Time and Sample Efficient Discovery of Markov Blankets And Direct Causal Relations
 Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
, 2003
Data Mining with Bayesian Network learning has two important characteristics: first, under broad conditions learned edges between variables correspond to causal influences, and second, for every variable T in the network a special subset (Markov Blanket) identifiable by the network is the minimal variable set required to predict T. However, all known algorithms learning a complete BN do not scale up beyond a few hundred variables. On the other hand, all known sound algorithms learning a local region of the network require an exponential number of training instances to the size of the learned region.
A Scoring Function for Learning Bayesian Networks based on Mutual Information and Conditional Independence Tests
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2006
We propose a new scoring function for learning Bayesian networks from data using score+search algorithms. This is based on the concept of mutual information and exploits some well-known properties of this measure in a novel way. Essentially, a statistical independence test based on the chi-square distribution, associated with the mutual information measure, together with a property of additive decomposition of this measure, are combined in order to measure the degree of interaction between each variable and its parent variables in the network. The result is a non-Bayesian scoring function called MIT (mutual information tests) which belongs to the family of scores based on information theory. The MIT score also represents a penalization of the Kullback-Leibler divergence between the joint probability distributions associated with a candidate network and with the available data set. Detailed results of a complete experimental evaluation of the proposed scoring function and its comparison with the well-known K2, BDeu and BIC/MDL scores are also presented.
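The link between mutual information and the chi-square test mentioned in this abstract rests on a standard fact: for N samples of two discrete variables, 2N times the empirical mutual information (in nats) is the G statistic, which under independence is asymptotically chi-square distributed with (r_x - 1)(r_y - 1) degrees of freedom. The following is a minimal sketch of that relationship only. It is not the MIT score itself, and the function names are assumptions made for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) of two discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log( p(x,y) / (p(x) p(y)) ), written with raw counts.
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def g_statistic(xs, ys):
    """Return (G, degrees of freedom), where G = 2 * N * MI."""
    n = len(xs)
    g = 2.0 * n * mutual_information(xs, ys)
    dof = (len(set(xs)) - 1) * (len(set(ys)) - 1)
    return g, dof


# Perfectly dependent binary variables: MI equals log 2.
g_dep, dof_dep = g_statistic([0, 0, 1, 1], [0, 0, 1, 1])
# Empirically independent binary variables: MI equals 0.
g_ind, dof_ind = g_statistic([0, 0, 1, 1], [0, 1, 0, 1])
```

Comparing G against a chi-square critical value for the given degrees of freedom yields an independence test. A score in the spirit of the abstract would subtract such a penalty from the mutual information term, but the exact form is defined in the paper and is not reproduced here.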
Searching for Bayesian Network Structures in the Space of Restricted Acyclic Partially Directed Graphs
 Journal of Artificial Intelligence Research
, 2003
Although many algorithms have been designed to construct Bayesian network structures using different approaches and principles, they all employ only two methods: those based on independence criteria, and those based on a scoring function and a search procedure (although some methods combine the two). Within the score+search paradigm, the dominant approach uses local search methods in the space of directed acyclic graphs (DAGs), where the usual choices for defining the elementary modifications (local changes) that can be applied are arc addition, arc deletion, and arc reversal. In this paper, we propose a new local search method that uses a different search space, and which takes account of the concept of equivalence between network structures: restricted acyclic partially directed graphs (RPDAGs). In this way, the number of different configurations of the search space is reduced, thus improving efficiency. Moreover, although the final result must necessarily be a local optimum given the nature of the search method, the topology of the new search space, which avoids making early decisions about the directions of the arcs, may help to find better local optima than those obtained by searching in the DAG space.
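The three elementary modifications in DAG space named above (arc addition, arc deletion, arc reversal) can be enumerated as in the sketch below. It illustrates only the conventional DAG neighborhood that the abstract contrasts with, not the RPDAG operators proposed in the paper. The function names and the edge-set representation are assumptions made for illustration.

```python
from itertools import permutations


def is_acyclic(nodes, edges):
    """Check acyclicity with Kahn's topological-sort algorithm."""
    indegree = {v: 0 for v in nodes}
    for _, v in edges:
        indegree[v] += 1
    frontier = [v for v in nodes if indegree[v] == 0]
    visited = 0
    while frontier:
        u = frontier.pop()
        visited += 1
        for a, b in edges:
            if a == u:
                indegree[b] -= 1
                if indegree[b] == 0:
                    frontier.append(b)
    return visited == len(nodes)


def dag_neighbors(nodes, edges):
    """List (move, arc, new_edge_set) for every legal single-arc change."""
    edges = frozenset(edges)
    moves = []
    for u, v in permutations(nodes, 2):
        if (u, v) in edges:
            # Deleting an arc can never create a cycle.
            moves.append(("delete", (u, v), edges - {(u, v)}))
            flipped = (edges - {(u, v)}) | {(v, u)}
            if is_acyclic(nodes, flipped):
                moves.append(("reverse", (u, v), flipped))
        elif (v, u) not in edges:
            added = edges | {(u, v)}
            if is_acyclic(nodes, added):
                moves.append(("add", (u, v), added))
    return moves


# Chain A -> B -> C
chain_moves = dag_neighbors(["A", "B", "C"], {("A", "B"), ("B", "C")})
```

A greedy hill-climber would score each candidate edge set and move to the best-scoring one. For the chain above, adding C -> A is rejected because it would close a directed cycle.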
Learning Bayesian network equivalence classes with ant colony optimization
 Journal of Artificial Intelligence Research
, 2009
Bayesian networks are a useful tool in the representation of uncertain knowledge. This paper proposes a new algorithm, called ACO-E, to learn the structure of a Bayesian network. It does this by conducting a search through the space of equivalence classes of Bayesian networks using Ant Colony Optimization (ACO). To this end, two novel extensions of traditional ACO techniques are proposed and implemented. Firstly, multiple types of moves are allowed. Secondly, moves can be given in terms of indices that are not based on construction graph nodes. The results of testing show that ACO-E performs better than a greedy search and other state-of-the-art and metaheuristic algorithms whilst searching in the space of equivalence classes.
Bayesian networks for probabilistic weather prediction
 In Proceedings of the 15th European Conference on Artificial Intelligence, ECAI’2002
, 2002
Abstract. Several standard approaches have been introduced for meteorological time series prediction (analog techniques, neural networks, etc.). However, when dealing with multivariate spatially distributed time series (e.g., a network of meteorological stations over the Iberian peninsula) the above methods do not consider all the available information (they consider special independency assumptions to simplify the model). In this work, we introduce Bayesian Networks (BNs) in this framework to model the spatial and temporal dependencies among the different stations using a directed acyclic graph. This graph is learnt from the available databases and allows deriving a probabilistic model consistent with all the available information. Afterwards, the resulting model is combined with numerical atmospheric predictions which are given as evidence for the model. Efficient inference mechanisms provide the conditional distributions of the desired variables at a desired future time. We illustrate the efficiency of the proposed methodology by obtaining precipitation forecasts for 100 stations in the North basin of the Iberian peninsula during Winter 1999. We show how standard analog techniques are a special case of the proposed methodology when no spatial dependencies are considered in the model.
A Recursive Method for Structural Learning of Directed Acyclic Graphs
In this paper, we propose a recursive method for structural learning of directed acyclic graphs (DAGs), in which a problem of structural learning for a large DAG is first decomposed into two problems of structural learning for two small vertex subsets, each of which is then decomposed recursively into two problems of smaller subsets until no subset can be decomposed further. In our approach, the search for separators of a pair of variables in a large DAG is localized to small subsets, and thus the approach can improve the efficiency of searches and the power of statistical tests for structural learning. We show how the recent advances in the learning of undirected graphical models can be employed to facilitate the decomposition. Simulations are given to demonstrate the performance of the proposed method.
Expert Systems for Forecasting
 In Principles of Forecasting: A Handbook for Researchers and Practitioners
, 2001
Expert systems use rules to represent experts' reasoning in solving problems. The rules are based on knowledge about methods and the problem domain. To acquire knowledge for an expert system, one should rely on a variety of sources, such as textbooks, research papers, interviews, surveys, and protocol analysis. Protocol analysis is especially useful if the area to be modeled is complex or if experts lack an awareness of their processes. Expert systems should be easy to use, incorporate the best available knowledge, and reveal the reasoning behind the recommendations they make. In forecasting, the most promising applications of expert systems are to replace unaided judgment in cases requiring many forecasts, to model complex problems where data on the dependent variable are of poor quality, and to handle semi-structured problems. We found 15 comparisons of forecast validity involving expert systems. As expected, expert systems were more accurate than unaided judgment, six comparisons to one, with one tie. Expert systems were less accurate than judgmental bootstrapping in two comparisons with two ties. There was little evidence with which to compare