Results 1-10 of 36
The max-min hill-climbing Bayesian network structure learning algorithm
Machine Learning, 2006
Cited by 76 (7 self)
We present a new algorithm for Bayesian network structure learning, called Max-Min Hill-Climbing (MMHC). The algorithm combines ideas from local learning, constraint-based, and search-and-score techniques in a principled and effective way. It first reconstructs the skeleton of a Bayesian network and then performs a Bayesian-scoring greedy hill-climbing search to orient the edges. In our extensive empirical evaluation MMHC outperforms on average and in terms of various metrics several prototypical and state-of-the-art algorithms, namely the PC, Sparse Candidate, Three Phase Dependency Analysis, Optimal Reinsertion, Greedy Equivalence Search, and Greedy Search. These are the first empirical results simultaneously comparing most of the major Bayesian network algorithms against each other. MMHC offers certain theoretical advantages, specifically over the Sparse Candidate algorithm, corroborated by our experiments. MMHC and detailed results of our study are publicly available at
Ordering-based search: A simple and effective algorithm for learning Bayesian networks
In UAI, 2005
Cited by 46 (0 self)
One of the basic tasks for Bayesian networks (BNs) is that of learning a network structure from data. The BN-learning problem is NP-hard, so the standard solution is heuristic search. Many approaches have been proposed for this task, but only a very small number outperform the baseline of greedy hill-climbing with tabu lists; moreover, many of the proposed algorithms are quite complex and hard to implement. In this paper, we propose a very simple and easy-to-implement method for addressing this task. Our approach is based on the well-known fact that the best network (of bounded in-degree) consistent with a given node ordering can be found very efficiently. We therefore propose a search not over the space of structures, but over the space of orderings, selecting for each ordering the best network consistent with it. This search space is much smaller, makes more global search steps, has a lower branching factor, and avoids costly acyclicity checks. We present results for this algorithm on both synthetic and real data sets, evaluating both the score of the network found and the running time. We show that ordering-based search outperforms the standard baseline, and is competitive with recent algorithms that are much harder to implement.
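The fact this abstract relies on, that the best bounded in-degree network for a fixed ordering can be found node by node, can be illustrated with a minimal sketch. The function names, the `max_indegree` parameter, and the hand-written `toy_score` table are all hypothetical; a real implementation would plug in a data-driven decomposable score and would then search over orderings, which is not shown here.

```python
from itertools import combinations


def best_network_for_ordering(ordering, local_score, max_indegree=2):
    """Pick, for each node, the best-scoring parent set among its predecessors.

    Because parents may only come from earlier positions in the ordering,
    the result is acyclic by construction and no cycle check is needed.
    """
    parents = {}
    total = 0.0
    for i, node in enumerate(ordering):
        predecessors = ordering[:i]
        best_set, best_score = (), local_score(node, ())
        for k in range(1, min(max_indegree, len(predecessors)) + 1):
            for candidate in combinations(predecessors, k):
                score = local_score(node, candidate)
                if score > best_score:
                    best_set, best_score = candidate, score
        parents[node] = best_set
        total += best_score
    return parents, total


def toy_score(node, parent_set):
    # Hypothetical hand-written local scores, standing in for a real
    # decomposable score computed from data.
    table = {
        ("B", ("A",)): 2.0,
        ("C", ("B",)): 1.0,
        ("C", ("A", "B")): 3.0,
    }
    return table.get((node, tuple(sorted(parent_set))), 0.0)


if __name__ == "__main__":
    print(best_network_for_ordering(["A", "B", "C"], toy_score))
```

Since each node's choice is independent of the others once the ordering is fixed, the total score is simply the sum of the per-node maxima.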
Bayesian Network Anomaly Pattern Detection for Disease Outbreaks
In Proceedings of the Twentieth International Conference on Machine Learning, 2003
Cited by 34 (6 self)
Early disease outbreak detection systems typically monitor health care data for irregularities by comparing the distribution of recent data against a baseline distribution. Determining the baseline is difficult due to the presence of different trends in health care data, such as trends caused by the day of week and by seasonal variations in temperature and weather. Creating the baseline distribution without taking these trends into account can lead to unacceptably high false positive counts and slow detection times.
Unsupervised learning
Advanced Lectures on Machine Learning, 2004
Cited by 18 (0 self)
We give a tutorial and overview of the field of unsupervised learning from the perspective of statistical modelling. Unsupervised learning can be motivated from information theoretic and Bayesian principles. We briefly review basic models in unsupervised learning, including factor analysis, PCA, mixtures of Gaussians, ICA, hidden Markov models, state-space models, and many variants and extensions. We derive the EM algorithm and give an overview of fundamental concepts in graphical models, and inference algorithms on graphs. This is followed by a quick tour of approximate Bayesian inference, including Markov chain Monte Carlo (MCMC), Laplace approximation, BIC, variational approximations, and expectation propagation (EP). The aim of this chapter is to provide a high-level view of the field. Along the way, many state-of-the-art ideas and future directions are also reviewed.
Detecting anomalous records in categorical datasets
Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, 2007
Cited by 17 (2 self)
We consider the problem of detecting anomalies in high arity categorical datasets. In most applications, anomalies are defined as data points that are ‘abnormal’. Quite often we have access to data which consists mostly of normal records, along with a small percentage of unlabelled anomalous records. We are interested in the problem of unsupervised anomaly detection, where we use the unlabelled data for training, and detect records that do not follow the definition of normality. A standard approach is to create a model of normal data, and compare test records against it. A probabilistic approach builds a likelihood model from the training data. Records are tested for anomalousness based on the complete record likelihood given the probability model. For categorical attributes, Bayes nets give a standard representation of the likelihood. While this approach is good at finding outliers in the dataset, it often tends to detect records with attribute values that are rare. Sometimes, just detecting rare values of an attribute is not desired and such outliers are not considered as anomalies in that context. We present an alternative definition of anomalies, and propose an approach of comparing against marginal distributions of attribute subsets. We show that this is a more meaningful way of detecting anomalies, and that it performs better on semi-synthetic as well as real-world datasets.
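The baseline that this abstract describes, scoring each record by its likelihood under a model fitted to the training data, can be sketched as follows. This is a deliberately simplified illustration, not the paper's method: it assumes a fully factorized model (each attribute independent) instead of a Bayes net, and the function name `fit_independent_model` and the smoothing parameter `alpha` are hypothetical. It also shows the weakness the abstract points out: a record containing a single rare value already receives a low score.

```python
import math
from collections import Counter


def fit_independent_model(records, alpha=1.0):
    """Fit per-attribute categorical distributions and return a scorer.

    Assumption: attributes are treated as independent, which is a
    simplification of the Bayes net likelihood model named in the text.
    """
    n = len(records)
    n_attrs = len(records[0])
    counts = [Counter(r[j] for r in records) for j in range(n_attrs)]

    def log_likelihood(record):
        total = 0.0
        for j, value in enumerate(record):
            # Observed values plus one extra slot for any unseen value.
            k = len(counts[j]) + 1
            total += math.log((counts[j][value] + alpha) / (n + alpha * k))
        return total

    return log_likelihood


if __name__ == "__main__":
    train = [("a", "x")] * 9 + [("b", "y")]
    score = fit_independent_model(train)
    for rec in [("a", "x"), ("b", "y"), ("c", "z")]:
        print(rec, score(rec))
```

Records with the lowest log-likelihood would be flagged first; the choice of threshold is left open here.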
Anomaly Pattern Detection in Categorical Datasets
Cited by 17 (6 self)
We propose a new method for detecting patterns of anomalies in categorical datasets. We assume that anomalies are generated by some underlying process which affects only a particular subset of the data. Our method consists of two steps: we first use a “local anomaly detector” to identify individual records with anomalous attribute values, and then detect patterns where the number of anomalous records is higher than expected. Given the set of anomalies flagged by the local anomaly detector, we search over all subsets of the data defined by any set of fixed values of a subset of the attributes, in order to detect self-similar patterns of anomalies. We wish to detect any such subset of the test data which displays a significant increase in anomalous activity as compared to the normal behavior of the system (as indicated by the training data). We perform significance testing to determine if the number of anomalies in any subset of the test data is significantly higher than expected, and propose an efficient algorithm to perform this test over all such subsets of the data. We show that this algorithm is able to accurately detect anomalous patterns in real-world hospital, container shipping and network intrusion data.
A Comparison of Statistical and Machine Learning Algorithms on the Task of Link Completion
In KDD Workshop on Link Analysis for Detecting Complex Behavior, 2003
Cited by 16 (3 self)
Link data, consisting of a collection of subsets of entities, can be an important source of information for a variety of fields including the social sciences, biology, criminology, and business intelligence. However, these links may be incomplete, containing one or more unknown members. We consider the problem of link completion, identifying which entities are the most likely missing members of a link given the previously observed links. We concentrate on the case of one missing entity. We compare a variety of recently developed algorithms along with standard machine learning and strawman algorithms, all adjusted to suit the task. The algorithms were tested extensively on a simulated data set and a range of real-world data sets.
The “ideal parent” structure learning for continuous variable networks
Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, 2004
Cited by 14 (2 self)
In recent years, there has been growing interest in learning Bayesian networks with continuous variables. Learning the structure of such networks is a computationally expensive procedure, which limits most applications to parameter learning. This problem is even more acute when learning networks with hidden variables. We present a general method for significantly speeding up the structure search algorithm for continuous variable networks with common parametric distributions. Importantly, our method facilitates the addition of new hidden variables into the network structure efficiently. We demonstrate the method on several data sets, both for learning structure on fully observable data, and for introducing new hidden variables during structure search.
Bayesian structure learning using dynamic programming and MCMC
In UAI, 2007b
Cited by 13 (1 self)
We show how to significantly speed up MCMC sampling of DAG structures by using a powerful nonlocal proposal based on Koivisto’s dynamic programming (DP) algorithm (11; 10), which computes the exact marginal posterior edge probabilities by analytically summing over orders. Furthermore, we show how sampling in DAG space can avoid subtle biases that are introduced by approaches that work only with orders, such as Koivisto’s DP algorithm and MCMC order samplers (6; 5).
Finding optimal Bayesian networks by dynamic programming
2005
Cited by 12 (0 self)
Finding the Bayesian network that maximizes a score function is known as structure learning or structure discovery. Most approaches use local search in the space of acyclic digraphs, which is prone to local maxima. Exhaustive enumeration requires super-exponential time. In this paper we describe a “merely” exponential space/time algorithm for finding a Bayesian network that corresponds to a global maximum of a decomposable scoring function, such as BDeu or BIC. NSF IIS0325581, NSERC PGSB
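One common way to organize an exact dynamic program over variable subsets is the "best sink" recurrence: the best network on a set W is the best network on W minus some sink node, plus that sink's best parent set drawn from the remaining nodes. The sketch below illustrates that recurrence only; whether it matches the paper's exact formulation is an assumption, the names `optimal_network` and `toy_score` are hypothetical, and the parent-set search is done naively for clarity, so it is slower than a tuned implementation.

```python
from itertools import combinations


def optimal_network(nodes, local_score):
    """Exact structure search via dynamic programming over node subsets."""
    nodes = tuple(nodes)

    def best_parents(node, candidates):
        # Naive enumeration of all parent sets drawn from `candidates`.
        best = (local_score(node, ()), ())
        for k in range(1, len(candidates) + 1):
            for parent_set in combinations(candidates, k):
                score = local_score(node, parent_set)
                if score > best[0]:
                    best = (score, parent_set)
        return best

    # best_net[W] = (score, {node: parents}) for the best DAG on subset W.
    best_net = {frozenset(): (0.0, {})}
    for size in range(1, len(nodes) + 1):
        for subset in combinations(nodes, size):
            best = None
            for sink in subset:
                rest = tuple(x for x in subset if x != sink)
                sink_score, sink_parents = best_parents(sink, rest)
                rest_score, rest_net = best_net[frozenset(rest)]
                total = sink_score + rest_score
                if best is None or total > best[0]:
                    best = (total, {**rest_net, sink: sink_parents})
            best_net[frozenset(subset)] = best
    return best_net[frozenset(nodes)]


def toy_score(node, parent_set):
    # Hypothetical local scores standing in for a decomposable score
    # such as BDeu or BIC computed from data.
    table = {
        ("B", ("A",)): 2.0,
        ("C", ("B",)): 1.0,
        ("C", ("A", "B")): 3.0,
    }
    return table.get((node, tuple(sorted(parent_set))), 0.0)


if __name__ == "__main__":
    print(optimal_network(["A", "B", "C"], toy_score))
```

The table `best_net` holds one entry per subset of nodes, which is where the exponential space cost mentioned in the abstract shows up in this sketch.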