Results 1–10 of 39
The max-min hill-climbing Bayesian network structure learning algorithm
Machine Learning, 2006
Cited by 138 (8 self)
Abstract: We present a new algorithm for Bayesian network structure learning, called Max-Min Hill-Climbing (MMHC). The algorithm combines ideas from local learning, constraint-based, and search-and-score techniques in a principled and effective way. It first reconstructs the skeleton of a Bayesian network and then performs a Bayesian-scoring greedy hill-climbing search to orient the edges. In our extensive empirical evaluation MMHC outperforms, on average and in terms of various metrics, several prototypical and state-of-the-art algorithms, namely the PC, Sparse Candidate, Three Phase Dependency Analysis, Optimal Reinsertion, Greedy Equivalence Search, and Greedy Search algorithms. These are the first empirical results simultaneously comparing most of the major Bayesian network algorithms against each other. MMHC offers certain theoretical advantages, specifically over the Sparse Candidate algorithm, corroborated by our experiments. MMHC and detailed results of our study are publicly available.
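As a rough illustration of the search-and-score phase described above, here is a minimal greedy hill-climber over DAGs. This is a toy sketch with an assumed BIC-style local score and a two-variable synthetic data set, not the authors' MMHC implementation (which first restricts candidate edges via the reconstructed skeleton):

```python
import itertools
import math
import random
from collections import Counter

def local_score(data, child, parents):
    """BIC-style local score: conditional log-likelihood of `child`
    given its parents' configurations, minus a complexity penalty."""
    n = len(data)
    joint = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    marg = Counter(tuple(row[p] for p in parents) for row in data)
    ll = sum(c * math.log(c / marg[pa]) for (pa, _), c in joint.items())
    arity = len({row[child] for row in data})
    return ll - 0.5 * math.log(n) * (arity - 1) * max(len(marg), 1)

def has_cycle(parents):
    """Depth-first cycle check on the parent-set representation."""
    seen, active = set(), set()
    def visit(v):
        if v in active:
            return True
        if v in seen:
            return False
        seen.add(v)
        active.add(v)
        cyclic = any(visit(p) for p in parents[v])
        active.discard(v)
        return cyclic
    return any(visit(v) for v in parents)

def hill_climb(data, variables):
    """Greedy search: repeatedly apply the single edge addition or
    removal that most improves the decomposable total score."""
    parents = {v: frozenset() for v in variables}
    while True:
        best_delta, best_move = 1e-9, None
        for x, y in itertools.permutations(variables, 2):
            for cand in (parents[y] | {x}, parents[y] - {x}):
                if cand == parents[y]:
                    continue
                trial = dict(parents)
                trial[y] = frozenset(cand)
                if has_cycle(trial):
                    continue
                delta = (local_score(data, y, sorted(cand)) -
                         local_score(data, y, sorted(parents[y])))
                if delta > best_delta:
                    best_delta, best_move = delta, (y, frozenset(cand))
        if best_move is None:
            return parents
        y, cand = best_move
        parents[y] = cand

# Toy data: B copies A 90% of the time, so the search should link them.
random.seed(0)
data = []
for _ in range(500):
    a = int(random.random() < 0.5)
    b = a if random.random() < 0.9 else 1 - a
    data.append({'A': a, 'B': b})
net = hill_climb(data, ['A', 'B'])
print(net)
```

Because the score decomposes over families, each candidate move is evaluated by rescoring only the one variable whose parent set changes, which is what makes greedy search over thousands of variables feasible.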
Efficient Markov network structure discovery using independence tests
In Proc. SIAM Data Mining, 2006
Cited by 30 (5 self)
Abstract: We present two algorithms for learning the structure of a Markov network from discrete data: GSMN and GSIMN. Both algorithms use statistical conditional independence tests on data to infer the structure by successively constraining the set of structures consistent with the results of these tests. GSMN is a natural adaptation of the Grow-Shrink algorithm of Margaritis and Thrun for learning the structure of Bayesian networks. GSIMN extends GSMN by additionally exploiting Pearl's well-known properties of conditional independence relations to infer novel independencies from known independencies, thus avoiding the need to perform some of these tests. Experiments on artificial and real data sets show GSIMN can yield savings of up to 70% with respect to GSMN, while generating a Markov network of comparable or, in several cases, considerably improved quality.
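The grow-shrink pattern that GSMN adapts can be sketched compactly. The sketch below uses a plug-in `ci_test` callback; here it is a stand-in independence oracle over a known three-node chain, not the statistical test an implementation would run on data:

```python
def grow_shrink(target, variables, ci_test):
    """Grow-shrink estimation of target's Markov blanket.
    ci_test(x, y, cond) returns True iff x is independent of y given cond."""
    mb = []
    # Grow: add any variable still dependent on target given the blanket.
    changed = True
    while changed:
        changed = False
        for x in variables:
            if x != target and x not in mb and not ci_test(x, target, set(mb)):
                mb.append(x)
                changed = True
    # Shrink: drop variables that are independent given the rest.
    for x in list(mb):
        if ci_test(x, target, set(mb) - {x}):
            mb.remove(x)
    return set(mb)

# Stand-in oracle for the chain Markov network A - B - C:
# A and C are dependent marginally but independent given B.
def chain_oracle(x, y, cond):
    if {x, y} == {'A', 'C'}:
        return 'B' in cond
    return False  # adjacent pairs are always dependent

print(grow_shrink('A', ['A', 'B', 'C'], chain_oracle))  # → {'B'}
```

GSIMN's contribution sits one level below this loop: before issuing a `ci_test` call on data, it tries to derive the answer from already-known test results via Pearl's independence axioms, saving the test entirely when it succeeds.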
Efficient Markov Network Discovery Using Particle Filters
Cited by 8 (2 self)
Abstract: In this paper we introduce an efficient independence-based algorithm for the induction of the Markov network structure of a domain from the outcomes of independence tests conducted on data. Our algorithm utilizes a particle filter (sequential Monte Carlo) method to maintain a population of Markov network structures that represent the posterior probability distribution over structures, given the outcomes of the tests performed so far. This enables us to select, at each step, the maximally informative test to conduct next from a pool of candidates according to information gain, which minimizes the cost of the statistical tests conducted on data. This makes our approach useful in domains where independence tests are expensive, such as cases of very large data sets and/or distributed data. In addition, our method maintains multiple candidate structures weighed by posterior probability, which allows flexibility in the presence of potential errors in the test outcomes.
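The test-selection step described above can be illustrated with a deliberately simplified sketch: a weighted population of candidate structures, and the assumption that each structure deterministically predicts each test's outcome (a pairwise test comes out "independent" exactly when the edge is absent, glossing over graph separation). All names and numbers below are hypothetical:

```python
import math

def entropy(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def best_test(particles, tests, predicts):
    """particles: list of (structure, weight) pairs approximating the
    posterior over structures; predicts(structure, test) says whether
    that structure implies an "independent" outcome. With deterministic
    predictions, the expected reduction in posterior entropy equals the
    entropy of the predicted-outcome distribution, so we pick the test
    the weighted population is most divided about."""
    total = sum(w for _, w in particles)
    return max(tests, key=lambda t: entropy(
        sum(w for s, w in particles if predicts(s, t)) / total))

# Hypothetical 3-variable example: structures are edge sets.
particles = [(frozenset({('A', 'B')}), 0.5),
             (frozenset({('A', 'B'), ('B', 'C')}), 0.3),
             (frozenset(), 0.2)]
tests = [('A', 'B'), ('B', 'C'), ('A', 'C')]
predicts = lambda s, t: t not in s
print(best_test(particles, tests, predicts))  # → ('B', 'C')
```

The population agrees entirely on ('A', 'C') (zero expected gain) and splits 70/30 on ('B', 'C'), so the latter is the most informative test to pay for next.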
Scalable, efficient and correct learning of Markov boundaries under the faithfulness assumption
In ECSQARU 05, volume 3571 of LNCS, 2005
Cited by 7 (2 self)
Abstract: We propose an algorithm for learning the Markov boundary of a random variable from data without having to learn a complete Bayesian network. The algorithm is correct under the faithfulness assumption, scalable, and data efficient. The last two properties are important because we aim to apply the algorithm to identify the minimal set of random variables that is relevant for probabilistic classification in databases with many random variables but few instances. We report experiments with synthetic and real databases with 37, 441, and 139,352 random variables showing that the algorithm performs satisfactorily.
Toward Provably Correct Feature Selection in Arbitrary Domains
Cited by 3 (0 self)
Abstract: In this paper we address the problem of provably correct feature selection in arbitrary domains. An optimal solution to the problem is a Markov boundary, which is a minimal set of features that makes the probability distribution of a target variable conditionally invariant to the state of all other features in the domain. While numerous algorithms for this problem have been proposed, their theoretical correctness and practical behavior under arbitrary probability distributions are unclear. We address this by introducing the Markov Boundary Theorem, which precisely characterizes the properties of an ideal Markov boundary, and use it to develop algorithms that learn a more general boundary that can capture complex interactions that only appear when the values of multiple features are considered together. We introduce two algorithms: an exact, provably correct one, as well as a more practical randomized anytime version, and show that they perform well on artificial as well as benchmark and real-world data sets. Throughout the paper we make minimal assumptions that consist of only a general set of axioms that hold for every probability distribution, which gives these algorithms universal applicability.
BASSUM: A Bayesian Semi-Supervised Method for Classification Feature Selection
2010
Cited by 3 (1 self)
Abstract: Feature selection is an important preprocessing step for building efficient, generalizable, and interpretable classifiers on high-dimensional data sets. Under the assumption of sufficient labelled samples, the Markov blanket provides a complete and sound solution to the selection of optimal features, by exploring the conditional independence relationships among the features. In real-world applications, unfortunately, it is usually easy to get unlabelled samples but expensive to obtain the corresponding accurate labels on the samples. This leads to the potential waste of valuable classification information buried in unlabelled samples. In this paper, we propose a new BAyesian Semi-SUpervised Method, or BASSUM in short, to exploit the value of unlabelled samples for the classification feature selection problem. Generally speaking, the inclusion of unlabelled samples helps the feature selection algorithm by 1) pinpointing more specific conditional independence tests involving fewer feature variables, and 2) improving the robustness of individual conditional independence tests with additional statistical information. Our experimental results show that BASSUM enhances the efficiency of traditional feature selection methods and overcomes the difficulties with redundant features in existing semi-supervised solutions.
Using Markov Blankets for Causal Structure Learning
Cited by 2 (0 self)
Abstract: We show how a generic feature-selection algorithm returning strongly relevant variables can be turned into a causal structure-learning algorithm. We prove this under the faithfulness assumption for the data distribution. In a causal graph, the strongly relevant variables for a node X are its parents, children, and children's parents (or spouses), also known as the Markov blanket of X. Identifying the spouses leads to the detection of the V-structure patterns and thus to causal orientations. Repeating the task for all variables yields a valid partially oriented causal graph. We first show an efficient way to identify the spouse links. We then perform several experiments in the continuous domain using the Recursive Feature Elimination feature-selection algorithm with Support Vector Regression and empirically verify the intuition of this direct (but computationally expensive) approach. Within the same framework, we then devise a fast and consistent algorithm, Total Conditioning (TC), and a variant, TCbw, with an explicit backward feature-selection heuristic, for Gaussian data. After running a series of comparative experiments on five artificial networks, we argue that Markov blanket algorithms such as TC/TCbw or Grow-Shrink scale better than the reference PC algorithm and provide higher structural accuracy.
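The spouse-to-collider step described above can be sketched in a few lines, under the simplifying assumption that Markov blankets and direct neighbours are already known exactly (in practice both come from noisy feature selection and CI tests, and shared-neighbour detection needs more care):

```python
def v_structures(mb, adj):
    """mb[x]: Markov blanket of x (its strongly relevant variables);
    adj[x]: direct neighbours of x. Blanket members that are not
    neighbours are spouse candidates; each shared neighbour z of such
    a pair yields a collider x -> z <- y."""
    found = set()
    for x in mb:
        for y in mb[x] - adj[x]:            # spouses of x
            for z in adj[x] & adj[y]:       # their common children
                found.add((tuple(sorted((x, y))), z))
    return found

# The textbook collider A -> C <- B: A and B are spouses through C.
adj = {'A': {'C'}, 'B': {'C'}, 'C': {'A', 'B'}}
mb = {'A': {'B', 'C'}, 'B': {'A', 'C'}, 'C': {'A', 'B'}}
print(v_structures(mb, adj))
```

Each detected collider orients two edges at once; propagating these orientations across all variables is what yields the partially oriented causal graph the abstract refers to.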
Learning Boolean Queries for Article Quality Filtering
2004
Cited by 2 (1 self)
Abstract: Prior research has shown that Support Vector Machine models have the ability to identify high-quality content-specific articles in the domain of internal medicine. These models, though powerful, cannot be used in Boolean search engines, nor can the content of the models be verified via human inspection. In this paper, we use decision trees combined with several feature selection methods to generate Boolean query filters for the same domain and task. The resulting trees are generated automatically and exhibit high performance. The trees are understandable, manageable, and able to be validated by humans. The subsequent Boolean queries are sensible and can be readily used as filters by Boolean search engines.
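The tree-to-query conversion implied above is mechanical: every root-to-leaf path ending in a positive leaf becomes a conjunction, and the disjunction of those conjunctions is the Boolean filter. A minimal sketch, with an entirely hypothetical tree and terms:

```python
def tree_to_query(node, path=()):
    """Collect root-to-leaf paths ending in a positive leaf and render
    each as a parenthesized AND clause. A node is either the string
    'pos'/'neg' (leaf) or a dict:
    {'term': t, 'present': subtree, 'absent': subtree}."""
    if node == 'pos':
        clause = ' AND '.join(path)
        return ['(' + clause + ')'] if clause else []
    if node == 'neg':
        return []
    t = node['term']
    return (tree_to_query(node['present'], path + (t,)) +
            tree_to_query(node['absent'], path + ('NOT ' + t,)))

# Hypothetical filter tree: an article passes if "randomized" occurs,
# or if it is absent but "double-blind" occurs.
tree = {'term': 'randomized',
        'present': 'pos',
        'absent': {'term': 'double-blind',
                   'present': 'pos',
                   'absent': 'neg'}}
query = ' OR '.join(tree_to_query(tree))
print(query)  # → (randomized) OR (NOT randomized AND double-blind)
```

Because the result is plain OR-of-ANDs over term presence, it can be pasted directly into a Boolean search engine and audited clause by clause by a human reviewer, which is exactly the advantage over an SVM the abstract highlights.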