Results 1 - 10 of 32
Estimating high-dimensional directed acyclic graphs with the PC-algorithm
- Journal of Machine Learning Research, 2005
Cited by 31 (4 self)
We consider the PC-algorithm (Spirtes et al., 2000) for estimating the skeleton and equivalence class of a very high-dimensional directed acyclic graph (DAG) with corresponding Gaussian distribution. The PC-algorithm is computationally feasible and often very fast for sparse problems with many nodes (variables), and it has the attractive property of automatically achieving high computational efficiency as a function of the sparseness of the true underlying DAG. We prove uniform consistency of the algorithm for very high-dimensional, sparse DAGs where the number of nodes is allowed to grow quickly with sample size n, as fast as O(n^a) for any 0 < a < ∞. The sparseness assumption is rather minimal, requiring only that the neighborhoods in the DAG are of lower order than sample size n. We also demonstrate the PC-algorithm for simulated data. Keywords: asymptotic consistency, DAG, graphical model, PC-algorithm, skeleton
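As a rough illustration of the skeleton phase mentioned in this abstract, the Python sketch below starts from a complete graph and deletes an edge whenever a conditional independence test passes for some conditioning set drawn from the current neighbors. The Fisher z-test on partial correlations, the function names (`partial_corr`, `is_independent`, `pc_skeleton`), the `alpha` level, and the toy correlation matrix are all assumptions made for this example and are not taken from the paper.

```python
import math
from itertools import combinations
from statistics import NormalDist


def partial_corr(corr, i, j, cond):
    """Partial correlation of i and j given the variables in cond (recursive formula)."""
    if not cond:
        return corr[i][j]
    k, rest = cond[0], cond[1:]
    r_ij = partial_corr(corr, i, j, rest)
    r_ik = partial_corr(corr, i, k, rest)
    r_jk = partial_corr(corr, j, k, rest)
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))


def is_independent(corr, n, i, j, cond, alpha):
    """Fisher z-test (assumed here): accept independence if the statistic is small."""
    r = partial_corr(corr, i, j, cond)
    r = max(min(r, 0.999999), -0.999999)  # keep the log finite
    z = 0.5 * math.log((1 + r) / (1 - r))
    stat = math.sqrt(n - len(cond) - 3) * abs(z)
    return stat <= NormalDist().inv_cdf(1 - alpha / 2)


def pc_skeleton(corr, n, alpha=0.01):
    """Return the set of undirected edges (i, j), i < j, that survive all tests."""
    p = len(corr)
    adj = {i: set(range(p)) - {i} for i in range(p)}
    size = 0
    # Grow the conditioning-set size until no node has enough neighbors left.
    while any(len(adj[i]) - 1 >= size for i in adj):
        for i in range(p):
            for j in sorted(adj[i]):  # sorted() copies, so removal is safe
                others = sorted(adj[i] - {j})
                for cond in combinations(others, size):
                    if is_independent(corr, n, i, j, cond, alpha):
                        adj[i].discard(j)
                        adj[j].discard(i)
                        break
        size += 1
    return {(i, j) for i in adj for j in adj[i] if i < j}


if __name__ == "__main__":
    # Hypothetical correlation matrix of a Gaussian chain 0 - 1 - 2.
    chain = [[1.0, 0.8, 0.64], [0.8, 1.0, 0.8], [0.64, 0.8, 1.0]]
    print(sorted(pc_skeleton(chain, n=1000)))
```

In the toy chain, variables 0 and 2 are correlated marginally but their partial correlation given variable 1 is zero, so the edge between them is removed and only the edges (0, 1) and (1, 2) remain. Because conditioning sets are drawn only from current neighbors, sparse graphs need few tests, which is the efficiency property the abstract refers to.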
Learning Graphical Model Structure using L1-Regularization Paths
Cited by 13 (1 self)
Sparsity-promoting L1-regularization has recently been successfully used to learn the structure of undirected graphical models. In this paper, we apply this technique to learn the structure of directed graphical models. Specifically, we make three contributions. First, we show how the decomposability of the MDL score, plus the ability to quickly compute entire regularization paths, allows us to efficiently pick the optimal regularization parameter on a per-node basis. Second, we show how to use L1 variable selection to select the Markov blanket, before a DAG search stage. Finally, we show how L1 variable selection can be used inside of an order search algorithm. The effectiveness of these L1-based approaches is compared to current state-of-the-art methods on 10 datasets.
Efficient Markov network structure discovery using independence tests
- In Proc SIAM Data Mining, 2006
Cited by 11 (0 self)
We present two algorithms for learning the structure of a Markov network from discrete data: GSMN and GSIMN. Both algorithms use statistical conditional independence tests on data to infer the structure by successively constraining the set of structures consistent with the results of these tests. GSMN is a natural adaptation of the Grow-Shrink algorithm of Margaritis and Thrun for learning the structure of Bayesian networks. GSIMN extends GSMN by additionally exploiting Pearl’s well-known properties of conditional independence relations to infer novel independencies from known independencies, thus avoiding the need to perform these tests. Experiments on artificial and real data sets show GSIMN can yield savings of up to 70% with respect to GSMN, while generating a Markov network with comparable or in several cases considerably improved quality. In addition ...
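The grow-shrink idea that this abstract builds on can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the independence test is replaced by a hypothetical oracle (`separated`) that reads separation off a known toy undirected graph, whereas the paper uses statistical tests on data, and the names `grow_shrink_blanket`, `separated`, and `toy_graph` are invented here.

```python
def separated(graph, x, y, cond):
    """Oracle test: True if every path from x to y passes through a node in cond."""
    seen, stack = {x}, [x]
    while stack:
        node = stack.pop()
        for nb in graph[node]:
            if nb == y:
                return False  # reached y without crossing cond
            if nb not in seen and nb not in cond:
                seen.add(nb)
                stack.append(nb)
    return True


def grow_shrink_blanket(variables, target, indep):
    """Estimate the Markov blanket of target using an independence test indep(x, y, cond)."""
    blanket = []
    # Grow phase: add any variable that is still dependent on the target.
    changed = True
    while changed:
        changed = False
        for v in variables:
            if v != target and v not in blanket and not indep(target, v, set(blanket)):
                blanket.append(v)
                changed = True
    # Shrink phase: drop variables made independent by the rest of the blanket.
    for v in list(blanket):
        if indep(target, v, set(blanket) - {v}):
            blanket.remove(v)
    return set(blanket)


# Hypothetical toy Markov network: the chain A - B - C - D.
toy_graph = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}


def oracle(x, y, cond):
    return separated(toy_graph, x, y, cond)


if __name__ == "__main__":
    print(grow_shrink_blanket(["D", "C", "B", "A"], "A", oracle))
```

With the variable order D, C, B, the grow phase first admits D and C because nothing yet separates them from A. The shrink phase then removes both once B is in the blanket, leaving B as the only neighbor of A.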
Modelling Activity Global Temporal Dependencies using Time Delayed Probabilistic Graphical Model
Cited by 6 (1 self)
We present a novel approach for detecting global behaviour anomalies in multiple disjoint cameras by learning time delayed dependencies between activities across camera views. Specifically, we propose to model multi-camera activities using a Time Delayed Probabilistic Graphical Model (TD-PGM) with different nodes representing activities in different semantically decomposed regions from different camera views, and the directed links between nodes encoding causal relationships between the activities. A novel two-stage structure learning algorithm is formulated to learn globally optimised time-delayed dependencies. A new cumulative abnormality score is also introduced to replace the conventional log-likelihood score for gaining significantly more robust and reliable real-time anomaly detection. The effectiveness of the proposed approach is validated using a camera network installed at a busy underground station.
Structure Learning of Bayesian Networks using Constraints
Cited by 5 (0 self)
This paper addresses exact learning of Bayesian network structure from data and expert knowledge based on score functions that are decomposable. First, it describes useful properties that strongly reduce the time and memory costs of many known methods such as hill-climbing, dynamic programming and sampling variable orderings. Secondly, a branch and bound algorithm is presented that integrates parameter and structural constraints with data in a way that guarantees global optimality with respect to the score function. It is an any-time procedure because, if stopped, it provides the best current solution and an estimate of how far it is from the global solution. We show empirically the advantages of the properties and the constraints, and the applicability of the algorithm to large data sets (up to one hundred variables) that cannot be handled by other current methods (limited to around 30 variables).
Bayesian structure learning using dynamic programming and MCMC
- In UAI, 2007b
Cited by 5 (1 self)
We show how to significantly speed up MCMC sampling of DAG structures by using a powerful non-local proposal based on Koivisto’s dynamic programming (DP) algorithm (11; 10), which computes the exact marginal posterior edge probabilities by analytically summing over orders. Furthermore, we show how sampling in DAG space can avoid subtle biases that are introduced by approaches that work only with orders, such as Koivisto’s DP algorithm and MCMC order samplers (6; 5).
Convergence of Estimation of Distribution Algorithms for Finite Samples
Cited by 3 (0 self)
Estimation of Distribution Algorithms (EDAs) have been proposed as an extension of genetic algorithms. Our algorithm FDA assumes that the function to be optimized is additively decomposed (ADF). The interaction graph G_ADF is used to create exact or approximate factorizations of the Boltzmann distribution. Using Gibbs sampling instead of probabilistic logic sampling is investigated. We also discuss the algorithm LFDA, which learns a Bayesian network from data. For both algorithms, estimates of the sample size N necessary to find the optimum are derived. The bounds are based on statistical learning theory and PAC learning. If the assumptions of a factorization theorem are fulfilled, the upper bound of the sample size N of FDA is of order O(n ln n), where n is the size of the problem. The computational complexity per generation is O(N * n). For LFDA a bound cannot be proven because the network learned might be far from optimal. In many applications the optimal network is not necessary for convergence to the global optima. For the 2D Ising model only 60% of the edges of G_ADF need to be contained in the learned graph. Bounds can be obtained for two new learning methods. The first one learns factor graphs instead of Bayesian networks; the second one detects the structure of the function by computing its Walsh or Fourier coefficients. The computational complexity to compute the Walsh coefficients is O(n^2 ln n). The networks computed by FDA and LFDA are analyzed for a set of benchmark functions.
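To make the idea of detecting function structure from Walsh coefficients concrete, here is a small brute-force Python sketch. It is not the method from the paper: it enumerates all 2^n inputs for every coefficient, so it does not achieve the O(n^2 ln n) cost quoted in the abstract, and the names `walsh_coefficients`, `interaction_edges`, and the toy function `f` are assumptions for illustration only.

```python
from itertools import combinations, product


def walsh_coefficients(f, n):
    """Brute-force Walsh coefficients of f over {0, 1}^n, keyed by bit mask."""
    points = list(product((0, 1), repeat=n))
    coeffs = {}
    for mask in points:
        total = 0.0
        for x in points:
            # Walsh basis function: +1 or -1 by the parity of mask . x
            parity = sum(m & b for m, b in zip(mask, x)) % 2
            total += f(x) * (-1 if parity else 1)
        coeffs[mask] = total / len(points)
    return coeffs


def interaction_edges(coeffs, tol=1e-12):
    """Link two variables if they appear together in a mask with a nonzero coefficient."""
    edges = set()
    for mask, w in coeffs.items():
        if abs(w) > tol:
            active = [i for i, m in enumerate(mask) if m]
            edges.update(combinations(active, 2))
    return edges


def f(x):
    """Hypothetical additively decomposed function: one pairwise term plus one unary term."""
    return x[0] * x[1] + x[2]


if __name__ == "__main__":
    print(interaction_edges(walsh_coefficients(f, 3)))
```

For this toy function, the only nonzero coefficient involving more than one variable belongs to the mask covering variables 0 and 1, so the recovered interaction graph has the single edge (0, 1), matching the additive decomposition.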
Penalized Likelihood Methods for Estimation of sparse high dimensional directed acyclic graphs
- Biometrika (to appear), 2010
Cited by 2 (2 self)
Directed acyclic graphs (DAGs) are commonly used to represent causal relationships among random variables in graphical models. Applications of these models arise in the study of physical, as well as biological systems, where directed edges between nodes represent the influence of components of the system on each other. The general problem of estimating DAGs from observed data is computationally NP-hard. Moreover, two directed graphs may be observationally equivalent. When the nodes exhibit a natural ordering, the problem of estimating directed graphs reduces to the problem of estimating the structure of the network. In this paper, we propose a penalized likelihood approach that directly estimates the adjacency matrix of DAGs. Both lasso and adaptive lasso penalties are considered and an efficient algorithm is proposed for estimation of high dimensional DAGs. We study variable selection consistency of the two penalties when the number of variables grows to infinity with the sample size. We show that although lasso can only consistently estimate the true network under stringent assumptions, adaptive lasso achieves this task under mild regularity conditions. The performance of the proposed methods is compared to alternative methods in simulated, as well as real, data examples.
Improved search for structure learning of large Bayesian networks
Cited by 1 (0 self)
The problem of Bayesian network structure learning is defined as an optimization problem over the space of all possible network structures. For low-dimensional data, optimal structure learning approaches exist. For high-dimensional data, structure learning remains a significant challenge. Most commonly, approaches to high-dimensional structure learning employ a reduced search space and apply hill climbing methods to find high-scoring network structures. But even the reduced search space contains many local optima, so that local search methods are unable to find near-optimal network structures. Instead of focusing on search space reduction, as most previous work in this area does, we propose to replace the greedy search schemes with more effective search methods. We show that for high-dimensional data the proposed search method finds significantly better structures than other leading approaches to structure learning.
A Statistical Implicative Analysis Based Algorithm and MMPC Algorithm for Detecting Multiple Dependencies
Cited by 1 (0 self)
Discovering the dependencies among the variables of a domain from examples is an important problem in optimization. Many methods have been proposed for this purpose, but few large-scale evaluations have been conducted. Most of these methods are based on measurements of conditional probability. Statistical implicative analysis (SIA) offers another perspective on dependencies. It is important to compare the results obtained using this approach with one of the best methods currently available for this task: the MMPC heuristic. As SIA is not used directly to address this problem, we designed an extension of it for our purpose. We conducted a large number of experiments, varying parameters such as the number of dependencies, the number of variables involved, and the type of their distribution, to compare the two approaches. The results show strong complementarities between the two methods. Keywords: Statistical Implicative Analysis, multiple dependencies, Bayesian network.

