Results 11-20 of 135
Estimating high-dimensional directed acyclic graphs with the PC-algorithm
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2007
Cited by 47 (4 self)
We consider the PC-algorithm (Spirtes et al., 2000) for estimating the skeleton and equivalence class of a very high-dimensional directed acyclic graph (DAG) with corresponding Gaussian distribution. The PC-algorithm is computationally feasible and often very fast for sparse problems with many nodes (variables), and it has the attractive property to automatically achieve high computational efficiency as a function of sparseness of the true underlying DAG. We prove uniform consistency of the algorithm for very high-dimensional, sparse DAGs where the number of nodes is allowed to quickly grow with sample size n, as fast as O(n^a) for any 0 < a < ∞. The sparseness assumption is rather minimal requiring only that the neighborhoods in the DAG are of lower order than sample size n. We also demonstrate the PC-algorithm for simulated data.
An Alternative Markov Property for Chain Graphs
 Scand. J. Statist
, 1996
Cited by 47 (4 self)
Graphical Markov models use graphs, either undirected, directed, or mixed, to represent possible dependences among statistical variables. Applications of undirected graphs (UDGs) include models for spatial dependence and image analysis, while acyclic directed graphs (ADGs), which are especially convenient for statistical analysis, arise in such fields as genetics and psychometrics and as models for expert systems and Bayesian belief networks. Lauritzen, Wermuth, and Frydenberg (LWF) introduced a Markov property for chain graphs, which are mixed graphs that can be used to represent simultaneously both causal and associative dependencies and which include both UDGs and ADGs as special cases. In this paper an alternative Markov property (AMP) for chain graphs is introduced, which in some ways is a more direct extension of the ADG Markov property than is the LWF property for chain graphs.
Learning Probabilistic Networks
 THE KNOWLEDGE ENGINEERING REVIEW
, 1998
Cited by 36 (1 self)
A probabilistic network is a graphical model that encodes probabilistic relationships between variables of interest. Such a model records qualitative influences between variables in addition to the numerical parameters of the probability distribution. As such it provides an ideal form for combining prior knowledge, which might be limited solely to experience of the influences between some of the variables of interest, and data. In this paper, we first show how data can be used to revise initial estimates of the parameters of a model. We then progress to showing how the structure of the model can be revised as data is obtained. Techniques for learning with incomplete data are also covered.
Learning Bayesian Networks from Data: An Efficient Approach Based on Information Theory
, 1997
Cited by 35 (0 self)
This paper addresses the problem of learning Bayesian network structures from data by using an information theoretic dependency analysis approach. Based on our three-phase construction mechanism, two efficient algorithms have been developed. One of our algorithms deals with a special case where the node ordering is given; this algorithm requires only O(N^2) CI tests and is correct given that the underlying model is DAG-Faithful [Spirtes et al., 1996]. The other algorithm deals with the general case and requires O(N^4) conditional independence (CI) tests. It is correct given that the underlying model is monotone DAG-Faithful (see Section 4.4). A system based on these algorithms has been developed and distributed through the Internet. The empirical results show that our approach is efficient and reliable.
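The abstract above describes dependency analysis based on information theory. As a rough illustration only, the Python sketch below estimates the (unconditional) mutual information between two discrete variables from paired samples; a value near zero suggests independence. The function name `mutual_information` is hypothetical, and this is an assumption about the general flavour of such tests, not the paper's actual three-phase procedure, which involves conditional tests and thresholds not given in this listing.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) of two paired discrete samples.

    Hypothetical helper for illustration; not taken from the paper.
    """
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log(p(x,y) / (p(x) p(y))), written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


# Independent samples: every (x, y) combination occurs equally often.
mi_independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
# Fully dependent samples: y is a copy of x.
mi_dependent = mutual_information([0, 0, 1, 1], [0, 0, 1, 1])
```

With N variables and a known node ordering, one such test per pair of variables gives N(N-1)/2 tests, which is consistent with the O(N^2) bound quoted in the abstract.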
Expanding From Discrete To Continuous Estimation Of Distribution Algorithms: The IDEA
 In Parallel Problem Solving From Nature - PPSN VI
, 2000
Cited by 30 (7 self)
The direct application of statistics to stochastic optimization based on iterated density estimation has become more important and present in evolutionary computation over the last few years. The estimation of densities over selected samples and the sampling from the resulting distributions is a combination of the recombination and mutation steps used in evolutionary algorithms. We introduce the framework named IDEA to formalize this notion. By combining continuous probability theory with techniques from existing algorithms, this framework allows us to define new continuous evolutionary optimization algorithms.
A robust procedure for Gaussian graphical model search from microarray data with p larger than n
 Journal of Machine Learning Research
, 2006
Cited by 26 (3 self)
Learning of large-scale networks of interactions from microarray data is an important and challenging problem in bioinformatics. A widely used approach is to assume that the available data constitute a random sample from a multivariate distribution belonging to a Gaussian graphical model. As a consequence, the prime objects of inference are full-order partial correlations, which are partial correlations between two variables given the remaining ones. In the context of microarray data the number of variables exceeds the sample size, and this precludes the application of traditional structure learning procedures because a sampling version of full-order partial correlations does not exist. In this paper we consider limited-order partial correlations, that is, partial correlations computed on marginal distributions of manageable size, and provide a set of rules that allow one to assess the usefulness of these quantities to derive the independence structure of the underlying Gaussian graphical model. Furthermore, we introduce a novel structure learning procedure based on a quantity, obtained from limited-order partial correlations, that we call the non-rejection rate. The applicability and usefulness of the procedure are demonstrated by both simulated and real data.
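To make the notion of a limited-order partial correlation in the abstract above concrete, here is a minimal Python sketch of the simplest case: a first-order partial correlation, which conditions on a single variable and is computed from the three pairwise Pearson correlations by the standard textbook formula. The function names `pearson` and `partial_corr_first_order` are hypothetical, and the sketch does not implement the paper's non-rejection rate.

```python
import math


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr_first_order(x, y, z):
    """Correlation of x and y given z, from pairwise correlations only.

    r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
    """
    r_xy = pearson(x, y)
    r_xz = pearson(x, z)
    r_yz = pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Toy data: x and y each equal z plus a noise term, with the two noise
# terms orthogonal to z and to each other, so x and y are correlated
# marginally but uncorrelated once z is accounted for.
z = [1, 1, -1, -1]
x = [2, 0, 0, -2]   # z + [1, -1, 1, -1]
y = [2, 0, -2, 0]   # z + [1, -1, -1, 1]
r_marginal = pearson(x, y)
r_partial = partial_corr_first_order(x, y, z)
```

Because only three variables enter the computation at a time, a quantity like this remains computable when the number of variables exceeds the sample size, which is the situation the abstract describes.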
Inference and Learning in Hybrid Bayesian Networks
, 1998
Cited by 25 (2 self)
We survey the literature on methods for inference and learning in Bayesian Networks composed of discrete and continuous nodes, in which the continuous nodes have a multivariate Gaussian distribution, whose mean and variance depend on the values of the discrete nodes. We also briefly consider hybrid Dynamic Bayesian Networks, an extension of switching Kalman filters. This report is meant to summarize what is known at a sufficient level of detail to enable someone to implement the algorithms, but without dwelling on formalities.
Remarks concerning graphical models for time series and point processes
 Revista de Econometria
, 1996
Cited by 21 (3 self)
A statistical network is a collection of nodes representing random variables and a set of edges that connect the nodes. A probabilistic model for such a network is called a graphical model. These models, graphs, and networks are particularly useful for examining statistical dependencies based on conditioning, as often occurs in economics and statistics. In this paper the nodal random variables will be time series or point processes. The cases of undirected and directed graphs are focussed on.
Statistical strategies for avoiding false discoveries in metabolomics and related experiments
, 2006
Cited by 20 (5 self)
Many metabolomics, and other high-content or high-throughput, experiments are set up such that the primary aim is the discovery of biomarker metabolites that can discriminate, with a certain level of certainty, between nominally matched ‘case’ and ‘control’ samples. However, it is unfortunately very easy to find markers that are apparently persuasive but that are in fact entirely spurious, and there are well-known examples in the proteomics literature. The main types of danger are not entirely independent of each other, but include bias, inadequate sample size (especially relative to the number of metabolite variables and to the required statistical power to prove that a biomarker is discriminant), excessive false discovery rate due to multiple hypothesis testing, inappropriate choice of particular numerical methods, and overfitting (generally caused by the failure to perform adequate validation and cross-validation). Many studies fail to take these into account, and thereby fail to discover anything of true significance (despite their claims). We summarise these problems, and provide pointers to a substantial existing literature that should assist in the improved design and evaluation of metabolomics experiments, thereby allowing robust scientific conclusions to be drawn from the available data. We provide a list of some of the simpler checks that might improve one’s confidence that a candidate biomarker is not simply a statistical artefact, and suggest a series of preferred tests and visualisation tools that can assist readers and authors in assessing papers. These tools can be applied to individual metabolites by using multiple univariate tests performed in parallel across all metabolite peaks. They may also be applied to the validation of multivariate models. We stress in
MGraph: Graphical models for microarray data analysis
 Bioinformatics
, 2003
Cited by 18 (0 self)
Summary: This paper introduces a MATLAB toolbox, MGraph, which applies graphical models as a natural environment to formulate and solve problems in microarray data analysis. MGraph with its graphical interface allows the user to predict genetic regulatory networks by a graphical Gaussian model (GGM), and to quantify the effects of different experimental treatment conditions on gene expression profiles by a graphical log-linear model (GLM). The power of graphical models was explored and illustrated through two example applications. First, four MAPK pathways in yeast were meaningfully reconstructed through GGM. Second, GLM was used to quantify the contributions of sex, genotype and age to transcriptional variance in Drosophila melanogaster. This application may provide a valuable aid in the prediction of genetic regulatory networks, as well as in investigations of various experimental conditions that affect global gene expression profiles.