Results 1 -
4 of
4
Predictive Discretization during Model Selection
"... We present an approach to discretizing multivariate continuous data while learning the structure of a graphical model. We derive the joint scoring function from the principle of predictive accuracy, which inherently ensures the optimal trade-off between goodness of fit and model complexity (includin ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
We present an approach to discretizing multivariate continuous data while learning the structure of a graphical model. We derive the joint scoring function from the principle of predictive accuracy, which inherently ensures the optimal trade-off between goodness of fit and model complexity (including the number of discretization levels). Using the so-called finest grid implied by the data, our scoring function depends only on the number of data points in the various discretization levels. Not only can it be computed efficiently, but it is also invariant under monotonic transformations of the continuous space. Our experiments show that the discretization method can substantially impact the resulting graph structure. 1
Clustering by coexpression
, 2005
"... High-throughput assays can probe cells at a genome-wide scale. Very prominent: microarrays that measure mRNA transcript quantitites. Need to use probabilistic models, which account for • measurement noise, • variability in the biological system, and • aspects of the system not captured by ..."
Abstract
- Add to MetaCart
High-throughput assays can probe cells at a genome-wide scale. Very prominent: microarrays that measure mRNA transcript quantitites. Need to use probabilistic models, which account for • measurement noise, • variability in the biological system, and • aspects of the system not captured by
Ranking by Dependence—A Fair Criteria
"... Estimating the dependences between random variables, and ranking them accordingly, is a prevalent problem in machine learning. Pursuing frequentist and information-theoretic approaches, we first show that the p-value and the mutual information can fail even in simplistic situations. We then propose ..."
Abstract
- Add to MetaCart
Estimating the dependences between random variables, and ranking them accordingly, is a prevalent problem in machine learning. Pursuing frequentist and information-theoretic approaches, we first show that the p-value and the mutual information can fail even in simplistic situations. We then propose two conditions for regularizing an estimator of dependence, which leads to a simple yet effective new measure. We discuss its advantages and compare it to well-established modelselection criteria. Apart from that, we derive a simple constraint for regularizing parameter estimates in a graphical model. This results in an analytical approximation for the optimal value of the equivalent sample size, which agrees very well with the more involved Bayesian approach in our experiments. 1
Bagged Structure Learning of Bayesian Networks
"... We present a novel approach for density estimation using Bayesian networks when faced with scarce and partially observed data. Our approach relies on Efron’s bootstrap framework, and replaces the standard model selection score by a bootstrap aggregation objective aimed at sifting out bad decisions d ..."
Abstract
- Add to MetaCart
We present a novel approach for density estimation using Bayesian networks when faced with scarce and partially observed data. Our approach relies on Efron’s bootstrap framework, and replaces the standard model selection score by a bootstrap aggregation objective aimed at sifting out bad decisions during the learning procedure. Unlike previous bootstrap or MCMC based approaches that are only aimed at recovering specific structural features, we learn a concrete density model that can be used for probabilistic generalization. To make use of our objective when some of the data is missing, we propose a bagged structural EM procedure that does not incur the heavy computational cost typically associated with a bootstrap-based approach. We compare our bagged objective to the Bayesian score and the Bayesian information criterion (BIC), as well as other bootstrap-based model selection objectives, and demonstrate its effectiveness in improving generalization performance for varied real-life datasets. 1

