## Predictive Discretization during Model Selection


### Download Links

- [www.ai.mit.edu]
- [people.csail.mit.edu]
- [jmlr.csail.mit.edu]
- [www.csail.mit.edu]
- [jmlr.org]
- [ml-pub.inf.ethz.ch]

### Other Repositories/Bibliography

- DBLP

Citations: 1 (1 self)

### BibTeX

```bibtex
@MISC{Steck_predictivediscretization,
  author = {Harald Steck},
  title  = {Predictive Discretization during Model Selection},
  year   = {}
}
```


### Abstract

We present an approach to discretizing multivariate continuous data while learning the structure of a graphical model. We derive the joint scoring function from the principle of predictive accuracy, which inherently ensures the optimal trade-off between goodness of fit and model complexity (including the number of discretization levels). Using the so-called finest grid implied by the data, our scoring function depends only on the number of data points in the various discretization levels. Not only can it be computed efficiently, but it is also invariant under monotonic transformations of the continuous space. Our experiments show that the discretization method can substantially impact the resulting graph structure.
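The abstract's two key properties (a score that depends only on the counts of data points per discretization level, and invariance under monotonic transformations of the continuous space) can be illustrated with a small rank-based sketch. This is not the paper's implementation: `level_counts` and `cut_ranks` are hypothetical names, and the "finest grid" is approximated here by the rank transform of a single variable's sample.

```python
# Illustrative sketch only: the finest grid implied by the data is
# approximated by the ranks of the observed values, so a discretization
# policy reduces to a choice of cut points between ranks, and the score's
# sufficient statistics are just the counts of points per level.
import numpy as np

def level_counts(data, cut_ranks):
    """Count data points per discretization level.

    Boundaries in `cut_ranks` are positions in the sorted sample (the
    finest grid), so the counts are invariant under any strictly
    monotonic transformation of the continuous axis.
    """
    ranks = np.argsort(np.argsort(data))            # rank transform of the sample
    levels = np.searchsorted(sorted(cut_ranks), ranks, side="right")
    return np.bincount(levels, minlength=len(cut_ranks) + 1)

x = np.array([0.1, 2.3, 0.7, 5.0, 1.1, 3.2])
c1 = level_counts(x, cut_ranks=[2, 4])              # 3 levels over the 6 ranks
c2 = level_counts(np.exp(x), cut_ranks=[2, 4])      # strictly monotonic transform
assert (c1 == c2).all()                             # counts are unchanged
```

Because only the ranks enter the computation, any strictly increasing transformation (here `np.exp`) leaves the per-level counts, and hence any score built on them, untouched.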

### Citations

1079 | Bayesian method for the induction of probabilistic networks from data - Cooper, Herskovits - 1992 |

905 | Learning Bayesian networks: the combination of knowledge and statistical - Heckerman, Geiger, et al. - 1995 |

741 | Using Bayesian networks to analyze expression data - Friedman, Nachman, et al. - 2000 |
Citation context: ...knowledge suggests that the underlying variables are indeed discrete. While it is computationally efficient to discretize the data in a preprocessing step that is independent of the subsequent analysis [6, 10, 7], the impact of the discretization policy on the subsequent analysis is often unclear in this approach. Existing methods that optimize the discretization policy jointly with the graph structure [3, 9]...

249 | Stochastic complexity and modeling - Rissanen - 1986 |
Citation context: ...discretization policy and the model structure. The objective relies on predictive accuracy, where predictive accuracy is assessed sequentially as in prequential validation [2] or stochastic complexity [12]. (Section 2, Sequential Approach:) Let Y = (Y1, ..., Yk, ..., Yn) denote a vector of n continuous variables in the domain of interest, and y any specific instantiation of these variables. The discretization of Y...
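The sequential assessment of predictive accuracy referenced in this context (prequential validation) can be sketched as follows: each discretized observation is predicted from the counts of the observations preceding it, and the log predictive probabilities are summed. This is an illustrative sketch under a symmetric Dirichlet prior, not the paper's exact scoring function; `prequential_log_score` is a hypothetical name.

```python
# Illustrative prequential score: predict each point from the counts of
# earlier points (symmetric Dirichlet smoothing with parameter alpha),
# then sum the log predictive probabilities.
import math

def prequential_log_score(levels, n_levels, alpha=1.0):
    """Sum of sequential log predictive probabilities for discretized data."""
    counts = [0] * n_levels
    score = 0.0
    for i, lev in enumerate(levels):
        p = (counts[lev] + alpha) / (i + n_levels * alpha)  # predictive probability
        score += math.log(p)
        counts[lev] += 1
    return score

s = prequential_log_score([0, 0, 1, 0, 1, 0], n_levels=2)
```

For a fixed symmetric Dirichlet prior, this sequential product of predictive probabilities equals the Dirichlet-multinomial marginal likelihood, so the score does not depend on the order in which the data points are processed.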

171 | Statistical theory: the prequential approach - Dawid - 1984 |

159 | Inferring subnetworks from perturbed expression profiles. Bioinformatics - Pe'er, Regev, et al. - 2001 |
Citation context: ...ned with gene expression data. In computational biology, regulatory networks are often modeled by Bayesian networks, and their structures are learned from discretized gene-expression data, see, e.g., [6, 11, 7]. Obviously, one would like to recover the "true" network structure underlying the continuous data, rather than a degraded network structure due to a suboptimal discretization policy. Typically, the e...

100 | Combining Location and Expression Data for Principled Discovery of Genetic Regulatory Networks - Hartemink, Gifford - 2002 |
Citation context: ...knowledge suggests that the underlying variables are indeed discrete. While it is computationally efficient to discretize the data in a preprocessing step that is independent of the subsequent analysis [6, 10, 7], the impact of the discretization policy on the subsequent analysis is often unclear in this approach. Existing methods that optimize the discretization policy jointly with the graph structure [3, 9]...

64 | Discretizing Continuous Attributes While Learning Bayesian Networks - Friedman, Goldszmidt - 1996 |

48 | Data analysis with Bayesian networks: A bootstrap approach - Friedman, Goldszmidt, et al. - 1999 |
Citation context: ...uncertainty by means of Markov Chain Monte Carlo in the model space, we used a non-parametric re-sampling method, as the latter is independent of any model assumptions. While the bootstrap has been used in [5, 4, 6, 11], we prefer the jackknife when learning the graph structure, i.e., conditional independences. The reason is that the bootstrap procedure can easily induce spurious dependencies when given a small data...

22 | On the Dirichlet Prior and Bayesian Regularization - Steck, Jaakkola - 2002 |
Citation context: ...(DΛ|m), which is part of our scoring function, contains a free parameter, namely the so-called scale-parameter α regarding the Dirichlet prior over the model parameters, e.g., cf. [8]. As outlined in [13], it... [remainder of snippet is residue of the citing paper's Fig. 2: number of discretization levels (mean and standard deviation, averaged over 10 samples) vs. sample size]

18 | A Multivariate Discretization Method for Learning Bayesian Networks from Mixed Data - Monti, Cooper - 1998 |
Citation context: ...10, 7], the impact of the discretization policy on the subsequent analysis is often unclear in this approach. Existing methods that optimize the discretization policy jointly with the graph structure [3, 9] are computationally very involved and therefore not directly suitable for large domains. We present a novel and more efficient scoring function for joint optimization of the discretization policy and...

11 | A latent variable model for multivariate discretization - Monti, Cooper - 1999 |
Citation context: ...knowledge suggests that the underlying variables are indeed discrete. While it is computationally efficient to discretize the data in a preprocessing step that is independent of the subsequent analysis [6, 10, 7], the impact of the discretization policy on the subsequent analysis is often unclear in this approach. Existing methods that optimize the discretization policy jointly with the graph structure [3, 9]...

10 | Bias-corrected bootstrap and model uncertainty - Steck, Jaakkola - 2003 |
Citation context: ...that the bootstrap procedure can easily induce spurious dependencies when given a small data set D; as a consequence, the resulting network structure can be considerably biased towards denser graphs [14]. The jackknife avoids this problem. We obtained very similar results using three different variants of the jackknife: delete-1, delete-30, and delete-64. Averaging over 320 delete-30 jackknife sub-sa...
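The delete-d jackknife described in this context amounts to sub-sampling without replacement: unlike the bootstrap, a replicate never contains the same data point twice, which is why it avoids the spurious dependencies induced by duplicated points. A minimal sketch, with hypothetical names (`jackknife_subsamples` is not from the paper), using the delete-30 setting with 320 replicates mentioned in the snippet:

```python
# Hedged sketch of delete-d jackknife sub-sampling: each replicate drops
# d points chosen uniformly without replacement, so no point is duplicated.
import random

def jackknife_subsamples(data, d, n_replicates, seed=0):
    """Yield delete-d jackknife sub-samples of `data`."""
    rng = random.Random(seed)
    n = len(data)
    for _ in range(n_replicates):
        keep = rng.sample(range(n), n - d)      # drop d indices at random
        yield [data[i] for i in sorted(keep)]

data = list(range(100))
subs = list(jackknife_subsamples(data, d=30, n_replicates=320))
assert all(len(s) == len(data) - 30 for s in subs)   # each replicate keeps n - d
assert all(len(set(s)) == len(s) for s in subs)      # no duplicated points
```

A bootstrap replicate, by contrast, would draw n indices with replacement, so a small sample can contain near-identical repeated rows, inflating apparent dependencies between variables.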

9 | On the application of the bootstrap for computing confidence measures on features of induced Bayesian networks - Friedman, Goldszmidt, et al. - 1999 |
Citation context: ...uncertainty by means of Markov Chain Monte Carlo in the model space, we used a non-parametric re-sampling method, as the latter is independent of any model assumptions. While the bootstrap has been used in [5, 4, 6, 11], we prefer the jackknife when learning the graph structure, i.e., conditional independences. The reason is that the bootstrap procedure can easily induce spurious dependencies when given a small data...

7 | (Semi-)predictive discretization during model selection - Steck, Jaakkola - 2003 |
Citation context: ...d to the same discretized state x, cf. [9]. Assuming a uniform probability density is overly stringent and degrades the predictive accuracy; moreover, this might also give rise to "empty states", cf. [15]. In contrast, we require only independence of the variables Yk. (Section 3, Finest Grid implied by the Data:) The finest grid implied by the data is a simple mapping between Y and X that retains the desired inde...