## Thin Junction Trees (2001)

Venue: Advances in Neural Information Processing Systems 14

Citations: 45 (1 self)

### BibTeX

```bibtex
@inproceedings{Bach01thinjunction,
  author    = {Francis R. Bach and Michael I. Jordan},
  title     = {Thin Junction Trees},
  booktitle = {Advances in Neural Information Processing Systems 14},
  year      = {2001},
  pages     = {569--576},
  publisher = {MIT Press}
}
```

### Abstract

We present an algorithm that induces a class of models with thin junction trees: models characterized by an upper bound on the size of the maximal cliques of their triangulated graph. By ensuring that the junction tree is thin, inference in our models remains tractable throughout the learning process. This both allows an efficient implementation of an iterative scaling parameter estimation algorithm and ensures that inference can be performed efficiently with the final model. We illustrate the approach with applications in handwritten digit recognition and DNA splice site detection.
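The tractability argument in the abstract can be made concrete with a back-of-the-envelope cost model (a sketch only; the function name and the per-clique `d^|C|` accounting are illustrative, not taken from the paper):

```python
def junction_tree_inference_cost(clique_sizes, domain_size=2):
    # Exact inference touches every entry of every clique table, so its
    # cost grows as the sum over cliques of domain_size ** clique_size:
    # the largest clique dominates. Bounding that size ("thinness")
    # keeps inference tractable as the model grows.
    return sum(domain_size ** s for s in clique_sizes)

# A thin model (treewidth 1, all cliques of size 2) over 100 binary
# variables, vs. a model whose triangulation produced one 30-node clique.
thin = junction_tree_inference_cost([2] * 99)
fat = junction_tree_inference_cost([2] * 98 + [30])
```

The second model is roughly three million times more expensive to run exact inference in, even though it has the same number of variables, which is why the learning algorithm enforces the clique-size bound throughout.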

### Citations

903 | Learning Bayesian networks: The combination of knowledge and statistical data
- Heckerman, Geiger, et al.
- 1995
Citation context: ...lecting models of higher-order dependencies in data, either within the maximum entropy setting, in which features are selected [9, 16], or within the graphical model setting, in which edges are selected [8]. Simplicity also plays an important role in the design of these algorithms; in particular, greedy methods that add or subtract a single feature or edge at a time are generally employed. The model tha...

637 | Approximating discrete probability distributions with dependence trees
- Chow, Liu
- 1968
Citation context: ...computationally and statistically in large-scale problems. In the current paper we describe a methodology that can be viewed as a generalization of the Chow-Liu algorithm for constructing tree models [2]. Note that tree models have the property that their junction trees have no more than two nodes in any clique: the treewidth of tree models is one. In our generalization, we allow the treewidth to be...
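The Chow-Liu procedure this context refers to can be sketched in a few lines: weight every pair of variables by empirical mutual information, then take a maximum-weight spanning tree. All names below are illustrative, and this is a minimal sketch rather than the paper's implementation:

```python
import numpy as np
from itertools import combinations

def chow_liu_tree(data):
    # Chow-Liu (1968): maximum-weight spanning tree over pairwise
    # mutual information between discrete variables (columns of data).
    n_samples, n_vars = data.shape

    def mutual_info(i, j):
        # Empirical mutual information from joint and marginal counts.
        joint, pi, pj = {}, {}, {}
        for row in data:
            joint[(row[i], row[j])] = joint.get((row[i], row[j]), 0) + 1
        for (a, b), c in joint.items():
            pi[a] = pi.get(a, 0) + c
            pj[b] = pj.get(b, 0) + c
        # p_ab / (p_a * p_b) == c * n / (count_a * count_b)
        return sum((c / n_samples) * np.log(c * n_samples / (pi[a] * pj[b]))
                   for (a, b), c in joint.items())

    edges = sorted(((mutual_info(i, j), i, j)
                    for i, j in combinations(range(n_vars), 2)), reverse=True)

    # Kruskal: greedily keep the heaviest edge joining two components.
    parent = list(range(n_vars))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    tree = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
    return tree

# Variables 0 and 1 are perfectly correlated; variable 2 is nearly independent.
data = np.array([[0, 0, 0], [1, 1, 0], [0, 0, 1],
                 [1, 1, 1], [0, 0, 0], [1, 1, 1]])
tree = chow_liu_tree(data)
```

The generalization described in the paper relaxes the spanning-tree constraint so that the learned structure may have treewidth greater than one, while still bounding it.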

624 | Probabilistic networks and expert systems
- Cowell, Dawid, et al.
- 1999
Citation context: ...ence distribution q_0 is also decomposable in this graph. We assume without loss of generality that the graph is connected. For each possible triangulation of the graph, we can define a junction tree [4], where for all k there exists a maximal clique containing T_k. The complexity of exact inference depends on the size of the maximal clique of the triangulated graph. We define the treewidth of our...

553 | Inducing features of random fields
- Pietra, Pietra, et al.
- 1997
Citation context: ...n be a serious liability. A number of methods have been developed for selecting models of higher-order dependencies in data, either within the maximum entropy setting, in which features are selected [9, 16], or within the graphical model setting, in which edges are selected [8]. Simplicity also plays an important role in the design of these algorithms; in particular, greedy methods that add or subtract a si...

431 | Generalized iterative scaling for log-linear models
- Darroch, Ratcliff
- 1972
Citation context: ...nal Fitting (IPF), it updates the parameters sequentially [5]. Algorithms that update the parameters in parallel have also been proposed; in particular the Generalized Iterative Scaling algorithm [6], which imposes the constraint that the features sum to one, and the Improved Iterative Scaling algorithm [9], which removes this constraint. These algorithms have an important advantage in our settin...

257 | I-divergence geometry of probability distributions and minimization problems. The Annals of Probability 3(1)
- Csiszár
- 1975
Citation context: ...ectation constraints is a well studied problem; the basic technique is known as Iterative Scaling. A generalization of Iterative Proportional Fitting (IPF), it updates the parameters sequentially [5]. Algorithms that update the parameters in parallel have also been proposed; in particular the Generalized Iterative Scaling algorithm [6], which imposes the constraint that the features sum to one, a...
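The sequential update scheme the context describes (Iterative Proportional Fitting) is easy to illustrate on a two-way contingency table: cycle through the marginal constraints, rescaling the joint to satisfy one constraint at a time. A minimal sketch, with illustrative names:

```python
import numpy as np

def ipf_2d(table, row_targets, col_targets, n_iters=100):
    # Iterative Proportional Fitting: rescale a joint table so its row
    # and column marginals match the targets, enforcing one family of
    # marginal constraints per step (the sequential update of [5]).
    q = table.astype(float).copy()
    for _ in range(n_iters):
        q *= (row_targets / q.sum(axis=1))[:, None]  # match row sums
        q *= (col_targets / q.sum(axis=0))[None, :]  # match column sums
    return q

# Start from a uniform table and impose fixed row/column marginals.
q = ipf_2d(np.ones((2, 2)),
           row_targets=np.array([0.3, 0.7]),
           col_targets=np.array([0.6, 0.4]))
```

In the paper's setting the same idea applies to clique marginals of the junction tree, which is why keeping the tree thin makes each scaling step cheap.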

252 | SVMTorch: Support Vector Machines for Large-Scale Regression Problems
- Collobert, Bengio
- 2001

192 | Minimax entropy principle and its application to texture modeling
- Zhu, Wu, et al.
- 1997
Citation context: ...n be a serious liability. A number of methods have been developed for selecting models of higher-order dependencies in data, either within the maximum entropy setting, in which features are selected [9, 16], or within the graphical model setting, in which edges are selected [8]. Simplicity also plays an important role in the design of these algorithms; in particular, greedy methods that add or subtract a si...

187 | A linear-time algorithm for finding tree-decompositions of small treewidth
- Bodlaender
- 1996
Citation context: ...n, algorithms exist that determine in time linear in the number of nodes whether a graph has treewidth smaller than a given bound, and if so output a triangulation in which all cliques are of size less than that bound [1]. These algorithms are super-exponential in the bound, however, and thus are applicable only to problems with small treewidths. In practice we have had success using fast heuristic triangulation methods [11]...

136 | Training invariant support vector machines
- DeCoste, Schölkopf
Citation context: ...in Figure 3 shows the error rate on the testing set as a function of the percentage of unknown pixels, for our models and for a SVM. In the case of the SVM, we used a polynomial kernel of degree four [7] and we tried various heuristics to fill in the value of the non-observed pixels, such as the average of that pixel over the training set or the value of a blank pixel. Best classification performance...

110 | Learning with mixtures of trees
- Meila, Jordan, et al.
Citation context: ...error rate of 4.1%. This is better than the best reported results in the literature; in particular, neural networks have an error rate of 5.5% and the Chow and Liu algorithm has an error rate of 4.4% [14]. 5 Conclusions We have described a methodology for feature selection, edge selection and parameter estimation that can be viewed as a generalization of the Chow-Liu algorithm. Drawing on the feature...

41 | Maximum Likelihood Bounded Tree-width Markov Networks
- Srebro
- 2001
Citation context: ...n the fitted model and the generating model vs the number of available training examples (in thousands). We should not expect to be able to find an exact edge-selection method: recent work by Srebro [15] has shown that the related problem of finding the maximum likelihood graphical model with bounded treewidth is NP-hard. 4 Empirical results 4.1 Small graphs with known generative model In this experi...

30 | Recognizing hand-written digits using hierarchical products of experts
- Mayraz, Hinton
Citation context: ...er case. The classification error rates were as follows: LeNet 0.7, SVM 0.8, Product of experts 2.0, TJT-SVM 3.8, TJT-Softmax 4.2, TJT-ML 5.3, Chow-Liu 8.5, and Linear classifier 12.0. (See [12] and [13] for further details on the non-TJT models). It is important to emphasize that our models are tractable for full joint inference; indeed, the junction trees have a maximal clique size of 10 in the lar...

24 | Triangulation of graphs: algorithms giving small total state space
- Kjaerulff
- 1990
Citation context: ...[1]. These algorithms are super-exponential in the bound, however, and thus are applicable only to problems with small treewidths. In practice we have had success using fast heuristic triangulation methods [11] that allow us to guarantee the existence of a junction tree with a maximal clique no larger than the bound for a given model. (This is a conservative technique that may occasionally throw out models that in...
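One common heuristic of the kind [11] surveys is min-fill elimination: repeatedly eliminate the vertex whose neighbourhood needs the fewest fill-in edges, rejecting the model if any clique formed along the way exceeds the bound. The sketch below uses illustrative names and is not the paper's exact procedure:

```python
import itertools

def min_fill_elimination(adj, bound):
    # Greedy min-fill elimination. adj maps each vertex to its
    # neighbours. Returns an elimination order if every clique formed
    # stays within `bound` vertices, else None (the conservative
    # rejection described in the context above).
    adj = {v: set(nb) for v, nb in adj.items()}  # local mutable copy
    order = []
    while adj:
        def fill_cost(v):
            # Number of missing edges among v's neighbours.
            return sum(1 for a, b in itertools.combinations(adj[v], 2)
                       if b not in adj[a])
        v = min(adj, key=fill_cost)
        clique = adj[v] | {v}
        if len(clique) > bound:
            return None  # eliminating v would create a too-large clique
        for a, b in itertools.combinations(adj[v], 2):
            adj[a].add(b)
            adj[b].add(a)
        for nb in adj[v]:
            adj[nb].discard(v)
        del adj[v]
        order.append(v)
    return order

cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}  # 4-cycle, treewidth 2
order = min_fill_elimination(cycle, bound=3)    # succeeds: cliques of size 3
rejected = min_fill_elimination(cycle, bound=2)  # None: a 3-clique is forced
```

Because the bound is checked at every elimination step, a successful run certifies a junction tree whose maximal clique respects the bound, exactly the guarantee the learning algorithm needs.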

20 | On the effective implementation of the iterative proportional fitting procedure. Comput Stat Data Anal 19:177–189
- Jiroušek, Přeučil
- 1995