## On Improving the Efficiency of the Iterative Proportional Fitting Procedure

### BibTeX

@MISC{_onimproving,
    author = {},
    title = {On Improving the Efficiency of the Iterative Proportional Fitting Procedure},
    year = {}
}

### Abstract

Iterative proportional fitting (IPF) on junction trees is an important tool for learning in graphical models. We identify the propagation and IPF updates on the junction tree as fixed point equations of a single constrained entropy maximization problem. This allows a more efficient message updating protocol than the well-known effective IPF of Jiroušek and Přeučil (1995). When the junction tree has an intractably large maximum clique size, we propose to maximize an approximate constrained entropy based on region graphs (Yedidia et al., 2002). To maximize the new objective we propose a "loopy" version of IPF. We show that this yields accurate estimates of the weights of undirected graphical models in a simple experiment.

### Citations

914 | An Introduction to Bayesian Networks
- Jensen
- 1996
Citation Context ... a junction tree and computes the posterior probability over the cliques of the junction tree using local propagation rules. Two out of many well-known schemes for this purpose are Hugin propagation (Jensen, 1996) and Shafer-Shenoy propagation (Shafer & Shenoy, 1990). Junction trees are also indispensable for learning graphical models from data through the iterative proportional fitting (IPF) procedure, other...

551 | Inducing features of random fields
- Della Pietra, S, et al.
- 1997
Citation Context ...6) is equivalent to the following primal update: $P(x) \leftarrow P(x) \, \frac{\hat{p}_\alpha(x_\alpha)}{P(x_\alpha)}$ (7). The maximum entropy framework is intimately related to maximum likelihood learning of undirected graphical models (Della Pietra et al., 1997). Let the clusters of the graphical model be given by A. The distribution expressed by the graphical model has the form $P(x) = \frac{1}{Z} \exp\left(\sum_\alpha \lambda_\alpha(x_\alpha)\right)$ (8), where $\lambda_\alpha(x_\alpha)$ are the parameters of the model and...
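
The equivalence between the parameter update (6) and the primal update (7) quoted in this context can be checked in a few lines. The derivation below is a sketch, not taken from the paper: it assumes the exponential form (8) for $P(x)$, a normalized target marginal $\hat{p}_\alpha$, and $P(x_\alpha) > 0$; the primed symbols $\lambda'_\alpha$, $P'$ and $Z'$ are notation introduced here for the updated quantities.

```latex
\begin{align*}
\lambda'_\alpha(x_\alpha) &= \lambda_\alpha(x_\alpha)
    + \log\frac{\hat{p}_\alpha(x_\alpha)}{P(x_\alpha)}
    && \text{(update (6))} \\
\exp\Big(\lambda'_\alpha(x_\alpha) + \sum_{\beta \neq \alpha} \lambda_\beta(x_\beta)\Big)
    &= Z \, P(x) \, \frac{\hat{p}_\alpha(x_\alpha)}{P(x_\alpha)}
    && \text{(using (8))} \\
Z' &= \sum_x Z \, P(x) \, \frac{\hat{p}_\alpha(x_\alpha)}{P(x_\alpha)}
    = Z \sum_{x_\alpha} \hat{p}_\alpha(x_\alpha) = Z \\
P'(x) &= P(x) \, \frac{\hat{p}_\alpha(x_\alpha)}{P(x_\alpha)}
    && \text{(update (7))} \\
P'(x_\alpha) &= \hat{p}_\alpha(x_\alpha)
\end{align*}
```

The last line shows that a single update makes the model marginal on cluster $\alpha$ match its target exactly, and the third line shows that the normalizer is unchanged by the update.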

108 | Probability propagation
- Shafer, Shenoy
- 1990
Citation Context ...robability over the cliques of the junction tree using local propagation rules. Two out of many well-known schemes for this purpose are Hugin propagation (Jensen, 1996) and Shafer-Shenoy propagation (Shafer & Shenoy, 1990). Junction trees are also indispensable for learning graphical models from data through the iterative proportional fitting (IPF) procedure, otherwise known as iterative scaling (Jiroušek & Přeučil,...

45 | Thin junction trees
- Bach, Jordan
- 2002
Citation Context ...tunately not true if the cliques do not form a tree (see section 5). [Footnote 4: Equivalently, we can use a CollectEvidence phase before each iterative scaling update to compute the required marginal $P_{c_1}(x_{c_1})$ (Bach & Jordan, 2002).] When the cliques are relatively small, the junction tree representation of $P(x)$ is much more efficient than a straight probability table. Let $M = \max_{c \in C} |X_c| \ll |X|$. Each iterative scaling update is...

42 | The unified propagation and scaling algorithm
- Teh, Welling
Citation Context ...en minimum divergence problems and inference. If the marginal constraints put all the probability mass on a single state, i.e. $\hat{p}_i(x_i) = \delta_{x_i, \hat{x}_i}$ for some $\hat{x}_i$, then the two problems become equivalent (Teh & Welling, 2002). This implies that the generalized distributive law of Aji and McEliece (2001) and the generalized belief propagation algorithms are in fact special cases of loopy iterative scaling on junction grap...

41 | The generalized distributive law and free energy minimization - Aji, McEliece

14 | On the effective implementation of the iterative proportional fitting procedure. Computational Statistics and Data Analysis 19, 177–189 - Jiroušek, Přeučil - 1995

6 | Accumulator networks: Suitors of local probability propagation
- Frey, Kannan
- 2000
Citation Context ...d, a standard method to train them is the EM algorithm. When the posterior distribution is intractable, a number of researchers have looked at approximating the E steps with loopy belief propagation (Frey & Kannan, 2001). Because there is no global cost function which both the E and M steps are minimizing, we cannot make any statements on the accuracy or convergence properties of such algorithms. An exciting researc...

4 | On a least squares adjustment of a sampled frequency table when the expected marginal totals are known
- Deming, Stephan
- 1940
Citation Context ...coordinatewise descent in $\lambda_\alpha(x_\alpha)$. This is the classical iterative scaling algorithm (Deming & Stephan, 1940), given by the following updates: $\lambda_\alpha(x_\alpha) \leftarrow \lambda_\alpha(x_\alpha) + \log \frac{\hat{p}_\alpha(x_\alpha)}{P(x_\alpha)}$ (6), where $P(x)$ is given by (3, 4). [Footnote 1: The extension to being given feature expectations $\hat{f}_\alpha = \langle f_\alpha(x) \rangle$ is straightforward and described in section 7.] In terms of the primal variables $P(x)$, we can understand each update of (6) as setting the marg...
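
The classical iterative scaling updates (6) quoted in this context can be illustrated with a small NumPy sketch. It is a naive illustration rather than the paper's method: it stores the full joint table and recomputes it before every update, which is exactly the cost that junction-tree implementations are meant to avoid. The function names `joint_from_params` and `iterative_scaling`, the exponential form assumed for $P(x)$, the requirement that cluster axes are listed in increasing order, and the example target marginals are all assumptions made for this example.

```python
import numpy as np

def joint_from_params(lam, shape):
    """Build P(x) proportional to exp(sum_alpha lambda_alpha(x_alpha)).

    lam maps a tuple of axes (a cluster, in increasing order) to a table
    of log-parameters over those axes.
    """
    log_p = np.zeros(shape)
    for axes, table in lam.items():
        # Reshape the cluster table so it broadcasts over the full joint.
        bshape = [shape[a] if a in axes else 1 for a in range(len(shape))]
        log_p = log_p + table.reshape(bshape)
    p = np.exp(log_p - log_p.max())  # subtract the max for numerical stability
    return p / p.sum()

def iterative_scaling(targets, shape, sweeps=20):
    """Run update (6) on each cluster in turn for a fixed number of sweeps.

    targets maps a tuple of axes to a strictly positive target marginal.
    """
    lam = {axes: np.zeros(t.shape) for axes, t in targets.items()}
    for _ in range(sweeps):
        for axes, target in targets.items():
            p = joint_from_params(lam, shape)
            others = tuple(a for a in range(len(shape)) if a not in axes)
            marginal = p.sum(axis=others)  # current model marginal P(x_alpha)
            # Update (6): lambda <- lambda + log(target / current marginal).
            lam[axes] = lam[axes] + np.log(target / marginal)
    return joint_from_params(lam, shape)

# Three binary variables on a chain, with clusters (x0, x1) and (x1, x2).
# The two targets agree on the shared marginal of x1, namely (0.4, 0.6).
p01 = np.array([[0.3, 0.2], [0.1, 0.4]])
p12 = np.array([[0.25, 0.15], [0.35, 0.25]])
P = iterative_scaling({(0, 1): p01, (1, 2): p12}, shape=(2, 2, 2))

print(np.allclose(P.sum(axis=2), p01), np.allclose(P.sum(axis=0), p12))
```

Because the two clusters form a tree and the targets are consistent, the fitted joint reproduces both target marginals; with clusters that form loops, more sweeps may be needed before the marginals settle.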

2 | Conditional random fields: Probabilistic models for segmenting and labeling sequence data
- Lafferty, McCallum, et al.
- 2001
Citation Context ...er a training set. The algorithms discussed in this paper therefore open the way for more efficient training of maximum entropy models (Della Pietra et al., 1997), conditional maximum entropy models (Lafferty et al., 2001) and thin junction trees (Bach & Jordan, 2002). There is an interesting link between minimum divergence problems and inference. If the marginal constraints put all the probability mass on a single st...