## Learning Associative Markov Networks (2004)

Venue: Proc. ICML

Citations: 78 (9 self)

### BibTeX

```bibtex
@INPROCEEDINGS{Taskar04learningassociative,
  author    = {Ben Taskar and Vassil Chatalbashev and Daphne Koller},
  title     = {Learning Associative Markov Networks},
  booktitle = {Proc. ICML},
  year      = {2004},
  pages     = {102},
  publisher = {ACM Press}
}
```

### Abstract

Markov networks are extensively used to model complex sequential, spatial, and relational interactions in fields as diverse as image processing, natural language analysis, and bioinformatics.

### Citations

2506 | Conditional random fields: Probabilistic models for segmenting and labeling sequence data
- Lafferty, McCallum, et al.
- 2001
Citation Context: ...improvements in classification accuracy over generative training. For example, Markov networks can be trained to maximize the conditional likelihood of the labels given the features of the objects (Lafferty et al., 2001; Taskar et al., 2002). Recently, maximum margin-based training has been shown to additionally boost accuracy over conditional likelihood methods and allow a seamless integration of kernel methods with Markov networks ...

1485 | Fast approximate energy minimization via graph cuts
- BOYKOV, VEKSLER, et al.
- 2001
Citation Context: ...proteins that interact are often highly correlated (Vazquez et al., 2003). In image processing, neighboring pixels exhibit local label coherence in denoising, segmentation and stereo correspondence (Besag, 1986; Boykov et al., 1999a). Markov networks compactly represent complex joint distributions of the label variables by modeling their local interactions. Such models are encoded by a graph, whose nodes represent the different ...
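The snippet above describes pairwise Markov networks as graphs whose nodes carry label variables and whose edges carry local interaction potentials. A minimal, self-contained sketch of that model (our own illustrative code and naming, not the paper's; `node_pot`/`edge_pot` are assumed interfaces), with brute-force MAP for tiny networks:

```python
import math
from itertools import product

# Unnormalized pairwise Markov network:
#   score(y) = prod_i phi_i(y_i) * prod_(i,j) phi_ij(y_i, y_j)
def log_score(y, node_pot, edge_pot, edges):
    # node_pot[i][k]: potential of label k at node i
    # edge_pot[e][k][l]: potential of labels (k, l) across edge e = (i, j)
    s = sum(math.log(node_pot[i][y[i]]) for i in range(len(y)))
    s += sum(math.log(edge_pot[e][y[i]][y[j]]) for e, (i, j) in enumerate(edges))
    return s

def map_assignment(n, K, node_pot, edge_pot, edges):
    # Exhaustive MAP; exponential in n, so for illustration only.
    return max(product(range(K), repeat=n),
               key=lambda y: log_score(y, node_pot, edge_pot, edges))
```

With attractive ("associative") edge potentials that reward equal labels across an edge, the MAP assignment favors label-coherent configurations, which is exactly the local-coherence effect the snippet describes.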

993 | On the statistical analysis of dirty pictures
- Besag
- 1986
Citation Context: ...proteins that interact are often highly correlated (Vazquez et al., 2003). In image processing, neighboring pixels exhibit local label coherence in denoising, segmentation and stereo correspondence (Besag, 1986; Boykov et al., 1999a). Markov networks compactly represent complex joint distributions of the label variables by modeling their local interactions. Such models are encoded by a graph, whose nodes represent the different ...

461 | Max-margin Markov networks - Taskar, Guestrin, et al. - 2003

406 | Enhanced hypertext categorization using hyperlinks
- Chakrabarti, Dom, et al.
- 1998
Citation Context: ...however, involve sets of related objects whose labels must also be consistent with each other. In hypertext or bibliographic classification, labels of linked and cocited documents tend to be similar (Chakrabarti et al., 1998; Taskar et al., 2002). In proteomic analysis, location and ... (Appearing in Proceedings of the 21st International Conference on Machine Learning, Banff, Canada, 2004. Copyright 2004 by the first author.)

362 | Learning to Extract Symbolic Knowledge from the World Wide Web
- Craven, DiPasquo, et al.
- 1998
Citation Context: ...MN allow us to correct for some of the distribution drift between the training and test sets. Hypertext. We tested AMNs on collective hypertext classification, using the variant of the WebKB dataset (Craven et al., 1998) used by Taskar et al. (2002). This data set contains web pages from four different Computer Science departments: Cornell, Texas, Washington, and Wisconsin. Each page is labeled as one of course, faculty ...

360 | Discriminative probabilistic models for relational data
- Taskar, Abbeel, et al.
- 2002
Citation Context: ...related objects whose labels must also be consistent with each other. In hypertext or bibliographic classification, labels of linked and cocited documents tend to be similar (Chakrabarti et al., 1998; Taskar et al., 2002). In proteomic analysis, location and function of proteins ...

346 | Exact Maximum A Posteriori Estimation for Binary Images - Greig, Porteous, et al. - 1989

254 | A multiscale random-field model for Bayesian image segmentation
- Bouman, Shapiro
- 1994
Citation Context: ...topology networks (Besag, 1986). One can address the tractability issue by limiting the structure of the underlying network. In some cases, such as the quadtree model used for image segmentation (Bouman & Shapiro, 1994), a tractable structure is determined in advance. In other cases (e.g., Bach & Jordan, 2001), the network structure is learned, subject to the constraint that inference on these networks is tractable ...

178 | Markov Random Fields with Efficient Approximations
- Boykov, Veksler, et al.
- 1998
Citation Context: ...proteins that interact are often highly correlated (Vazquez et al., 2003). In image processing, neighboring pixels exhibit local label coherence in denoising, segmentation and stereo correspondence (Besag, 1986; Boykov et al., 1999a). Markov networks compactly represent complex joint distributions of the label variables by modeling their local interactions. Such models are encoded by a graph, whose nodes represent the different ...

165 | Approximation algorithms for classification problems with pairwise relationships: Metric labeling and Markov Random Fields
- Kleinberg, Tardos
- 1999
Citation Context: ...models in polynomial time. For K > 2, the MAP problem is NP-hard, but a procedure based on a relaxed linear program guarantees a factor-2 approximation of the optimal solution (Boykov et al., 1999b; Kleinberg & Tardos, 1999). Kleinberg and Tardos (1999) extend the multi-class Potts model to have more general edge potentials, under the constraint that the negative log-potentials $-\log \phi_{ij}(k, l)$ form a metric on the set of labels ...
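The Kleinberg–Tardos rounding referenced in this context repeatedly draws a random (label, threshold) pair and commits all still-unlabeled nodes whose fractional LP weight for that label clears the threshold. A stdlib-only sketch under our own naming (`mu[i][k]` as the LP weight of label `k` at node `i` is an assumed interface, not the paper's code):

```python
import random

def kt_round(mu, K):
    """Kleinberg-Tardos style randomized rounding of a fractional labeling.
    mu[i][k] is the LP weight of label k at node i (each row sums to 1)."""
    n = len(mu)
    labels = [None] * n
    while any(l is None for l in labels):
        k = random.randrange(K)        # pick a label uniformly at random
        alpha = 1.0 - random.random()  # threshold in (0, 1]
        for i in range(n):
            # commit node i to label k if its LP weight clears the threshold
            if labels[i] is None and mu[i][k] >= alpha:
                labels[i] = k
    return labels
```

When `mu` is already integral (each row one-hot), the procedure reproduces the LP labels exactly; for fractional solutions it yields the approximation guarantees analyzed in the metric-labeling paper.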

142 | Some generalized order-disorder transformations
- Potts
- 1952
Citation Context: ...in the context of image processing, where nearby pixels are likely to have the same label (Besag, 1986; Boykov et al., 1999b). In this setting, a common approach is to use a generalized Potts model (Potts, 1952), which penalizes assignments that do not have the same label across the edge: $\phi_{ij}(k, l) = \lambda_{ij}$ for all $k \neq l$ and $\phi_{ij}(k, k) = 1$, where $\lambda_{ij} \leq 1$. For binary-valued Potts models, Greig et al. (1989) show ...
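The binary-case result attributed to Greig et al. above — exact MAP via a minimum s–t cut — can be sketched with a small stdlib-only max-flow solver. The graph construction is the standard one (source side corresponds to label 1, sink side to label 0); the function names and the Edmonds–Karp solver are our own illustrative choices, not the paper's code:

```python
from collections import defaultdict, deque

def max_flow(cap, s, t):
    """Edmonds-Karp max-flow on a dict-of-dict residual graph."""
    flow = 0
    while True:
        parent = {s: None}          # BFS for a shortest augmenting path
        q = deque([s])
        while q and t not in parent:
            u = q.popleft()
            for v, c in list(cap[u].items()):
                if c > 0 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return flow             # no augmenting path: flow is maximal
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        aug = min(cap[u][w] for u, w in path)
        for u, w in path:           # push flow, update residuals
            cap[u][w] -= aug
            cap[w][u] += aug
        flow += aug

def binary_map_energy(theta, lam):
    """Exact minimum energy of a binary pairwise model via the s-t cut construction.
    theta[i] = (theta_i(0), theta_i(1)); lam[(i, j)] = disagreement penalty."""
    cap = defaultdict(lambda: defaultdict(int))
    s, t = 's', 't'
    for i, (t0, t1) in enumerate(theta):
        cap[s][i] += t0  # cut (paid) when y_i = 0: node i ends on the sink side
        cap[i][t] += t1  # cut (paid) when y_i = 1: node i ends on the source side
    for (i, j), l in lam.items():
        cap[i][j] += l   # one direction is cut whenever y_i and y_j disagree
        cap[j][i] += l
    return max_flow(cap, s, t)
```

The min-cut value equals min over y of sum_i theta_i(y_i) + sum_ij lam_ij [y_i != y_j], i.e. the energy of the binary generalized Potts model, which is the exactness result the snippet cites.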

111 | MAP estimation via agreement on (hyper)trees: Message-passing and linear-programming approaches - Wainwright, Jaakkola, et al. - 2005

93 | Global protein function prediction from protein–protein interaction networks
- Vazquez, Flammini, et al.
- 2003
Citation Context: ...location and function of proteins that interact are often highly correlated (Vazquez et al., 2003). In image processing, neighboring pixels exhibit local label coherence in denoising, segmentation and stereo correspondence (Besag, 1986; Boykov et al., 1999a). Markov networks compactly represent complex joint distributions ...

45 | Thin junction trees
- Bach, Jordan
- 2001
Citation Context: ...structure of the underlying network. In some cases, such as the quadtree model used for image segmentation (Bouman & Shapiro, 1994), a tractable structure is determined in advance. In other cases (e.g., Bach & Jordan, 2001), the network structure is learned, subject to the constraint that inference on these networks is tractable. In many cases, however, the topology of the Markov network does not allow tractable inference ...

13 | Learning on the test data: Leveraging ‘unseen’ features
- Taskar, Wong, et al.
- 2003
Citation Context: ...2002). Recently, maximum margin-based training has been shown to additionally boost accuracy over conditional likelihood methods and allow a seamless integration of kernel methods with Markov networks (Taskar et al., 2003a). The chief computational bottleneck in this task is inference in the underlying network, which is a core subroutine for all methods for training Markov networks. Probabilistic inference is NP-hard ...


6 | Introduction to linear programming, Athena Scientific
- Bertsimas, Tsitsiklis
- 1997
Citation Context: ...to replace the MAP integer program within the QP with a linear program, the resulting QP does not appear tractable. However, here we can exploit fundamental properties of linear programming duality (Bertsimas & Tsitsiklis, 1997). Assume that our relaxed LP for the inference task has the form: $\max_y \mathbf{w}By$ s.t. $y \ge 0,\ Ay \le b$ (5), for some polynomial-size $A$, $B$, $b$. (For example, Eq. (1) and Eq. (2) can be easily written in this c...
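The duality argument invoked in this context can be made explicit. For the relaxed inference LP of Eq. (5), strong LP duality pairs the maximization with an equivalent minimization (a standard textbook statement, written here in the snippet's notation):

```latex
\text{Primal: } \max_{y}\ \mathbf{w}By
  \quad \text{s.t. } y \ge 0,\ Ay \le b
\qquad\Longleftrightarrow\qquad
\text{Dual: } \min_{z}\ b^{\top}z
  \quad \text{s.t. } z \ge 0,\ A^{\top}z \ge (\mathbf{w}B)^{\top}.
```

Whenever both programs are feasible their optima coincide, so a constraint of the form $\max_y \mathbf{w}By \le \xi$ can be replaced by the existence of some $z \ge 0$ with $A^{\top}z \ge (\mathbf{w}B)^{\top}$ and $b^{\top}z \le \xi$ — which is what lets the inner inference maximization be folded into the learning QP.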

5 | What energy functions can be minimized using graph cuts - Kolmogorov, Zabih - 2002

1 | Generalized belief propagation
- Yedidia, Freeman, et al.
- 2000
Citation Context: ...trained the same AMN model using the RMN approach of Taskar et al. (2002). In this approach, the Markov network is trained to maximize the conditional log-likelihood, using loopy belief propagation (Yedidia et al., 2000) for computing the posterior probabilities needed for optimization. Due to the high connectivity in the network, the algorithm is not exact, and is not guaranteed to converge to the true values for the ...
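The loopy belief propagation routine this context mentions can be sketched as synchronous sum-product message passing (a minimal stdlib sketch of our own; function and argument names are illustrative). On tree-structured networks it computes exact posteriors; on the loopy graphs discussed above it is the same fixed-point iteration run approximately, with no convergence guarantee:

```python
from math import prod

def loopy_bp(n, K, node_pot, edge_pot, edges, iters=50):
    """Synchronous sum-product BP on a pairwise model; returns node beliefs."""
    nbrs = {i: [] for i in range(n)}
    pot = {}
    for e, (i, j) in enumerate(edges):
        nbrs[i].append(j)
        nbrs[j].append(i)
        pot[(i, j)] = lambda a, b, e=e: edge_pot[e][a][b]  # a at i, b at j
        pot[(j, i)] = lambda a, b, e=e: edge_pot[e][b][a]
    msg = {(i, j): [1.0] * K for i in nbrs for j in nbrs[i]}
    for _ in range(iters):
        new = {}
        for (i, j) in msg:
            # m_{i->j}(b) = sum_a phi_i(a) psi_ij(a,b) prod_{k in N(i)\{j}} m_{k->i}(a)
            m = [sum(node_pot[i][a] * pot[(i, j)](a, b)
                     * prod(msg[(k, i)][a] for k in nbrs[i] if k != j)
                     for a in range(K))
                 for b in range(K)]
            z = sum(m)
            new[(i, j)] = [v / z for v in m]
        msg = new
    beliefs = []
    for i in range(n):
        b = [node_pot[i][a] * prod(msg[(k, i)][a] for k in nbrs[i])
             for a in range(K)]
        z = sum(b)
        beliefs.append([v / z for v in b])
    return beliefs
```

On the densely connected networks the snippet describes, these messages can oscillate rather than settle, which is the non-convergence issue noted in the context.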
