
Graphical models, exponential families, and variational inference (2008)

by Martin J. Wainwright, Michael I. Jordan
Results 1 - 10 of 819

Online Learning with Kernels

by Jyrki Kivinen, Alexander J. Smola, Robert C. Williamson, 2003
Cited by 2831 (123 self)
Kernel based algorithms such as support vector machines have achieved considerable success in various problems in the batch setting where all of the training data is available in advance. Support vector machines combine the so-called kernel trick with the large margin idea. There has been little use of these methods in an online setting suitable for real-time applications. In this paper we consider online learning in a Reproducing Kernel Hilbert Space. By considering classical stochastic gradient descent within a feature space, and the use of some straightforward tricks, we develop simple and computationally efficient algorithms for a wide range of problems such as classification, regression, and novelty detection. In addition to allowing the exploitation of the kernel trick in an online setting, we examine the value of large margins for classification in the online setting with a drifting target. We derive worst case loss bounds and moreover we show the convergence of the hypothesis to the minimiser of the regularised risk functional. We present some experimental results that support the theory as well as illustrating the power of the new algorithms for online novelty detection. In addition
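
The idea of running stochastic gradient descent in a kernel feature space can be illustrated with a minimal sketch. It is not the paper's exact algorithm: it assumes a hinge loss with margin 1, a Gaussian kernel, no offset term, and no truncation of old expansion terms. The names `gaussian_kernel`, `OnlineKernelClassifier`, `eta`, and `lam` are hypothetical choices made for this example.

```python
import math


def gaussian_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) kernel; gamma is an assumed bandwidth parameter."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


class OnlineKernelClassifier:
    """Online SGD on a regularised hinge loss, with f kept as a kernel expansion."""

    def __init__(self, eta=0.5, lam=0.01, kernel=gaussian_kernel):
        self.eta = eta        # learning rate (assumed constant)
        self.lam = lam        # regularisation strength
        self.kernel = kernel
        self.support = []     # list of (coefficient, stored example)

    def decision(self, x):
        # f(x) = sum_i alpha_i * k(x_i, x)
        return sum(a * self.kernel(s, x) for a, s in self.support)

    def update(self, x, y):
        """Process one labelled example (y in {-1, +1}); return its margin."""
        margin = y * self.decision(x)
        # The regulariser's gradient shrinks every existing coefficient.
        shrink = 1.0 - self.eta * self.lam
        self.support = [(a * shrink, s) for a, s in self.support]
        # The hinge loss contributes a new term only when the margin is violated.
        if margin < 1.0:
            self.support.append((self.eta * y, x))
        return margin


if __name__ == "__main__":
    clf = OnlineKernelClassifier()
    stream = [((0.0, 0.0), -1), ((2.0, 2.0), +1)] * 10
    for x, y in stream:
        clf.update(x, y)
    print(clf.decision((0.0, 0.0)), clf.decision((2.0, 2.0)))
```

Because each margin violation adds a stored example, the expansion grows with the stream; a practical variant would cap or prune it.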

Learning in graphical models

by Michael I. Jordan - Statistical Science, 2004
Cited by 806 (10 self)
Statistical applications in fields such as bioinformatics, information retrieval, speech processing, image processing and communications often involve large-scale models in which thousands or millions of random variables are linked in complex ways. Graphical models provide a general methodology for approaching these problems, and indeed many of the models developed by researchers in these applied fields are instances of the general graphical model formalism. We review some of the basic ideas underlying graphical models, including the algorithmic ideas that allow graphical models to be deployed in large-scale data analysis problems. We also present examples of graphical models in bioinformatics, error-control coding and language processing.

Constructing Free Energy Approximations and Generalized Belief Propagation Algorithms

by Jonathan S. Yedidia, William T. Freeman, Yair Weiss - IEEE Transactions on Information Theory , 2005
Cited by 585 (13 self)
Important inference problems in statistical physics, computer vision, error-correcting coding theory, and artificial intelligence can all be reformulated as the computation of marginal probabilities on factor graphs. The belief propagation (BP) algorithm is an efficient way to solve these problems that is exact when the factor graph is a tree, but only approximate when the factor graph has cycles. We show that BP fixed points correspond to the stationary points of the Bethe approximation of the free energy for a factor graph. We explain how to obtain region-based free energy approximations that improve the Bethe approximation, and corresponding generalized belief propagation (GBP) algorithms. We emphasize the conditions a free energy approximation must satisfy in order to be a “valid” or “maxent-normal” approximation. We describe the relationship between four different methods that can be used to generate valid approximations: the “Bethe method,” the “junction graph method,” the “cluster variation method,” and the “region graph method.” Finally, we explain how to tell whether a region-based approximation, and its corresponding GBP algorithm, is likely to be accurate, and describe empirical results showing that GBP can significantly outperform BP.

Citation Context

...osed new variational inference techniques, closely related to our region-based approximations, but differentiated by a requirement that the set of beliefs used must be marginals of some global belief [43]. They call the set of beliefs realizable from a global belief the “marginal polytope.” B. Negative Entropies Because some of the terms in the Bethe entropy have a sign that is flipped from the normal...
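
The claim in the abstract above that BP is exact on trees can be checked on the simplest tree, a chain. The following is a minimal sketch under stated assumptions: a pairwise model rather than a general factor graph, with `unary[i][s]` as the local potential of node `i` in state `s` and `pairwise[i][a][b]` as the potential between node `i` in state `a` and node `i+1` in state `b`. The function names are hypothetical.

```python
from itertools import product


def chain_marginals_bp(unary, pairwise):
    """Sum-product message passing on a chain; returns per-node marginals."""
    n, k = len(unary), len(unary[0])
    fwd = [[1.0] * k for _ in range(n)]  # fwd[i]: message into node i from the left
    bwd = [[1.0] * k for _ in range(n)]  # bwd[i]: message into node i from the right
    for i in range(1, n):
        fwd[i] = [
            sum(unary[i - 1][a] * fwd[i - 1][a] * pairwise[i - 1][a][b] for a in range(k))
            for b in range(k)
        ]
    for i in range(n - 2, -1, -1):
        bwd[i] = [
            sum(unary[i + 1][b] * bwd[i + 1][b] * pairwise[i][a][b] for b in range(k))
            for a in range(k)
        ]
    marginals = []
    for i in range(n):
        belief = [unary[i][s] * fwd[i][s] * bwd[i][s] for s in range(k)]
        z = sum(belief)
        marginals.append([v / z for v in belief])
    return marginals


def chain_marginals_brute_force(unary, pairwise):
    """Reference marginals obtained by enumerating every joint configuration."""
    n, k = len(unary), len(unary[0])
    totals = [[0.0] * k for _ in range(n)]
    for states in product(range(k), repeat=n):
        w = 1.0
        for i, s in enumerate(states):
            w *= unary[i][s]
        for i in range(n - 1):
            w *= pairwise[i][states[i]][states[i + 1]]
        for i, s in enumerate(states):
            totals[i][s] += w
    return [[v / sum(row) for v in row] for row in totals]


if __name__ == "__main__":
    unary = [[1.0, 2.0], [1.0, 1.0], [3.0, 1.0]]
    pairwise = [[[2.0, 1.0], [1.0, 2.0]], [[1.0, 3.0], [3.0, 1.0]]]
    print(chain_marginals_bp(unary, pairwise))
    print(chain_marginals_brute_force(unary, pairwise))
```

On a graph with cycles the same message updates would have to be iterated and would in general give only approximate marginals, which is the regime the free energy approximations address.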

Overview of the scalable video coding extension of the H.264/AVC standard

by Heiko Schwarz, Detlev Marpe, Thomas Wiegand - IEEE Transactions on Circuits and Systems for Video Technology, 2007
Cited by 522 (6 self)
With the introduction of the H.264/AVC video coding standard, significant improvements have recently been demonstrated in video compression capability. The Joint Video Team of the ITU-T VCEG and the ISO/IEC MPEG has now also standardized a Scalable Video Coding (SVC) extension of the H.264/AVC standard. SVC enables the transmission and decoding of partial bit streams to provide video services with lower temporal or spatial resolutions or reduced fidelity while retaining a reconstruction quality that is high relative to the rate of the partial bit streams. Hence, SVC provides functionalities such as graceful degradation in lossy transmission environments as well as bit rate, format, and power adaptation. These functionalities provide enhancements to transmission and storage applications. SVC has achieved significant improvements in coding efficiency with an increased degree of supported scalability relative to the scalable profiles of prior video coding standards. This paper provides an overview of the basic concepts for extending H.264/AVC towards SVC. Moreover, the basic tools for providing temporal, spatial, and quality scalability are described in detail and experimentally analyzed regarding their efficiency and complexity.

Clustering with Bregman Divergences

by Arindam Banerjee, Srujana Merugu, Inderjit Dhillon, Joydeep Ghosh - Journal of Machine Learning Research, 2005
Cited by 443 (57 self)
A wide variety of distortion functions are used for clustering, e.g., squared Euclidean distance, Mahalanobis distance and relative entropy. In this paper, we propose and analyze parametric hard and soft clustering algorithms based on a large class of distortion functions known as Bregman divergences. The proposed algorithms unify centroid-based parametric clustering approaches, such as classical k-means and information-theoretic clustering, which arise by special choices of the Bregman divergence. The algorithms maintain the simplicity and scalability of the classical k-means algorithm, while generalizing the basic idea to a very large class of clustering loss functions. There are two main contributions in this paper. First, we pose the hard clustering problem in terms of minimizing the loss in Bregman information, a quantity motivated by rate-distortion theory, and present an algorithm to minimize this loss. Secondly, we show an explicit bijection between Bregman divergences and exponential families. The bijection enables the development of an alternative interpretation of an efficient EM scheme for learning models involving mixtures of exponential distributions. This leads to a simple soft clustering algorithm for all Bregman divergences.

Citation Context

...ch that 〈a, t(ω)〉 = c (a constant) ∀ω ∈ Ω, then this representation is said to be minimal. For a minimal representation, there exists a unique probability density f(ω; θ) for every choice of θ ∈ Θ (Wainwright and Jordan, 2003). Fψ is called a full exponential family of order d in such a case. In addition, if the parameter space Θ is open, i.e., Θ = int(Θ), then Fψ is called a regular exponential family. It can be easily s...
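
The hard clustering scheme summarised in the abstract above generalises k-means by swapping the distortion function. A minimal sketch follows, assuming the divergence is evaluated as d(x, centroid) and relying on the known property that the arithmetic mean is the optimal cluster representative for any Bregman divergence in that argument order. The function names and the fixed-initialisation interface are hypothetical and not taken from the paper.

```python
import math


def squared_euclidean(x, mu):
    """Bregman divergence generated by the squared Euclidean norm."""
    return sum((a - b) ** 2 for a, b in zip(x, mu))


def generalized_kl(x, mu):
    """Generalised KL (I-divergence); assumes strictly positive coordinates."""
    return sum(a * math.log(a / b) - a + b for a, b in zip(x, mu))


def bregman_hard_cluster(points, init_centroids, divergence, n_iter=50):
    """Alternate nearest-centroid assignment and mean re-estimation."""
    centroids = [list(c) for c in init_centroids]
    assign = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid under the chosen divergence.
        new_assign = [
            min(range(len(centroids)), key=lambda j: divergence(p, centroids[j]))
            for p in points
        ]
        if new_assign == assign:
            break  # assignments are stable, so the centroids are too
        assign = new_assign
        # Re-estimation step: the arithmetic mean, whatever the divergence.
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centroids


if __name__ == "__main__":
    pts = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.2, 4.8)]
    init = [(1.0, 1.0), (5.0, 5.0)]
    print(bregman_hard_cluster(pts, init, squared_euclidean))
    print(bregman_hard_cluster(pts, init, generalized_kl))
```

With `squared_euclidean` this reduces to classical k-means; only the assignment step changes when another divergence is supplied.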

Mixed membership stochastic block models for relational data with application to protein-protein interactions

by Edoardo M. Airoldi, David M. Blei, Stephen E. Fienberg, Eric P. Xing, Tommi Jaakkola - In Proceedings of the International Biometrics Society Annual Meeting , 2006
Cited by 378 (52 self)
We develop a model for examining data that consists of pairwise measurements, for example, presence or absence of links between pairs of objects. Examples include protein interactions and gene regulatory networks, collections of author-recipient email, and social networks. Analyzing such data with probabilistic models requires special assumptions, since the usual independence or exchangeability assumptions no longer hold. We introduce a class of latent variable models for pairwise measurements: mixed membership stochastic blockmodels. Models in this class combine a global model of dense patches of connectivity (blockmodel) and a local model to instantiate node-specific variability in the connections (mixed membership). We develop a general variational inference algorithm for fast approximate posterior inference. We demonstrate the advantages of mixed membership stochastic blockmodels with applications to social networks and protein interaction networks.

Citation Context

...se parameters are fit to be close in Kullback-Leibler divergence to the true posterior of interest. Good reviews of variational methods can be found in a number of papers (Jordan et al., 1999; Wainwright and Jordan, 2003; Xing et al., 2003; Bishop et al., 2003). The log of the marginal probability in Equation 2 can be bounded with Jensen’s inequality as follows: log p(R | α, B) ≥ E_q[ log p(R, π⃗_{1:N}, Z→, Z← | α, B) ] − E...
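
The Jensen's-inequality step mentioned in the context above follows a generic pattern. As a sketch in assumed notation (observed data x, latent variables z, and a variational distribution q(z), which are not the cited paper's symbols), the bound can be written as:

```latex
\begin{align*}
\log p(x) &= \log \int q(z)\,\frac{p(x, z)}{q(z)}\,dz \\
          &\ge \int q(z)\,\log\frac{p(x, z)}{q(z)}\,dz \\
          &= \mathbb{E}_q[\log p(x, z)] - \mathbb{E}_q[\log q(z)].
\end{align*}
```

The gap in the inequality equals the Kullback-Leibler divergence from q(z) to the posterior p(z | x), so maximising the bound over q moves q toward the true posterior.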

Supervised topic models

by David M. Blei, Jon D. McAuliffe - In preparation, 2008
Cited by 336 (8 self)
Abstract not found

Computer Vision: Algorithms and Applications

by Richard Szeliski, 2010
Cited by 252 (2 self)
Abstract not found

Variational inference for Dirichlet process mixtures

by David M. Blei, Michael I. Jordan - Bayesian Analysis , 2005
Cited by 244 (27 self)
Dirichlet process (DP) mixture models are the cornerstone of nonparametric Bayesian statistics, and the development of Monte-Carlo Markov chain (MCMC) sampling methods for DP mixtures has enabled the application of nonparametric Bayesian methods to a variety of practical data analysis problems. However, MCMC sampling can be prohibitively slow, and it is important to explore alternatives. One class of alternatives is provided by variational methods, a class of deterministic algorithms that convert inference problems into optimization problems (Opper and Saad 2001; Wainwright and Jordan 2003). Thus far, variational methods have mainly been explored in the parametric setting, in particular within the formalism of the exponential family (Attias 2000; Ghahramani and Beal 2001; Blei et al. 2003). In this paper, we present a variational inference algorithm for DP mixtures. We present experiments that compare the algorithm to Gibbs sampling algorithms for DP mixtures of Gaussians and present an application to a large-scale image analysis problem.

Citation Context

...plore alternatives. One class of alternatives is provided by variational methods, a class of deterministic algorithms that convert inference problems into optimization problems (Opper and Saad, 2001; Wainwright and Jordan, 2003). Thus far, variational methods have mainly been explored in the parametric setting, in particular within the formalism of the exponential family (Attias, 2000; Ghahramani and Beal, 2001; Blei et al....

Probabilistic topic models

by David M. Blei - IEEE Signal Processing Magazine , 2010
Cited by 235 (6 self)
Probabilistic topic models are a suite of algorithms whose aim is to discover the

Citation Context

...ing families of distributions. It is a field that illuminates the deep mathematical links between probabilistic independence, graph theory, and algorithms for computing with probability distributions [35]. That number of possible topic structures, however, is exponentially large; this sum is intractable to compute. As for many modern probabilistic models of interest—and for much of modern Bayesian...
