Fixing Max-Product: Convergent Message Passing Algorithms for MAP LP-Relaxations
Cited by 160 (14 self)
We present a novel message passing algorithm for approximating the MAP problem in graphical models. The algorithm is similar in structure to max-product but, unlike max-product, it always converges and can be proven to find the exact MAP solution in various settings. The algorithm is derived via block coordinate descent in a dual of the LP relaxation of MAP, but does not require any tunable parameters such as step size or tree weights. We also describe a generalization of the method to cluster-based potentials. The new method is tested on synthetic and real-world problems, and compares favorably with previous approaches.
Graphical models are an effective approach for modeling complex objects via local interactions. In such models, a distribution over a set of variables is assumed to factor according to cliques of a graph, with potentials assigned to each clique. Finding the assignment with highest probability in these models is key to using them in practice, and is often referred to as the MAP (maximum a posteriori) assignment problem. In the general case the problem is NP-hard, with complexity exponential in the treewidth of the underlying graph.
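The MAP assignment problem described above can be illustrated on a tiny chain-structured model, where the treewidth is 1 and max-product is exact. The sketch below is not the paper's convergent algorithm; it only contrasts exhaustive enumeration with max-product dynamic programming in log space. The function names and the `unary`/`pairwise` score layout are assumptions made for illustration.

```python
import itertools


def chain_score(x, unary, pairwise):
    """Total log-score of assignment x on a chain model."""
    n = len(unary)
    s = sum(unary[i][x[i]] for i in range(n))
    s += sum(pairwise[i][x[i]][x[i + 1]] for i in range(n - 1))
    return s


def chain_map_bruteforce(unary, pairwise):
    """Enumerate all k**n assignments and return the highest-scoring one."""
    n, k = len(unary), len(unary[0])
    return max(itertools.product(range(k), repeat=n),
               key=lambda x: chain_score(x, unary, pairwise))


def chain_map_max_product(unary, pairwise):
    """Max-product in log space (max-sum) on a chain, with backtracking."""
    n, k = len(unary), len(unary[0])
    msg = list(unary[0])  # best score of any prefix ending in each state
    back = []             # back[i-1][b] = best predecessor of state b at node i
    for i in range(1, n):
        new_msg, ptr = [], []
        for b in range(k):
            best_a = max(range(k), key=lambda a: msg[a] + pairwise[i - 1][a][b])
            ptr.append(best_a)
            new_msg.append(msg[best_a] + pairwise[i - 1][best_a][b] + unary[i][b])
        msg = new_msg
        back.append(ptr)
    state = max(range(k), key=lambda s: msg[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return tuple(reversed(path))
```

Enumeration costs k**n evaluations, while the message-passing version costs on the order of n * k**2, which is why the treewidth of the graph governs the difficulty of exact MAP inference.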
Exponentiated gradient algorithms for conditional random fields and max-margin Markov networks
2008
Cited by 94 (2 self)
Log-linear and maximum-margin models are two commonly used methods in supervised machine learning, and are frequently used in structured prediction problems. Efficient learning of parameters in these models is therefore an important problem, and becomes a key factor when learning from very large data sets. This paper describes exponentiated gradient (EG) algorithms for training such models, where EG updates are applied to the convex dual of either the log-linear or max-margin objective function; the dual in both the log-linear and max-margin cases corresponds to minimizing a convex function with simplex constraints. We study both batch and online variants of the algorithm, and provide rates of convergence for both cases. In the max-margin case, O(1/ε) EG updates are required to reach a given accuracy ε in the dual; in contrast, for log-linear models only O(log(1/ε)) updates are required. For both the max-margin and log-linear cases, our bounds suggest that the online EG algorithm requires a factor of n less computation to reach a desired accuracy than the batch EG algorithm, where n is the number of training examples. Our experiments confirm that the online algorithms are much faster than the batch algorithms in practice. We describe how the EG updates factor in a convenient way for structured prediction problems, allowing the algorithms to be
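The multiplicative form of an EG update, and the way it keeps the iterate on the simplex, may be easier to see on a toy problem. The sketch below applies generic EG updates to a simple quadratic over the simplex; it is not the paper's dual objective, and the function names, the step size `eta` and the quadratic itself are assumptions chosen for illustration.

```python
import math


def eg_step(w, grad, eta):
    """One exponentiated gradient update: multiply, then renormalize."""
    unnorm = [wi * math.exp(-eta * gi) for wi, gi in zip(w, grad)]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def minimize_quadratic_on_simplex(target, eta=0.5, steps=500):
    """Minimize 0.5 * ||w - target||^2 over the simplex with EG updates."""
    n = len(target)
    w = [1.0 / n] * n  # start at the uniform distribution
    for _ in range(steps):
        grad = [wi - ti for wi, ti in zip(w, target)]
        w = eg_step(w, grad, eta)
    return w
```

Because each update multiplies by a positive factor and renormalizes, no explicit projection onto the simplex is needed.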
Coresets, sparse greedy approximation and the Frank-Wolfe algorithm
Proceedings of the 19th Annual ACM-SIAM Symposium on Discrete Algorithms
Cited by 84 (1 self)
The problem of maximizing a concave function f(x) in a simplex S can be solved approximately by a simple greedy algorithm. For given k, the algorithm can find a point x(k) on a k-dimensional face of S, such that f(x(k)) ≥ f(x∗) − O(1/k). Here f(x∗) is the maximum value of f in S. This algorithm and analysis were known before, and related to problems of statistics and machine learning, such as boosting, regression, and density mixture estimation. In other work, coming from computational geometry, the existence of ε-coresets was shown for the minimum enclosing ball problem, by means of a simple greedy algorithm. Similar greedy algorithms, which are special cases of the Frank-Wolfe algorithm, were described for other enclosure problems. Here these results are tied together, stronger convergence results are reviewed, and several coreset bounds are generalized or strengthened.
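The greedy scheme summarized above can be sketched as a Frank-Wolfe iteration on the simplex. This is a minimal illustration under stated assumptions rather than the paper's exact procedure: the function name, the caller-supplied `grad` callback and the standard 2/(k+2) step size are choices made here.

```python
def frank_wolfe_simplex(grad, n, iters):
    """Approximately maximize a concave function over the simplex.

    grad(x) must return the gradient of the objective at x as a list.
    """
    x = [0.0] * n
    x[0] = 1.0  # start at a vertex of the simplex
    for k in range(iters):
        g = grad(x)
        # The linearized objective is maximized at the vertex with largest gradient.
        i = max(range(n), key=lambda j: g[j])
        step = 2.0 / (k + 2.0)
        # Move toward vertex i; this adds at most one new nonzero coordinate.
        x = [(1.0 - step) * xj for xj in x]
        x[i] += step
    return x
```

Since every iteration touches a single vertex, the iterate after k steps is supported on few coordinates, which is the sparsity property that connects this algorithm to coresets.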
A Discriminative Latent Model of Object Classes and Attributes
Cited by 81 (5 self)
We present a discriminatively trained model for joint modelling of object class labels (e.g. “person”, “dog”, “chair”, etc.) and their visual attributes (e.g. “has head”, “furry”, “metal”, etc.). We treat attributes of an object as latent variables in our model and capture the correlations among attributes using an undirected graphical model built from training data. The advantage of our model is that it allows us to infer object class labels using the information of both the test image itself and its (latent) attributes. Our model unifies object class prediction and attribute prediction in a principled framework. It is also flexible enough to deal with different performance measurements. Our experimental results provide quantitative evidence that attributes can improve object naming.
Max-margin hidden conditional random fields for human action recognition
2008
Cited by 52 (9 self)
We present a new method for classification with structured latent variables. Our model is formulated using the max-margin formalism in the discriminative learning literature. We propose an efficient learning algorithm based on the cutting plane method and decomposed dual optimization. We apply our model to the problem of recognizing human actions from video sequences, where we model a human action as a global root template and a constellation of several “parts”. We show that our model outperforms another similar method that uses hidden conditional random fields, and is comparable to other state-of-the-art approaches. More importantly, our proposed work is quite general and can potentially be applied in a wide variety of vision problems that involve various complex, interdependent latent structures.
Learning Efficiently with Approximate Inference via Dual Losses
Cited by 38 (7 self)
Many structured prediction tasks involve complex models where inference is computationally intractable, but where it can be well approximated using a linear programming relaxation. Previous approaches for learning for structured prediction (e.g., cutting-plane, subgradient methods, perceptron) repeatedly make predictions for some of the data points. These approaches are computationally demanding because each prediction involves solving a linear program to optimality. We present a scalable algorithm for learning for structured prediction. The main idea is to instead solve the dual of the structured prediction loss. We formulate the learning task as a convex minimization over both the weights and the dual variables corresponding to each data point. As a result, we can begin to optimize the weights even before completely solving any of the individual prediction problems. We show how the dual variables can be efficiently optimized using coordinate descent. Our algorithm is competitive with state-of-the-art methods such as stochastic subgradient and cutting-plane.
A primal-dual message-passing algorithm for approximated large scale structured prediction
In Advances in Neural Information Processing Systems 23
2010
Cited by 38 (19 self)
In this paper we propose an approximated structured prediction framework for large scale graphical models and derive message-passing algorithms for learning their parameters efficiently. We first relate CRFs and structured SVMs and show that in CRFs a variant of the log-partition function, known as the soft-max, smoothly approximates the hinge loss function of structured SVMs. We then propose an intuitive approximation for the structured prediction problem, using duality, based on a local entropy approximation, and derive an efficient message-passing algorithm that is guaranteed to converge. Unlike existing approaches, this allows us to efficiently learn graphical models with cycles and a very large number of parameters.
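The claim that the soft-max smoothly approximates a max-based loss can be checked numerically. The sketch below assumes a temperature parameter, here called `eps`, and shows that eps * log(sum(exp(s / eps))) is sandwiched between max(s) and max(s) + eps * log(n); the function name and the parameterization are assumptions and may differ from the paper's notation.

```python
import math


def soft_max(scores, eps):
    """Smooth upper bound on max(scores); tends to the max as eps -> 0."""
    m = max(scores)
    # Subtracting m before exponentiating avoids overflow for small eps.
    return m + eps * math.log(sum(math.exp((s - m) / eps) for s in scores))
```

With eps = 1 this is the ordinary log-partition function of the scores; shrinking eps tightens the gap to the hard maximum, which is the quantity that appears in the structured hinge loss.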
Structured prediction via the extragradient method
Cited by 31 (2 self)
We present a simple and scalable algorithm for large-margin estimation of structured models, including an important class of Markov networks and combinatorial models. The estimation problem can be formulated as a quadratic program (QP) that exploits the problem structure to achieve a polynomial number of variables and constraints. However, off-the-shelf QP solvers scale poorly with problem and training sample size. We recast the formulation as a convex-concave saddle point problem that allows us to use simple projection methods. We show the projection step can be solved using combinatorial algorithms for min-cost convex flow. We provide linear convergence guarantees for our method and present experiments on two very different structured prediction tasks: 3D image segmentation and word alignment, illustrating the favorable scaling properties of our algorithm.
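The reason an extragradient step helps on saddle point problems can be seen on the simplest convex-concave example, min over x and max over y of x*y. The sketch below is an unconstrained toy, so the projection step from the abstract reduces to the identity; the function names and the step size `eta` are assumptions made for illustration.

```python
def extragradient_bilinear(x, y, eta=0.5, steps=100):
    """Extragradient on min_x max_y x*y; the saddle point is (0, 0)."""
    for _ in range(steps):
        # Prediction step: descend in x, ascend in y from the current point.
        xp, yp = x - eta * y, y + eta * x
        # Correction step: redo the move using gradients at the predicted point.
        x, y = x - eta * yp, y + eta * xp
    return x, y


def gradient_descent_ascent_bilinear(x, y, eta=0.5, steps=100):
    """Plain simultaneous descent/ascent on the same problem, for contrast."""
    for _ in range(steps):
        x, y = x - eta * y, y + eta * x
    return x, y
```

On this problem each extragradient step shrinks the squared distance to the saddle point by the factor 1 - eta**2 + eta**4, while plain descent/ascent grows it by 1 + eta**2, so the former converges linearly and the latter spirals outward.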
Max-margin weight learning for Markov logic networks
In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD-09), Bled
2009
Cited by 30 (5 self)
Markov logic networks (MLNs) are an expressive representation for statistical relational learning that generalizes both first-order logic and graphical models. Existing discriminative weight learning methods for MLNs all try to learn weights that optimize the Conditional Log Likelihood (CLL) of the training examples. In this work, we present a new discriminative weight learning method for MLNs based on a max-margin framework. This results in a new model, Max-Margin Markov Logic Networks (M3LNs), that combines the expressiveness of MLNs with the predictive accuracy of structural Support Vector Machines (SVMs). To train the proposed model, we design a new approximation algorithm for loss-augmented inference in MLNs based on Linear Programming (LP). The experimental results show that the proposed approach generally achieves higher F1 scores than the current best discriminative weight learner for MLNs.
The Interplay of Optimization and Machine Learning Research
Journal of Machine Learning Research
2006
Cited by 24 (1 self)
The fields of machine learning and mathematical programming are increasingly intertwined. Optimization problems lie at the heart of most machine learning approaches. The Special Topic on Machine Learning and Large Scale Optimization examines this interplay. Machine learning researchers have embraced the advances in mathematical programming, allowing new types of models to be pursued. The special topic includes models using quadratic, linear, second-order cone, semidefinite, and semi-infinite programs. We observe that the qualities of good optimization algorithms from the machine learning and optimization perspectives can be quite different. Mathematical programming puts a premium on accuracy, speed, and robustness. Since generalization is the bottom line in machine learning and training is normally done offline, accuracy and small speed improvements are of little concern in machine learning. Machine learning prefers simpler algorithms that work in reasonable computational time for specific classes of problems. Reducing machine learning problems to well-explored mathematical programming classes with robust general-purpose optimization codes allows machine learning researchers to rapidly develop new techniques.