Results 1–10 of 40
Fixing Max-Product: Convergent Message Passing Algorithms for MAP LP-Relaxations
"... We present a novel message passing algorithm for approximating the MAP problem in graphical models. The algorithm is similar in structure to maxproduct but unlike maxproduct it always converges, and can be proven to find the exact MAP solution in various settings. The algorithm is derived via bloc ..."
Abstract

Cited by 75 (10 self)
 Add to MetaCart
We present a novel message passing algorithm for approximating the MAP problem in graphical models. The algorithm is similar in structure to max-product but unlike max-product it always converges, and can be proven to find the exact MAP solution in various settings. The algorithm is derived via block coordinate descent in a dual of the LP relaxation of MAP, but does not require any tunable parameters such as step size or tree weights. We also describe a generalization of the method to cluster-based potentials. The new method is tested on synthetic and real-world problems, and compares favorably with previous approaches. Graphical models are an effective approach for modeling complex objects via local interactions. In such models, a distribution over a set of variables is assumed to factor according to cliques of a graph with potentials assigned to each clique. Finding the assignment with highest probability in these models is key to using them in practice, and is often referred to as the MAP (maximum a-posteriori) assignment problem. In the general case the problem is NP-hard, with complexity exponential in the treewidth of the underlying graph.
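As context for the abstract above, the sketch below illustrates the MAP assignment problem itself on a toy pairwise model by exhaustive search; it is not the paper's convergent message-passing algorithm, and the model numbers are invented for illustration.

```python
import itertools

# Toy pairwise model on a 3-node chain x0 - x1 - x2, binary variables.
# unary[i][x] and pair[(i, j)][x][y] are log-potentials (invented numbers,
# not from the paper); MAP = assignment maximizing their sum.
unary = [[0.0, 1.0], [0.5, 0.0], [0.0, 0.2]]
pair = {(0, 1): [[1.0, 0.0], [0.0, 1.0]],   # rewards x0 == x1
        (1, 2): [[0.0, 1.0], [1.0, 0.0]]}   # rewards x1 != x2

def score(assignment):
    s = sum(unary[i][assignment[i]] for i in range(3))
    s += sum(pot[assignment[i]][assignment[j]] for (i, j), pot in pair.items())
    return s

# Exhaustive MAP: exponential in general (the problem is NP-hard),
# but trivial for three binary variables.
best = max(itertools.product([0, 1], repeat=3), key=score)
print(best, score(best))  # → (1, 1, 0) 3.0
```

Message-passing and LP-relaxation methods like the one in the abstract aim to recover this maximizing assignment without the exponential enumeration.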
Exponentiated gradient algorithms for conditional random fields and max-margin Markov networks
, 2008
"... Loglinear and maximummargin models are two commonlyused methods in supervised machine learning, and are frequently used in structured prediction problems. Efficient learning of parameters in these models is therefore an important problem, and becomes a key factor when learning from very large dat ..."
Abstract

Cited by 59 (1 self)
 Add to MetaCart
Log-linear and maximum-margin models are two commonly-used methods in supervised machine learning, and are frequently used in structured prediction problems. Efficient learning of parameters in these models is therefore an important problem, and becomes a key factor when learning from very large data sets. This paper describes exponentiated gradient (EG) algorithms for training such models, where EG updates are applied to the convex dual of either the log-linear or max-margin objective function; the dual in both the log-linear and max-margin cases corresponds to minimizing a convex function with simplex constraints. We study both batch and online variants of the algorithm, and provide rates of convergence for both cases. In the max-margin case, O(1/ε) EG updates are required to reach a given accuracy ε in the dual; in contrast, for log-linear models only O(log(1/ε)) updates are required. For both the max-margin and log-linear cases, our bounds suggest that the online EG algorithm requires a factor of n less computation to reach a desired accuracy than the batch EG algorithm, where n is the number of training examples. Our experiments confirm that the online algorithms are much faster than the batch algorithms in practice. We describe how the EG updates factor in a convenient way for structured prediction problems, allowing the algorithms to be ...
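The core update the abstract refers to is an exponentiated-gradient (multiplicative) step over the simplex. The sketch below shows that update on a toy quadratic objective, not on the structured-prediction duals from the paper; the function name and objective are assumptions for illustration.

```python
import math

def eg_minimize(grad, w, eta=0.5, iters=2000):
    """Exponentiated-gradient descent over the probability simplex:
    w_i <- w_i * exp(-eta * g_i), then renormalize. This is mirror
    descent with an entropy regularizer, so iterates stay on the simplex."""
    for _ in range(iters):
        g = grad(w)
        w = [wi * math.exp(-eta * gi) for wi, gi in zip(w, g)]
        z = sum(w)
        w = [wi / z for wi in w]
    return w

# Toy convex objective (an assumption, not the paper's dual):
# f(w) = 0.5 * sum_i (w_i - t_i)^2 with minimizer t inside the simplex,
# so grad f(w) = w - t.
t = [0.2, 0.7, 0.1]
w = eg_minimize(lambda w: [wi - ti for wi, ti in zip(w, t)], [1/3, 1/3, 1/3])
print([round(wi, 3) for wi in w])  # close to t
```

The multiplicative form keeps every iterate strictly feasible, which is what makes EG natural for the simplex-constrained duals the abstract describes.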
Max-margin hidden conditional random fields for human action recognition
, 2008
"... We present a new method for classification with structured latent variables. Our model is formulated using the maxmargin formalism in the discriminative learning literature. We propose an efficient learning algorithm based on the cutting plane method and decomposed dual optimization. We apply our m ..."
Abstract

Cited by 31 (6 self)
 Add to MetaCart
We present a new method for classification with structured latent variables. Our model is formulated using the max-margin formalism in the discriminative learning literature. We propose an efficient learning algorithm based on the cutting plane method and decomposed dual optimization. We apply our model to the problem of recognizing human actions from video sequences, where we model a human action as a global root template and a constellation of several “parts”. We show that our model outperforms another similar method that uses hidden conditional random fields, and is comparable to other state-of-the-art approaches. More importantly, our proposed work is quite general and can potentially be applied in a wide variety of vision problems that involve various complex, interdependent latent structures.
Coresets, sparse greedy approximation and the Frank-Wolfe algorithm
 Proceedings of the 19th Annual ACM-SIAM Symposium on Discrete Algorithms
"... The problem of maximizing a concave function f(x) in a simplex S can be solved approximately by a simple greedy algorithm. For given k, the algorithm can find a point x(k) on a kdimensional face of S, such that f(x(k)) ≥ f(x∗) − O(1/k). Here f(x∗) is the maximum value of f in S. This algorithm an ..."
Abstract

Cited by 30 (1 self)
 Add to MetaCart
The problem of maximizing a concave function f(x) in a simplex S can be solved approximately by a simple greedy algorithm. For a given k, the algorithm can find a point x(k) on a k-dimensional face of S, such that f(x(k)) ≥ f(x∗) − O(1/k). Here f(x∗) is the maximum value of f in S. This algorithm and analysis were known before, and are related to problems in statistics and machine learning, such as boosting, regression, and density mixture estimation. In other work, coming from computational geometry, the existence of ε-coresets was shown for the minimum enclosing ball problem, by means of a simple greedy algorithm. Similar greedy algorithms, that are special cases of the Frank-Wolfe algorithm, were described for other enclosure problems. Here these results are tied together, stronger convergence results are reviewed, and several coreset bounds are generalized or strengthened.
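The greedy algorithm the abstract describes is Frank-Wolfe over the simplex: each iteration solves a linear subproblem (which on the simplex just picks one vertex) and takes a convex-combination step. The sketch below is a minimal version on an assumed toy concave objective; the function name and test problem are not from the paper.

```python
def frank_wolfe_simplex(grad, n, iters=200):
    """Frank-Wolfe over the simplex (maximization): the linear subproblem
    max_s <grad f(x), s> over the simplex is attained at the vertex e_j
    with the largest gradient coordinate, so after k steps the iterate
    lies on a face of dimension at most k (the coreset/sparsity property)."""
    x = [0.0] * n
    x[0] = 1.0                                 # start at a vertex
    for k in range(iters):
        g = grad(x)
        j = max(range(n), key=lambda i: g[i])  # best vertex e_j
        gamma = 2.0 / (k + 2)                  # standard step size
        x = [(1 - gamma) * xi for xi in x]
        x[j] += gamma
    return x

# Toy concave objective (an assumption): f(x) = -0.5 * ||x - t||^2,
# maximized at t inside the simplex; grad f(x) = t - x.
t = [0.5, 0.3, 0.2]
x = frank_wolfe_simplex(lambda x: [ti - xi for xi, ti in zip(x, t)], 3)
print([round(xi, 3) for xi in x])  # approaches t at the O(1/k) rate above
```

Because each step mixes in a single vertex, the iterate is supported on at most k + 1 coordinates after k steps, which is exactly the sparse-approximation/coreset reading of the O(1/k) bound.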
A Discriminative Latent Model of Object Classes and Attributes
"... Abstract. We present a discriminatively trained model for joint modelling of object class labels (e.g. “person”, “dog”, “chair”, etc.) and their visual attributes (e.g. “has head”, “furry”, “metal”, etc.). We treat attributes of an object as latent variables in our model and capture the correlations ..."
Abstract

Cited by 29 (4 self)
 Add to MetaCart
We present a discriminatively trained model for joint modelling of object class labels (e.g. “person”, “dog”, “chair”, etc.) and their visual attributes (e.g. “has head”, “furry”, “metal”, etc.). We treat attributes of an object as latent variables in our model and capture the correlations among attributes using an undirected graphical model built from training data. The advantage of our model is that it allows us to infer object class labels using the information of both the test image itself and its (latent) attributes. Our model unifies object class prediction and attribute prediction in a principled framework. It is also flexible enough to deal with different performance measurements. Our experimental results provide quantitative evidence that attributes can improve object naming.
Structured prediction via the extragradient method
 In Advances in
, 2006
"... We present a simple and scalable algorithm for largemargin estimation of structured models, including an important class of Markov networks and combinatorial models. We formulate the estimation problem as a convexconcave saddlepoint problem and apply the extragradient method, yielding an algorith ..."
Abstract

Cited by 27 (2 self)
 Add to MetaCart
We present a simple and scalable algorithm for large-margin estimation of structured models, including an important class of Markov networks and combinatorial models. We formulate the estimation problem as a convex-concave saddle-point problem and apply the extragradient method, yielding an algorithm with linear convergence using simple gradient and projection calculations. The projection step can be solved using combinatorial algorithms for min-cost quadratic flow. This makes the approach an efficient alternative to formulations based on reductions to a quadratic program (QP). We present experiments on two very different structured prediction tasks: 3D image segmentation and word alignment, illustrating the favorable scaling properties of our algorithm.
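The extragradient step the abstract relies on is a look-ahead gradient step followed by a correction using the gradient at the look-ahead point. The sketch below shows it on an assumed toy bilinear game rather than the paper's structured estimation problem (and omits the projection, since the toy problem is unconstrained); the function name is hypothetical.

```python
def extragradient(gw, gz, w, z, eta=0.3, iters=500):
    """Extragradient for a convex-concave saddle point min_w max_z L(w, z):
    first a predictor (look-ahead) step, then a corrector step from the
    original point using the gradients evaluated at the look-ahead iterate."""
    for _ in range(iters):
        # predictor step
        w_half = w - eta * gw(w, z)
        z_half = z + eta * gz(w, z)
        # corrector step with look-ahead gradients
        w = w - eta * gw(w_half, z_half)
        z = z + eta * gz(w_half, z_half)
    return w, z

# Toy bilinear game (an assumption): L(w, z) = w * z, saddle point at (0, 0).
# Plain simultaneous gradient descent/ascent spirals away on this problem;
# the extragradient correction makes the iterates converge.
w, z = extragradient(lambda w, z: z, lambda w, z: w, 1.0, 1.0)
print(round(abs(w) + abs(z), 6))  # near zero
```

In the paper's constrained setting each of the two steps would additionally project onto the feasible set, which is where the min-cost quadratic flow subroutine enters.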
Max-margin weight learning for Markov logic networks
 In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD-09), Bled
, 2009
"... Abstract. Markov logic networks (MLNs) are an expressive representation for statistical relational learning that generalizes both firstorder logic and graphical models. Existing discriminative weight learning methods for MLNs all try to learn weights that optimize the Conditional Log Likelihood (CL ..."
Abstract

Cited by 18 (5 self)
 Add to MetaCart
Markov logic networks (MLNs) are an expressive representation for statistical relational learning that generalizes both first-order logic and graphical models. Existing discriminative weight learning methods for MLNs all try to learn weights that optimize the Conditional Log Likelihood (CLL) of the training examples. In this work, we present a new discriminative weight learning method for MLNs based on a max-margin framework. This results in a new model, Max-Margin Markov Logic Networks (M3LNs), that combines the expressiveness of MLNs with the predictive accuracy of structural Support Vector Machines (SVMs). To train the proposed model, we design a new approximation algorithm for loss-augmented inference in MLNs based on Linear Programming (LP). Experimental results show that the proposed approach generally achieves higher F1 scores than the current best discriminative weight learner for MLNs.
The Interplay of Optimization and Machine Learning Research
 Journal of Machine Learning Research
, 2006
"... The fields of machine learning and mathematical programming are increasingly intertwined. Optimization problems lie at the heart of most machine learning approaches. The Special Topic on Machine Learning and Large Scale Optimization examines this interplay. Machine learning researchers have embra ..."
Abstract

Cited by 15 (1 self)
 Add to MetaCart
The fields of machine learning and mathematical programming are increasingly intertwined. Optimization problems lie at the heart of most machine learning approaches. The Special Topic on Machine Learning and Large Scale Optimization examines this interplay. Machine learning researchers have embraced the advances in mathematical programming allowing new types of models to be pursued. The special topic includes models using quadratic, linear, second-order cone, semidefinite, and semi-infinite programs. We observe that the qualities of good optimization algorithms from the machine learning and optimization perspectives can be quite different. Mathematical programming puts a premium on accuracy, speed, and robustness. Since generalization is the bottom line in machine learning and training is normally done offline, accuracy and small speed improvements are of little concern in machine learning. Machine learning prefers simpler algorithms that work in reasonable computational time for specific classes of problems. Reducing machine learning problems to well-explored mathematical programming classes with robust general purpose optimization codes allows machine learning researchers to rapidly develop new techniques.
An LP View of the M-best MAP problem
"... We consider the problem of finding the M assignments with maximum probability in a probabilistic graphical model. We show how this problem can be formulated as a linear program (LP) on a particular polytope. We prove that, for tree graphs (and junction trees in general), this polytope has a particul ..."
Abstract

Cited by 15 (1 self)
 Add to MetaCart
We consider the problem of finding the M assignments with maximum probability in a probabilistic graphical model. We show how this problem can be formulated as a linear program (LP) on a particular polytope. We prove that, for tree graphs (and junction trees in general), this polytope has a particularly simple form and differs from the marginal polytope in a single inequality constraint. We use this characterization to provide an approximation scheme for non-tree graphs, by using the set of spanning trees over such graphs. The method we present puts the M-best inference problem in the context of LP relaxations, which have recently received considerable attention and have proven useful in solving difficult inference problems. We show empirically that our method often finds the provably exact M best configurations for problems of high treewidth. A common task in probabilistic modeling is finding the assignment with maximum probability given a model. This is often referred to as the MAP (maximum a-posteriori) problem.
Learning Efficiently with Approximate Inference via Dual Losses
"... Many structured prediction tasks involve complex models where inference is computationally intractable, but where it can be well approximated using a linear programming relaxation. Previous approaches for learning for structured prediction (e.g., cuttingplane, subgradient methods, perceptron) repeat ..."
Abstract

Cited by 13 (5 self)
 Add to MetaCart
Many structured prediction tasks involve complex models where inference is computationally intractable, but where it can be well approximated using a linear programming relaxation. Previous approaches for learning for structured prediction (e.g., cutting-plane, subgradient methods, perceptron) repeatedly make predictions for some of the data points. These approaches are computationally demanding because each prediction involves solving a linear program to optimality. We present a scalable algorithm for learning for structured prediction. The main idea is to instead solve the dual of the structured prediction loss. We formulate the learning task as a convex minimization over both the weights and the dual variables corresponding to each data point. As a result, we can begin to optimize the weights even before completely solving any of the individual prediction problems. We show how the dual variables can be efficiently optimized using coordinate descent. Our algorithm is competitive with state-of-the-art methods such as stochastic subgradient and cutting-plane.