Large margin methods for structured and interdependent output variables
Journal of Machine Learning Research, 2005
Cited by 372 (11 self)
Learning general functional dependencies between arbitrary input and output spaces is one of the key challenges in computational intelligence. While recent progress in machine learning has mainly focused on designing flexible and powerful input representations, this paper addresses the complementary issue of designing classification algorithms that can deal with more complex outputs, such as trees, sequences, or sets. More generally, we consider problems involving multiple dependent output variables, structured output spaces, and classification problems with class attributes. In order to accomplish this, we propose to appropriately generalize the well-known notion of a separation margin and derive a corresponding maximum-margin formulation. While this leads to a quadratic program with a potentially prohibitive, i.e., exponential, number of constraints, we present a cutting plane algorithm that solves the optimization problem in polynomial time for a large class of problems. The proposed method has important applications in areas such as computational biology, natural language processing, information retrieval/extraction, and optical character recognition. Experiments from various domains involving different types of output spaces emphasize the breadth and generality of our approach.
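The generalized margin can be illustrated on a toy problem. The sketch below is not the paper's algorithm: it assumes a hypothetical joint feature map `joint_features` over short binary label sequences, Hamming loss as the task loss, and margin rescaling, and it finds the most violated output by brute-force enumeration. That loss-augmented argmax is the subproblem a cutting plane method solves to pick its next constraint; the quadratic program itself is omitted.

```python
from itertools import product


def dot(w, v):
    return sum(a * b for a, b in zip(w, v))


def joint_features(x, y):
    # Hypothetical Psi(x, y) for a binary label sequence y in {0,1}^T:
    # one emission feature per position (+x[t] if y[t] == 1, else -x[t])
    # plus one transition feature counting adjacent equal labels.
    emission = [x[t] if y[t] == 1 else -x[t] for t in range(len(y))]
    transition = sum(1 for t in range(1, len(y)) if y[t] == y[t - 1])
    return emission + [transition]


def hamming(y_true, y):
    return sum(a != b for a, b in zip(y_true, y))


def most_violated(w, x, y_true):
    # Loss-augmented inference: argmax_y Delta(y_true, y) + <w, Psi(x, y)>.
    # Brute force is only feasible because {0,1}^T is tiny here.
    candidates = product((0, 1), repeat=len(y_true))
    return max(candidates,
               key=lambda y: hamming(y_true, y) + dot(w, joint_features(x, y)))


def structured_hinge(w, x, y_true):
    # Margin-rescaled slack for one example; it is >= 0 because y = y_true
    # is itself one of the candidates in the max.
    y_hat = most_violated(w, x, y_true)
    worst = hamming(y_true, y_hat) + dot(w, joint_features(x, y_hat))
    return worst - dot(w, joint_features(x, y_true)), y_hat
```

With all-zero weights every output scores the same, so the most violated output is simply the one with maximal Hamming loss; with weights that already separate the true labeling by a wide enough margin, the slack drops to zero.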
Online passive-aggressive algorithms
JMLR, 2006
Cited by 293 (22 self)
We present a unified view for online classification, regression, and uniclass problems. This view leads to a single algorithmic framework for the three problems. We prove worst-case loss bounds for various algorithms for both the realizable case and the non-realizable case. The end result is new algorithms and accompanying loss bounds for hinge-loss regression and uniclass. We also get refined loss bounds for previously studied classification algorithms.
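A minimal sketch of the passive-aggressive update for binary classification with labels in {-1, +1}, assuming the standard hinge loss max(0, 1 - y<w, x>). The three step sizes (PA, PA-I capped by an aggressiveness parameter C, PA-II with the soft term 1/(2C)) follow the usual presentation of this family; the regression and uniclass variants are not shown.

```python
def dot(w, v):
    return sum(a * b for a, b in zip(w, v))


def pa_update(w, x, y, variant="PA", C=1.0):
    """One passive-aggressive step on example (x, y), with y in {-1, +1}."""
    loss = max(0.0, 1.0 - y * dot(w, x))  # hinge loss before the update
    sq_norm = dot(x, x)
    if loss == 0.0 or sq_norm == 0.0:
        return list(w)  # passive: margin already >= 1, keep w unchanged
    if variant == "PA":
        tau = loss / sq_norm  # smallest step that makes the hinge loss zero
    elif variant == "PA-I":
        tau = min(C, loss / sq_norm)  # same step, capped by C
    elif variant == "PA-II":
        tau = loss / (sq_norm + 1.0 / (2.0 * C))  # soft, squared-slack version
    else:
        raise ValueError(f"unknown variant: {variant}")
    return [wi + tau * y * xi for wi, xi in zip(w, x)]
```

The "passive" branch leaves the weights untouched whenever the example already has margin at least 1; otherwise the "aggressive" step moves just far enough (or, for PA-I and PA-II, a bounded amount) toward satisfying it.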
A support vector method for multivariate performance measures
Proceedings of the 22nd International Conference on Machine Learning, 2005
Cited by 192 (5 self)
This paper presents a Support Vector Method for optimizing multivariate nonlinear performance measures like the F1-score. Taking a multivariate prediction approach, we give an algorithm with which such multivariate SVMs can be trained in polynomial time for large classes of potentially nonlinear performance measures, in particular ROC-Area and all measures that can be computed from the contingency table. The conventional classification SVM arises as a special case of our method.
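Both measures named above depend on the whole set of predictions rather than decomposing into per-example losses, which is why a multivariate formulation is useful. The sketch below only computes them from their textbook definitions (labels assumed in {-1, +1}); it does not reproduce the paper's training algorithm.

```python
def contingency(y_true, y_pred):
    """Counts (tp, fp, fn, tn) for labels in {-1, +1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == -1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == -1)
    tn = sum(1 for t, p in pairs if t == -1 and p == -1)
    return tp, fp, fn, tn


def f1_score(y_true, y_pred):
    tp, fp, fn, _ = contingency(y_true, y_pred)
    if tp == 0:
        return 0.0  # convention used here: F1 is 0 with no true positives
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_area(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == -1]
    correct = sum(1.0 if p > n else 0.5 if p == n else 0.0
                  for p in pos for n in neg)
    return correct / (len(pos) * len(neg))
```

F1 is a function of the contingency table alone, while ROC-Area counts correctly ordered positive/negative pairs, so neither can be written as a sum of independent per-example terms.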
Learning structured prediction models: a large margin approach
2004
Cited by 164 (7 self)
We consider large margin estimation in a broad range of prediction models where inference involves solving combinatorial optimization problems, for example, weighted graph-cuts or matchings. Our goal is to learn parameters such that inference using the model reproduces correct answers on the training data. Our method relies on the expressive power of convex optimization problems to compactly capture inference or solution optimality in structured prediction models. Directly embedding this structure within the learning formulation produces concise convex problems for efficient estimation of very complex and diverse models. We describe experimental results on a matching task, disulfide connectivity prediction, showing significant improvements over state-of-the-art methods.
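In this setting, inference means finding the highest-scoring matching, a combinatorial problem. The sketch below makes that step explicit for a small bipartite assignment (chosen for simplicity) by enumerating permutations; the paper instead captures inference compactly as a convex program, so brute force here is only for illustration. The optional Hamming-loss term shows the loss-augmented variant used to look for margin violations during max-margin training.

```python
from itertools import permutations


def best_matching(scores, y_true=None):
    """Highest-scoring assignment of row i to column perm[i].

    If y_true is given, add the Hamming loss with respect to y_true
    (loss-augmented inference)."""
    n = len(scores)

    def objective(perm):
        total = sum(scores[i][perm[i]] for i in range(n))
        if y_true is not None:
            total += sum(1 for i in range(n) if perm[i] != y_true[i])
        return total

    return max(permutations(range(n)), key=objective)
```

Enumeration costs n! evaluations, which is exactly why a compact formulation of the inference step matters once n grows beyond a handful of items.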
Learning Structural SVMs with Latent Variables
Cited by 114 (2 self)
It is well known in statistics and machine learning that the combination of latent (or hidden) variables and observed variables offers more expressive power than models with observed variables alone. Latent variables ...
Hierarchical document categorization with support vector machines
In: Proceedings of the 13th Conference on Information and Knowledge Management, 2004
Cited by 113 (4 self)
Automatically categorizing documents into predefined topic hierarchies or taxonomies is a crucial step in knowledge and content management. Standard machine learning techniques like Support Vector Machines and related large margin methods have been successfully applied to this task, despite the fact that they ignore the inter-class relationships. In this paper, we propose a novel hierarchical classification method that generalizes Support Vector Machine learning and that is based on discriminant functions that are structured in a way that mirrors the class hierarchy. Our method can work with arbitrary, not necessarily singly connected taxonomies and can deal with task-specific loss functions. All parameters are learned jointly by optimizing a common objective function corresponding to a regularized upper bound on the empirical loss. We present experimental results on the WIPO-alpha patent collection to show the competitiveness of our approach.
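One way a discriminant function can mirror a class hierarchy is to give every node of the taxonomy its own weight vector and score a leaf class by summing the vectors on its path to the root. The sketch below assumes a tree (a taxonomy that is not singly connected would need the full set of ancestors instead of a single path), and the node names and weights in the usage are made up for illustration; it shows scoring only, not joint training.

```python
def dot(w, v):
    return sum(a * b for a, b in zip(w, v))


def path_to_root(node, parent):
    """Nodes from `node` up to the root of a tree given as a child -> parent map."""
    path = []
    while node is not None:
        path.append(node)
        node = parent.get(node)
    return path


def score(x, label, parent, weights):
    # Discriminant for a class = sum of per-node weight vectors along its path,
    # so sibling classes share the weights of their common ancestors.
    return sum(dot(weights[z], x) for z in path_to_root(label, parent))


def predict(x, leaves, parent, weights):
    return max(leaves, key=lambda y: score(x, y, parent, weights))
```

For example, with a hypothetical taxonomy `{"sci": "root", "physics": "sci", "chem": "sci", ...}`, evidence for the shared "sci" node raises the scores of both "physics" and "chem" at once, which is how inter-class relationships enter the model.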
Discriminative models for multiclass object layout
Cited by 106 (5 self)
Many state-of-the-art approaches for object recognition reduce the problem to a 0-1 classification task. Such reductions allow one to leverage sophisticated classifiers for learning. These models are typically trained independently for each class using positive and negative examples cropped from images. At test-time, various post-processing heuristics such as non-maxima suppression (NMS) are required to reconcile multiple detections within and between different classes for each image. Though crucial to good performance on benchmarks, this post-processing is usually defined heuristically. We introduce a unified model for multiclass object recognition that casts the problem as a structured prediction task. Rather than predicting a binary label for each image ...
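For reference, the greedy non-maxima suppression heuristic mentioned above commonly works as in the sketch below: keep the highest-scoring detection and discard any remaining detection that overlaps an already kept one by more than a threshold. Boxes as (x1, y1, x2, y2) and the 0.5 intersection-over-union threshold are assumptions; this is the post-processing the paper seeks to replace, not its model.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(detections, threshold=0.5):
    """Greedy NMS over (score, box) pairs; returns the kept pairs by score."""
    kept = []
    for score, box in sorted(detections, key=lambda d: d[0], reverse=True):
        # Keep a detection only if it does not overlap any kept one too much.
        if all(iou(box, k) <= threshold for _, k in kept):
            kept.append((score, box))
    return kept
```

The threshold is a hand-set constant and each class is usually suppressed independently, which illustrates why such post-processing is described as heuristic.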
Learning to localize objects with structured output regression
In ECCV, 2008
Cited by 70 (11 self)
Sliding window classifiers are among the most successful and widely applied techniques for object localization. However, training is typically done in a way that is not specific to the localization task. First a binary classifier is trained using a sample of positive and negative examples, and this classifier is subsequently applied to multiple regions within test images. We propose instead to treat object localization in a principled way by posing it as a problem of predicting structured data: we model the problem not as binary classification, but as the prediction of the bounding box of objects located in images. The use of a joint-kernel framework allows us to formulate the training procedure as a generalization of an SVM, which can be solved efficiently. We further improve computational efficiency by using a branch-and-bound strategy for localization during both training and testing. Experimental evaluation on the PASCAL VOC and TU Darmstadt datasets shows that the structured training procedure improves performance over binary training as well as the best previously published scores.
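Training a structured SVM for localization repeatedly needs loss-augmented inference over bounding boxes. The sketch below assumes a linear model whose score for a box is the sum of per-cell weights inside it (computed in O(1) per box with an integral image) and uses the area-overlap loss 1 - |y ∩ y'| / |y ∪ y'| as the task loss; both are assumptions for illustration. It enumerates every box on a small grid, whereas the paper uses branch-and-bound to avoid exhaustive search.

```python
def integral_image(cells):
    """ii[r][c] = sum of cells[0..r-1][0..c-1]."""
    h, w = len(cells), len(cells[0])
    ii = [[0.0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        for c in range(w):
            ii[r + 1][c + 1] = cells[r][c] + ii[r][c + 1] + ii[r + 1][c] - ii[r][c]
    return ii


def box_sum(ii, box):
    top, left, bottom, right = box  # inclusive cell indices
    return (ii[bottom + 1][right + 1] - ii[top][right + 1]
            - ii[bottom + 1][left] + ii[top][left])


def area(box):
    return (box[2] - box[0] + 1) * (box[3] - box[1] + 1)


def overlap_loss(a, b):
    """1 - intersection / union for boxes (top, left, bottom, right)."""
    t, l = max(a[0], b[0]), max(a[1], b[1])
    bt, r = min(a[2], b[2]), min(a[3], b[3])
    inter = area((t, l, bt, r)) if t <= bt and l <= r else 0
    return 1.0 - inter / (area(a) + area(b) - inter)


def all_boxes(h, w):
    for top in range(h):
        for bottom in range(top, h):
            for left in range(w):
                for right in range(left, w):
                    yield (top, left, bottom, right)


def best_box(cells, y_true=None):
    """Plain inference, or loss-augmented inference if y_true is given."""
    ii = integral_image(cells)

    def objective(box):
        extra = overlap_loss(y_true, box) if y_true is not None else 0.0
        return box_sum(ii, box) + extra

    return max(all_boxes(len(cells), len(cells[0])), key=objective)
```

On a one-row grid with weights `[-0.25, 1, -0.5]`, plain inference picks the single positive cell, while loss-augmented inference against that same ground-truth box prefers a wider, partly wrong box, which is the kind of margin violation training then pushes down.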
Exponentiated gradient algorithms for conditional random fields and max-margin Markov networks
2008
Cited by 59 (1 self)
Log-linear and maximum-margin models are two commonly used methods in supervised machine learning, and are frequently used in structured prediction problems. Efficient learning of parameters in these models is therefore an important problem, and becomes a key factor when learning from very large data sets. This paper describes exponentiated gradient (EG) algorithms for training such models, where EG updates are applied to the convex dual of either the log-linear or max-margin objective function; the dual in both the log-linear and max-margin cases corresponds to minimizing a convex function with simplex constraints. We study both batch and online variants of the algorithm, and provide rates of convergence for both cases. In the max-margin case, O(1/ε) EG updates are required to reach a given accuracy ε in the dual; in contrast, for log-linear models only O(log(1/ε)) updates are required. For both the max-margin and log-linear cases, our bounds suggest that the online EG algorithm requires a factor of n less computation to reach a desired accuracy than the batch EG algorithm, where n is the number of training examples. Our experiments confirm that the online algorithms are much faster than the batch algorithms in practice. We describe how the EG updates factor in a convenient way for structured prediction problems, allowing the algorithms to be ...
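A minimal sketch of the exponentiated gradient update on the probability simplex, assuming a generic differentiable convex objective supplied as a gradient function. The paper applies such updates to the dual of the log-linear or max-margin objective and factors them over the parts of each structure; this sketch shows only the unfactored multiplicative update and the normalization that keeps iterates feasible. The step size `eta` and the step count are arbitrary choices.

```python
import math


def eg_minimize(grad, alpha, eta=0.5, steps=500):
    """Exponentiated gradient descent on the probability simplex."""
    for _ in range(steps):
        g = grad(alpha)
        # Multiplicative update keeps every coordinate strictly positive ...
        unnorm = [a * math.exp(-eta * gi) for a, gi in zip(alpha, g)]
        # ... and renormalizing keeps the iterate on the simplex.
        z = sum(unnorm)
        alpha = [u / z for u in unnorm]
    return alpha
```

Minimizing ||α - t||² for a target t inside the simplex, the iterates approach t; for a linear objective, the mass concentrates on the coordinate with the smallest cost, since the update never leaves the simplex and needs no explicit projection.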
Supervised clustering with support vector machines
In ICML, 2005
Cited by 57 (4 self)
Supervised clustering is the problem of training a clustering algorithm to produce desirable clusterings: given sets of items and complete clusterings over these sets, we learn how to cluster future sets of items. Example applications include noun-phrase coreference clustering, and clustering news articles by whether they refer to the same topic. In this paper we present an SVM algorithm that trains a clustering algorithm by adapting the item-pair similarity measure. The algorithm may optimize a variety of different clustering functions to a variety of clustering performance measures. We empirically evaluate the algorithm for noun-phrase and news article clustering.
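To illustrate what a parameterized item-pair similarity means, the sketch below scores each pair with a linear function w·φ(a, b) of hypothetical pairwise features `pair_features`, then clusters by merging every pair with positive similarity (connected components via union-find). That clustering step is a simple stand-in chosen for brevity, not one of the clustering functions the paper optimizes, and learning w is not shown.

```python
def dot(w, v):
    return sum(a * b for a, b in zip(w, v))


def pair_features(a, b):
    # Hypothetical phi(a, b): a bias term plus negated per-attribute distances,
    # so pairs that are close in every attribute get larger features.
    return [1.0] + [-abs(x - y) for x, y in zip(a, b)]


def similarity(w, a, b):
    return dot(w, pair_features(a, b))


def cluster(items, w):
    """Merge every pair with positive similarity; return clusters of indices."""
    parent = list(range(len(items)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if similarity(w, items[i], items[j]) > 0:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(items)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

The same items cluster differently under different weights: penalizing distance heavily (w = [1.0, 2.0]) splits the points into two nearby groups, while a weak penalty (w = [1.0, 0.1]) merges everything. Learning w from example clusterings is what lets the method adapt this behavior.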