Results 1–10 of 92
LIBLINEAR: A Library for Large Linear Classification
2008
Abstract

Cited by 570 (24 self)
LIBLINEAR is an open source library for large-scale linear classification. It supports logistic regression and linear support vector machines. We provide easy-to-use command-line tools and library calls for users and developers. Comprehensive documents are available for both beginners and advanced users. Experiments demonstrate that LIBLINEAR is very efficient on large sparse data sets.
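The dual coordinate descent method behind LIBLINEAR's L1-loss SVM solver can be sketched in a few lines. This is a minimal pure-Python illustration on a hypothetical toy dataset, not LIBLINEAR's optimized implementation (which adds shrinking heuristics and sparse-vector handling):

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def dual_cd_svm(X, y, C=1.0, epochs=50, seed=0):
    """Dual coordinate descent for an L1-loss linear SVM
    (the style of solver used inside LIBLINEAR)."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    alpha = [0.0] * n
    w = [0.0] * d
    qii = [dot(x, x) for x in X]          # diagonal of the Gram matrix
    for _ in range(epochs):
        order = list(range(n))
        rng.shuffle(order)
        for i in order:                   # one dual variable at a time
            if qii[i] == 0.0:
                continue
            g = y[i] * dot(w, X[i]) - 1.0            # gradient of the dual objective
            a_new = min(max(alpha[i] - g / qii[i], 0.0), C)
            delta = a_new - alpha[i]
            if delta != 0.0:
                alpha[i] = a_new
                for j in range(d):        # maintain w = sum_i alpha_i y_i x_i
                    w[j] += delta * y[i] * X[i][j]
    return w

# hypothetical linearly separable toy data
X = [[2.0, 1.0], [1.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]]
y = [1, 1, -1, -1]
w = dual_cd_svm(X, y)
```

Each coordinate update has a closed form because the dual objective is quadratic in a single α_i, which is what makes the method fast on large sparse data.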
Pegasos: Primal Estimated sub-GrAdient SOlver for SVM
Abstract

Cited by 284 (15 self)
We describe and analyze a simple and effective stochastic subgradient descent algorithm for solving the optimization problem cast by Support Vector Machines (SVM). We prove that the number of iterations required to obtain a solution of accuracy ɛ is Õ(1/ɛ), where each iteration operates on a single training example. In contrast, previous analyses of stochastic gradient descent methods for SVMs require Ω(1/ɛ²) iterations. As in previously devised SVM solvers, the number of iterations also scales linearly with 1/λ, where λ is the regularization parameter of SVM. For a linear kernel, the total runtime of our method is Õ(d/(λɛ)), where d is a bound on the number of nonzero features in each example. Since the runtime does not depend directly on the size of the training set, the resulting algorithm is especially suited for learning from large datasets. Our approach also extends to nonlinear kernels while working solely on the primal objective function, though in this case the runtime does depend linearly on the training set size. Our algorithm is particularly well suited for large text classification problems, where we demonstrate an order-of-magnitude speedup over previous SVM learning methods.
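The core Pegasos update is short enough to sketch directly: at step t, draw one example, use step size 1/(λt), shrink w toward zero, and step toward the example only when its hinge loss is active. A minimal pure-Python version on hypothetical toy data (omitting the optional projection step from the paper):

```python
import random

def pegasos(X, y, lam=0.1, T=2000, seed=0):
    """Pegasos sketch: stochastic subgradient descent on the
    regularized hinge loss, step size 1/(lam * t)."""
    rng = random.Random(seed)
    d = len(X[0])
    w = [0.0] * d
    for t in range(1, T + 1):
        i = rng.randrange(len(X))
        eta = 1.0 / (lam * t)
        margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
        # gradient of the regularizer: shrink w ...
        w = [(1.0 - eta * lam) * wj for wj in w]
        # ... and, if the hinge loss is active, step toward the example
        if margin < 1.0:
            w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

X = [[2.0, 1.0], [1.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]]
y = [1, 1, -1, -1]
w = pegasos(X, y)
```

Note the runtime per step depends only on the dimension (or sparsity) of one example, never on the training set size, which is the source of the Õ(d/(λɛ)) total runtime.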
Bundle Methods for Regularized Risk Minimization
Abstract

Cited by 35 (2 self)
A wide variety of machine learning problems can be described as minimizing a regularized risk functional, with different algorithms using different notions of risk and different regularizers. Examples include linear Support Vector Machines (SVMs), Gaussian Processes, Logistic Regression, Conditional Random Fields (CRFs), and Lasso amongst others. This paper describes the theory and implementation of a scalable and modular convex solver which solves all these estimation problems. It can be parallelized on a cluster of workstations, allows for data locality, and can deal with regularizers such as L1 and L2 penalties. In addition to the unified framework we present tight convergence bounds, which show that our algorithm converges in O(1/ɛ) steps to ɛ precision for general convex problems and in O(log(1/ɛ)) steps for continuously differentiable problems. We demonstrate the performance of our general purpose solver on a variety of publicly available datasets.
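The bundle/cutting-plane idea is: each iteration adds a linear lower bound (a "cut") on the risk R at the current iterate, then minimizes the regularized piecewise-linear model. A one-dimensional sketch, where the inner subproblem is solved by a crude grid search rather than the QP a real solver would use (the example objective R(w) = |w − 2| is hypothetical):

```python
def bundle_minimize(risk, subgrad, lam=1.0, iters=20, w0=5.0):
    """Cutting-plane / bundle sketch for J(w) = lam/2 * w^2 + R(w) in 1-D.
    Each cut (a, b) encodes the lower bound R(w) >= a*w + b obtained
    from a subgradient of R at the current iterate."""
    cuts = []
    w = w0
    grid = [i / 100.0 for i in range(-1000, 1001)]   # w in [-10, 10]
    for _ in range(iters):
        g = subgrad(w)
        cuts.append((g, risk(w) - g * w))            # add a cut at w
        def model(v):                                # regularized piecewise-linear model
            return lam / 2.0 * v * v + max(a * v + b for a, b in cuts)
        w = min(grid, key=model)                     # crude inner minimization
    return w

# Example: R(w) = |w - 2|; the minimizer of 0.5*w^2 + |w - 2| is w = 1.
risk = lambda w: abs(w - 2.0)
subgrad = lambda w: 1.0 if w > 2.0 else -1.0
w_star = bundle_minimize(risk, subgrad)
```

Because the model is a lower bound that tightens at every iterate, the gap between model value and true objective yields the O(1/ɛ) style convergence guarantees the abstract refers to.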
Max-Margin Additive Classifiers for Detection
In ICCV
Abstract

Cited by 30 (4 self)
We present methods for training high quality object detectors very quickly. The core contribution is a pair of fast training algorithms for piecewise linear classifiers, which can approximate arbitrary additive models. The classifiers are trained in a max-margin framework and significantly outperform linear classifiers on a variety of vision datasets. We report experimental results quantifying training time and accuracy on image classification tasks and pedestrian detection, including detection results better than the best previously published results on the INRIA dataset, with faster training.
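The trick that makes a linear SVM act as an additive, piecewise-linear classifier is an embedding of each scalar feature into sparse "hat" basis coordinates: any linear function of the encoding is a piecewise-linear function of the original feature. A minimal sketch of such an encoding (knot positions are hypothetical; the paper's exact encodings differ in detail):

```python
def pwl_encode(x, knots):
    """Encode a scalar feature with piecewise-linear 'hat' basis functions:
    x falling between two knots gets weight split between them, so a
    linear function of z is piecewise linear in x."""
    z = [0.0] * len(knots)
    if x <= knots[0]:
        z[0] = 1.0
        return z
    if x >= knots[-1]:
        z[-1] = 1.0
        return z
    for j in range(len(knots) - 1):
        lo, hi = knots[j], knots[j + 1]
        if lo <= x <= hi:
            t = (x - lo) / (hi - lo)
            z[j], z[j + 1] = 1.0 - t, t
            return z

knots = [0.0, 0.25, 0.5, 0.75, 1.0]
z = pwl_encode(0.3, knots)   # weight split between knots 0.25 and 0.5
```

After encoding every input dimension this way and concatenating, a fast linear solver (e.g. the dual coordinate descent or Pegasos methods above) trains the additive model directly.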
Discriminative learning over constrained latent representations
In Proc. of the Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL)
Abstract

Cited by 27 (10 self)
This paper proposes a general learning framework for a class of problems that require learning over latent intermediate representations. Many natural language processing (NLP) decision problems are defined over an expressive intermediate representation that is not explicit in the input, leaving the algorithm with both the task of recovering a good intermediate representation and learning to classify correctly. Most current systems separate the learning problem into two stages by solving the first step of recovering the intermediate representation heuristically and using it to learn the final classifier. This paper develops a novel joint learning algorithm for both tasks that uses the final prediction to guide the selection of the best intermediate representation. We evaluate our algorithm on three different NLP tasks – transliteration, paraphrase identification and textual entailment – and show that our joint method significantly improves performance.
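The flavor of joint learning over latent representations can be sketched with a latent-variable perceptron: each example carries several candidate intermediate representations, prediction scores the best candidate, and the final prediction drives which candidate receives the update. This toy (with hypothetical candidate vectors) illustrates the idea, not the paper's actual constrained-optimization algorithm:

```python
def latent_perceptron(examples, epochs=10):
    """Joint learning sketch: pick the intermediate representation the
    current model scores highest, classify with it, and update on that
    representation when the final prediction is wrong."""
    d = len(examples[0][1][0])
    w = [0.0] * d
    for _ in range(epochs):
        for y, candidates in examples:
            # latent step: best representation under the current model
            h = max(candidates, key=lambda c: sum(wi * ci for wi, ci in zip(w, c)))
            score = sum(wi * ci for wi, ci in zip(w, h))
            pred = 1 if score > 0 else -1
            if pred != y:                     # final prediction guides the update
                w = [wi + y * ci for wi, ci in zip(w, h)]
    return w

# hypothetical data: each example is (label, candidate representations)
examples = [
    (1, [[1.0, 0.0], [0.0, 1.0]]),
    (1, [[1.0, 0.0], [0.0, 2.0]]),
    (-1, [[0.0, 1.0]]),
    (-1, [[0.0, 2.0], [0.0, 1.0]]),
]
w = latent_perceptron(examples)
```

The contrast with a two-stage pipeline is that the representation choice (the `max` over candidates) is re-made under the evolving model rather than fixed heuristically up front.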
Batch tuning strategies for statistical machine translation
In HLT-NAACL, 2012
Abstract

Cited by 21 (3 self)
There has been a proliferation of recent work on SMT tuning algorithms capable of handling larger feature sets than the traditional MERT approach. We analyze a number of these algorithms in terms of their sentence-level loss functions, which motivates several new approaches, including a Structured SVM. We perform empirical comparisons of eight different tuning strategies, including MERT, in a variety of settings. Among other results, we find that a simple and efficient batch version of MIRA performs at least as well as training online, and consistently outperforms other options.
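The MIRA update at the heart of such tuners is a closed-form step: make the smallest change to the weights that lets the gold candidate beat the model's prediction by a margin equal to its loss. A minimal sketch with hypothetical feature vectors (batch MIRA applies such steps over candidate pools gathered from whole decoding batches):

```python
def mira_update(w, f_gold, f_pred, loss, C=1.0):
    """One MIRA step: shift w along (f_gold - f_pred) just enough to
    close the margin violation, capped at aggressiveness C."""
    diff = [a - b for a, b in zip(f_gold, f_pred)]
    violation = loss - sum(wi * di for wi, di in zip(w, diff))
    norm2 = sum(d * d for d in diff)
    if violation <= 0 or norm2 == 0:
        return w                              # margin already satisfied
    tau = min(C, violation / norm2)           # closed-form dual step size
    return [wi + tau * di for wi, di in zip(w, diff)]

w = [0.0, 0.0]
# hypothetical gold / model-best feature vectors and a BLEU-style loss
w = mira_update(w, f_gold=[1.0, 0.0], f_pred=[0.0, 1.0], loss=1.0)
```

After the step, w scores the gold candidate above the prediction by the required margin (here the violation of 1.0 is split over ‖diff‖² = 2, giving τ = 0.5).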
Large Scale Max-Margin Multi-Label Classification with Priors
Abstract

Cited by 21 (2 self)
We propose a max-margin formulation for the multi-label classification problem where the goal is to tag a data point with a set of pre-specified labels. Given a set of L labels, a data point can be tagged with any of the 2^L possible subsets. The main challenge therefore lies in optimising over this exponentially large label space subject to label correlations. Existing solutions take either of two approaches. The first assumes, a priori, that there are no label correlations and independently trains a classifier for each label (as is done in the 1-vs-All heuristic). This reduces the problem complexity from exponential to linear and such methods can scale to large problems. The second approach explicitly models correlations by pairwise label interactions. However, the complexity remains exponential unless one assumes that label correlations are sparse. Furthermore, the learnt correlations reflect the training set biases. We take a middle approach that assumes labels are correlated but does not incorporate pairwise label terms in the prediction function. We show that the complexity can still be reduced from exponential to linear while modelling dense pairwise label correlations. By incorporating correlation priors we can overcome training set biases and improve prediction accuracy. We provide a principled interpretation of the 1-vs-All method and show …
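The 1-vs-All baseline the abstract builds on is simply one independent binary classifier per label, with a point tagged by every label whose classifier fires. A minimal sketch using a perceptron as the per-label learner (the paper uses max-margin training; data and labels here are hypothetical):

```python
def perceptron(X, y, epochs=20):
    """Simple stand-in binary learner for the 1-vs-All reduction."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * sum(wj * xj for wj, xj in zip(w, xi)) <= 0:
                w = [wj + yi * xj for wj, xj in zip(w, xi)]
    return w

def train_one_vs_all(X, Y, labels, train_binary):
    """1-vs-All multi-label training: one independent binary problem per
    label, ignoring label correlations. Y[i] is the label set of X[i]."""
    models = {}
    for l in labels:
        y_bin = [1 if l in Y[i] else -1 for i in range(len(X))]
        models[l] = train_binary(X, y_bin)
    return models

def predict(models, x):
    """Tag x with every label whose classifier scores it positive."""
    return {l for l, w in models.items()
            if sum(wi * xi for wi, xi in zip(w, x)) > 0}

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Y = [{"a"}, {"b"}, {"a", "b"}]
models = train_one_vs_all(X, Y, ["a", "b"], perceptron)
```

This is the linear-complexity reduction the abstract describes; the paper's contribution is retaining that scalability while still modelling dense label correlations through priors.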
Large Linear Classification When Data Cannot Fit In Memory
Abstract

Cited by 19 (3 self)
Recent advances in linear classification have shown that for applications such as document classification, the training can be extremely efficient. However, most of the existing training methods are designed by assuming that data can be stored in the computer memory. These methods cannot be easily applied to data larger than the memory capacity due to the cost of random disk access. We propose and analyze a block minimization framework for data larger than the memory size. At each step a block of data is loaded from the disk and handled by certain learning methods. We investigate two implementations of the proposed framework for primal and dual SVMs, respectively. As data cannot fit in memory, many design considerations are very different from those for traditional algorithms. Experiments using data sets 20 times larger than the memory demonstrate the effectiveness of the proposed method.
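The block-minimization loop itself is simple: stream one block at a time (sequential disk reads, never random access), run a cheap in-memory learner on it, and carry the weight vector across blocks. A sketch with in-memory lists standing in for files on disk, and a perceptron standing in for the paper's primal/dual SVM sub-solvers:

```python
def block_minimization(blocks, d, passes=2):
    """Block minimization sketch for data that cannot fit in memory:
    process one block at a time while w persists across blocks.
    Here `blocks` is a list of lists standing in for files on disk."""
    w = [0.0] * d
    for _ in range(passes):
        for block in blocks:              # real setting: sequentially load a file
            for x, y in block:            # cheap in-memory sub-solver (perceptron here)
                if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0:
                    w = [wi + y * xi for wi, xi in zip(w, x)]
    return w

# hypothetical pre-split data blocks
blocks = [
    [([2.0, 1.0], 1), ([-2.0, -1.0], -1)],
    [([1.0, 2.0], 1), ([-1.0, -2.0], -1)],
]
w = block_minimization(blocks, d=2)
```

The design questions the paper studies live inside this loop: how to split and compress the blocks, and how much work the sub-solver should do per block before moving on.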
Tighter and Convex Maximum Margin Clustering
In AISTATS, 2009
Abstract

Cited by 19 (10 self)
The maximum margin principle has been successfully applied to many supervised and semi-supervised problems in machine learning. Recently, this principle was extended to clustering, referred to as Maximum Margin Clustering (MMC), and has achieved promising performance in recent studies. To avoid the problem of local minima, MMC can be solved globally via convex semidefinite programming (SDP) relaxation. Although many efficient approaches have been proposed to alleviate the computational burden of SDP, convex MMCs are still not scalable to medium-sized data sets. In this paper, we propose a novel convex optimization method, LG-MMC, which maximizes the margin of opposite clusters via “Label Generation”. It can be shown that LG-MMC is much more scalable than existing convex approaches. Moreover, we show that our convex relaxation is tighter than state-of-the-art convex MMCs. Experiments on seventeen UCI datasets and the MNIST dataset show significant improvement over existing MMC algorithms.
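What MMC optimizes can be made concrete in one dimension, where the max-margin separator for a given two-way labeling is just the midpoint of the gap between the groups, and MMC picks the labeling with the largest gap. A brute-force toy (with hypothetical points) that real solvers, including SDP relaxations and label-generation methods, exist precisely to avoid:

```python
from itertools import combinations

def mmc_1d(points, min_size=2):
    """Brute-force maximum margin clustering in 1-D: try every two-way
    split with at least `min_size` points per cluster and keep the one
    whose separating gap (hence margin) is largest. Exponential search,
    for illustration only."""
    best = None
    n = len(points)
    for k in range(min_size, n - min_size + 1):
        for pos in combinations(range(n), k):
            neg = [i for i in range(n) if i not in pos]
            lo = min(points[i] for i in pos)
            hi = max(points[j] for j in neg)
            if lo > hi:                       # separable split: margin = half the gap
                margin = (lo - hi) / 2.0
                if best is None or margin > best[0]:
                    best = (margin, set(pos))
    return best

points = [0.0, 0.5, 1.0, 8.0, 8.5, 9.0]
margin, cluster = mmc_1d(points)   # widest gap lies between 1.0 and 8.0
```

The minimum cluster size plays the role of the class-balance constraint used in practice to rule out the degenerate split that puts almost everything in one cluster.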