Results 1–10 of 17
Coresets, sparse greedy approximation and the Frank-Wolfe algorithm
Proceedings of the 19th Annual ACM-SIAM Symposium on Discrete Algorithms
Abstract

Cited by 79 (1 self)
The problem of maximizing a concave function f(x) in a simplex S can be solved approximately by a simple greedy algorithm. For given k, the algorithm can find a point x(k) on a k-dimensional face of S, such that f(x(k)) ≥ f(x∗) − O(1/k). Here f(x∗) is the maximum value of f in S. This algorithm and analysis were known before, and related to problems of statistics and machine learning, such as boosting, regression, and density mixture estimation. In other work, coming from computational geometry, the existence of ɛ-coresets was shown for the minimum enclosing ball problem, by means of a simple greedy algorithm. Similar greedy algorithms, that are special cases of the Frank-Wolfe algorithm, were described for other enclosure problems. Here these results are tied together, stronger convergence results are reviewed, and several coreset bounds are generalized or strengthened.
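The greedy scheme this abstract describes fits in a few lines. The sketch below is an illustrative Frank-Wolfe iteration for maximizing a concave function over the probability simplex; the toy objective, dimension, and step rule 2/(k+2) are standard textbook choices made here for the example, not taken from the paper. Each step moves toward the vertex with the largest gradient coordinate, so after k steps the iterate has at most k+1 nonzero coordinates, i.e. lies on a k-dimensional face of S.

```python
def frank_wolfe_simplex(grad, d, iters):
    """Frank-Wolfe for maximizing a concave f over the unit simplex.

    Linear maximization over the simplex is attained at a vertex, so each
    step just picks the coordinate with the largest partial derivative.
    """
    x = [0.0] * d
    x[0] = 1.0                                 # start at a vertex
    for k in range(iters):
        g = grad(x)
        i = max(range(d), key=lambda j: g[j])  # best vertex e_i of the simplex
        gamma = 2.0 / (k + 2)                  # classic step size, O(1/k) gap
        x = [(1.0 - gamma) * xj for xj in x]   # x <- (1 - gamma) x + gamma e_i
        x[i] += gamma
    return x

# Toy concave objective (chosen for illustration): f(x) = -sum_i (x_i - 1/d)^2,
# maximized at the uniform point of the simplex with f* = 0.
d = 5
x = frank_wolfe_simplex(lambda x: [-2.0 * (xi - 1.0 / d) for xi in x], d, 2000)
```

After 2000 iterations the iterate is feasible and its objective gap to f* = 0 is well within the O(1/k) bound.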
Coresets for Polytope Distance
, 2009
Abstract

Cited by 14 (4 self)
Following recent work of Clarkson, we translate the coreset framework to the problems of finding the point closest to the origin inside a polytope, finding the shortest distance between two polytopes, Perceptrons, and soft- as well as hard-margin Support Vector Machines (SVM). We prove asymptotically matching upper and lower bounds on the size of coresets, stating that ɛ-coresets of size ⌈(1 + o(1))E∗/ɛ⌉ do always exist as ɛ → 0, and that this is best possible. The crucial quantity E∗ is what we call the excentricity of a polytope, or a pair of polytopes. Additionally, we prove linear convergence speed of Gilbert’s algorithm, one of the earliest known approximation algorithms for polytope distance, and generalize both the algorithm and the proof to the two-polytope case. Interestingly, our coreset bounds also imply that we can for the first time prove matching upper and lower bounds for the sparsity of Perceptron and SVM solutions.
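Gilbert's algorithm, whose linear convergence this abstract proves, is easy to state. The sketch below is a generic textbook version for a single polytope given by its vertex set (not the paper's generalized two-polytope variant): maintain a point w in the hull, find the vertex minimizing the linear function ⟨w, ·⟩, and line-search toward it for the minimum-norm point.

```python
def gilbert_min_norm(points, iters=200):
    """Approximate the point of conv(points) closest to the origin."""
    w = list(points[0])
    for _ in range(iters):
        # vertex of the polytope minimizing the linear function <w, .>
        v = min(points, key=lambda p: sum(wi * pi for wi, pi in zip(w, p)))
        d = [wi - vi for wi, vi in zip(w, v)]
        denom = sum(di * di for di in d)
        if denom == 0.0:
            break
        # exact line search for the min-norm point on the segment [w, v]
        t = max(0.0, min(1.0, sum(wi * di for wi, di in zip(w, d)) / denom))
        w = [wi - t * di for wi, di in zip(w, d)]
    return w
```

For example, for the triangle with vertices (2, 1), (2, −1), (4, 0), the iteration converges to (2, 0), the hull point nearest the origin, at distance 2.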
Structured prediction by joint kernel support estimation
 MACH LEARN
, 2009
Abstract

Cited by 13 (3 self)
Discriminative techniques, such as conditional random fields (CRFs) or structure-aware maximum-margin techniques (maximum margin Markov networks (M3N), structured output support vector machines (SSVM)), are state-of-the-art in the prediction of structured data. However, to achieve good results these techniques require complete and reliable ground truth, which is not always available in realistic problems. Furthermore, training either CRFs or margin-based techniques is computationally costly, because the runtime of current training methods depends not only on the size of the training set but also on properties of the output space to which the training samples are assigned. We propose an alternative model for structured output prediction, Joint Kernel Support Estimation (JKSE), which is rather generative in nature as it relies on estimating the joint probability density of samples and labels in the training set. This makes it tolerant against incomplete or incorrect labels and also opens the possibility of learning in situations where more than one output label can be considered correct. At the same time, we avoid typical problems of generative models as we do not attempt to learn the full joint probability distribution, but we model only its support in a joint reproducing kernel Hilbert space.
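The "generative" flavor of predicting through a joint density can be illustrated crudely: score each candidate label by a kernel density estimate of the joint density at (x, y) and return the argmax. This is only a scalar toy, not JKSE itself, which models the support of the density in a joint reproducing kernel Hilbert space rather than the density; all names and the bandwidth below are our own.

```python
import math

def predict_by_joint_density(x, train, labels, h=0.5):
    """Score each candidate label y by a Gaussian kernel density estimate
    of the joint density at (x, y) and return the highest-scoring label --
    a much-simplified generative alternative to discriminative prediction."""
    def score(y):
        return sum(math.exp(-(x - xi) ** 2 / (2 * h * h))
                   for xi, yi in train if yi == y)
    return max(labels, key=score)

# Tiny 1-D example: class 0 lives near -1, class 1 near +1.
train = [(-1.0, 0), (-1.2, 0), (1.0, 1), (1.1, 1)]
```

A query near either cluster is assigned that cluster's label; because each label is scored independently, incorrect or missing labels only dilute one class's density rather than corrupting a global decision boundary.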
Activized Learning: Transforming Passive to Active with Improved Label Complexity
Abstract

Cited by 11 (4 self)
Active learning methods often achieve improved performance using fewer labels compared to passive learning methods. A variety of practically successful active learning algorithms use a passive learning algorithm as a subroutine, and the essential role of the active component is to construct data sets to feed into the passive subroutine. This general idea is appealing for a variety of reasons, as it may be able ...
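The "construct data sets for a passive subroutine" idea can be seen in a toy one-dimensional threshold class: the active wrapper binary-searches for the decision boundary and hands only the labels it collected to the passive learner, using exponentially fewer labels than labeling everything. All function names here are made up for the illustration, and the sketch assumes the two endpoints of the sorted data receive different labels.

```python
def passive_threshold_fit(labeled):
    """Passive subroutine: fit a threshold classifier to labeled 1-D data."""
    neg = [x for x, y in labeled if y == 0]
    pos = [x for x, y in labeled if y == 1]
    return (max(neg) + min(pos)) / 2.0

def active_wrapper(points, oracle):
    """Actively choose which points to label (binary search on the sorted
    data), then feed the collected labels to the passive subroutine.
    Assumes the first and last sorted points get different labels."""
    pts = sorted(points)
    lo, hi = 0, len(pts) - 1
    labeled = [(pts[lo], oracle(pts[lo])), (pts[hi], oracle(pts[hi]))]
    while hi - lo > 1:
        mid = (lo + hi) // 2                   # most "uncertain" remaining point
        y = oracle(pts[mid])
        labeled.append((pts[mid], y))
        lo, hi = (mid, hi) if y == 0 else (lo, mid)
    return passive_threshold_fit(labeled)
```

On 100 grid points with a hidden threshold, the wrapper recovers the boundary with about log2(100) ≈ 7 queries instead of 100.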
Streamed Learning: One-Pass SVMs
Abstract

Cited by 7 (0 self)
We present a streaming model for large-scale classification (in the context of the ℓ2-SVM) by leveraging connections between learning and computational geometry. The streaming model imposes the constraint that only a single pass over the data is allowed. The ℓ2-SVM is known to have an equivalent formulation in terms of the minimum enclosing ball (MEB) problem, and an efficient algorithm based on the idea of core sets exists (CVM) [Tsang et al., 2005]. CVM learns a (1+ε)-approximate MEB for a set of points and yields an approximate solution to the corresponding SVM instance. However, CVM works in batch mode, requiring multiple passes over the data. This paper presents a single-pass SVM which is based on the minimum enclosing ball of streaming data. We show that the MEB updates for the streaming case can be easily adapted to learn the SVM weight vector in a way similar to using online stochastic gradient updates. Our algorithm performs polylogarithmic computation at each example, and requires very small and constant storage. Experimental results show that, even in such restrictive settings, we can learn efficiently in just one pass and get accuracies comparable to other state-of-the-art SVM solvers (batch and online). We also give an analysis of the algorithm, and discuss some open issues and possible extensions.
Active Learning as Non-Convex Optimization
Abstract

Cited by 4 (0 self)
We propose a new view of active learning algorithms as optimization. We show that many online active learning algorithms can be viewed as stochastic gradient descent on non-convex objective functions. Variations of some of these algorithms and objective functions have been previously proposed without noting this connection. We also point out a connection between the standard min-margin offline active learning algorithm and non-convex losses. Finally, we discuss and show empirically how viewing active learning as non-convex loss minimization helps explain two previously observed phenomena: certain active learning algorithms achieve better generalization error than passive learning algorithms on certain data sets (Schohn and Cohn, 2000; Bordes et al., 2005) and on other data sets many active learning algorithms are prone to local minima (Schütze et al., 2006).
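The online picture can be made concrete with a margin-based selective-sampling perceptron, a generic member of the family the abstract discusses (this particular variant and its threshold are our illustration, not an algorithm from the paper): a label is requested only when the current margin is small, and each queried mistake triggers a gradient-style update.

```python
def selective_perceptron(data, threshold=0.5):
    """Online active learner: request a label only when the current margin
    |<w, x>| is below a threshold; on a queried mistake, take a perceptron
    step -- a stochastic gradient step on a margin-based (non-convex once
    querying is folded in) loss."""
    w = [0.0] * len(data[0][0])
    queries = 0
    for x, y in data:                  # y in {-1, +1}
        m = sum(wi * xi for wi, xi in zip(w, x))
        if abs(m) < threshold:         # uncertain region: query the label
            queries += 1
            if m * y <= 0:             # queried mistake -> update
                w = [wi + y * xi for wi, xi in zip(w, x)]
    return w, queries
```

On an easily separable stream, a single query suffices: once the first update pushes every margin past the threshold, no further labels are requested.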
Unsupervised SVMs: On the Complexity of the Furthest Hyperplane Problem
Abstract

Cited by 3 (0 self)
This paper introduces the Furthest Hyperplane Problem (FHP), which is an unsupervised counterpart of Support Vector Machines. Given a set of n points in R^d, the objective is to produce the hyperplane (passing through the origin) which maximizes the separation margin, that is, the minimal distance between the hyperplane and any input point. To the best of our knowledge, this is the first paper achieving provable results regarding FHP. We provide both lower and upper bounds for this NP-hard problem. First, we give a simple randomized algorithm whose running time is n^O(1/θ²), where θ is the optimal separation margin. We show that its exponential dependency on 1/θ² is tight, up to subpolynomial factors, assuming SAT cannot be solved in subexponential time. Next, we give an efficient approximation algorithm. For any α ∈ [0, 1], the algorithm produces a hyperplane whose distance from at least a 1 − 3α fraction of the points is at least α times the optimal separation margin. Finally, we show that FHP does not admit a PTAS, by presenting a gap-preserving reduction from a particular version of the PCP theorem.
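The FHP objective itself is easy to state in code. Below is a naive random-direction search in the plane, purely to make the objective concrete; the paper's randomized algorithm is cleverer and comes with the n^O(1/θ²) guarantee, which this brute-force sampler does not have.

```python
import math
import random

def fhp_random_search(points, samples=500, seed=0):
    """Search random unit directions w and keep the one maximizing the
    FHP objective: the minimum distance min_i |<w, x_i>| of the points
    to the hyperplane through the origin with normal w."""
    rng = random.Random(seed)
    best_w, best_margin = None, -1.0
    for _ in range(samples):
        a = rng.uniform(0.0, math.pi)          # a direction covers both signs
        w = (math.cos(a), math.sin(a))
        margin = min(abs(w[0] * x + w[1] * y) for x, y in points)
        if margin > best_margin:
            best_w, best_margin = w, margin
    return best_w, best_margin
```

For points confined to the positive quadrant, any direction in the first quadrant keeps every point far from the hyperplane, so the search reliably finds a large margin; in high dimensions this kind of blind sampling degrades, which is exactly why the problem is interesting.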
A characterization theorem and an algorithm for a convex hull problem
, 2012
A Linearly Convergent Linear-Time First-Order Algorithm for Support Vector Classification with a Core Set Result
Abstract

Cited by 2 (0 self)
We present a simple, first-order approximation algorithm for the support vector classification problem. Given a pair of linearly separable data sets and ɛ ∈ (0, 1), the proposed algorithm computes a separating hyperplane whose margin is within a factor of (1 − ɛ) of that of the maximum-margin separating hyperplane. We discuss how our algorithm can be extended to nonlinearly separable and inseparable data sets. The running time of our algorithm is linear in the number of data points and in 1/ɛ. In particular, the number of support vectors computed by the algorithm is bounded above by O(ζ/ɛ) for all sufficiently small ɛ > 0, where ζ is the square of the ratio of the distances between the farthest and closest points in the two data sets. Furthermore, we establish that our algorithm exhibits linear convergence. We adopt the real number model of computation in our analysis.
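The polytope-distance view behind first-order SVM methods of this kind can be sketched directly: for separable classes, the maximum margin is half the distance between the two convex hulls, i.e. half the norm of the minimum-norm point of the difference set {p − q}. The Gilbert-style iteration below illustrates that reduction; it is not the specific algorithm analyzed in the paper.

```python
def svm_margin_via_hull_distance(pos, neg, iters=200):
    """Distance between conv(pos) and conv(neg) equals the norm of the
    min-norm point of the difference set {p - q}; the maximum separation
    margin is half of it. We run a Gilbert-style first-order iteration."""
    diffs = [tuple(pi - qi for pi, qi in zip(p, q)) for p in pos for q in neg]
    w = list(diffs[0])
    for _ in range(iters):
        # vertex of the difference polytope minimizing <w, .>
        v = min(diffs, key=lambda u: sum(wi * ui for wi, ui in zip(w, u)))
        d = [wi - vi for wi, vi in zip(w, v)]
        denom = sum(di * di for di in d)
        if denom == 0.0:
            break
        t = max(0.0, min(1.0, sum(wi * di for wi, di in zip(w, d)) / denom))
        w = [wi - t * di for wi, di in zip(w, d)]
    dist = sum(wi * wi for wi in w) ** 0.5
    return w, dist / 2.0     # w: (unnormalized) separating normal; dist/2: margin
```

Each iterate w is a sparse convex combination of a few difference vectors, which is exactly the core-set phenomenon: the support vectors found are the points appearing in that combination.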
New Approximation Algorithms for Minimum Enclosing Convex Shapes
Abstract

Cited by 2 (0 self)
Given n points in a d-dimensional Euclidean space, the Minimum Enclosing Ball (MEB) problem is to find the ball with the smallest radius which contains all n points. We give two approximation algorithms for producing an enclosing ball whose radius is at most ɛ away from the optimum. The first requires O(ndL/√ɛ) effort, where L is a constant that depends on the scaling of the data. The second is an O∗(ndQ/√ɛ) approximation algorithm, where Q is an upper bound on the norm of the points. This is in contrast with coreset-based algorithms, which yield an O(nd/ɛ) greedy algorithm. Finding the Minimum Enclosing Convex Polytope (MECP) is a related problem wherein a convex polytope of a fixed shape is given and the aim is to find the smallest magnification of the polytope which encloses the given points. For this problem we present O(mndL/ɛ) and O∗(mndQ/ɛ) approximation algorithms, where m is the number of faces of the polytope. Our algorithms borrow heavily from convex duality and recently developed techniques in non-smooth optimization, and are in contrast with existing methods which rely on geometric arguments. In particular, we specialize the excessive gap framework of Nesterov [19] to obtain our results. 1