Results 1–10 of 16
Relax and randomize: From value to algorithms
2012
Abstract

Cited by 13 (4 self)
We show a principled way of deriving online learning algorithms from a minimax analysis. Various upper bounds on the minimax value, previously thought to be nonconstructive, are shown to yield algorithms. This allows us to seamlessly recover known methods and to derive new ones, also capturing such “unorthodox” methods as Follow the Perturbed Leader and the R2 forecaster. Understanding the inherent complexity of the learning problem thus leads to the development of algorithms. To illustrate our approach, we present several new algorithms, including a family of randomized methods that use the idea of a “random playout”. New versions of the Follow-the-Perturbed-Leader algorithm are presented, as well as methods based on Littlestone’s dimension, efficient methods for matrix completion with the trace norm, and algorithms for the problems of transductive learning and prediction with static experts.
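The Follow-the-Perturbed-Leader scheme named in this abstract can be sketched in a few lines. This is a generic illustration of standard FTPL for prediction with expert advice, not the paper's minimax derivation; the exponential noise distribution and the learning rate `eta` are illustrative choices.

```python
import numpy as np

def ftpl_choice(cum_losses, eta, rng):
    """Follow-the-Perturbed-Leader: perturb the cumulative losses
    with i.i.d. exponential noise, then follow the leader."""
    noise = rng.exponential(scale=1.0 / eta, size=cum_losses.shape)
    return int(np.argmin(cum_losses - noise))

# Toy run: expert 0 always incurs loss 0, the other two always loss 1.
rng = np.random.default_rng(0)
cum = np.zeros(3)
picks = []
for t in range(100):
    i = ftpl_choice(cum, eta=1.0, rng=rng)
    picks.append(i)
    cum += np.array([0.0, 1.0, 1.0])  # losses observed this round
```

Once the cumulative gap between the best expert and the rest grows, the perturbation almost never flips the leader, so the choice locks onto expert 0.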
More data speeds up training time in learning halfspaces over sparse vectors
In NIPS, 2013
Abstract

Cited by 8 (3 self)
The increased availability of data in recent years has led several authors to ask whether it is possible to use data as a computational resource. That is, if more data is available, beyond the sample complexity limit, is it possible to use the extra examples to speed up the computation time required to perform the learning task? We give the first positive answer to this question for a natural supervised learning problem — we consider agnostic PAC learning of halfspaces over 3-sparse vectors in {−1, 1, 0}^n. This class is inefficiently learnable using O(n/ε²) examples. Our main contribution is a novel, non-cryptographic, methodology for establishing computational-statistical gaps, which allows us to show that, under a widely believed assumption that refuting random 3-CNF formulas is hard, it is impossible to efficiently learn this class using only O(n/ε²) examples. We further show that under stronger hardness assumptions, even O(n^1.499/ε²) examples do not suffice. On the other hand, we show a new algorithm that learns this class efficiently using Ω̃(n²/ε²) examples. This formally establishes the tradeoff between sample and computational complexity for a natural supervised learning problem.
Using More Data to Speed-up Training Time
Abstract

Cited by 8 (0 self)
In many recent applications, data is plentiful. By now, we have a rather clear understanding of how more data can be used to improve the accuracy of learning algorithms. Recently, there has been a growing interest in understanding how more data can be leveraged to reduce the required training runtime. In this paper, we study the runtime of learning as a function of the number of available training examples, and underscore the main high-level techniques. We provide the first formal positive result showing that even in the unrealizable case, the runtime can decrease exponentially while only requiring polynomial growth in the number of examples. Our construction corresponds to a synthetic learning problem, and an interesting open question is whether the tradeoff can be shown for more natural learning problems. We spell out several interesting candidates of natural learning problems for which we conjecture that there is a tradeoff between computational and sample complexity.
Adaptivity and Optimism: An Improved Exponentiated Gradient Algorithm
Abstract

Cited by 1 (0 self)
We present an adaptive variant of the exponentiated gradient algorithm. Leveraging the optimistic learning framework of Rakhlin & Sridharan (2012), we obtain regret bounds that in the learning from experts setting depend on the variance and path length of the best expert, improving on results by Hazan & Kale (2008) and Chiang et al. (2012), and resolving an open problem posed by Kale (2012). Our techniques naturally extend to matrix-valued loss functions, where we present an adaptive matrix exponentiated gradient algorithm. To obtain the optimal regret bound in the matrix case, we generalize the Follow-the-Regularized-Leader algorithm to vector-valued payoffs, which may be of independent interest.
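The core step behind such methods is an exponentiated-gradient update with an optimistic hint. The sketch below is a generic version for the experts setting, using the previous loss vector as the hint; `eta` and the toy losses are illustrative, and this is not the paper's tuned adaptive variant.

```python
import numpy as np

def optimistic_eg_weights(cum_loss, hint, eta):
    """Exponentiated gradient with an optimistic hint:
    weights proportional to exp(-eta * (cumulative loss + hint))."""
    logits = -eta * (cum_loss + hint)
    w = np.exp(logits - logits.max())  # numerically stabilized softmax
    return w / w.sum()

# Toy run: expert 0 is consistently best.
losses = np.array([0.0, 1.0, 1.0])
cum = np.zeros(3)
hint = np.zeros(3)
for t in range(10):
    w = optimistic_eg_weights(cum, hint, eta=0.5)
    cum += losses
    hint = losses  # optimism: predict the next loss equals the last one
w = optimistic_eg_weights(cum, hint, eta=0.5)
```

When the hint is accurate, the optimistic term effectively lets the learner react one round early, which is the source of the improved regret bounds.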
Matrix reconstruction with the local max norm
Abstract

Cited by 1 (0 self)
We introduce a new family of matrix norms, the “local max” norms, generalizing existing methods such as the max norm, the trace norm (nuclear norm), and the weighted or smoothed weighted trace norms, which have been extensively used in the literature as regularizers for matrix reconstruction problems. We show that this new family can be used to interpolate between the (weighted or unweighted) trace norm and the more conservative max norm. We test this interpolation on simulated data and on the large-scale Netflix and MovieLens ratings data, and find improved accuracy relative to the existing matrix norms. We also provide theoretical results showing learning guarantees for some of the new norms.
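For context on the trace norm as a reconstruction regularizer: its proximal operator is singular value soft-thresholding, the workhorse step of many trace-norm completion solvers. The sketch below shows that standard step only, not the local-max-norm method of this paper; the threshold `tau` is an illustrative parameter.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: the proximal operator of
    tau * trace norm. Shrinks every singular value by tau."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

# Soft-thresholding lowers both the trace norm and (often) the rank.
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 4))
X = svt(M, tau=0.5)
```

Iterating this step while re-imposing the observed entries gives the classic singular-value-thresholding approach to matrix completion.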
Novel Factorization Strategies for Higher Order Tensors: Implications for Compression and Recovery of Multilinear Data
Abstract
In this paper we propose novel methods for compression and recovery of multilinear data under limited sampling. We exploit the recently proposed tensor Singular Value Decomposition (t-SVD) [1], which is a group theoretic framework for tensor decomposition. In contrast to popular existing tensor decomposition techniques such as higher-order SVD (HOSVD), t-SVD has optimality properties similar to the truncated SVD for matrices. Based on t-SVD, we first construct novel tensor-rank-like measures to characterize informational and structural complexity of multilinear data. Following that we outline a complexity penalized algorithm for tensor completion from missing entries. As an application, 3D and 4D (color) video data compression and recovery are considered. We show that videos with linear camera motion can be represented more efficiently using t-SVD compared to traditional approaches based on vectorizing or flattening of the tensors. Application of the proposed tensor completion algorithm for video recovery from missing entries is shown to yield a superior performance over existing methods. In conclusion we point out several research directions and implications to online prediction of multilinear data.
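The t-SVD construction underlying this work takes a DFT along the third (tube) axis and then an ordinary matrix SVD of each frontal slice in the Fourier domain. The sketch below is a minimal illustration of that factorization and its exact inversion; the tensor shape is arbitrary, and this is not the paper's compression or completion pipeline.

```python
import numpy as np

def tsvd(T):
    """t-SVD sketch: DFT along the tube axis, then a matrix SVD
    of each frontal slice in the Fourier domain."""
    Tf = np.fft.fft(T, axis=2)
    return [np.linalg.svd(Tf[:, :, k], full_matrices=False)
            for k in range(T.shape[2])]

def tsvd_reconstruct(factors):
    """Invert the factorization: rebuild each Fourier-domain slice,
    then apply the inverse DFT along the tube axis."""
    Tf = np.stack([U @ np.diag(s) @ Vh for U, s, Vh in factors], axis=2)
    return np.fft.ifft(Tf, axis=2).real

rng = np.random.default_rng(0)
T = rng.standard_normal((3, 4, 5))
T_hat = tsvd_reconstruct(tsvd(T))
```

Truncating each slice's singular values before reconstructing yields the t-SVD analogue of the truncated matrix SVD, which is the optimality property the abstract refers to.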
Novel methods for multilinear data completion and denoising based on tensor-SVD