Results 11-20 of 327
A Game of Prediction with Expert Advice
 Journal of Computer and System Sciences
, 1997
Cited by 152 (10 self)
We consider the following problem. At each point of discrete time the learner must make a prediction; he is given the predictions made by a pool of experts. Each prediction and the outcome, which is disclosed after the learner has made his prediction, determine the incurred loss. It is known that, under weak regularity, the learner can ensure that his cumulative loss never exceeds cL + a ln n, where c and a are some constants, n is the size of the pool, and L is the cumulative loss incurred by the best expert in the pool. We find the set of those pairs (c, a) for which this is true.
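A bound of the form cL + a ln n can be illustrated with the classical deterministic Weighted Majority algorithm for binary prediction. This is a minimal sketch, not the algorithm analysed in the paper above: the function names, the 0/1 loss, and the penalty factor beta are assumptions chosen for illustration. For that algorithm the known mistake bound has c = ln(1/beta) / ln(2/(1+beta)) and a = 1 / ln(2/(1+beta)).

```python
import math
import random


def weighted_majority(expert_preds, outcomes, beta=0.5):
    """Deterministic Weighted Majority on 0/1 predictions (illustrative).

    expert_preds[t][i] is expert i's prediction at step t.
    Returns (learner_mistakes, list_of_per_expert_mistakes).
    """
    n = len(expert_preds[0])
    weights = [1.0] * n
    learner_mistakes = 0
    expert_mistakes = [0] * n
    for preds, y in zip(expert_preds, outcomes):
        # Predict with the weighted vote of the pool (ties go to 1).
        vote_one = sum(w for w, p in zip(weights, preds) if p == 1)
        vote_zero = sum(w for w, p in zip(weights, preds) if p == 0)
        guess = 1 if vote_one >= vote_zero else 0
        if guess != y:
            learner_mistakes += 1
        # Multiply the weight of every expert that erred by beta.
        for i, p in enumerate(preds):
            if p != y:
                weights[i] *= beta
                expert_mistakes[i] += 1
    return learner_mistakes, expert_mistakes


def mistake_bound(best_loss, n, beta=0.5):
    """Value of c*L + a*ln(n) for Weighted Majority with factor beta."""
    denom = math.log(2.0 / (1.0 + beta))
    c = math.log(1.0 / beta) / denom
    a = 1.0 / denom
    return c * best_loss + a * math.log(n)


if __name__ == "__main__":
    random.seed(1)
    outcomes = [random.randint(0, 1) for _ in range(100)]
    # Hypothetical pool: expert 0 is usually right, the rest guess at random.
    preds = [[y if (i == 0 and random.random() < 0.9) else random.randint(0, 1)
              for i in range(5)] for y in outcomes]
    m, per_expert = weighted_majority(preds, outcomes)
    print(m, min(per_expert), mistake_bound(min(per_expert), 5))
```

The learner's mistake count stays below the printed bound on any input sequence, which is the sense in which such guarantees are worst-case rather than probabilistic.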
A SNoW-based face detector
 Advances in Neural Information Processing Systems 12, pp. 855-861
, 2000
Cited by 146 (17 self)
A novel learning approach for human face detection using a network of linear units is presented. The SNoW learning architecture is a sparse network of linear functions over a predefined or incrementally learned feature space and is specifically tailored for learning in the presence of a very large number of features. A wide range of face images in different poses, with different expressions and under different lighting conditions are used as a training set to capture the variations of human faces. Experimental results on commonly used benchmark data sets of a wide range of face images show that the SNoW-based approach outperforms methods that use neural networks, Bayesian methods, support vector machines and others. Furthermore, learning and evaluation using the SNoW-based method are significantly more efficient than with other methods.
Dual averaging methods for regularized stochastic learning and online optimization
 In Advances in Neural Information Processing Systems 23
, 2009
Cited by 131 (7 self)
We consider regularized stochastic learning and online optimization problems, where the objective function is the sum of two convex terms: one is the loss function of the learning task, and the other is a simple regularization term such as the ℓ1-norm for promoting sparsity. We develop extensions of Nesterov’s dual averaging method that can exploit the regularization structure in an online setting. At each iteration of these methods, the learning variables are adjusted by solving a simple minimization problem that involves the running average of all past subgradients of the loss function and the whole regularization term, not just its subgradient. In the case of ℓ1-regularization, our method is particularly effective in obtaining sparse solutions. We show that these methods achieve the optimal convergence rates or regret bounds that are standard in the literature on stochastic and online convex optimization. For stochastic learning problems in which the loss functions have Lipschitz continuous gradients, we also present an accelerated version of the dual averaging method.
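The "simple minimization problem" mentioned in this abstract has a closed form in the ℓ1 case. The following is a rough sketch of that step as commonly stated for ℓ1-regularized dual averaging, assuming a strongly convex auxiliary term with weight gamma * sqrt(t); the names l1_rda_step, update_average, lam and gamma are hypothetical and are not taken from the abstract.

```python
import math


def update_average(avg_grad, grad, t):
    """Running average of subgradients after seeing the t-th one (t >= 1)."""
    return [a + (g - a) / t for a, g in zip(avg_grad, grad)]


def l1_rda_step(avg_grad, t, lam, gamma):
    """Closed-form minimizer of
        <avg_grad, w> + lam * ||w||_1 + (gamma / sqrt(t)) * 0.5 * ||w||^2.

    Coordinates whose average subgradient is at most lam in magnitude
    are set exactly to zero, which is what produces sparse iterates.
    """
    scale = math.sqrt(t) / gamma
    w = []
    for g in avg_grad:
        if abs(g) <= lam:
            w.append(0.0)
        else:
            # Shrink the average subgradient toward zero by lam, then rescale.
            w.append(-scale * (g - lam * math.copysign(1.0, g)))
    return w
```

Because the threshold lam is applied to the averaged subgradient rather than to a single noisy one, small coordinates are zeroed out at every iteration instead of only occasionally.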
Mistake-Driven Learning in Text Categorization
 In EMNLP-97, the Second Conference on Empirical Methods in Natural Language Processing
, 1997
Cited by 108 (9 self)
Learning problems in the text processing domain often map the text to a space whose dimensions are the measured features of the text, e.g., its words. Three characteristic properties of this domain are (a) very high dimensionality, (b) both the learned concepts and the instances reside very sparsely in the feature space, and (c) a high variation in the number of active features in an instance. In this work we study three mistake-driven learning algorithms for a typical task of this nature: text categorization. We argue ...
Sparse Online Learning via Truncated Gradient
Cited by 105 (4 self)
We propose a general method called truncated gradient to induce sparsity in the weights of online-learning algorithms with convex loss. This method has several essential properties. First, the degree of sparsity is continuous: a parameter controls the rate of sparsification from no sparsification to total sparsification. Second, the approach is theoretically motivated, and an instance of it can be regarded as an online counterpart of the popular L1-regularization method in the batch setting. We prove that small rates of sparsification result in only small additional regret with respect to typical online-learning guarantees. Finally, the approach works well empirically. We apply it to several datasets and find that, for datasets with large numbers of features, substantial sparsity is discoverable.
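The abstract does not spell out the update, so the following is only a sketch of the truncation operator as it is usually presented for this method: after an ordinary gradient step, weights within a band [-theta, theta] are pulled toward zero by a fixed amount and clipped at zero. The squared loss and the names truncate, truncated_gradient_step, gravity and theta are assumptions for illustration.

```python
def truncate(v, alpha, theta):
    """Shrink v toward zero by alpha if |v| <= theta; never cross zero."""
    if 0.0 <= v <= theta:
        return max(0.0, v - alpha)
    if -theta <= v < 0.0:
        return min(0.0, v + alpha)
    return v  # weights outside the band are left untouched


def truncated_gradient_step(w, x, y, eta, gravity, theta):
    """One online step for squared loss 0.5 * (w.x - y)^2, then truncation.

    gravity = 0 gives plain stochastic gradient descent (no sparsification);
    larger gravity zeroes out more small weights.
    """
    pred = sum(wi * xi for wi, xi in zip(w, x))
    grad = [(pred - y) * xi for xi in x]
    return [truncate(wi - eta * gi, eta * gravity, theta)
            for wi, gi in zip(w, grad)]
```

The gravity parameter plays the role of the continuous sparsification knob described in the abstract: at zero the step reduces to ordinary gradient descent.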
A Winnow-Based Approach to Context-Sensitive Spelling Correction
 Machine Learning
, 1999
Cited by 105 (1 self)
A large class of machine-learning problems in natural language require the characterization of linguistic context. Two characteristic properties of such problems are that their feature space is of very high dimensionality, and their target concepts depend on only a small subset of the features in the space. Under such conditions, multiplicative weight-update algorithms such as Winnow have been shown to have exceptionally good theoretical properties. In the work reported here, we present an algorithm combining variants of Winnow and weighted-majority voting, and apply it to a problem in the aforementioned class: context-sensitive spelling correction. This is the task of fixing spelling errors that happen to result in valid words, such as substituting to for too, casual for causal, and so on. We evaluate our algorithm, WinSpell, by comparing it against BaySpell, a statistics-based method representing the state of the art for this task. We find: (1) When run with a full (unpruned) set ...
Evaluating Topic-Driven Web Crawlers
, 2001
Cited by 102 (22 self)
Due to limited bandwidth, storage, and computational resources, and to the dynamic nature of the Web, search engines cannot index every Web page, and even the covered portion of the Web cannot be monitored continuously for changes. Therefore it is essential to develop effective crawling strategies to prioritize the pages to be indexed. The issue is even more important for topic-specific search engines, where crawlers must make additional decisions based on the relevance of visited pages. However, it is difficult to evaluate alternative crawling strategies because relevant sets are unknown and the search space is changing. We propose three different methods to evaluate crawling strategies. We apply the proposed metrics to compare three topic-driven crawling algorithms based on similarity ranking, link analysis, and adaptive agents.
General convergence results for linear discriminant updates
 Machine Learning
, 1997
Cited by 98 (0 self)
The problem of learning linear-discriminant concepts can be solved by various mistake-driven update procedures, including the Winnow family of algorithms and the well-known Perceptron algorithm. In this paper we define the general class of “quasi-additive” algorithms, which includes Perceptron and Winnow as special cases. We give a single proof of convergence that covers a broad subset of algorithms in this class, including both Perceptron and Winnow, but also many new algorithms. Our proof hinges on analyzing a generic measure of progress construction that gives insight as to when and how such algorithms converge. Our measure of progress construction also permits us to obtain good mistake bounds for individual algorithms. We apply our unified analysis to new algorithms as well as existing algorithms. When applied to known algorithms, our method “automatically” produces close variants of existing proofs (recovering similar bounds), thus showing that, in a certain sense, these seemingly diverse results are fundamentally isomorphic. However, we also demonstrate that the unifying principles are more broadly applicable, and analyze a new class of algorithms that smoothly interpolate between the additive-update behavior of Perceptron and the multiplicative-update behavior of Winnow.
Competitive online statistics
 International Statistical Review
, 1999
Cited by 97 (15 self)
A radically new approach to statistical modelling, which combines mathematical techniques of Bayesian statistics with the philosophy of the theory of competitive online algorithms, has arisen over the last decade in computer science (to a large degree, under the influence of Dawid’s prequential statistics). In this approach, which we call “competitive online statistics”, it is not assumed that data are generated by some stochastic mechanism; the bounds derived for the performance of competitive online statistical procedures are guaranteed to hold (and not just hold with high probability or on the average). This paper reviews some results in this area; the new material in it includes the proofs for the performance of the Aggregating Algorithm in the problem of linear regression with square loss. Keywords: Bayes’s rule, competitive online algorithms, linear regression, prequential statistics, worst-case analysis.
Online portfolio selection using multiplicative updates
 Mathematical Finance
, 1998
Cited by 94 (10 self)
We present an online investment algorithm which achieves almost the same wealth as the best constant-rebalanced portfolio determined in hindsight from the actual market outcomes. The algorithm employs a multiplicative update rule derived using a framework introduced by Kivinen and Warmuth. Our algorithm is very simple to implement and requires only constant storage and computing time per stock in each trading period. We tested the performance of our algorithm on real stock data from the New York Stock Exchange accumulated during a 22-year period. On this data, our algorithm clearly outperforms the best single stock as well as Cover's universal portfolio selection algorithm. We also present results for the situation in which the ... We present an online investment algorithm which achieves almost the same wealth as the best constant-rebalanced portfolio investment strategy. The algorithm employs a multiplicative update rule derived using a framework introduced by Kivinen and Warmuth [20]. Our algorithm is very simple to implement and its time and storage requirements grow linearly in the number of stocks.
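The multiplicative update rule is not written out in the abstract; the sketch below uses the exponentiated-gradient form usually associated with this line of work, in which each stock's weight is scaled by exp(eta * x_i / (w . x)) and the weights are renormalized. The learning rate eta, the function names, and the toy market data in the usage example are assumptions for illustration only.

```python
import math


def eg_update(weights, price_relatives, eta=0.05):
    """One multiplicative (exponentiated-gradient) portfolio update.

    price_relatives[i] is the ratio of closing to opening price of stock i
    for the period just observed. Cost is linear in the number of stocks.
    """
    portfolio_return = sum(w * x for w, x in zip(weights, price_relatives))
    unnormalized = [w * math.exp(eta * x / portfolio_return)
                    for w, x in zip(weights, price_relatives)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def run_eg(market, eta=0.05):
    """Trade through a sequence of price-relative vectors.

    Starts from the uniform portfolio and returns (final_wealth, weights),
    with initial wealth normalized to 1.
    """
    n = len(market[0])
    weights = [1.0 / n] * n
    wealth = 1.0
    for x in market:
        wealth *= sum(w * xi for w, xi in zip(weights, x))
        weights = eg_update(weights, x, eta)
    return wealth, weights


if __name__ == "__main__":
    # Hypothetical two-stock market: stock 0 tends to rise, stock 1 to fall.
    toy_market = [[1.02, 0.99], [1.01, 0.98], [0.99, 1.00], [1.03, 0.97]]
    print(run_eg(toy_market))
```

Only the current weight vector is stored between periods, which matches the abstract's claim that time and storage grow linearly in the number of stocks.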