Results 1-10 of 81
Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
, 2010
Cited by 287 (3 self)
Stochastic subgradient methods are widely used, well analyzed, and constitute effective tools for optimization and online learning. Stochastic gradient methods' popularity and appeal are largely due to their simplicity, as they largely follow predetermined procedural schemes. However, most common subgradient approaches are oblivious to the characteristics of the data being observed. We present a new family of subgradient methods that dynamically incorporate knowledge of the geometry of the data observed in earlier iterations to perform more informative gradient-based learning. The adaptation, in essence, allows us to find needles in haystacks in the form of very predictive but rarely seen features. Our paradigm stems from recent advances in stochastic optimization and online learning which employ proximal functions to control the gradient steps of the algorithm. We describe and analyze an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and results in regret guarantees that are provably as good as the best proximal function that can be chosen in hindsight. In a companion paper, we validate experimentally our theoretical analysis and show that the adaptive subgradient approach outperforms state-of-the-art, but non-adaptive, subgradient algorithms.
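The abstract above does not spell out an update rule. As a rough illustration of per-coordinate adaptation, the sketch below uses the widely known diagonal variant, in which each coordinate's step is divided by the square root of its accumulated squared gradients. The function name `adagrad_step`, the step size `eta`, the stabilizer `eps`, and the toy gradient sequence are all assumptions made for this example, not notation from the paper.

```python
import math

def adagrad_step(x, g, sum_sq, eta=0.1, eps=1e-8):
    """One diagonal AdaGrad-style update, applied in place to x and sum_sq.

    x       current parameters (list of floats)
    g       (sub)gradient at x (list of floats)
    sum_sq  running sum of squared gradients per coordinate
    """
    for i, gi in enumerate(g):
        sum_sq[i] += gi * gi
        # Coordinates with a large gradient history get a smaller step.
        x[i] -= eta * gi / (math.sqrt(sum_sq[i]) + eps)
    return x, sum_sq

# Toy run: coordinate 0 receives a gradient every round,
# coordinate 1 only in the final round (a "rarely seen feature").
x = [1.0, 1.0]
sum_sq = [0.0, 0.0]
for g in ([2.0, 0.0], [2.0, 0.0], [2.0, 0.0], [2.0, 2.0]):
    adagrad_step(x, g, sum_sq)
print(x, sum_sq)
```

In the final round both coordinates see the same gradient, yet the rarely active coordinate moves by roughly `eta` while the frequently active one moves by about half of that, because its accumulated history is larger.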
On the Generalization Ability of Online Learning Algorithms
 IEEE Transactions on Information Theory
, 2001
Cited by 184 (8 self)
In this paper we show that online algorithms for classification and regression can be naturally used to obtain hypotheses with good data-dependent tail bounds on their risk. Our results are proven without requiring complicated concentration-of-measure arguments and they hold for arbitrary online learning algorithms. Furthermore, when applied to concrete online algorithms, our results yield tail bounds that in many cases are comparable or better than the best known bounds.
Incremental algorithms for hierarchical classification
 Journal of Machine Learning Research
, 2004
Cited by 111 (9 self)
We study the problem of classifying data in a given taxonomy when classifications associated with multiple and/or partial paths are allowed. We introduce a new algorithm that incrementally learns a linear-threshold classifier for each node of the taxonomy. A hierarchical classification is obtained by evaluating the trained node classifiers in a top-down fashion. To evaluate classifiers in our multipath framework, we define a new hierarchical loss function, the H-loss, capturing the intuition that whenever a classification mistake is made on a node of the taxonomy, then no loss should be charged for any additional mistake occurring in the subtree of that node. Making no assumptions on the mechanism generating the data instances, and assuming a linear noise model for the labels, we bound the H-loss of our online algorithm in terms of the H-loss of a reference classifier knowing the true parameters of the label-generating process. We show that, in expectation, the excess cumulative H-loss grows at most logarithmically in the length of the data sequence. Furthermore, our analysis reveals the precise dependence of the rate of convergence on the eigenstructure of the data each node observes. Our theoretical results are complemented by a number of experiments on textual corpora. In these experiments we show that, after only one epoch of training, our algorithm performs much better than Perceptron-based hierarchical classifiers, and reasonably close to a hierarchical support vector machine.
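The H-loss rule stated in the abstract above (a mistake at a node is charged only if no ancestor of that node is already misclassified) can be sketched as follows. This is a minimal illustration that assumes a unit cost per charged node and a taxonomy given as a parent array; the name `h_loss` and the toy taxonomy are hypothetical and not taken from the paper.

```python
from typing import List, Optional

def h_loss(parent: List[Optional[int]],
           y_true: List[int],
           y_pred: List[int]) -> int:
    """Count nodes that are misclassified while all their ancestors are correct.

    parent[i] is the index of node i's parent, or None for a root.
    """
    loss = 0
    for node in range(len(parent)):
        if y_pred[node] == y_true[node]:
            continue  # no mistake at this node
        # Walk up the taxonomy: a mistake below an earlier mistake is free.
        anc = parent[node]
        charged = True
        while anc is not None:
            if y_pred[anc] != y_true[anc]:
                charged = False
                break
            anc = parent[anc]
        if charged:
            loss += 1
    return loss

# Toy taxonomy: node 0 is the root, nodes 1 and 2 are its children,
# node 3 is a child of node 1.
parent = [None, 0, 0, 1]
y_true = [1, 1, 0, 1]
y_pred = [1, 0, 0, 0]  # wrong at node 1 and at its child, node 3
print(h_loss(parent, y_true, y_pred))
```

In the toy call, nodes 1 and 3 are both wrong, but node 3 lies in the subtree of node 1, so only the mistake at node 1 is charged.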
The jackknife - a review
 Biometrika
, 1974
Cited by 101 (0 self)
Interleukin (IL)-33 is a new member of the IL-1 superfamily of cytokines that is expressed mainly by stromal cells, such as epithelial and endothelial cells, and its expression is upregulated following pro-inflammatory stimulation. IL-33 can function both as a traditional cytokine and as a nuclear factor regulating gene transcription. It is thought to function as an 'alarmin' released following cell necrosis to alert the immune system to tissue damage or stress. It mediates its biological effects via interaction with the receptors ST2 (IL-1RL1) and IL-1 receptor accessory protein (IL-1RAcP), both of which are widely expressed, particularly by innate immune cells and T helper 2 (Th2) cells. IL-33 strongly induces Th2 cytokine production from these cells and can promote the pathogenesis of Th2-related diseases such as asthma, atopic dermatitis and anaphylaxis. However, IL-33 has shown various protective effects in cardiovascular diseases such as atherosclerosis, obesity, type 2 diabetes and cardiac remodeling. Thus, the effects of IL-33 are either pro- or anti-inflammatory depending on the disease and the model. In this review the role of IL-33 in the inflammation of several disease pathologies will be discussed, with particular emphasis on recent advances.
Confidence-weighted linear classification
 In ICML ’08: Proceedings of the 25th international conference on Machine learning
, 2008
Cited by 95 (15 self)
We introduce confidence-weighted linear classifiers, which add parameter confidence information to linear classifiers. Online learners in this setting update both classifier parameters and the estimate of their confidence. The particular online algorithms we study here maintain a Gaussian distribution over parameter vectors and update the mean and covariance of the distribution with each instance. Empirical evaluation on a range of NLP tasks shows that our algorithm improves over other state-of-the-art online and batch methods, learns faster in the online setting, and lends itself to better classifier combination after parallel training.
Adaptive Regularization of Weight Vectors
 Advances in Neural Information Processing Systems 22
, 2009
Cited by 65 (14 self)
We present AROW, a new online learning algorithm that combines several useful properties: large margin training, confidence weighting, and the capacity to handle non-separable data. AROW performs adaptive regularization of the prediction function upon seeing each new instance, allowing it to perform especially well in the presence of label noise. We derive a mistake bound, similar in form to the second-order perceptron bound, that does not assume separability. We also relate our algorithm to recent confidence-weighted online learning techniques and show empirically that AROW achieves state-of-the-art performance and notable robustness in the case of non-separable data.
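The abstract above describes AROW only at a high level. The sketch below shows the commonly cited form of its per-instance update with a full covariance matrix: when the hinge loss is positive, the mean moves toward the example and the covariance shrinks along the example's direction. The function name `arow_update`, the regularization parameter `r`, and the plain-list matrix representation are assumptions made for this illustration.

```python
def arow_update(w, sigma, x, y, r=1.0):
    """One AROW-style update; returns the new (w, sigma).

    w      mean weight vector (list of floats)
    sigma  symmetric covariance matrix (list of lists)
    x      feature vector, y label in {-1, +1}
    r      regularization parameter (assumed > 0)
    """
    n = len(w)
    margin = sum(wi * xi for wi, xi in zip(w, x))
    # sx = sigma @ x
    sx = [sum(sigma[i][j] * x[j] for j in range(n)) for i in range(n)]
    v = sum(xi * si for xi, si in zip(x, sx))  # confidence term x^T sigma x
    loss = max(0.0, 1.0 - y * margin)          # hinge loss
    if loss > 0.0:
        beta = 1.0 / (v + r)
        alpha = loss * beta
        w = [wi + alpha * y * si for wi, si in zip(w, sx)]
        # sigma symmetric, so sigma x x^T sigma equals the outer product sx sx^T
        sigma = [[sigma[i][j] - beta * sx[i] * sx[j] for j in range(n)]
                 for i in range(n)]
    return w, sigma

w0 = [0.0, 0.0]
sigma0 = [[1.0, 0.0], [0.0, 1.0]]
w1, sigma1 = arow_update(w0, sigma0, x=[1.0, 0.0], y=1)
print(w1, sigma1)
```

After this single update, the variance of the first weight has dropped while the second is unchanged, so later updates along the first feature are more conservative.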
Worst-Case Analysis of Selective Sampling for Linear Classification
 Journal of Machine Learning Research
, 2006
Cited by 53 (6 self)
A selective sampling algorithm is a learning algorithm for classification that, based on the past observed data, decides whether to ask the label of each new instance to be classified. In this paper, we introduce a general technique for turning linear-threshold classification algorithms from the general additive family into randomized selective sampling algorithms. For the most popular algorithms in this family we derive mistake bounds that hold for individual sequences of examples. These bounds
Potential-based Algorithms in Online Prediction and Game Theory
Cited by 42 (4 self)
In this paper we show that several known algorithms for sequential prediction problems (including Weighted Majority and the quasi-additive family of Grove, Littlestone, and Schuurmans), for playing iterated games (including Freund and Schapire's Hedge and MW, as well as the strategies of Hart and Mas-Colell), and for boosting (including AdaBoost) are special cases of a general decision strategy based on the notion of potential. By analyzing this strategy we derive known performance bounds, as well as new bounds, as simple corollaries of a single general theorem. Besides offering a new and unified view on a large family of algorithms, we establish a connection between potential-based analysis in learning and their counterparts independently developed in game theory. By exploiting this connection, we show that certain learning problems are instances of more general game-theoretic problems. In particular, we describe a notion of generalized regret and show its applications in learning theory.
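As one concrete instance of the algorithms named in the abstract above, the sketch below computes Hedge-style exponential weights, which correspond to an exponential potential: each expert's weight is proportional to the exponential of minus its cumulative loss times a learning rate. The function name `hedge_weights` and the parameter `eta` are assumptions for this illustration; the general potential-based strategy of the paper is not reproduced here.

```python
import math

def hedge_weights(cum_losses, eta):
    """Return normalized exponential weights for the given cumulative losses."""
    # Subtract the minimum loss before exponentiating for numerical stability;
    # the shift cancels in the normalization.
    m = min(cum_losses)
    unnorm = [math.exp(-eta * (loss - m)) for loss in cum_losses]
    z = sum(unnorm)
    return [u / z for u in unnorm]

# Two experts; the second has accumulated ln(2) more loss than the first.
weights = hedge_weights([0.0, math.log(2.0)], eta=1.0)
print(weights)
```

With `eta = 1`, an expert whose cumulative loss is larger by ln 2 receives half the unnormalized weight of the other, so the two weights are close to 2/3 and 1/3.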
Exact convex confidence-weighted learning
 In Advances in Neural Information Processing Systems 22
, 2008
Cited by 34 (3 self)
Confidence-weighted (CW) learning [6], an online learning method for linear classifiers, maintains a Gaussian distribution over weight vectors, with a covariance matrix that represents uncertainty about weights and correlations. Confidence constraints ensure that a weight vector drawn from the hypothesis distribution correctly classifies examples with a specified probability. Within this framework, we derive a new convex form of the constraint and analyze it in the mistake bound model. Empirical evaluation with both synthetic and text data shows our version of CW learning achieves lower cumulative and out-of-sample errors than commonly used first-order and second-order online methods.
Linear Algorithms for Online Multitask Classification
Cited by 34 (4 self)
We design and analyze interacting online algorithms for multitask classification that perform better than independent learners whenever the tasks are related in a certain sense. We formalize task relatedness in different ways, and derive formal guarantees on the performance advantage provided by interaction. Our online analysis gives new stimulating insights into previously known co-regularization techniques, such as the multitask kernels and the margin correlation analysis for multi-view learning. In the last part we apply our approach to spectral co-regularization: we introduce a natural matrix extension of the quasi-additive algorithm for classification and prove bounds depending on certain unitarily invariant norms of the matrix of task coefficients.