Results 11–20 of 89
Tracking the best hyperplane with a simple budget perceptron
In Proc. of the Nineteenth Annual Conference on Computational Learning Theory, 2006
Abstract

Cited by 15 (0 self)
Shifting bounds for online classification algorithms ensure good performance on any sequence of examples that is well predicted by a sequence of smoothly changing classifiers. When proving shifting bounds for kernel-based classifiers, one also faces the problem of storing a number of support vectors that can grow unboundedly, unless an eviction policy is used to keep this number under control. In this paper, we show that shifting and online learning on a budget can be combined surprisingly well. First, we introduce and analyze a shifting Perceptron algorithm achieving the best known shifting bounds while using an unlimited budget. Second, we show that by applying to the Perceptron algorithm the simplest possible eviction policy, which discards a random support vector each time a new one comes in, we achieve a shifting bound close to the one we obtained with no budget restrictions. More importantly, we show that our randomized algorithm strikes the optimal trade-off U = Θ(√B) between the budget B and the norm U of the largest classifier in the comparison sequence.
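The random-eviction policy this abstract describes can be sketched in a few lines (an illustrative Python sketch, not the authors' exact algorithm or analysis; the kernel, data layout, and function names are assumptions):

```python
import random

def budget_perceptron(stream, kernel, budget, seed=0):
    """Kernel perceptron with the simplest possible eviction policy:
    when the budget is full and a mistake forces a new support vector,
    discard a uniformly random existing one."""
    rng = random.Random(seed)
    support = []  # (coefficient, example) pairs, never more than `budget`

    def predict(x):
        return sum(c * kernel(s, x) for c, s in support)

    mistakes = 0
    for x, y in stream:  # labels y in {-1, +1}
        if y * predict(x) <= 0:  # mistake: x becomes a support vector
            mistakes += 1
            if len(support) >= budget:
                # random eviction keeps the support set within budget
                support.pop(rng.randrange(len(support)))
            support.append((y, x))
    return support, mistakes
```

The point of the trade-off U = Θ(√B) is that the size of this `support` list, not the length of the stream, bounds both memory and per-example prediction cost.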
Sparse Kernel SVMs via Cutting-Plane Training
Abstract

Cited by 15 (1 self)
We explore an algorithm for training SVMs with kernels that can represent the learned rule using arbitrary basis vectors, not just the support vectors (SVs) from the training set. This results in two benefits. First, the added flexibility makes it possible to find sparser solutions of good quality, substantially speeding up prediction. Second, the improved sparsity can also make training of kernel SVMs more efficient, especially for high-dimensional and sparse data (e.g. text classification). This has the potential to make training of kernel SVMs tractable for large training sets, where conventional methods scale quadratically due to the linear growth of the number of SVs. In addition to a theoretical analysis of the algorithm, we also present an empirical evaluation.
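The speed-up the abstract claims comes from the shape of the decision rule itself: f(x) = Σⱼ βⱼ k(bⱼ, x), where the basis vectors bⱼ need not be training SVs. A minimal sketch of such a prediction function (the RBF kernel and all names are assumptions, not the paper's code):

```python
import math

def rbf(a, b, gamma=1.0):
    # Gaussian RBF kernel on equal-length tuples
    return math.exp(-gamma * sum((u - v) ** 2 for u, v in zip(a, b)))

def predict(x, basis, betas, bias=0.0, kernel=rbf):
    """Evaluate f(x) = sum_j beta_j * k(b_j, x) + bias.
    Cost is linear in len(basis); decoupling the basis from the
    training set's support vectors is what lets a sparser expansion
    speed up prediction."""
    return sum(b * kernel(bv, x) for b, bv in zip(betas, basis)) + bias
```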
Hashing Hyperplane Queries to Near Points with Applications to Large-Scale Active Learning
Abstract

Cited by 12 (2 self)
We consider the problem of retrieving the database points nearest to a given hyperplane query without exhaustively scanning the database. We propose two hashing-based solutions. Our first approach maps the data to two-bit binary keys that are locality-sensitive for the angle between the hyperplane normal and a database point. Our second approach embeds the data into a vector space where the Euclidean norm reflects the desired distance between the original points and the hyperplane query. Both use hashing to retrieve near points in sublinear time. Our first method's preprocessing stage is more efficient, while the second has stronger accuracy guarantees. We apply both to pool-based active learning: taking the current hyperplane classifier as a query, our algorithm identifies those points (approximately) satisfying the well-known minimal distance-to-hyperplane selection criterion. We empirically demonstrate our methods' trade-offs, and show that they make it practical to perform active selection with millions of unlabeled points.
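The two-bit key idea can be sketched with random Gaussian projections: a database point and a hyperplane normal get sign bits from two independent projections, with one bit flipped on the query side so that points nearly perpendicular to the normal tend to collide. This is an illustrative sketch of the angle-based idea only; the paper's exact construction and guarantees differ, and all names here are assumptions:

```python
import random

def make_two_bit_hash(dim, seed=0):
    """One two-bit locality-sensitive hash for hyperplane queries."""
    rng = random.Random(seed)
    u = [rng.gauss(0, 1) for _ in range(dim)]  # two independent
    v = [rng.gauss(0, 1) for _ in range(dim)]  # random projections

    def sgn(z):
        return 1 if z >= 0 else 0

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def hash_point(x):
        # database point: sign bits of both projections
        return (sgn(dot(u, x)), sgn(dot(v, x)))

    def hash_query(w):
        # hyperplane normal: flip the second bit, so points near the
        # hyperplane (x roughly perpendicular to w) tend to collide
        return (sgn(dot(u, w)), 1 - sgn(dot(v, w)))

    return hash_point, hash_query
```

As usual with LSH, many such two-bit keys would be concatenated and repeated across tables to trade precision against recall.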
Step size adaptation in reproducing kernel Hilbert space
, 2006
Abstract

Cited by 11 (4 self)
This paper presents an online Support Vector Machine (SVM) that uses the Stochastic Meta-Descent (SMD) algorithm to adapt its step size automatically. We formulate the online learning problem as stochastic gradient descent in a Reproducing Kernel Hilbert Space (RKHS) and translate SMD to the nonparametric setting, where its gradient trace parameter is no longer a coefficient vector but an element of the RKHS. We derive efficient updates that allow us to perform the step size adaptation in linear time. We apply the online SVM framework to a variety of loss functions and, in particular, show how to handle structured output spaces and achieve efficient online multiclass classification. Experiments show that our algorithm outperforms more primitive methods for setting the gradient step size.
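The base formulation, before SMD enters, is SGD on the regularized hinge loss where the hypothesis lives in the RKHS as a kernel expansion. A minimal fixed-step-size sketch (the paper's contribution is precisely to adapt `eta` online, which this sketch omits; names and constants are assumptions):

```python
def online_kernel_svm(stream, kernel, lam=0.01, eta=0.5):
    """SGD on the regularized hinge loss in an RKHS, storing the
    hypothesis as a kernel expansion f(x) = sum_i a_i k(x_i, x)."""
    coeffs = []  # (a_i, x_i) pairs defining f

    def f(x):
        return sum(a * kernel(xi, x) for a, xi in coeffs)

    for x, y in stream:  # labels y in {-1, +1}
        margin = y * f(x)
        # gradient of the regularizer shrinks all existing coefficients
        coeffs = [(a * (1 - eta * lam), xi) for a, xi in coeffs]
        if margin < 1:  # hinge loss active: step along y * k(x, .)
            coeffs.append((eta * y, x))
    return f
```

With a fixed `eta` the expansion grows with every margin violation; the linear-time SMD updates in the paper adapt the step size without changing this overall structure.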
Extraction and Search of Chemical Formulae in Text Documents on the Web
In Proceedings of the 16th International World Wide Web Conference (WWW 2007)
, 2007
Abstract

Cited by 10 (5 self)
Often scientists seek to search for articles on the Web related to a particular chemical. When a scientist searches for a chemical formula using a search engine today, she gets articles where the exact keyword string expressing the chemical formula is found. Searching for the exact occurrence of keywords results in two problems for this domain: a) if the author searches for CH4 and the article has H4C, the article is not returned, and b) ambiguous searches like “He” return all documents where helium is mentioned as well as documents where the pronoun “he” occurs. To remedy these deficiencies, we propose a chemical formula search engine. To build a chemical formula search engine, we must solve the following problems: 1) extract chemical formulae from text documents, 2) index chemical …
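The CH4-versus-H4C problem is a normalization problem: index element counts rather than the literal string. A toy sketch of that canonicalization (an illustration of the idea, not the paper's system; real extraction must also disambiguate tokens like "He", helium versus the pronoun, which this parser does not attempt):

```python
import re
from collections import Counter

def parse_formula(formula):
    """Canonicalize a simple chemical formula into element counts,
    so equivalent strings such as CH4 and H4C compare equal."""
    counts = Counter()
    # an element symbol is one capital plus an optional lowercase
    # letter, followed by an optional count (default 1)
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(count) if count else 1
    return counts
```

Indexing `parse_formula(...)` instead of the raw token makes CH4 and H4C hit the same posting list.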
Active learning for class imbalance problem
In Proceedings of the 30th Annual International ACM SIGIR Conference
Abstract

Cited by 10 (0 self)
The class imbalance problem has been known to hinder the learning performance of classification algorithms. Various real-world classification tasks, such as text categorization, suffer from this phenomenon. We demonstrate that active learning is capable of solving the problem.
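The usual mechanism behind such results is margin-based selection: query the unlabeled points closest to the current hyperplane, a band in which minority-class examples are relatively more frequent than in the full pool. A hedged sketch of that selection step (names are illustrative, not from the paper):

```python
def select_queries(pool, w, k):
    """Pick the k unlabeled points with the smallest |w . x|,
    i.e. those closest to the current separating hyperplane."""
    def margin(x):
        return abs(sum(wi * xi for wi, xi in zip(w, x)))
    return sorted(pool, key=margin)[:k]
```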
Training Invariant Support Vector Machines using Selective Sampling
Abstract

Cited by 9 (0 self)
Bordes et al. (2005) describe the efficient online LASVM algorithm using selective sampling. On the other hand, Loosli et al. (2005) propose a strategy for handling invariance in SVMs, also using selective sampling. This paper combines the two approaches to build a very large SVM. We present state-of-the-art results obtained on a handwritten digit recognition problem with 8 million points on a single processor. This work also demonstrates that online SVMs can effectively handle really large databases.
Sequence labelling SVMs trained in one pass
Extracting Drug-Drug Interaction
, 2008
Abstract

Cited by 8 (2 self)
This paper proposes an online solver of the dual formulation of support vector machines for structured output spaces. We apply it to sequence labelling using the exact and greedy inference schemes. In both cases, the per-sequence training time is the same as that of a perceptron based on the same inference procedure, up to a small multiplicative constant. Comparing the two inference schemes, the greedy version is much faster. It is also amenable to higher-order Markov assumptions and performs similarly at test time. In comparison to existing algorithms, both versions match the accuracies of batch solvers that use exact inference after a single pass over the training examples.
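The speed gap between the two schemes comes from the decoder: exact (Viterbi) inference searches over all tag sequences, while greedy inference fixes each tag left to right given the previous decision. A sketch of the greedy decoder (the local scoring function `score(prev, token, label)` is a stand-in for the model, not the paper's interface):

```python
def greedy_decode(tokens, labels, score):
    """Greedy left-to-right inference for sequence labelling:
    one score() call per (position, label), so decoding costs
    O(n * |labels|) instead of Viterbi's O(n * |labels|^2)."""
    prev, out = None, []
    for tok in tokens:
        best = max(labels, key=lambda lab: score(prev, tok, lab))
        out.append(best)
        prev = best
    return out
```

Conditioning only on `prev` also makes it cheap to extend the context window, which is why the greedy version accommodates higher-order Markov assumptions.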
Scalable purely-discriminative training for word and tree transducers
, 2006
Abstract

Cited by 7 (0 self)
Discriminative training methods have recently led to significant advances in the state of the art of machine translation (MT). Another promising trend is the incorporation of syntactic information into MT systems. Combining these trends is difficult for reasons of system complexity and computational complexity. The present study makes progress towards a syntax-aware MT system whose every component is trained discriminatively. Our main innovation is an approach to discriminative learning that is computationally efficient enough for large statistical MT systems, yet whose accuracy on translation subtasks is near the state of the art. Our source code is downloadable from
Automatic Tagging of Audio: The State-of-the-Art
Abstract

Cited by 7 (3 self)
Recently there has been a great deal of attention paid to the automatic prediction of tags for music and audio in general. Social tags are user-generated keywords associated with some resource on the Web. In the case of music, social tags have become an important component of “Web 2.0” recommender systems. There have been many attempts at automatically applying tags to audio for different purposes: database management, music recommendation, improved human-computer interfaces, estimating similarity among songs, and so on. Many published results show that this problem can be tackled using machine learning techniques; however, no method so far has been proven to be particularly suited to the task. First, it seems that no one has yet found an appropriate algorithm to solve this challenge. But second, the task definition itself is problematic. In an effort to better understand the task and also to help new researchers bring their insights to bear on this problem, this chapter provides a review of the state-of-the-art methods for addressing automatic tagging of audio. It is divided into the following sections: goal, framework, audio representation, labeled data, classification, evaluation, and future directions. Such a division helps one understand the commonalities and strengths of the different methods that have been proposed.