Results 1–10 of 136
Random features for large-scale kernel machines
In Neural Information Processing Systems, 2007
Cited by 258 (4 self)
To accelerate the training of kernel machines, we propose to map the input data to a randomized low-dimensional feature space and then apply existing fast linear methods. Our randomized features are designed so that the inner products of the transformed data are approximately equal to those in the feature space of a user-specified shift-invariant kernel. We explore two sets of random features, provide convergence bounds on their ability to approximate various radial basis kernels, and show that in large-scale classification and regression tasks, linear machine learning algorithms that use these features outperform state-of-the-art large-scale kernel machines.
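A minimal sketch of the idea, assuming the Gaussian kernel k(x, y) = exp(-||x - y||² / (2σ²)) and the random Fourier construction z(x) = sqrt(2/D) · cos(ω·x + b), with frequencies ω drawn from N(0, σ⁻²I) and phases b from Uniform[0, 2π]. Only the Fourier variant is shown here, not the second (binning) feature family; the function names are hypothetical.

```python
import math
import random


def gaussian_kernel(x, y, sigma=1.0):
    """Exact Gaussian (RBF) kernel exp(-||x - y||^2 / (2 sigma^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))


def make_rff(dim, num_features, sigma=1.0, seed=0):
    """Sample a random Fourier feature map z(.) so that z(x).z(y) ~= k(x, y).

    Frequencies omega ~ N(0, sigma^-2 I), phases b ~ Uniform[0, 2*pi].
    """
    rng = random.Random(seed)
    omegas = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)]
              for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def z(x):
        # One cosine feature per sampled (omega, b) pair.
        return [scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
                for w, b in zip(omegas, phases)]

    return z


if __name__ == "__main__":
    z = make_rff(dim=3, num_features=5000, sigma=1.0, seed=42)
    x, y = [0.1, 0.5, -0.3], [0.4, 0.2, 0.0]
    approx = sum(a * b for a, b in zip(z(x), z(y)))
    print(f"exact={gaussian_kernel(x, y):.4f} approx={approx:.4f}")
```

Once every input is mapped through z, any fast linear learner can be trained on the D-dimensional vectors; the approximation error shrinks roughly as 1/sqrt(D).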
Training a support vector machine in the primal
Neural Computation, 2007
Cited by 155 (5 self)
Most literature on Support Vector Machines (SVMs) concentrates on the dual optimization problem. In this paper, we would like to point out that the primal problem can also be solved efficiently, for both linear and nonlinear SVMs, and that there is no reason to ignore this possibility. On the contrary, from the primal point of view, new families of algorithms for large-scale SVM training can be investigated.
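The abstract does not name a primal solver, so the following is only an illustrative sketch: plain full-batch gradient descent on the differentiable squared-hinge (L2-loss) primal objective J(w) = (λ/2)||w||² + (1/n) Σ max(0, 1 − yᵢ w·xᵢ)² for a linear SVM. It is a stand-in, not the paper's method; the function names, step size, and the trick of appending a constant 1 as a bias feature (which also regularizes the bias) are assumptions.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def augment(X):
    """Append a constant 1 so the last weight acts as a (regularized) bias."""
    return [list(x) + [1.0] for x in X]


def objective(w, Xa, y, lam):
    """Primal L2-loss SVM objective on bias-augmented inputs Xa."""
    n = len(Xa)
    loss = sum(max(0.0, 1.0 - yi * dot(w, xi)) ** 2 for xi, yi in zip(Xa, y)) / n
    return 0.5 * lam * dot(w, w) + loss


def train_primal_svm(X, y, lam=0.01, lr=0.02, epochs=1000):
    """Full-batch gradient descent directly on the primal objective."""
    Xa = augment(X)
    n, d = len(Xa), len(Xa[0])
    w = [0.0] * d
    for _ in range(epochs):
        grad = [lam * wj for wj in w]
        for xi, yi in zip(Xa, y):
            slack = 1.0 - yi * dot(w, xi)
            if slack > 0.0:  # only margin violators contribute to the gradient
                for j in range(d):
                    grad[j] -= (2.0 / n) * slack * yi * xi[j]
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def predict(w, x):
    return 1 if dot(w, list(x) + [1.0]) >= 0.0 else -1
```

The squared hinge is used because it is differentiable everywhere, which is what makes unconstrained gradient-based (or second-order) optimization of the primal straightforward.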
Coresets, sparse greedy approximation and the Frank-Wolfe algorithm
Proceedings of the 19th Annual ACM-SIAM Symposium on Discrete Algorithms
Cited by 84 (1 self)
The problem of maximizing a concave function f(x) over a simplex S can be solved approximately by a simple greedy algorithm. For a given k, the algorithm can find a point x(k) on a k-dimensional face of S such that f(x(k)) ≥ f(x∗) − O(1/k), where f(x∗) is the maximum value of f in S. This algorithm and its analysis were known before and are related to problems in statistics and machine learning, such as boosting, regression, and density mixture estimation. In other work, coming from computational geometry, the existence of ε-coresets was shown for the minimum enclosing ball problem by means of a simple greedy algorithm. Similar greedy algorithms, which are special cases of the Frank-Wolfe algorithm, were described for other enclosure problems. Here these results are tied together, stronger convergence results are reviewed, and several coreset bounds are generalized or strengthened.
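A minimal sketch of the greedy (Frank-Wolfe) step on the simplex: each iteration maximizes the linearization of f, which on a simplex is attained at the single vertex with the largest gradient entry, and then moves toward that vertex. Starting from a vertex, after k steps at most k + 1 coordinates are nonzero, which is the sparsity behind x(k) lying on a low-dimensional face. The step size γ = 2/(k + 2) and the test function f(x) = −||x − c||² are assumptions chosen for illustration.

```python
def frank_wolfe_simplex(grad, n, iters):
    """Maximize a concave function over the probability simplex in R^n.

    grad(x) returns the gradient of f at x. Starts at vertex e_0 and, at
    step k, moves toward the vertex e_i with the largest gradient entry
    using the step size gamma_k = 2 / (k + 2).
    """
    x = [0.0] * n
    x[0] = 1.0  # start at a vertex of the simplex
    for k in range(iters):
        g = grad(x)
        i = max(range(n), key=lambda j: g[j])  # linear maximization over S
        gamma = 2.0 / (k + 2.0)
        x = [(1.0 - gamma) * xj for xj in x]
        x[i] += gamma  # convex combination keeps x inside the simplex
    return x


# Illustrative concave objective with its maximizer c inside the simplex.
c = [0.2, 0.5, 0.3]


def f(x):
    return -sum((xj - cj) ** 2 for xj, cj in zip(x, c))


def grad_f(x):
    return [-2.0 * (xj - cj) for xj, cj in zip(x, c)]
```

For this f the optimum value is 0, and the standard Frank-Wolfe analysis bounds the gap after k steps by a constant times 1/(k + 2), matching the O(1/k) rate quoted above.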
Trajectory-Based Anomalous Event Detection
Cited by 45 (5 self)
In recent years, automatic event analysis in video sequences has gained increasing attention in the research community. Application domains are diverse, ranging from video surveillance to automatic video annotation for sports videos or TV shots. Whatever the application field, most work in event analysis follows one of two main approaches: explicit event recognition, which seeks high-level, semantic interpretations of video sequences, and anomaly detection. This paper deals with the second approach, where the final goal is not the explicit labeling of recognized events but the detection of anomalous events that differ from typical patterns. In particular, the proposed work addresses anomaly detection by means of trajectory analysis, an approach with several application fields, most notably video surveillance and traffic monitoring. The proposed approach is based on single-class support vector machine (SVM) clustering, where the novelty-detection capabilities of the SVM are used to identify anomalous trajectories. Particular attention is given to trajectory classification in the absence of a priori information on the distribution of outliers. Experimental results demonstrate the validity of the proposed approach.
Index Terms: anomaly detection, event analysis, support vector machines (SVMs), trajectory clustering.
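The abstract does not say how a trajectory becomes an SVM input. A common preprocessing step, assumed here purely for illustration and not taken from the paper, is to resample each variable-length 2-D track to a fixed number of points spaced evenly by arc length and flatten them into one feature vector that a one-class SVM can consume. The function names are hypothetical.

```python
import math


def resample_trajectory(points, num_samples):
    """Resample a 2-D polyline to num_samples points evenly spaced by arc length."""
    if len(points) < 2 or num_samples < 2:
        raise ValueError("need at least 2 input points and 2 samples")
    # Cumulative arc length at each original vertex.
    cum = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cum[-1]
    out, seg = [], 0
    for k in range(num_samples):
        target = total * k / (num_samples - 1)
        # Advance to the segment that contains the target arc length.
        while seg < len(points) - 2 and cum[seg + 1] < target:
            seg += 1
        seg_len = cum[seg + 1] - cum[seg]
        t = 0.0 if seg_len == 0 else (target - cum[seg]) / seg_len
        (x0, y0), (x1, y1) = points[seg], points[seg + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out


def to_feature_vector(points, num_samples=8):
    """Flatten a resampled trajectory into [x0, y0, x1, y1, ...]."""
    flat = []
    for x, y in resample_trajectory(points, num_samples):
        flat.extend((x, y))
    return flat
```

Fixed-length vectors make every trajectory comparable under a standard kernel, at the cost of discarding speed information that arc-length resampling removes.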
Incremental Support Vector Learning: Analysis, Implementation and Applications
Journal of Machine Learning Research, 2006
Cited by 43 (5 self)
Incremental Support Vector Machines (SVMs) are instrumental in practical applications of online learning. This work focuses on the design and analysis of efficient incremental SVM learning, with the aim of providing a fast, numerically stable, and robust implementation. A detailed analysis of the convergence and algorithmic complexity of incremental SVM learning is carried out. Based on this analysis, a new design of storage and numerical operations is proposed, which speeds up the training of an incremental SVM by a factor of 5 to 20. The performance of the new algorithm is demonstrated in two scenarios: learning with limited resources and active learning. Various applications of the algorithm can be foreseen, such as drug discovery, online monitoring of industrial devices, and surveillance of network traffic.
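The abstract does not describe the update rule itself. As background only, the sketch below assumes the classic incremental SVM bookkeeping of Cauwenberghs and Poggio: with gᵢ = yᵢ f(xᵢ) − 1, every training point must sit in the margin set S (gᵢ = 0, 0 ≤ αᵢ ≤ C), the error set E (gᵢ < 0, αᵢ = C), or the reserve set O (gᵢ > 0, αᵢ = 0). A new point whose g is already positive joins O with α = 0 and needs no further work; otherwise its α must be increased while the partition is maintained. The function names and tolerance are hypothetical.

```python
def partition_kkt(alphas, margins, C, tol=1e-8):
    """Split training indices into the index sets kept by an incremental SVM.

    margins[i] is g_i = y_i * f(x_i) - 1.
    Returns (S, E, O): margin vectors, error vectors, reserve vectors.
    Raises ValueError if a point violates the KKT conditions.
    """
    S, E, O = [], [], []
    for i, (a, g) in enumerate(zip(alphas, margins)):
        if abs(g) <= tol and -tol <= a <= C + tol:
            S.append(i)  # on the margin, 0 <= alpha <= C
        elif g < 0 and abs(a - C) <= tol:
            E.append(i)  # inside the margin or misclassified, alpha = C
        elif g > 0 and abs(a) <= tol:
            O.append(i)  # correctly outside the margin, alpha = 0
        else:
            raise ValueError(f"point {i} violates KKT (alpha={a}, g={g})")
    return S, E, O


def new_point_needs_update(g_new):
    """A new point with g > 0 can enter O with alpha = 0 and no other change."""
    return g_new <= 0
```

Keeping this partition explicit is what lets an incremental learner update only the small margin set when a point arrives, instead of re-solving the whole problem.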
Tighter and convex maximum margin clustering
In AISTATS, 2009
Cited by 41 (14 self)
The maximum margin principle has been successfully applied to many supervised and semi-supervised problems in machine learning. Recently, this principle was extended to clustering, yielding Maximum Margin Clustering (MMC), which has achieved promising performance. To avoid the problem of local minima, MMC can be solved globally via a convex semidefinite programming (SDP) relaxation. Although many efficient approaches have been proposed to alleviate the computational burden of SDP, convex MMCs still do not scale to medium-sized data sets. In this paper, we propose a novel convex optimization method, LGMMC, which maximizes the margin of opposite clusters via “Label Generation”. It can be shown that LGMMC is much more scalable than existing convex approaches. Moreover, we show that our convex relaxation is tighter than state-of-the-art convex MMCs. Experiments on seventeen UCI datasets and the MNIST dataset show significant improvement over existing MMC algorithms.
Sparse Kernel SVMs via Cutting-Plane Training
2009
Cited by 35 (1 self)
We explore an algorithm for training SVMs with kernels that can represent the learned rule using arbitrary basis vectors, not just the support vectors (SVs) from the training set. This results in two benefits. First, the added flexibility makes it possible to find sparser solutions of good quality, substantially speeding up prediction. Second, the improved sparsity can also make training of kernel SVMs more efficient, especially for high-dimensional and sparse data (e.g., text classification). This has the potential to make training of kernel SVMs tractable for large training sets, where conventional methods scale quadratically due to the linear growth of the number of SVs. In addition to a theoretical analysis of the algorithm, we also present an empirical evaluation.
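To make the prediction-cost argument concrete, here is a minimal sketch of the decision function such a model evaluates, f(x) = Σⱼ βⱼ k(bⱼ, x) + bias, where the basis vectors bⱼ need not be training points. It shows only prediction, not the cutting-plane training; the RBF kernel, its γ parameter, and the class name are assumptions.

```python
import math


def rbf(a, b, gamma=0.5):
    """RBF kernel exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


class KernelExpansion:
    """Decision function over arbitrary basis vectors (not necessarily SVs)."""

    def __init__(self, basis, betas, bias=0.0, gamma=0.5):
        self.basis = [tuple(b) for b in basis]
        self.betas = list(betas)
        self.bias = bias
        self.gamma = gamma

    def decision(self, x):
        # One kernel evaluation per basis vector: cost grows linearly with
        # len(self.basis), so fewer basis vectors means faster prediction.
        return sum(beta * rbf(b, x, self.gamma)
                   for b, beta in zip(self.basis, self.betas)) + self.bias

    def predict(self, x):
        return 1 if self.decision(x) >= 0.0 else -1
```

A conventional SVM would use every support vector as a basis vector; a sparse expansion with a handful of freely placed vectors evaluates the same kind of function at a fraction of the cost.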
PSVM: Parallelizing Support Vector Machines on Distributed Computers
Cited by 30 (1 self)
Support Vector Machines (SVMs) suffer from a widely recognized scalability problem in both memory use and computational time. To improve scalability, we have developed a parallel SVM algorithm (PSVM), which reduces memory use by performing a row-based, approximate matrix factorization, and which loads only essential data to each machine to perform parallel computation. Let n denote the number of training instances, p the reduced matrix dimension after factorization (p is significantly smaller than n), and m the number of machines. PSVM reduces the memory requirement from O(n²) to O(np/m) and improves computation time to O(np²/m). Empirical study shows PSVM to be effective. PSVM Open Source is available for download at
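The memory figures follow from replacing the n × n kernel matrix with an n × p factor H such that K ≈ HHᵀ: storing H takes np numbers, and splitting its rows across m machines leaves np/m per machine. A minimal single-machine sketch of one such factorization, greedy pivoted incomplete Cholesky, is shown below; the abstract only says "row-based, approximate matrix factorization", so the choice of this algorithm and the function name are assumptions, and the parallel row distribution is not shown.

```python
import math


def incomplete_cholesky(K, p, tol=1e-12):
    """Greedy pivoted incomplete Cholesky factorization.

    Returns H as a list of n rows with r <= p columns each, so that
    K ~= H H^T. Only the diagonal and the pivot columns of K are read,
    and H holds n*r numbers instead of the n*n needed for K.
    """
    n = len(K)
    resid = [float(K[i][i]) for i in range(n)]  # diagonal of K - H H^T
    H = [[] for _ in range(n)]
    pivots = set()
    for j in range(min(p, n)):
        # Pick the unused row with the largest residual diagonal entry.
        i = max((t for t in range(n) if t not in pivots), key=lambda t: resid[t])
        if resid[i] <= tol:
            break  # remaining residual is negligible; stop early
        pivots.add(i)
        piv = math.sqrt(resid[i])
        for t in range(n):
            if t == i:
                h = piv
            elif t in pivots:
                h = 0.0  # earlier pivot rows are already fully explained
            else:
                h = (K[t][i] - sum(H[t][k] * H[i][k] for k in range(j))) / piv
                resid[t] -= h * h
            H[t].append(h)
        resid[i] = 0.0
    return H
```

With p = n the factorization is exact for a positive definite K; with p much smaller than n, the retained columns capture the directions with the largest residual variance.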
Hashing Hyperplane Queries to Near Points with Applications to Large-Scale Active Learning
Cited by 27 (3 self)
We consider the problem of retrieving the database points nearest to a given hyperplane query without exhaustively scanning the database. We propose two hashing-based solutions. Our first approach maps the data to two-bit binary keys that are locality-sensitive for the angle between the hyperplane normal and a database point. Our second approach embeds the data into a vector space where the Euclidean norm reflects the desired distance between the original points and the hyperplane query. Both use hashing to retrieve near points in sublinear time. Our first method’s preprocessing stage is more efficient, while the second has stronger accuracy guarantees. We apply both to pool-based active learning: taking the current hyperplane classifier as a query, our algorithm identifies those points that (approximately) satisfy the well-known minimal distance-to-hyperplane selection criterion. We empirically demonstrate our methods’ trade-offs and show that they make it practical to perform active selection with millions of unlabeled points.
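A minimal sketch of one two-bit key family with the property described for the first approach, assumed here rather than copied from the paper: draw two Gaussian vectors u and v, key a database point x as (sign(u·x), sign(v·x)) and a hyperplane normal w as (sign(u·w), sign(−v·w)). Keys then collide with probability (θ/π)(1 − θ/π), where θ is the angle between x and w: zero for points parallel to w (far from the hyperplane) and maximal, 1/4, for points perpendicular to w (on the hyperplane). All names are hypothetical.

```python
import math
import random


def sign_bit(v, z):
    return 1 if sum(a * b for a, b in zip(v, z)) >= 0 else 0


def sample_hash(dim, rng):
    """Draw the two random Gaussian directions (u, v) of one hash function."""
    u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    return u, v


def key_for_point(x, uv):
    u, v = uv
    return (sign_bit(u, x), sign_bit(v, x))


def key_for_query(w, uv):
    u, v = uv
    return (sign_bit(u, w), sign_bit(v, [-c for c in w]))  # second bit uses -w


def collision_rate(x, w, trials=20000, seed=0):
    """Empirical probability that a point and a hyperplane query share a key."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        uv = sample_hash(len(x), rng)
        hits += key_for_point(x, uv) == key_for_query(w, uv)
    return hits / trials


def theoretical_collision(theta):
    """(theta/pi) * (1 - theta/pi): peaks at 1/4 when theta = pi/2."""
    return (theta / math.pi) * (1.0 - theta / math.pi)
```

Concatenating several such keys into a hash table bucket makes points near the hyperplane (θ close to π/2) the most likely to land in the query's bucket, which is exactly the margin-based selection criterion used in active learning.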
Large-scale sparsified manifold regularization
Advances in Neural Information Processing Systems (NIPS) 19, 2006
Cited by 24 (3 self)
Semi-supervised learning can be more powerful than supervised learning because it uses both labeled and unlabeled data. In particular, the manifold regularization framework, together with kernel methods, leads to the Laplacian SVM (LapSVM), which has demonstrated state-of-the-art performance. However, the LapSVM solution typically involves kernel expansions over all the labeled and unlabeled examples and is slow at test time. Moreover, existing semi-supervised learning methods, including the LapSVM, can only handle a small number of unlabeled examples. In this paper, we integrate manifold regularization with the core vector machine, which has been used for large-scale supervised and unsupervised learning. By using a sparsified manifold regularizer and formulating the problem as a center-constrained minimum enclosing ball problem, the proposed method produces sparse solutions with low time and space complexities. Experimental results show that it is much faster than the LapSVM and can handle a million unlabeled examples on a standard PC, whereas the LapSVM can only handle several thousand patterns.
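For reference, the manifold regularizer penalizes a function f that varies across graph edges: Σ₍ᵢ,ⱼ₎ wᵢⱼ (fᵢ − fⱼ)², which equals fᵀLf for the graph Laplacian L = D − W. The sketch below computes it both ways on a sparse k-nearest-neighbour graph with Gaussian weights. The k-NN sparsification, the weight formula, and all names are illustrative assumptions; the paper's own sparsified regularizer and its core vector machine formulation are not reproduced.

```python
import math


def knn_edges(X, k, sigma=1.0):
    """Symmetric k-nearest-neighbour edge set {(i, j): weight} with i < j."""
    edges = {}
    for i, xi in enumerate(X):
        dists = sorted((sum((a - b) ** 2 for a, b in zip(xi, xj)), j)
                       for j, xj in enumerate(X) if j != i)
        for d2, j in dists[:k]:
            edges[(min(i, j), max(i, j))] = math.exp(-d2 / (2.0 * sigma ** 2))
    return edges


def manifold_penalty(f, edges):
    """Sum over undirected edges of w_ij * (f_i - f_j)^2."""
    return sum(w * (f[i] - f[j]) ** 2 for (i, j), w in edges.items())


def laplacian_quadratic_form(f, edges, n):
    """f^T L f with L = D - W built explicitly (dense, for checking only)."""
    L = [[0.0] * n for _ in range(n)]
    for (i, j), w in edges.items():
        L[i][j] -= w
        L[j][i] -= w
        L[i][i] += w
        L[j][j] += w
    return sum(f[i] * L[i][j] * f[j] for i in range(n) for j in range(n))
```

The edge-list form touches only the retained edges, so its cost scales with the number of edges rather than with n², which is the motivation for sparsifying the graph in the first place.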