Results 11  20
of
1,183
A Survey of Kernels for Structured Data
"... Kernel methods in general and support vector machines in particular have been successful in various learning tasks on data represented in a single table. Much 'realworld ' data, however, is structured it has no natural representation in a single table. Usually, to apply kernel methods to 'realworl ..."
Abstract

Cited by 113 (3 self)
 Add to MetaCart
Kernel methods in general and support vector machines in particular have been successful in various learning tasks on data represented in a single table. Much 'realworld ' data, however, is structured it has no natural representation in a single table. Usually, to apply kernel methods to 'realworld' data, extensive preprocessing is performed toembed the data into areal vector space and thus in a single table. This survey describes several approaches ofdefining positive definite kernels on structured instances directly.
SongLevel Features And Support Vector Machines For Music Classification
, 2005
"... Searching and organizing growing digital music collections requires automatic classification of music. This paper describes a new system, tested on the task of artist identification, that uses support vector machines to classify songs based on features calculated over their entire lengths. Since sup ..."
Abstract

Cited by 108 (11 self)
 Add to MetaCart
Searching and organizing growing digital music collections requires automatic classification of music. This paper describes a new system, tested on the task of artist identification, that uses support vector machines to classify songs based on features calculated over their entire lengths. Since support vector machines are exemplarbased classifiers, training on and classifying entire songs instead of shorttime features makes intuitive sense. On a dataset of 1200 pop songs performed by 18 artists, we show that this classifier outperforms similar classifiers that use only SVMs or songlevel features. We also show that the KL divergence between single Gaussians and Mahalanobis distance between MFCC statistics vectors perform comparably when classifiers are trained and tested on separate albums, but KL divergence outperforms Mahalanobis distance when trained and tested on songs from the same albums.
On the Nyström Method for Approximating a Gram Matrix for Improved KernelBased Learning
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2005
"... A problem for many kernelbased methods is that the amount of computation required to find the solution scales as O(n³), where n is the number of training examples. We develop and analyze an algorithm to compute an easilyinterpretable lowrank approximation to an nn Gram matrix G such that compu ..."
Abstract

Cited by 108 (7 self)
 Add to MetaCart
A problem for many kernelbased methods is that the amount of computation required to find the solution scales as O(n³), where n is the number of training examples. We develop and analyze an algorithm to compute an easilyinterpretable lowrank approximation to an nn Gram matrix G such that computations of interest may be performed more rapidly. The approximation is of the form G k = CW , where C is a matrix consisting of a small number c of columns of G and W k is the best rankk approximation to W , the matrix formed by the intersection between those c columns of G and the corresponding c rows of G. An important aspect of the algorithm is the probability distribution used to randomly sample the columns; we will use a judiciouslychosen and datadependent nonuniform probability distribution. Let F denote the spectral norm and the Frobenius norm, respectively, of a matrix, and let G k be the best rankk approximation to G. We prove that by choosing O(k/# ) columns both in expectation and with high probability, for both # = 2, F , and for all k : 0 rank(W ). This approximation can be computed using O(n) additional space and time, after making two passes over the data from external storage. The relationships between this algorithm, other related matrix decompositions, and the Nyström method from integral equation theory are discussed.
Fast Kernel Classifiers With Online And Active Learning
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2005
"... Very high dimensional learning systems become theoretically possible when training examples are abundant. The computing cost then becomes the limiting factor. Any efficient learning algorithm should at least take a brief look at each example. But should all examples be given equal attention? This ..."
Abstract

Cited by 102 (17 self)
 Add to MetaCart
Very high dimensional learning systems become theoretically possible when training examples are abundant. The computing cost then becomes the limiting factor. Any efficient learning algorithm should at least take a brief look at each example. But should all examples be given equal attention? This contribution proposes an empirical answer. We first present an online SVM algorithm based on this premise. LASVM yields competitive misclassification rates after a single pass over the training examples, outspeeding stateoftheart SVM solvers. Then we show how active example selection can yield faster training, higher accuracies, and simpler models, using only a fraction of the training example labels.
Fast Binary Feature Selection with Conditional Mutual Information
 Journal of Machine Learning Research
, 2004
"... We propose in this paper a very fast feature selection technique based on conditional mutual information. ..."
Abstract

Cited by 101 (1 self)
 Add to MetaCart
We propose in this paper a very fast feature selection technique based on conditional mutual information.
A New Approximate Maximal Margin Classification Algorithm
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2001
"... A new incremental learning algorithm is described which approximates the maximal margin hyperplane w.r.t. norm p 2 for a set of linearly separable data. Our algorithm, called alma p (Approximate Large Margin algorithm w.r.t. norm p), takes O (p 1) 2 2 corrections to separate the data wi ..."
Abstract

Cited by 87 (6 self)
 Add to MetaCart
A new incremental learning algorithm is described which approximates the maximal margin hyperplane w.r.t. norm p 2 for a set of linearly separable data. Our algorithm, called alma p (Approximate Large Margin algorithm w.r.t. norm p), takes O (p 1) 2 2 corrections to separate the data with pnorm margin larger than (1 ) , where is the (normalized) pnorm margin of the data. alma p avoids quadratic (or higherorder) programming methods. It is very easy to implement and is as fast as online algorithms, such as Rosenblatt's Perceptron algorithm. We performed extensive experiments on both realworld and artificial datasets. We compared alma 2 (i.e., alma p with p = 2) to standard Support vector Machines (SVM) and to two incremental algorithms: the Perceptron algorithm and Li and Long's ROMMA. The accuracy levels achieved by alma 2 are superior to those achieved by the Perceptron algorithm and ROMMA, but slightly inferior to SVM's. On the other hand, alma 2 is quite faster and easier to implement than standard SVM training algorithms. When learning sparse target vectors, alma p with p > 2 largely outperforms Perceptronlike algorithms, such as alma 2 .
Applying support vector machines to imbalanced datasets
 In Proceedings of the 15th European Conference on Machine Learning (ECML
, 2004
"... Abstract. Support Vector Machines (SVM) have been extensively studied and have shown remarkable success in many applications. However the success of SVM is very limited when it is applied to the problem of learning from imbalanced datasets in which negative instances heavily outnumber the positive i ..."
Abstract

Cited by 87 (2 self)
 Add to MetaCart
Abstract. Support Vector Machines (SVM) have been extensively studied and have shown remarkable success in many applications. However the success of SVM is very limited when it is applied to the problem of learning from imbalanced datasets in which negative instances heavily outnumber the positive instances (e.g. in gene profiling and detecting credit card fraud). This paper discusses the factors behind this failure and explains why the common strategy of undersampling the training data may not be the best choice for SVM. We then propose an algorithm for overcoming these problems which is based on a variant of the SMOTE algorithm by Chawla et al, combined with Veropoulos et al’s different error costs algorithm. We compare the performance of our algorithm against these two algorithms, along with undersampling and regular SVM and show that our algorithm outperforms all of them. 1
Automatic document metadata extraction using support vector machines
 In JCDL ’03: Proceedings of the 3rd ACM/IEEECS Joint Conference on Digital Libraries
, 2003
"... Automatic metadata generation provides scalability and usability for digital libraries and their collections. Machine learning methods offer robust and adaptable automatic metadata extraction. We describe a Support Vector Machine classificationbased method for metadata extraction from header part o ..."
Abstract

Cited by 85 (21 self)
 Add to MetaCart
Automatic metadata generation provides scalability and usability for digital libraries and their collections. Machine learning methods offer robust and adaptable automatic metadata extraction. We describe a Support Vector Machine classificationbased method for metadata extraction from header part of research papers and show that it outperforms other machine learning methods on the same task. The method first classifies each line of the header into one or more of 15 classes. An iterative convergence procedure is then used to improve the line classification by using the predicted class labels of its neighbor lines in the previous round. Further metadata extraction is done by seeking the best chunk boundaries of each line. We found that discovery and use of the structural patterns of the data and domain based word clustering can improve the metadata extraction performance. An appropriate feature normalization also greatly improves the classification performance. Our metadata extraction method was originally designed to improve the metadata extraction quality of the digital libraries Citeseer[17] and EbizSearch[24]. We believe it can be generalized to other digital libraries. 1 Introduction and related work Interoperability is crucial to the effective use of Digital
Support Vector Machines: Hype or Hallelujah?
 SIGKDD Explorations
, 2003
"... Support Vector Machines (SVMs) and related kernel methods have become increasingly popular tools for data mining tasks such as classification, regression, and novelty detection. The goal of this tutorial is to provide an intuitive explanation of SVMs from a geometric perspective. The classification ..."
Abstract

Cited by 81 (0 self)
 Add to MetaCart
Support Vector Machines (SVMs) and related kernel methods have become increasingly popular tools for data mining tasks such as classification, regression, and novelty detection. The goal of this tutorial is to provide an intuitive explanation of SVMs from a geometric perspective. The classification problem is used to investigate the basic concepts behind SVMs and to examine their strengths and weaknesses from a data mining perspective. While this overview is not comprehensive, it does provide resources for those interested in further exploring SVMs.