Results 1-8 of 8
The WEKA Data Mining Software: An Update
Cited by 605 (11 self)
More than twelve years have elapsed since the first public release of WEKA. In that time, the software has been rewritten entirely from scratch, evolved substantially and now accompanies a text on data mining [35]. These days, WEKA enjoys widespread acceptance in both academia and business, has an active community, and has been downloaded more than 1.4 million times since being placed on SourceForge in April 2000. This paper provides an introduction to the WEKA workbench, reviews the history of the project, and, in light of the recent 3.6 stable release, briefly discusses what has been added since the last stable version (Weka 3.4) released in 2003.
Using string kernels to identify famous performers from their playing style
In Proceedings of the 15th European Conference on Machine Learning (ECML 2004), 2004
Cited by 31 (9 self)
In this chapter we show a novel application of string kernels: recognising famous pianists from their style of playing. The characteristics of performers playing the same piece are obtained from changes in beat-level tempo and beat-level loudness, which over the course of the piece form a performance worm. From such worms, general performance alphabets can be derived, and pianists' performances can then be represented as strings. We show that when using the string kernel on this data, both Kernel Partial Least Squares and Support Vector Machines outperform the current best results. Furthermore, we suggest a new method of obtaining feature directions from the Kernel Partial Least Squares algorithm and show that this can deliver better performance than methods previously used in the literature when used in conjunction with a Support Vector Machine.
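To make the idea of comparing performances as strings concrete, here is a minimal sketch of one common string kernel, the p-spectrum kernel, which counts shared substrings of length p. The abstract does not say which string kernel the authors used, so the choice of kernel, the function names, and the toy alphabet below are all assumptions for illustration only.

```python
from collections import Counter


def spectrum_features(s: str, p: int) -> Counter:
    """Count every contiguous substring of length p in s."""
    return Counter(s[i:i + p] for i in range(len(s) - p + 1))


def spectrum_kernel(s: str, t: str, p: int = 2) -> int:
    """Inner product of the p-gram count vectors of s and t."""
    fs, ft = spectrum_features(s, p), spectrum_features(t, p)
    # A Counter returns 0 for missing keys, so unmatched p-grams add nothing.
    return sum(count * ft[gram] for gram, count in fs.items())


# Hypothetical performance strings over a made-up alphabet {a, b, c};
# real performance alphabets would be derived from performance worms.
perf_1 = "abcab"
perf_2 = "cabca"
k = spectrum_kernel(perf_1, perf_2, p=2)
```

A kernel matrix built from such pairwise values could then be passed to any kernel method, such as an SVM or KPLS.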
Gabor-Based Kernel Partial-Least-Squares Discrimination Features for Face Recognition
2007
Cited by 5 (0 self)
The paper presents a novel method for the extraction of facial features based on the Gabor-wavelet representation of face images and the kernel partial-least-squares discrimination (KPLSD) algorithm. The proposed feature-extraction method, called the Gabor-based kernel partial-least-squares discrimination (GKPLSD), is performed in two consecutive steps. In the first step a set of forty Gabor wavelets is used to extract discriminative and robust facial features, while in the second step the kernel partial-least-squares discrimination technique is used to reduce the dimensionality of the Gabor feature vector and to further enhance its discriminatory power. For optimal performance, the KPLSD-based transformation is implemented using the recently proposed fractional-power-polynomial models. The experimental results based on the XM2VTS and ORL databases show that the GKPLSD approach outperforms feature-extraction methods such as principal component analysis (PCA), linear discriminant analysis (LDA), kernel principal component analysis (KPCA) or generalized discriminant analysis (GDA) as well as combinations of these methods with Gabor representations of the face images. Furthermore, as the KPLSD algorithm is derived from the kernel partial-least-squares regression (KPLSR) model, it does not suffer from the small-sample-size problem, which is regularly encountered in the field of face recognition.
Random Forests Feature Selection with KPLS: Detecting Ischemia from Magnetocardiograms
2006
Cited by 1 (1 self)
Random Forests were introduced by Breiman for feature (variable) selection and improved predictions for decision tree models. The resulting model is often superior to AdaBoost and bagging approaches. In this paper the random forests approach is extended for variable selection with other learning models, in this case Partial Least Squares (PLS) and Kernel Partial Least Squares (KPLS) to estimate the importance of variables. This variable selection method is demonstrated on two benchmark datasets (Boston Housing and South African heart disease data). Finally, this methodology is applied to magnetocardiogram data for the detection of ischemic heart disease.
Sigma Tuning of Gaussian Kernels: Detection of Ischemia from
This chapter introduces a novel Levenberg-Marquardt-like second-order algorithm for tuning the Parzen window σ in a Radial Basis Function (Gaussian) kernel. In this case each attribute has its own sigma parameter associated with it. The values of the optimized σ are then used as a gauge for variable selection. In this study a Kernel Partial Least Squares (KPLS) model is applied to several benchmark data sets in order to estimate the effectiveness of the second-order sigma tuning procedure for an RBF kernel. The variable subset selection method based on these sigma values is then compared with different feature selection procedures such as random forests and sensitivity analysis. The sigma-tuned RBF kernel model outperforms KPLS and SVM models with a single sigma value. KPLS models also compare favorably with Least Squares Support Vector Machines (LS-SVM), epsilon-insensitive Support Vector Regression and traditional PLS. The sigma tuning and variable selection procedure introduced in this paper is applied to industrial magnetocardiogram data for the detection of ischemic heart disease from measurement of the magnetic field around the heart.
BACKGROUND OF SIGMA TUNING
This chapter introduces a novel tuning mechanism for Gaussian or Radial Basis Function (RBF) kernels where each attribute (or feature) is characterized by its own Parzen window sigma. The kernel trick is frequently used in machine learning to transform the input domain into a feature domain where linear methods are then used to find an optimal solution to a regression or classification problem. Support Vector Machines (SVM), Kernel Principal Component Regression (KPCR), Kernel Ridge Regression (KRR), and Kernel Partial Least Squares (KPLS) are examples of techniques that apply kernels for machine learning and data mining. There are many different possible kernels, but the RBF (Gaussian) kernel is one of the most popular ones. Equation (1) represents a single element in the RBF kernel,
$$K_{ij} = \exp\left(-\frac{\lVert x_i - x_j \rVert^2}{2\sigma^2}\right) \tag{1}$$
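As a rough illustration of an RBF kernel in which each attribute has its own sigma, the sketch below divides every attribute by its sigma before taking squared distances. The exact parameterization used in the chapter is not shown in this excerpt, so the anisotropic form exp(-Σ_m (x_im − z_jm)² / (2σ_m²)) and the function name are assumptions, and the second-order tuning procedure itself is not implemented here.

```python
import numpy as np


def rbf_kernel_per_attribute(X, Z, sigma):
    """RBF kernel matrix with one Parzen-window sigma per attribute.

    Assumed form: K[i, j] = exp(-sum_m (X[i, m] - Z[j, m])**2 / (2 * sigma[m]**2)).
    """
    X = np.asarray(X, dtype=float)
    Z = np.asarray(Z, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    # Scale each attribute by its own sigma, then use plain squared distances.
    Xs = X / sigma
    Zs = Z / sigma
    diff = Xs[:, None, :] - Zs[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    return np.exp(-0.5 * sq_dist)


# Toy data: a very large sigma on the second attribute makes it nearly irrelevant.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
K = rbf_kernel_per_attribute(X, X, sigma=[1.0, 1e6])
```

Under this assumed form, an attribute with a very large sigma barely changes the kernel values, which is one plausible reading of how tuned sigmas can serve as a gauge for variable selection.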
Land et al. BMC Systems Biology 2011, 5(Suppl 3):S13
Kernelized partial least squares for feature reduction and classification of gene microarray data
Combining the Matching Loss with Partial Least Squares for Regression Problems
Assume you are given examples of the form $(x, a)$, where the instances $x$ lie in $\mathbb{R}^d$ and the labels $a$ are real. The goal is to find a linear weight vector $w \in \mathbb{R}^d$ for estimating the labels $a$ with the linear activation $w \cdot x$. The standard way to do this type of linear regression is to find a $w$ that minimizes the square loss $(w \cdot x - a)^2$ summed over the examples. We explore the use of alternate asymmetric losses so that we can achieve high precision in some label ranges versus others, and punish overprediction differently than underprediction. Our running example is glucose estimation from skin spectroscopy data where the medical requirements are better modeled by such an asymmetric loss as specified by the Clarke Error Grid (Clarke, 2005). We define these losses using a known notion of matching loss (Azoury & Warmuth, 2000; Kivinen & Warmuth, 2001) that defines losses as areas underneath a transfer function. By shifting and scaling the sigmoid function we essentially use a reformulated version of the logistic loss to train a linear neuron. We apply our losses to predicting the glucose value in skin from Raman spectroscopy measurements and show that our predictions better avoid certain critical regions of the Clarke Error Grid than the square loss. In many spectroscopy applications, the features composing each instance $x \in \mathbb{R}^d$ (i.e. d − features) could
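The notion of a matching loss as an area under a transfer function can be sketched in a few lines. The example below assumes a plain, unshifted and unscaled sigmoid as the transfer function h and treats both arguments as activations, so the loss is the integral of h(z) − h(a) from a to â. The paper's shifted and scaled sigmoid is not reproduced, and all function names are illustrative.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softplus(z: float) -> float:
    """Stable log(1 + exp(z)), an antiderivative of the sigmoid."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def matching_loss(a_hat: float, a: float) -> float:
    """Integral of sigmoid(z) - sigmoid(a) for z running from a to a_hat."""
    return softplus(a_hat) - softplus(a) - (a_hat - a) * sigmoid(a)


# Equal-sized errors on either side of the target a = 2.0 are penalized differently.
over = matching_loss(3.0, 2.0)   # prediction above the target
under = matching_loss(1.0, 2.0)  # prediction below the target
```

With this particular transfer function the two losses differ for the same absolute error, which is the kind of asymmetry the abstract describes. Which direction is penalized more depends on where the target sits on the sigmoid.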
MKPLS: Manifold Kernel Partial Least Squares for Lipreading and Speaker Identification
2013 IEEE Conference on Computer Vision and Pattern Recognition
Visual speech recognition is a challenging problem, due to confusion between visual speech features. The speaker identification problem is usually coupled with speech recognition. Moreover, speaker identification is important to several applications, such as automatic access control, biometrics, authentication, and personal privacy issues. In this paper, we propose a novel approach for lipreading and speaker identification. We propose a new approach for manifold parameterization in a low-dimensional latent space, where each manifold is represented as a point in that space. We initially parameterize each instance manifold using a nonlinear mapping from a unified manifold representation. We then factorize the parameter space using Kernel Partial Least Squares (KPLS) to achieve a low-dimensional manifold latent space. We use two-way projections to achieve two manifold latent spaces, one for the speech content and one for the speaker. We apply our approach on two public databases: AVLetters and OuluVS. We show the results for three different settings of lipreading: speaker independent, speaker dependent, and speaker semi-dependent. Our approach outperforms the baseline by at least 15% in the speaker semi-dependent setting, and is competitive in the other two settings.