Results 11–20 of 45
Fast n-fold cross-validation for regularized least-squares
In: Proceedings of the Ninth Scandinavian Conference on Artificial Intelligence (SCAI), 2006
"... Kernelbased learning algorithms have recently become the stateoftheart machine learning methods of which the support vector machines are the most popular ones. Regularized leastsquares (RLS), another kernelbased learning algorithm that is also known as the leastsquares support vector machine, ..."
Abstract

Cited by 13 (12 self)
 Add to MetaCart
Kernel-based learning algorithms have recently become the state-of-the-art machine learning methods, of which the support vector machines are the most popular. Regularized least-squares (RLS), another kernel-based learning algorithm that is also known as the least-squares support vector machine, is shown to have a performance comparable to that of the support vector machines in several machine learning tasks. In small-scale problems, RLS has several computational advantages compared to the support vector machines. Firstly, it is possible to calculate the cross-validation (CV) performance of RLS on the training data without retraining in each CV round. We give a formal proof for this claim. Secondly, we can compute the RLS solution for several different values of the regularization parameter in parallel. Finally, several problems on the same data set can be solved in parallel, provided that the same kernel function is used with each problem. We consider a simple implementation of the RLS algorithm for small-scale machine learning problems that takes advantage of all the above properties. The implementation is done via the eigendecomposition of the kernel matrix. The proposed CV method for RLS is a generalization of the fast leave-one-out cross-validation (LOOCV) method for RLS, which is widely known in the literature. For some tasks, LOOCV gives a poor performance estimate for the learning machines because of the dependencies between the training data points. We demonstrate this by experimentally comparing the performance estimates given by LOOCV and CV in a ranking task of dependency parses generated from biomedical texts.
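The retraining-free CV idea above is easiest to see in its widely known leave-one-out special case, which this abstract generalizes. The following is a minimal sketch of that LOO identity, built on one eigendecomposition of the kernel matrix so the same factorization can be reused for any regularization value; the function name and structure are illustrative, not the authors' implementation.

```python
import numpy as np

def rls_loocv_residuals(K, y, lam):
    """Closed-form leave-one-out residuals for regularized least-squares.

    K   : (n, n) symmetric PSD kernel matrix on the training data
    y   : (n,) training labels
    lam : regularization parameter (> 0)
    """
    evals, Q = np.linalg.eigh(K)      # one eigendecomposition, reusable for any lam
    d = evals / (evals + lam)         # eigenvalues of the hat matrix H = K (K + lam I)^{-1}
    H = (Q * d) @ Q.T
    fitted = H @ y
    # Standard identity: LOO residual_i = (y_i - fitted_i) / (1 - H_ii),
    # so no model is ever retrained.
    return (y - fitted) / (1.0 - np.diag(H))
```

Trying several values of `lam` only changes the cheap rescaling `evals / (evals + lam)`, which is exactly the parallelism over regularization parameters the abstract mentions.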
A tutorial on ν-Support Vector Machines
In: Applied Stochastic Models in Business and Industry, 2005
"... We briefly describe the main ideas of statistical learning theory, support vector machines (SVMs), and kernel feature spaces. We place particular emphasis on a description of the socalled nSVM, including details of the algorithm and its implementation, theoretical results, and practical applicatio ..."
Abstract

Cited by 13 (0 self)
 Add to MetaCart
We briefly describe the main ideas of statistical learning theory, support vector machines (SVMs), and kernel feature spaces. We place particular emphasis on a description of the so-called ν-SVM, including details of the algorithm and its implementation, theoretical results, and practical applications.
Towards Perceptual Intelligence: Statistical Modeling of Human Individual and Interactive Behaviors
In: Prediction of Human Behavior, IEEE Intelligent Vehicles, 1995
"... This thesis presents a computational framework for the automatic recognition and prediction of different kinds of human behaviors from video cameras and other sensors, via perceptually intelligent systems that automatically sense and correctly classify human behaviors, by means of Machine Perception ..."
Abstract

Cited by 12 (5 self)
 Add to MetaCart
This thesis presents a computational framework for the automatic recognition and prediction of different kinds of human behaviors from video cameras and other sensors, via perceptually intelligent systems that automatically sense and correctly classify human behaviors by means of Machine Perception and Machine Learning techniques. In the thesis I develop the statistical machine learning algorithms (dynamic graphical models) necessary for detecting and recognizing individual and interactive behaviors. In the case of interactions, two Hidden Markov Models (HMMs) are coupled in a novel architecture called Coupled Hidden Markov Models (CHMMs) that explicitly captures the interactions between them. The algorithms for learning the parameters from data, as well as for doing inference with those models, are developed and described. Four systems that experimentally evaluate the proposed paradigm are presented: (1) LAFTER, an automatic face detection and tracking system with facial expression recognition; (2) a Tai Chi gesture recognition system; (3) a pedestrian surveillance system that recognizes typical human-to-human interactions; and (4) a SmartCar for driver maneuver recognition. These systems capture human behaviors of different natures and increasing complexity: first, isolated, single-user facial expressions; then, two-hand gestures and human-to-human interactions, ...
Support Vector Learning for Fuzzy Rule-Based Classification Systems
2003
"... To design a fuzzy rulebased classi cation system (fuzzy classi er) with good generalization abilityina high dimensional feature space has been an active research topic for a long time. As a powerful machine learning approach for pattern recognition problems, support vector machine (SVM) is known to ..."
Abstract

Cited by 11 (1 self)
 Add to MetaCart
To design a fuzzy rule-based classification system (fuzzy classifier) with good generalization ability in a high-dimensional feature space has been an active research topic for a long time. As a powerful machine learning approach for pattern recognition problems, the support vector machine (SVM) is known to have good generalization ability. More importantly, an SVM can work very well in a high (or even infinite) dimensional feature space. This paper investigates the connection between fuzzy classifiers and kernel machines, establishes a link between fuzzy rules and kernels, and proposes a learning algorithm for fuzzy classifiers. We first show that a fuzzy classifier implicitly defines a translation-invariant kernel under the assumption that all membership functions associated with the same input variable are generated from location transformations of a reference function. Fuzzy inference on the IF-part of a fuzzy rule can be viewed as evaluating the kernel function. The kernel function is then proven to be a Mercer kernel if the reference functions meet a certain spectral requirement. The corresponding fuzzy classifier is named positive definite fuzzy classifier (PDFC). A PDFC can be built from the given training samples based on a support vector learning approach, with the IF-part fuzzy rules given by the support vectors. Since the learning process minimizes an upper bound on the expected risk (expected prediction error) instead of the empirical risk (training error), the resulting PDFC usually has good generalization. Moreover, because of the sparsity properties of SVMs, the number of fuzzy rules is irrelevant to the dimension of the input space. In this sense, we avoid the "curse of dimensionality." Finally, PDFCs with different reference functions are constructed using the su...
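The rule-to-kernel construction described above can be sketched very compactly: if every membership function is a shifted copy of one 1-D reference function, the product over input dimensions in a rule's IF-part has the form of a translation-invariant kernel. The sketch below uses a Gaussian reference function as an illustrative assumption (which recovers the familiar RBF kernel, a known Mercer kernel); the paper's spectral condition for general reference functions is not checked here.

```python
import numpy as np

def pdfc_kernel(X, Z, ref=lambda u: np.exp(-u ** 2)):
    """Translation-invariant kernel induced by a 1-D reference function.

    Each rule's IF-part is a product of membership functions that are shifted
    copies of `ref`, so firing the rule at x against a prototype z looks like
    evaluating k(x, z) = prod_j ref(x_j - z_j).
    """
    diff = X[:, None, :] - Z[None, :, :]   # pairwise coordinate differences
    return np.prod(ref(diff), axis=2)
```

With the Gaussian reference the product of 1-D factors collapses to exp(-||x - z||^2), so the resulting kernel matrix is symmetric positive semidefinite and can be handed directly to any SVM solver, which is the bridge the abstract describes.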
A short introduction to learning with kernels
In: Advanced Lectures on Machine Learning, S. Mendelson (ed.), 2002
"... We briefly describe the main ideas of statistical learning theory, support vector ..."
Abstract

Cited by 10 (0 self)
 Add to MetaCart
We briefly describe the main ideas of statistical learning theory, support vector ...
Support Vector Methods in Learning and Feature Extraction
1998
"... The last years have witnessed an increasing interest in Support Vector (SV) machines, which use Mercer kernels for efficiently performing computations in highdimensional spaces. In pattern recognition, the SV algorithm constructs nonlinear decision functions by training a classifier to perform a li ..."
Abstract

Cited by 10 (1 self)
 Add to MetaCart
Recent years have witnessed an increasing interest in Support Vector (SV) machines, which use Mercer kernels for efficiently performing computations in high-dimensional spaces. In pattern recognition, the SV algorithm constructs nonlinear decision functions by training a classifier to perform a linear separation in some high-dimensional space which is nonlinearly related to the input space. Recently, we have developed a technique for Nonlinear Principal Component Analysis (Kernel PCA) based on the same types of kernels. This way, we can, for instance, efficiently extract polynomial features of arbitrary order by computing projections onto principal components in the space of all products of n pixels of images. We explain the idea of Mercer kernels and associated feature spaces, and describe connections to the theory of reproducing kernels and to regularization theory, followed by an overview of the above algorithms employing these kernels.
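The Kernel PCA step summarized above reduces to centering the kernel matrix in feature space and taking its top eigenvectors; the feature space itself (e.g. all products of n pixels) is never materialized. The following is a minimal sketch of that computation under standard conventions, not the paper's own code.

```python
import numpy as np

def kernel_pca(K, n_components):
    """Project the training points onto the top principal axes in feature space.

    K : (n, n) kernel matrix; only inner products are needed, so arbitrarily
        high-dimensional feature spaces stay implicit.
    """
    n = K.shape[0]
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one   # centering in feature space
    evals, evecs = np.linalg.eigh(Kc)
    evals, evecs = evals[::-1], evecs[:, ::-1]   # largest eigenvalues first
    # Scale eigenvectors so each principal axis has unit norm in feature space.
    alphas = evecs[:, :n_components] / np.sqrt(np.maximum(evals[:n_components], 1e-12))
    return Kc @ alphas                            # projections of training points
```

Swapping in a polynomial kernel `(X @ X.T + 1.0) ** 2` gives exactly the "projections in the space of products of pixels" reading of the abstract, at the cost of one n-by-n eigendecomposition.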
Downbeat prediction by listening and learning
In: Proc. of IEEE WASPAA, 2005
"... The perception of downbeat is not yet very well understood, although it generally “feels ” natural and intuitive to music listeners. We propose an unbiased and predictive modeling strategy of downbeat by combining psychoacoustic models of music listening with timelag embedded segment learning. The ..."
Abstract

Cited by 7 (0 self)
 Add to MetaCart
The perception of the downbeat is not yet very well understood, although it generally “feels” natural and intuitive to music listeners. We propose an unbiased and predictive modeling strategy for the downbeat by combining psychoacoustic models of music listening with time-lag embedded segment learning. The model is causal, tempo independent, and could inform a bottom-up beat tracker in a top-down manner. Results with particularly complex music examples are presented.
Model Selection: Beyond the Bayesian/Frequentist Divide
"... The principle of parsimony also known as “Ockham’s razor ” has inspired many theories of model selection. Yet such theories, all making arguments in favor of parsimony, are based on very different premises and have developed distinct methodologies to derive algorithms. We have organized challenges a ..."
Abstract

Cited by 6 (0 self)
 Add to MetaCart
The principle of parsimony, also known as “Ockham’s razor”, has inspired many theories of model selection. Yet such theories, all making arguments in favor of parsimony, are based on very different premises and have developed distinct methodologies to derive algorithms. We have organized challenges and edited a special issue of JMLR and several conference proceedings around the theme of model selection. In this editorial, we revisit the problem of avoiding overfitting in light of the latest results. We note the remarkable convergence, in some approaches, of theories as different as Bayesian theory, Minimum Description Length, the bias/variance tradeoff, Structural Risk Minimization, and regularization. We also present new and interesting examples of the complementarity of theories leading to hybrid algorithms that are neither frequentist nor Bayesian, or perhaps both frequentist and Bayesian!
Optimizing fusion architectures for limited training data
In: Detection and Remediation Technologies for Mines and Minelike Targets, Proc. SPIE, 2000
"... A method is described to improve the performance of sensor fusion algorithms. Data sets available for training fusion algorithms are often smaller than desired, since the sensor suite used for data acquisition is always limited by the slowest, least reliable sensor. In addition, the fusion process e ..."
Abstract

Cited by 3 (1 self)
 Add to MetaCart
A method is described to improve the performance of sensor fusion algorithms. Data sets available for training fusion algorithms are often smaller than desired, since the sensor suite used for data acquisition is always limited by the slowest, least reliable sensor. In addition, the fusion process expands the dimension of the data, which increases the requirement for training data. By using structural risk minimization, a technique of statistical learning theory, a classifier of optimal complexity can be obtained, leading to improved performance. A technique for jointly optimizing the local decision thresholds is also described for hard-decision fusion. The procedure is demonstrated for EMI, GPR, and MWIR data acquired at the US Army mine lanes at Fort A.P. Hill, VA, Site 71A. It is shown that fusion of features, soft decisions, and hard decisions each yields improved performance with respect to the individual sensors. Fusion decreases the overall error rate (false alarms and missed detections) from roughly 20% for the best single sensor to roughly 10% for the best fused result.
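The joint optimization of local decision thresholds mentioned above can be illustrated with a small sketch: instead of tuning each sensor's threshold in isolation, all thresholds are searched together against the fused error rate. The OR combining rule, the function names, and the grid search are illustrative assumptions, not the paper's procedure.

```python
import numpy as np
from itertools import product

def joint_thresholds(scores, labels, grids, fuse=np.logical_or.reduce):
    """Grid-search local decision thresholds jointly rather than per sensor.

    scores : (n_sensors, n_samples) confidence scores, higher = more target-like
    labels : (n_samples,) ground truth in {0, 1}
    grids  : list of candidate-threshold arrays, one per sensor
    fuse   : hard-decision combining rule (here: declare a target if ANY
             sensor fires, i.e. an OR rule -- an illustrative choice)
    """
    best_err, best_t = np.inf, None
    for t in product(*grids):
        hard = [s >= ti for s, ti in zip(scores, t)]   # per-sensor hard decisions
        fused = fuse(hard)
        err = np.mean(fused != labels.astype(bool))    # false alarms + misses combined
        if err < best_err:
            best_err, best_t = err, t
    return best_t, best_err
```

Because the thresholds are scored through the fusion rule, a sensor may be given a threshold that would be suboptimal on its own but improves the combined decision, which is the point of optimizing them jointly.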