Results 1–10 of 20
An introduction to kernel-based learning algorithms
IEEE Transactions on Neural Networks, 2001
Cited by 377 (48 self)
Abstract: This paper provides an introduction to support vector machines (SVMs), kernel Fisher discriminant analysis, and ...
Query Learning with Large Margin Classifiers
2000
Cited by 123 (1 self)
Abstract: The active selection of instances can significantly improve the generalisation performance of a learning machine. Large margin classifiers such as Support Vector Machines classify data using the most informative instances (the support vectors). This makes them natural candidates for instance selection strategies. In this paper we propose an algorithm for the training of Support Vector Machines using instance selection. We give a theoretical justification for the strategy and experimental results on real and artificial data demonstrating its effectiveness. The technique is most efficient when the dataset can be learnt using few support vectors.
1. Introduction: The labour-intensive task of labelling data is a serious bottleneck for many data mining tasks. Often cost or time constraints mean that only a fraction of the available instances can be labelled. For this reason there has been increasing interest in the problem of handling partially labelled datasets. One approach ...
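The margin-based selection idea in this abstract can be illustrated in a few lines (a hypothetical sketch, not the paper's algorithm; `LinearSVM` below is a stand-in for any trained large-margin classifier with fixed weights):

```python
import numpy as np

class LinearSVM:
    """Stand-in for a trained linear large-margin classifier (w, b assumed given)."""
    def __init__(self, w, b):
        self.w = np.asarray(w, dtype=float)
        self.b = float(b)

    def decision_function(self, X):
        """Signed distance-like score; zero on the decision boundary."""
        return np.asarray(X, dtype=float) @ self.w + self.b

def margin_query(clf, pool, n_queries=1):
    """Return indices of the unlabelled instances closest to the margin,
    i.e. the likeliest future support vectors."""
    scores = np.abs(clf.decision_function(pool))
    return np.argsort(scores)[:n_queries]

clf = LinearSVM(w=[1.0, 0.0], b=0.0)
pool = np.array([[3.0, 1.0], [0.1, 2.0], [-2.0, 0.5]])
print(margin_query(clf, pool, n_queries=1))  # index of the most ambiguous point
```

Labelling only the queried points and retraining closes the active-learning loop.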
Latent Semantic Kernels
"... Kernel methods like Support Vector Machines have successfully been used for text categorization. A standard choice of kernel function has been the inner product between the vectorspace representationoftwo documents, in analogy with classical information retrieval (IR) approaches. Latent Semantic In ..."
Abstract

Cited by 88 (7 self)
 Add to MetaCart
Kernel methods like Support Vector Machines have successfully been used for text categorization. A standard choice of kernel function has been the inner product between the vectorspace representationoftwo documents, in analogy with classical information retrieval (IR) approaches. Latent Semantic Indexing (LSI) has been successfully used for IR purposes as a technique for capturing semantic relations between terms and inserting them into the similarity measure between two documents. One of its main drawbacks, in IR, is its computational cost. In this paper we describe how the LSI approach can be implementedinakernelde ned feature space. We provide experimental results demonstrating that the approach can significantly improve performance, and that it does not impair it.
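A rough sketch of the underlying idea (generic truncated-SVD LSI folded into a linear kernel, not the paper's exact kernel-space construction; the term-by-document matrix `D` and rank `k` are assumed notation):

```python
import numpy as np

def latent_semantic_kernel(D, k):
    """Document-document kernel in an LSI-style latent space.

    D: term-by-document matrix (rows = terms, columns = documents).
    k: number of latent dimensions kept from the truncated SVD.
    Returns K = D^T U_k U_k^T D, i.e. inner products between documents
    projected onto the top-k left singular directions.
    """
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    Uk = U[:, :k]
    Z = Uk.T @ D      # documents expressed in the k-dim latent space
    return Z.T @ Z    # Gram matrix of the projected documents
```

With k equal to the full rank the kernel reduces to the plain inner product D^T D, recovering the standard vector-space similarity.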
A greedy EM algorithm for Gaussian mixture learning
Neural Processing Letters, 2000
Cited by 40 (9 self)
Abstract: Learning a Gaussian mixture with a local algorithm like EM can be difficult because (i) the true number of mixing components is usually unknown, (ii) there is no generally accepted method for parameter initialization, and (iii) the algorithm can get stuck in one of the many local maxima of the likelihood function. In this paper we propose a greedy algorithm for learning a Gaussian mixture which tries to overcome these limitations. In particular, starting with a single component and adding components sequentially until a maximum number k, the algorithm is capable of achieving solutions superior to EM with k components in terms of the likelihood of a test set. The algorithm is based on recent theoretical results on incremental mixture density estimation, and uses a combination of global and local search each time a new component is added to the mixture.
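A minimal 1-D illustration of the growing scheme (an illustrative sketch under simplifying assumptions: the worst-explained-point seeding below is a crude stand-in for the paper's combined global/local search, and variances are floored for numerical safety):

```python
import numpy as np

def log_mix(x, w, mu, var):
    """Pointwise log-density of a 1-D Gaussian mixture."""
    comp = -0.5 * ((x[:, None] - mu) ** 2 / var + np.log(2 * np.pi * var))
    return np.logaddexp.reduce(np.log(w) + comp, axis=1)

def em_steps(x, w, mu, var, n_iter=50):
    """Plain EM updates on weights, means, and variances."""
    for _ in range(n_iter):
        comp = -0.5 * ((x[:, None] - mu) ** 2 / var + np.log(2 * np.pi * var))
        log_r = np.log(w) + comp
        log_r -= np.logaddexp.reduce(log_r, axis=1, keepdims=True)
        r = np.exp(log_r)                      # responsibilities
        nk = r.sum(axis=0) + 1e-12
        w = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
    return w, mu, var

def greedy_gmm(x, k_max):
    """Start with one component; insert components one at a time,
    seeding each new component at the currently worst-explained point."""
    w = np.array([1.0]); mu = np.array([x.mean()]); var = np.array([x.var() + 1e-6])
    w, mu, var = em_steps(x, w, mu, var)
    while len(w) < k_max:
        seed = x[np.argmin(log_mix(x, w, mu, var))]  # global-search stand-in
        alpha = 1.0 / (len(w) + 1)
        w = np.append(w * (1 - alpha), alpha)
        mu = np.append(mu, seed)
        var = np.append(var, x.var())
        w, mu, var = em_steps(x, w, mu, var)
    return w, mu, var
```

On well-separated data the sequentially grown mixture typically lands near the true component locations without any multi-component initialization.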
Projecting to a slow manifold: Singularly perturbed systems and legacy codes, Part 2 (working title), in preparation
"... Abstract. We consider dynamical systems possessing an attracting, invariant “slow manifold ” that can be parameterized by a few observable variables. We present a procedure that, given a process for integrating the system step by step and a set of values of the observables, finds the values of the r ..."
Abstract

Cited by 35 (16 self)
 Add to MetaCart
Abstract. We consider dynamical systems possessing an attracting, invariant “slow manifold ” that can be parameterized by a few observable variables. We present a procedure that, given a process for integrating the system step by step and a set of values of the observables, finds the values of the remaining system variables such that the state is close to the slow manifold to some desired accuracy. It should be noted that this is not equivalent to “integrating down to the manifold ” since the latter process may significantly change the values of the observables. We consider problems whose solution has a singular perturbation expansion, although we do not know what it is nor have any way to compute it (because the system is not necessarily expressed in a singular perturbation form). We show in this paper that, under some conditions, computing the values of the remaining variables so that their (m + 1)st time derivatives are zero provides an estimate of the unknown variables that is an mthorder approximation to a point on the slow manifold in a sense to be defined. We then show how this criterion can be applied approximately when the system is defined by a legacy code rather than directly through closed form equations. This procedure can be valuable when one wishes to start a simulation of the detailed model on the slow manifold with particular values of observable variables characterizing the slow manifold.
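The zero-derivative criterion in the abstract can be stated compactly (notation assumed here, not taken from the paper: u the observable variables, v the remaining variables, ε the singular perturbation parameter):

```latex
\left.\frac{d^{\,m+1} v}{dt^{\,m+1}}\right|_{u\ \text{fixed}} = 0
\quad\Longrightarrow\quad
v = v_{\mathrm{slow}}(u) + O(\varepsilon^{m})
```

That is, solving for the values of v that zero their (m+1)st time derivatives at the given observables u yields an mth-order approximation to the slow manifold, in the sense the paper makes precise.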
Greedy Mixture Learning for Multiple Motif Discovery in Biological Sequences
2003
Cited by 16 (2 self)
Abstract: Motivation: This paper studies the problem of discovering subsequences, known as motifs, that are common to a given collection of related biosequences, by proposing a greedy algorithm for learning a mixture-of-motifs model through likelihood maximization. The approach sequentially adds a new motif to a mixture model by performing a combined scheme of global and local search for appropriately initializing its parameters. In addition, a hierarchical partitioning scheme based on kd-trees is presented for partitioning the input dataset in order to speed up the global search procedure. The proposed method compares favorably with the well-known MEME approach and successfully addresses several drawbacks of MEME.
Kernel Methods for Deep Learning
"... We introduce a new family of positivedefinite kernel functions that mimic the computation in large, multilayer neural nets. These kernel functions can be used in shallow architectures, such as support vector machines (SVMs), or in deep kernelbased architectures that we call multilayer kernel machi ..."
Abstract

Cited by 15 (3 self)
 Add to MetaCart
We introduce a new family of positivedefinite kernel functions that mimic the computation in large, multilayer neural nets. These kernel functions can be used in shallow architectures, such as support vector machines (SVMs), or in deep kernelbased architectures that we call multilayer kernel machines (MKMs). We evaluate SVMs and MKMs with these kernel functions on problems designed to illustrate the advantages of deep architectures. On several problems, we obtain better results than previous, leading benchmarks from both SVMs with Gaussian kernels as well as deep belief nets. 1
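From memory, one member of the arc-cosine kernel family used in this line of work has a closed form in the angle between inputs; the sketch below is recalled rather than quoted, so verify it against the paper before relying on the exact formula:

```python
import numpy as np

def arccos_kernel_1(X, Y):
    """Order-1 arc-cosine kernel (recalled form):
    k(x, y) = (1/pi) * ||x|| * ||y|| * (sin t + (pi - t) * cos t),
    with t the angle between x and y. Intended to mimic one hidden
    layer of rectified linear units."""
    nx = np.linalg.norm(X, axis=1)
    ny = np.linalg.norm(Y, axis=1)
    cos_t = np.clip((X @ Y.T) / np.outer(nx, ny), -1.0, 1.0)
    t = np.arccos(cos_t)
    return np.outer(nx, ny) * (np.sin(t) + (np.pi - t) * np.cos(t)) / np.pi
```

Deeper MKM-style kernels are obtained by composing this angular map with itself layer by layer, replacing the input inner products with kernel values from the layer below. Note that k(x, x) reduces to ||x||², since t = 0 on the diagonal.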
Eigenvoice speaker adaptation via composite kernel PCA
In Advances in Neural Information Processing Systems 16, 2004
Cited by 9 (4 self)
Abstract: Eigenvoice speaker adaptation has been shown to be effective when only a small amount of adaptation data is available. At the heart of the method is principal component analysis (PCA), employed to find the most important eigenvoices. In this paper, we postulate that nonlinear PCA, in particular kernel PCA, may be even more effective. One major challenge is to map the feature-space eigenvoices back to the observation space so that the state observation likelihoods can be computed during the estimation of eigenvoice weights and subsequent decoding. Our solution is to compute kernel PCA using composite kernels, and we call our new method kernel eigenvoice speaker adaptation. On the TIDIGITS corpus, we found that, compared with a speaker-independent model, our kernel eigenvoice adaptation method can reduce the word error rate by 28–33%, while the standard eigenvoice approach can only match the performance of the speaker-independent model.
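For reference, plain kernel PCA from a precomputed Gram matrix looks like this (a generic textbook recipe; the paper's composite-kernel construction and the map back to observation space are not reproduced here):

```python
import numpy as np

def kernel_pca(K, n_components):
    """Kernel PCA projections of the training points, given Gram matrix K."""
    n = K.shape[0]
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one     # double-center in feature space
    vals, vecs = np.linalg.eigh(Kc)
    idx = np.argsort(vals)[::-1][:n_components]    # keep top eigenpairs
    vals, vecs = vals[idx], vecs[:, idx]
    alphas = vecs / np.sqrt(np.maximum(vals, 1e-12))  # unit-norm feature-space axes
    return Kc @ alphas                              # projections of training points
```

With a linear kernel K = X X^T this reduces to ordinary PCA of X; the nonlinearity enters only through the choice of kernel.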
Subspace Information Criterion for Non-Quadratic Regularizers: Model Selection for Sparse Regressors
IEEE Transactions on Neural Networks, 2002
Cited by 9 (7 self)
Abstract: Non-quadratic regularizers, in particular the ℓ1-norm regularizer, can yield sparse solutions that generalize well. In this work we propose the Generalized Subspace Information Criterion (GSIC), which allows one to predict the generalization error for this useful family of regularizers. We show that under some technical assumptions GSIC is an asymptotically unbiased estimator of the generalization error. GSIC is demonstrated to perform well in experiments with the ℓ1-norm regularizer, as we compare with the Network Information Criterion and cross-validation in relatively large sample cases. However, in the small-sample case, GSIC tends to fail to capture the optimal model due to its large variance. Therefore, a biased version of GSIC is also introduced, which achieves reliable model selection in the relevant and challenging scenario of high-dimensional data and few samples.
Compactly supported radial basis function kernels. Available at www4.stat.ncsu.edu/~hzhang/research.html
2004
Cited by 3 (0 self)
Abstract: The use of kernels is a key factor in the success of many classification algorithms, by allowing nonlinear decision surfaces. Radial basis function (RBF) kernels are commonly used but are often associated with dense Gram matrices. We consider a mathematical operator that sparsifies any RBF kernel systematically, yielding a kernel with compact support and a sparse Gram matrix. Having many zero elements in Gram matrices can greatly reduce computer storage requirements and the number of floating-point operations needed in computation. This paper develops a unified framework to study the efficiency gain and information loss due to the sparsifying operation. In particular, we propose two quantitative measures, similarity and sparsity, and study their trade-off, which is used to adaptively tune the thresholding parameter in the sparsifying operator. We then apply compactly supported RBF kernels to support vector machines (SVMs), least squares SVMs, and kernel principal component analysis. Simulations demonstrate that properly tuned compactly supported kernels give favorable performance while enjoying efficient algorithms for computation.
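One common sparsifying operator of this kind multiplies an RBF by a compactly supported taper that vanishes beyond a cutoff radius (the particular taper and the parameters `theta` and `p` below are illustrative assumptions, not the paper's specific operator):

```python
import numpy as np

def sparse_rbf_gram(X, gamma=1.0, theta=1.5, p=2):
    """Gaussian RBF Gram matrix tapered to compact support:
    k(x, y) = exp(-gamma * r^2) * max(0, 1 - r/theta)^p,
    which is exactly zero whenever r = ||x - y|| >= theta,
    so distant pairs contribute structural zeros to the Gram matrix."""
    diff = X[:, None, :] - X[None, :, :]
    r = np.linalg.norm(diff, axis=2)
    taper = np.maximum(0.0, 1.0 - r / theta) ** p
    return np.exp(-gamma * r ** 2) * taper
```

The cutoff `theta` plays the role of the thresholding parameter discussed above: shrinking it increases sparsity (efficiency) at the cost of similarity to the original dense kernel.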