An introduction to kernel-based learning algorithms
IEEE Transactions on Neural Networks, 2001
Cited by 589 (54 self)
This paper provides an introduction to support vector machines (SVMs), kernel Fisher discriminant analysis, and ...
Query Learning with Large Margin Classifiers
2000
Cited by 156 (1 self)
The active selection of instances can significantly improve the generalisation performance of a learning machine. Large margin classifiers such as Support Vector Machines classify data using the most informative instances (the support vectors). This makes them natural candidates for instance selection strategies. In this paper we propose an algorithm for the training of Support Vector Machines using instance selection. We give a theoretical justification for the strategy and experimental results on real and artificial data demonstrating its effectiveness. The technique is most efficient when the dataset can be learnt using few support vectors. 1. Introduction. The labour-intensive task of labelling data is a serious bottleneck for many data mining tasks. Often cost or time constraints mean that only a fraction of the available instances can be labelled. For this reason there has been increasing interest in the problem of handling partially labelled datasets. One approach ...
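
The abstract does not spell out the selection rule, so the following is only a minimal sketch under a stated assumption: the learner queries the unlabelled instance closest to the current decision boundary. It uses a hypothetical 1-D pool, where the maximum-margin boundary is simply the midpoint between the nearest oppositely labelled points (the two support vectors). The names `max_margin_threshold`, `active_learn`, and `oracle` are illustrative and do not come from the paper.

```python
def max_margin_threshold(labelled):
    """1-D hard-margin classifier: midpoint between the two support vectors."""
    lo = max(x for x, y in labelled.items() if y == -1)  # largest negative point
    hi = min(x for x, y in labelled.items() if y == +1)  # smallest positive point
    return lo, hi, (lo + hi) / 2.0


def active_learn(pool, oracle):
    """Request labels only for the pool points nearest the current boundary."""
    # Assumption: the two pool extremes carry opposite labels (-1 at the
    # minimum, +1 at the maximum) and are labelled up front.
    labelled = {min(pool): oracle(min(pool)), max(pool): oracle(max(pool))}
    queries = 0
    while True:
        lo, hi, boundary = max_margin_threshold(labelled)
        # Only unlabelled points strictly inside the margin can move the boundary.
        inside = [x for x in pool if lo < x < hi and x not in labelled]
        if not inside:
            return boundary, queries
        # Assumed selection rule: query the point closest to the boundary.
        x = min(inside, key=lambda p: abs(p - boundary))
        labelled[x] = oracle(x)
        queries += 1


if __name__ == "__main__":
    pool = list(range(101))
    oracle = lambda x: 1 if x > 37.5 else -1  # hypothetical labelling oracle
    boundary, queries = active_learn(pool, oracle)
    print(boundary, queries)
```

On this toy pool the loop stops after labelling only a small fraction of the 101 points, which is in the spirit of the abstract's remark that the technique is most efficient when few support vectors suffice.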
Latent Semantic Kernels
Cited by 112 (8 self)
Kernel methods like Support Vector Machines have successfully been used for text categorization. A standard choice of kernel function has been the inner product between the vector-space representation of two documents, in analogy with classical information retrieval (IR) approaches. Latent Semantic Indexing (LSI) has been successfully used for IR purposes as a technique for capturing semantic relations between terms and inserting them into the similarity measure between two documents. One of its main drawbacks, in IR, is its computational cost. In this paper we describe how the LSI approach can be implemented in a kernel-defined feature space. We provide experimental results demonstrating that the approach can significantly improve performance, and that it does not impair it.
A greedy EM algorithm for Gaussian mixture learning
Neural Processing Letters, 2000
Cited by 65 (13 self)
Learning a Gaussian mixture with a local algorithm like EM can be difficult because (i) the true number of mixing components is usually unknown, (ii) there is no generally accepted method for parameter initialization, and (iii) the algorithm can get stuck in one of the many local maxima of the likelihood function. In this paper we propose a greedy algorithm for learning a Gaussian mixture which tries to overcome these limitations. In particular, starting with a single component and adding components sequentially until a maximum number $k$, the algorithm is capable of achieving solutions superior to EM with $k$ components in terms of the likelihood of a test set. The algorithm is based on recent theoretical results on incremental mixture density estimation, and uses a combination of global and local search each time a new component is added to the mixture.
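
The incremental scheme described in this abstract can be illustrated with a minimal 1-D sketch. It is not the authors' algorithm: the candidate generation (randomly sampled data points), the initial weight 1/(k+1) and initial variance of the new component, and the use of full EM after every insertion are all simplifying assumptions, and the function names are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def log_likelihood(data, comps):
    # comps is a list of (weight, mean, variance) triples.
    return sum(
        math.log(max(sum(w * normal_pdf(x, m, v) for w, m, v in comps), 1e-300))
        for x in data
    )


def em(data, comps, iters=50, min_var=1e-3):
    """Plain EM updates for a 1-D Gaussian mixture (the local search step)."""
    n = len(data)
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, v) for w, m, v in comps]
            total = max(sum(p), 1e-300)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weight, mean, and variance of each component.
        new_comps = []
        for j in range(len(comps)):
            nj = max(sum(r[j] for r in resp), 1e-12)
            mean = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - mean) ** 2 for r, x in zip(resp, data)) / nj
            new_comps.append((nj / n, mean, max(var, min_var)))
        comps = new_comps
    return comps


def greedy_gmm(data, k_max, n_candidates=10, seed=0, min_var=1e-3):
    """Start from one Gaussian and insert components one at a time up to k_max."""
    rng = random.Random(seed)
    mean = sum(data) / len(data)
    var = max(sum((x - mean) ** 2 for x in data) / len(data), min_var)
    comps = [(1.0, mean, var)]
    while len(comps) < k_max:
        k = len(comps)
        alpha = 1.0 / (k + 1)  # assumed initial weight of the new component
        best, best_ll = None, -math.inf
        # Global search: try several data points as centres of the new component.
        for centre in rng.sample(data, min(n_candidates, len(data))):
            trial = [(w * (1.0 - alpha), m, v) for w, m, v in comps]
            trial.append((alpha, centre, max(var / (k + 1) ** 2, min_var)))
            trial = em(data, trial, min_var=min_var)  # local search
            ll = log_likelihood(data, trial)
            if ll > best_ll:
                best, best_ll = trial, ll
        comps = best
    return comps
```

Calling `greedy_gmm(data, k_max)` with a list of floats returns `(weight, mean, variance)` triples. Keeping the best-scoring candidate at each insertion is what makes the procedure greedy; a faithful implementation would follow the paper's own insertion and search rules.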
Projecting to a slow manifold: Singularly perturbed systems and legacy codes, Part 2 (working title), in preparation
Cited by 58 (19 self)
We consider dynamical systems possessing an attracting, invariant “slow manifold” that can be parameterized by a few observable variables. We present a procedure that, given a process for integrating the system step by step and a set of values of the observables, finds the values of the remaining system variables such that the state is close to the slow manifold to some desired accuracy. It should be noted that this is not equivalent to “integrating down to the manifold” since the latter process may significantly change the values of the observables. We consider problems whose solution has a singular perturbation expansion, although we do not know what it is nor have any way to compute it (because the system is not necessarily expressed in a singular perturbation form). We show in this paper that, under some conditions, computing the values of the remaining variables so that their (m + 1)st time derivatives are zero provides an estimate of the unknown variables that is an mth-order approximation to a point on the slow manifold in a sense to be defined. We then show how this criterion can be applied approximately when the system is defined by a legacy code rather than directly through closed form equations. This procedure can be valuable when one wishes to start a simulation of the detailed model on the slow manifold with particular values of observable variables characterizing the slow manifold.
Greedy Mixture Learning for Multiple Motif Discovery in Biological Sequences
2003
Cited by 28 (6 self)
Motivation: This paper studies the problem of discovering subsequences, known as motifs, that are common to a given collection of related biosequences, by proposing a greedy algorithm for learning a mixture of motifs model through likelihood maximization. The approach sequentially adds a new motif to a mixture model by performing a combined scheme of global and local search for appropriately initializing its parameters. In addition, a hierarchical partitioning scheme based on kd-trees is presented for partitioning the input dataset in order to speed up the global searching procedure. The proposed method compares favorably with the well-known MEME approach and successfully addresses several drawbacks of MEME.
Kernel Methods for Deep Learning
Cited by 20 (2 self)
We introduce a new family of positive-definite kernel functions that mimic the computation in large, multilayer neural nets. These kernel functions can be used in shallow architectures, such as support vector machines (SVMs), or in deep kernel-based architectures that we call multilayer kernel machines (MKMs). We evaluate SVMs and MKMs with these kernel functions on problems designed to illustrate the advantages of deep architectures. On several problems, we obtain better results than previous, leading benchmarks from both SVMs with Gaussian kernels and deep belief nets.
Eigenvoice speaker adaptation via composite kernel PCA
In Advances in Neural Information Processing Systems 16, 2004
Cited by 13 (7 self)
Eigenvoice speaker adaptation has been shown to be effective when only a small amount of adaptation data is available. At the heart of the method is principal component analysis (PCA) employed to find the most important eigenvoices. In this paper, we postulate that nonlinear PCA, in particular kernel PCA, may be even more effective. One major challenge is how to map the feature-space eigenvoices back to the observation space so that the state observation likelihood during estimation of eigenvoice weights and subsequent decoding can be computed. Our solution is to compute kernel PCA using composite kernels, and we will call our new method Kernel Eigenvoice. On the TIDIGITS corpus, we found that compared with a speaker-independent model, our kernel eigenvoice adaptation method can reduce the word error rate by 25% while the conventional eigenvoice approach can only match the performance of the speaker-independent model.
Kernel eigenvoice speaker adaptation
IEEE Transactions on Speech and Audio Processing, 2005
Cited by 13 (7 self)
Eigenvoice-based methods have been shown to be effective for fast speaker adaptation when only a small amount of adaptation data, say, less than 10 seconds, is available. At the heart of the method is principal component analysis (PCA) employed to find the most important eigenvoices. In this paper, we postulate that nonlinear PCA using kernel methods may be even more effective. The eigenvoices thus derived will be called kernel eigenvoices (KEV), and we will call our new adaptation method kernel eigenvoice speaker adaptation. However, unlike the standard eigenvoice (EV) method, an adapted speaker model found by the kernel eigenvoice method resides in the high-dimensional kernel-induced feature space, which, in general, cannot be mapped back to an exact preimage in the input speaker supervector space. Consequently, it is not clear how to obtain the constituent Gaussians of the adapted model that are needed for the computation of state observation likelihoods during the estimation of eigenvoice weights and subsequent decoding. Our solution is the use of composite kernels in such a way that state observation likelihoods can be computed using only kernel functions without the need of a speaker-adapted model in the input supervector space. In this paper, we investigate two different composite kernels for KEV adaptation: direct sum kernel and tensor product kernel. In an evaluation on ...
Subspace Information Criterion for Non-Quadratic Regularizers: Model Selection for Sparse Regressors
IEEE Transactions on Neural Networks, 2002
Cited by 9 (7 self)
Nonquadratic regularizers, in particular the ℓ1-norm regularizer, can yield sparse solutions that generalize well. In this work we propose the Generalized Subspace Information Criterion (GSIC), which allows the generalization error to be predicted for this useful family of regularizers. We show that under some technical assumptions GSIC is an asymptotically unbiased estimator of the generalization error. GSIC is demonstrated to perform well in experiments with the ℓ1-norm regularizer, where we compare it with the Network Information Criterion and cross-validation in relatively large sample cases. However, in the small sample case, GSIC tends to fail to capture the optimal model due to its large variance. Therefore, a biased version of GSIC is also introduced, which achieves reliable model selection in the relevant and challenging scenario of high-dimensional data and few samples.
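
Why an ℓ1-norm regularizer tends to give sparse solutions can be seen in the simplest possible case, a single coefficient with unregularized least-squares estimate z. The sketch below is only an illustration of that effect, not the GSIC procedure: it compares the closed-form minimizers of 0.5*(w - z)^2 + lam*|w| and 0.5*(w - z)^2 + 0.5*lam*w^2. The function names and the example values are hypothetical.

```python
def l1_shrink(z, lam):
    """Minimizer of 0.5*(w - z)**2 + lam*abs(w): soft-thresholding."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0  # small estimates are set exactly to zero


def l2_shrink(z, lam):
    """Minimizer of 0.5*(w - z)**2 + 0.5*lam*w**2: uniform scaling."""
    return z / (1.0 + lam)


if __name__ == "__main__":
    estimates = [3.0, -0.4, 0.1, -2.5, 0.7]  # hypothetical least-squares estimates
    lam = 1.0
    print("l1:", [l1_shrink(z, lam) for z in estimates])
    print("l2:", [l2_shrink(z, lam) for z in estimates])
```

The ℓ1 penalty zeroes every estimate whose magnitude is at most `lam`, whereas the quadratic penalty only scales estimates down and never produces an exact zero from a non-zero input.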