Results 1-10 of 45
Graph Kernels
, 2007
Cited by 40 (4 self)
We present a unified framework to study graph kernels, special cases of which include the random walk (Gärtner et al., 2003; Borgwardt et al., 2005) and marginalized (Kashima et al., 2003, 2004; Mahé et al., 2004) graph kernels. Through reduction to a Sylvester equation we improve the time complexity of kernel computation between unlabeled graphs with n vertices from O(n^6) to O(n^3). We find a spectral decomposition approach even more efficient when computing entire kernel matrices. For labeled graphs we develop conjugate gradient and fixed-point methods that take O(dn^3) time per iteration, where d is the size of the label set. By extending the necessary linear algebra to Reproducing Kernel Hilbert Spaces (RKHS) we obtain the same result for d-dimensional edge kernels, and O(n^4) in the infinite-dimensional case; on sparse graphs these algorithms only take O(n^2) time per iteration in all cases. Experiments on graphs from bioinformatics and other application domains show that these techniques can speed up computation of the kernel by an order of magnitude or more. We also show that certain rational kernels (Cortes et al., 2002, 2003, 2004) when specialized to graphs reduce to our random walk graph kernel. Finally, we relate our framework to R-convolution kernels (Haussler, 1999) and provide a kernel that is close to the optimal assignment kernel of Fröhlich et al. (2006) yet provably positive semidefinite.
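As a rough illustration of the fixed-point idea mentioned in the abstract above, the sketch below computes a geometric random walk kernel between two unlabeled graphs by iterating X <- P + lam * A X B^T. This applies the product-graph adjacency matrix without ever forming it, so each step costs O(n^3) for dense graphs. It is a minimal sketch, not the authors' implementation: the function name `random_walk_kernel`, the uniform starting and stopping distributions, the decay factor `lam`, and the list-of-lists matrix representation are all assumptions made here. The iteration is only expected to converge when `lam` is smaller than the reciprocal of the product graph's spectral radius.

```python
def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Return the transpose of a dense matrix."""
    return [list(col) for col in zip(*a)]


def random_walk_kernel(adj_g, adj_h, lam=0.1, tol=1e-12, max_iter=1000):
    """Geometric random walk kernel via fixed-point iteration (illustrative).

    adj_g, adj_h: adjacency matrices of the two graphs.
    lam: decay factor for longer walks (assumed small enough to converge).
    """
    n, m = len(adj_g), len(adj_h)
    p = 1.0 / (n * m)  # uniform start/stop weight on every vertex pair (assumption)
    start = [[p] * m for _ in range(n)]
    x = [row[:] for row in start]
    adj_h_t = transpose(adj_h)
    for _ in range(max_iter):
        # A_g X A_h^T equals the product-graph matrix applied to vec(X).
        walked = matmul(matmul(adj_g, x), adj_h_t)
        new_x = [[start[i][j] + lam * walked[i][j] for j in range(m)]
                 for i in range(n)]
        diff = max(abs(new_x[i][j] - x[i][j]) for i in range(n) for j in range(m))
        x = new_x
        if diff < tol:
            break
    # Kernel value: stopping weights applied to the fixed point.
    return sum(p * x[i][j] for i in range(n) for j in range(m))


edge = [[0, 1], [1, 0]]
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
k_edge_triangle = random_walk_kernel(triangle, edge, lam=0.1)
```

The two matrix products are what keep the per-iteration cost at O(n^3); building the Kronecker product explicitly would require an n^2 by n^2 matrix.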
A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning
, 2009
Cited by 29 (2 self)
We present a tutorial on Bayesian optimization, a method of finding the maximum of expensive cost functions. Bayesian optimization employs the Bayesian technique of setting a prior over the objective function and combining it with evidence to get a posterior function. This permits a utility-based selection of the next observation to make on the objective function, which must take into account both exploration (sampling from areas of high uncertainty) and exploitation (sampling areas likely to offer improvement over the current best observation). We also present two detailed extensions of Bayesian optimization, with experiments (active user modelling with preferences, and hierarchical reinforcement learning), and a discussion of the pros and cons of Bayesian optimization based on our experiences.
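The exploration/exploitation trade-off described above can be made concrete with an acquisition function. The sketch below uses expected improvement, one widely used utility for maximization. The abstract does not say which utility the tutorial favors, so this choice is an assumption, as are the names `expected_improvement` and `pick_next` and the idea that the posterior mean and standard deviation at each candidate point are already available.

```python
import math


def expected_improvement(mu, sigma, best):
    """Expected improvement over `best` for a Gaussian posterior N(mu, sigma^2)."""
    if sigma <= 0.0:
        # No uncertainty left: improvement is deterministic.
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    # First term rewards a high mean (exploitation),
    # second term rewards high uncertainty (exploration).
    return (mu - best) * cdf + sigma * pdf


def pick_next(candidates, best):
    """Return the label of the candidate with the largest expected improvement.

    candidates: iterable of (label, posterior_mean, posterior_std) tuples.
    """
    return max(candidates, key=lambda c: expected_improvement(c[1], c[2], best))[0]


# Hypothetical posterior summaries at two candidate points.
candidates = [("exploit", 1.1, 0.05), ("explore", 0.8, 0.6)]
choice = pick_next(candidates, best=1.0)
```

A point with a slightly lower mean but much larger uncertainty can still win, which is the behavior the abstract attributes to utility-based selection.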
Application of Support Vector Machines for recognition of handwritten Arabic/Persian digits
 PROCEEDING OF THE SECOND CONFERENCE ON MACHINE VISION AND IMAGE PROCESSING & APPLICATIONS (MVIP
, 2003
Cited by 14 (1 self)
A new method for recognition of isolated handwritten Arabic/Persian digits is presented. This method is based on Support Vector Machines (SVMs) and a new approach to feature extraction. Each digit is considered from four different views, and from each view 16 features are extracted and combined to obtain 64 features. Using these features, multiple SVM classifiers are trained to separate different classes of digits. The CENPARMI Indian (Arabic/Persian) handwritten digit database is used for training and testing of the SVM classifiers. Based on this database, differences between Arabic and Persian digits in digit recognition are shown. The database provides 7390 samples for training and 3035 samples for testing, drawn from real-life samples. Experiments show that the proposed features yield a recognition rate of 94.14% with Support Vector Machines, compared with 91.25% obtained by an MLP neural network classifier using the same features and test set.
Support Vector Learning for Fuzzy Rule-Based Classification Systems
, 2003
Cited by 12 (1 self)
To design a fuzzy rule-based classification system (fuzzy classifier) with good generalization ability in a high dimensional feature space has been an active research topic for a long time. As a powerful machine learning approach for pattern recognition problems, support vector machine (SVM) is known to have good generalization ability. More importantly, an SVM can work very well on a high (or even infinite) dimensional feature space. This paper investigates the connection between fuzzy classifiers and kernel machines, establishes a link between fuzzy rules and kernels, and proposes a learning algorithm for fuzzy classifiers. We first show that a fuzzy classifier implicitly defines a translation invariant kernel under the assumption that all membership functions associated with the same input variable are generated from location transformation of a reference function. Fuzzy inference on the IF-part of a fuzzy rule can be viewed as evaluating the kernel function. The kernel function is then proven to be a Mercer kernel if the reference functions meet certain spectral requirement. The corresponding fuzzy classifier is named positive definite fuzzy classifier (PDFC). A PDFC can be built from the given training samples based on a support vector learning approach with the IF-part fuzzy rules given by the support vectors. Since the learning process minimizes an upper bound on the expected risk (expected prediction error) instead of the empirical risk (training error), the resulting PDFC usually has good generalization. Moreover, because of the sparsity properties of the SVMs, the number of fuzzy rules is irrelevant to the dimension of input space. In this sense, we avoid the "curse of dimensionality." Finally, PDFCs with different reference functions are constructed using the su...
Kernel-Based Positioning in Wireless Local Area Networks
, 2007
Cited by 12 (0 self)
The recent proliferation of Location-Based Services (LBSs) has necessitated the development of effective indoor positioning solutions. In such a context, Wireless Local Area Network (WLAN) positioning is a particularly viable solution in terms of hardware and installation costs due to the ubiquity of WLAN infrastructures. This paper examines three aspects of the problem of indoor WLAN positioning using received signal strength (RSS). First, we show that, due to the variability of RSS features over space, a spatially localized positioning method leads to improved positioning results. Second, we explore the problem of access point (AP) selection for positioning and demonstrate the need for further research in this area. Third, we present a kernelized distance calculation algorithm for comparing RSS observations to RSS training records. Experimental results indicate that the proposed system leads to a 17 percent (0.56 m) improvement over the widely used K-nearest neighbor and histogram-based methods.
Spikernels: Embedding spiking neurons in inner product spaces
 Advances in Neural Information Processing Systems 15
, 2003
Cited by 11 (3 self)
Inner-product operators, often referred to as kernels in statistical learning, define a mapping from some input space into a feature space. The focus of this paper is the construction of biologically-motivated kernels for cortical activities. The kernels we derive, termed Spikernels, map spike count sequences into an abstract vector space in which we can perform various prediction tasks. We discuss in detail the derivation of Spikernels and describe an efficient algorithm for computing their value on any two sequences of neural population spike counts. We demonstrate the merits of our modeling approach using the Spikernel and various standard kernels for the task of predicting hand movement velocities from cortical recordings. In all of our experiments, all the kernels we tested outperform the standard scalar product used in regression, with the Spikernel consistently achieving the best performance.
Kernel PCA for similarity invariant shape recognition
 In the Journal of Neurocomputing
, 2006
Cited by 10 (3 self)
We present in this paper a novel approach for shape description based on kernel principal component analysis (KPCA). The strength of this method resides in the similarity (rotation, translation and particularly scale) invariance of KPCA when using a family of triangular conditionally positive definite kernels. Besides this invariance, the method provides an effective way to capture nonlinearities in shape geometry. A given two-dimensional curve is described using the eigenvalues of the underlying manifold modeled in a high-dimensional Hilbert space. Using Fourier analysis, we will show that this eigenvalue description captures low to high variations of the shape frequencies. Experiments conducted on standard databases, including the SQUID, the Swedish and the Smithsonian leaf databases, show that the method is effective in capturing invariance and generalizes well for shape matching and retrieval.
Eigenvoice speaker adaptation via composite kernel PCA
 in Advances in Neural Information Processing Systems 16
, 2004
Cited by 9 (4 self)
Eigenvoice speaker adaptation has been shown to be effective when only a small amount of adaptation data is available. At the heart of the method is principal component analysis (PCA) employed to find the most important eigenvoices. In this paper, we postulate that nonlinear PCA, in particular kernel PCA, may be even more effective. One major challenge is to map the feature-space eigenvoices back to the observation space so that the state observation likelihoods can be computed during the estimation of eigenvoice weights and subsequent decoding. Our solution is to compute kernel PCA using composite kernels, and we will call our new method kernel eigenvoice speaker adaptation. On the TIDIGITS corpus, we found that compared with a speaker-independent model, our kernel eigenvoice adaptation method can reduce the word error rate by 28–33% while the standard eigenvoice approach can only match the performance of the speaker-independent model.
Invariant kernel functions for pattern analysis and machine learning
 Machine Learning
, 2007
Cited by 8 (1 self)
In many learning problems prior knowledge about pattern variations can be formalized and beneficially incorporated into the analysis system. The corresponding notion of invariance is commonly used in conceptually different ways. We propose a more distinguishing treatment, in particular in the active field of kernel methods for machine learning and pattern analysis. Additionally, the fundamental relation of invariant kernels and traditional invariant pattern analysis by means of invariant representations will be clarified. After addressing these conceptual questions, we focus on practical aspects and present two generic approaches for constructing invariant kernels. The first approach is based on a technique called invariant integration. The second approach builds on invariant distances. In principle, our approaches support general transformations, in particular covering discrete and non-group or even an infinite number of pattern transformations. Additionally, both enable a smooth interpolation between invariant and non-invariant pattern analysis, i.e., they form a general framework covering both. The wide applicability and various possible benefits of invariant kernels are demonstrated in different kernel methods.
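The first construction named above, invariant integration, can be illustrated for the simplest case of a finite transformation group: average a base kernel over every pair of transformed patterns. The sketch below is an assumption-laden illustration rather than the paper's formulation. The names `invariant_kernel`, `cyclic_shifts` and `rbf`, the choice of cyclic shifts as the transformation set, and the Gaussian base kernel with parameter `gamma` are all introduced here for the example.

```python
import math


def cyclic_shifts(v):
    """All cyclic shifts of a list: the orbit of v under the shift group."""
    return [v[i:] + v[:i] for i in range(len(v))]


def rbf(a, b, gamma=0.5):
    """Gaussian base kernel on equal-length numeric lists."""
    return math.exp(-gamma * sum((u - w) ** 2 for u, w in zip(a, b)))


def invariant_kernel(x, y, base=rbf, orbit=cyclic_shifts):
    """Average the base kernel over all pairs of transformed patterns."""
    ox, oy = orbit(x), orbit(y)
    return sum(base(a, b) for a in ox for b in oy) / (len(ox) * len(oy))


x = [1.0, 2.0, 3.0]
x_shifted = [3.0, 1.0, 2.0]  # a cyclic shift of x
z = [0.0, 5.0, 1.0]
same_orbit = invariant_kernel(x, x_shifted)
other = invariant_kernel(x, z)
```

Because `x` and `x_shifted` have the same orbit, the averaged kernel cannot tell them apart, which is the invariance property in question. Restricting the orbit to a subset of shifts would give a partially invariant kernel.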
Outlier Detection with the Kernelized Spatial Depth Function
, 2008
Cited by 7 (1 self)
Statistical depth functions provide from the “deepest” point a “center-outward ordering” of multidimensional data. In this sense, depth functions can measure the “extremeness” or “outlyingness” of a data point with respect to a given data set. Hence they can detect outliers – observations that appear extreme relative to the rest of the observations. Of the various statistical depths, the spatial depth is especially appealing because of its computational efficiency and mathematical tractability. In this article, we propose a novel statistical depth, the kernelized spatial depth (KSD), which generalizes the spatial depth via positive definite kernels. By choosing a proper kernel, the KSD can capture the local structure of a data set while the spatial depth fails. We demonstrate this by the half-moon data and the ring-shaped data. Based on the KSD, we propose a novel outlier detection algorithm, by which an observation with a depth value less than a threshold is declared as an outlier. The proposed algorithm is simple in structure: the threshold is the only parameter for a given kernel. It applies to a one-class learning setting, in which “normal” observations are given as the training data, as well as to a missing label scenario where the training set consists of a mixture of normal observations and outliers with unknown labels. We give upper bounds on the false alarm probability of a depth-based detector. These upper bounds can be used to determine the threshold. We perform extensive experiments on synthetic data and data sets from real applications. The proposed outlier detector is compared with existing methods. The KSD outlier detector demonstrates competitive performance.
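To show how a depth-based detector of this kind might work, the sketch below evaluates a kernelized spatial depth using only kernel values, with feature-space inner products written as k(x,x) + k(y,z) - k(x,y) - k(x,z), and flags a point whose depth falls below a threshold. This is a minimal sketch under stated assumptions, not the article's exact estimator: the normalization by the number of training points, the Gaussian kernel with bandwidth `sigma`, and the names `kernelized_spatial_depth` and `is_outlier` are choices made here for illustration.

```python
import math


def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian kernel on equal-length numeric tuples."""
    sq = sum((u - v) ** 2 for u, v in zip(a, b))
    return math.exp(-sq / (2.0 * sigma ** 2))


def kernelized_spatial_depth(x, data, kernel=gaussian_kernel):
    """One minus the norm of the average feature-space unit vector from data to x."""
    kxx = kernel(x, x)
    kxy = [kernel(x, y) for y in data]
    # Feature-space distances between x and each training point.
    dist = [math.sqrt(max(kxx + kernel(y, y) - 2.0 * k, 0.0))
            for y, k in zip(data, kxy)]
    total = 0.0
    for i, y in enumerate(data):
        if dist[i] == 0.0:
            continue  # x coincides with y in feature space; no direction defined
        for j, z in enumerate(data):
            if dist[j] == 0.0:
                continue
            inner = kxx + kernel(y, z) - kxy[i] - kxy[j]
            total += inner / (dist[i] * dist[j])
    return 1.0 - math.sqrt(max(total, 0.0)) / len(data)


def is_outlier(x, data, threshold, kernel=gaussian_kernel):
    """Declare x an outlier when its depth is below the threshold."""
    return kernelized_spatial_depth(x, data, kernel) < threshold


# Hypothetical training set of "normal" observations.
normal = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
depth_center = kernelized_spatial_depth((0.0, 0.0), normal)
depth_far = kernelized_spatial_depth((10.0, 10.0), normal)
```

As in the abstract, the threshold is the only tuning parameter once the kernel is fixed. The double loop makes each depth evaluation quadratic in the training set size.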