Results 1  10
of
29
Nonlinear component analysis as a kernel eigenvalue problem

, 1996
"... We describe a new method for performing a nonlinear form of Principal Component Analysis. By the use of integral operator kernel functions, we can efficiently compute principal components in highdimensional feature spaces, related to input space by some nonlinear map; for instance the space of all ..."
Abstract

Cited by 1044 (72 self)
 Add to MetaCart
We describe a new method for performing a nonlinear form of Principal Component Analysis. By the use of integral operator kernel functions, we can efficiently compute principal components in highdimensional feature spaces, related to input space by some nonlinear map; for instance the space of all possible 5pixel products in 16x16 images. We give the derivation of the method, along with a discussion of other techniques which can be made nonlinear with the kernel approach; and present first experimental results on nonlinear feature extraction for pattern recognition.
A tutorial on support vector regression
, 2004
"... In this tutorial we give an overview of the basic ideas underlying Support Vector (SV) machines for function estimation. Furthermore, we include a summary of currently used algorithms for training SV machines, covering both the quadratic (or convex) programming part and advanced methods for dealing ..."
Abstract

Cited by 470 (2 self)
 Add to MetaCart
In this tutorial we give an overview of the basic ideas underlying Support Vector (SV) machines for function estimation. Furthermore, we include a summary of currently used algorithms for training SV machines, covering both the quadratic (or convex) programming part and advanced methods for dealing with large datasets. Finally, we mention some modifications and extensions that have been applied to the standard SV algorithm, and discuss the aspect of regularization from a SV perspective.
Generalization Performance of Regularization Networks and Support . . .
 IEEE TRANSACTIONS ON INFORMATION THEORY
, 2001
"... We derive new bounds for the generalization error of kernel machines, such as support vector machines and related regularization networks by obtaining new bounds on their covering numbers. The proofs make use of a viewpoint that is apparently novel in the field of statistical learning theory. The hy ..."
Abstract

Cited by 73 (20 self)
 Add to MetaCart
We derive new bounds for the generalization error of kernel machines, such as support vector machines and related regularization networks by obtaining new bounds on their covering numbers. The proofs make use of a viewpoint that is apparently novel in the field of statistical learning theory. The hypothesis class is described in terms of a linear operator mapping from a possibly infinitedimensional unit ball in feature space into a finitedimensional space. The covering numbers of the class are then determined via the entropy numbers of the operator. These numbers, which characterize the degree of compactness of the operator, can be bounded in terms of the eigenvalues of an integral operator induced by the kernel function used by the machine. As a consequence, we are able to theoretically explain the effect of the choice of kernel function on the generalization performance of support vector machines.
Large scale online learning
 Advances in Neural Information Processing Systems 16
, 2004
"... We consider situations where training data is abundant and computing resources are comparatively scarce. We argue that suitably designed online learning algorithms asymptotically outperform any batch learning algorithm. Both theoretical and experimental evidences are presented. 1 ..."
Abstract

Cited by 46 (6 self)
 Add to MetaCart
We consider situations where training data is abundant and computing resources are comparatively scarce. We argue that suitably designed online learning algorithms asymptotically outperform any batch learning algorithm. Both theoretical and experimental evidences are presented. 1
SV Estimation of a Distribution's Support
, 1999
"... Suppose you are given some dataset drawn from an underlying probability distribution P and you want to estimate a subset S of input space such that the probability that a test point drawn from P lies outside of S is bounded by some a priori specified 0 < 1. We propose an algorithm which approach ..."
Abstract

Cited by 28 (2 self)
 Add to MetaCart
Suppose you are given some dataset drawn from an underlying probability distribution P and you want to estimate a subset S of input space such that the probability that a test point drawn from P lies outside of S is bounded by some a priori specified 0 < 1. We propose an algorithm which approaches this problem by trying to estimate a function f which is positive on S and negative on the complement. The functional form of f is given by a kernel expansion in terms of a potentially small subset of the training data; it is regularized by controlling the length of the weight vector in an associated feature space. The algorithm is a natural extension of the support vector algorithm to the case of unlabelled data.
Sample Complexity for Learning Recurrent Perceptron Mappings
 IEEE Trans. Inform. Theory
, 1996
"... Recurrent perceptron classifiers generalize the classical perceptron model. They take into account those correlations and dependences among input coordinates which arise from linear digital filtering. This paper provides tight bounds on sample complexity associated to the fitting of such models to e ..."
Abstract

Cited by 23 (10 self)
 Add to MetaCart
Recurrent perceptron classifiers generalize the classical perceptron model. They take into account those correlations and dependences among input coordinates which arise from linear digital filtering. This paper provides tight bounds on sample complexity associated to the fitting of such models to experimental data. Keywords: perceptrons, recurrent models, neural networks, learning, VapnikChervonenkis dimension 1 Introduction One of the most popular approaches to binary pattern classification, underlying many statistical techniques, is based on perceptrons or linear discriminants ; see for instance the classical reference [9]. In this context, one is interested in classifying kdimensional input patterns v = (v 1 ; : : : ; v k ) into two disjoint classes A + and A \Gamma . A perceptron P which classifies vectors into A + and A \Gamma is characterized by a vector (of "weights") ~c 2 R k , and operates as follows. One forms the inner product ~c:v = c 1 v 1 + : : : c k v k . I...
N.: Recursive Aggregation of Estimators by Mirror Descent Algorithm with averaging. Problems of Information Transmission
"... We consider a recursive algorithm to construct an aggregated estimator from a finite number of base decision rules in the classification problem. The estimator approximately minimizes a convex risk functional under the ℓ 1constraint. It is defined by a stochastic version of the mirror descent algor ..."
Abstract

Cited by 17 (3 self)
 Add to MetaCart
We consider a recursive algorithm to construct an aggregated estimator from a finite number of base decision rules in the classification problem. The estimator approximately minimizes a convex risk functional under the ℓ 1constraint. It is defined by a stochastic version of the mirror descent algorithm (i.e., of the method which performs gradient descent in the dual space) with an additional averaging. The main result of the paper is an upper bound for the expected accuracy 1 of the proposed estimator. This bound is of the order √ (log M)/t with an explicit and small constant factor, where M is the dimension of the problem and t stands for the sample size. Similar bound is proved for a more general setting that covers, in particular, the regression model with squared loss. 1
Website fingerprinting in onion routing based anonymization networks
 in Proceedings of the 18th ACM Computer and Communications Security (ACM CCS) Workshop on Privacy in the Electronic Society (WPES 2011
, 2011
"... Lowlatency anonymization networks such as Tor and JAP claim to hide the recipient and the content of communications from a local observer, i.e., an entity that can eavesdrop the traffic between the user and the first anonymization node. Especially users in totalitarian regimes strongly depend on su ..."
Abstract

Cited by 14 (1 self)
 Add to MetaCart
Lowlatency anonymization networks such as Tor and JAP claim to hide the recipient and the content of communications from a local observer, i.e., an entity that can eavesdrop the traffic between the user and the first anonymization node. Especially users in totalitarian regimes strongly depend on such networks to freely communicate. For these people, anonymity is particularly important and an analysis of the anonymization methods against various attacks is necessary to ensure adequate protection. In this paper we show that anonymity in Tor and JAP is not as strong as expected so far and cannot resist website fingerprinting attacks under certain circumstances. We first define features for website fingerprinting solely based on volume, time, and direction of the traffic. As a result, the subsequent classification becomes much easier. We apply support vector machines with the introduced features. We are able to improve recognition results of existing works on a given stateoftheart dataset in Tor from 3 % to 55 % and in JAP from 20 % to 80%. The datasets assume a closedworld with 775 websites only. In a next step, we transfer our findings to a more complex and realistic openworld scenario, i.e., recognition of several websites in a set of thousands of random unknown websites. To the best of our knowledge, this work is the first successful attack in the openworld scenario. We achieve a surprisingly high true positive rate of up to 73 % for a false positive rate of 0.05%. Finally, we show preliminary results of a proofofconcept implementation that applies camouflage as a countermeasure to hamper the fingerprinting attack. For JAP, the detection rate decreases from 80 % to 4 % and for Tor it drops from 55 % to about 3%.