Results 1-10 of 30
Parallelized stochastic gradient descent
Advances in Neural Information Processing Systems 23, 2010
Cited by 97 (4 self)
With the increase in available data, parallel machine learning has become an increasingly pressing problem. In this paper we present the first parallel stochastic gradient descent algorithm, including a detailed analysis and experimental evidence. Unlike prior work on parallel optimization algorithms ...
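The abstract as listed does not spell out the algorithm. As an illustration only, one simple way to parallelize stochastic gradient descent is to run plain SGD independently on disjoint shards of the data and average the resulting parameters. The sketch below assumes that scheme; the function names, the toy least-squares problem, and the step size are all hypothetical, and the "workers" run one after another here rather than on separate machines.

```python
import random


def sgd(shard, lr=0.05, epochs=50, seed=0):
    """Plain SGD for 1-D least squares y ~ w*x + b on a single data shard."""
    rng = random.Random(seed)
    data = list(shard)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            err = (w * x + b) - y  # residual of the current model
            w -= lr * err * x      # gradient step for the slope
            b -= lr * err          # gradient step for the intercept
    return w, b


def averaged_sgd(data, k=4):
    """Run SGD on k disjoint shards (sequentially here) and average the models."""
    shards = [data[i::k] for i in range(k)]
    models = [sgd(shard, seed=i) for i, shard in enumerate(shards)]
    w = sum(m[0] for m in models) / k
    b = sum(m[1] for m in models) / k
    return w, b


# Toy, noise-free data drawn from y = 3x + 1 (an assumption for the demo).
rng = random.Random(42)
xs = [rng.uniform(-1.0, 1.0) for _ in range(400)]
data = [(x, 3.0 * x + 1.0) for x in xs]
w, b = averaged_sgd(data)
print(round(w, 3), round(b, 3))
```

Because each shard is processed without any communication, the only synchronization point in this scheme is the final average.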
Fast and Robust Earth Mover’s Distances
Cited by 90 (6 self)
We present a new algorithm for a robust family of Earth Mover’s Distances (EMDs) with thresholded ground distances. The algorithm transforms the flow network of the EMD so that the number of edges is reduced by an order of magnitude. As a result, we compute the EMD an order of magnitude faster than the original algorithm, which makes it possible to compute the EMD on large histograms and databases. In addition, we show that EMDs with thresholded ground distances have many desirable properties. First, they correspond to the way humans perceive distances. Second, they are robust to outlier noise and quantization effects. Third, they are metrics. Finally, experimental results on image retrieval show that thresholding the ground distance of the EMD improves both accuracy and speed.
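A thresholded ground distance can be sketched in a few lines. The example below assumes 1-D histogram bins with base distance |i - j| and a threshold t, both chosen for illustration. It builds the matrix min(|i - j|, t), checks the metric axioms by brute force, and counts how many bin pairs fall below the threshold. All remaining pairs share the same cost t, which is one plausible reading of why the flow network needs far fewer distinct edges; the paper's actual transformation is not reproduced here.

```python
def thresholded_ground_distance(n_bins, t):
    """Matrix of d_t(i, j) = min(|i - j|, t) over 1-D histogram bins."""
    return [[min(abs(i - j), t) for j in range(n_bins)] for i in range(n_bins)]


def is_metric(d):
    """Brute-force check of identity, symmetry, and the triangle inequality."""
    n = range(len(d))
    return all(
        (d[i][j] == 0) == (i == j)
        and d[i][j] == d[j][i]
        and d[i][j] <= d[i][k] + d[k][j]
        for i in n for j in n for k in n
    )


def count_unsaturated(d, t):
    """Number of ordered bin pairs whose distance is strictly below t."""
    return sum(1 for row in d for v in row if v < t)


D = thresholded_ground_distance(8, 2)
below = count_unsaturated(D, 2)
print(is_metric(D), below, "of", 8 * 8)  # True 22 of 64
```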
Measure based regularization
Advances in Neural Information Processing Systems 16, 2004
Cited by 40 (6 self)
We address in this paper the question of how the knowledge of the marginal distribution P(x) can be incorporated in a learning algorithm. We suggest three theoretical methods for taking into account this distribution for regularization and provide links to existing graph-based semi-supervised learning algorithms. We also propose practical implementations.
Linear-Time Computation of Similarity Measures for Sequential Data
2008
Cited by 38 (24 self)
Efficient and expressive comparison of sequences is an essential procedure for learning with sequential data. In this article we propose a generic framework for computation of similarity measures for sequences, covering various kernel, distance and non-metric similarity functions. The basis for comparison is embedding of sequences using a formal language, such as a set of natural words, k-grams or all contiguous subsequences. As realizations of the framework we provide linear-time algorithms of different complexity and capabilities using sorted arrays, tries and suffix trees as underlying data structures. Experiments on data sets from bioinformatics, text processing and computer security illustrate the efficiency of the proposed algorithms, enabling peak performances of up to 10^6 pairwise comparisons per second. The utility of distances and non-metric similarity measures for sequences as alternatives to string kernels is demonstrated in applications of text categorization, network intrusion detection and transcription site recognition in DNA.
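The sorted-array variant can be illustrated with a small sketch. It assumes a k-gram embedding and two example measures (a linear kernel and a Manhattan distance); the function names and the match/mismatch decomposition are hypothetical, not the paper's interface. Each sequence is mapped to a sorted array of (k-gram, count) pairs, and two arrays are compared in one merge-style pass. The sort used here is O(n log n), so only the comparison pass is linear in this sketch.

```python
from collections import Counter


def kgram_counts(s, k):
    """Sorted array of (k-gram, count) pairs for the string s."""
    counts = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    return sorted(counts.items())


def merge_compare(a, b, match, mismatch):
    """Single linear pass over two sorted k-gram arrays, like a merge step."""
    i = j = 0
    total = 0
    while i < len(a) and j < len(b):
        if a[i][0] == b[j][0]:      # k-gram occurs in both sequences
            total += match(a[i][1], b[j][1])
            i += 1
            j += 1
        elif a[i][0] < b[j][0]:     # k-gram occurs only in the first
            total += mismatch(a[i][1])
            i += 1
        else:                       # k-gram occurs only in the second
            total += mismatch(b[j][1])
            j += 1
    total += sum(mismatch(c) for _, c in a[i:])
    total += sum(mismatch(c) for _, c in b[j:])
    return total


def linear_kernel(x, y, k=2):
    """Inner product of k-gram count vectors."""
    return merge_compare(kgram_counts(x, k), kgram_counts(y, k),
                         lambda p, q: p * q, lambda p: 0)


def manhattan(x, y, k=2):
    """L1 distance between k-gram count vectors."""
    return merge_compare(kgram_counts(x, k), kgram_counts(y, k),
                         lambda p, q: abs(p - q), lambda p: p)


print(linear_kernel("abab", "abba"), manhattan("abab", "abba"))  # 3 2
```

Swapping the two small functions passed to `merge_compare` changes the similarity measure without touching the traversal, which is the appeal of a generic framework of this kind.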
Kernel discriminant analysis for positive definite and indefinite kernels
2008
Cited by 24 (1 self)
Kernel methods are a class of well-established and successful algorithms for pattern analysis due to their mathematical elegance and good performance. Numerous nonlinear extensions of pattern recognition techniques have been proposed so far based on the so-called kernel trick. The objective of this paper is twofold. First, we derive an additional kernel tool that is still missing, namely kernel quadratic discriminant (KQD). We discuss different formulations of KQD based on the regularized kernel Mahalanobis distance in both complete and class-related subspaces. Second, we propose suitable extensions of kernel linear and quadratic discriminants to indefinite kernels. We provide classifiers that are applicable to kernels defined by any symmetric similarity measure. This is important in practice because problem-suited proximity measures often violate the requirement of positive definiteness. As in the traditional case, KQD can be advantageous for data with unequal class spreads in the kernel-induced spaces, which cannot be well separated by a linear discriminant. We illustrate this on artificial and real data for both positive definite and indefinite kernels. Index Terms: Machine learning, pattern recognition, kernel methods, indefinite kernels, discriminant analysis.
Efficient classification for metric data
In COLT, 2010
Cited by 16 (11 self)
Recent advances in large-margin classification of data residing in general metric spaces (rather than Hilbert spaces) enable classification under various natural metrics, such as edit and earthmover distance. The general framework developed for this purpose by von Luxburg and Bousquet [JMLR, 2004] left open the question of computational efficiency and providing direct bounds on classification error. We design a new algorithm for classification in general metric spaces, whose runtime and accuracy depend on the doubling dimension of the data points. It thus achieves superior classification performance in many common scenarios. The algorithmic core of our approach is an approximate (rather than exact) solution to the classical problems of Lipschitz extension and of Nearest Neighbor Search. The algorithm’s generalization performance is established via the fat-shattering dimension of Lipschitz classifiers.
Beyond traditional kernels: Classification in two dissimilarity-based representation spaces
IEEE Trans. Syst., Man Cybern., Part C: Appl. Rev., 2008
Cited by 14 (1 self)
Proximity captures the degree of similarity between examples and is thereby fundamental in learning. Learning from pairwise proximity data usually relies on either kernel methods for specifically designed kernels or the nearest neighbor (NN) rule. Kernel methods are powerful, but often cannot handle arbitrary proximities without necessary corrections. The NN rule can work well in such cases, but suffers from local decisions. The aim of this paper is to provide an explanation of, and insights into, two simple yet powerful alternatives for cases in which neither conventional kernel methods nor the NN rule perform best. These strategies use two proximity-based representation spaces (RSs) in which accurate classifiers are trained on all training objects and demand comparisons to a small set of prototypes. They can handle all meaningful dissimilarity measures, including non-Euclidean and non-metric ones. Practical examples illustrate that these RSs can be highly advantageous in supervised learning. Simple classifiers built there tend to outperform the NN rule. Moreover, computational complexity may be controlled. Consequently, these approaches offer an appealing alternative for learning from proximity data for which kernel methods cannot be applied directly, or are too costly or impractical, while the NN rule leads to noisy results. Index Terms: Classifier design and evaluation, indefinite kernels, similarity measures, statistical learning.
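The idea of a proximity-based representation space can be sketched as follows. The abstract does not name the two spaces, so this example shows only one generic construction, under stated assumptions: each object is represented by its vector of dissimilarities to a small prototype set, and a simple classifier is trained in that vector space. Edit distance on strings, the two prototypes, the toy training set, and the nearest-mean classifier are all illustrative choices, not the paper's experimental setup.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic program)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def represent(obj, prototypes, dissim):
    """Map an object to its vector of dissimilarities to the prototypes."""
    return [dissim(obj, p) for p in prototypes]


def fit_nearest_mean(vectors, labels):
    """Class means in the representation space."""
    means = {}
    for label in set(labels):
        rows = [v for v, l in zip(vectors, labels) if l == label]
        means[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return means


def predict(vector, means):
    """Assign the class whose mean is closest in squared Euclidean distance."""
    def sqdist(u, v):
        return sum((p - q) ** 2 for p, q in zip(u, v))
    return min(means, key=lambda label: sqdist(vector, means[label]))


prototypes = ["aaaa", "bbbb"]  # hypothetical prototype set
train = [("aaab", "A"), ("aaaa", "A"), ("abaa", "A"),
         ("bbbb", "B"), ("bbab", "B"), ("babb", "B")]
X = [represent(s, prototypes, edit_distance) for s, _ in train]
y = [label for _, label in train]
means = fit_nearest_mean(X, y)
print(predict(represent("aaba", prototypes, edit_distance), means))  # A
```

The classifier is trained on all training objects, but classifying a new object needs only its dissimilarities to the prototypes, which is how the cost at prediction time stays under control.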
Structure Spaces
2007
Cited by 10 (7 self)
Finite structures such as point patterns, strings, trees, and graphs occur as "natural" representations of structured data in different application areas of machine learning. We develop the theory of structure spaces and derive geometrical and analytical concepts such as the angle between structures and the derivative of functions on structures. In particular, we show that the gradient of a differentiable structural function is a well-defined structure pointing in the direction of steepest ascent. Exploiting the properties of structure spaces, it will turn out that a number of problems in structural pattern recognition such as central clustering or learning in structured output spaces can be formulated as optimization problems with cost functions that are locally Lipschitz. Hence, methods from nonsmooth analysis are applicable to optimize those cost functions.
Efficient regression in metric spaces via approximate Lipschitz extension
2011
Cited by 8 (4 self)
We present a framework for performing efficient regression in general metric spaces. Roughly speaking, our regressor predicts the value at a new point by computing a Lipschitz extension (the smoothest function consistent with the observed data) while performing an optimized structural risk minimization to avoid overfitting. The offline (learning) and online (inference) stages can be solved by convex programming, but this naive approach has runtime complexity O(n^3), which is prohibitive for large datasets. We design instead an algorithm that is fast when the doubling dimension, which measures the “intrinsic” dimensionality of the metric space, is low. We make dual use of the doubling dimension: first, on the statistical front, to bound fat-shattering dimension of the class of Lipschitz functions (and obtain risk bounds); and second, on the computational front, to quickly compute a hypothesis function and a prediction based on Lipschitz extension. Our resulting regressor is both asymptotically strongly consistent and comes with finite-sample risk bounds, while making minimal structural and noise assumptions.
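For intuition about prediction by Lipschitz extension, the sketch below uses the classical exact construction, which is an assumption of this example and not the paper's approximate algorithm. Given points x_i with values y_i and a Lipschitz constant L, it averages the upper envelope min_i (y_i + L d(x, x_i)) and the lower envelope max_i (y_i - L d(x, x_i)). It needs only a distance function, costs O(n) per query, and performs no risk minimization; the function names and the toy data are hypothetical.

```python
def lipschitz_constant(points, values, dist):
    """Smallest L consistent with the data: the largest observed slope."""
    n = len(points)
    return max(
        abs(values[i] - values[j]) / dist(points[i], points[j])
        for i in range(n) for j in range(i + 1, n)
    )


def lipschitz_extend(x, points, values, dist, L):
    """Midpoint of the upper and lower L-Lipschitz envelopes at x."""
    upper = min(y + L * dist(x, p) for p, y in zip(points, values))
    lower = max(y - L * dist(x, p) for p, y in zip(points, values))
    return 0.5 * (upper + lower)


def absdiff(a, b):
    """Any metric works here; the real line is used only for the demo."""
    return abs(a - b)


points = [0.0, 1.0, 3.0]
values = [0.0, 1.0, 1.0]
L = lipschitz_constant(points, values, absdiff)
print(L, lipschitz_extend(2.0, points, values, absdiff, L))  # 1.0 1.0
```

At the training points the two envelopes coincide, so the extension reproduces the observed values exactly. Fitting noisy data would require smoothing the values first, which is the role the abstract assigns to structural risk minimization.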