Results 1  10
of
10
Accelerated training of conditional random fields with stochastic gradient methods
 In ICML
, 2006
"... We apply Stochastic MetaDescent (SMD), a stochastic gradient optimization method with gain vector adaptation, to the training of Conditional Random Fields (CRFs). On several large data sets, the resulting optimizer converges to the same quality of solution over an order of magnitude faster than lim ..."
Abstract

Cited by 95 (4 self)
 Add to MetaCart
We apply Stochastic MetaDescent (SMD), a stochastic gradient optimization method with gain vector adaptation, to the training of Conditional Random Fields (CRFs). On several large data sets, the resulting optimizer converges to the same quality of solution over an order of magnitude faster than limitedmemory BFGS, the leading method reported to date. We report results for both exact and inexact inference techniques. 1.
Fast Iterative Kernel Principal Component Analysis
"... We develop gain adaptation methods that improve convergence of the kernel Hebbian algorithm (KHA) for iterative kernel PCA (Kim et al., 2005). KHA has a scalar gain parameter which is either held constant or decreased according to a predetermined annealing schedule, leading to slow convergence. We a ..."
Abstract

Cited by 7 (0 self)
 Add to MetaCart
We develop gain adaptation methods that improve convergence of the kernel Hebbian algorithm (KHA) for iterative kernel PCA (Kim et al., 2005). KHA has a scalar gain parameter which is either held constant or decreased according to a predetermined annealing schedule, leading to slow convergence. We accelerate it by incorporating the reciprocal of the current estimated eigenvalues as part of a gain vector. An additional normalization term then allows us to eliminate a tuning parameter in the annealing schedule. Finally we derive and apply stochastic metadescent (SMD) gain vector adaptation (Schraudolph, 1999, 2002) in reproducing kernel Hilbert space to further speed up convergence. Experimental results on kernel PCA and spectral clustering of USPS digits, motion capture and image denoising, and image superresolution tasks confirm that our methods converge substantially faster than conventional KHA. To demonstrate scalability, we perform kernel PCA on the entire MNIST data set.
Preferential Text Classification: Learning Algorithms and Evaluation Measures
"... Abstract. In many applicative contexts in which textual documents are labelled with thematic categories, a distinction is made between the primary categories of a document, which represent the topics that are central to it, and and its secondary categories, which represent topics that the document o ..."
Abstract

Cited by 2 (1 self)
 Add to MetaCart
Abstract. In many applicative contexts in which textual documents are labelled with thematic categories, a distinction is made between the primary categories of a document, which represent the topics that are central to it, and and its secondary categories, which represent topics that the document only touches upon. We contend that this distinction, so far neglected in text categorization research, is important and deserves to be explicitly tackled. The contribution of this paper is threefold. First, we propose an evaluation measure for this preferential text categorization task, whereby different kinds of misclassifications involving either primary or secondary categories have a different impact on effectiveness. Second, we establish several baseline results for this task on a wellknown benchmark for patent classification in which the distinction between primary and secondary categories is present; these results are obtained by reformulating the preferential text categorization task in terms of well established classification problems, such as single and/or multilabel multiclass classification; stateoftheart learning technology such as SVMs and kernelbased methods are used. Third, we improve on these results by using a recently proposed class of algorithms explicitly devised for learning from training data expressed in
Fast iterative kernel PCA
 Advances in Neural Information Processing Systems
, 2007
"... We introduce two methods to improve convergence of the Kernel Hebbian Algorithm (KHA) for iterative kernel PCA. KHA has a scalar gain parameter which is either held constant or decreased as 1/t, leading to slow convergence. Our KHA/et algorithm accelerates KHA by incorporating the reciprocal of the ..."
Abstract

Cited by 1 (1 self)
 Add to MetaCart
We introduce two methods to improve convergence of the Kernel Hebbian Algorithm (KHA) for iterative kernel PCA. KHA has a scalar gain parameter which is either held constant or decreased as 1/t, leading to slow convergence. Our KHA/et algorithm accelerates KHA by incorporating the reciprocal of the current estimated eigenvalues as a gain vector. We then derive and apply Stochastic MetaDescent (SMD) to KHA/et; this further speeds convergence by performing gain adaptation in RKHS. Experimental results for kernel PCA and spectral clustering of USPS digits as well as motion capture and image denoising problems confirm that our methods converge substantially faster than conventional KHA. 1
Brisk Kernel ICA
"... Recent approaches to independent component analysis have used kernel independence measures to obtain very good performance in ICA, particularly in areas where classical methods experience difficulty (for instance, sources with nearzero kurtosis). In this chapter, we compare two efficient extension ..."
Abstract

Cited by 1 (1 self)
 Add to MetaCart
Recent approaches to independent component analysis have used kernel independence measures to obtain very good performance in ICA, particularly in areas where classical methods experience difficulty (for instance, sources with nearzero kurtosis). In this chapter, we compare two efficient extensions of these methods for largescale problems: random subsampling of entries in the Gram matrices used in defining the independence measures, and incomplete Cholesky decomposition of these matrices. We derive closedform, efficiently computable approximations for the gradients of these measures, and compare their performance on ICA using both artificial and music data. We show that kernel ICA can scale up to larger problems than yet attempted, and that incomplete Cholesky decomposition performs better than random sampling.
Image Denoising with Kernels Based on Natural Image Relations
"... A successful class of image denoising methods is based on Bayesian approaches working in wavelet representations. The performance of these methods improves when relations among the local frequency coefficients are explicitly included. However, in these techniques, analytical estimates can be obtaine ..."
Abstract

Cited by 1 (0 self)
 Add to MetaCart
A successful class of image denoising methods is based on Bayesian approaches working in wavelet representations. The performance of these methods improves when relations among the local frequency coefficients are explicitly included. However, in these techniques, analytical estimates can be obtained only for particular combinations of analytical models of signal and noise, thus precluding its straightforward extension to deal with other arbitrary noise sources. In this paper, we propose an alternative nonexplicit way to take into account the relations among natural image wavelet coefficients for denoising: we use support vector regression (SVR) in the wavelet domain to enforce these relations in the estimated signal. Since relations among the coefficients are specific to the signal, the regularization property of SVR is exploited to remove the noise, which does not share this feature. The specific signal relations are encoded in an anisotropic kernel obtained from mutual information measures computed on a representative image database. In the proposed scheme, training considers minimizing the KullbackLeibler divergence (KLD) between the estimated and actual probability functions (or histograms) of signal and noise in order to enforce similarity up to the higher (computationally estimable) order. Due to its nonparametric
Preferential text classification: learning algorithms and evaluation measures
"... In many applicative contexts in which textual documents are labelled with thematic categories, a distinction is made between the primary categories of a document, which represent the topics that are central to it, and its secondary categories, which represent topics that the document only touches up ..."
Abstract
 Add to MetaCart
In many applicative contexts in which textual documents are labelled with thematic categories, a distinction is made between the primary categories of a document, which represent the topics that are central to it, and its secondary categories, which represent topics that the document only touches upon. We contend that this distinction, so far neglected in text categorization research, is important and deserves to be explicitly tackled. The contribution of this paper is threefold. First, we propose an evaluation measure for this preferential text categorization task, whereby different kinds of misclassifications involving either primary or secondary categories have a different impact on effectiveness. Second, we establish several baseline results for this task on a wellknown benchmark for patent classification in which the distinction between primary and secondary categories is present; these results are obtained by reformulating the preferential text categorization task in terms of well established classification problems, such as single and/or multilabel multiclass classification; stateoftheart learning technology such as SVMs and kernelbased methods are used. Third, we improve on these results by using a recently proposed class of algorithms explicitly devised for learning from training data expressed in preferential
Submitted 9/08; Published Image Denoising with Kernels Based on Natural Image Relations
"... A successful class of image denoising methods is based on Bayesian approaches working in wavelet representations. The performance of these methods improves when relations among the local frequency coefficients are explicitly included. However, in these techniques, analytical estimates can be obtaine ..."
Abstract
 Add to MetaCart
A successful class of image denoising methods is based on Bayesian approaches working in wavelet representations. The performance of these methods improves when relations among the local frequency coefficients are explicitly included. However, in these techniques, analytical estimates can be obtained only for particular combinations of analytical models of signal and noise, thus precluding its straightforward extension to deal with other arbitrary noise sources. In this paper, we propose an alternative nonexplicit way to take into account the relations among natural image wavelet coefficients for denoising: we use support vector regression (SVR) in the wavelet domain to enforce these relations in the estimated signal. Since relations among the coefficients are specific to the signal, the regularization property of SVR is exploited to remove the noise, which does not share this feature. The specific signal relations are encoded in an anisotropic kernel obtained from mutual information measures computed on a representative image database. In the proposed scheme, training considers minimizing the KullbackLeibler divergence (KLD) between the estimated and actual probability functions (or histograms) of signal and noise in order to enforce similarity up to the higher (computationally estimable) order. Due to its nonparametric
Sequence labeling · Stochastic gradient descent
"... Periodic stepsize adaptation in secondorder gradient ..."