Results 1–10 of 95
A Hilbert space embedding for distributions
 In Algorithmic Learning Theory: 18th International Conference
, 2007
Abstract

Cited by 113 (44 self)
Abstract. We describe a technique for comparing distributions without the need for density estimation as an intermediate step. Our approach relies on mapping the distributions into a reproducing kernel Hilbert space. Applications of this technique can be found in two-sample tests, which are used for determining whether two sets of observations arise from the same distribution, covariate shift correction, local learning, measures of independence, and density estimation. Kernel methods are widely used in supervised learning [1, 2, 3, 4]; however, they are much less established in the areas of testing, estimation, and analysis of probability distributions, where information-theoretic approaches [5, 6] have long been dominant. Recent examples include [7] in the context of construction of graphical models, [8] in the context of feature extraction, and [9] in the context of independent component analysis. These methods have by and large a common issue: to compute quantities such as the mutual information, entropy, or Kullback-Leibler divergence, we require sophisticated space partitioning and/or
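The two-sample application above can be sketched with a biased empirical estimate of the maximum mean discrepancy (MMD), the RKHS distance between kernel mean embeddings; function names, bandwidth, and data are illustrative, not the paper's notation:

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Pairwise RBF kernel matrix: k(x, y) = exp(-gamma * ||x - y||^2)
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def mmd2_biased(X, Y, gamma=1.0):
    # Biased estimate of squared MMD: the RKHS distance between the
    # empirical kernel mean embeddings of the two samples
    return (rbf_kernel(X, X, gamma).mean()
            - 2 * rbf_kernel(X, Y, gamma).mean()
            + rbf_kernel(Y, Y, gamma).mean())

rng = np.random.default_rng(0)
same = mmd2_biased(rng.normal(0, 1, (200, 2)), rng.normal(0, 1, (200, 2)))
diff = mmd2_biased(rng.normal(0, 1, (200, 2)), rng.normal(2, 1, (200, 2)))
```

Two samples from the same distribution yield a small MMD, while a mean-shifted sample yields a clearly larger one, which is what a two-sample test thresholds.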
Weisfeiler-Lehman Graph Kernels
, 2011
Abstract

Cited by 36 (4 self)
In this article, we propose a family of efficient kernels for large graphs with discrete node labels. Key to our method is a rapid feature extraction scheme based on the Weisfeiler-Lehman test of isomorphism on graphs. It maps the original graph to a sequence of graphs, whose node attributes capture topological and label information. A family of kernels can be defined based on this Weisfeiler-Lehman sequence of graphs, including a highly efficient kernel comparing subtree-like patterns. Its runtime scales only linearly in the number of edges of the graphs and the length of the Weisfeiler-Lehman graph sequence. In our experimental evaluation, our kernels outperform state-of-the-art graph kernels on several graph classification benchmark data sets in terms of accuracy and runtime. Our kernels open the door to large-scale applications of graph kernels in various disciplines such as computational biology and social network analysis.
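The relabeling scheme can be sketched as follows; the helper names and toy graphs are illustrative, and Python's `hash` stands in for the paper's label-compression function:

```python
from collections import Counter

def wl_relabel(adj, labels, iterations=2):
    # WL feature map: repeatedly replace each node's label by a compressed
    # label of (own label, sorted multiset of neighbour labels), counting
    # every label seen across all iterations.
    feats = Counter(labels)
    for _ in range(iterations):
        labels = [hash((labels[v], tuple(sorted(labels[u] for u in adj[v]))))
                  for v in range(len(adj))]
        feats.update(labels)
    return feats

def wl_subtree_kernel(g1, g2, iterations=2):
    # Subtree kernel: inner product of the label-count feature vectors
    f1, f2 = wl_relabel(*g1, iterations), wl_relabel(*g2, iterations)
    return sum(f1[k] * f2[k] for k in f1)

tri = ([[1, 2], [0, 2], [0, 1]], [0, 0, 0])   # triangle, uniform node labels
path = ([[1], [0, 2], [1]], [0, 0, 0])        # 3-node path, uniform node labels
k_self, k_cross = wl_subtree_kernel(tri, tri), wl_subtree_kernel(tri, path)
```

Each iteration touches every edge once, which is where the linear scaling in the number of edges comes from; the triangle scores higher against itself than against the path because fewer compressed labels are shared.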
Estimating labels from label proportions
 Proceedings of the 25th Annual International Conference on Machine Learning
, 2008
Abstract

Cited by 34 (2 self)
Consider the following problem: given sets of unlabeled observations, each set with known label proportions, predict the labels of another set of observations, also with known label proportions. This problem appears in areas like e-commerce, spam filtering, and improper content detection. We present consistent estimators which can reconstruct the correct labels with high probability in a uniform convergence sense. Experiments show that our method works well in practice.
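A mean-map-style sketch of the idea, on synthetic data with illustrative names: each bag mean is a proportion-weighted mixture of the class means, so bags with different proportions pin down the class means via a linear solve. The paper works with kernel mean embeddings, of which raw feature means are the linear-kernel special case.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_bag(n, p_pos):
    # Unlabeled bag with a known fraction p_pos of positives (toy Gaussians)
    y = rng.random(n) < p_pos
    X = np.where(y[:, None], rng.normal(2, 1, (n, 2)), rng.normal(-2, 1, (n, 2)))
    return X, p_pos

bags = [make_bag(300, p) for p in (0.2, 0.8)]
# Each bag mean satisfies mu_bag = p * mu_pos + (1 - p) * mu_neg
P = np.array([[p, 1 - p] for _, p in bags])   # proportion matrix
M = np.array([X.mean(0) for X, _ in bags])    # observed bag means
mu_pos, mu_neg = np.linalg.solve(P, M)        # recovered class means

X_test, _ = make_bag(200, 0.5)
pred_pos = (np.linalg.norm(X_test - mu_pos, axis=1)
            < np.linalg.norm(X_test - mu_neg, axis=1))
```

Labeling by nearest recovered class mean then gives individual predictions even though no individual label was ever observed.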
A Tour of Modern Image Filtering: New insights and methods, both practical and theoretical
 IEEE Signal Processing Magazine
, 2013
Abstract

Cited by 27 (2 self)
Recent developments in computational imaging and restoration have heralded the arrival and convergence of several powerful methods for adaptive processing of multidimensional data. Examples include moving least squares (from graphics); the bilateral filter (BF) and anisotropic diffusion (from computer vision); boosting, kernel, and spectral methods (from machine learning); nonlocal means (NLM) and its variants (from signal processing); Bregman iterations (from applied math); and kernel regression and iterative scaling (from statistics). While these approaches found their inspirations in diverse fields of nascence, they are deeply connected. In this article, I present a practical and accessible framework to understand some of the basic underpinnings of these methods, with the intention of leading the reader to a broad understanding of how they interrelate. I also illustrate connections between these techniques and more classical (empirical) Bayesian approaches. The proposed framework is used to arrive at new insights and methods, both practical and theoretical. In particular, several novel optimality properties of algorithms in wide use, such as block-matching and three-dimensional (3D) filtering (BM3D), and methods for their iterative improvement (or nonexistence thereof) are discussed. A general approach is laid out to enable the performance analysis and subsequent improvement of many existing filtering algorithms. While much of the material discussed is applicable to the wider class of linear degradation models beyond noise (e.g., blur), to keep matters focused, we consider the problem of denoising here. Digital Object Identifier: 10.1109/MSP.2011.2179329. Date of publication: 5 December 2012.
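As one concrete instance of the adaptive, data-dependent filters the abstract surveys, here is a 1-D bilateral filter sketch (parameters and data are illustrative):

```python
import numpy as np

def bilateral_filter_1d(signal, sigma_s=2.0, sigma_r=0.2, radius=5):
    # Bilateral filter on a 1-D signal: each output sample is a weighted
    # average whose weights combine spatial closeness (sigma_s) with
    # intensity similarity (sigma_r), so edges are preserved.
    out = np.empty_like(signal)
    for i in range(len(signal)):
        lo, hi = max(0, i - radius), min(len(signal), i + radius + 1)
        window = signal[lo:hi]
        spatial = np.exp(-0.5 * ((np.arange(lo, hi) - i) / sigma_s) ** 2)
        similar = np.exp(-0.5 * ((window - signal[i]) / sigma_r) ** 2)
        weights = spatial * similar
        out[i] = (weights * window).sum() / weights.sum()
    return out

# Noisy step edge: noise is smoothed while the discontinuity at index 50 survives
x = np.r_[np.zeros(50), np.ones(50)] + 0.05 * np.random.default_rng(2).normal(size=100)
y = bilateral_filter_1d(x)
```

The data-dependent range weight is what connects this filter to the kernel regression and NLM family: all of them replace a fixed smoothing kernel with one adapted to the local signal.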
C.: Latent Structured Models for Human Pose Estimation. ICCV
, 2011
Abstract

Cited by 21 (1 self)
We present an approach for automatic 3D human pose reconstruction from monocular images, based on a discriminative formulation with latent segmentation inputs. We advance the field of structured prediction and human pose reconstruction on several fronts. First, by working with a pool of figure-ground segment hypotheses, the prediction problem is formulated in terms of combined learning and inference over segment hypotheses and 3D human articular configurations. Besides constructing tractable formulations for the combined segment selection and pose estimation problem, we propose new augmented kernels that can better encode complex dependencies between output variables. Furthermore, we provide primal linear reformulations based on Fourier kernel approximations, in order to scale up the nonlinear latent structured prediction methodology. The proposed models are shown to be competitive in the HumanEva benchmark and are also illustrated in a clip collected from a Hollywood movie, where the model can infer human poses from monocular images captured in complex environments.
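The Fourier kernel approximation mentioned above can be sketched with random Fourier features, which replace a nonlinear RBF kernel by an explicit primal feature map; the bandwidth and feature count here are illustrative choices:

```python
import numpy as np

def random_fourier_features(X, n_features, gamma, seed=0):
    # z(x) = sqrt(2/D) * cos(W x + b) with W ~ N(0, 2*gamma*I) gives
    # z(x) . z(y) ~= exp(-gamma * ||x - y||^2), so a kernel method can be
    # trained as a linear (primal) model on z(X).
    rng = np.random.default_rng(seed)
    W = rng.normal(0.0, np.sqrt(2.0 * gamma), (X.shape[1], n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

X = np.random.default_rng(3).normal(size=(5, 4))
Z = random_fourier_features(X, n_features=5000, gamma=0.5)
approx = Z @ Z.T                                              # linear inner products
exact = np.exp(-0.5 * ((X[:, None] - X[None]) ** 2).sum(-1))  # exact RBF kernel
```

Training on `Z` with a linear structured learner scales with the number of features rather than the number of training examples, which is the point of the primal reformulation.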
Predicting Structured Objects with Support Vector Machines
, 2009
Abstract

Cited by 20 (1 self)
Machine Learning today offers a broad repertoire of methods for classification and regression. But what if we need to predict complex objects like trees, orderings, or alignments? Such problems arise naturally in natural language processing, search engines, and bioinformatics. The following explores a generalization of Support Vector Machines (SVMs) for such complex prediction problems.
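A minimal flavor of predicting over a joint input-output feature map can be given with a structured-perceptron sketch; this is a simplified stand-in for the max-margin SVM training the abstract describes, with illustrative toy data:

```python
import numpy as np

def joint_features(x, y, n_classes):
    # Phi(x, y): input features copied into the block belonging to label y
    phi = np.zeros(n_classes * len(x))
    phi[y * len(x):(y + 1) * len(x)] = x
    return phi

def predict(w, x, n_classes):
    # Inference: argmax over outputs of the joint score w . Phi(x, y)
    return max(range(n_classes), key=lambda y: w @ joint_features(x, y, n_classes))

# Three separable Gaussian clusters as the simplest "structured" output space
rng = np.random.default_rng(4)
X = rng.normal(size=(60, 2)) + np.array([[4, 0], [0, 4], [-4, -4]]).repeat(20, 0)
Y = np.repeat([0, 1, 2], 20)

w = np.zeros(3 * 2)
for _ in range(10):                      # structured-perceptron epochs
    for x, y in zip(X, Y):
        y_hat = predict(w, x, 3)
        if y_hat != y:                   # move toward the correct structure
            w += joint_features(x, y, 3) - joint_features(x, y_hat, 3)
```

For trees, orderings, or alignments, only `joint_features` and the argmax inference change; structural SVMs additionally enforce a margin scaled by a loss between outputs.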
A Tour of Modern Image Filtering
, 2011
Abstract

Cited by 17 (5 self)
Recent developments in computational imaging and restoration have heralded the arrival and convergence of several powerful methods for adaptive processing of multidimensional data. Examples include
Use of kernel deep convex networks and end-to-end learning for spoken language understanding
 IEEE SLT
, 2012
Abstract

Cited by 16 (9 self)
We present our recent and ongoing work on applying deep learning techniques to spoken language understanding (SLU) problems. The previously developed deep convex network (DCN) is extended to its kernel version (K-DCN), where the number of hidden units in each DCN layer approaches infinity using the kernel trick. We report experimental results demonstrating dramatic error reduction achieved by the K-DCN over both the Boosting-based baseline and the DCN on a domain classification task of SLU, especially when a highly correlated set of features extracted from search query click logs is used. Not only can the DCN and K-DCN be used as a domain or intent classifier for SLU, they can also be used as local, discriminative feature extractors for the slot filling task of SLU. The interface of the K-DCN to slot filling systems via the softmax function is presented. Finally, we outline an end-to-end learning strategy for training the softmax parameters (and potentially all DCN and K-DCN parameters) where the learning objective can take any performance measure (e.g., the F-measure) for the full SLU system. Index Terms: kernel learning, deep learning, spoken language understanding, domain detection, slot filling.
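The layer-stacking idea behind deep convex networks can be caricatured with kernel ridge regression layers whose predictions are concatenated back onto the raw input before the next layer; this is an illustrative simplification on toy data, not the paper's DCN/K-DCN architecture:

```python
import numpy as np

def krr_fit(X, Y, gamma=0.5, lam=1e-2):
    # Kernel ridge regression: solve (K + lam*I) alpha = Y with an RBF kernel,
    # the "infinitely many hidden units" counterpart of a finite layer
    K = np.exp(-gamma * ((X[:, None] - X[None]) ** 2).sum(-1))
    return np.linalg.solve(K + lam * np.eye(len(X)), Y)

def krr_predict(X_train, alpha, X, gamma=0.5):
    K = np.exp(-gamma * ((X[:, None] - X_train[None]) ** 2).sum(-1))
    return K @ alpha

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 2))
Y = np.eye(2)[(X[:, 0] * X[:, 1] > 0).astype(int)]   # XOR-like one-hot targets

H = X
for _ in range(2):                        # two stacked "layers"
    alpha = krr_fit(H, Y)
    out = krr_predict(H, alpha, H)        # this layer's class scores
    H = np.c_[X, out]                     # concatenate onto the raw input
```

Each layer is a convex (here closed-form) fit given the previous layer's outputs, which is what makes the stack trainable layer by layer.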
E.: How Effective is Tabu Search to Configure Support Vector Regression for Effort Estimation
 In PROMISE Procs, ACM
, 2010
Abstract

Cited by 10 (6 self)
Background. Recent studies have shown that Support Vector Regression (SVR) has interesting potential in the field of effort estimation. However, applying SVR requires carefully setting some parameters that heavily affect the prediction accuracy. No general guidelines are available to select these parameters, whose choice also depends on the characteristics of the data set used. This motivates the work described in this paper. Aims. We have investigated the use of an optimization technique in combination with SVR to select a suitable subset of parameters to be used for effort estimation. This technique, named Tabu Search (TS), is a metaheuristic approach used to address several optimization problems. Method. We employed SVR with linear and RBF kernels and used variable preprocessing strategies (i.e., logarithmic). As for the data set, we employed the Tukutuku cross-company database, which is widely adopted in Web effort estimation studies, and performed a holdout validation using two different splits of the data set. As a benchmark, results are compared to those obtained with Manual Stepwise Regression, Case-Based Reasoning, and Bayesian Networks. Results. Our results show that TS provides a good choice of parameters, so that the combination of TS and SVR outperforms any other technique applied on this data set. Conclusions. The use of the metaheuristic Tabu Search allowed us to obtain (I) an automatic choice of the parameters required to run SVR and (II) a significant improvement in prediction accuracy for SVR. While we are not guaranteed that this is the global optimum, the results we present are the best performance obtained on this problem to date. Of course, the experimental results presented here should be assessed on further data. However, they are certainly interesting enough to suggest the use of SVR among the techniques suitable for effort estimation, especially when using a cross-company database.
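The TS parameter search can be sketched as follows; the grid values are illustrative, and `error` is a hypothetical stand-in surface for the SVR holdout error that each candidate configuration would actually be evaluated on:

```python
import random

# Tabu search over an SVR-style parameter grid (C, epsilon, gamma).
GRID = {"C": [0.1, 1, 10, 100], "epsilon": [0.01, 0.1, 1], "gamma": [0.01, 0.1, 1]}

def error(p):
    # Illustrative unimodal error surface with optimum at C=10, eps=0.1, gamma=0.1;
    # in the paper's setting this would train and validate an SVR model.
    return abs(p["C"] - 10) / 10 + abs(p["epsilon"] - 0.1) + abs(p["gamma"] - 0.1)

def neighbours(p):
    # Move one parameter one step up or down the grid
    for k, vals in GRID.items():
        i = vals.index(p[k])
        for j in (i - 1, i + 1):
            if 0 <= j < len(vals):
                yield {**p, k: vals[j]}

def tabu_search(iters=30, tabu_len=5, seed=0):
    rng = random.Random(seed)
    cur = {k: rng.choice(v) for k, v in GRID.items()}
    best, tabu = cur, [tuple(cur.values())]
    for _ in range(iters):
        cands = [n for n in neighbours(cur) if tuple(n.values()) not in tabu]
        if not cands:
            break
        cur = min(cands, key=error)              # best admissible neighbour
        tabu = (tabu + [tuple(cur.values())])[-tabu_len:]
        if error(cur) < error(best):
            best = cur
    return best
```

The tabu list forbids revisiting recent configurations, which lets the search accept temporarily worse moves and escape local minima that plain greedy descent would get stuck in.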
A Family of Simple NonParametric Kernel Learning Algorithms
Abstract

Cited by 10 (5 self)
Previous studies of Non-Parametric Kernel Learning (NPKL) usually formulate the learning task as a Semi-Definite Programming (SDP) problem that is often solved by general-purpose SDP solvers. However, for N data examples, the time complexity of NPKL using a standard interior-point SDP solver can be as high as O(N^6.5), which makes NPKL methods inapplicable to real applications, even for data sets of moderate size. In this paper, we present a family of efficient NPKL algorithms, termed “SimpleNPKL”, which can learn non-parametric kernels from a large set of pairwise constraints efficiently. In particular, we propose two efficient SimpleNPKL algorithms. One is the SimpleNPKL algorithm with linear loss, which enjoys a closed-form solution that can be efficiently computed by the Lanczos sparse eigendecomposition technique. The other is the SimpleNPKL algorithm with other loss functions (including square hinge loss, hinge loss, and square loss), which can be reformulated as a saddle-point optimization problem and further solved by a fast iterative algorithm. In contrast to previous NPKL approaches, our empirical results show that the proposed technique, while maintaining the same accuracy, is significantly more efficient and scalable. Finally, we also demonstrate that the proposed technique is applicable to speeding up many kernel learning tasks, including colored maximum variance unfolding, minimum volume embedding, and structure preserving embedding.
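One way to picture the closed-form linear-loss idea: collect must-link/cannot-link pairwise constraints in a symmetric matrix and take a scaled top eigenvector as a rank-one PSD kernel. This toy sketch omits SimpleNPKL's graph-Laplacian regularizer, rank and trace constraints, and proper scaling, so it only conveys the eigendecomposition-instead-of-SDP flavor:

```python
import numpy as np

# Must-link (+1) / cannot-link (-1) pairwise constraints on 6 points,
# collected in a symmetric constraint matrix T (toy data).
n = 6
T = np.zeros((n, n))
for i, j, s in [(0, 1, 1), (1, 2, 1), (3, 4, 1), (4, 5, 1), (0, 3, -1), (2, 5, -1)]:
    T[i, j] = T[j, i] = s

vals, vecs = np.linalg.eigh(T)          # eigenvalues in ascending order
u = vecs[:, -1] * np.sqrt(vals[-1])     # top eigenpair, scaled
K = np.outer(u, u)                      # rank-one PSD "learned" kernel
```

A sparse eigensolver (the Lanczos method the abstract mentions) computes such leading eigenpairs in roughly linear time per iteration in the number of constraints, which is the source of the speedup over an interior-point SDP solve.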