Results 1–10 of 63
Kernel-Based Learning of Hierarchical Multilabel Classification Models
 Journal of Machine Learning Research
, 2006
Abstract

Cited by 53 (6 self)
We present a kernel-based algorithm for hierarchical text classification where the documents are allowed to belong to more than one category at a time. The classification model is a variant of the Maximum Margin Markov Network framework, where the classification hierarchy is represented as a Markov tree equipped with an exponential family defined on the edges. We present an efficient optimization algorithm based on incremental conditional gradient ascent in single-example subspaces spanned by the marginal dual variables. The optimization is facilitated with a dynamic-programming-based algorithm that computes best update directions in the feasible set. Experiments show …
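The Markov-tree view in this abstract can be illustrated on a toy example: labels live on the nodes of the hierarchy, edges carry pairwise scores, and the best joint labeling is found by max-sum dynamic programming over the tree. The hierarchy and all scores below are invented for illustration; in the paper the potentials are learned inside the max-margin framework rather than fixed by hand.

```python
# Toy Markov tree over a label hierarchy: each node is on (1) or off (0),
# edges carry pairwise scores, and max-sum dynamic programming finds the
# best joint labeling. All scores here are illustrative, not learned.
children = {"root": ["A", "B"], "A": ["A1"], "B": [], "A1": []}
node_score = {"root": {0: 0.0, 1: 1.0}, "A": {0: 0.2, 1: 0.5},
              "B": {0: 0.6, 1: 0.1}, "A1": {0: 0.0, 1: 0.9}}
# edge_score[(parent_state, child_state)]: discourage a child being "on"
# while its parent is "off", reward consistent "on" paths.
edge_score = {(0, 0): 0.3, (0, 1): -1.0, (1, 0): 0.0, (1, 1): 0.4}

def best_labeling(node):
    """For each state s of `node`, the best (score, labeling) of its subtree."""
    results = {}
    for s in (0, 1):
        total, labeling = node_score[node][s], {node: s}
        for child in children[node]:
            sub = best_labeling(child)
            best_v, best_lab = None, None
            for t in (0, 1):
                v = sub[t][0] + edge_score[(s, t)]
                if best_v is None or v > best_v:
                    best_v, best_lab = v, sub[t][1]
            total += best_v
            labeling.update(best_lab)
        results[s] = (total, labeling)
    return results

res = best_labeling("root")
value, labeling = res[0] if res[0][0] >= res[1][0] else res[1]
print(labeling)  # {'root': 1, 'A': 1, 'A1': 1, 'B': 0}
```

The recursion visits each node once, so decoding is linear in the size of the hierarchy, which is what makes the DP-based computation of update directions cheap.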
Multi-Label Prediction via Compressed Sensing
, 2009
Abstract

Cited by 43 (2 self)
We consider multi-label prediction problems with large output spaces under the assumption of output sparsity – that the target vectors have small support. We develop a general theory for a variant of the popular ECOC (error-correcting output code) scheme, based on ideas from compressed sensing for exploiting this sparsity. The method can be regarded as a simple reduction from multi-label regression problems to binary regression problems. It is shown that the number of subproblems need only be logarithmic in the total number of label values, making this approach radically more efficient than others. We also state and prove performance guarantees for this method, and test it empirically.
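The reduction can be sketched in a few lines of NumPy: a k-sparse label vector over d labels is compressed into m = O(k log d) real-valued targets by a random projection, and decoded back with a greedy sparse-recovery routine (orthogonal matching pursuit here, one standard choice; the sizes, projection, and decoder are illustrative, not the paper's exact construction). In practice the measurements z would be predicted by m trained regressors rather than computed from the true labels.

```python
import numpy as np

def encode(y, A):
    """Compress a d-dim label vector into m regression targets."""
    return A @ y

def omp_decode(z, A, k):
    """Greedy orthogonal matching pursuit: recover a k-sparse
    binary label vector from measurements z."""
    support, residual = [], z.copy()
    coef = np.zeros(0)
    for _ in range(k):
        j = int(np.argmax(np.abs(A.T @ residual)))
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(A[:, support], z, rcond=None)
        residual = z - A[:, support] @ coef
    y_hat = np.zeros(A.shape[1])
    y_hat[support] = coef
    return (y_hat > 0.5).astype(int)  # threshold back to a label set

rng = np.random.default_rng(0)
d, m, k = 100, 40, 3                  # 100 labels, 40 measurements
A = rng.normal(size=(m, d)) / np.sqrt(m)
y = np.zeros(d)
y[[4, 17, 62]] = 1.0                  # a 3-sparse label vector
z = encode(y, A)                      # in practice, predicted by m regressors
y_hat = omp_decode(z, A, k)
print(int(y_hat.sum()), "labels predicted")
```

With exact measurements and m on the order of k log d, random-projection theory says the support is recovered with high probability; the paper's guarantees additionally account for regression error in z.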
Mining multi-label data
 In Data Mining and Knowledge Discovery Handbook
, 2010
Abstract

Cited by 41 (4 self)
A large body of research in supervised learning deals with the analysis of single-label data, where training examples are associated with a single label λ from a set of disjoint labels L. However, training examples in several application domains are often associated with a set of labels Y ⊆ L. Such data are called multi-label.
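The Y ⊆ L setup is easy to make concrete: each example carries a label set rather than a single label, most commonly encoded as a binary indicator vector over L (the labels and examples below are made up for illustration).

```python
# Multi-label data: each example carries a label *set* Y ⊆ L,
# commonly encoded as a 0/1 indicator vector over the label set L.
labels = ["sports", "politics", "science", "health"]   # the label set L
examples = [
    ("doping scandal in parliament", {"sports", "politics"}),
    ("new vaccine trial results",    {"science", "health"}),
]

def to_indicator(label_set, labels):
    """Binary-relevance encoding: one 0/1 slot per label in L."""
    return [1 if l in label_set else 0 for l in labels]

Y = [to_indicator(ls, labels) for _, ls in examples]
print(Y)  # [[1, 1, 0, 0], [0, 0, 1, 1]]
```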
Hierarchical classification: combining Bayes with SVM
 In: ICML ’06: Proceedings of the 23rd international conference on Machine learning
, 2006
Abstract

Cited by 30 (2 self)
We study hierarchical classification in the general case when an instance could belong to more than one class node in the underlying taxonomy. Experiments done in previous work showed that a simple hierarchy of Support Vector Machines (SVMs) with a top-down evaluation scheme has surprisingly good performance on this kind of task. In this paper, we introduce a refined evaluation scheme which turns the hierarchical SVM classifier into an approximator of the Bayes optimal classifier with respect to a simple stochastic model for the labels. Experiments on synthetic datasets, generated according to this stochastic model, show that our refined algorithm outperforms the simple hierarchical SVM. On real-world data, however, the advantage brought by our approach is a bit less clear. We conjecture this is due to a higher noise rate for the training labels in the low levels of the taxonomy.
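The top-down evaluation scheme this entry refers to is simple to sketch: a node's classifier is consulted only if its parent accepted the instance, and every accepted node is kept in the multi-label prediction. The taxonomy and the per-node "classifiers" below are illustrative keyword stubs standing in for trained SVMs.

```python
# Top-down evaluation over a taxonomy: descend only into subtrees whose
# node classifier accepts the instance; keep every accepted node.
taxonomy = {"root": ["sci", "rec"], "sci": ["sci.med", "sci.space"], "rec": []}

def topdown_predict(x, node, classifiers, predicted=None):
    if predicted is None:
        predicted = set()
    for child in taxonomy.get(node, []):
        if classifiers[child](x):          # e.g. an SVM decision at this node
            predicted.add(child)           # multi-label: keep all accepted nodes
            topdown_predict(x, child, classifiers, predicted)
    return predicted

# Keyword stubs standing in for trained per-node SVMs.
clfs = {"sci": lambda x: "science" in x,
        "rec": lambda x: "hobby" in x,
        "sci.med": lambda x: "medicine" in x,
        "sci.space": lambda x: "orbit" in x}
print(sorted(topdown_predict("science medicine article", "root", clfs)))
# ['sci', 'sci.med']
```

The scheme's weakness, which the refined variant addresses, is visible here: if the "sci" stub rejects an instance, "sci.med" is never even consulted.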
Learning Hierarchical Multi-Category Text Classification Models
 22nd International Conference on Machine Learning (ICML 2005)
, 2005
Refined Experts: Improving Classification in Large Taxonomies
Abstract

Cited by 24 (0 self)
While large-scale taxonomies – especially for web pages – have been in existence for some time, approaches to automatically classify documents into these taxonomies have met with limited success compared to the more general progress made in text classification. We argue that this stems from three causes: increasing sparsity of training data at deeper nodes in the taxonomy, error propagation where a mistake made high in the hierarchy cannot be recovered, and increasingly complex decision surfaces at higher nodes in the hierarchy. While prior research has focused on the first problem, we introduce methods that target the latter two problems – first by biasing the training distribution to reduce error propagation, and second by propagating up “first-guess” expert information in a bottom-up manner before making a refined top-down choice. Finally, we present an empirical study demonstrating that the suggested changes lead to 10–30% improvements in F1 scores versus an accepted competitive baseline, hierarchical SVMs.
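The bottom-up "first-guess" step can be pictured as feature augmentation: outputs of lower-level (expert) classifiers are propagated up as extra signals before the top-down decision is made. The stub experts and features below are illustrative, not the paper's actual pipeline.

```python
# Bottom-up "first-guess" sketch: expert (lower-level) classifier outputs
# become extra features for the refined top-down decision.
def augment_with_expert_guesses(x_words, leaf_experts):
    guesses = [1.0 if expert(x_words) else 0.0 for expert in leaf_experts]
    return x_words, guesses   # original features plus first-guess signals

# Illustrative keyword stubs standing in for trained leaf classifiers.
leaf_experts = [lambda x: "orbit" in x, lambda x: "vaccine" in x]
features, guesses = augment_with_expert_guesses({"orbit", "launch"}, leaf_experts)
print(guesses)  # [1.0, 0.0]
```

A higher-level classifier trained on the augmented input can then recover from a wrong early top-down decision, which is exactly the error-propagation failure mode described above.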
Large Scale Max-Margin Multi-Label Classification with Priors
"... We propose a maxmargin formulation for the multilabel classification problem where the goal is to tag a data point with a set of prespecified labels. Given a set of L labels, a data point can be tagged with any of the 2 L possible subsets. The main challenge therefore lies in optimising over this ..."
Abstract

Cited by 20 (2 self)
We propose a max-margin formulation for the multi-label classification problem where the goal is to tag a data point with a set of pre-specified labels. Given a set of L labels, a data point can be tagged with any of the 2^L possible subsets. The main challenge therefore lies in optimising over this exponentially large label space subject to label correlations. Existing solutions take either of two approaches. The first assumes, a priori, that there are no label correlations and independently trains a classifier for each label (as is done in the 1-vs-All heuristic). This reduces the problem complexity from exponential to linear, and such methods can scale to large problems. The second approach explicitly models correlations by pairwise label interactions. However, the complexity remains exponential unless one assumes that label correlations are sparse. Furthermore, the learnt correlations reflect the training set biases. We take a middle approach that assumes labels are correlated but does not incorporate pairwise label terms in the prediction function. We show that the complexity can still be reduced from exponential to linear while modelling dense pairwise label correlations. By incorporating correlation priors we can overcome training set biases and improve prediction accuracy. We provide a principled interpretation of the 1-vs-All method and show …
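The 1-vs-All heuristic the abstract contrasts against is worth seeing concretely: one independent linear classifier per label, trained in isolation, so cost grows linearly in L rather than with the 2^L subsets. The toy data and the perceptron trainer below are illustrative stand-ins, not the paper's max-margin formulation.

```python
import numpy as np

# 1-vs-All reduction: one independent linear classifier per label,
# here trained with a few perceptron epochs; label correlations are
# ignored by design, which is the limitation the paper targets.
def train_one_vs_all(X, Y, epochs=20):
    n, d = X.shape
    L = Y.shape[1]
    W = np.zeros((L, d))
    for _ in range(epochs):
        for i in range(n):
            for l in range(L):
                pred = 1 if X[i] @ W[l] > 0 else 0
                W[l] += (Y[i, l] - pred) * X[i]   # perceptron update per label
    return W

def predict(X, W):
    return (X @ W.T > 0).astype(int)

X = np.array([[1., 0., 1.], [0., 1., 1.], [1., 1., 1.]])  # last column: bias
Y = np.array([[1, 0], [0, 1], [1, 1]])   # two labels, overlapping on row 3
W = train_one_vs_all(X, Y)
print(predict(X, W))
```

Training L classifiers over n examples costs O(L · n · d) per epoch, which is the exponential-to-linear reduction mentioned above.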
Exploiting known taxonomies in learning overlapping concepts
 In: Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI ’07), 2007
Abstract

Cited by 13 (1 self)
Many real-world classification problems involve large numbers of overlapping categories that are arranged in a hierarchy or taxonomy. We propose to incorporate prior knowledge of the category taxonomy directly into the learning architecture. We present two concrete multi-label classification methods: a generalized version of the Perceptron and a hierarchical multi-label SVM. Our method works with arbitrary, not necessarily singly connected, taxonomies, and can be applied more generally in settings where categories are characterized by attributes and relations that are not necessarily induced by a taxonomy. Experimental results on the WIPO-alpha collection show that our hierarchical methods bring significant performance improvements.
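One common way to fold a taxonomy into a Perceptron-style learner, in the spirit of the methods above, is to score a candidate label through itself plus all of its ancestors, so sibling labels share ancestor weights. The taxonomy, data, and word features below are made up for illustration; the paper's generalized Perceptron is richer than this sketch.

```python
# Taxonomy-aware Perceptron sketch: a label's score pools the weights of
# the label and all its ancestors, so related labels share evidence.
parent = {"sci.med": "sci", "sci.space": "sci", "sci": None, "rec": None}
labels = ["rec", "sci.med", "sci.space"]

def ancestors(label):
    path = []
    while label is not None:
        path.append(label)
        label = parent[label]
    return path

def score(words, label, w):
    return sum(w.get((n, t), 0.0) for n in ancestors(label) for t in words)

def train(data, epochs=5):
    w = {}
    for _ in range(epochs):
        for words, true in data:
            pred = max(labels, key=lambda l: score(words, l, w))
            if pred != true:
                for n in ancestors(true):      # promote the true path
                    for t in words:
                        w[(n, t)] = w.get((n, t), 0.0) + 1.0
                for n in ancestors(pred):      # demote the predicted path
                    for t in words:
                        w[(n, t)] = w.get((n, t), 0.0) - 1.0
    return w

data = [({"cell", "trial"}, "sci.med"),
        ({"orbit", "launch"}, "sci.space"),
        ({"guitar"}, "rec")]
w = train(data)
print([max(labels, key=lambda l: score(ws, l, w)) for ws, _ in data])
# ['sci.med', 'sci.space', 'rec']
```

Because "sci.med" and "sci.space" both pull in the shared "sci" weights, evidence seen for one science label transfers to its sibling, which is the point of exploiting the taxonomy.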
Robust Bounds for Classification via Selective Sampling
Abstract

Cited by 13 (4 self)
We introduce a new algorithm for binary classification in the selective sampling protocol. Our algorithm uses Regularized Least Squares (RLS) as the base classifier, and for this reason it can be efficiently run in any RKHS. Unlike previous margin-based semi-supervised algorithms, our sampling condition hinges on a simultaneous upper bound on the bias and variance of the RLS estimate under a simple linear label noise model. This fact allows us to prove performance bounds that hold for an arbitrary sequence of instances. In particular, we show that our sampling strategy approximates the margin of the Bayes optimal classifier to any desired accuracy ε by asking Õ(d/ε²) queries (in the RKHS case d is replaced by a suitable spectral quantity). While these are the standard rates in the fully supervised i.i.d. case, the best previously known result in our harder setting was Õ(d³/ε⁴). Preliminary experiments show that some of our algorithms also exhibit good practical performance.
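The selective sampling protocol with an RLS base classifier can be sketched as follows: maintain a regularized least-squares fit over the queried examples and request a label only when the current margin is small. The fixed-threshold rule below is a simplified stand-in for the paper's bias/variance-based condition, and the synthetic stream is illustrative.

```python
import numpy as np

# Selective sampling with an RLS base classifier: query the label only
# when the magnitude of the current RLS margin is small (uncertain).
def selective_sampling(stream, d, threshold=0.2, reg=1.0):
    A = reg * np.eye(d)            # regularized second-moment matrix
    b = np.zeros(d)
    queries = 0
    for x, y in stream:
        w = np.linalg.solve(A, b)  # current RLS weight vector
        margin = x @ w
        if abs(margin) <= threshold:   # uncertain: ask for the label
            queries += 1
            A += np.outer(x, x)
            b += y * x
    return np.linalg.solve(A, b), queries

# Synthetic linearly-separable stream.
rng = np.random.default_rng(1)
w_true = np.array([1.0, -1.0])
stream = []
for _ in range(200):
    x = rng.normal(size=2)
    stream.append((x, np.sign(x @ w_true)))
w_hat, queries = selective_sampling(stream, d=2)
print(queries, "labels queried out of 200")
```

Once the estimate is informative, most instances have a large margin and their labels are never requested, which is where the query savings over fully supervised learning come from.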