Multi-Label Prediction via Compressed Sensing
, 902
Abstract

Cited by 100 (3 self)
We consider multilabel prediction problems with large output spaces under the assumption of output sparsity – that the target vectors have small support. We develop a general theory for a variant of the popular ECOC (error-correcting output code) scheme, based on ideas from compressed sensing for exploiting this sparsity. The method can be regarded as a simple reduction from multilabel regression problems to binary regression problems. It is shown that the number of subproblems need only be logarithmic in the total number of label values, making this approach radically more efficient than others. We also state and prove performance guarantees for this method, and test it empirically.
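The reduction described in this abstract can be illustrated with a small sketch. The code below is not the authors' implementation: it assumes a random Gaussian projection matrix `A`, skips the regression step entirely (the compressed vector `z` is computed exactly rather than predicted from features), and uses orthogonal matching pursuit as one possible sparse decoder. All names and sizes (`omp`, `d`, `m`, `k`) are illustrative choices.

```python
import numpy as np

def omp(A, z, k):
    """Orthogonal matching pursuit: find a k-sparse y with A @ y close to z."""
    d = A.shape[1]
    residual = z.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        # Score every column by its correlation with the current residual.
        scores = np.abs(A.T @ residual)
        if support:
            scores[support] = -np.inf  # never pick the same column twice
        support.append(int(np.argmax(scores)))
        # Refit all selected coefficients jointly by least squares.
        coef, *_ = np.linalg.lstsq(A[:, support], z, rcond=None)
        residual = z - A[:, support] @ coef
    y = np.zeros(d)
    y[support] = coef
    return y

rng = np.random.default_rng(0)
d, m, k = 1000, 150, 3  # number of labels, compressed dimension, sparsity (assumed)
A = rng.standard_normal((m, d)) / np.sqrt(m)  # random projection (assumed Gaussian)

y_true = np.zeros(d)
y_true[[5, 77, 150]] = 1.0  # a sparse label vector with three active labels

# In the full method, z would be the output of m learned regressors;
# here it is computed exactly to isolate the decoding step.
z = A @ y_true
y_hat = omp(A, z, k)
recovered = sorted(np.flatnonzero(y_hat > 0.5).tolist())
print(recovered)
```

Only `m` regression subproblems would need to be trained instead of `d` per-label classifiers, which is the source of the efficiency claim; the decoder then maps the `m` predictions back to a sparse label vector.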
Mining multilabel data
 In Data Mining and Knowledge Discovery Handbook
, 2010
Abstract

Cited by 92 (9 self)
A large body of research in supervised learning deals with the analysis of single-label data, where training examples are associated with a single label λ from a set of disjoint labels L. However, training examples in several application domains are often associated with a set of labels Y ⊆ L. Such data are called multilabel.
Multilabel classification via calibrated label ranking
 MACH LEARN
, 2008
Abstract

Cited by 69 (10 self)
Label ranking studies the problem of learning a mapping from instances to rankings over a predefined set of labels. Hitherto existing approaches to label ranking implicitly operate on an underlying (utility) scale which is not calibrated in the sense that it lacks a natural zero point. We propose a suitable extension of label ranking that incorporates the calibrated scenario and substantially extends the expressive power of these approaches. In particular, our extension suggests a conceptually novel technique for extending the common learning by pairwise comparison approach to the multilabel scenario, a setting previously not amenable to the pairwise decomposition technique. The key idea of the approach is to introduce an artificial calibration label that, in each example, separates the relevant from the irrelevant labels. We show that this technique can be viewed as a combination of pairwise preference learning and the conventional relevance classification technique, where a separate classifier is trained to predict whether a label is relevant or not. Empirical results in the area of text categorization, image classification and gene analysis underscore the merits of the calibrated model in comparison to state-of-the-art multilabel learning methods.
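The calibration-label idea in this abstract can be sketched as a voting procedure. The code below is a minimal illustration, not the paper's method in full: the trained pairwise and relevance models are replaced by two hypothetical callables (`prefers`, `beats_virtual`) driven by made-up scores, and tie-breaking is left to Python's stable sort.

```python
def calibrated_predict(labels, prefers, beats_virtual):
    """Rank labels by pairwise votes and cut the ranking at a virtual label.

    prefers(a, b)    -> True if label a is preferred to label b (stand-in
                        for a trained pairwise model).
    beats_virtual(a) -> True if label a is preferred to the artificial
                        calibration label (stand-in for a relevance model).
    """
    virtual = object()  # the artificial calibration label
    votes = {lab: 0 for lab in labels}
    votes[virtual] = 0
    for i, a in enumerate(labels):
        # One comparison per unordered pair of real labels.
        for b in labels[i + 1:]:
            votes[a if prefers(a, b) else b] += 1
        # One comparison of each real label against the calibration label.
        if beats_virtual(a):
            votes[a] += 1
        else:
            votes[virtual] += 1
    ranking = sorted(labels, key=lambda lab: -votes[lab])
    # Labels with more votes than the calibration label are predicted relevant.
    relevant = [lab for lab in ranking if votes[lab] > votes[virtual]]
    return ranking, relevant

# Toy scores standing in for learned models (purely illustrative).
toy = {"sports": 0.9, "politics": 0.2, "tech": 0.7, "art": 0.1}
ranking, relevant = calibrated_predict(
    list(toy),
    prefers=lambda a, b: toy[a] > toy[b],
    beats_virtual=lambda a: toy[a] > 0.5,
)
print(ranking, relevant)
```

The calibration label gives the ranking a zero point: the same vote count that orders the labels also decides where the relevant set ends, so no separate threshold has to be tuned.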
Decision trees for hierarchical multilabel classification
 Machine Learning
, 2008
Abstract

Cited by 59 (2 self)
Hierarchical multilabel classification (HMC) is a variant of classification where instances may belong to multiple classes at the same time and these classes are organized in a hierarchy. This article presents several approaches to the induction of decision trees for HMC, as well as an empirical study of their use in functional genomics. We compare learning a single HMC tree (which makes predictions for all classes together) to two approaches that learn a set of regular classification trees (one for each class). The first approach defines an independent single-label classification task for each class (SC). Obviously, the hierarchy introduces dependencies between the classes. While they are ignored by the first approach, they are exploited by the second approach, named hierarchical single-label classification (HSC). Depending on the application at hand, the hierarchy of classes can be such that each class has at most one parent (tree structure) or such that classes may have multiple parents (DAG structure). The latter case has not been considered before and we show how the HMC and HSC approaches can be modified to support this setting. We compare the three approaches on 24 yeast data sets using as classification schemes MIPS’s FunCat (tree structure) and the Gene Ontology (DAG structure). We show that HMC trees outperform HSC and SC trees along three dimensions: predictive accuracy, model size, and induction time. We conclude that HMC trees should definitely be considered in HMC tasks where interpretable models are desired.
Stochastic block-coordinate Frank-Wolfe optimization for structural SVMs. arXiv preprint:1207.4747
, 2012
Abstract

Cited by 58 (6 self)
We propose a randomized block-coordinate variant of the classic Frank-Wolfe algorithm for convex optimization with block-separable constraints. Despite its lower iteration cost, we show that it achieves a similar convergence rate in duality gap as the full Frank-Wolfe algorithm. We also show that, when applied to the dual structural support vector machine (SVM) objective, this yields an online algorithm that has the same low iteration complexity as primal stochastic subgradient methods. However, unlike stochastic subgradient methods, the block-coordinate Frank-Wolfe algorithm allows us to compute the optimal step-size and yields a computable duality gap guarantee. Our experiments indicate that this simple algorithm outperforms competing structural SVM solvers.
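A toy version of the block-coordinate update may help make the abstract concrete. The sketch below does not touch the structural SVM dual; it assumes a much simpler problem, minimizing 0.5 * ||x - b||^2 where each row of `x` must lie in a probability simplex, so that each row is one block. The function names and the problem itself are illustrative assumptions. It does show the two properties the abstract highlights: a closed-form optimal step size and a computable duality gap.

```python
import numpy as np

def bcfw_simplex(b, iters=2000, seed=0):
    """Toy block-coordinate Frank-Wolfe for min 0.5 * ||x - b||^2,
    with every row of x constrained to the probability simplex."""
    rng = np.random.default_rng(seed)
    n, p = b.shape
    x = np.full((n, p), 1.0 / p)      # feasible start: uniform rows
    for _ in range(iters):
        i = rng.integers(n)           # sample one block uniformly at random
        grad_i = x[i] - b[i]          # gradient restricted to block i
        s = np.zeros(p)
        s[np.argmin(grad_i)] = 1.0    # linear minimizer over a simplex is a vertex
        d = s - x[i]
        denom = d @ d
        if denom == 0.0:              # already at the chosen vertex
            continue
        # Exact line search for a quadratic, clipped to the feasible range.
        gamma = np.clip(-(grad_i @ d) / denom, 0.0, 1.0)
        x[i] = x[i] + gamma * d
    return x

def duality_gap(x, b):
    """Frank-Wolfe gap: sum over blocks of <x_i - s_i, grad_i>."""
    grad = x - b
    return float(np.sum(x * grad) - np.sum(grad.min(axis=1)))

b = np.random.default_rng(1).random((5, 4))
x = bcfw_simplex(b)
print("duality gap:", duality_gap(x, b))
```

Each iteration touches a single block, which is what keeps the per-iteration cost low; the gap is non-negative for any feasible `x` and can serve as a stopping criterion.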
Large Scale Max-Margin Multi-Label Classification with Priors
Abstract

Cited by 41 (2 self)
We propose a max-margin formulation for the multilabel classification problem where the goal is to tag a data point with a set of prespecified labels. Given a set of L labels, a data point can be tagged with any of the 2^L possible subsets. The main challenge therefore lies in optimising over this exponentially large label space subject to label correlations. Existing solutions take either of two approaches. The first assumes, a priori, that there are no label correlations and independently trains a classifier for each label (as is done in the 1-vs-All heuristic). This reduces the problem complexity from exponential to linear and such methods can scale to large problems. The second approach explicitly models correlations by pairwise label interactions. However, the complexity remains exponential unless one assumes that label correlations are sparse. Furthermore, the learnt correlations reflect the training set biases. We take a middle approach that assumes labels are correlated but does not incorporate pairwise label terms in the prediction function. We show that the complexity can still be reduced from exponential to linear while modelling dense pairwise label correlations. By incorporating correlation priors we can overcome training set biases and improve prediction accuracy. We provide a principled interpretation of the 1-vs-All method and show
Large Scale Multi-Label Classification via MetaLabeler
 WWW 2009 MADRID! TRACK: DATA MINING / SESSION: LEARNING
, 2009
Abstract

Cited by 35 (7 self)
The explosion of online content has made the management of such content nontrivial. Web-related tasks such as web page categorization, news filtering, query categorization, tag recommendation, etc. often involve the construction of multilabel categorization systems on a large scale. Existing multilabel classification methods either do not scale or have unsatisfactory performance. In this work, we propose MetaLabeler to automatically determine the relevant set of labels for each instance without intensive human involvement or expensive cross-validation. Extensive experiments conducted on benchmark data show that the MetaLabeler tends to outperform existing methods. Moreover, MetaLabeler scales to millions of multi-labeled instances and can be deployed easily. This enables us to apply the MetaLabeler to a large scale query categorization problem in Yahoo!, yielding a significant improvement in performance.
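The abstract does not spell out how the relevant label set is chosen. One common reading of the approach, stated here as an assumption, is that a separate meta-model estimates how many labels an instance has and the top-scoring labels are kept up to that count. The sketch below shows only that selection step; `metalabeler_predict`, the scores, and the predicted count are all hypothetical stand-ins for trained models.

```python
def metalabeler_predict(label_scores, predicted_count):
    """Keep the `predicted_count` highest-scoring labels.

    label_scores:    dict mapping label -> score from per-label classifiers
                     (assumed to exist already).
    predicted_count: output of a hypothetical meta-model that estimates how
                     many labels apply to this instance.
    """
    # Clamp the count so it stays within the available labels.
    k = max(0, min(predicted_count, len(label_scores)))
    ranked = sorted(label_scores, key=label_scores.get, reverse=True)
    return ranked[:k]

# Illustrative scores for one instance; not taken from the paper.
scores = {"news": 1.3, "sports": -0.4, "finance": 0.8, "travel": -1.1}
chosen = metalabeler_predict(scores, predicted_count=2)
print(chosen)
```

Choosing the cut-off per instance replaces a global score threshold, which is the kind of tuning that would otherwise call for cross-validation.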
The Interplay of Optimization and Machine Learning Research
 Journal of Machine Learning Research
, 2006
Abstract

Cited by 24 (1 self)
The fields of machine learning and mathematical programming are increasingly intertwined. Optimization problems lie at the heart of most machine learning approaches. The Special Topic on Machine Learning and Large Scale Optimization examines this interplay. Machine learning researchers have embraced the advances in mathematical programming allowing new types of models to be pursued. The special topic includes models using quadratic, linear, second-order cone, semidefinite, and semi-infinite programs. We observe that the qualities of good optimization algorithms from the machine learning and optimization perspectives can be quite different. Mathematical programming puts a premium on accuracy, speed, and robustness. Since generalization is the bottom line in machine learning and training is normally done offline, accuracy and small speed improvements are of little concern in machine learning. Machine learning prefers simpler algorithms that work in reasonable computational time for specific classes of problems. Reducing machine learning problems to well-explored mathematical programming classes with robust general-purpose optimization codes allows machine learning researchers to rapidly develop new techniques.
Multilabel classification on tree- and DAG-structured hierarchies
 In ICML
, 2011
Abstract

Cited by 24 (2 self)
Many real-world applications involve multilabel classification, in which the labels are organized in the form of a tree or directed acyclic graph (DAG). However, current research efforts typically ignore the label dependencies or can only exploit the dependencies in tree-structured hierarchies. In this paper, we present a novel hierarchical multilabel classification algorithm which can be used on both tree- and DAG-structured hierarchies. The key idea is to formulate the search for the optimal consistent multilabel as finding the best subgraph in a tree/DAG. Using a simple greedy strategy, the proposed algorithm is computationally efficient, easy to implement, does not suffer from the problem of insufficient/skewed training data in classifier training, and can be readily used on large hierarchies. Theoretical results guarantee the optimality of the obtained solution. Experiments are performed on a large number of functional genomics data sets. The proposed method consistently outperforms the state-of-the-art method on both tree- and DAG-structured hierarchies.
Multi-Label Learning with Millions of Labels: Recommending Advertiser Bid Phrases for Web Pages
Abstract

Cited by 21 (2 self)
Recommending phrases from web pages for advertisers to bid on against search engine queries is an important research problem with direct commercial impact. Most approaches have found it infeasible to determine the relevance of all possible queries to a given ad landing page and have focussed on making recommendations from a small set of phrases extracted (and expanded) from the page using NLP and ranking based techniques. In this paper, we eschew this paradigm, and demonstrate that it is possible to efficiently predict the relevant subset of queries from a large set of monetizable ones by posing the problem as a multilabel learning task with each query being represented by a separate label. We develop Multilabel Random Forests to tackle problems with millions of labels. Our proposed classifier has prediction costs that are logarithmic in the number of labels and can make predictions in a few milliseconds using 10 Gb of RAM. We demonstrate that it is possible to generate training data for our classifier automatically from click logs without any human annotation or intervention. We train our classifier on tens of millions of labels, features and training points in less than two days on a thousand node cluster. We develop a sparse semi-supervised multilabel learning formulation to deal with training set biases and noisy labels harvested automatically from the click logs. This formulation is used to infer a belief in the state of each label for each training ad and the random forest classifier is extended to train on these beliefs rather than the given labels. Experiments reveal significant gains over ranking and NLP based techniques on a large test set of 5 million ads using multiple metrics.