Results 1–8 of 8
Multilabel dimensionality reduction via dependence maximization
In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2008
Abstract

Cited by 35 (6 self)
Multilabel learning deals with data associated with multiple labels simultaneously. Like other machine learning and data mining tasks, multilabel learning also suffers from the curse of dimensionality. Although dimensionality reduction has been studied for many years, multilabel dimensionality reduction remains almost untouched. In this paper, we propose a multilabel dimensionality reduction method, MDDM, which attempts to project the original data into a lower-dimensional feature space maximizing the dependence between the original feature description and the associated class labels. Based on the Hilbert-Schmidt Independence Criterion, we derive a closed-form solution which enables the dimensionality reduction process to be efficient. Experiments validate the performance of MDDM.
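The closed-form solution mentioned in the abstract can be sketched with a linear label kernel: the projection directions are the top eigenvectors of an HSIC-style matrix built from centered features and label similarities. This is a minimal sketch, not the authors' implementation; the linear label kernel and the matrix shapes are assumptions.

```python
import numpy as np

def mddm_projection(X, Y, k):
    """HSIC-maximizing linear projection (sketch).
    X: (n, d) features, Y: (n, q) binary label matrix, k: target dim.
    Returns P: (d, k) whose columns maximize tr(P^T X^T H L H X P),
    where L = Y Y^T is a linear label kernel and H centers the data.
    """
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    L = Y @ Y.T                           # linear kernel on the labels
    M = X.T @ H @ L @ H @ X               # (d, d), symmetric PSD
    vals, vecs = np.linalg.eigh(M)        # eigenvalues in ascending order
    P = vecs[:, -k:][:, ::-1]             # top-k eigenvectors
    return P
```

Projecting is then just `Z = X @ P`; the closed form is what makes the method efficient compared to iterative dependence maximization.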
Canonical Correlation Analysis for Multi-Label Classification: A Least Squares Formulation, Extensions and Analysis
Abstract

Cited by 19 (1 self)
Canonical Correlation Analysis (CCA) is a well-known technique for finding the correlations between two sets of multidimensional variables. It projects both sets of variables onto a lower-dimensional space in which they are maximally correlated. CCA is commonly applied for supervised dimensionality reduction in which the two sets of variables are derived from the data and the class labels, respectively. It is well-known that CCA can be formulated as a least squares problem in the binary-class case. However, the extension to the more general setting remains unclear. In this paper, we show that under a mild condition which tends to hold for high-dimensional data, CCA in the multilabel case can be formulated as a least squares problem. Based on this equivalence relationship, efficient algorithms for solving least squares problems can be applied to scale CCA to very large data sets. In addition, we propose several CCA extensions including the sparse CCA formulation based on the 1-norm regularization. We further extend the least squares formulation to partial least squares. In addition, we show that the CCA projection for one set of variables is independent of the regularization on the other set of multidimensional variables, providing new insights on the effect of regularization on CCA. We have conducted experiments using benchmark data sets. Experiments on multilabel data sets confirm the established equivalence relationships. Results also demonstrate the effectiveness and efficiency of the proposed CCA extensions.
Index Terms: Canonical correlation analysis, least squares, multilabel learning, partial least squares, regularization
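The least squares formulation can be made concrete: under the rank condition the abstract alludes to, the CCA projection for the feature side coincides (up to scaling) with the solution of an ordinary least squares problem whose target is a whitened label indicator matrix. A minimal sketch, where the whitening construction and the full-column-rank assumption on Y are simplifications, not the paper's exact derivation:

```python
import numpy as np

def ls_cca(X, Y):
    """Least-squares sketch of CCA for multilabel data.
    X: (n, d) centered features; Y: (n, q) label indicator matrix
    assumed to have full column rank. Target T = Y (Y^T Y)^{-1/2};
    the CCA-like projection W solves min_W ||X W - T||_F^2.
    """
    G = Y.T @ Y                                  # (q, q) label Gram matrix
    evals, evecs = np.linalg.eigh(G)
    G_isqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    T = Y @ G_isqrt                              # whitened indicator target
    W, *_ = np.linalg.lstsq(X, T, rcond=None)    # plain least squares
    return W
```

The practical payoff claimed in the abstract is exactly this reduction: any scalable least squares solver (conjugate gradient, LSQR, etc.) can replace the generalized eigenproblem of classical CCA.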
A Least Squares Formulation for Canonical Correlation Analysis
Abstract

Cited by 15 (4 self)
Canonical Correlation Analysis (CCA) is a well-known technique for finding the correlations between two sets of multidimensional variables. It projects both sets of variables into a lower-dimensional space in which they are maximally correlated. CCA is commonly applied for supervised dimensionality reduction, in which one of the multidimensional variables is derived from the class label. It has been shown that CCA can be formulated as a least squares problem in the binary-class case. However, this relationship in the more general setting remains unclear. In this paper, we show that, under a mild condition which tends to hold for high-dimensional data, CCA in multilabel classification can be formulated as a least squares problem. Based on this equivalence relationship, we propose several CCA extensions including sparse CCA using 1-norm regularization. Experiments on multilabel data sets confirm the established equivalence relationship. Results also demonstrate the effectiveness of the proposed CCA extensions.
Feature-aware label space dimension reduction for multilabel classification
2012
Abstract

Cited by 14 (0 self)
Label space dimension reduction (LSDR) is an efficient and effective paradigm for multilabel classification with many classes. Existing approaches to LSDR, such as compressive sensing and principal label space transformation, exploit only the label part of the dataset, but not the feature part. In this paper, we propose a novel approach to LSDR that considers both the label and the feature parts. The approach, called conditional principal label space transformation, is based on minimizing an upper bound of the popular Hamming loss. The minimization step of the approach can be carried out efficiently by a simple use of singular value decomposition. In addition, the approach can be extended to a kernelized version that allows the use of sophisticated feature combinations to assist LSDR. The experimental results verify that the proposed approach is more effective than existing LSDR approaches across many real-world datasets.
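The SVD step can be illustrated with the unconditional precursor mentioned in the abstract, principal label space transformation (PLST): compress the label matrix with its top right singular vectors and decode with the transpose plus rounding. The conditional variant additionally conditions this SVD on the features; the sketch below covers only the plain PLST round trip, with the 0.5 rounding threshold and shapes as assumptions.

```python
import numpy as np

def plst_fit(Y, m):
    """Principal label space transformation (sketch).
    Y: (n, q) binary label matrix, m: compressed label dimension.
    Returns encode/decode maps: z = (y - ybar) V, yhat = round(z V^T + ybar).
    """
    ybar = Y.mean(axis=0)                       # label bias vector
    U, s, Vt = np.linalg.svd(Y - ybar, full_matrices=False)
    V = Vt[:m].T                                # (q, m) principal directions
    encode = lambda y: (y - ybar) @ V
    decode = lambda z: (z @ V.T + ybar > 0.5).astype(int)
    return encode, decode
```

With m equal to the number of labels the round trip is lossless; the interesting regime is m much smaller than q, where one predicts the m-dimensional codes instead of the full label vector.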
On label dependence in multilabel classification
In Workshop Proceedings of Learning from Multi-Label Data, the 27th International Conference on Machine Learning, 2010
Abstract

Cited by 12 (1 self)
The aim of this paper is to elaborate on the important issue of label dependence in multilabel classification (MLC). Looking at the problem from a statistical perspective, we claim that two different types of label dependence should be distinguished, namely conditional and unconditional. We formally explain the differences and connections between both types of dependence and illustrate them by means of simple examples. Moreover, we give an overview of state-of-the-art algorithms for MLC and categorize them according to the type of label dependence they seek to capture.
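The conditional/unconditional distinction can be made concrete with a toy simulation (hypothetical numbers, not taken from the paper): two labels driven by the same feature are strongly correlated marginally, yet become independent once the feature is fixed.

```python
import numpy as np

# Hypothetical simulation: labels y1, y2 are conditionally independent
# given the feature x, yet unconditionally (marginally) dependent.
rng = np.random.default_rng(0)
n = 200_000
x = rng.random(n) < 0.5                 # binary feature
p = np.where(x, 0.9, 0.1)               # both labels lean on x the same way
y1 = (rng.random(n) < p).astype(float)  # drawn independently given x
y2 = (rng.random(n) < p).astype(float)

rho = np.corrcoef(y1, y2)[0, 1]         # strong unconditional correlation
rho_x0 = np.corrcoef(y1[~x], y2[~x])[0, 1]  # near zero once x is fixed
```

An MLC algorithm that models marginal label co-occurrence exploits the first kind of dependence; one that models the joint conditional distribution targets the second.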
Unified Solution to Nonnegative Data Factorization Problems
Abstract

Cited by 3 (3 self)
In this paper, we restudy the nonconvex data factorization problems (regularized or not, unsupervised or supervised), where the optimization is confined to the nonnegative orthant, and provide a unified, provably convergent solution based on multiplicative nonnegative update rules. This solution is general for optimization problems with block-wise quadratic objective functions, and thus direct update rules can be derived without the tedious case-by-case procedure deduction and convergence proofs. By taking this unified solution as a general template, we i) re-explain several existing nonnegative data factorization algorithms, ii) develop a variant of the nonnegative matrix factorization formulation for handling out-of-sample data, and iii) propose a new nonnegative data factorization algorithm, called Correlated Co-Decomposition (CCD), to simultaneously factorize two feature spaces by exploring the inter-correlated information. Experiments on both face recognition and multilabel image annotation tasks demonstrate the wide applicability of the unified solution as well as the effectiveness of the two proposed new algorithms.
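As a concrete instance of the multiplicative-update template, here is the classic Lee-Seung scheme for the plain (unregularized, unsupervised) Frobenius NMF objective, which is block-wise quadratic in W and in H. The random initialization and the small epsilon guard are assumptions of this sketch, not details from the paper.

```python
import numpy as np

def nmf_multiplicative(X, r, iters=200, eps=1e-9):
    """Lee-Seung multiplicative updates for min ||X - W H||_F^2
    subject to W >= 0, H >= 0.
    X: (n, d) nonnegative data, r: factorization rank.
    Each update multiplies by a nonnegative ratio, so nonnegativity
    is preserved and the objective is non-increasing.
    """
    rng = np.random.default_rng(0)
    n, d = X.shape
    W = rng.random((n, r)) + 0.1
    H = rng.random((r, d)) + 0.1
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)   # H-block: quadratic in H
        W *= (X @ H.T) / (W @ H @ H.T + eps)   # W-block: quadratic in W
    return W, H
```

The unified template in the paper generalizes exactly this pattern: once an objective is block-wise quadratic on the nonnegative orthant, the corresponding ratio update comes out mechanically.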
PROXIMITY-BASED GRAPH EMBEDDINGS FOR MULTI-LABEL CLASSIFICATION
Abstract
In many real applications of text mining, information retrieval and natural language processing, large-scale features are frequently used, which often make the employed machine learning algorithms intractable, leading to the well-known “curse of dimensionality” problem. Aiming at not only removing the redundant information from the original features but also improving their discriminating ability, we present a novel approach for the supervised generation of low-dimensional, proximity-based graph embeddings to facilitate multilabel classification. The optimal embeddings are computed from a supervised adjacency graph, called the multilabel graph, which simultaneously preserves proximity structures between samples constructed from feature and multilabel class information. We propose different ways to obtain this multilabel graph, by either working in a binary label space or a projected real label space. To reduce the training cost of the dimensionality reduction procedure caused by large-scale features, a smaller set of relation features between each sample and a set of representative prototypes is employed. The effectiveness of our proposed method is demonstrated on two document collections for text categorization based on the “bag of words” model.
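One much-simplified, assumed way to realize such a multilabel proximity graph is to blend feature and label cosine similarities into edge weights and take the bottom nontrivial eigenvectors of the resulting graph Laplacian, in the style of Laplacian eigenmaps. The blending weight, the cosine similarity, and the clipping below are illustrative choices, not the paper's construction.

```python
import numpy as np

def multilabel_graph_embedding(X, Y, k, alpha=0.5):
    """Toy spectral embedding from a supervised proximity graph.
    X: (n, d) features, Y: (n, q) binary labels, k: embedding dim.
    Edge weights blend feature and label cosine similarity; alpha
    is an assumed mixing parameter.
    """
    def cos_sim(M):
        Mn = M / (np.linalg.norm(M, axis=1, keepdims=True) + 1e-12)
        return Mn @ Mn.T
    W = alpha * cos_sim(Y) + (1.0 - alpha) * cos_sim(X)
    W = np.clip(W, 0.0, None)          # keep edge weights nonnegative
    np.fill_diagonal(W, 0.0)           # no self-loops
    D = np.diag(W.sum(axis=1))
    L = D - W                          # unnormalized graph Laplacian
    vals, vecs = np.linalg.eigh(L)     # ascending eigenvalues
    return vecs[:, 1:k + 1]            # skip the (near-)constant eigenvector
```

The prototype trick described in the abstract would replace X here with a smaller matrix of sample-to-prototype relation features before building the graph.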