Results 1–10 of 84
An introduction to variable and feature selection
Journal of Machine Learning Research, 2003. Cited by 804 (15 self).
Abstract: Variable and feature selection have become the focus of much research in areas of application for which datasets with tens or hundreds of thousands of variables are available.
A Feature Selection Newton Method for Support Vector Machine Classification
Computational Optimization and Applications, 2002. Cited by 55 (4 self).
Abstract: A fast Newton method that suppresses input-space features is proposed for a linear programming formulation of support vector machine classifiers. The proposed standalone method can handle classification problems in very high-dimensional spaces, such as 28,032 dimensions, and generates a classifier that depends on very few input features, such as 7 of the original 28,032. The method can also handle problems with a large number of data points and requires no specialized linear programming packages, merely a linear equation solver. For nonlinear kernel classifiers, the method uses a minimal number of kernel functions in the classifier that it generates.
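The feature suppression above comes from the L1 (linear programming) formulation: an L1 penalty drives irrelevant weights exactly to zero. The toy sketch below illustrates that effect with a generic proximal subgradient method on the L1-penalized hinge loss; it is not the paper's Newton method, and the data, function names, and hyperparameters are illustrative assumptions.

```python
import random

def soft_threshold(w, t):
    """Proximal operator of the L1 norm: shrink each weight toward zero by t."""
    return [max(abs(wi) - t, 0.0) * (1.0 if wi > 0 else -1.0) for wi in w]

def sparse_linear_svm(X, y, lam=0.1, lr=0.05, epochs=200):
    """Proximal subgradient descent on hinge loss + lam * ||w||_1 (a sketch,
    not the paper's Newton method)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            if yi * sum(wj * xj for wj, xj in zip(w, xi)) < 1:  # margin violated
                for j in range(d):
                    grad[j] -= yi * xi[j]        # hinge-loss subgradient
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
        w = soft_threshold(w, lr * lam)          # L1 step zeroes small weights
    return w

# Toy data: only the first 2 of 10 features determine the label.
random.seed(0)
X = [[random.gauss(0.0, 1.0) for _ in range(10)] for _ in range(200)]
y = [1 if xi[0] + xi[1] > 0 else -1 for xi in X]
w = sparse_linear_svm(X, y)
print(sum(1 for wj in w if wj != 0.0), "of", len(w), "weights are nonzero")
```

The soft-threshold step is what makes weights land exactly at zero; plain gradient descent with an L1 term would only make them small.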
A statistical approach to material classification using image patch exemplars
IEEE Trans. Pattern Anal. Mach. Intell., 2009.
Multiple-instance learning for music information retrieval
In ISMIR, 2008. Cited by 36 (6 self).
Abstract: Multiple-instance learning algorithms train classifiers from lightly supervised data, i.e., labeled collections of items rather than labeled items. We compare the multiple-instance learners mi-SVM and MILES on the task of classifying 10-second song clips. These classifiers are trained on tags at the track, album, and artist levels, or granularities, that have been derived from tags at the clip granularity, allowing us to test the effectiveness of the learners at recovering the clip labeling in the training set and predicting the clip labeling for a held-out test set. We find that mi-SVM is better than a control at the recovery task on training clips, with an average classification accuracy as high as 87% over 43 tags; on test clips, it is comparable to the control, with an average classification accuracy of up to 68%. MILES performed adequately on the recovery task but poorly on the test clips.
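The bag-level setup these learners share is simple to state: instances (clips) are grouped into labeled bags (tracks, albums, artists), and under the standard multiple-instance assumption a bag is positive iff at least one of its instances is positive. A minimal sketch, with made-up bag names and labels:

```python
# Under the standard multiple-instance assumption, a bag is positive
# iff at least one of its instances is positive.
def bag_label(instance_labels):
    return 1 if any(l == 1 for l in instance_labels) else -1

# Hypothetical example: clip-level tag labels grouped at the album granularity.
albums = {
    "album_a": [1, -1, -1],   # one tagged clip -> positive bag
    "album_b": [-1, -1],      # no tagged clips -> negative bag
}
labels = {name: bag_label(clips) for name, clips in albums.items()}
print(labels)   # prints {'album_a': 1, 'album_b': -1}
```

The learning problem in the paper runs this in reverse: only the bag labels are observed, and the algorithms must recover plausible instance (clip) labels.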
Gene Selection Using Support Vector Machines With Nonconvex Penalty
Bioinformatics, 2006. Cited by 30 (2 self).
Abstract: Motivation: With the development of DNA microarray technology, scientists can now measure the expression levels of thousands of genes simultaneously in a single experiment. One current difficulty in interpreting microarray data comes from their innate "high dimensional, low sample size" nature. Robust and accurate gene selection methods are therefore required to identify differentially expressed groups of genes across different samples, e.g., between cancerous and normal cells. Successful gene selection will help to classify different cancer types, lead to a better understanding of genetic signatures in cancers, and improve treatment strategies. Although gene selection and cancer classification are two closely related problems, most existing approaches handle them separately by selecting genes prior to classification. We provide ...
A sparse support vector machine approach to region-based image categorization
In CVPR '05, 2005. Cited by 27 (0 self).
Abstract: Automatic image categorization using low-level features is a challenging research topic in computer vision. In this paper, we formulate image categorization as a multiple-instance learning (MIL) problem by viewing an image as a bag of instances, each corresponding to a region obtained from image segmentation. We propose a new solution to the resulting MIL problem. Unlike many existing MIL approaches that rely on the diverse density framework, our approach performs an effective feature mapping through a chosen metric distance function, so the MIL problem becomes solvable by a regular classification algorithm. A sparse SVM is adopted to dramatically reduce the number of regions needed to classify images. The regions selected by the sparse SVM approximate the target concepts of the traditional diverse density framework. The proposed approach is far more computationally efficient and less sensitive to class-label uncertainty. Experimental results are included to demonstrate the effectiveness and robustness of the proposed method.
Learning auto-structured regressor from uncertain nonnegative labels
IEEE International Conference on Computer Vision, 2007. Cited by 19 (3 self).
Abstract: In this paper, we take the human age and pose estimation problems as examples to study automatically designing a regressor from training samples with uncertain nonnegative labels. First, the nonnegative label is predicted as the square norm of a matrix, which is bilinearly transformed from the nonlinear mappings of the candidate kernels. Two transformation matrices are then learned for deriving such a matrix by solving a semidefinite programming (SDP) problem, in which the uncertain label of each sample is expressed as two inequality constraints. The objective function of the SDP controls the ranks of these two matrices and consequently automatically determines the structure of the regressor. The whole framework has the following characteristics: 1) the SDP formulation makes full use of the uncertain labels, instead of using conventional fixed labels; 2) regression with a matrix norm naturally guarantees the nonnegativity of the labels, and greater prediction capability is achieved by integrating the squares of the matrix elements, which act as weak regressors; and 3) the regressor structure is automatically determined by the pursuit of simplicity, which potentially promotes the algorithm's generalization capability. Extensive experiments on two human age databases, FG-NET and Yamaha, as well as the Pointing'04 pose database, demonstrate encouraging estimation accuracy improvements over conventional regression algorithms.
Direct convex relaxations of sparse SVM
In ICML '07: Proceedings of the 24th International Conference on Machine Learning. Cited by 17 (0 self).
Abstract: Although support vector machines (SVMs) for binary classification give rise to a decision rule that relies only on a subset of the training data points (the support vectors), that rule will in general be based on all available features of the input space. We propose two direct, novel convex relaxations of a nonconvex sparse SVM formulation that explicitly constrains the cardinality of the vector of feature weights. One relaxation results in a quadratically constrained quadratic program (QCQP), while the second is based on a semidefinite programming (SDP) relaxation. The QCQP formulation can be interpreted as applying an adaptive soft threshold to the SVM hyperplane, while the SDP formulation learns a weighted inner product (i.e., a kernel) that results in a sparse hyperplane. Experimental results show an increase in sparsity while preserving generalization performance compared to both a standard and a linear programming SVM.
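The "adaptive soft-threshold" reading of the QCQP relaxation has a familiar one-line core: weights below a threshold are zeroed, and larger ones are shrunk toward zero. The sketch below shows plain (non-adaptive) soft-thresholding of a dense hyperplane; the weight vector and threshold are made-up illustrations, not the paper's learned quantities.

```python
def soft_threshold(w, tau):
    """Zero weights with |w_j| <= tau; shrink the rest toward zero by tau."""
    return [max(abs(wj) - tau, 0.0) * (1.0 if wj >= 0 else -1.0) for wj in w]

# Hypothetical dense SVM hyperplane: three strong features, three weak ones.
w_dense = [0.9, -0.05, 0.4, 0.02, -0.6, 0.01]
w_sparse = soft_threshold(w_dense, 0.1)
# Weights 1, 3, and 5 are zeroed; the others shrink by 0.1 in magnitude.
```

In the paper's QCQP the threshold is learned per feature rather than fixed, which is what makes it "adaptive."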
Ultrahigh dimensional feature selection: beyond the linear model
2009. Cited by 16 (3 self).
Abstract: Variable selection in high-dimensional spaces characterizes many contemporary problems in scientific discovery and decision making. Many frequently used techniques are based on independence screening; examples include correlation ranking (Fan and Lv, 2008) and feature selection using a two-sample t-test in high-dimensional classification (Tibshirani et al., 2003). Within the context of the linear model, Fan and Lv (2008) showed that this simple correlation ranking possesses a sure independence screening property under certain conditions, and that its revision, called iterative sure independence screening (ISIS), is needed when the features are marginally unrelated but jointly related to the response variable. In this paper, we extend ISIS, without an explicit definition of residuals, to a general pseudo-likelihood framework, which includes generalized linear models as a special case. Even in the least-squares setting, the new method improves on ISIS by allowing feature deletion in the iterative process. Our technique allows us to select important features in high-dimensional classification where the popular two-sample t-method fails. A new technique is introduced to reduce the false selection rate in the feature screening stage. Several simulated and two real data examples are presented to illustrate the methodology.
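The correlation-ranking screen that SIS builds on reduces to computing each feature's marginal correlation with the response and keeping the top d. A minimal, self-contained sketch (the toy data and function names are assumptions for illustration; real SIS also prescribes how to choose d, and ISIS adds the iterative refits discussed above):

```python
def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def screen(X, y, d):
    """Keep the indices of the d features with largest |marginal correlation|."""
    p = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(p)]
    return sorted(range(p), key=scores.__getitem__, reverse=True)[:d]

# Toy data: feature 0 determines the response; features 1 and 2 are periodic noise.
X = [[i, (7 * i) % 5, (3 * i) % 4] for i in range(20)]
y = [2.0 * row[0] for row in X]
print(screen(X, y, 1))   # prints [0]
```

Marginal screening like this fails exactly in the case the paper targets: features that are uncorrelated with the response individually but predictive jointly, which is what the iterative (ISIS-style) refits are for.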
Unsupervised feature selection for multi-cluster data
KDD, 2010. Cited by 15 (1 self).
Abstract: In many data analysis tasks, one is often confronted with very high-dimensional data. Feature selection techniques are designed to find the relevant subset of the original features, which can facilitate clustering, classification, and retrieval. In this paper, we consider the feature selection problem in the unsupervised learning scenario, which is particularly difficult due to the absence of class labels that would guide the search for relevant information. Feature selection is essentially a combinatorial optimization problem, which is computationally expensive. Traditional unsupervised feature selection methods address this by selecting the top-ranked features based on scores computed independently for each feature. These approaches neglect possible correlations between features and thus cannot produce an optimal feature subset. Inspired by recent developments in manifold learning and L1-regularized models for subset selection, we propose a new approach, called Multi-Cluster Feature Selection (MCFS), for unsupervised feature selection. Specifically, we select those features such that the multi-cluster structure of the data can be best preserved. The corresponding optimization problem can be solved efficiently since it involves only a sparse eigenproblem and an L1-regularized least-squares problem. Extensive experimental results on various real-life data sets demonstrate the superiority of the proposed algorithm.