Improving “bag-of-keypoints” image categorisation
2005
Cited by 35 (0 self)
In this paper we propose two distinct enhancements to the basic “bag-of-keypoints” image categorisation scheme proposed in [4]. In this approach, images are represented as variable-sized sets of local image features (keypoints), so we require machine learning tools that can operate on sets of vectors. In [4] this is achieved by representing each set as a histogram over bins found by k-means. We show how this approach can be improved and generalised using Gaussian Mixture Models (GMMs). Alternatively, the set of keypoints can be represented directly as a probability density function, over which a kernel can be defined. This approach is shown to give state-of-the-art categorisation performance.
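To make the difference between the histogram and GMM representations concrete, here is a minimal NumPy sketch, assuming keypoint descriptors are rows of a float array. `kmeans` builds the visual vocabulary, `hard_histogram` is the k-means "bag" of [4], and `soft_histogram` replaces hard bin counts with averaged GMM posterior responsibilities. All function names are hypothetical, the GMM is assumed to be already fitted (e.g. by EM, not shown), and the paper's density-based kernel is not reproduced.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd k-means; returns a (k, d) array of visual-word centres."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Squared distance from every descriptor to every centre: shape (n, k).
        d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # leave an empty cluster's centre where it is
                centres[j] = members.mean(axis=0)
    return centres

def hard_histogram(D, centres):
    """Bag-of-keypoints vector: fraction of an image's descriptors nearest each centre."""
    d2 = ((D[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1)
    counts = np.bincount(d2.argmin(axis=1), minlength=len(centres))
    return counts / len(D)

def soft_histogram(D, means, variances, weights):
    """GMM variant: mean posterior responsibility of each diagonal-covariance component."""
    diff = D[:, None, :] - means[None, :, :]
    log_p = (np.log(weights)[None, :]
             - 0.5 * (diff ** 2 / variances[None, :, :]).sum(axis=-1)
             - 0.5 * np.log(2 * np.pi * variances).sum(axis=-1)[None, :])
    log_p -= log_p.max(axis=1, keepdims=True)  # stabilise before exponentiating
    post = np.exp(log_p)
    post /= post.sum(axis=1, keepdims=True)     # each row now sums to 1
    return post.mean(axis=0)
```

Typical use: `centres = kmeans(all_training_descriptors, k)`, then `hard_histogram(image_descriptors, centres)` per image. As a crude stand-in for an EM-fitted GMM, one could pass the k-means centres with unit variances and equal weights to `soft_histogram`; the soft version degrades gracefully when a descriptor falls between two bins instead of snapping to one.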
On-Line Entropy Manipulation: Stochastic Information Gradient
IEEE Signal Processing Letters, 2003
Cited by 14 (8 self)
Entropy has found significant applications in numerous signal processing problems, including independent component analysis and blind deconvolution. In general, entropy estimators require O(N²) operations, N being the number of samples. For practical online entropy manipulation, it is desirable to determine a stochastic gradient for entropy that has O(N) complexity. In this letter, we propose a stochastic estimator of Shannon’s entropy. We determine the corresponding stochastic gradient and investigate its performance. The proposed stochastic gradient for Shannon’s entropy can be used in online adaptation problems where an entropy-based cost function must be optimized. Index Terms: Shannon’s entropy, stochastic gradient for entropy.
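The idea of a stochastic entropy gradient can be sketched as follows, assuming a Gaussian Parzen kernel of width `sigma`, a sliding window of the last L outputs, and a linear adaptive model y = x·w. The estimate at time k is minus the log of the Parzen density at y_k built from the window, so each update touches only L past samples. This is an illustrative sketch, not the letter's exact formulation; the window, kernel and linear model are assumptions, and the function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(e, sigma):
    """Gaussian Parzen window kappa_sigma(e)."""
    return np.exp(-e ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

def stochastic_entropy(y_k, y_window, sigma):
    """H_hat(k) = -log((1/L) * sum_i kappa(y_k - y_i)) over the last L outputs."""
    return -np.log(np.mean(gaussian_kernel(y_k - y_window, sigma)))

def stochastic_entropy_grad(w, x_k, X_window, sigma):
    """Gradient of stochastic_entropy w.r.t. w for a linear model y = x @ w.

    X_window holds the L previous input vectors as rows.
    """
    e = x_k @ w - X_window @ w  # e_i = y_k - y_i, shape (L,)
    kern = gaussian_kernel(e, sigma)
    # kappa'(e) = -(e / sigma^2) * kappa(e) and d e_i / d w = x_k - x_i,
    # so dH/dw = sum_i kappa(e_i) e_i (x_k - x_i) / (sigma^2 * sum_i kappa(e_i)).
    num = ((kern * e)[:, None] * (x_k[None, :] - X_window)).sum(axis=0)
    return num / (sigma ** 2 * kern.sum())
```

An online loop would then step `w -= mu * stochastic_entropy_grad(...)` to decrease the output entropy (or `+=` to increase it), with a cost of O(L) kernel evaluations per sample rather than the O(N²) pairwise sum of a batch estimator.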
SVM Decision Boundary Based Discriminative Subspace Induction
2002
Cited by 6 (1 self)
Dimensionality reduction is widely accepted as an analysis and modeling tool for dealing with high-dimensional spaces, although researchers from different disciplines interpret differently which properties should be preserved in the reduction process. We study the problem of linear dimension reduction for classification, with a focus on sufficient dimension reduction, i.e., inducing subspaces without loss of discriminative information. Decision boundary analysis (DBA), originally proposed by Lee & Landgrebe (1993), can directly find the smallest subspace with this property. However, existing DBA implementations are computationally expensive and sensitive to sample size. In this paper, we first formulate the problem of sufficient dimension reduction for classification in terms parallel to those used for regression. Making these connections explicit leads to several meaningful observations. We then present a novel space reduction algorithm that combines SVM and DBA, thus inheriting several appealing properties of kernel machines, such as good generalization, weak assumptions, and efficient computation. In addition, the proposed method provides a natural way to reduce the complexity, and even improve the accuracy, of the SVM itself. We demonstrate its superiority through comparative experiments on one simulated and four real-world benchmark datasets.
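The core of decision boundary analysis can be sketched with an RBF-kernel decision function of the SVM form f(x) = Σ coef_i·exp(−γ‖x − sv_i‖²) + b. The sketch locates boundary points by bisecting between pairs of oppositely classified samples, takes unit normals from the analytic gradient of f, averages their outer products into a decision boundary feature matrix, and keeps its leading eigenvectors. The support vectors and coefficients are assumed to come from an already trained SVM (training is not shown), all function names are hypothetical, and this is not claimed to be the paper's exact SVM-DBA procedure.

```python
import numpy as np

def rbf_decision(x, sv, coef, b, gamma):
    """SVM-style decision value f(x) = sum_i coef_i * exp(-gamma ||x - sv_i||^2) + b."""
    d = x - sv
    return coef @ np.exp(-gamma * (d * d).sum(axis=1)) + b

def rbf_grad(x, sv, coef, gamma):
    """Analytic gradient of rbf_decision with respect to x."""
    d = x - sv
    k = np.exp(-gamma * (d * d).sum(axis=1))
    return -2.0 * gamma * (coef * k) @ d

def boundary_point(f, xa, xb, iters=60):
    """Bisection along the segment xa -> xb for a point with f(x) ~ 0 (signs must differ)."""
    lo, hi, f_lo = xa, xb, f(xa)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if np.sign(f_mid) == np.sign(f_lo):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def dba_subspace(f, grad_f, X_pos, X_neg, n_dims):
    """Leading eigenpairs of the averaged outer product of unit boundary normals.

    Assumes at least one (X_pos, X_neg) pair straddles the decision boundary.
    """
    normals = []
    for xa in X_pos:
        for xb in X_neg:
            if np.sign(f(xa)) != np.sign(f(xb)):
                g = grad_f(boundary_point(f, xa, xb))
                norm = np.linalg.norm(g)
                if norm > 0:
                    normals.append(g / norm)
    N = np.array(normals)
    M = N.T @ N / len(N)            # decision boundary feature matrix
    vals, vecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:n_dims]
    return vals[top], vecs[:, top]
```

Eigenvectors with non-negligible eigenvalues span the directions along which the boundary actually varies; directions with near-zero eigenvalues can be dropped without changing the decision.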
On the use of independent tasks for face recognition
In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Anchorage (AK), 2008
Cited by 2 (1 self)
We present a method for learning discriminative linear feature extraction using independent tasks. More concretely, given a target classification task, we consider a complementary classification task that is independent of the target one. For example, in face classification, subject recognition can be the target task while facial expression classification is a complementary task. We then use the labels of the complementary task to obtain a more robust feature extraction, so that the new feature space is less sensitive to the complementary classification. To learn the proposed feature extraction, we use the mutual information between the projected data and the labels of both the target and the complementary tasks. In our experiments, this framework has been applied to a face recognition problem, in order to make this classification task less sensitive to environmental artifacts and to mitigate the effects of the small sample size problem. Our classification experiments show that the proposed method improves the feature extraction process.
Ensemble Selection for Evolutionary Learning using Information Theory and Price’s Theorem
Cited by 1 (1 self)
This paper presents an information-theoretic perspective on the design and analysis of evolutionary algorithms. Indicators of solution quality are developed and applied not only to individuals but also to ensembles, thereby ensuring information diversity. Price’s Theorem is extended to show how joint indicators can drive the reproductive sampling rate of potential parental pairings. Heritability of mutual information is identified as a key issue.
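For reference, the standard single-trait form of Price's equation, which the abstract says is extended to joint indicators, is shown below. The symbols are the conventional ones, not the paper's: w_i is the fitness (number of offspring) of parent i, z_i its indicator value, \bar{w} the mean fitness, \Delta\bar{z} the change in the population mean of z from one generation to the next, and \Delta z_i the difference between the mean value in i's offspring and z_i. Cov and E are population moments over parents. The paper's joint-indicator extension is not reproduced here.

```latex
\bar{w}\,\Delta\bar{z} \;=\; \operatorname{Cov}(w_i,\, z_i) \;+\; \operatorname{E}\!\left[\, w_i\,\Delta z_i \,\right]
```

The covariance term captures selection (how strongly reproduction favours high-indicator individuals), while the expectation term captures transmission, i.e. how faithfully the indicator is inherited; the abstract's emphasis on heritability corresponds to the second term.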
FOR NEURAL PREDICTIVE CODING
In this paper, we introduce a model for Discriminant Feature Extraction (DFE): Neural Predictive Coding (NPC), an extension of Linear Predictive Coding (LPC). We also introduce the Modelisation Error Ratio (MER), a discriminant criterion adapted to predictive models, and propose a theoretical validation of its discriminant properties. The experimental validation consists of a phoneme recognition task, with phonemes extracted from the DARPA TIMIT speech database. Performance is compared with that of traditional methods: LPC, MFCC and PLP.
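Since NPC is described as an extension of LPC and LPC is one of the baselines, a sketch of the classical LPC baseline may help: the autocorrelation method solved with the Levinson-Durbin recursion. The sign convention (prediction-error filter with a[0] = 1) and the function name `lpc` are assumptions for illustration; NPC itself and the MER criterion are not reproduced.

```python
import numpy as np

def lpc(x, order):
    """Autocorrelation-method LPC via the Levinson-Durbin recursion.

    Returns (a, err): a[0] = 1 and the prediction-error filter is
    e[n] = x[n] + a[1] x[n-1] + ... + a[order] x[n-order];
    err is the final (un-normalised) prediction-error energy.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Biased autocorrelation r[0..order]
    r = np.array([np.dot(x[: n - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                 # reflection coefficient
        a_prev = a.copy()
        a[1:i] = a_prev[1:i] + k * a_prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a, err
```

Applied frame by frame to a speech signal, the coefficient vector `a[1:]` serves as the LPC feature vector against which a nonlinear predictor such as NPC would be compared.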
Feature Extraction by Non-Parametric Mutual Information
Journal of Machine Learning Research 3 (2003), 1415-1438 (submitted 5/02; published 3/03)
We present a method for learning discriminative feature transforms using as the criterion the mutual information between class labels and transformed features. Instead of the commonly used mutual information measure based on the Kullback-Leibler divergence, we use a quadratic divergence measure, which allows an efficient nonparametric implementation and requires no prior assumptions about class densities. In addition to linear transforms, we also discuss nonlinear transforms implemented as radial basis function networks. Extensions that reduce the computational complexity are also presented, and a comparison with greedy feature selection is made.
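A quadratic mutual information of this kind can be estimated in closed form with Gaussian Parzen windows, which is what makes the nonparametric implementation efficient. The sketch below uses the Euclidean-distance form I = ∫Σ_c (p(y, c) − P(c)p(y))² dy, which expands into three pairwise-kernel sums (within-class, all-pairs and between terms); convolving two windows of variance σ² gives a kernel of variance 2σ². Whether this matches the paper's exact normalization is an assumption, the name `quadratic_mi` is hypothetical, and the gradient with respect to the transform parameters is not shown.

```python
import numpy as np

def quadratic_mi(Y, c, sigma):
    """Euclidean-distance quadratic MI between features Y (N x d) and labels c (N,)."""
    Y = np.asarray(Y, dtype=float)
    if Y.ndim == 1:
        Y = Y[:, None]
    c = np.asarray(c)
    N, dim = Y.shape
    var = 2.0 * sigma ** 2  # two sigma^2 windows convolve to one 2*sigma^2 window
    d2 = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    G = np.exp(-d2 / (2 * var)) / (2 * np.pi * var) ** (dim / 2)
    classes, counts = np.unique(c, return_counts=True)
    priors = counts / N
    # V_in: pairs within the same class; V_all: all pairs weighted by sum of squared priors;
    # V_btw: each class's samples against all samples, weighted by that class's prior.
    v_in = sum(G[np.ix_(c == p, c == p)].sum() for p in classes) / N ** 2
    v_all = (priors ** 2).sum() * G.sum() / N ** 2
    v_btw = sum(pr * G[c == p].sum() for p, pr in zip(classes, priors)) / N ** 2
    return v_in + v_all - 2.0 * v_btw
```

Because the estimate is the integral of a squared difference, it is non-negative and equals zero when there is only one class; well-separated classes in the transformed space give larger values, which is what a feature transform would be trained to maximize.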
Sequential Feature Extraction Using Information-Theoretic Learning
A classification system typically includes both a feature extractor and a classifier. The two components can be trained either sequentially or simultaneously. The former option has an implementation advantage, since the extractor is trained independently of the classifier, but it is hindered by the suboptimality of feature selection. Simultaneous training has the advantage of minimizing classification error, but it is harder to implement. Certain criteria, such as Minimum Classification Error, are better suited to simultaneous training, while other criteria, such as Mutual Information, are amenable to training the extractor either sequentially or simultaneously. Herein, an information-theoretic criterion is introduced and evaluated for sequential training, in order to ascertain its ability to find relevant features for classification. The proposed method uses nonparametric estimation of Renyi’s entropy to train the extractor by maximizing an approximation of the mutual information between the class labels and the output of the extractor. The proposed method is compared against seven other feature reduction methods and, when combined with a simple classifier, against the Support Vector Machine and Optimal Hyperplane. Interestingly, the evaluations show that the proposed method, when used sequentially, performs at least as well as the best simultaneous feature reduction methods. Index Terms: Feature extraction, Information theory, Classification, Nonparametric statistics.
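Nonparametric estimation of Renyi's entropy is commonly done with the quadratic (order-2) entropy, whose Parzen estimate reduces to a double sum of Gaussian kernels, often called the information potential. The sketch below shows that estimator and one plausible mutual-information surrogate, H₂(Y) − Σ_c P(c)·H₂(Y | c), obtained by substituting Renyi entropies into the Shannon identity. That surrogate is an assumption for illustration: it is a heuristic, not equal to Shannon mutual information in general, and may differ from the approximation the paper actually uses. The function names are hypothetical.

```python
import numpy as np

def renyi_quadratic_entropy(Y, sigma):
    """H_2(Y) = -log V, where V is the Parzen 'information potential' of Y (N x d)."""
    Y = np.asarray(Y, dtype=float)
    if Y.ndim == 1:
        Y = Y[:, None]
    dim = Y.shape[1]
    var = 2.0 * sigma ** 2  # pairwise kernel of variance 2*sigma^2
    d2 = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    V = np.mean(np.exp(-d2 / (2 * var)) / (2 * np.pi * var) ** (dim / 2))
    return -np.log(V)

def renyi_mi_approx(Y, c, sigma):
    """Heuristic I(Y; C) ~ H_2(Y) - sum_c P(c) * H_2(Y | C = c)."""
    Y = np.asarray(Y, dtype=float)
    c = np.asarray(c)
    classes, counts = np.unique(c, return_counts=True)
    h_cond = sum(n / len(c) * renyi_quadratic_entropy(Y[c == p], sigma)
                 for p, n in zip(classes, counts))
    return renyi_quadratic_entropy(Y, sigma) - h_cond
```

Training the extractor would then mean adjusting its parameters to increase `renyi_mi_approx` on the extractor outputs; the O(N²) pairwise kernel matrix is the main cost of this estimator.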
unknown title
Using visualization, variable selection and feature extraction to learn from industrial data
FEATURE
A classification system typically consists of both a feature extractor (preprocessor) and a classifier. These two components can be trained either independently or simultaneously. The former option has an implementation advantage, since the extractor need only be trained once for use with any classifier, whereas the latter has the advantage that it can be used to minimize classification error directly. Certain criteria, such as Minimum Classification Error, are better suited to simultaneous training, whereas other criteria, such as Mutual Information, are amenable to training the feature extractor either independently or simultaneously. Herein, an information-theoretic criterion is introduced and evaluated for training the extractor independently of the classifier. The proposed method uses nonparametric estimation of Renyi’s entropy to train the extractor by maximizing an approximation of the mutual information between the class labels and the output of the feature extractor. The evaluations show that the proposed method, even though it uses independent training, performs at least as well as three feature extraction methods that train the extractor and classifier simultaneously. Index Terms: Feature extraction, information theory, classification, nonparametric statistics.