Results 1–10 of 37

Clustering Based on Conditional Distributions in an Auxiliary Space
Neural Computation, 2001
Cited by 79 (22 self)
Abstract: We study the problem of learning groups or categories that are local ...

Bankruptcy Analysis with Self-Organizing Maps in Learning Metrics
IEEE Transactions on Neural Networks, 2001
Cited by 48 (19 self)
Abstract: We introduce a method for deriving a metric, locally based on the Fisher information matrix, into the data space. A Self-Organizing Map is computed in the new metric to explore financial statements of enterprises. The metric measures local distances in terms of changes in the distribution of an auxiliary random variable that reflects what is important in the data. In this paper the variable indicates bankruptcy within the next few years. The conditional density of the auxiliary variable is first estimated, and the change in the estimate resulting from local displacements in the primary data space is measured using the Fisher information matrix. When a Self-Organizing Map is computed in the new metric it still visualizes the data space in a topology-preserving fashion, but represents the (local) directions in which the probability of bankruptcy changes the most.
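A rough, runnable sketch of the learning-metric idea in this abstract (not the authors' implementation): given any differentiable estimate of the class posteriors p(c|x) (here a hypothetical softmax over linear scores as a stand-in), the squared local distance is dx^T J(x) dx, where J(x) is the Fisher information matrix of the posteriors.

```python
import numpy as np

def class_posteriors(x, W, b):
    # Hypothetical stand-in model: softmax over linear scores W @ x + b.
    s = W @ x + b
    e = np.exp(s - s.max())
    return e / e.sum()

def fisher_information(x, W, b, eps=1e-5):
    # J(x) = sum_c p(c|x) * grad log p(c|x) grad log p(c|x)^T,
    # with gradients taken by central finite differences.
    d = x.size
    p = class_posteriors(x, W, b)
    grads = np.zeros((p.size, d))
    for i in range(d):
        xp, xm = x.copy(), x.copy()
        xp[i] += eps
        xm[i] -= eps
        grads[:, i] = (np.log(class_posteriors(xp, W, b))
                       - np.log(class_posteriors(xm, W, b))) / (2 * eps)
    return sum(p[c] * np.outer(grads[c], grads[c]) for c in range(p.size))

def fisher_distance2(x, dx, W, b):
    # Squared local distance in the learning metric: dx^T J(x) dx.
    J = fisher_information(x, W, b)
    return float(dx @ J @ dx)
```

A Self-Organizing Map trained with this distance in place of the Euclidean one then stretches exactly those directions along which the posterior of the auxiliary variable changes.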
Feature Selection by Maximum Marginal Diversity: Optimality and Implications for Visual Recognition
Submitted, 2002
Cited by 27 (7 self)
Abstract: We address the question of feature selection in the context of visual recognition. It is shown that, besides being efficient from a computational standpoint, the infomax principle is nearly optimal in the minimum Bayes error sense. The concept of marginal diversity is introduced, leading to a generic principle for feature selection (the principle of maximum marginal diversity) of extreme computational simplicity. The relationships between infomax and the maximization of marginal diversity are identified, uncovering the existence of a family of classification procedures for which near-optimal (in the Bayes error sense) feature selection does not require combinatorial search. Examination of this family in light of recent studies on the statistics of natural images suggests that visual recognition problems are a subset of it.
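The maximum marginal diversity principle lends itself to a short sketch: score each feature by the class-averaged KL divergence between its class-conditional marginal and its overall marginal, then keep the top scorers. The histogram-based plug-in estimate below is an illustration under assumed choices (shared bins per feature, additive smoothing), not the paper's exact estimator.

```python
import numpy as np

def marginal_diversity(X, y, bins=16):
    # md(k) = sum_c P(c) * KL( p(x_k | c) || p(x_k) ), estimated per
    # feature with shared histogram bins and a small smoothing constant.
    n, d = X.shape
    classes, counts = np.unique(y, return_counts=True)
    priors = counts / n
    scores = np.zeros(d)
    for k in range(d):
        edges = np.histogram_bin_edges(X[:, k], bins=bins)
        p_marg = np.histogram(X[:, k], bins=edges)[0] / n + 1e-12
        for c, pc in zip(classes, priors):
            p_cond = (np.histogram(X[y == c, k], bins=edges)[0]
                      / max((y == c).sum(), 1) + 1e-12)
            scores[k] += pc * np.sum(p_cond * np.log(p_cond / p_marg))
    return scores

def select_features(X, y, m, bins=16):
    # Rank features by marginal diversity and keep the top m; no
    # combinatorial search over feature subsets is needed.
    return np.argsort(marginal_diversity(X, y, bins))[::-1][:m]
```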
Generalized information potential criterion for adaptive system training
IEEE Transactions on Neural Networks, 2002
Cited by 25 (13 self)
Abstract: We have recently proposed the quadratic Rényi's error entropy as an alternative cost function for supervised adaptive system training. An entropy criterion instructs the minimization of the average information content of the error signal rather than merely trying to minimize its energy. In this paper, we propose a generalization of the error entropy criterion that enables the use of any order of Rényi's entropy and any suitable kernel function in density estimation. It is shown that the proposed entropy estimator preserves the global minimum of the actual entropy. The equivalence between global optimization by convolution smoothing and the convolution by the kernel in Parzen windowing is also discussed. Simulation results are presented for time-series prediction and classification, where experimental demonstration of all the theoretical concepts is presented.
Index Terms: Minimum error entropy, Parzen windowing, Rényi's entropy, supervised training.
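For the quadratic (order-2) special case that this paper generalizes, the Parzen estimate has a closed form over sample pairs: the "information potential" V is the mean pairwise Gaussian kernel evaluated on error differences, and H_2(e) = -log V(e). A minimal sketch, with the kernel bandwidth as an assumed free parameter:

```python
import numpy as np

def information_potential(errors, sigma=0.5):
    # V(e) = (1/N^2) * sum_ij G(e_i - e_j; sigma*sqrt(2)); convolving
    # two Gaussian Parzen kernels widens the effective bandwidth.
    e = np.asarray(errors, dtype=float)
    diff = e[:, None] - e[None, :]
    s2 = 2.0 * sigma ** 2
    g = np.exp(-diff ** 2 / (2 * s2)) / np.sqrt(2 * np.pi * s2)
    return g.mean()

def quadratic_renyi_entropy(errors, sigma=0.5):
    # H_2(e) = -log V(e): minimizing the error entropy is equivalent
    # to maximizing the information potential of the error samples.
    return -np.log(information_potential(errors, sigma))
```

Concentrating the error samples raises the potential and lowers the entropy, which is why this criterion drives errors toward a common (removable) value rather than merely shrinking their energy.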
Linear discriminant analysis in document classification
In IEEE ICDM Workshop on Text Mining, 2001
Cited by 16 (0 self)
Abstract: Document representation using the bag-of-words approach may require bringing the dimensionality of the representation down in order to make effective use of various statistical classification methods. Latent Semantic Indexing (LSI) is one such method, based on eigendecomposition of the covariance of the document-term matrix. Another often-used approach is to select a small number of the most important features out of the whole set according to some relevant criterion. This paper points out that LSI ignores discrimination while concentrating on representation. Furthermore, selection methods fail to produce a feature set that jointly optimizes class discrimination. As a remedy, we suggest supervised linear discriminative transforms, and report good classification results applying these to the Reuters-21578 database.
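As a concrete illustration (a generic Fisher discriminant transform, not necessarily the paper's exact pipeline), a supervised linear transform chooses directions that maximize between-class scatter relative to within-class scatter, unlike LSI's unsupervised eigendecomposition:

```python
import numpy as np

def lda_transform(X, y, m):
    # Fisher discriminant directions: eigenvectors of pinv(Sw) @ Sb,
    # where Sw is within-class and Sb is between-class scatter.
    mu = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        Sb += len(Xc) * np.outer(mc - mu, mc - mu)
    vals, vecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(vals.real)[::-1]
    W = vecs.real[:, order[:m]]      # top-m discriminative directions
    return X @ W, W
```

For bag-of-words data, X would be the (possibly tf-idf weighted) document-term matrix and m at most one less than the number of classes.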
A Theory for Learning Based on Rigid Bodies Dynamics
2002
Cited by 12 (11 self)
Abstract: A new learning theory, derived from the study of the dynamics of an abstract system of masses moving in a multidimensional space under an external force field, is presented. The set of equations describing the system's dynamics may be directly interpreted as a learning algorithm for neural layers. Relevant properties of the proposed learning theory are discussed, along with results of computer simulations performed to assess its effectiveness in applied fields.
Discriminative components of data
IEEE Transactions on Neural Networks, 2005
Cited by 12 (5 self)
Informative Discriminant Analysis
In Proceedings of the Twentieth International Conference on Machine Learning (ICML 2003), AAAI Press, Menlo Park, CA, 2003
Cited by 11 (7 self)
Abstract: We introduce a probabilistic model that generalizes classical linear discriminant analysis and gives an interpretation for the components as informative or relevant components of data. The components maximize the predictability of the class distribution, which is asymptotically equivalent to (i) maximizing mutual information with the classes, and (ii) finding principal components in the so-called learning or Fisher metrics. The Fisher metric measures only distances that are relevant to the classes, that is, distances that cause changes in the class distribution. The components have applications in data exploration, visualization, and dimensionality reduction.
Learning Discriminative Feature Transforms to Low Dimensions in Low Dimensions
In Advances in Neural Information Processing Systems 14, 2001
Cited by 11 (3 self)
Abstract: The marriage of Rényi entropy with Parzen density estimation has been shown to be a viable tool in learning discriminative feature transforms.
On Feature Extraction By Mutual Information Maximization
In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 2002
Cited by 11 (3 self)
Abstract: In order to learn discriminative feature transforms, we discuss mutual information between class labels and transformed features as a criterion. Instead of Shannon's definition we use measures based on Rényi entropy, which lends itself to an efficient implementation and an interpretation of "information potentials" and "information forces" induced by samples of data. This paper presents two routes towards practical usability of the method, aimed especially at large databases: the first is an online stochastic gradient algorithm, and the second is based on approximating class densities in the output space by Gaussian mixture models.
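The "information forces" can be illustrated in the one-dimensional, quadratic-entropy case: each sample feels the gradient of the pairwise-kernel "information potential" with respect to its own position, so maximizing the potential pulls samples together. A sketch with an assumed Gaussian kernel and bandwidth:

```python
import numpy as np

def information_forces(samples, sigma=0.5):
    # Force on sample i: dV/dx_i for the information potential
    # V = (1/n^2) * sum_ij G(x_i - x_j; sigma*sqrt(2)).
    x = np.asarray(samples, dtype=float)
    n = x.size
    diff = x[:, None] - x[None, :]
    s2 = 2.0 * sigma ** 2
    g = np.exp(-diff ** 2 / (2 * s2)) / np.sqrt(2 * np.pi * s2)
    # Each pair contributes twice (once per index), hence the factor 2.
    return -(2.0 / (n * n)) * (g * diff / s2).sum(axis=1)
```

In a stochastic gradient scheme, these per-sample forces are backpropagated through the feature transform, which is one of the two practical routes the abstract mentions.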