Unifying Divergence Minimization and Statistical Inference via Convex Duality
Proc. of Conf. on Learning Theory (COLT), 2006
Cited by 26 (9 self)
Abstract. In this paper we unify divergence minimization and statistical inference by means of convex duality. In the process of doing so, we prove that the dual of approximate maximum entropy estimation is maximum a posteriori estimation. Moreover, our treatment leads to stability and convergence bounds for many statistical learning problems. Finally, we show how an algorithm by Zhang can be used to solve this class of optimization problems efficiently.
Similarity-based Classification: Concepts and Algorithms
2008
Cited by 8 (1 self)
This report reviews and extends the field of similarity-based classification, presenting new analyses, algorithms, data sets, and the most comprehensive set of experimental results to date. Specifically, the generalizability of using similarities as features is analyzed, design goals and methods for weighting nearest-neighbors for similarity-based learning are proposed, and different methods for consistently converting similarities into kernels are compared. Experiments on eight real data sets compare eight approaches and their variants to similarity-based learning.
On causally asymmetric versions of Occam’s Razor and their relation to thermodynamics
2007
Local similarity discriminant analysis
Proc. Intl. Conf. on Machine Learning, 2007
Cited by 2 (2 self)
We propose a local, generative model for similarity-based classification. The method is applicable when only pairwise similarities between samples are available. The classifier models the local class-conditional distribution using a maximum entropy estimate and empirical moment constraints. The resulting exponential class-conditional distributions are combined with class prior probabilities and misclassification costs to form the local similarity discriminant analysis (local SDA) classifier. We compare the performance of local SDA to a non-local version, to the local nearest centroid classifier, the nearest centroid classifier, k-NN, and the recently developed potential support vector machine (PSVM). Results show that local SDA is competitive with k-NN and the computationally demanding PSVM while offering the advantages of a generative classifier.
1. Similarity-based Classification
Similarity-based learning methods make inferences based only on pairwise similarities or dissimilarities between a test sample and training samples and between pairs of training samples [Bicego et al., 2006, Pekalska et al., 2001, Jacobs et al., 2000, Hochreiter & Obermayer, 2006]. The term similarity-based learning is used whether the pairwise relationship is a similarity or a dissimilarity. The similarity/dissimilarity function is not constrained to satisfy the properties of a metric. Similarity-based learning can be applied when the test and training samples are not described as points
Conditioning by rare sources
Acta Univ. Belii, Math, 2005
Cited by 1 (1 self)
To George Judge, on the occasion of his eightieth birthday. Abstract. In this paper we study the exponential decay of the posterior probability of a set of sources and conditioning by rare sources for both uniform and general prior distributions of sources. The decay rate is determined by the L-divergence, and rare sources from a convex, closed set asymptotically conditionally concentrate on an L-projection. The L-projection on a linear family of sources belongs to the Λ-family of distributions. The results parallel those of Large Deviations for Empirical Measures (Sanov’s Theorem and Conditional Limit Theorem).
Semi-Supervised Learning via Generalized Maximum Entropy
Cited by 1 (0 self)
Various supervised inference methods can be analyzed as convex duals of the generalized maximum entropy (MaxEnt) framework. Generalized MaxEnt aims to find a distribution that maximizes an entropy function while respecting prior information represented as potential functions in miscellaneous forms of constraints and/or penalties. We extend this framework to semi-supervised learning by incorporating unlabeled data via modifications to these potential functions reflecting structural assumptions on the data geometry. The proposed approach leads to a family of discriminative semi-supervised algorithms that are convex, scalable, inherently multi-class, easy to implement, and naturally kernelizable. Experimental evaluation of special cases shows the competitiveness of our methodology.

