Results 1–10 of 13
Similarity-based Classification: Concepts and Algorithms
, 2008
Cited by 55 (3 self)
This report reviews and extends the field of similarity-based classification, presenting new analyses, algorithms, data sets, and the most comprehensive set of experimental results to date. Specifically, the generalizability of using similarities as features is analyzed, design goals and methods for weighting nearest neighbors for similarity-based learning are proposed, and different methods for consistently converting similarities into kernels are compared. Experiments on eight real data sets compare eight approaches and their variants to similarity-based learning.
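The "similarities as features" idea from this abstract can be sketched minimally: treat each sample's vector of similarities to the training set as an ordinary feature vector and classify in that space. All names, the RBF similarity, and the nearest-centroid rule below are illustrative choices, not the report's specific method.

```python
import numpy as np

def rbf_similarity(a, b, gamma=1.0):
    # Pairwise RBF similarity between rows of a and rows of b.
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def fit_centroids(S_train, y_train):
    # Treat each row of the train-vs-train similarity matrix as a
    # feature vector; summarize each class by its mean feature vector.
    classes = np.unique(y_train)
    return classes, np.stack([S_train[y_train == c].mean(0) for c in classes])

def predict(S_test, classes, centroids):
    # Assign each test sample to the class whose centroid is closest
    # in the similarity-feature space.
    d = ((S_test[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return classes[d.argmin(1)]

# Toy data: two well-separated Gaussian blobs.
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(3, 0.3, (20, 2))])
y_train = np.array([0] * 20 + [1] * 20)
X_test = np.array([[0.1, -0.1], [2.9, 3.1]])

S_train = rbf_similarity(X_train, X_train)   # (40, 40) similarity features
S_test = rbf_similarity(X_test, X_train)     # (2, 40)
classes, centroids = fit_centroids(S_train, y_train)
print(predict(S_test, classes, centroids))   # [0 1]
```

Any classifier that accepts vectors can be swapped in for the centroid rule; the point is only that the similarity matrix itself serves as the feature representation.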
Unifying Divergence Minimization and Statistical Inference via Convex Duality
 Proc. of Conf. on Learning Theory (COLT)
, 2006
Cited by 50 (10 self)
In this paper we unify divergence minimization and statistical inference by means of convex duality. In the process of doing so, we prove that the dual of approximate maximum entropy estimation is maximum a posteriori estimation. Moreover, our treatment leads to stability and convergence bounds for many statistical learning problems. Finally, we show how an algorithm by Zhang can be used to solve this class of optimization problems efficiently.
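The duality the abstract states can be sketched as follows (notation mine, following standard regularized-MaxEnt treatments; the paper's exact formulation may differ):

```latex
% Approximate MaxEnt (primal), with empirical moments \tilde\mu,
% features \phi, reference measure p_0, and slack \epsilon \ge 0:
\min_{p}\ \mathrm{KL}(p \,\|\, p_0)
\quad\text{s.t.}\quad
\bigl\| \mathbb{E}_{p}[\phi] - \tilde\mu \bigr\| \le \epsilon .
% Its convex dual is penalized maximum likelihood in the exponential
% family p_\theta \propto p_0 \exp(\langle \theta, \phi \rangle):
\max_{\theta}\ \langle \theta, \tilde\mu \rangle - \log Z(\theta)
  - \epsilon\,\|\theta\|_{*} ,
% i.e. MAP estimation under a prior with density
% proportional to \exp(-\epsilon \|\theta\|_{*}).
```

Exact (non-approximate) MaxEnt corresponds to \(\epsilon = 0\), whose dual is unpenalized maximum likelihood.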
On causally asymmetric versions of Occam’s Razor and their relation to thermodynamics
, 2007
Semi-Supervised Learning via Generalized Maximum Entropy
Cited by 5 (1 self)
Various supervised inference methods can be analyzed as convex duals of the generalized maximum entropy (MaxEnt) framework. Generalized MaxEnt aims to find a distribution that maximizes an entropy function while respecting prior information represented as potential functions in miscellaneous forms of constraints and/or penalties. We extend this framework to semi-supervised learning by incorporating unlabeled data via modifications to these potential functions reflecting structural assumptions on the data geometry. The proposed approach leads to a family of discriminative semi-supervised algorithms that are convex, scalable, inherently multiclass, easy to implement, and that can be kernelized naturally. Experimental evaluation of special cases shows the competitiveness of our methodology.
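The abstract does not specify its potential-function modifications, but a common way to encode "structural assumptions on the data geometry" from unlabeled data is a graph-Laplacian smoothness penalty. The sketch below (my illustration, not the paper's construction) shows the quadratic penalty \(f^\top L f\) that is small exactly when predictions vary little across neighboring points.

```python
import numpy as np

def knn_laplacian(X, k=3):
    # Unnormalized graph Laplacian L = D - W of a symmetrized kNN graph.
    n = len(X)
    d2 = ((X[:, None] - X[None]) ** 2).sum(-1)
    W = np.zeros((n, n))
    for i in range(n):
        for j in np.argsort(d2[i])[1:k + 1]:  # skip self at position 0
            W[i, j] = W[j, i] = 1.0
    return np.diag(W.sum(1)) - W

# Two clusters of unlabeled points on the line.
X = np.array([[0.0], [0.1], [0.25], [5.0], [5.2]])
L = knn_laplacian(X, k=1)

# f^T L f = sum over graph edges of (f_i - f_j)^2:
f_smooth = np.array([1.0, 1.0, 1.0, -1.0, -1.0])  # constant per cluster
f_rough = np.array([1.0, -1.0, 1.0, -1.0, 1.0])   # oscillates across edges
print(f_smooth @ L @ f_smooth, f_rough @ L @ f_rough)  # → 0.0 12.0
```

Adding such a penalty to a convex supervised objective keeps the joint problem convex, which matches the convexity claim in the abstract.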
Local similarity discriminant analysis
 Proc. Intl. Conf. on Machine Learning
, 2007
Cited by 3 (3 self)
We propose a local, generative model for similarity-based classification. The method is applicable to the case that only pairwise similarities between samples are available. The classifier models the local class-conditional distribution using a maximum entropy estimate and empirical moment constraints. The resulting exponential class-conditional distributions are combined with class prior probabilities and misclassification costs to form the local similarity discriminant analysis (local SDA) classifier. We compare the performance of local SDA to a non-local version, to the local nearest centroid classifier, the nearest centroid classifier, kNN, and to the recently developed potential support vector machine (PSVM). Results show that local SDA is competitive with kNN and the computationally demanding PSVM while offering the advantages of a generative classifier.

1. Similarity-based Classification. Similarity-based learning methods make inferences based only on pairwise similarities or dissimilarities between a test sample and training samples and between pairs of training samples [Bicego et al., 2006, Pekalska et al., 2001, Jacobs et al., 2000, Hochreiter & Obermayer, 2006]. The term similarity-based learning is used whether the pairwise relationship is a similarity or a dissimilarity. The similarity/dissimilarity function is not constrained to satisfy the properties of a metric. Similarity-based learning can be applied when the test and training samples are not described as points …
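A minimal sketch of the local nearest-centroid baseline the abstract mentions (not the full local SDA model, which additionally fits exponential class-conditional distributions): given only the test sample's similarities to the training set, score each class by the mean similarity to its k most similar training samples. The function name and k are illustrative.

```python
import numpy as np

def local_centroid_classify(s, y_train, k=3):
    # s: similarities between one test sample and all training samples.
    # Score each class by the mean similarity to its k most similar
    # members, then predict the highest-scoring class.
    classes = np.unique(y_train)
    scores = [np.sort(s[y_train == c])[::-1][:k].mean() for c in classes]
    return int(classes[int(np.argmax(scores))])

# Toy pairwise similarities only; no feature vectors are needed.
y_train = np.array([0, 0, 0, 1, 1, 1])
s = np.array([0.9, 0.8, 0.2, 0.3, 0.1, 0.2])  # test sample vs. training set
print(local_centroid_classify(s, y_train, k=2))  # 0
```

Note the classifier never touches coordinates, only the similarity values, which is what makes such methods applicable when samples are not points in a vector space.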
Weighted Nearest Neighbor Classifiers and First-order Error
, 2009
Cited by 1 (0 self)
Weighted nearest-neighbor classification is analyzed in terms of squared error of class probability estimates. Two classes of algorithms for calculating weights are studied with respect to their ability to minimize the first-order term of the squared error: local linear regression and a new class termed regularized linear interpolation. A number of variants of each class are considered or proposed, and compared analytically and by simulations and experiments on benchmark datasets. The experiments establish that weighting methods which aim to minimize first-order error can perform significantly better than standard kNN, particularly in high dimensions. Regularization functions, the fitted surfaces, cross-validated neighborhood size, and the effect of high dimensionality are also analyzed.
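For context, weighted kNN in its simplest form looks like the sketch below, using inverse-distance weights as a baseline weighting scheme (the paper studies more refined, first-order-optimal weights; everything here is illustrative).

```python
import numpy as np

def weighted_knn_proba(X_train, y_train, x, k=5, eps=1e-12):
    # Class probability estimate at x as a weighted vote of the k
    # nearest training samples, with inverse-distance weights.
    d = np.linalg.norm(X_train - x, axis=1)
    idx = np.argsort(d)[:k]
    w = 1.0 / (d[idx] + eps)   # closer neighbors get larger weight
    w /= w.sum()               # normalize to a probability vote
    classes = np.unique(y_train)
    return classes, np.array([w[y_train[idx] == c].sum() for c in classes])

rng = np.random.default_rng(1)
X_train = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(4, 1, (30, 2))])
y_train = np.array([0] * 30 + [1] * 30)
classes, p = weighted_knn_proba(X_train, y_train, np.array([0.0, 0.0]))
print(classes[p.argmax()])  # 0
```

Replacing the inverse-distance rule with weights fitted by local linear regression is what turns this baseline into the kind of first-order-error-minimizing scheme the abstract analyzes.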
BEAMFORMING ALTERNATIVES FOR MULTICHANNEL TRANSIENT ACOUSTIC EVENT CLASSIFICATION
Cited by 1 (1 self)
Signals acquired through a microphone array are typically beamformed to combine channels and improve the signal-to-noise ratio (SNR). However, it has been previously shown that alternative methods for handling multichannel systems can outperform beamforming for speech recognition applications. In this paper, we implement a comprehensive set of classification tests using multiple classifiers and feature extraction techniques to determine whether the alternative methods generalize beyond speech recognition applications. We show that applying the alternative methods (in a slightly simpler form) outperforms beamforming when used for classifying a database of transient acoustic projectile weapon signals. Furthermore, an additional technique is introduced which outperforms both beamforming and previously proposed alternatives in certain classification scenarios. For the majority of classification tests, the improvements seen through the use of these alternative methods are statistically significant.
Index Terms — Pattern classification, feature extraction, cepstral analysis, acoustic beam steering, transient propagation.
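The beamforming baseline the abstract refers to can be sketched as a delay-and-sum beamformer: advance each channel by its arrival delay and average, so the coherent signal adds up while uncorrelated noise averages out. The sketch below assumes known integer sample delays; real arrays estimate fractional delays from geometry or cross-correlation.

```python
import numpy as np

def delay_and_sum(channels, delays):
    # Align each channel by its known integer sample delay, then
    # average across channels.
    n = min(len(ch) - d for ch, d in zip(channels, delays))
    aligned = np.stack([ch[d:d + n] for ch, d in zip(channels, delays)])
    return aligned.mean(axis=0)

rng = np.random.default_rng(0)
s = np.sin(2 * np.pi * 5 * np.linspace(0, 1, 200))  # source signal
delays = [0, 3, 7]                                  # per-mic arrival delays
channels = [np.concatenate([np.zeros(d), s]) + rng.normal(0, 0.5, 200 + d)
            for d in delays]
out = delay_and_sum(channels, delays)
# Averaging M channels with independent noise reduces the noise
# standard deviation by roughly sqrt(M).
```

The "alternative methods" the paper evaluates instead keep the channels separate (e.g. classifying features per channel and combining decisions), which is precisely what beamforming discards.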
Conditioning by rare sources
 Acta Univ. Belii, Math
, 2005
Cited by 1 (1 self)
To George Judge, on the occasion of his eightieth birthday. In this paper we study the exponential decay of posterior probability of a set of sources and conditioning by rare sources, for both uniform and general prior distributions of sources. The decay rate is determined by L-divergence, and rare sources from a convex, closed set asymptotically conditionally concentrate on an L-projection. An L-projection on a linear family of sources belongs to a Λ-family of distributions. The results parallel those of Large Deviations for Empirical Measures (Sanov's Theorem and the Conditional Limit Theorem).
Eliciting vague but proper maximum entropy priors