Sparse multinomial logistic regression: fast algorithms and generalization bounds
 IEEE Trans. on Pattern Analysis and Machine Intelligence
Cited by 113 (1 self)
Recently developed methods for learning sparse classifiers are among the state-of-the-art in supervised learning. These methods learn classifiers that incorporate weighted sums of basis functions with sparsity-promoting priors encouraging the weight estimates to be either significantly large or exactly zero. From a learning-theoretic perspective, these methods control the capacity of the learned classifier by minimizing the number of basis functions used, resulting in better generalization. This paper presents three contributions related to learning sparse classifiers. First, we introduce a true multiclass formulation based on multinomial logistic regression. Second, by combining a bound optimization approach with a component-wise update procedure, we derive fast exact algorithms for learning sparse multiclass classifiers that scale favorably in both the number of training samples and the feature dimensionality, making them applicable even to large data sets in high-dimensional feature spaces. To the best of our knowledge, these are the first algorithms to perform exact multinomial logistic regression with a sparsity-promoting prior. Third, we show how nontrivial generalization bounds can be derived for our classifier in the binary case. Experimental results on standard benchmark data sets attest to the accuracy, sparsity, and efficiency of the proposed methods.
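As a rough illustration of the formulation this abstract describes, multinomial logistic regression over m classes with a prior on the weights can be written as below. The symbols h(x) (the vector of basis functions), w_i (the weight vector for class i), and lambda are notation assumed here, and the Laplacian prior is shown only as one common sparsity-promoting choice; the abstract itself does not name the prior.

```latex
% Class probabilities as a softmax over weighted sums of basis functions
P(y = i \mid x, w) = \frac{\exp\!\left(w_i^{\top} h(x)\right)}{\sum_{j=1}^{m} \exp\!\left(w_j^{\top} h(x)\right)}

% MAP estimate: log-likelihood over N training pairs plus log-prior
\hat{w} = \arg\max_{w} \left[ \sum_{n=1}^{N} \log P(y_n \mid x_n, w) + \log p(w) \right]

% Assumed example of a sparsity-promoting prior (Laplacian)
p(w) \propto \exp\!\left(-\lambda \lVert w \rVert_1\right)
```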
Importance Weighted Active Learning
Cited by 53 (6 self)
We present a practical and statistically consistent scheme for actively learning binary classifiers under general loss functions. Our algorithm uses importance weighting to correct sampling bias, and by controlling the variance, we are able to give rigorous label complexity bounds for the learning process.
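The bias correction mentioned in this abstract can be illustrated with a minimal sketch, which is not the paper's algorithm. It assumes each example i is queried independently with a known probability p_i and, when queried, contributes its loss scaled by 1/p_i. The function names and the toy numbers are hypothetical.

```python
from itertools import product


def iw_loss(losses, probs, queried):
    """Importance-weighted average loss over n examples.

    Only queried examples contribute, each scaled by 1 / p_i, where p_i is
    the (assumed known) probability with which its label was requested.
    """
    n = len(losses)
    return sum(l / p for l, p, q in zip(losses, probs, queried) if q) / n


def expected_iw_loss(losses, probs):
    """Exact expectation of iw_loss over all possible query outcomes."""
    total = 0.0
    for queried in product([False, True], repeat=len(losses)):
        # Probability of this particular pattern of queried / skipped labels.
        weight = 1.0
        for q, p in zip(queried, probs):
            weight *= p if q else (1.0 - p)
        total += weight * iw_loss(losses, probs, queried)
    return total


# Toy data (hypothetical): per-example losses and query probabilities.
losses = [1.0, 0.0, 1.0, 0.5]
probs = [0.9, 0.2, 0.5, 1.0]
plain_mean = sum(losses) / len(losses)
print(plain_mean, expected_iw_loss(losses, probs))
```

The two printed values agree up to floating-point rounding, which is the sense in which the weighting removes sampling bias. Small query probabilities inflate the variance of the estimate, which is why the abstract stresses controlling the variance.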
Machine learning classifiers and fMRI: A tutorial overview
 NeuroImage
, 2009
Cited by 41 (3 self)
Interpreting brain image experiments requires analysis of complex, multivariate data. In recent years, one analysis approach that has grown in popularity is the use of machine learning algorithms to train classifiers to decode stimuli, mental states, behaviors and other variables of interest from fMRI data and thereby show the data contain enough information about them. In this tutorial overview we review some of the key choices faced in using this approach as well as how to derive statistically significant results, illustrating each point from a case study. Furthermore, we show how, in addition to answering the question of ‘is there information about a variable of interest’ (pattern discrimination), classifiers can be used to tackle other classes of question, namely ‘where is the information’ (pattern localization) and ‘how is that information encoded’ (pattern characterization).
PAC-Bayesian Learning of Linear Classifiers
Cited by 29 (6 self)
We present a general PAC-Bayes theorem from which all known PAC-Bayes risk bounds are obtained as particular cases. We also propose different learning algorithms for finding linear classifiers that minimize these bounds. These learning algorithms are generally competitive with both AdaBoost and the SVM.

1. Introduction
For the classification problem, we are given a training set of examples, each generated according to the same (but unknown) distribution D, and the goal is to find a classifier that minimizes the true risk (i.e., the generalization error or the expected loss). Since the true risk is defined only with respect to the unknown distribution D, we are automatically confronted with the problem of specifying exactly what we should optimize on the training data to find a classifier having the smallest possible true risk. Many different specifications (of what should be optimized on the training data) have been provided by using different inductive principles, but the final guarantee on the true risk always comes with a so-called risk bound that holds uniformly over a set of classifiers. Hence, the formal justification of a learning strategy has always come a posteriori via a risk bound. Since a risk bound can be computed from what a classifier achieves on the training data, it automatically suggests the following optimization problem for learning algorithms: given a risk (upper) bound, find a classifier that minimizes it. Despite the enormous impact they had on our understanding of learning, the VC bounds are generally very loose. These bounds are characterized by the fact that
Learning minimum volume sets
 J. Machine Learning Res
, 2006
Cited by 27 (9 self)
Given a probability measure P and a reference measure µ, one is often interested in the minimum µ-measure set with P-measure at least α. Minimum volume sets of this type summarize the regions of greatest probability mass of P, and are useful for detecting anomalies and constructing confidence regions. This paper addresses the problem of estimating minimum volume sets based on independent samples distributed according to P. Other than these samples, no other information is available regarding P, but the reference measure µ is assumed to be known. We introduce rules for estimating minimum volume sets that parallel the empirical risk minimization and structural risk minimization principles in classification. As in classification, we show that the performances of our estimators are controlled by the rate of uniform convergence of empirical to true probabilities over the class from which the estimator is drawn. Thus we obtain finite sample size performance bounds in terms of VC dimension and related quantities. We also demonstrate strong universal consistency and an oracle inequality. Estimators based on histograms and dyadic partitions illustrate the proposed rules.
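The definition in the first sentence of this abstract can be written compactly as follows. The symbol G for a candidate measurable set and the notation G*_alpha for the minimizer are assumed here for illustration and are not taken from the listing.

```latex
% Minimum volume set at mass level \alpha:
% the smallest set (in \mu-measure) whose P-measure is at least \alpha
G^{*}_{\alpha} = \arg\min_{G \ \text{measurable}} \left\{ \mu(G) \;:\; P(G) \ge \alpha \right\}
```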
On Bayesian bounds
 In Proceedings of the 23rd International Conference on Machine Learning
, 2006
Cited by 16 (2 self)
We show that several important Bayesian bounds studied in machine learning, both in the batch as well as the online setting, arise by an application of a simple compression lemma. In particular, we derive (i) PAC-Bayesian bounds in the batch setting, (ii) Bayesian log-loss bounds and (iii) Bayesian bounded-loss bounds in the online setting using the compression lemma. Although every setting has different semantics for prior, posterior and loss, we show that the core bound argument is the same. The paper simplifies our understanding of several important and apparently disparate results, as well as brings to light a powerful tool for developing similar arguments for other methods.
An Empirical Evaluation of Thompson Sampling
Cited by 14 (2 self)
Thompson sampling is one of the oldest heuristics to address the exploration / exploitation tradeoff, but it is surprisingly unpopular in the literature. We present here some empirical results using Thompson sampling on simulated and real data, and show that it is highly competitive. And since this heuristic is very easy to implement, we argue that it should be part of the standard baselines to compare against.
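To give a sense of how little code the heuristic needs, here is a minimal sketch of Thompson sampling for a Bernoulli bandit. It assumes a uniform Beta(1, 1) prior on each arm and simulated rewards; the function name, arm means, and round count are hypothetical and are not taken from the paper's experiments.

```python
import random


def thompson_bernoulli(true_means, rounds, seed=0):
    """Run Thompson sampling on simulated Bernoulli arms.

    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k
    failures = [0] * k
    pulls = [0] * k
    for _ in range(rounds):
        # Draw one plausible mean per arm from its Beta posterior
        # (the +1 terms encode the assumed uniform Beta(1, 1) prior).
        samples = [
            rng.betavariate(successes[i] + 1, failures[i] + 1) for i in range(k)
        ]
        # Play the arm whose sampled mean is largest.
        arm = max(range(k), key=samples.__getitem__)
        # Simulated reward: 1 with probability true_means[arm], else 0.
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


pulls = thompson_bernoulli([0.2, 0.5, 0.8], rounds=2000)
print(pulls)
```

Sampling from the posterior, rather than always playing the arm with the best empirical mean, is what supplies the exploration: arms with few observations have wide posteriors and are still occasionally selected.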
Tighter PAC-Bayes Bounds
, 2006
Cited by 13 (2 self)
This paper proposes a PAC-Bayes bound to measure the performance of Support Vector Machine (SVM) classifiers. The bound is based on learning a prior over the distribution of classifiers with a part of the training samples. Experimental work shows that this bound is tighter than the original PAC-Bayes bound, resulting in an enhancement of its predictive capabilities. In addition, it is shown that the use of this bound as a means to estimate the hyperparameters of the classifier compares favourably with cross validation in terms of accuracy of the model, while substantially reducing the computational burden.
Performance Prediction for Exponential Language Models
Cited by 10 (3 self)
We investigate the task of performance prediction for language models belonging to the exponential family. First, we attempt to empirically discover a formula for predicting test set cross-entropy for n-gram language models. We build models over varying domains, data set sizes, and n-gram orders, and perform linear regression to see whether we can model test set performance as a simple function of training set performance and various model statistics. Remarkably, we find a simple relationship that predicts test set performance with a correlation of 0.9997. We analyze why this relationship holds and show that it holds for other exponential language models as well, including class-based models and minimum discrimination information models. Finally, we discuss how this relationship can be applied to improve language model performance.