Maximum Entropy Discrimination
, 1999
Cited by 122 (20 self)
We present a general framework for discriminative estimation based on the maximum entropy principle and its extensions. All calculations involve distributions over structures and/or parameters rather than specific settings and reduce to relative entropy projections. This holds even when the data is not separable within the chosen parametric class, in the context of anomaly detection rather than classification, or when the labels in the training set are uncertain or incomplete. Support vector machines are naturally subsumed under this class and we provide several extensions. We are also able to estimate exactly and efficiently discriminative distributions over tree structures of class-conditional models within this framework. Preliminary experimental results are indicative of the potential in these techniques.
Gaussian Processes for Classification: Mean Field Algorithms
 Neural Computation
, 1999
Cited by 72 (13 self)
We derive a mean field algorithm for binary classification with Gaussian processes which is based on the TAP approach originally proposed in Statistical Physics of disordered systems. The theory also yields an approximate leave-one-out estimator for the generalization error which is computed with no extra computational cost. We show that from the TAP approach, it is possible to derive both a simpler 'naive' mean field theory and support vector machines (SVM) as limiting cases. For both mean field algorithms and support vector machines, simulation results for three small benchmark data sets are presented. They show (1) that one may get state-of-the-art performance by using the leave-one-out estimator for model selection and (2) that the built-in leave-one-out estimators are extremely precise when compared to the exact leave-one-out estimate. The latter result is taken as strong support for the internal consistency of the mean field approach.
Gaussian Processes - A Replacement for Supervised Neural Networks?
Cited by 51 (0 self)
These lecture notes are based on the work of Neal (1996), Williams and
Gaussian Processes for Bayesian Classification via Hybrid Monte Carlo
 Advances in Neural Information Processing Systems 9
, 1997
Cited by 33 (4 self)
The full Bayesian method for applying neural networks to a prediction problem is to set up the prior/hyperprior structure for the net and then perform the necessary integrals. However, these integrals are not tractable analytically, and Markov Chain Monte Carlo (MCMC) methods are slow, especially if the parameter space is high-dimensional. Using Gaussian processes we can approximate the weight space integral analytically, so that only a small number of hyperparameters need be integrated over by MCMC methods. We have applied this idea to classification problems, obtaining excellent results on the real-world problems investigated so far.
1 INTRODUCTION
To make predictions based on a set of training data, fundamentally we need to combine our prior beliefs about possible predictive functions with the data at hand. In the Bayesian approach to neural networks a prior on the weights in the net induces a prior distribution over functions. This leads naturally to the idea of specifying our bel...
Variational Gaussian Process Classifiers
 IEEE Transactions on Neural Networks
, 1997
Cited by 28 (0 self)
Gaussian processes are a promising nonlinear interpolation tool (Williams 1995; Williams and Rasmussen 1996), but it is not straightforward to solve classification problems with them. In this paper the variational methods of Jaakkola and Jordan (1996) are applied to Gaussian processes to produce an efficient Bayesian binary classifier.
1 Introduction
Assume that we have some data $D$ which consists of inputs $\{x_n\}_{n=1}^{N}$ in some space, real or discrete, and corresponding targets $t_n$ which are binary categorical variables. We shall model this data using a Bayesian conditional classifier which predicts $t$ conditional on $x$. We assume the existence of a function $a(x)$ which models the 'logit' $\log \frac{P(t=1 \mid x)}{P(t=0 \mid x)}$ as a function of $x$. Thus
$$P(t = 1 \mid x; a(x)) = \frac{1}{1 + \exp(-a(x))} \qquad (1)$$
To complete the model we place a prior distribution over the unknown function $a(x)$. There are two approaches to this. In the standard parametric approach, $a(x)$ is a parameterized function $a(x; w)$ where the...
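As an illustration of equation (1) in the abstract above, the following minimal sketch maps a logit value a(x) to the class probability P(t = 1 | x; a(x)) and back. The function names `prob_t1` and `logit` are hypothetical and chosen for this example only; the sketch says nothing about how the paper places a prior over a(x) or performs inference.

```python
import math


def prob_t1(a_x: float) -> float:
    """Return P(t = 1 | x; a(x)) = 1 / (1 + exp(-a(x))) for a logit value a(x).

    The two branches are algebraically identical; splitting on the sign
    avoids overflow in exp() for large negative a(x).
    """
    if a_x >= 0:
        return 1.0 / (1.0 + math.exp(-a_x))
    e = math.exp(a_x)
    return e / (1.0 + e)


def logit(p: float) -> float:
    """Return log(P(t=1|x) / P(t=0|x)) for a probability p strictly in (0, 1)."""
    return math.log(p / (1.0 - p))
```

A logit of zero corresponds to equal class probabilities, and `logit(prob_t1(a))` recovers `a` up to floating-point error for moderate values of `a`.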
The Evolutionary Pre-Processor: Automatic Feature Extraction for Supervised Classification using Genetic Programming
 In Proc. 2nd International Conference on Genetic Programming (GP97)
, 1997
Cited by 21 (0 self)
The extraction of features for classification is often performed heuristically, despite the effect this step has on the performance of the classifier. The Evolutionary Pre-Processor is presented, an automatic nonparametric method for the extraction of nonlinear features. Using genetic programming, the Evolutionary Pre-Processor evolves networks of different nonlinear functions which preprocess the data to improve the discriminatory performance of a classifier. In experiments performed on 9 real-world data sets, the Evolutionary Pre-Processor was able to preprocess the data to reduce the test set misclassification rate. The dimensionality of the data was decreased and those measurements not required for classification were excised. The Evolutionary Pre-Processor behaved intelligently by deciding whether to perform feature extraction or feature selection.
1 Introduction
A common step in Pattern Classification is the extraction of features from the original data, motivated by the red...
Statistical Ideas for Selecting Network Architectures
 Invited Presentation, Neural Information Processing Systems 8
, 1995
Cited by 18 (3 self)
Choosing the architecture of a neural network is one of the most important problems in making neural networks practically useful, but accounts of applications usually sweep these details under the carpet. How many hidden units are needed? Should weight decay be used, and if so how much? What type of output units should be chosen? And so on. We address these issues within the framework of statistical theory for model choice, which provides a number of workable approximate answers. This paper is principally concerned with architecture selection issues for feedforward neural networks (also known as multilayer perceptrons). Many of the same issues arise in selecting radial basis function networks, recurrent networks and more widely. These problems occur in a much wider context within statistics, and applied statisticians have been selecting and combining models for decades. Two recent discussions are [4, 5]. References [3, 20, 21, 22] discuss neural networks from a statistical perspecti...
Bias and Variance of Validation Methods for Function Approximation Neural Networks Under Conditions of Sparse Data
 IEEE Transactions on Systems, Man, and Cybernetics, Part C
, 1998
Cited by 11 (6 self)
Neural networks must be constructed and validated with strong empirical dependence, which is difficult under conditions of sparse data. This paper examines the most common methods of neural network validation along with several general validation methods from the statistical resampling literature as applied to function approximation networks with small sample sizes. It is shown that an increase in computation, necessary for the statistical resampling methods, produces networks that perform better than those constructed in the traditional manner. The statistical resampling methods also result in lower variance of validation; however, some of the methods are biased in estimating network error.
1. INTRODUCTION
To be beneficial, system models must be validated to assure the users that the model emulates the actual system in the desired manner. This is especially true of empirical models, such as neural network and statistical models, which rely primarily on observed data rather th...
JETNET 3.0 - A Versatile Artificial Neural Network Package
, 1993
Cited by 9 (2 self)
this paper quantities written in sans-serif denote matrices and quantities written in boldface denote vectors
A Neural Network Classifier Based on Dempster-Shafer Theory
, 2000
Cited by 8 (5 self)
A new adaptive pattern classifier based on the Dempster-Shafer theory of evidence is presented. This method uses reference patterns as items of evidence regarding the class membership of each input pattern under consideration. This evidence is represented by basic belief assignments (BBAs) and pooled using Dempster's rule of combination. This procedure can be implemented in a multilayer neural network with specific architecture consisting of one input layer, two hidden layers and one output layer. The weight vector, the receptive field and the class membership of each prototype are determined by minimizing the mean squared differences between the classifier outputs and target values. After training, the classifier computes for each input vector a BBA that provides a description of the uncertainty pertaining to the class of the current pattern, given the available evidence. This information may be used to implement various decision rules allowing for ambiguous pattern rejection an...
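The pooling step named in the abstract above, Dempster's rule of combination, can be sketched generically as follows. This is a textbook form of the rule and not the paper's network implementation. The dictionary representation of a BBA (focal sets as `frozenset` keys, masses as values) and the name `dempster_combine` are assumptions made for this example.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two basic belief assignments (BBAs) with Dempster's rule.

    Each BBA is a dict mapping a frozenset of class labels (a focal set)
    to its mass; the masses of one BBA are assumed to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            # Mass supporting a non-empty intersection is accumulated.
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            # Mass falling on the empty set measures conflict between sources.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; the rule is undefined")
    # Renormalise so that the combined masses sum to 1.
    return {s: mass / (1.0 - conflict) for s, mass in combined.items()}


# Two hypothetical items of evidence over the classes {"A", "B"}.
m1 = {frozenset("A"): 0.6, frozenset("AB"): 0.4}
m2 = {frozenset("B"): 0.5, frozenset("AB"): 0.5}
pooled = dempster_combine(m1, m2)
```

Mass left on the full set `{"A", "B"}` after pooling expresses remaining ignorance about the class, which is the kind of uncertainty description a decision rule with pattern rejection could use.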