Discriminative fields for modeling spatial dependencies in natural images
In NIPS, 2003
Cited by 110 (3 self)
In this paper we present Discriminative Random Fields (DRF), a discriminative framework for the classification of natural image regions by incorporating neighborhood spatial dependencies in the labels as well as the observed data. The proposed model exploits local discriminative models and makes it possible to relax the assumption of conditional independence of the observed data given the labels, commonly used in the Markov Random Field (MRF) framework. The parameters of the DRF model are learned using a penalized maximum pseudo-likelihood method. Furthermore, the form of the DRF model allows MAP inference for binary classification problems using graph min-cut algorithms. The performance of the model was verified on synthetic as well as real-world images. The DRF model outperforms the MRF model in the experiments.
Adaptive Sparseness for Supervised Learning
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2003
Cited by 88 (4 self)
The goal of supervised learning is to infer a functional mapping based on a set of training examples. To achieve good generalization, it is necessary to control the "complexity" of the learned function. In Bayesian approaches, this is done by adopting a prior for the parameters of the function being learned. We propose a Bayesian approach to supervised learning, which leads to sparse solutions; that is, in which irrelevant parameters are automatically set exactly to zero. Other ways to obtain sparse classifiers (such as Laplacian priors, support vector machines) involve (hyper)parameters which control the degree of sparseness of the resulting classifiers; these parameters have to be somehow adjusted/estimated from the training data. In contrast, our approach does not involve any (hyper)parameters to be adjusted or estimated. This is achieved by a hierarchical-Bayes interpretation of the Laplacian prior, which is then modified by the adoption of a Jeffreys' noninformative hyperprior. Implementation is carried out by an expectation-maximization (EM) algorithm. Experiments with several benchmark data sets show that the proposed approach yields state-of-the-art performance. In particular, our method outperforms SVMs and performs competitively with the best alternative techniques, although it involves no tuning or adjustment of sparseness-controlling hyperparameters.
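A minimal sketch of an EM-style iteration of the kind this abstract describes, written for linear regression with a known noise variance. The abstract does not state the update rule: the reweighting step below, the function name `jeffreys_em_regression`, the fixed `sigma2`, and the toy data are all assumptions made for illustration, not the authors' published implementation.

```python
import numpy as np

def jeffreys_em_regression(H, y, sigma2, n_iter=50):
    """Hypothetical sparse regression via iterative reweighting.

    Assumed update (not given in the abstract): with U = diag(|beta|),
        beta <- U (sigma2 * I + U H^T H U)^{-1} U H^T y
    A coefficient that reaches zero stays at zero.
    """
    d = H.shape[1]
    G = H.T @ H                 # Gram matrix of the design
    Hty = H.T @ y
    # Start from the ordinary least-squares solution.
    beta = np.linalg.lstsq(H, y, rcond=None)[0]
    for _ in range(n_iter):
        U = np.diag(np.abs(beta))
        beta = U @ np.linalg.solve(sigma2 * np.eye(d) + U @ G @ U, U @ Hty)
    return beta

# Toy problem with an identity design, so each coefficient evolves on its own:
# two clearly relevant targets and two that are small relative to the noise.
H = np.eye(4)
y = np.array([3.0, -2.0, 0.05, -0.1])
beta_hat = jeffreys_em_regression(H, y, sigma2=0.01)
print(beta_hat)
```

In this toy setting the two small coefficients are driven to zero while the two large ones are shrunk only slightly, without any sparseness parameter being tuned.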
Comparison of Approximate Methods for Handling Hyperparameters
Neural Computation
Cited by 73 (1 self)
I examine two approximate methods for computational implementation of Bayesian hierarchical models, that is, models which include unknown hyperparameters such as regularization constants and noise levels. In the 'evidence framework' the model parameters are integrated over, and the resulting evidence is maximized over the hyperparameters. The optimized
Bayesian Neural Networks and Density Networks
Nuclear Instruments and Methods in Physics Research, A, 1994
Cited by 43 (8 self)
This paper reviews the Bayesian approach to learning in neural networks, then introduces a new adaptive model, the density network. This is a neural network for which target outputs are provided, but the inputs are unspecified. When a probability distribution is placed on the unknown inputs, a latent variable model is defined that is capable of discovering the underlying dimensionality of a data set. A Bayesian learning algorithm for these networks is derived and demonstrated. 1 Introduction to the Bayesian view of learning. A binary classifier is a parameterized mapping from an input $x$ to an output $y \in [0, 1]$; when its parameters $w$ are specified, the classifier states the probability that an input $x$ belongs to class $t = 1$, rather than the alternative $t = 0$. Consider a binary classifier which models the probability as a sigmoid function of $x$: $P(t = 1 \mid x, w, \mathcal{H}) = y(x; w, \mathcal{H}) = \frac{1}{1 + e^{-w \cdot x}}$ (1). This form of model is known to statisticians as a linear logistic model, and in the neural networks ...
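The linear logistic model in equation (1) can be illustrated with a few lines of code. This is only a sketch: the function name `linear_logistic` and the example weights and inputs are assumptions chosen for illustration and do not come from the paper.

```python
import numpy as np

def linear_logistic(x, w):
    """Return P(t = 1 | x, w) = 1 / (1 + exp(-w . x))."""
    return 1.0 / (1.0 + np.exp(-np.dot(w, x)))

# Hypothetical weights and inputs.
w = np.array([2.0, -1.0])
p_boundary = linear_logistic(np.array([1.0, 2.0]), w)  # w . x = 0, on the decision boundary
p_positive = linear_logistic(np.array([3.0, 1.0]), w)  # w . x = 5, well inside class t = 1
print(p_boundary, p_positive)
```

An input with $w \cdot x = 0$ lies on the decision boundary and gets probability 0.5; larger values of $w \cdot x$ push the output toward 1.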
Adaptive Sparseness Using Jeffreys Prior
2001
Cited by 41 (2 self)
In this paper we introduce a new sparseness-inducing prior which does not involve any (hyper)parameters that need to be adjusted or estimated. Although other applications are possible, we focus here on supervised learning problems: regression and classification. Experiments with several publicly available benchmark data sets show that the proposed approach yields state-of-the-art performance. In particular, our method outperforms support vector machines and performs competitively with the best alternative techniques, both in terms of error rates and sparseness, although it involves no tuning or adjusting of sparseness-controlling hyperparameters.
Bayesian learning of sparse classifiers
In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR’2001 (Hawaii), 2001
Cited by 24 (2 self)
Bayesian approaches to supervised learning use priors on the classifier parameters. However, few priors aim at achieving “sparse” classifiers, where irrelevant/redundant parameters are automatically set to zero. Two well-known ways of obtaining sparse classifiers are: use a zero-mean Laplacian prior on the parameters, and the “support vector machine” (SVM). Whether one uses a Laplacian prior or an SVM, one still needs to specify/estimate the parameters that control the degree of sparseness of the resulting classifiers. We propose a Bayesian approach to learning sparse classifiers which does not involve any parameters controlling the degree of sparseness. This is achieved by a hierarchical-Bayes interpretation of the Laplacian prior, followed by the adoption of a Jeffreys’ noninformative hyperprior. Implementation is carried out by an EM algorithm. Experimental evaluation of the proposed method shows that it performs competitively with (often better than) the best classification techniques available.
Evaluation of 3D Human Motion Tracking with a Coordinated Mixture of Factor Analyzers
Proc. EHuM workshop, NIPS, 2006
Bayesian Neural Networks for Classification: How Useful is the Evidence Framework?
1998
Cited by 20 (2 self)
This paper presents an empirical assessment of the Bayesian evidence framework for neural networks using four synthetic and four real-world classification problems. We focus on three issues: model selection, automatic relevance determination (ARD) and the use of committees. Model selection using the evidence criterion is only tenable if the number of training examples exceeds the number of network weights by a factor of five or ten. With this number of available examples, however, cross-validation is a viable alternative. The ARD feature selection scheme is only useful in networks with many hidden units and for data sets containing many irrelevant variables. ARD is also useful as a hard feature selection method. Results on applying the evidence framework to the real-world data sets showed that committees of Bayesian networks achieved classification accuracies similar to the best alternative methods. Importantly, this was achievable with a minimum of human intervention.
Hyperparameters: optimize, or integrate out?
In Maximum Entropy and Bayesian Methods, Santa Barbara, 1996
Cited by 18 (4 self)
I examine two approximate methods for computational implementation of Bayesian hierarchical models, that is, models which include unknown hyperparameters such as regularization constants. In the 'evidence framework' the model parameters are integrated over, and the resulting evidence is maximized over the hyperparameters. The optimized hyperparameters are used to define a Gaussian approximation to the posterior distribution. In the alternative 'MAP' method, the true posterior probability is found by integrating over the hyperparameters. The true posterior is then maximized over the model parameters, and a Gaussian approximation is made. The similarities of the two approaches, and their relative merits, are discussed, and comparisons are made with the ideal hierarchical Bayesian solution. In moderately ill-posed problems, integration over hyperparameters yields a probability distribution with a skew peak which causes significant biases to arise in the MAP method. In contrast, the evidence framework is shown to introduce negligible predictive error, under straightforward conditions. General lessons are drawn concerning the distinctive properties of inference in many dimensions.
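The first step of the evidence framework (integrate over the model parameters, then maximize the evidence over a hyperparameter) can be sketched for a linear-Gaussian regression model, where the integral has a closed form. The choice of model, the names `log_evidence`, `alpha` (prior precision, playing the role of a regularization constant) and `beta` (noise precision), the grid search, and the synthetic data are all assumptions for illustration; the abstract itself gives no formulas.

```python
import numpy as np

def log_evidence(Phi, t, alpha, beta):
    """Log marginal likelihood of targets t for a linear model t = Phi w + noise,
    with prior w ~ N(0, I / alpha) and noise precision beta; w is integrated out."""
    N, M = Phi.shape
    A = alpha * np.eye(M) + beta * (Phi.T @ Phi)      # posterior precision of w
    m = beta * np.linalg.solve(A, Phi.T @ t)          # posterior mean of w
    E = 0.5 * beta * np.sum((t - Phi @ m) ** 2) + 0.5 * alpha * (m @ m)
    _, logdet_A = np.linalg.slogdet(A)
    return (0.5 * M * np.log(alpha) + 0.5 * N * np.log(beta)
            - E - 0.5 * logdet_A - 0.5 * N * np.log(2.0 * np.pi))

# Synthetic data (hypothetical): 30 points, 5 features, noise std 0.1.
rng = np.random.default_rng(0)
Phi = rng.normal(size=(30, 5))
t = Phi @ rng.normal(size=5) + 0.1 * rng.normal(size=30)
beta = 100.0                                          # noise precision, assumed known

# Maximize the evidence over the hyperparameter alpha on a coarse grid.
alphas = np.logspace(-3, 3, 25)
scores = np.array([log_evidence(Phi, t, a, beta) for a in alphas])
alpha_best = alphas[int(np.argmax(scores))]
print(alpha_best)
```

A grid search is used here only to keep the sketch short; in practice the evidence is usually maximized with iterative re-estimation or a gradient-based optimizer.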
BAYESIAN MODELS AND MACHINE LEARNING WITH GENE EXPRESSION ANALYSIS APPLICATIONS
2005
Cited by 2 (1 self)
The present thesis is divided into two major parts. The first part focuses on developing model-based estimates for gene expression indices in the Bayesian framework. In the application of oligonucleotide expression array technology, reliable estimation of expression indices is critical for “high-level analysis” such as classification, clustering and regulatory network exploration. A statistical model (Li and Wong, 2001a) has been proposed to develop model-based estimates for gene expression indices and outlier detection. Chapter 1 illustrates an extension of the model in the Bayesian framework. Proper constraints on model parameters, heavy-tail distributions for noise, and mixture priors are introduced with the help of Gibbs sampling. Our model is applied to both artificial probe data and real microarray probe data, with a demonstration that it is more robust and reliable than the original model. The second part of the thesis concerns novel Bayesian models for the problem of nonlinear regression for prediction. Recently, kernel methods have been introduced