Comparison of approximate methods for handling hyperparameters (1999)

by D J C MacKay
Venue: Neural Computation

Results 1 - 10 of 34

A Bayesian Framework for the Analysis of Microarray Expression Data: Regularized t-Test and Statistical Inferences of Gene Changes

by Pierre Baldi, Anthony D. Long - Bioinformatics, 2001
Cited by 178 (0 self)
Motivation: DNA microarrays are now capable of providing genome-wide patterns of gene expression across many different conditions. The first level of analysis of these patterns requires determining whether observed differences in expression are significant or not. Current methods are unsatisfactory due to the lack of a systematic framework that can accommodate noise, variability, and low replication often typical of microarray data. Results: We develop a Bayesian probabilistic framework for microarray data analysis. At the simplest level, we model log-expression values by independent normal distributions, parameterized by corresponding means and variances with hierarchical prior distributions. We derive point estimates for both parameters and hyperparameters, and regularized expressions for the variance of each gene by combining the empirical variance with a local background variance associated with neighboring genes. An additional hyperparameter, inversely related to the number of empirical observations, determines the strength of the background variance. Simulations show that these point estimates, combined with a t-test, provide a systematic inference approach that compares favorably with simple t-test or fold methods, and partly compensates for the lack of replication. Availability: The approach is implemented in software called Cyber-T, accessible through a Web interface at www.genomics.uci.edu/software.html. The code is available as Open Source and is written in the freely available statistical language R. Contact: pfbaldi@ics.uci.edu, tdlong@uci.edu.
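
The idea in this abstract of combining each gene's empirical variance with a background variance can be illustrated with a small sketch. It is not the Cyber-T code or the paper's exact expression: the weighted-average form, the pseudo-count name `nu0`, and both function names are assumptions made for illustration.

```python
import math
import statistics


def regularized_variance(values, background_var, nu0):
    """Shrink the empirical variance toward a background variance.

    nu0 is a hypothetical pseudo-count: the larger it is relative to the
    number of replicates, the more the background variance dominates.
    This plain weighted average is an assumed form, not the paper's formula.
    """
    n = len(values)
    s2 = statistics.variance(values)  # unbiased sample variance, needs n >= 2
    return (nu0 * background_var + (n - 1) * s2) / (nu0 + n - 1)


def regularized_t(control, treatment, bg_control, bg_treatment, nu0):
    """Two-sample t statistic computed with the regularized variances."""
    v_c = regularized_variance(control, bg_control, nu0)
    v_t = regularized_variance(treatment, bg_treatment, nu0)
    se = math.sqrt(v_c / len(control) + v_t / len(treatment))
    return (statistics.mean(treatment) - statistics.mean(control)) / se


# Two replicates per condition; the background variances are made-up values.
t_stat = regularized_t([1.0, 3.0], [5.0, 7.0], 2.0, 2.0, nu0=1)
print(t_stat)
```

With `nu0 = 0` the function returns the ordinary sample variance, so the sketch reduces to an unregularized t statistic; raising `nu0` stabilizes the estimate when there are few replicates.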

Ensemble learning for independent component analysis

by James W. Miskin - in Advances in Independent Component Analysis, 2000
Cited by 42 (2 self)
This thesis is concerned with the problem of Blind Source Separation. Specifically, we consider the Independent Component Analysis (ICA) model, in which a set of observations is modelled by $x_t = A s_t$ (1), where $A$ is an unknown mixing matrix and $s_t$ is a vector of hidden source components at time $t$. The ICA problem is to find the sources given only a set of observations. In chapter 1, the blind source separation problem is introduced. In chapter 2, the method of Ensemble Learning is explained. Chapter 3 applies Ensemble Learning to the ICA model and chapter 4 assesses the use of Ensemble Learning for model selection. Chapters 5-7 apply the Ensemble Learning ICA algorithm to data sets from physics (a medical imaging data set consisting of images of a tooth), biology (data sets from cDNA micro-arrays) and astrophysics (Planck image separation and galaxy spectra separation).
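
The mixing model in this abstract, in which each observation is the mixing matrix applied to the hidden sources, can be made concrete with a toy sketch. It only generates observations and undoes the mixing with a known matrix; it is not Ensemble Learning or any ICA algorithm, and the function names and the 2x2 example values are assumptions chosen for illustration.

```python
def mix(A, sources):
    """Apply x_t = A s_t to every source vector s_t in the sequence."""
    return [[sum(a * s for a, s in zip(row, s_t)) for row in A]
            for s_t in sources]


def invert_2x2(A):
    """Invert a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("mixing matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


# Hypothetical mixing matrix and three time steps of two hidden sources.
A = [[1.0, 2.0], [3.0, 4.0]]
sources = [[1.0, 0.0], [0.0, 1.0], [2.0, -1.0]]

observations = mix(A, sources)
# If A were known, the sources would follow from the inverse mixing.
recovered = mix(invert_2x2(A), observations)
print(observations)
print(recovered)
```

In the ICA setting only `observations` is available, so both the mixing matrix and the sources have to be inferred; the inverse step here simply shows what a successful separation would recover.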

Classification and Regression using Mixtures of Experts

by Steven Richard Waterhouse, 1997
Cited by 27 (0 self)
Abstract not found

Assessing approximate inference for binary Gaussian process classification

by Malte Kuss, Carl Edward Rasmussen, Ralf Herbrich - Journal of Machine Learning Research, 2005
Cited by 26 (2 self)
Gaussian process priors can be used to define flexible, probabilistic classification models. Unfortunately exact Bayesian inference is analytically intractable and various approximation techniques have been proposed. In this work we review and compare Laplace’s method and Expectation Propagation for approximate Bayesian inference in the binary Gaussian process classification model. We present a comprehensive comparison of the approximations, their predictive performance and marginal likelihood estimates to results obtained by MCMC sampling. We explain theoretically and corroborate empirically the advantages of Expectation Propagation compared to Laplace’s method. Keywords: Gaussian process priors, probabilistic classification, Laplace’s approximation, expectation propagation, marginal likelihood, evidence, MCMC

A new view of automatic relevance determination

by David Wipf, Srikantan Nagarajan - In NIPS 20, 2008
Cited by 20 (2 self)
Automatic relevance determination (ARD) and the closely-related sparse Bayesian learning (SBL) framework are effective tools for pruning large numbers of irrelevant features leading to a sparse explanatory subset. However, popular update rules used for ARD are either difficult to extend to more general problems of interest or are characterized by non-ideal convergence properties. Moreover, it remains unclear exactly how ARD relates to more traditional MAP estimation-based methods for learning sparse representations (e.g., the Lasso). This paper furnishes an alternative means of expressing the ARD cost function using auxiliary functions that naturally addresses both of these issues. First, the proposed reformulation of ARD can naturally be optimized by solving a series of re-weighted ℓ1 problems. The result is an efficient, extensible algorithm that can be implemented using standard convex programming toolboxes and is guaranteed to converge to a local minimum (or saddle point). Secondly, the analysis reveals that ARD is exactly equivalent to performing standard MAP estimation in weight space using a particular feature- and noise-dependent, non-factorial weight prior. We then demonstrate that this implicit prior maintains several desirable advantages over conventional priors with respect to feature selection. Overall these results suggest alternative cost functions and update procedures for selecting features and promoting sparse solutions in a variety of general situations. In particular, the methodology readily extends to handle problems such as non-negative sparse coding and covariance component estimation.

Recent Advances in Radial Basis Function Networks

by Mark J. L. Orr - Technical Report www.ed.ac.uk/~mjo/papers/recad.ps, Institute for Adaptive and Neural Computation, 1999
Cited by 12 (1 self)
In 1996 an Introduction to Radial Basis Function Networks was published on the web (www.anc.ed.ac.uk/mjo/papers/intro.ps) along with a package of Matlab functions (www.anc.ed.ac.uk/mjo/software/rbf.zip). The emphasis was on the linear character of RBF networks and two techniques borrowed from statistics: forward selection and ridge regression. This document (www.anc.ed.ac.uk/mjo/papers/recad.ps) is an update on developments between 1996 and 1999 and is associated with a second version of the Matlab package (www.anc.ed.ac.uk/mjo/software/rbf2.zip). Improvements have been made to the forward selection and ridge regression methods and a new method, which is a cross between regression trees and RBF networks, has been developed. Contact: mjo@anc.ed.ac.uk.

Variational EM algorithms for non-Gaussian latent variable models

by J. A. Palmer, D. P. Wipf, K. Kreutz-Delgado, B. D. Rao - Advances in Neural Information Processing Systems 18, 2006
Cited by 12 (5 self)
We consider criteria for variational representations of non-Gaussian latent variables, and derive variational EM algorithms in general form. We establish a general equivalence among convex bounding methods, evidence based methods, and ensemble learning/Variational Bayes methods, which has previously been demonstrated only for particular cases.

Bayesian framework for least squares support vector machine classifiers, Gaussian processes and kernel Fisher discriminant analysis

by Tony Van Gestel, Johan A. K. Suykens, Gert Lanckriet, Annemie Lambrechts, Bart De Moor, Joos Vandewalle - Neural Computation, 2002
Cited by 12 (4 self)
The Bayesian evidence framework has been successfully applied to the design of multilayer perceptrons (MLPs) in the work of MacKay. Nevertheless, the training of MLPs suffers from drawbacks like the non-convex optimization problem and the choice of the number of hidden units. In Support Vector Machines (SVMs) for classification, as introduced by Vapnik, a nonlinear decision boundary is obtained by first mapping the input vector in a nonlinear way to a high dimensional kernel-induced feature space in which a linear large margin classifier is constructed. Practical expressions are formulated in the dual space in terms of the related kernel function and the solution follows from a (convex) quadratic programming (QP) problem. In Least Squares SVMs (LS-SVMs), the SVM problem formulation is modified by introducing a least squares cost function and equality instead of inequality constraints and the solution follows from a linear system in the dual space. Implicitly, the least squares formulation corresponds to a regression formulation and is also related to kernel ...

The Bayesian Backfitting Relevance Vector Machine

by Aaron D'Souza, Sethu Vijayakumar, Stefan Schaal - In Proceedings of the 21st International Conference on Machine Learning, 2004
Cited by 11 (6 self)
Traditional non-parametric statistical learning techniques are often computationally attractive, but lack the same generalization and model selection abilities as state-of-the-art Bayesian algorithms which, however, are usually computationally prohibitive. This paper makes several important contributions that allow Bayesian learning to scale to more complex, real-world learning scenarios. Firstly, we show that backfitting, a traditional non-parametric yet highly efficient regression tool, can be derived in a novel formulation within an expectation maximization (EM) framework and thus can finally be given a probabilistic interpretation. Secondly, we show that the general framework of sparse Bayesian learning, and in particular the relevance vector machine (RVM), can be derived as a highly efficient algorithm using a Bayesian version of backfitting at its core. As we demonstrate on several regression and classification benchmarks, Bayesian backfitting offers a compelling alternative to current regression methods, especially when the size and dimensionality of the data challenge computational resources.

BM³E: Discriminative Density Propagation for Visual Tracking

by Cristian Sminchisescu, Atul Kanaujia, Dimitris N. Metaxas - In IEEE Transactions on Pattern Analysis and Machine Intelligence, 2007
Cited by 7 (1 self)
We introduce BM³E, a Conditional Bayesian Mixture of Experts Markov Model, for consistent probabilistic estimates in discriminative visual tracking. The model applies to problems of temporal and uncertain inference and represents the unexplored bottom-up counterpart of pervasive generative models estimated with Kalman filtering or particle filtering. Instead of inverting a non-linear generative observation model at run-time, we learn to cooperatively predict complex state distributions directly from descriptors that encode image observations, typically bag-of-feature global image histograms or descriptors computed over regular spatial grids. These are integrated in a conditional graphical model in order to enforce temporal smoothness constraints and allow a principled management of uncertainty. The algorithms combine sparsity, mixture modeling, and non-linear dimensionality reduction for efficient computation in high-dimensional continuous state spaces. The combined system automatically self-initializes and recovers from failure. The research has three contributions: (1) We establish the density propagation rules for discriminative inference in continuous, temporal chain models; (2) We propose flexible supervised and unsupervised algorithms for learning feedforward, multivalued contextual mappings (multimodal state distributions) based on compact, conditional Bayesian mixture of experts models; (3) We validate the framework empirically for the reconstruction of 3D human motion in monocular video sequences. Our tests on both real and motion capture-based sequences show significant performance gains with respect to competing nearest-neighbor, regression, and structured prediction methods.