Hierarchical Bayesian-Kalman Models For Regularisation And ARD In Sequential Learning
- Department of Engineering, Cambridge University, 1998
Cited by 9 (2 self)
In this paper, we show that a hierarchical Bayesian modelling approach to sequential learning leads to many interesting attributes such as regularisation and automatic relevance determination. We identify three inference levels within this hierarchy, namely model selection, parameter estimation and noise estimation. In environments where data arrives sequentially, techniques such as cross-validation to achieve regularisation or model selection are not possible. The Bayesian approach, with extended Kalman filtering at the parameter estimation level, allows for regularisation within a minimum variance framework. A multi-layer perceptron is used to generate the extended Kalman filter nonlinear measurements mapping. We describe several algorithms at the noise estimation level, which allow us to implement adaptive regularisation and automatic relevance determination of model inputs and basis functions. An important contribution of this paper is to show the theoretical links between adaptive...
Connection Pruning with Static and Adaptive Pruning Schedules
- Neurocomputing, 1996
Cited by 7 (1 self)
Neural network pruning methods on the level of individual network parameters (e.g. connection weights) can improve generalization, as is shown in this empirical study. However, an open problem in the pruning methods known today (e.g. OBD, OBS, autoprune, epsiprune) is the selection of the number of parameters to be removed in each pruning step (pruning strength). This work presents a pruning method lprune that automatically adapts the pruning strength to the evolution of weights and loss of generalization during training. The method requires no algorithm parameter adjustment by the user. Results of statistical significance tests comparing autoprune, lprune, and static networks with early stopping are given, based on extensive experimentation with 14 different problems. The results indicate that training with pruning is often significantly better and rarely significantly worse than training with early stopping without pruning. Furthermore, lprune is often superior to autoprune (which is...
Bayesian Ying Yang system, best harmony learning, and Gaussian manifold based family
- Computational Intelligence: Research Frontiers, WCCI2008 Plenary/Invited Lectures. Lecture Notes in Computer Science
"... five action circling ..."
Sequence features of DNA binding sites reveal structural class of associated transcription factor
- Bioinformatics, 2006
Cited by 7 (1 self)
Motivation: A key goal in molecular biology is to understand the mechanisms by which a cell regulates the transcription of its genes. One important aspect of this transcriptional regulation is the binding of transcription factors (TFs) to their specific cis-regulatory counterparts on the DNA. TFs recognize and bind their DNA counterparts according to the structure of their DNA-binding domains (e.g. zinc finger, leucine zipper, homeodomain). The structure of these domains can be used as a basis for grouping TFs into classes. Although the structure of DNA-binding domains varies widely across TFs generally, the TFs within a particular class bind to DNA in a similar fashion, suggesting the existence of class-specific features in the DNA sequences bound by each class of TFs.
Results: In this paper, we apply a sparse Bayesian learning algorithm to identify a small set of class-specific features in the DNA sequences bound by different classes of TFs; the algorithm simultaneously learns a true multi-class classifier that uses these features to predict the DNA-binding domain of the TF that recognizes a particular set of DNA sequences. We train our algorithm on the six largest classes in TRANSFAC, comprising a total of 587 TFs. We learn a six-class classifier for this training set that achieves 87% leave-one-out cross-validation accuracy. We also identify features within cis-regulatory sequences that are highly specific to each class of TF, which has significant implications for how TF binding sites should be modeled for the purpose of motif discovery.
On The Use Of A Pruning Prior For Neural Networks
- In Neural Networks for Signal Processing VI -- Proceedings of the 1996 IEEE Workshop, number VI in NNSP, 1996
Cited by 6 (3 self)
We address the problem of using a regularization prior that prunes unnecessary weights in a neural network architecture. This prior provides a convenient alternative to traditional weight-decay. Two examples are studied to support this method and illustrate its use. First we use the sunspots benchmark problem as an example of time series processing. Then we address the problem of system identification on a small artificial system.
OVERVIEW
It is well known that the use of a regularization term during optimization improves the general accuracy of the model obtained. In the case of neural networks, regularization is most often used through the addition of a weight-decay term to the cost function in order to improve the generalization abilities of the solution [5]. Other methods for improving these abilities include pruning, along the lines of OBD [6]. These techniques have been applied to a wide variety of problems, including time series and system identification. In this paper, we ana...
Regularization with a Pruning Prior
- 1997
Cited by 6 (3 self)
We investigate the use of a regularization prior and its pruning properties. We illustrate the behavior of this prior by conducting analyses both using a Bayesian framework and with the generalization method, on a simple toy problem. Results are thoroughly compared with those obtained with a traditional weight decay.
Keywords: Regularization, Generalization method, Bayesian learning, Evidence framework, Laplace prior, Comparison with weight decay.
List of symbols: $w$ parameter; $\hat{w}$ parameter estimate; $\tilde{w}$ teacher parameter; $\lambda$ regularization parameter; $S(w)$ training error; $C(w)$ regularized training error; $E(w)$ generalization error; $D$ data set; $N$ number of examples $= \mathrm{card}(D)$; $y_i$ individual example, $i \in \{1, \ldots, N\}$; $\bar{y}$ empirical mean of the examples; $\sigma^2$ variance of the examples; $\sigma_y^2$ empirical variance of the examples $= \sum_{i=1}^{N} (y_i - \bar{y})^2 / N$; $\alpha$ hyper-parameter; $w_{ML}$ maximum likelihood estimator; $w_{MP}$ most probable estimator; $w_G$ best generalization estimat...
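The pruning behaviour of a Laplace prior, compared with weight decay, can be sketched on a one-parameter toy problem. The sketch below is an illustration only, not the paper's own setup: it assumes a single parameter w fitted to examples y_i with a squared-error training cost, a Laplace-prior penalty lam*|w|, and a weight-decay penalty 0.5*lam*w^2. The function names and the exact form of the costs are assumptions introduced here.

```python
def laplace_map(ys, lam):
    """Minimise 0.5*sum((y - w)**2) + lam*abs(w) over a scalar w.

    Assumed cost, not taken from the paper. The minimiser is the
    soft-thresholded empirical mean: it is exactly zero whenever
    |mean| <= lam / N, which is the pruning effect.
    """
    n = len(ys)
    mean = sum(ys) / n
    shrink = lam / n
    if abs(mean) <= shrink:
        return 0.0
    return mean - shrink if mean > 0 else mean + shrink


def weight_decay_map(ys, lam):
    """Minimise 0.5*sum((y - w)**2) + 0.5*lam*w**2 over a scalar w.

    Assumed cost, not taken from the paper. The minimiser rescales the
    mean by N / (N + lam), so it shrinks towards zero but never reaches
    it unless the mean itself is zero.
    """
    n = len(ys)
    return sum(ys) / (n + lam)


if __name__ == "__main__":
    examples = [0.3, -0.1, 0.2, 0.0]  # hypothetical data with a small mean
    print("Laplace prior:", laplace_map(examples, 1.0))
    print("Weight decay: ", weight_decay_map(examples, 1.0))
```

Under these assumptions the Laplace penalty sets a weakly supported parameter exactly to zero, while weight decay only scales it down, which is one common way to read the contrast the abstract draws.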
Modeling of species distributions with Maxent: new extensions and a comprehensive evaluation
Cited by 5 (1 self)
Accurate modeling of geographic distributions of species is crucial to various applications in ecology and conservation. The best performing techniques often require some parameter tuning, which may be prohibitively time-consuming to do separately for each species, or unreliable for small or biased datasets. Additionally, even with the abundance of good quality data, users interested in the application of species models need not have the statistical knowledge required for detailed tuning. In such cases, it is desirable to use "default settings", tuned and validated on diverse datasets. Maxent is a recently introduced modeling technique, achieving high predictive accuracy and enjoying several additional attractive properties. The performance of Maxent is influenced by a moderate number of parameters. The first contribution of this paper is the empirical tuning of these parameters. Since many datasets lack information about species absence, we present a tuning method that uses presence-only data. We evaluate our method on independently collected high-quality presence-absence data. In addition to tuning, we introduce several concepts that improve the predictive accuracy and running time of Maxent. We introduce "hinge features" that model more complex relationships in the training data; we describe a new logistic output format that gives an estimate of probability of presence; finally we explore "background sampling" strategies that cope with sample selection bias and decrease model-building time. Our evaluation, based on a diverse dataset of 226 species from 6 regions, shows: 1) default settings tuned on presence-only data achieve performance which is almost as good as if they had been tuned on the evaluation data itself; 2) hinge features substantially improve model
Performance Prediction for Exponential Language Models
Cited by 5 (3 self)
We investigate the task of performance prediction for language models belonging to the exponential family. First, we attempt to empirically discover a formula for predicting test set cross-entropy for n-gram language models. We build models over varying domains, data set sizes, and n-gram orders, and perform linear regression to see whether we can model test set performance as a simple function of training set performance and various model statistics. Remarkably, we find a simple relationship that predicts test set performance with a correlation of 0.9997. We analyze why this relationship holds and show that it holds for other exponential language models as well, including class-based models and minimum discrimination information models. Finally, we discuss how this relationship can be applied to improve language model performance.
The Subspace Information Criterion for Infinite Dimensional Hypothesis Spaces
- Journal of Machine Learning Research, 2002
Cited by 4 (4 self)
A central problem in learning is selection of an appropriate model. This is typically done by estimating the unknown generalization errors of a set of models to be selected from and then choosing the model with minimal generalization error estimate. In this article, we discuss the problem of model selection and generalization error estimation in the context of kernel regression models, e.g., kernel ridge regression, kernel subset regression or Gaussian process regression.
Efficient Covariance Matrix Methods for Bayesian Gaussian Processes and Hopfield Neural Networks
- 1999
Cited by 4 (0 self)
Covariance matrices are important in many areas of neural modelling. In Hopfield networks they are used to form the weight matrix which controls the autoassociative properties of the network. In Gaussian processes, which have been shown to be the infinite neuron limit of many regularised feedforward neural networks, covariance matrices control the form of Bayesian prior distribution over function space. This thesis examines interesting modifications to the standard covariance matrix methods to increase functionality or efficiency of these neural techniques. Firstly the problem of adapting Gaussian process priors to perform regression on switching regimes is tackled. This involves the use of block covariance matrices and Gibbs sampling methods. Then the use of Toeplitz methods is proposed for Gaussian process regression where sampling positions can be chosen. A comparison is made between Hopfield weight matrices, and sample covariances. This allows work on sample covariances to be used ...

