Issues in Bayesian Analysis of Neural Network Models
, 1998
Cited by 21 (0 self)
This paper discusses these issues exploring the potentiality of Bayesian ideas in the analysis of NN models. Buntine and Weigend (1991) and MacKay (1992) have provided frameworks for their Bayesian analysis based on Gaussian approximations and Neal (1993) has applied hybrid Monte Carlo ideas. Ripley (1993) and Cheng and Titterington (1994) have dwelt on the power of these ideas, especially as far as interpretation and architecture selection are concerned. See MacKay (1995) for a recent review. From a statistical modeling point of view NN's are a special instance of mixture models. Many issues about posterior multimodality and computational strategies in NN modeling are of relevance in the wider class of mixture models. Related recent references in the Bayesian literature on mixture models include Diebolt and Robert (1994), Escobar and West (1994), Robert and Mengersen (1995), Roeder and Wasserman (1995), West (1994), West and Cao (1993), West, Muller and Escobar (1994), and West and Turner (1994). We concentrate on approximation problems, though many of our suggestions can be translated to other areas. For those problems, NN's are viewed as highly nonlinear (semiparametric) approximators, where parameters are typically estimated by least squares. Applications of interest for practitioners include nonlinear regression, stochastic optimisation and regression metamodels for simulation output. The main issue we address here is how to undertake a Bayesian analysis of a NN model, and the uses we may make of it. Our contributions include: an evaluation of computational approaches to Bayesian analysis of NN models, including a novel Markov chain Monte Carlo scheme; a suggestion of a scheme for handling a variable architecture model and a scheme for combining NN models with more ...
Bayesian neural networks for internet traffic classification
- IEEE Transactions on Neural Networks
, 2007
Cited by 10 (1 self)
Internet traffic identification is an important tool for network management. It allows operators to better predict future traffic matrices and demands, security personnel to detect anomalous behavior, and researchers to develop more realistic traffic models. We present here a traffic classifier that can achieve a high accuracy across a range of application types without any source or destination host-address or port information. We use supervised machine learning based on a Bayesian trained neural network. Though our technique uses training data with categories derived from packet content, training and testing were done using features derived from packet streams consisting of one or more packet headers. By providing classification without access to the contents of packets, our technique offers wider application than methods that require full packet/payloads for classification. This is a powerful advantage, using samples of classified traffic to permit the categorization of traffic based only upon commonly available information. Index Terms: Internet traffic, network operations, neural network applications, pattern recognition, traffic identification.
Improving the Determination of the Hyperparameters in Bayesian Learning
- In Proceedings of the ACNN '98
, 1998
Cited by 2 (2 self)
Bayesian learning provides a theoretical way to prevent neural networks from overfitting. It is possible to determine the weight decay parameter during the training process without using a validation set. This is done by maximizing the evidence $p(D \mid \alpha, \beta)$ of the hyperparameters $\alpha$ and $\beta$. In this paper, two new methods are described that improve the determination of the hyperparameters. The first method defines an iteration process in order to get the optimal value of $\alpha$. We prove that this iteration process always converges to the optimal solution. The second one takes into account the fact that $\alpha$ and $\beta$ are so-called scale parameters and therefore have a natural a priori probability that differs significantly from the a priori probability that is used in general. The new methods are applied to a very noisy data set, namely the prediction of the foreign exchange rate of the US Dollar against the German Mark, and demonstrate a substantial improvement with respect to the generalizati...
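As a rough illustration of what maximizing the evidence over the two hyperparameters looks like, the sketch below runs the standard fixed-point re-estimation from MacKay's evidence framework on a one-weight linear model. It is not either of the two new methods the abstract describes. The model `t = w * x + noise`, the function name `evidence_updates`, and the synthetic data are assumptions made for illustration; `alpha` is the weight-decay (prior precision) hyperparameter and `beta` the noise precision.

```python
import random


def evidence_updates(xs, ts, alpha=1.0, beta=1.0, iters=100):
    """MacKay-style re-estimation of alpha and beta for t = w * x + noise.

    Assumed setup: prior w ~ N(0, 1/alpha), Gaussian noise with precision beta.
    """
    n = len(xs)
    sxx = sum(x * x for x in xs)
    sxt = sum(x * t for x, t in zip(xs, ts))
    m = 0.0
    for _ in range(iters):
        # Posterior mean of the single weight for the current hyperparameters.
        m = beta * sxt / (alpha + beta * sxx)
        # Effective number of well-determined parameters (between 0 and 1 here).
        lam = beta * sxx
        gamma_eff = lam / (lam + alpha)
        # Fixed-point updates obtained by setting the evidence gradient to zero.
        alpha = gamma_eff / (m * m)
        resid = sum((t - m * x) ** 2 for x, t in zip(xs, ts))
        beta = (n - gamma_eff) / resid
    return alpha, beta, m


# Synthetic data (hypothetical): true weight 2.0, noise standard deviation 0.5.
random.seed(0)
xs = [random.uniform(-1.0, 1.0) for _ in range(200)]
ts = [2.0 * x + random.gauss(0.0, 0.5) for x in xs]
alpha_hat, beta_hat, w_hat = evidence_updates(xs, ts)
```

No validation set is involved: both hyperparameters are re-estimated from the training data alone, which is the property the abstract highlights.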
Gated Experts for Classification of Financial Time Series
, 1997
Cited by 1 (1 self)
this paper are neural networks whose forecasts are combined by another neural network, a gate. For regression problems such an architecture was shown to partly remedy the two main problems in forecasting real world time series: nonstationarity and overfitting. The goal of this paper is to compare the forecasting ability of gated experts (GE) with that of a single neural network expert on a time series classification task, which corresponds to decisions of taking a long position in a stock, a short position, or doing nothing. A new error function and a weight update rule were derived for this problem. The architecture was tested on the actual stock market data, and the errors on both training and testing data were smaller than errors for the best expert. This suggests that the performance of any single stock market forecasting system can be improved by making several copies of it and training them under the GE framework. In addition, an algorithm is presented for the GE architecture that makes it possible for the model to modify the data to fit the model better. Such a modification is done only if the decrease in the model cost associated with the output error is less than the increase in the input cost associated with moving the data away from its initial values. This idea corresponds to a bi-directional search for the true model, which was shown in AI to cut in half the exponent in the search time in comparison to the standard unidirectional search used by most connectionist architectures. The implementation of this algorithm was shown to further decrease overfitting on the testing data.
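The combination step of a gated-experts architecture can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's error function or update rule: the gate is assumed to produce softmax weights, and the expert outputs and gate scores are fixed hypothetical numbers standing in for the outputs of trained networks.

```python
import math


def softmax(scores):
    """Turn raw gate scores into non-negative weights that sum to one."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_prediction(gate_scores, expert_probs):
    """Mix per-expert class probabilities using the gate's softmax weights."""
    weights = softmax(gate_scores)
    n_classes = len(expert_probs[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, expert_probs))
        for c in range(n_classes)
    ]


# The three decisions named in the abstract.
CLASSES = ["long", "short", "nothing"]

# Hypothetical outputs of two experts and of the gate for one input.
expert_probs = [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3]]
gate_scores = [1.0, 0.0]

mixed = gated_prediction(gate_scores, expert_probs)
decision = CLASSES[max(range(len(CLASSES)), key=lambda c: mixed[c])]
```

Because the gate weights sum to one, the mixed output is again a probability distribution over the three decisions.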
Stock Market Pattern Recognition with Neural Networks
, 1997
Cited by 1 (1 self)
this paper we understand a real world structure or process which is characterized by a set of structural and behavioral patterns. These patterns can be viewed as reflecting the "style" of the object. The objects are assumed to have a relatively high level of stationarity and the patterns characterizing an object are assumed to be probabilistically dependent on each other. For stock market modeling, these objects do not have to represent physical entities such as a company's assets or other objects used in fundamental analysis. The objects can be structures that were created as a result of complex interactions of physical entities. The subject of behavioral finance deals with a class of objects such as fads and fashions present in the market. The stock market is extremely sensitive to its environment, and many objects related to the stock market contribute their patterns to the stock price. The goal is to extract patterns related to each object and build a model of the object from these patterns. For the purpose of risk management, patterns not related to any object will be considered nonstationary and will thus be classified as noise. A similar idea was proposed in Weigend, Zimmermann and Neuneier (1996), who describe an architecture in which the data is accepted for analysis only if it confirms the model. In AI terms, their algorithm implements a bi-directional search, which was proven to give better results than a one-sided search. The objects in the stock market contribute patterns to the stock price at different time scales. This idea is gaining wide recognition, which is reflected in the growing body of research in multi-resolution analysis. See Bjorn and Weigend (1996) for discussion. For example, investors and traders operate at different time horizons, and t...
Temperature Wind
) model (Lewis and Stevens, 1991; Lewis et al., 1994). The modelling is done by letting the predictor variables for the $\tau$th value in the time series $\{y_\tau\}$ be given by $y_{\tau-1}\ (= x_{\tau,1}),\ y_{\tau-2}\ (= x_{\tau,2}),\ \ldots,\ y_{\tau-p}\ (= x_{\tau,p})$. Note that if we combined these predictors to form a linear additive function we would just be modelling the time series as a usual AR(p) process. However, the ASTAR method involves modelling these lagged predictor variables using a MARS model. Thus the predictor variables can have both threshold terms, because of the form of the truncated linear spline basis functions, and interactions
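The construction of lagged predictors, and the kind of threshold and interaction terms a MARS basis adds on top of them, can be sketched as below. This only illustrates the ingredients, not the ASTAR or Bayesian MARS fitting procedure; the helper names `lagged_design` and `hinge`, the toy series, and the knot values are all hypothetical.

```python
def lagged_design(y, p):
    """Build the lagged predictors: the row for time t holds y[t-1], ..., y[t-p]."""
    rows = [[y[t - k] for k in range(1, p + 1)] for t in range(p, len(y))]
    targets = [y[t] for t in range(p, len(y))]
    return rows, targets


def hinge(x, knot, sign=1):
    """Truncated linear spline basis function: max(0, sign * (x - knot))."""
    return max(0.0, sign * (x - knot))


# Toy series (hypothetical) with p = 2 lags.
y = [1.0, 2.0, 4.0, 7.0, 11.0]
rows, targets = lagged_design(y, p=2)
# rows    -> [[2.0, 1.0], [4.0, 2.0], [7.0, 4.0]]
# targets -> [4.0, 7.0, 11.0]

# A threshold term on lag 1 and an interaction between lags 1 and 2,
# evaluated on the second row; the knots 3.0 and 1.5 are arbitrary.
threshold_term = hinge(rows[1][0], knot=3.0)
interaction_term = hinge(rows[1][0], knot=3.0) * hinge(rows[1][1], knot=1.5)
```

A plain linear combination of the columns of `rows` would give the usual AR(p) model; products of hinge functions are what let the lagged predictors enter through thresholds and interactions.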
Posterior Simulation for Feed Forward Neural Network Models
research. However, it leads to difficult computational problems, stemming from nonnormality and multimodality of posterior distributions, which hinder the use of methods like Laplace integration, Gaussian quadrature and Monte Carlo importance sampling. Multimodality issues have predated discussions in neural network research, see e.g. Ripley (1993), and are relevant as well for mixture models, see West, Muller and Escobar (1994) and Crawford (1994), of which FFNN's are a special case. There are three main reasons for multimodality of posterior models in FFNN's. The first one is symmetries due to relabeling; we mitigate this problem by introducing appropriate inequality constraints among parameters. The second, and most worrisome, is the inclusion of several copies of the same term, in our case, terms with the same $\gamma$ vector. Node duplication may be actually viewed as a manifestation of model mixing. The third one is inherent
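The relabeling symmetry and one way an inequality constraint removes it can be sketched as follows. The abstract does not say which constraints the authors use, so the ordering rule here (hidden nodes sorted by the first component of their input-weight vector) is an assumption, as are the logistic activation, the names `betas` and `gammas`, and the numeric values.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def ffnn(x, betas, gammas):
    """One-hidden-layer network: sum over nodes of beta_j * logistic(gamma_j . x)."""
    return sum(
        b * logistic(sum(g * xi for g, xi in zip(gamma, x)))
        for b, gamma in zip(betas, gammas)
    )


def canonical_order(betas, gammas):
    """Relabel nodes so the first component of each gamma vector is non-decreasing."""
    order = sorted(range(len(gammas)), key=lambda j: gammas[j][0])
    return [betas[j] for j in order], [gammas[j] for j in order]


# Hypothetical parameters for three hidden nodes and a relabeled copy of them.
betas = [0.5, -1.2, 2.0]
gammas = [[0.3, 1.0], [-0.7, 0.2], [1.5, -0.4]]
swapped_betas = [betas[2], betas[0], betas[1]]
swapped_gammas = [gammas[2], gammas[0], gammas[1]]

x = [1.0, 2.0]
same_output = abs(ffnn(x, betas, gammas) - ffnn(x, swapped_betas, swapped_gammas)) < 1e-12
same_label = canonical_order(betas, gammas) == canonical_order(swapped_betas, swapped_gammas)
```

Both parameter settings define the same function, so an unconstrained posterior has one mode per permutation of the nodes; imposing the ordering keeps a single representative of each such set.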
Extended Kalman Filter Based Pruning Algorithms And Several Aspects Of Neural Network Learning
, 1998
In recent years, more and more researchers have become aware of the effectiveness of using the extended Kalman filter (EKF) in neural network learning, since information such as the Kalman gain and error covariance matrix can be obtained during training. It would be interesting to inquire if there is any possibility of using an EKF method together with pruning in order to speed up the learning process, as well as to determine the size of a trained network. In this dissertation, certain extended Kalman filter based pruning algorithms for feedforward neural networks (FNN) and recurrent neural networks (RNN) are proposed and several aspects of neural network learning are presented. For FNN, a weight importance measure linking up prediction error sensitivity and the by-products obtained from EKF training is derived. Comparison results demonstrate that the proposed measure can better approximate the prediction error sensitivity than using the forgetting recursive least squa...

