Efficient Agnostic Learning of Neural Networks with Bounded Fan-in
, 1996
Cited by 57 (18 self)
We show that the class of two-layer neural networks with bounded fan-in is efficiently learnable in a realistic extension to the Probably Approximately Correct (PAC) learning model. In this model, a joint probability distribution is assumed to exist on the observations and the learner is required to approximate the neural network which minimizes the expected quadratic error. As special cases, the model allows learning real-valued functions with bounded noise, learning probabilistic concepts and learning the best approximation to a target function that cannot be well approximated by the neural network. The networks we consider have real-valued inputs and outputs, an unlimited number of threshold hidden units with bounded fan-in, and a bound on the sum of the absolute values of the output weights. The number of computation
This work was supported by the Australian Research Council and the Australian Telecommunications and Electronics Research Board. The material in this paper was pres...
Noisy Time Series Prediction using a Recurrent Neural Network and Grammatical Inference
- Machine Learning
, 2001
Cited by 40 (0 self)
Financial forecasting is an example of a signal processing problem which is challenging due to small sample sizes, high noise, non-stationarity, and non-linearity. Neural networks have been very successful in a number of signal processing applications. We discuss fundamental limitations and inherent difficulties when using neural networks for the processing of high noise, small sample size signals. We introduce a new intelligent signal processing method which addresses the difficulties. The method proposed uses conversion into a symbolic representation with a self-organizing map, and grammatical inference with recurrent neural networks. We apply the method to the prediction of daily foreign exchange rates, addressing difficulties with non-stationarity, overfitting, and unequal a priori class probabilities, and we find significant predictability in comprehensive experiments covering 5 different foreign exchange rates. The method correctly predicts the direction of change for th...
Memory-Universal Prediction of Stationary Random Processes
- IEEE Trans. Inform. Theory
, 1998
Cited by 22 (1 self)
We consider the problem of one-step-ahead prediction of a real-valued, stationary, strongly mixing random process $\{X_i\}_{i=-\infty}^{\infty}$. The best mean-square predictor of $X_0$ is its conditional mean given the entire infinite past $\{X_i\}_{i=-\infty}^{-1}$. Given a sequence of observations $X_1, X_2, \ldots, X_N$, we propose estimators for the conditional mean based on sequences of parametric models of increasing memory and of increasing dimension, for example, neural networks and Legendre polynomials. The proposed estimators select both the model memory and the model dimension, in a data-driven fashion, by minimizing certain complexity regularized least squares criteria. When the underlying predictor function has a finite memory, we establish that the proposed estimators are memory-universal: the proposed estimators, which do not know the true memory, deliver the same statistical performance (rates of integrated mean-squared error) as that delivered by estimators that know the true memory. Furthermore, when the underlying predictor function does not have a finite memory, we establish that the estimator based on Legendre polynomials is consistent.
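As a rough illustration of data-driven memory selection by a complexity-regularized least-squares criterion, the sketch below may help. It is not the estimator of the paper: plain linear autoregressive predictors stand in for the neural-network and Legendre-polynomial models, only the memory is selected (not a separate model dimension), and the penalty `penalty_weight * d * log(n) / n` is an assumed form chosen for simplicity. All function names (`solve`, `fit_ar`, `select_memory`, `simulate_ar2`) are hypothetical.

```python
import math
import random


def solve(A, b):
    """Solve the square linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_ar(series, memory, start):
    """Least-squares fit of x[t] ~ sum_k a[k] * x[t-1-k] for t >= start.

    Returns (coefficients, mean squared residual).
    """
    rows = [[series[t - 1 - k] for k in range(memory)]
            for t in range(start, len(series))]
    targets = series[start:]
    # Normal equations (R^T R) a = R^T y.
    A = [[sum(r[i] * r[j] for r in rows) for j in range(memory)]
         for i in range(memory)]
    b = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(memory)]
    coeffs = solve(A, b)
    mse = sum((y - sum(a * v for a, v in zip(coeffs, r))) ** 2
              for r, y in zip(rows, targets)) / len(rows)
    return coeffs, mse


def select_memory(series, max_memory, penalty_weight=1.0):
    """Pick the memory d minimizing (empirical squared error + complexity penalty).

    The penalty form is an assumption of this sketch, not taken from the paper.
    """
    n = len(series) - max_memory  # same fitting window for every candidate d
    best_d, best_score = None, float("inf")
    for d in range(1, max_memory + 1):
        _, mse = fit_ar(series, d, start=max_memory)
        score = mse + penalty_weight * d * math.log(n) / n
        if score < best_score:
            best_d, best_score = d, score
    return best_d


def simulate_ar2(n, seed=0):
    """Generate a stationary AR(2) series with unit-variance Gaussian noise (toy data)."""
    rng = random.Random(seed)
    x = [0.0, 0.0]
    for _ in range(n):
        x.append(0.5 * x[-1] - 0.4 * x[-2] + rng.gauss(0.0, 1.0))
    return x[2:]
```

Calling `select_memory(simulate_ar2(2000), max_memory=5, penalty_weight=3.0)` compares memories 1 through 5 on the same fitting window; the penalty discourages memories larger than the data support.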
Hardness Results for Neural Network Approximation Problems
, 1999
Cited by 19 (2 self)
Introduction. Previous negative results for learning two-layer neural network classifiers show that it is difficult to find a network that correctly classifies all examples in a training set. However, for learning to a particular accuracy it is only necessary to approximately solve this problem, that is, to find a network that correctly classifies most examples in a training set. In this paper, we show that this approximation problem is hard for several neural network classes. The hardness of PAC-style learning is a very natural question that has been addressed from a variety of viewpoints. The strongest non-learnability conclusions are those stating that no matter what type of algorithm a learner may use, as long as its computational resources are limited, it would not be able to predict a previously unseen label (with probability significantly better than that of a random guess). Such results have been derived by noticing that, in some precise sense, learning
Minimum Complexity Regression Estimation with Weakly Dependent Observations
- IEEE Trans. Inform. Theory
, 1996
Cited by 14 (1 self)
Parameter Spaces and Abstract Complexities. For each integer $n \geq 1$, let $l_n$ denote a model dimension, for example, see (2), and let $S_n$ denote a compact subset of $\mathbb{R}^{l_n}$. The set $S_n$ will serve as a collection of parameters associated with the model dimension $l_n$, for example, see (5). For every $v \in S_n$, let $f_{(n,v)}$ denote a real-valued function on $B_x$ parameterized by $(n, v)$, for example, see (3). The following condition is required to invoke the exponential inequalities in Theorems 4.2 and 4.3.
Presenting and Analyzing the Results of AI Experiments: Data Averaging and Data Snooping
- In Proceedings of the Fourteenth National Conference on Artificial Intelligence, AAAI-97. Menlo Park
, 1997
Cited by 5 (1 self)
Experimental results reported in the machine learning AI literature can be misleading. This paper investigates the common processes of data averaging (reporting results in terms of the mean and standard deviation of the results from multiple trials) and data snooping in the context of neural networks, one of the most popular AI machine learning models. Both of these processes can result in misleading results and inaccurate conclusions. We demonstrate how easily this can happen and propose techniques for avoiding these very important problems. For data averaging, common presentation assumes that the distribution of individual results is Gaussian. However, we investigate the distribution for common problems and find that it often does not approximate the Gaussian distribution, may not be symmetric, and may be multimodal. We show that assuming Gaussian distributions can significantly affect the interpretation of results, especially those of comparison studies. For a controlled task, we fi...
Noisy time series prediction using symbolic representation and recurrent neural network grammatical inference
, 1996
Cited by 5 (0 self)
Financial forecasting is an example of a signal processing problem which is challenging due to small sample sizes, high noise, non-stationarity, and non-linearity. Neural networks have been very successful in a number of signal processing applications. We discuss fundamental limitations and inherent difficulties when using neural networks for the processing of high noise, small sample size signals. We introduce a new intelligent signal processing method which addresses the difficulties. The method uses conversion into a symbolic representation with a self-organizing map, and grammatical inference with recurrent neural networks. We apply the method to the prediction of daily foreign exchange rates, addressing difficulties with non-stationarity, overfitting, and unequal a priori class probabilities, and we find significant predictability in comprehensive experiments covering 5 different foreign exchange rates. The method correctly predicts the direction of change for the next day with an error rate of 47.1%. The error rate reduces to around 40% when rejecting examples where the system has low confidence in its prediction. The symbolic representation aids the extraction of symbolic knowledge from the recurrent neural networks in the form of deterministic finite state automata. These automata explain the operation of the system and are often relatively simple. Rules related to well known behavior such as trend following and mean reversal are extracted.
On the Consistency of Boosting Algorithms
, 2001
Cited by 4 (0 self)
Boosting algorithms have been shown to perform well on many real-world problems, although they sometimes tend to overfit in noisy situations. While excellent finite sample bounds are known, it has not been clear whether boosting is statistically consistent, implying asymptotic convergence to the optimal classification rule. Recent work has provided sufficient conditions for the consistency of boosting for one-dimensional problems. In this work we provide sufficient conditions for the consistency of boosting in the multi-variate case. These conditions require non-trivial geometric concepts, which play no role in the one-dimensional setting. An interesting connection to the recently introduced notion of kernel alignment is pointed out.
Agnostic Learning and Single Hidden Layer Neural Networks
, 1996
Cited by 4 (0 self)
This thesis is concerned with some theoretical aspects of supervised learning of real-valued functions. We study a formal model of learning called agnostic learning. The agnostic learning model assumes a joint probability distribution on the observations (inputs and outputs) and requires the learning algorithm to produce an hypothesis with performance close to that of the best function within a specified class of functions. It is a very general model of learning which includes function learning, learning with additive noise and learning the best approximation in a class of functions as special cases. Within the agnostic learning model, we concentrate on learning functions which can be well approximated by single hidden layer neural networks. Artificial neural networks are often used as black box models for modelling phenomena for which very little prior knowledge is available. Agnostic learning is a natural model for such learning problems. The class of single hidden layer neural netwo...
On the Distribution of Performance from Multiple Neural-Network Trials
, 1997
Cited by 4 (0 self)
The performance of neural-network simulations is often reported in terms of the mean and standard deviation of a number of simulations performed with different starting conditions. However, in many cases, the distribution of the individual results does not approximate a Gaussian distribution, may not be symmetric, and may be multimodal. We present the distribution of results for practical problems and show that assuming Gaussian distributions can significantly affect the interpretation of results, especially those of comparison studies. For a controlled task which we consider, we find that the distribution of performance is skewed toward better performance for smoother target functions and skewed toward worse performance for more complex target functions. We propose new guidelines for reporting performance which provide more information about the actual distribution.
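To make the concern about mean-and-standard-deviation reporting concrete, here is a minimal sketch of a summary that also exposes asymmetry in the distribution of trial results. The choice of statistics (median, quartiles, moment-based skewness) is an assumption of this example and is not claimed to match the guidelines the paper proposes; `summarize_trials` is a hypothetical helper name.

```python
import statistics


def summarize_trials(results):
    """Summarize per-trial scores with statistics that reveal skew, not just mean/std."""
    n = len(results)
    mean = statistics.fmean(results)
    std = statistics.stdev(results)  # sample standard deviation
    q1, _, q3 = statistics.quantiles(results, n=4)  # quartile cut points
    # Moment-based skewness: 0 for a symmetric sample, positive when the
    # long tail lies toward larger values.
    m2 = sum((x - mean) ** 2 for x in results) / n
    m3 = sum((x - mean) ** 3 for x in results) / n
    skew = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    return {
        "mean": mean,
        "std": std,
        "median": statistics.median(results),
        "q1": q1,
        "q3": q3,
        "min": min(results),
        "max": max(results),
        "skewness": skew,
    }
```

For a set of trial errors in which most runs cluster together and one run is much worse, the mean sits well above the median and the skewness is positive, which a bare "mean ± standard deviation" would hide.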

