Results 1–10 of 25
Efficient Progressive Sampling
1999
"... Having access to massiveamounts of data does not necessarily imply that induction algorithms must use them all. Samples often provide the same accuracy with far less computational cost. However, the correct sample size is rarely obvious. We analyze methods for progressive samplingstarting with ..."
Abstract

Cited by 91 (9 self)
Having access to massive amounts of data does not necessarily imply that induction algorithms must use them all. Samples often provide the same accuracy with far less computational cost. However, the correct sample size is rarely obvious. We analyze methods for progressive sampling, starting with small samples and progressively increasing them as long as model accuracy improves. We show that a simple, geometric sampling schedule is efficient in an asymptotic sense. We then explore the notion of optimal efficiency: what is the absolute best sampling schedule? We describe the issues involved in instantiating an "optimally efficient" progressive sampler. Finally, we provide empirical results comparing a variety of progressive sampling methods. We conclude that progressive sampling often is preferable to analyzing all data instances.
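The geometric schedule the abstract describes can be sketched as follows. A hypothetical power-law accuracy curve stands in for a real learner's measured accuracy; the sample sizes, growth factor, and stopping tolerance below are illustrative, not taken from the paper:

```python
def accuracy(n):
    # Hypothetical learning curve: accuracy approaches 0.95 as a power law in n.
    return 0.95 - 0.6 * n ** -0.5

def geometric_progressive_sample(n0=100, a=2, eps=0.001, n_max=100_000):
    """Grow the sample geometrically (n0, a*n0, a^2*n0, ...) and stop once
    accuracy improves by less than eps between consecutive sample sizes."""
    n = n0
    prev = accuracy(n)
    while a * n <= n_max:
        n *= a
        cur = accuracy(n)
        if cur - prev < eps:          # accuracy has plateaued: stop early
            return n, cur
        prev = cur
    return n, prev                    # budget exhausted before the plateau

n, acc = geometric_progressive_sample()
```

The geometric growth is what makes the schedule asymptotically efficient: the total work across all rounds is within a constant factor of the work at the final sample size.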
The Perceptron algorithm vs. Winnow: linear vs. logarithmic mistake bounds when few input variables are relevant
"... This paper addresses the familiar problem of predicting with a linear classifier . The ..."
Abstract

Cited by 55 (8 self)
This paper addresses the familiar problem of predicting with a linear classifier. The ...
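The additive-vs-multiplicative contrast behind the linear vs. logarithmic mistake bounds can be sketched with the two update rules. Both are the standard textbook forms, not code from the paper; the Perceptron updates weights additively, while Winnow rescales them multiplicatively, which is why Winnow's mistake bound grows only logarithmically in the number of irrelevant variables:

```python
def perceptron_update(w, x, y):
    # Additive update on a mistake: w <- w + y * x
    return [wi + y * xi for wi, xi in zip(w, x)]

def winnow_update(w, x, y, alpha=2.0):
    # Multiplicative update on a mistake: w_i <- w_i * alpha^(y * x_i)
    return [wi * alpha ** (y * xi) for wi, xi in zip(w, x)]

# One mistake-driven step of each rule on a boolean example:
w_p = perceptron_update([0.0, 0.0, 0.0], [1, 0, 1], 1)
w_w = winnow_update([1.0, 1.0, 1.0], [1, 0, 1], -1)
```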
Rigorous learning curve bounds from statistical mechanics
Machine Learning, 1994
"... Abstract In this paper we introduce and investigate a mathematically rigorous theory of learning curves that is based on ideas from statistical mechanics. The advantage of our theory over the wellestablished VapnikChervonenkis theory is that our bounds can be considerably tighter in many cases, an ..."
Abstract

Cited by 53 (9 self)
In this paper we introduce and investigate a mathematically rigorous theory of learning curves that is based on ideas from statistical mechanics. The advantage of our theory over the well-established Vapnik-Chervonenkis theory is that our bounds can be considerably tighter in many cases, and are also more reflective of the true behavior (functional form) of learning curves. This behavior can often exhibit dramatic properties such as phase transitions, as well as power law asymptotics not explained by the VC theory. The disadvantages of our theory are that its application requires knowledge of the input distribution, and it is limited so far to finite cardinality function classes. We illustrate our results with many concrete examples of learning curve bounds derived from our theory.

1 Introduction

According to the Vapnik-Chervonenkis (VC) theory of learning curves [27, 26], minimizing empirical error within a function class F on a random sample of m examples leads to generalization error bounded by Õ(d/m) (in the case that the target function is contained in F) or Õ(√(d/m)) plus the optimal generalization error achievable within F (in the general case). These bounds are universal: they hold for any class of hypothesis functions F, for any input distribution, and for any target function. The only problem-specific quantity remaining in these bounds is the VC dimension d, a measure of the complexity of the function class F. It has been shown that these bounds are essentially the best distribution-independent bounds possible, in the sense that for any function class, there exists an input distribution for which matching lower bounds on the generalization error can be given [5, 7, 22].
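The VC bounds quoted in the abstract can be written out. With ε(m) the generalization error after m examples and d the VC dimension, the two statements are (standard forms, paraphrased rather than reproduced from the paper):

```latex
\underbrace{\epsilon(m) \le \tilde{O}\!\left(\tfrac{d}{m}\right)}_{\text{realizable: } f^{*} \in F}
\qquad
\underbrace{\epsilon(m) \le \min_{f \in F} \epsilon(f) + \tilde{O}\!\left(\sqrt{\tfrac{d}{m}}\right)}_{\text{general case}}
```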
On-Line Learning Processes in Artificial Neural Networks
1993
"... We study online learning processes in artificial neural networks from a general point of view. Online learning means that a learning step takes place at each presentation of a randomly drawn training pattern. It can be viewed as a stochastic process governed by a continuoustime master equation. O ..."
Abstract

Cited by 31 (4 self)
We study online learning processes in artificial neural networks from a general point of view. Online learning means that a learning step takes place at each presentation of a randomly drawn training pattern. It can be viewed as a stochastic process governed by a continuous-time master equation. Online learning is necessary if not all training patterns are available all the time. This occurs in many applications when the training patterns are drawn from a time-dependent environmental distribution. Studying learning in a changing environment, we encounter a conflict between the adaptability and the confidence of the network's representation. Minimization of a criterion incorporating both effects yields an algorithm for online adaptation of the learning parameter. The inherent noise of online learning makes it possible to escape from undesired local minima of the error potential on which the learning rule performs (stochastic) gradient descent. We try to quantify these often made cl...
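The per-pattern update that the master-equation analysis describes can be sketched with a toy objective (estimating a mean by stochastic gradient descent). The environmental distribution here is stationary for simplicity, and the fixed learning rate is an assumption; the paper's point is precisely about adapting it:

```python
import random

def online_sgd(patterns, eta=0.1, steps=1000, seed=0):
    """One stochastic gradient step per randomly drawn training pattern,
    minimizing (x - w)^2 / 2, i.e. estimating the mean of the patterns."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        x = rng.choice(patterns)   # randomly drawn training pattern
        w += eta * (x - w)         # noisy gradient step toward x
    return w

w = online_sgd([1.0, 2.0, 3.0])    # fluctuates around the mean, 2.0
```

The residual fluctuation of `w` around the minimum is the "inherent noise" the abstract refers to: it scales with `eta`, which is why a large rate adapts quickly but a small rate gives confidence.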
Kernel Methods: A Survey of Current Techniques
Neurocomputing, 2000
"... : Kernel Methods have become an increasingly popular tool for machine learning tasks involving classification, regression or novelty detection. They exhibit good generalisation performance on many reallife datasets and the approach is properly motivated theoretically. There are relatively few free ..."
Abstract

Cited by 25 (1 self)
Kernel Methods have become an increasingly popular tool for machine learning tasks involving classification, regression or novelty detection. They exhibit good generalisation performance on many real-life datasets and the approach is properly motivated theoretically. There are relatively few free parameters to adjust and the architecture of the learning machine does not need to be found by experimentation. In this tutorial we survey this subject with a principal focus on the most well-known models based on kernel substitution, namely, Support Vector Machines.

1 Introduction

Support Vector Machines (SVMs) have been successfully applied to a number of applications ranging from particle identification, face identification and text categorisation to engine knock detection, bioinformatics and database marketing [9]. The approach is systematic and properly motivated by statistical learning theory [42]. Training involves optimisation of a convex cost function: there are no false local mi...
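Kernel substitution, the common thread of the surveyed models, can be sketched as a dual-form predictor in which the inner product is swapped for any kernel function. The kernel-perceptron form below is illustrative (the survey's focus is SVMs, which use the same substitution in a different optimisation):

```python
import math

def linear_kernel(x, z):
    # Plain inner product: the "unsubstituted" case.
    return sum(xi * zi for xi, zi in zip(x, z))

def rbf_kernel(x, z, gamma=0.5):
    # Kernel substitution: a Gaussian kernel replaces the inner product.
    d2 = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * d2)

def dual_predict(alphas, support, labels, x, kernel):
    # Dual-form predictor: sign( sum_i alpha_i * y_i * k(x_i, x) )
    s = sum(a * y * kernel(xi, x) for a, y, xi in zip(alphas, labels, support))
    return 1 if s >= 0 else -1
```

Because the data enter only through `kernel(xi, x)`, changing one line changes the feature space, which is why so few free parameters remain to adjust.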
Using Finite Automata to Mine Execution Data for Intrusion Detection: a Preliminary Report
In Recent Advances in Intrusion Detection (RAID), 2000
"... The use of program execution traces to detect intrusions has proven to be a successful strategy. Existing systems that employ this approach are anomaly detectors, meaning that they model a program's normal behavior and signal deviations from that behavior. Unfortunately, many programbased exploits ..."
Abstract

Cited by 12 (1 self)
The use of program execution traces to detect intrusions has proven to be a successful strategy. Existing systems that employ this approach are anomaly detectors, meaning that they model a program's normal behavior and signal deviations from that behavior. Unfortunately, many program-based exploits of NT systems use specialized malicious executables. Anomaly detection systems cannot deal with such programs because there is no standard of "normalcy" that they deviate from.
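The trace-mining idea behind this line of work can be sketched as a very small automaton-like model: states are events, transitions are the consecutive pairs observed in normal runs, and a trace is flagged if it uses an unseen transition. This is a simplification for illustration, not the paper's actual finite-automata construction:

```python
def build_model(normal_traces):
    """Mine normal execution traces into a transition set: every pair of
    consecutive events (e.g. system calls) observed during normal runs."""
    transitions = set()
    for trace in normal_traces:
        transitions.update(zip(trace, trace[1:]))
    return transitions

def is_anomalous(model, trace):
    # Flag any transition that was never seen during normal execution.
    return any(pair not in model for pair in zip(trace, trace[1:]))

model = build_model([["open", "read", "close"]])
```

The abstract's caveat applies directly: for a purpose-built malicious executable there are no normal runs to mine, so `model` is empty and the anomaly criterion has nothing to deviate from.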
Prior Information and Generalized Questions
1996
"... In learning problems available information is usually divided into two categories: examples of function values (or training data) and prior information (e.g. a smoothness constraint). ..."
Abstract

Cited by 7 (4 self)
In learning problems, available information is usually divided into two categories: examples of function values (or training data) and prior information (e.g. a smoothness constraint).
Average-Case Learning Curves for Radial Basis Function Networks
Neural Computation, 1995
"... The application of statistical physics to the study of the learning curves of feedforward connectionist networks has, to date, been concerned mostly with networks that do not include hidden layers. Recent work has extended the theory to networks such as committee machines and parity machines; howeve ..."
Abstract

Cited by 5 (0 self)
The application of statistical physics to the study of the learning curves of feedforward connectionist networks has, to date, been concerned mostly with networks that do not include hidden layers. Recent work has extended the theory to networks such as committee machines and parity machines; however, these are not networks that are often used in practice, and an important direction for current and future research is the extension of the theory to practical connectionist networks. In this paper we investigate the learning curves of a class of networks that has been widely and successfully applied to practical problems: the Gaussian radial basis function networks (RBFNs). We address the problem of learning linear and nonlinear, realizable and unrealizable target rules from noise-free training examples using a stochastic training algorithm. Expressions for the generalization error, defined as the expected error for a network with a given set of parameters, are derived for general Gaussia...
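The network class under study computes a weighted sum of Gaussian radial basis activations. A minimal forward pass, assuming fixed centers and widths (the paper analyzes the learning of such parameters; this only shows what the network computes):

```python
import math

def rbfn_predict(x, centers, widths, weights):
    """Gaussian RBFN output: sum_j w_j * exp(-||x - c_j||^2 / (2 s_j^2))."""
    out = 0.0
    for c, s, w in zip(centers, widths, weights):
        d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        out += w * math.exp(-d2 / (2 * s * s))
    return out

# At a center the basis function is 1, so the output equals that weight:
y = rbfn_predict([0.0], centers=[[0.0]], widths=[1.0], weights=[2.0])
```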
The Use of Being Stubborn and Introspective
1994
"... We give a simple example of a learning process in an artificial neural network: a perceptron learning by noisy training examples, i.e. a "student" learning from an unreliable "teacher." A "gullible novice," who takes for granted everything the teacher says, does in the end far worse than this teache ..."
Abstract

Cited by 3 (0 self)
We give a simple example of a learning process in an artificial neural network: a perceptron learning by noisy training examples, i.e. a "student" learning from an unreliable "teacher." A "gullible novice," who takes for granted everything the teacher says, does in the end far worse than this teacher. By being more "stubborn," i.e. by neglecting some of the teacher's answers, particularly the ones that the student is confident about, the student can indeed outperform the teacher. We present an algorithm for online tuning of the "acceptance threshold," the fraction of the teacher's advice that the student takes into account. With this algorithm the student will always become wiser than the unreliable teacher, independent of the fraction of his errors. This is an "introspective" algorithm, in the sense that the student's own learning behavior is used to update the acceptance threshold. Introspection makes the algorithm applicable in environments that are both noisy and non-stationary. By fo...
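The "stubborn" acceptance rule can be sketched for a perceptron student: accept the teacher's label only when the student's own output is not yet confident. The introspective threshold update itself is not reproduced here; the fixed threshold below is an assumption for illustration:

```python
def stubborn_update(w, x, teacher_label, threshold, eta=0.1):
    """Perceptron-style step that neglects the teacher's advice whenever
    the student's pre-activation is already confident (|w . x| >= threshold)."""
    s = sum(wi * xi for wi, xi in zip(w, x))   # student's own output
    if abs(s) < threshold:                     # not confident: accept the label
        w = [wi + eta * teacher_label * xi for wi, xi in zip(w, x)]
    return w                                   # confident: stay stubborn
```

A gullible novice is the special case `threshold = infinity` (every label accepted, noise and all); the paper's algorithm tunes the threshold online from the student's own learning behavior.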
Asymptotic Properties of the Fisher Kernel
Neural Computation, 2003
"... This paper analyses the Fisher kernel (FK) from a statistical point of view. The FK is a particularly interesting method for constructing a model of the posterior probability that makes intelligent use of unlabeled data, i.e. of the underlying data density. It is important to analyse and ultimate ..."
Abstract

Cited by 3 (0 self)
This paper analyses the Fisher kernel (FK) from a statistical point of view. The FK is a particularly interesting method for constructing a model of the posterior probability that makes intelligent use of unlabeled data, i.e. of the underlying data density. It is important to analyse and ultimately understand the statistical properties of the FK. To this end, we first establish sufficient conditions that the constructed posterior model is realizable, i.e. that it contains the true distribution.
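The Fisher-kernel construction can be sketched for a toy generative model: a hypothetical 1-D Gaussian with known variance, whose only parameter is the mean. The kernel is the Fisher-information-weighted inner product of Fisher scores; the model choice here is an assumption for illustration, not one from the paper:

```python
def fisher_score(x, mu, sigma=1.0):
    # Gradient of log N(x | mu, sigma^2) with respect to the parameter mu.
    return (x - mu) / sigma ** 2

def fisher_kernel(x, z, mu, sigma=1.0):
    """K(x, z) = U(x) * I^{-1} * U(z), with Fisher score U and Fisher
    information I = 1 / sigma^2 for the mean of a 1-D Gaussian."""
    inv_info = sigma ** 2
    return fisher_score(x, mu, sigma) * inv_info * fisher_score(z, mu, sigma)
```

Because the scores depend only on the density model, the kernel can be fitted from unlabeled data, which is the "intelligent use of the underlying data density" the abstract highlights.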