Efficient Progressive Sampling
1999
Cited by 96 (9 self)
Having access to massive amounts of data does not necessarily imply that induction algorithms must use them all. Samples often provide the same accuracy with far less computational cost. However, the correct sample size is rarely obvious. We analyze methods for progressive sampling: starting with small samples and progressively increasing them as long as model accuracy improves. We show that a simple, geometric sampling schedule is efficient in an asymptotic sense. We then explore the notion of optimal efficiency: what is the absolute best sampling schedule? We describe the issues involved in instantiating an "optimally efficient" progressive sampler. Finally, we provide empirical results comparing a variety of progressive sampling methods. We conclude that progressive sampling often is preferable to analyzing all data instances.
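A geometric schedule trains on samples of size $n_0, a n_0, a^2 n_0, \dots$ and stops once accuracy no longer improves. The Python sketch below is one way to write that loop. The function names, the callback `train_and_score`, and the stopping rule (stop when the accuracy gain drops below `tol`) are our own assumptions, not the convergence test used in the paper.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical helpers illustrating a geometric progressive-sampling schedule.

def geometric_schedule(n0: int, a: float, n_max: int) -> List[int]:
    """Sample sizes n0, n0*a, n0*a^2, ... below n_max, followed by n_max itself."""
    if n0 < 1 or a <= 1:
        raise ValueError("need n0 >= 1 and a > 1")
    sizes: List[int] = []
    n = float(n0)
    while n < n_max:
        if not sizes or int(n) > sizes[-1]:  # skip duplicates caused by rounding down
            sizes.append(int(n))
        n *= a
    sizes.append(n_max)  # fall back to the full data set if we never stop early
    return sizes

def progressive_sample(train_and_score: Callable[[int], float], n0: int, a: float,
                       n_max: int, tol: float = 1e-3) -> Tuple[int, float]:
    """Train on geometrically growing samples; stop once the accuracy gain drops below tol.

    train_and_score(n) is assumed to train a model on n instances and return its accuracy.
    Returns the last sample size that still paid off, and its accuracy.
    """
    best_n: Optional[int] = None
    best_acc = float("-inf")
    for n in geometric_schedule(n0, a, n_max):
        acc = train_and_score(n)
        if best_n is not None and acc - best_acc < tol:
            return best_n, best_acc  # the learning curve has plateaued
        best_n, best_acc = n, acc
    return best_n, best_acc
```

For example, with the synthetic learning curve `acc(n) = 0.9 - 10/n`, `progressive_sample(lambda n: 0.9 - 10.0 / n, n0=100, a=2, n_max=100_000)` stops at `n = 6400`. When training cost grows at least linearly in the sample size, the sizes visited sum to less than $a/(a-1)$ times the last one (under twice for $a = 2$), which is one way to see why a geometric schedule costs only a constant factor more than a single run on the last sample size visited.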
The Perceptron algorithm vs. Winnow: linear vs. logarithmic mistake bounds when few input variables are relevant
Cited by 57 (8 self)
This paper addresses the familiar problem of predicting with a linear classifier. The ...
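The title contrasts the Perceptron's additive updates with Winnow's multiplicative ones. The sketch below gives textbook forms of both rules for boolean inputs. The class names, the Winnow2-style parameters (`alpha = 2`, threshold equal to the number of inputs) and the toy target `x0 OR x1` are our own illustrative choices, not the paper's experimental setup.

```python
from itertools import product
from typing import List, Sequence, Tuple

# Hypothetical names throughout; textbook update rules, not the paper's code.
Example = Tuple[Sequence[int], int]  # (boolean input vector, label in {0, 1})

class Perceptron:
    """Additive update: on a mistake, add (label 1) or subtract (label 0) the input."""
    def __init__(self, n: int):
        self.w = [0.0] * n
        self.b = 0.0
    def predict(self, x: Sequence[int]) -> int:
        return 1 if sum(wi * xi for wi, xi in zip(self.w, x)) + self.b > 0 else 0
    def update(self, x: Sequence[int], y: int) -> None:
        s = 1.0 if y == 1 else -1.0
        self.w = [wi + s * xi for wi, xi in zip(self.w, x)]
        self.b += s

class Winnow:
    """Multiplicative (Winnow2-style) update: on a mistake, multiply (label 1)
    or divide (label 0) the weights of the active inputs by alpha."""
    def __init__(self, n: int, alpha: float = 2.0):
        self.w = [1.0] * n
        self.theta = float(n)  # threshold = number of inputs
        self.alpha = alpha
    def predict(self, x: Sequence[int]) -> int:
        return 1 if sum(wi * xi for wi, xi in zip(self.w, x)) >= self.theta else 0
    def update(self, x: Sequence[int], y: int) -> None:
        f = self.alpha if y == 1 else 1.0 / self.alpha
        self.w = [wi * f if xi else wi for wi, xi in zip(self.w, x)]

def count_mistakes(model, data: List[Example], max_passes: int = 200) -> Tuple[int, bool]:
    """Cycle through the data, updating only on mistakes, until a pass is mistake-free.
    Returns (total mistakes, whether a mistake-free pass was reached)."""
    total = 0
    for _ in range(max_passes):
        mistakes = 0
        for x, y in data:
            if model.predict(x) != y:
                model.update(x, y)
                mistakes += 1
        total += mistakes
        if mistakes == 0:
            return total, True
    return total, False

# Toy target: x0 OR x1, so only 2 of the n inputs are relevant.
n = 8
data = [(x, int(x[0] or x[1])) for x in product([0, 1], repeat=n)]
```

Both learners should eventually make a mistake-free pass over all 256 inputs. The title's claim concerns worst-case mistake bounds as the number of irrelevant inputs grows (logarithmic for Winnow, linear for the Perceptron); a single small run like this one is not guaranteed to show that gap.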
Rigorous learning curve bounds from statistical mechanics
Machine Learning, 1994
Cited by 53 (9 self)
In this paper we introduce and investigate a mathematically rigorous theory of learning curves that is based on ideas from statistical mechanics. The advantage of our theory over the well-established Vapnik-Chervonenkis theory is that our bounds can be considerably tighter in many cases, and are also more reflective of the true behavior (functional form) of learning curves. This behavior can often exhibit dramatic properties such as phase transitions, as well as power law asymptotics not explained by the VC theory. The disadvantages of our theory are that its application requires knowledge of the input distribution, and it is limited so far to finite cardinality function classes. We illustrate our results with many concrete examples of learning curve bounds derived from our theory.

1 Introduction

According to the Vapnik-Chervonenkis (VC) theory of learning curves [27, 26], minimizing empirical error within a function class F on a random sample of m examples leads to generalization error bounded by $\tilde{O}(d/m)$ (in the case that the target function is contained in F) or $\tilde{O}(\sqrt{d/m})$ plus the optimal generalization error achievable within F (in the general case). These bounds are universal: they hold for any class of hypothesis functions F, for any input distribution, and for any target function. The only problem-specific quantity remaining in these bounds is the VC dimension d, a measure of the complexity of the function class F. It has been shown that these bounds are essentially the best distribution-independent bounds possible, in the sense that for any function class, there exists an input distribution for which matching lower bounds on the generalization error can be given [5, 7, 22].
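Dropping constants and logarithmic factors, the two VC rates can be inverted to see how many examples each regime needs for a target error $\epsilon$: $m \approx d/\epsilon$ in the realizable case and $m \approx d/\epsilon^2$ in the general case. The helper name below is hypothetical, and the numbers are only orders of magnitude.

```python
import math
from typing import Tuple

def examples_needed(d: int, eps: float) -> Tuple[int, int]:
    """Hypothetical helper: sample sizes at which the VC rates reach eps,
    ignoring constants and log factors.

    Realizable case:  eps ~ d / m        =>  m ~ d / eps
    General case:     eps ~ sqrt(d / m)  =>  m ~ d / eps**2  (excess over the best error in F)
    """
    return math.ceil(d / eps), math.ceil(d / eps ** 2)
```

`examples_needed(10, 0.125)` returns `(80, 640)`, and halving $\epsilon$ to 0.0625 gives `(160, 2560)`: the realizable requirement doubles while the general one quadruples.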
On-Line Learning Processes in Artificial Neural Networks
1993
Cited by 34 (4 self)
We study on-line learning processes in artificial neural networks from a general point of view. On-line learning means that a learning step takes place at each presentation of a randomly drawn training pattern. It can be viewed as a stochastic process governed by a continuous-time master equation. On-line learning is necessary if not all training patterns are available all the time. This occurs in many applications when the training patterns are drawn from a time-dependent environmental distribution. Studying learning in a changing environment, we encounter a conflict between the adaptability and the confidence of the network's representation. Minimization of a criterion incorporating both effects yields an algorithm for on-line adaptation of the learning parameter. The inherent noise of on-line learning makes it possible to escape from undesired local minima of the error potential on which the learning rule performs (stochastic) gradient descent. We try to quantify these often made cl...
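A minimal sketch of on-line learning in the sense described above: a single linear unit takes one stochastic gradient step on each randomly drawn pattern. The target function, the uniform input distribution, and the constant learning parameter `eta` are our assumptions; the paper's algorithm for adapting the learning parameter on-line is not reproduced here. The target may depend on the time step `t` to mimic a changing environment.

```python
import random
from typing import Callable, Tuple

def online_lms(target: Callable[[int, float], float], n_steps: int = 5000,
               eta: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Hypothetical helper: on-line LMS for y_hat = w*x + b with a fixed learning rate.

    target(t, x) gives the desired output for pattern x at time step t.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for t in range(n_steps):
        x = rng.uniform(-1.0, 1.0)        # randomly drawn training pattern
        err = target(t, x) - (w * x + b)  # error of the current unit on this pattern
        w += eta * err * x                # one stochastic gradient step on err**2 / 2
        b += eta * err
    return w, b
```

With a constant `eta`, the unit re-converges after the target switches halfway through a run, for example with `target = lambda t, x: 2 * x + 1 if t < 2500 else -x + 0.5`. On noise-free data like this the fluctuations die out, but with noisy patterns a constant `eta` keeps the weights moving; shrinking `eta` over time would reduce that fluctuation at the cost of slower tracking, which is the adaptability/confidence conflict the abstract mentions.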
Kernel Methods: A Survey of Current Techniques
Neurocomputing, 2000
Cited by 34 (1 self)
Kernel Methods have become an increasingly popular tool for machine learning tasks involving classification, regression or novelty detection. They exhibit good generalisation performance on many real-life datasets and the approach is properly motivated theoretically. There are relatively few free parameters to adjust and the architecture of the learning machine does not need to be found by experimentation. In this tutorial we survey this subject with a principal focus on the most well-known models based on kernel substitution, namely, Support Vector Machines.

1 Introduction

Support Vector Machines (SVMs) have been successfully applied to a number of applications ranging from particle identification, face identification and text categorisation to engine knock detection, bioinformatics and database marketing [9]. The approach is systematic and properly motivated by statistical learning theory [42]. Training involves optimisation of a convex cost function: there are no false local mi...
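Kernel substitution replaces every inner product $\langle x, x' \rangle$ in a linear learner with a kernel $k(x, x')$. Training an SVM needs a quadratic-programming solver, so as a lighter illustration the sketch below applies the same substitution to the dual (kernel) perceptron, a different and simpler algorithm swapped in for brevity, with a Gaussian kernel. The function names and `gamma = 1` are our assumptions.

```python
import math
from typing import List, Sequence

# Hypothetical helpers: kernel perceptron, used here instead of an SVM for brevity.

def rbf(x: Sequence[float], z: Sequence[float], gamma: float = 1.0) -> float:
    """Gaussian kernel k(x, z) = exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def train_kernel_perceptron(X: List[Sequence[float]], y: List[int],
                            kernel=rbf, max_passes: int = 100) -> List[int]:
    """Dual perceptron: f(x) = sum_i alpha_i * y_i * k(x_i, x), labels in {-1, +1}.
    Only kernel values are used, never explicit feature vectors."""
    alpha = [0] * len(X)
    for _ in range(max_passes):
        mistakes = 0
        for j, (xj, yj) in enumerate(zip(X, y)):
            f = sum(a * yi * kernel(xi, xj) for a, xi, yi in zip(alpha, X, y))
            if yj * f <= 0:  # misclassified (or on the boundary): add this point
                alpha[j] += 1
                mistakes += 1
        if mistakes == 0:
            break
    return alpha

def kernel_predict(alpha: List[int], X: List[Sequence[float]], y: List[int],
                   x: Sequence[float], kernel=rbf) -> int:
    f = sum(a * yi * kernel(xi, x) for a, xi, yi in zip(alpha, X, y))
    return 1 if f > 0 else -1
```

On XOR (inputs `(0,0), (0,1), (1,0), (1,1)` with labels `-1, 1, 1, -1`), which no linear classifier separates, this kernel perceptron converges after two passes with every dual coefficient equal to 1. An SVM would instead choose the coefficients by maximising the margin, which is the convex optimisation problem the survey refers to.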
Using Finite Automata to Mine Execution Data for Intrusion Detection: a Preliminary Report
In Recent Advances in Intrusion Detection (RAID), 2000
Cited by 15 (1 self)
The use of program execution traces to detect intrusions has proven to be a successful strategy. Existing systems that employ this approach are anomaly detectors, meaning that they model a program's normal behavior and signal deviations from that behavior. Unfortunately, many program-based exploits of NT systems use specialized malicious executables. Anomaly detection systems cannot deal with such programs because there is no standard of "normalcy" that they deviate from.
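The anomaly-detection idea described above (model normal behavior, flag deviations) can be illustrated with a very simple automaton learned from execution traces. In this hypothetical sketch, events are system-call names and each automaton state is just the previous event, so a transition is accepted only if it occurred in some normal trace. The paper's own automaton-mining method is not described in this excerpt and is not reproduced here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical first-order transition automaton; not the paper's algorithm.
START = "<start>"

def learn_automaton(traces: Iterable[List[str]]) -> Dict[str, Set[str]]:
    """States are the last observed event; edge a -> b is allowed
    if b ever followed a in a normal trace."""
    edges: Dict[str, Set[str]] = defaultdict(set)
    for trace in traces:
        prev = START
        for event in trace:
            edges[prev].add(event)
            prev = event
    return dict(edges)

def anomalies(automaton: Dict[str, Set[str]], trace: List[str]) -> List[Tuple[int, str, str]]:
    """Return (position, previous event, event) for each transition never seen in training."""
    out: List[Tuple[int, str, str]] = []
    prev = START
    for i, event in enumerate(trace):
        if event not in automaton.get(prev, set()):
            out.append((i, prev, event))
        prev = event
    return out
```

As the abstract points out, a detector of this kind needs normal traces from the program being monitored; a malicious executable with no recorded normal behavior gives it nothing to compare against.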
Kernel Methods for Deep Learning
Cited by 14 (2 self)
We introduce a new family of positive-definite kernel functions that mimic the computation in large, multilayer neural nets. These kernel functions can be used in shallow architectures, such as support vector machines (SVMs), or in deep kernel-based architectures that we call multilayer kernel machines (MKMs). We evaluate SVMs and MKMs with these kernel functions on problems designed to illustrate the advantages of deep architectures. On several problems, we obtain better results than previous, leading benchmarks from both SVMs with Gaussian kernels and deep belief nets.
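The kernels in this line of work are usually called arc-cosine kernels. Assuming the standard closed form $k_n(x, y) = \frac{1}{\pi}\|x\|^n \|y\|^n J_n(\theta)$, with $J_0(\theta) = \pi - \theta$ (step units) and $J_1(\theta) = \sin\theta + (\pi - \theta)\cos\theta$ (ramp, or ReLU, units), the sketch below compares the closed form with a Monte Carlo average over many random hidden units. That comparison is the sense in which the kernel mimics a very wide network. The formula is written from our understanding rather than copied from the paper, and the function names are hypothetical.

```python
import math
import random
from typing import Sequence

# Hypothetical helpers; arc-cosine closed form assumed, degrees 0 and 1 only.

def arccos_kernel(x: Sequence[float], y: Sequence[float], n: int = 1) -> float:
    """k_n(x, y) = (1/pi) |x|^n |y|^n J_n(theta), theta = angle between x and y."""
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    cos_t = sum(a * b for a, b in zip(x, y)) / (nx * ny)
    theta = math.acos(max(-1.0, min(1.0, cos_t)))  # clamp against rounding
    if n == 0:
        j = math.pi - theta
    elif n == 1:
        j = math.sin(theta) + (math.pi - theta) * math.cos(theta)
    else:
        raise ValueError("this sketch only implements n = 0 and n = 1")
    return (nx * ny) ** n * j / math.pi

def wide_network_estimate(x: Sequence[float], y: Sequence[float], n: int = 1,
                          hidden: int = 100_000, seed: int = 0) -> float:
    """Average of 2 * Theta(w.x) Theta(w.y) (w.x)^n (w.y)^n over random units w ~ N(0, I):
    an inner product of hidden activations in a very wide one-hidden-layer net."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(hidden):
        w = [rng.gauss(0.0, 1.0) for _ in x]
        u = sum(wi * xi for wi, xi in zip(w, x))
        v = sum(wi * yi for wi, yi in zip(w, y))
        if u > 0 and v > 0:
            total += 2.0 * (u * v) ** n
    return total / hidden
```

For `x = (1, 0)` and `y = (1, 1)` the angle is $\pi/4$, so `arccos_kernel(x, y, 0)` equals $1 - 1/4 = 0.75$, and Monte Carlo estimates with 100,000 hidden units should land within a few hundredths of the closed forms.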
Prior Information and Generalized Questions
1996
Cited by 7 (4 self)
In learning problems, available information is usually divided into two categories: examples of function values (or training data) and prior information (e.g., a smoothness constraint).
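To make the two categories concrete, here is a hedged sketch using regularised least squares, a standard technique that is not the paper's own formalism. The unknown function is represented by its values on a grid, the training data fix a few of those values, and the smoothness constraint is a penalty $\lambda \sum_i (f_{i-1} - 2f_i + f_{i+1})^2$ on second differences. The grid size, `lam`, and all names are our assumptions.

```python
from typing import Dict, List

# Hypothetical helpers: data term + smoothness prior, solved via normal equations.

def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_with_smoothness_prior(n: int, observed: Dict[int, float], lam: float) -> List[float]:
    """Minimise sum_{i observed} (f_i - y_i)^2 + lam * sum_i (f_{i-1} - 2 f_i + f_{i+1})^2
    over grid values f_0..f_{n-1}, via (S + lam * D^T D) f = S y."""
    A = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for i, yi in observed.items():   # training data: examples of function values
        A[i][i] += 1.0
        b[i] += yi
    for i in range(1, n - 1):        # prior information: penalise second differences
        d = {i - 1: 1.0, i: -2.0, i + 1: 1.0}
        for p, dp in d.items():
            for q, dq in d.items():
                A[p][q] += lam * dp * dq
    return solve(A, b)
```

With only the two endpoints observed, the data alone leave the interior values undetermined; the smoothness prior selects the straight line through them, so `fit_with_smoothness_prior(5, {0: 0.0, 4: 4.0}, lam=1.0)` returns values within rounding error of `[0, 1, 2, 3, 4]`.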
Average-Case Learning Curves for Radial Basis Function Networks
Neural Computation, 1995
Cited by 5 (0 self)
The application of statistical physics to the study of the learning curves of feedforward connectionist networks has, to date, been concerned mostly with networks that do not include hidden layers. Recent work has extended the theory to networks such as committee machines and parity machines; however, these are not networks that are often used in practice, and an important direction for current and future research is the extension of the theory to practical connectionist networks. In this paper we investigate the learning curves of a class of networks that has been widely and successfully applied to practical problems: the Gaussian radial basis function networks (RBFNs). We address the problem of learning linear and nonlinear, realizable and unrealizable, target rules from noise-free training examples using a stochastic training algorithm. Expressions for the generalization error, defined as the expected error for a network with a given set of parameters, are derived for general Gaussia...
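To make the quantities concrete: a Gaussian RBF network computes $f(x) = \sum_b w_b \exp(-\|x - m_b\|^2 / 2s^2)$, and the generalization error of a student network measures its expected error against the teacher (the target rule) over the input distribution. The paper derives analytic expressions; the sketch below merely estimates the quantity by sampling, assuming Gaussian inputs and squared error with a factor 1/2. All names and settings are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence

# Hypothetical helpers: Monte Carlo estimate, not the paper's analytic result.

def rbf_network(x: Sequence[float], centres: List[Sequence[float]],
                weights: List[float], width: float) -> float:
    """Gaussian RBF network: sum_b w_b * exp(-|x - m_b|^2 / (2 s^2))."""
    return sum(w * math.exp(-sum((xi - mi) ** 2 for xi, mi in zip(x, m)) / (2.0 * width ** 2))
               for m, w in zip(centres, weights))

def generalization_error(student: Callable[[List[float]], float],
                         teacher: Callable[[List[float]], float],
                         dim: int, samples: int = 20_000, seed: int = 0) -> float:
    """Estimate E_x[(student(x) - teacher(x))^2 / 2] with inputs x ~ N(0, I)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        total += 0.5 * (student(x) - teacher(x)) ** 2
    return total / samples
```

A student whose centres, widths and weights match the teacher's has zero generalization error; changing any weight makes the estimate positive.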
Large margin classification in infinite neural networks
Cited by 3 (1 self)
We introduce a new family of positive-definite kernels for large margin classification in support vector machines (SVMs). These kernels mimic the computation in large neural networks with one layer of hidden units. We also show how to derive new kernels, by recursive composition, that may be viewed as mapping their inputs through a series of nonlinear feature spaces. These recursively derived kernels mimic the computation in deep networks with multiple hidden layers. We evaluate SVMs with these kernels on problems designed to illustrate the advantages of deep architectures. Compared to previous benchmarks, we find that on some problems these SVMs yield state-of-the-art results, beating not only other SVMs but also deep belief nets.
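Recursive composition needs only three numbers per layer, $k(x,x)$, $k(y,y)$ and $k(x,y)$, because the next layer's kernel depends on its inputs only through their norms and the angle between them. The sketch below assumes the arc-cosine form of the base kernel (degrees 0 and 1) and composes it `layers` times. The composition rule $k^{(\ell+1)}(x,y) = \frac{1}{\pi}\,[k^{(\ell)}(x,x)\,k^{(\ell)}(y,y)]^{n/2} J_n(\theta^{(\ell)})$ is written from our understanding of the construction, and the function names are hypothetical.

```python
import math
from typing import Sequence

# Hypothetical helpers; arc-cosine base kernel assumed, degrees 0 and 1 only.

def _j(n: int, theta: float) -> float:
    """Angular part J_n(theta) of the arc-cosine kernel."""
    if n == 0:
        return math.pi - theta
    if n == 1:
        return math.sin(theta) + (math.pi - theta) * math.cos(theta)
    raise ValueError("this sketch implements n = 0 and n = 1 only")

def deep_arccos_kernel(x: Sequence[float], y: Sequence[float],
                       n: int = 1, layers: int = 1) -> float:
    """Compose the arc-cosine kernel with itself `layers` times (layers = 0 gives x.y).

    Each layer maps (k(x,x), k(y,y), k(x,y)) of the layer below to the same triple
    one layer up, so no explicit feature vectors are ever formed."""
    kxx = sum(a * a for a in x)
    kyy = sum(b * b for b in y)
    kxy = sum(a * b for a, b in zip(x, y))
    for _ in range(layers):
        cos_t = max(-1.0, min(1.0, kxy / math.sqrt(kxx * kyy)))
        theta = math.acos(cos_t)
        kxy = (kxx * kyy) ** (n / 2) * _j(n, theta) / math.pi  # uses previous kxx, kyy
        kxx = kxx ** n * _j(n, 0.0) / math.pi                   # self-similarity: theta = 0
        kyy = kyy ** n * _j(n, 0.0) / math.pi
    return kxy
```

For degree 1 the self-similarity $k(x,x) = \|x\|^2$ is preserved at every layer, while for orthogonal unit inputs the cross term grows from $1/\pi \approx 0.318$ at one layer to about 0.49 at two.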