Learning With Many Irrelevant Features
- In Proceedings of the Ninth National Conference on Artificial Intelligence
, 1991
Cited by 187 (3 self)
In many domains, an appropriate inductive bias is the MIN-FEATURES bias, which prefers consistent hypotheses definable over as few features as possible. This paper defines and studies this bias. First, it is shown that any learning algorithm implementing the MIN-FEATURES bias requires $\Theta\left(\frac{1}{\epsilon} \ln \frac{1}{\delta} + \frac{1}{\epsilon}\left[2^p + p \ln n\right]\right)$ training examples to guarantee PAC-learning a concept having p relevant features out of n available features. This bound is only logarithmic in the number of irrelevant features. The paper also presents a quasi-polynomial time algorithm, FOCUS, which implements MIN-FEATURES. Experimental studies are presented that compare FOCUS to the ID3 and FRINGE algorithms. These experiments show that, contrary to expectations, these algorithms do not implement good approximations of MIN-FEATURES. The coverage, sample complexity, and generalization performance of FOCUS are substantially better than those of either ID3 or FRINGE on learning problems where the MIN-FEATURE...
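As an illustration of the MIN-FEATURES bias described in this abstract, the following is a minimal sketch of an exhaustive smallest-subset search over Boolean features. It is not the FOCUS implementation from the paper; the function name `min_features`, the data layout, and the XOR toy concept are all assumptions made for this example.

```python
from itertools import combinations

def min_features(examples):
    """Return the smallest feature-index subset consistent with the labels.

    examples: list of (features_tuple, label) pairs (hypothetical layout).
    A subset is consistent if no two examples agree on every feature in
    the subset yet carry different labels.
    """
    n = len(examples[0][0])
    for size in range(n + 1):                      # try smallest subsets first
        for subset in combinations(range(n), size):
            seen = {}                              # projected example -> label
            consistent = True
            for x, y in examples:
                key = tuple(x[i] for i in subset)
                if seen.setdefault(key, y) != y:   # same projection, other label
                    consistent = False
                    break
            if consistent:
                return subset
    return None  # reached only if identical examples carry different labels

# Toy concept (assumed for illustration): x0 XOR x2; x1 and x3 are irrelevant.
data = [((a, b, c, d), a ^ c)
        for a in (0, 1) for b in (0, 1) for c in (0, 1) for d in (0, 1)]
result = min_features(data)
print(result)
```

On this toy data the search rejects the empty subset and every single feature before accepting the pair of relevant features. Enumerating subsets by increasing size is what makes the search time grow quickly with the number of relevant features.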
Reduction Techniques for Instance-Based Learning Algorithms
- Machine Learning
, 2000
Cited by 93 (2 self)
Instance-based learning algorithms are often faced with the problem of deciding which instances to store for use during generalization. Storing too many instances can result in large memory requirements and slow execution speed, and can cause an oversensitivity to noise. This paper has two main purposes. First, it provides a survey of existing algorithms used to reduce storage requirements in instance-based learning algorithms and other exemplar-based algorithms. Second, it proposes six additional reduction algorithms called DROP1--DROP5 and DEL (three of which were first described in Wilson & Martinez, 1997c, as RT1--RT3) that can be used to remove instances from the concept description. These algorithms and 10 algorithms from the survey are compared on 31 classification tasks. Of those algorithms that provide substantial storage reduction, the DROP algorithms have the highest average generalization accuracy in these experiments, especially in the presence of uniform class noise. ...
The Inferential Theory Of Learning: Developing Foundations for . . .
, 1993
Cited by 61 (15 self)
The development of multistrategy learning systems requires a clear understanding of the roles and the applicability conditions of different learning strategies. To this end, this chapter introduces the Inferential Theory of Learning that provides a conceptual framework for explaining logical capabilities of learning strategies, i.e., their competence. Viewing learning as a process of modifying the learner's knowledge by exploring the learner's experience, the theory postulates that any such process can be described as a search in a knowledge space, which involves the learner's experience, prior knowledge and the learning goal. The search operators are instantiations of knowledge transmutations, which are generic patterns of knowledge change. Transmutations may employ any basic type of inference: deduction, induction or analogy. Several fundamental knowledge transmutations are described in a novel and general way, such as generalization, abstraction, explanation and similization, and their counterparts, specialization, concretion, prediction and dissimilization, respectively. Generalization enlarges the reference set of a description (the set of entities that are being described). Abstraction reduces the amount of the detail about the reference set. Explanation generates premises that explain (or imply) the given properties of the reference set. Similization transfers knowledge from one reference set to a similar reference set. Using concepts of the theory, a multistrategy task-adaptive learning (MTL) methodology is outlined, and illustrated by an example. MTL dynamically adapts strategies to the learning task, defined by the input information, learner's background knowledge, and the learning goal. The goal of MTL research is to synergistically integrate a wide range of inferential learning strategies, such as empirical generalization, constructive induction, deductive generalization, explanation, prediction, abstraction, and similization. Keywords: learning theory, inference theory, multi...
Discovering Neural Nets With Low Kolmogorov Complexity And High Generalization Capability
- Neural Networks
, 1997
Cited by 41 (23 self)
Many neural net learning algorithms aim at finding "simple" nets to explain training data. The expectation is: the "simpler" the networks, the better the generalization on test data (→ Occam's razor). Previous implementations, however, use measures for "simplicity" that lack the power, universality and elegance of those based on Kolmogorov complexity and Solomonoff's algorithmic probability. Likewise, most previous approaches (especially those of the "Bayesian" kind) suffer from the problem of choosing appropriate priors. This paper addresses both issues. It first reviews some basic concepts of algorithmic complexity theory relevant to machine learning, and how the Solomonoff-Levin distribution (or universal prior) deals with the prior problem. The universal prior leads to a probabilistic method for finding "algorithmically simple" problem solutions with high generalization capability. The method is based on Levin complexity (a time-bounded generalization of Kolmogorov comple...
For Every Generalization Action, Is There Really An Equal And Opposite Reaction? Analysis of the Conservation Law for Generalization Performance
- Proceedings of the Twelfth International Conference on Machine Learning
, 1995
Cited by 38 (0 self)
The "Conservation Law for Generalization Performance" [Schaffer, 1994] states that for any learning algorithm and bias, "generalization is a zero-sum enterprise." In this paper we study the law and show that while the law is true, the manner in which the Conservation Law adds up generalization performance over all target concepts, without regard to the probability with which each concept occurs, is relevant only in a uniformly random universe. We then introduce a more meaningful measure of generalization, expected generalization performance. Unlike the Conservation Law's measure of generalization performance (which is, in essence, defined to be zero), expected generalization performance is conserved only when certain symmetric properties hold in our universe. There is no reason to believe, a priori, that such symmetries exist; learning algorithms may well exhibit non-zero (expected) generalization performance.
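The zero-sum claim in this abstract can be checked on a toy domain. The sketch below assumes that generalization performance means accuracy on unseen instances minus the chance level of 1/2, and that every target concept is counted once with equal weight. The eight-instance domain, the train/unseen split, and `majority_learner` are hypothetical choices for illustration, not taken from the paper.

```python
from fractions import Fraction
from itertools import product

instances = list(product((0, 1), repeat=3))    # 8 possible Boolean inputs
train, unseen = instances[:5], instances[5:]   # assumed split for the demo

def majority_learner(labelled):
    """Predict the most common training label everywhere (ties go to 1)."""
    ones = sum(label for _, label in labelled)
    guess = 1 if 2 * ones >= len(labelled) else 0
    return lambda x: guess

total = Fraction(0)
for labels in product((0, 1), repeat=len(instances)):   # all 256 target concepts
    target = dict(zip(instances, labels))
    hypothesis = majority_learner([(x, target[x]) for x in train])
    correct = sum(hypothesis(x) == target[x] for x in unseen)
    # Performance relative to chance on the unseen instances only.
    total += Fraction(correct, len(unseen)) - Fraction(1, 2)

print(total)
```

The sum is zero for any deterministic learner here, because each fixed training labelling is paired with every possible labelling of the unseen instances. Weighting the concepts by a non-uniform probability, as the abstract proposes, removes that symmetry.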
The Use of Explicit Goals for Knowledge to Guide Inference and Learning
- Applied Intelligence
, 1992
Cited by 36 (21 self)
Combinatorial explosion of inferences has always been a central problem in artificial intelligence. Although the set of inferences that can be drawn from a reasoner's knowledge and from available inputs is very large (potentially infinite), the inferential resources available to any reasoning system are limited. With limited inferential capacity and very many potential inferences, reasoners must somehow control the process of inference. Not all inferences are equally useful to a given reasoning system. Any reasoning system that has goals (or any form of a utility function) and acts based on its beliefs indirectly assigns utility to its beliefs. Given limits on the process of inference, and variation in the utility of inferences, it is clear that a reasoner ought to draw the inferences that will be most valuable to it. This paper presents an approach to this problem that makes the utility of a (potential) belief an explicit part of the inference process. The method is to generate exp...
A Comparison of ID3 and Backpropagation for English Text-to-Speech Mapping
, 1995
Cited by 36 (6 self)
The performance of the error backpropagation (BP) and ID3 learning algorithms was compared on the task of mapping English text to phonemes and stresses. Under the distributed output code developed by Sejnowski and Rosenberg, it is shown that BP consistently outperforms ID3 on this task by several percentage points. Three hypotheses explaining this difference were explored: (a) ID3 is overfitting the training data, (b) BP is able to share hidden units across several output units and hence can learn the output units better, and (c) BP captures statistical information that ID3 does not. We conclude that only hypothesis (c) is correct. By augmenting ID3 with a simple statistical learning procedure, the performance of BP can be approached but not matched. More complex statistical procedures can improve the performance of both BP and ID3 substantially. A study of the residual errors suggests that there is still substantial room for improvement in learning methods for text-to-speech mapping.
On Learning How to Learn Learning Strategies
, 1995
Cited by 34 (14 self)
This paper introduces the "incremental self-improvement paradigm". Unlike previous methods, incremental self-improvement encourages a reinforcement learning system to improve the way it learns, and to improve the way it improves the way it learns ..., without significant theoretical limitations: the system is able to "shift its inductive bias" in a universal way. Its major features are: (1) There is no explicit difference between "learning", "meta-learning", and other kinds of information processing. Using a Turing machine equivalent programming language, the system itself occasionally executes self-delimiting, initially highly random "self-modification programs" which modify the context-dependent probabilities of future action sequences (including future self-modification programs). (2) The system keeps only those probability modifications computed by "useful" self-modification programs: those which bring about more payoff (reward, reinforcement) per time than all previous self-modi...
A General Method for Incremental Self-Improvement and Multi-Agent Learning in Unrestricted Environments
- In Evolutionary Computation: Theory and Applications. Scientific Publ. Co., Singapore
, 1996
Cited by 21 (8 self)
I describe a novel paradigm for reinforcement learning (RL) with limited computational resources in realistic, non-resettable environments. The learner's policy is an arbitrary modifiable algorithm mapping environmental inputs and internal states to outputs and new internal states. As in the real world, any event in system life and any learning process computing policy modifications may affect future performance and preconditions of future learning processes. Unlike with most previous RL approaches, the expected reward for a certain behavior may change during successive "trials". At a given time in system life, there is only one single training example to evaluate the current long-term usefulness of any given previous policy modification, namely the average reinforcement per time since that modification occurred. At certain times in system life called checkpoints, such singular observations are used by a stack-based backtracking method which invalidates certain previous po...
Reduction Techniques for Exemplar-Based Learning Algorithms
- Machine Learning
, 2000
Cited by 19 (2 self)
Exemplar-based learning algorithms are often faced with the problem of deciding which instances or other exemplars to store for use during generalization. Storing too many exemplars can result in large memory requirements and slow execution speed, and can cause an oversensitivity to noise. This paper has two main purposes. First, it provides a survey of existing algorithms used to reduce the number of exemplars retained in exemplar-based learning models. Second, it proposes six new reduction algorithms called DROP1-5 and DEL that can be used to prune instances from the concept description. These algorithms and 10 algorithms from the survey are compared on 31 datasets. Of those algorithms that provide substantial storage reduction, the DROP algorithms have the highest generalization accuracy in these experiments, especially in the presence of noise.
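To make the idea of pruning stored exemplars concrete, here is a minimal sketch of one greedy reduction rule for a 1-nearest-neighbour classifier. It is not DROP1-5, DEL, or any algorithm from the survey; the rule (drop an instance if every training point is still classified correctly without it), the names `nearest_label` and `prune`, and the toy data are assumptions for illustration only.

```python
def nearest_label(x, stored):
    """Label of the stored instance closest to x (squared Euclidean distance)."""
    return min(stored,
               key=lambda s: sum((a - b) ** 2 for a, b in zip(x, s[0])))[1]

def prune(training):
    """Greedily drop instances whose removal keeps all training points correct."""
    kept = list(training)
    for inst in training:
        if len(kept) == 1:          # always retain at least one exemplar
            break
        trial = [s for s in kept if s is not inst]
        if all(nearest_label(x, trial) == y for x, y in training):
            kept = trial            # removal did no harm, so make it permanent
    return kept

# Two well-separated clusters (toy data assumed for the example).
training = [((0.0, 0.0), "a"), ((0.1, 0.0), "a"), ((0.0, 0.1), "a"),
            ((1.0, 1.0), "b"), ((0.9, 1.0), "b"), ((1.0, 0.9), "b")]
kept = prune(training)
print(len(training), "->", len(kept))
```

On these two clusters a single exemplar per class is enough to classify every training point, so the stored set shrinks from six instances to two. A rule this simple takes no account of noisy instances, which the abstract identifies as an important case for the reduction algorithms it compares.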

