Results 1–10 of 12
The Power of Vacillation in Language Learning
, 1992
Abstract

Cited by 46 (13 self)
Some extensions are considered of Gold's influential model of language learning by machine from positive data. Studied are criteria of successful learning featuring convergence in the limit to vacillation between several alternative correct grammars. The main theorem of this paper is that there are classes of languages that can be learned if convergence in the limit to up to (n+1) exactly correct grammars is allowed but which cannot be learned if convergence in the limit is to no more than n grammars, where the no more than n grammars can each make finitely many mistakes. This contrasts sharply with results of Barzdin and Podnieks and, later, Case and Smith, for learnability from both positive and negative data. A subset principle from a 1980 paper of Angluin is extended to the vacillatory and other criteria of this paper. This principle provides a necessary condition for circumventing overgeneralization in learning from positive data. It is applied to prove another theorem to the eff...
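As a toy illustration of the Gold-style setting this abstract builds on (a sketch of the general idea, not the paper's own construction): a learner for the class of languages {0, ..., n} can identify each one in the limit from positive data by always conjecturing the subset-minimal consistent language, which is the flavor of overgeneralization-avoidance that Angluin's subset principle formalizes. All names below are illustrative.

```python
def learner(text_prefix):
    # Conjecture a "grammar" (here: the parameter n) for the least
    # language {0, ..., n} containing all positive data seen so far.
    # Guessing the subset-minimal consistent language is how a learner
    # from positive-only data avoids overgeneralizing.
    return max(text_prefix, default=0)

# On any text (positive-data presentation) for {0, 1, 2, 3},
# the conjectures converge in the limit to the correct grammar:
text = [1, 0, 3, 2, 3, 3, 1]
guesses = [learner(text[:i + 1]) for i in range(len(text))]
```

After the largest element has appeared, the conjecture never changes again, which is exactly syntactic convergence in the limit.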
Infinitary Self-Reference in Learning Theory
, 1994
Abstract

Cited by 19 (6 self)
Kleene's Second Recursion Theorem provides a means for transforming any program p into a program e(p) which first creates a quiescent self-copy and then runs p on that self-copy together with any externally given input. e(p), in effect, has complete (low-level) self-knowledge, and p represents how e(p) uses its self-knowledge (and its knowledge of the external world). Infinite regress is not required since e(p) creates its self-copy outside of itself. One mechanism to achieve this creation is a self-replication trick isomorphic to that employed by single-celled organisms. Another is for e(p) to look in a mirror to see which program it is. In 1974 the author published an infinitary generalization of Kleene's theorem which he called the Operator Recursion Theorem. It provides a means for obtaining an (algorithmically) growing collection of programs which, in effect, share a common (also growing) mirror from which they can obtain complete low-level models of themselves and the other prog...
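A rough sketch of the e(p) transformation described above, with Python closures standing in for program codes (the real theorem is about numberings of partial computable functions, so this is only an analogy). The names e and p follow the abstract; everything else is illustrative.

```python
def e(p):
    # Build a program that first obtains a self-copy and then runs p
    # on that self-copy together with the external input x. The
    # closure 'program' refers to itself, so no infinite regress is
    # needed: the self-copy is created outside the body of p.
    def program(x):
        return p(program, x)
    return program

# An example p that actually uses its self-knowledge: on input 0
# it calls its own self-copy on input 1.
def p(self_copy, x):
    return x + 1 if x > 0 else self_copy(1)

f = e(p)
```

Here f(0) invokes f's own code on 1, returning 2: p computes with a complete (low-level) handle on the very program e(p) that is running it.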
Vacillatory and BC Learning on Noisy Data
, 2007
Abstract

Cited by 7 (5 self)
The present work employs a model of noise introduced earlier by the third author. In this model noisy data nonetheless uniquely determines the true data: correct information occurs infinitely often while incorrect information occurs only finitely often. The present paper considers the effects of this form of noise on vacillatory and behaviorally correct learning of grammars — both from positive data alone and from informant (positive and negative data). For learning from informant, the noise, in effect, destroys negative data. Various noisy-data hierarchies are exhibited, which, in some cases, are known to collapse when there is no noise. Noisy behaviorally correct learning is shown to obey a very strong “subset principle”. It is shown, in many cases, how much power is needed to overcome the effects of noise. For example, the best we can do to simulate, in the presence of noise, the noise-free, no-mind-change cases takes infinitely many mind changes. One technical result is proved by a priority argument.
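A finite caricature of this noise model (purely illustrative, not the paper's formalism): correct data recurs forever, each incorrect datum is injected at most once, so the true data is recoverable in the limit as exactly what keeps recurring.

```python
import random
from collections import Counter

def noisy_text(true_language, noise_items, seed=0):
    # Noise model sketch: every correct datum recurs again and again,
    # while each incorrect datum is injected at most once -- so
    # incorrect information occurs only "finitely often".
    rng = random.Random(seed)
    pending_noise = list(noise_items)
    while True:
        if pending_noise and rng.random() < 0.2:
            yield pending_noise.pop()        # incorrect, finitely often
        else:
            yield rng.choice(true_language)  # correct, infinitely often

# Recover the true data from a long prefix: keep what recurs.
stream = noisy_text([1, 2, 3], [8, 9])
counts = Counter(next(stream) for _ in range(1000))
recovered = {x for x, c in counts.items() if c >= 2}
```

Because each noise item appears at most once, the "seen at least twice" filter converges to the true language on long enough prefixes, which is the sense in which noisy data still uniquely determines the true data.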
U-shaped learning may be necessary
Abstract

Cited by 7 (6 self)
U-shaped learning behaviour in cognitive development involves learning, unlearning and relearning. It occurs, for example, in learning irregular verbs. The prior cognitive science literature is occupied with how humans do it, for example, general rules versus tables of exceptions. This paper is mostly concerned with whether U-shaped learning behaviour may be necessary in the abstract mathematical setting of inductive inference, that is, in the computational learning theory following the framework of Gold. All notions considered are learning from text, that is, from positive data. Previous work showed that U-shaped learning behaviour is necessary for behaviourally correct learning but not for syntactically convergent learning in the limit (= explanatory learning). The present paper establishes the necessity for the whole hierarchy of classes of vacillatory learning, where a behaviourally correct learner has to satisfy the additional constraint that it vacillates in the limit between at most k grammars, where k ≥ 1. Non-U-shaped vacillatory learning is shown to be restrictive: every non-U-shaped vacillatorily learnable class is already learnable in the limit. Furthermore, if vacillatory learning with the parameter k = 2
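The irregular-verb example can be made concrete with a toy (entirely illustrative) three-stage simulation of the learn–unlearn–relearn pattern:

```python
def past_tense(verb, stage):
    # Stage 1: rote memorization (correct on irregulars already seen).
    # Stage 2: an overgeneral "-ed" rule -- the dip of the U.
    # Stage 3: the rule plus a table of exceptions (relearning).
    irregular = {"go": "went", "sing": "sang"}
    if stage == 1:
        return irregular.get(verb)
    if stage == 2:
        return verb + "ed"
    return irregular.get(verb, verb + "ed")

# Correct, then incorrect, then correct again: a U-shaped curve.
trajectory = [past_tense("go", s) for s in (1, 2, 3)]
```

The formal question the paper studies is whether such returns to abandoned correct hypotheses can ever be avoided without losing learning power.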
Vacillatory learning of nearly minimal size grammars
 Journal of Computer and System Sciences
, 1994
Abstract

Cited by 4 (4 self)
In Gold’s influential language learning paradigm a learning machine converges in the limit to one correct grammar. In an attempt to generalize Gold’s paradigm, Case considered the question whether people might converge to vacillating between up to (some integer) n > 1 distinct, but equivalent, correct grammars. He showed that larger classes of languages can be algorithmically learned (in the limit) by converging to up to n + 1 rather than up to n correct grammars. He also argued that, for “small” n > 1, it is plausible that people might sometimes converge to vacillating between up to n grammars. The insistence on small n was motivated by the consideration that, for “large” n, at least one of the n grammars would be too large to fit in people’s heads. Of course, even for Gold’s n = 1 case, the single grammar converged to in the limit may be infeasibly large. An interesting complexity restriction to make, then, on the final grammar(s) converged to in the limit is that they all have small size. In this paper we study some of the tradeoffs in learning power involved in making a well-defined version of this restriction. We show and exploit as a tool the desirable property that the learning power under our
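The "distinct but equivalent correct grammars of different sizes" idea can be seen in a tiny example (illustrative only, using regular expressions as stand-in grammars): two regexes denote the same language yet differ in size, so a learner vacillating between them is still correct, and only one of them is (nearly) minimal.

```python
import re

# Two extensionally equivalent "grammars" for the language of
# even-length strings of a's; the second is larger but equivalent.
small = re.compile(r"(aa)*")
large = re.compile(r"(aaaa)*(aa)?")

def same_language_up_to(n):
    # Equivalence of grammars is undecidable in general, so a
    # learner cannot simply normalize; here we merely sample.
    return all(
        bool(small.fullmatch("a" * k)) == bool(large.fullmatch("a" * k))
        for k in range(n)
    )
```

A vacillatory learner may drift among such equivalent grammars forever; the size restriction studied in the paper asks that all of them stay small.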
Strong separation of learning classes
 Journal of Experimental and Theoretical Artificial Intelligence
, 1992
Abstract

Cited by 2 (2 self)
Suppose LC1 and LC2 are two machine learning classes, each based on a criterion of success. Suppose, for every machine which learns a class of functions according to the LC1 criterion of success, there is a machine which learns this class according to the LC2 criterion. In the case where the converse does not hold, LC1 is said to be separated from LC2. It is shown that for many such separated learning classes from the literature a much stronger separation holds: (∀C ∈ LC1)(∃C′ ∈ LC2 − LC1)[C′ ⊃ C]. It is also shown that there is a pair of separated learning classes from the literature for which the stronger separation just above does not hold. A philosophical heuristic toward the design of artificially intelligent learning programs is presented with each strong separation result.
Counting Extensional Differences in BC-Learning
 Proceedings of the 5th International Colloquium on Grammatical Inference (ICGI 2000), Springer Lecture Notes in A.I. 1891
, 2000
Abstract

Cited by 1 (0 self)
Let BC be the model of behaviourally correct function learning as introduced by Barzdins [4] and Case and Smith [8]. We introduce a mind change hierarchy for BC, counting the number of extensional differences in the hypotheses of a learner. We compare the resulting models BC_n to models from the literature and discuss confidence, team learning, and finitely defective hypotheses. Among other things, we prove that there is a tradeoff between the number of semantic mind changes and the number of anomalies in the hypotheses. We also discuss consequences for language learning. In particular we show that, in contrast to the case of function learning, the family of classes that are confidently BC-learnable from text is not closed under finite unions. Keywords: models of grammar induction, inductive inference, behaviourally correct learning.
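The counting notion above can be sketched in a few lines (a toy, not the paper's definition): a semantic mind change is a position where consecutive hypotheses differ extensionally. Since extensional equality of programs is undecidable, the sketch compares hypotheses only on a finite sample domain.

```python
def semantic_mind_changes(hypotheses, domain):
    # Count extensional differences between consecutive hypotheses,
    # checking only on a finite sample domain -- true extensional
    # equality of programs is undecidable, so this is an approximation.
    changes = 0
    for h, g in zip(hypotheses, hypotheses[1:]):
        if any(h(x) != g(x) for x in domain):
            changes += 1
    return changes
```

A BC_n learner, in this spirit, is one whose hypothesis stream incurs at most n such extensional differences, however many syntactic rewrites it makes.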
Counting Extensional Differences in BC-Learning
University of Heidelberg; Sebastiaan A. Terwijn, Vrije Universiteit Amsterdam
Anomalous Learning Helps Succinctness
Abstract
It is shown that allowing a bounded number of anomalies (mistakes) in the final programs learned by an algorithmic procedure can considerably “succinctify” those final programs. Naturally, only those contexts are investigated in which the presence of anomalies is not actually required for successful inference (learning). The contexts considered are certain infinite subclasses of the class of characteristic functions of finite sets. For each finite set D, these subclasses have a finite set containing D. The latter prevents the anomalies from wiping out all the information in the sets featured in these subclasses and shows the context to be fairly robust. Some of the results in the present paper are shown to be provably more constructive than others. The results of this paper can also be interpreted as facts about succinctness of coding finite sets, which facts have interesting consequences for learnability of decision procedures for finite sets.
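The coding interpretation in the last sentence can be caricatured in a few lines (all names illustrative): if the learned program for the characteristic function of a finite set D is allowed up to a anomalies, it can simply omit a elements of D, so the stored code is strictly shorter.

```python
def encode(D, anomalies=0):
    # A toy "program" for the characteristic function of finite set D:
    # the sorted list of its elements. Permitting up to `anomalies`
    # mistakes lets us drop that many elements, shrinking the code.
    elems = sorted(D)
    return elems[:max(0, len(elems) - anomalies)]

def characteristic(code):
    return lambda x: x in code

D = {2, 3, 5, 7}
exact = characteristic(encode(D))        # no mistakes, code size 4
loose = characteristic(encode(D, 2))     # <= 2 mistakes, code size 2
```

The dropped elements are exactly the anomalies: `loose` errs on at most 2 points of D and nowhere else, trading a bounded number of mistakes for a smaller final program.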