The Power of Vacillation in Language Learning
1992
Cited by 46 (13 self)
Some extensions of Gold's influential model of language learning by machine from positive data are considered. Studied are criteria of successful learning featuring convergence in the limit to vacillation between several alternative correct grammars. The main theorem of this paper is that there are classes of languages that can be learned if convergence in the limit to up to (n+1) exactly correct grammars is allowed but which cannot be learned if convergence in the limit is to no more than n grammars, where the no more than n grammars can each make finitely many mistakes. This contrasts sharply with results of Barzdin and Podnieks and, later, Case and Smith, for learnability from both positive and negative data. A subset principle from a 1980 paper of Angluin is extended to the vacillatory and other criteria of this paper. This principle provides a necessary condition for circumventing overgeneralization in learning from positive data. It is applied to prove another theorem to the eff...
Ordinal Mind Change Complexity of Language Identification
Cited by 19 (5 self)
The approach of ordinal mind change complexity, introduced by Freivalds and Smith, uses (notations for) constructive ordinals to bound the number of mind changes made by a learning machine. This approach provides a measure of the extent to which a learning machine has to keep revising its estimate of the number of mind changes it will make before converging to a correct hypothesis for languages in the class being learned. Recently, this notion, which also yields a measure for the difficulty of learning a class of languages, has been used to analyze the learnability of rich concept classes. The present paper further investigates the utility of ordinal mind change complexity. It is shown that for identification from both positive and negative data and n ≥ 1, the ordinal mind change complexity of the class of languages formed by unions of up to n + 1 pattern languages is only ω ×_O notn(n) (where notn(n) is a notation for n, ω is a notation for the least limit ordinal, and ×_O represents ordinal multiplication). This result nicely extends an observation of Lange and Zeugmann
Complexity issues for vacillatory function identification
Information and Computation, 1995
Cited by 12 (10 self)
It was previously shown by Barzdin and Podnieks that one does not increase the power of learning programs for functions by allowing learning algorithms to converge to a finite set of correct programs instead of requiring them to converge to a single correct program. In this paper we define some new, subtle, but natural concepts of mind change complexity for function learning and show that, if one bounds this complexity for learning algorithms, then, by contrast with Barzdin and Podnieks' result, there are interesting and sometimes complicated tradeoffs between these complexity bounds, bounds on the number of final correct programs, and learning power. CR Classification Number: I.2.6 (Learning – Induction).
On Aggregating Teams of Learning Machines
Theoretical Computer Science A, 1994
Cited by 9 (4 self)
The present paper studies the problem of when a team of learning machines can be aggregated into a single learning machine without any loss in learning power. The main results concern aggregation ratios for vacillatory identification of languages from texts. For a positive integer n, a machine is said to TxtFex_n-identify a language L just in case the machine converges to up to n grammars for L on any text for L. For such identification criteria, the aggregation ratio is derived for the n = 2 case. It is shown that the collections of languages that can be TxtFex_2-identified by teams with success ratio greater than 5/6 are the same as those collections of languages that can be TxtFex_2-identified by a single machine. It is also established that 5/6 is indeed the cutoff point by showing that there are collections of languages that can be TxtFex_2-identified by a team employing 6 machines, at least 5 of which are required to be successful, but cannot be TxtFex_2-identified by any single machine. Additionally, aggregation ratios are derived for finite identification of languages from positive data and for numerous criteria involving language learning from both positive and negative data.
Training Sequences
Cited by 9 (1 self)
This paper initiates a study in which it is demonstrated that certain concepts (represented by functions) can be learned, but only in the event that certain relevant subconcepts (also represented by functions) have been previously learned. In other words, the Soar project presents empirical evidence that learning how to learn is viable for computers, and this paper proves that doing so is the only way possible for computers to make certain inferences.
Synthesizing noise-tolerant language learners
Theoretical Computer Science A, 1997
Cited by 7 (3 self)
An index for an r.e. class of languages (by definition) generates a sequence of grammars defining the class. An index for an indexed family of languages (by definition) generates a sequence of decision procedures defining the family. F. Stephan’s model of noisy data is employed, in which, roughly, correct data crops up infinitely often, and incorrect data only finitely often. Studied, then, is the synthesis from indices for r.e. classes and for indexed families of languages of various kinds of noise-tolerant language learners for the corresponding classes or families indexed. Many positive results, as well as some negative results, are presented regarding the existence of such synthesizers. The proofs of most of the positive results yield, as pleasant corollaries, strict subset-principle or tell-tale style characterizations for the noise-tolerant learnability of the corresponding classes or families indexed.
Vacillatory and BC Learning on Noisy Data
2007
Cited by 7 (5 self)
The present work employs a model of noise introduced earlier by the third author. In this model noisy data nonetheless uniquely determines the true data: correct information occurs infinitely often while incorrect information occurs only finitely often. The present paper considers the effects of this form of noise on vacillatory and behaviorally correct learning of grammars, both from positive data alone and from informant (positive and negative data). For learning from informant, the noise, in effect, destroys negative data. Various noisy-data hierarchies are exhibited, which, in some cases, are known to collapse when there is no noise. Noisy behaviorally correct learning is shown to obey a very strong “subset principle”. It is shown, in many cases, how much power is needed to overcome the effects of noise. For example, the best we can do to simulate, in the presence of noise, the noise-free, no-mind-change cases takes infinitely many mind changes. One technical result is proved by a priority argument.
Synthesizing Learners Tolerating Computable Noisy Data
In Proc. 9th International Workshop on Algorithmic Learning Theory, Lecture, 1998
Cited by 6 (0 self)
An index for an r.e. class of languages (by definition) generates a sequence of grammars defining the class. An index for an indexed family of languages (by definition) generates a sequence of decision procedures defining the family. F. Stephan's model of noisy data is employed, in which, roughly, correct data crops up infinitely often, and incorrect data only finitely often. In a completely computable universe, all data sequences, even noisy ones, are computable. New to the present paper is the restriction that noisy data sequences be, nonetheless, computable! Studied, then, is the synthesis from indices for r.e. classes and for indexed families of languages of various kinds of noise-tolerant language learners for the corresponding classes or families indexed, where the noisy input data sequences are restricted to being computable. Many positive results, as well as some negative results, are presented regarding the existence of such synthesizers. The main positive result is surpris...
Vacillatory learning of nearly minimal size grammars
Journal of Computer and System Sciences, 1994
Cited by 4 (4 self)
In Gold’s influential language learning paradigm a learning machine converges in the limit to one correct grammar. In an attempt to generalize Gold’s paradigm, Case considered the question whether people might converge to vacillating between up to (some integer) n > 1 distinct, but equivalent, correct grammars. He showed that larger classes of languages can be algorithmically learned (in the limit) by converging to up to n + 1 rather than up to n correct grammars. He also argued that, for “small” n > 1, it is plausible that people might sometimes converge to vacillating between up to n grammars. The insistence on small n was motivated by the consideration that, for “large” n, at least one of n grammars would be too large to fit in people’s heads. Of course, even for Gold’s n = 1 case, the single grammar converged to in the limit may be infeasibly large. An interesting complexity restriction to make, then, on the final grammar(s) converged to in the limit is that they all have small size. In this paper we study some of the tradeoffs in learning power involved in making a well-defined version of this restriction. We show and exploit as a tool the desirable property that the learning power under our