## The Power of Vacillation in Language Learning (1992)

### Download Links

- [ftp.fwi.uva.nl]
- [epubs.siam.org:80]
- [www.cis.udel.edu]
- DBLP

Citations: 44 (11 self)

### BibTeX

```bibtex
@MISC{Case92thepower,
  author = {John Case},
  title  = {The Power of Vacillation in Language Learning},
  year   = {1992}
}
```

### Abstract

Some extensions are considered of Gold's influential model of language learning by machine from positive data. Studied are criteria of successful learning featuring convergence in the limit to vacillation between several alternative correct grammars. The main theorem of this paper is that there are classes of languages that can be learned if convergence in the limit to up to (n+1) exactly correct grammars is allowed, but which cannot be learned if convergence in the limit is to no more than n grammars, even when each of those n grammars is allowed finitely many mistakes. This contrasts sharply with results of Barzdin and Podnieks and, later, Case and Smith, for learnability from both positive and negative data. A subset principle from a 1980 paper of Angluin is extended to the vacillatory and other criteria of this paper. This principle provides a necessary condition for circumventing overgeneralization in learning from positive data. It is applied to prove another theorem to the eff...
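The separation claimed in the abstract can be written compactly in the paper's criterion notation as it appears in the citation contexts below, where TxtFex^a_b denotes convergence in the limit to at most b grammars, each allowed at most a anomalies (a = * meaning finitely many, a = 0 meaning none); this reading of the super- and subscripts is inferred from those contexts rather than quoted from the paper:

```latex
% Main theorem (sketch of the usual statement): for every n >= 1
% some class of languages is learnable with up to n+1 exactly
% correct final grammars, but not with up to n final grammars,
% even if each of those n grammars may make finitely many mistakes.
\[
  (\forall n \ge 1)\quad
  \mathrm{TxtFex}^{0}_{\,n+1} \setminus \mathrm{TxtFex}^{*}_{\,n} \neq \emptyset .
\]
```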

### Citations

4100 | Introduction to Automata Theory, Languages and Computation - Hopcroft, Ullman - 1979
Citation Context: ...orresponding conjectures of M converge, in the limit, to some fixed final conjecture, and that final conjecture is correct. 2 Gold showed that no M so identifies the entire class of regular languages [48], but some M does identify the class of finite languages. Angluin [1, 2] presents other classes L natural from the perspective of formal language theory such that some M identifies each language in L...

941 | Language identification in the limit - Gold - 1967
Citation Context: ...learning theory, inductive inference, language learning, recursion theory, topology. AMS subject classifications. 68Q, 68T05, 68S05, 03D, 92J40, 54E51, D0944. PII. S0097539793249694. 1. Introduction. In [46] Gold introduced his seminal model of language learning: Imagine, as pictured in (1.1) just below, a machine M being fed data about membership in a (formal) language L and, as a result, outputting ove...

544 | Set Theory - Jech - 1978
Citation Context: ...ell known [82] that equivalent grammars (e.g., p_t, p_{t+1}, p_{t+2}, ... as above) can be so different from one another that in some cases it is not possible to prove in Zermelo–Fraenkel set theory [49] that they are equivalent. This suggests that a suitably clever M might be able to TxtFex_{n+1}-identify a larger class of languages than any machine, however clever, could TxtFex_n-identify. Unfortun...

524 | The Emperor's New Mind - Penrose - 1989
Citation Context: ...fication. We originally suggested in [16] on the basis of our main corollary (Corollary 3.7 to Theorem 3.3) that Gold's model be extended to embrace the success criteria TxtFex^a_b. 29 Sources such as [73, 74], sadly, seem to have overlooked the important result in [32] that the expected input/output behavior of a Turing machine with random oracle subject to a computable probability distribution is computa...

500 | Recursively enumerable sets and degrees - Soare - 1978
Citation Context: ...n are modifications and/or simplifications of the proof of Theorem 5.5. 19 In fact, more importantly, since this set is Π⁰₂-complete [82], it is not even algorithmically recognizable by a limiting [88] or mind-changing procedure (but its complement is). 20 One can think of them as global and the set-driven notions as local. 21 In the present paper we do not consider the restriction to so-called pru...

426 | Simulating physics with computers - Feynman - 1982

367 | Shadows of the Mind - Penrose - 1994
Citation Context: ...fication. We originally suggested in [16] on the basis of our main corollary (Corollary 3.7 to Theorem 3.3) that Gold's model be extended to embrace the success criteria TxtFex^a_b. 29 Sources such as [73, 74], sadly, seem to have overlooked the important result in [32] that the expected input/output behavior of a Turing machine with random oracle subject to a computable probability distribution is computa...

337 | Introduction to Mathematical Logic - Mendelson - 1979
Citation Context: ... hand, we prove it by self/other reference, and it is common to regard self-referential examples as unnatural. For example, Gödel proved his famous incompleteness theorem by a self-reference argument [45, 62], and his self-referential sentence providing an unprovable truth of, for example, First Order Peano Arithmetic (FOPA) is not natural: no number or combinatorial theorist would care whether it was tr...

304 | Inductive inference of formal languages from positive data - Angluin - 1980
Citation Context: ...nal conjecture, and that final conjecture is correct. 2 Gold showed that no M so identifies the entire class of regular languages [48], but some M does identify the class of finite languages. Angluin [1, 2] presents other classes L natural from the perspective of formal language theory such that some M identifies each language in L. Many cognitive scientists seek to model all of cognition by computer pr...

267 | Toward a mathematical theory of inductive inference - Blum, Blum - 1975
Citation Context: ...s to learn larger classes of languages than any machine required to succeed on arbitrary texts, but Corollary 3.1 below implies that this is not the case (generalizing the b = 1 case essentially from [96, 3]). We say, then, that the restriction to recursive texts is circumvented. Angluin, in her seminal paper [1], presents a severe constraint on TxtFex_1-identification of classes of languages: the subs...

224 | Systems That Learn: An Introduction to Learning Theory - Osherson, Stob, et al. - 1986
Citation Context: ...y machine has been very influential in contemporary theories of natural language and in mathematical work explicitly motivated by its possible connection to human language learning (see, for example, [76, 93, 94, 66, 68, 8, 44, 15, 69, 70, 38, 39, 53, 5]). In the present paper we consider some new criteria of success extending Gold's basic model above. Suppose that we fix an integer n > 0. Consider the following criterion of success (again based on (...

218 | Computational complexity of probabilistic Turing machines - Gill - 1977
Citation Context: ...15, 29]. In a discrete, random universe with only computable probability distributions for its behavior (e.g., a discrete, quantum mechanical universe), the expected behavior will still be computable [32, 42, 43]. 29 In such a universe any beings (e.g., humans) who have cognition, including language learning and scientific induction, will be subject to the constraint that at least their expected behavior will...

210 | Computation and Cognition: Toward a Foundation for Cognitive Science - Pylyshyn - 1984
Citation Context: ...nts other classes L natural from the perspective of formal language theory such that some M identifies each language in L. Many cognitive scientists seek to model all of cognition by computer program [77, 50], and Gold's model of language learning from text (positive information) by machine has been very influential in contemporary theories of natural language and in mathematical work explicitly motivated...

197 | Cellular Automata Machines - Toffoli, Margolus - 1987
Citation Context: ...e models of learning on computable data sequences. The whole universe or humanly significant portions of it may be computable and/or discrete. Such possibilities are taken seriously, for example, in [99, 92, 91, 34, 17, 15, 29]. In a discrete, random universe with only computable probability distributions for its behavior (e.g., a discrete, quantum mechanical universe), the expected behavior will still be computable [32, 42...

193 | Formal principles of language acquisition - Wexler, Culicover - 1980
Citation Context: ...y machine has been very influential in contemporary theories of natural language and in mathematical work explicitly motivated by its possible connection to human language learning (see, for example, [76, 93, 94, 66, 68, 8, 44, 15, 69, 70, 38, 39, 53, 5]). In the present paper we consider some new criteria of success extending Gold's basic model above. Suppose that we fix an integer n > 0. Consider the following criterion of success (again based on (...

175 | The Acquisition of Syntactic Knowledge - Berwick - 1985
Citation Context: ...y machine has been very influential in contemporary theories of natural language and in mathematical work explicitly motivated by its possible connection to human language learning (see, for example, [76, 93, 94, 66, 68, 8, 44, 15, 69, 70, 38, 39, 53, 5]). In the present paper we consider some new criteria of success extending Gold's basic model above. Suppose that we fix an integer n > 0. Consider the following criterion of success (again based on (...

171 | Inference of reversible languages - Angluin - 1982
Citation Context: ...ontains a finite variant of each r.e. language. Wiehagen's class is obviously quite hefty. Angluin presents examples natural from the perspective of formal language theory that also are in TxtFex^0_1 [1, 2]. All these classes in TxtFex^0_1 (of course) satisfy the subset principle (of Theorem 4.4), and, in particular, they are not closed under finite sublanguages as is the class of regular languages. Sup...

163 | Comparison of identification criteria for machine inductive inference - Case, Smith - Theoretical Computer Science 25, 1983
Citation Context: ... for languages where a single final grammar is allowed to have a bounded number of mistakes (anomalies). The mistakes are about which objects are (and which are not) in the corresponding language. In [30, 31, 15] there are discussion, motivation, and interpretation of results about inferring anomalous programs for functions. The results in [30, 31, 15] and in this paper show that allowing anomalies increases ...

149 | Naive Set Theory - Halmos - 1960

103 | An introduction to the general theory of algorithms - Machtey, Young - 1978
Citation Context: ...2. Preliminaries. We now proceed more formally. N denotes the set of natural numbers, {0, 1, 2, ...}. φ denotes a fixed acceptable programming system for the partial computable functions: N → N [81, 65, 79, 80, 83]. φ_p denotes the partial computable function computed by the program (with code number) p in the φ-system. 3 Thanks to the device of Gödel or code numbering [82] we can treat languages over any finit...

92 | Machine inductive inference and language identification - Case, Lynes - 1982
Citation Context: ...less, as seems unlikely, human storage mechanisms admit infinite regress. Osherson and Weinstein [71] introduced the case where the number of final grammars is finite but unbounded, and independently [28, 71] (see also [72]) introduced the case where the number of final grammars is infinite (TxtBc-identification). We briefly introduced the case, discussed above, of up to n final grammars in [15]. The proo...

89 | Derivational complexity and the order of acquisition in child speech - Brown, Hanlon - 1970
Citation Context: ...hild language learning, the case of data presentation by arbitrary text, where M receives all and only positive information about L. Justification for this point of view can be found, for example, in [9, 13], where it is noted from field work that children don't need corrections to learn language. Regarding successful language learning, referring to (1.1) above: for Gold, machine M identifies language L ...

89 | Formal models of language learning - Pinker - 1979

87 | Theory of recursive functions and effective computability - Rogers - McGraw-Hill, 1967
Citation Context: ...ns at most n distinct grammars, and 2. each of the grammars p_t, p_{t+1}, p_{t+2}, ... is correct. Of course, Gold's identification criterion above is just TxtFex_1-identification. It is well known [82] that equivalent grammars (e.g., p_t, p_{t+1}, p_{t+2}, ... as above) can be so different from one another that in some cases it is not possible to prove in Zermelo–Fraenkel set theory [49] that they...

85 | The Computer and the Mind: An Introduction to Cognitive Science - Johnson-Laird - 1988
Citation Context: ...nts other classes L natural from the perspective of formal language theory such that some M identifies each language in L. Many cognitive scientists seek to model all of cognition by computer program [77, 50], and Gold's model of language learning from text (positive information) by machine has been very influential in contemporary theories of natural language and in mathematical work explicitly motivated...

82 | Initial knowledge: six suggestions - Spelke - 1994
Citation Context: ...neral purpose learning is not possible, and 2. alleged human general purpose learning is an illusion brought about by our having innate information stored for a large and varied collection of domains [41, 89]. In the practical context of robot planning, McDermott [60] says, "Learning makes the most sense when it is thought of as filling in the details in an algorithm that is already nearly ...

78 | A mathematical incompleteness in Peano arithmetic - Paris, Harrington - 1977
Citation Context: ...truth may be broad enough to admit natural examples. Empirical: Although Gödel proved his famous first incompleteness theorem by a self-reference argument, many years afterwards, Paris and Harrington [75] and later Friedman [84, 85] found quite natural examples of combinatorial truths of first order arithmetic not provable in FOPA. 11 In fairness, regarding the above informal thesis, we note, for exam...

76 | A machine independent theory of the complexity of recursive functions - Blum
Citation Context: ...uite natural examples of combinatorial truths of first order arithmetic not provable in FOPA. 11 In fairness, regarding the above informal thesis, we note, for example, that the Blum speed-up theorem [10] was originally proved by a self-reference argument, 12 but natural witnesses to even exponential speed-up have not (yet) been found. However, even the self-reference proofs of this result are fairly ...

72 | Theory of formal systems - Smullyan - 1961
Citation Context: ...the self-referential character of L_{n+1}, we employ the (n+1)-ary recursion theorem, a folk theorem generalizing the Kleene recursion theorem [82, p. 214] and the Smullyan double recursion theorem [87]; it is also a consequence of our operator recursion theorem [14], an infinitary analogue of the finitary recursion theorems. Intuitively, the (n+1)-ary recursion theorem provides a means for transf...

59 | Prudence and other conditions on formal language learning - Fulk - 1990
Citation Context: ...its complement is). 20 One can think of them as global and the set-driven notions as local. 21 In the present paper we do not consider the restriction to so-called prudence [70], a primary concern of [39]. Prudent learning functions are those which never conjecture a grammar p without being able to learn W_p. On that subject the interested reader may also wish to consult [51, 55]...

53 | Robot planning - McDermott - 1991
Citation Context: ...eral purpose learning is an illusion brought about by our having innate information stored for a large and varied collection of domains [41, 89]. In the practical context of robot planning, McDermott [60] says, "Learning makes the most sense when it is thought of as filling in the details in an algorithm that is already nearly right." In the context of function learning, [27] provides s...

51 | Periodicity in generations of automata - Case - 1974
Citation Context: ...ary recursion theorem, a folk theorem generalizing the Kleene recursion theorem [82, p. 214] and the Smullyan double recursion theorem [87]; it is also a consequence of our operator recursion theorem [14], an infinitary analogue of the finitary recursion theorems. Intuitively, the (n+1)-ary recursion theorem provides a means for transforming any sequence of n+1 programs p_0, ..., p_n into a cor...

50 | Gödel numberings of partial recursive functions - Rogers - 1958
Citation Context: ...2. Preliminaries. We now proceed more formally. N denotes the set of natural numbers, {0, 1, 2, ...}. φ denotes a fixed acceptable programming system for the partial computable functions: N → N [81, 65, 79, 80, 83]. φ_p denotes the partial computable function computed by the program (with code number) p in the φ-system. 3 Thanks to the device of Gödel or code numbering [82] we can treat languages over any finit...

46 | Computability by probabilistic machines - de Leeuw, Moore, et al. - 1956

42 | Lessons from animal learning for the study of cognitive development - Gallistel, Brown, et al. - 1991
Citation Context: ...neral purpose learning is not possible, and 2. alleged human general purpose learning is an illusion brought about by our having innate information stored for a large and varied collection of domains [41, 89]. In the practical context of robot planning, McDermott [60] says, "Learning makes the most sense when it is thought of as filling in the details in an algorithm that is already nearly ...

38 | Nonprovability of certain combinatorial properties of finite trees - Simpson - 1985
Citation Context: ...h to admit natural examples. Empirical: Although Gödel proved his famous first incompleteness theorem by a self-reference argument, many years afterwards, Paris and Harrington [75] and later Friedman [84, 85] found quite natural examples of combinatorial truths of first order arithmetic not provable in FOPA. 11 In fairness, regarding the above informal thesis, we note, for example, that the Blum speed-up ...

35 | Criteria of language learning - Osherson, Weinstein - 1982
Citation Context: ...stake and one final grammar which cannot be learned with no mistakes and up to two final grammars. Corollary 3.8 (Case and Lynes [28]). TxtFex^0_1 ⊂ TxtFex^1_1 ⊂ ··· ⊂ TxtFex^*_1. Osherson and Weinstein [71] independently showed the case of TxtFex^0_1 ⊂ TxtFex^*_1 from the previous corollary. Corollary 3.9 (Osherson and Weinstein [71]). TxtFex^0_* ⊂ TxtFex^*_*. Next we spell out the connections between ...

32 | On the complexity of inductive inference - Daley, Smith - 1986
Citation Context: ...ication, the restricted variant of TxtFex^a_b-identification which requires that final programs/grammars be nearly minimal size. For language learning, bounding complexity of learning machines as in [33] or [24] remains to be explored. Translating relative solvability results into relative feasibility results, as in [97], would be very interesting to pursue in the context of the present paper. In sec...

31 | A note on formal learning theory - Osherson, Weinstein - 1982
Citation Context: ...kely, human storage mechanisms admit infinite regress. Osherson and Weinstein [71] introduced the case where the number of final grammars is finite but unbounded, and independently [28, 71] (see also [72]) introduced the case where the number of final grammars is infinite (TxtBc-identification). We briefly introduced the case, discussed above, of up to n final grammars in [15]. The proof of Theorem 3....

30 | The power of vacillation - Case - 1988
Citation Context: ...3 makes use of Theorem 5.6. We believe it is not possible to replace weak b-ary order independence with b-ary order independence in Theorems 5.5 and 5.6, contrary to our slightly overzealous claims in [16]. However, we have the following result (Theorem 5.7) with Fulk (who is not responsible for the possibly incorrect claims in [16]). Theorem 5.7 implies that learning power (with respect to TxtFex^a_b ...

24 | On the size of machines - Blum - 1967
Citation Context: ...e. Mathematically, TxtMfex^a_b-identification is well behaved, e.g., it turns out not to depend on the choice of acceptable system; it also does not depend on the choice of Blum program size measure [11] (by his recursive-relatedness result in [11]). The lack of dependence on the choice of acceptable system is in contrast with the variant of TxtMfex^0_1-identification in which we require h to be the...

24 | On two types of models of the internalization of grammars - Braine - 1971

23 | A study of inductive inference machines - Fulk - 1985

23 | Robust separations in inductive inference - Fulk - 1990
Citation Context: ...especially hard to find. For some theoretical work instigated by Barzdin and dealing, in part, with eliminating dependence on self-referential examples, see Fulk's work on robust function learning in [40]. Corollary 3.4. (∀a)[TxtFex^a_1 ⊂ TxtFex^a_2 ⊂ ··· ⊂ TxtFex^a_*]. Corollary 3.5 (Osherson and Weinstein [71]). TxtFex^0_1 ⊂ TxtFex^0_*. We announced in [15] that we could prove TxtFex^0_1 ⊂ TxtFex^0_2 ...

21 | Learning theory and natural language - Osherson, Stob, et al. - 1984

21 | A Recursive Introduction to the Theory of Computation - Smith - 1994
Citation Context: ...rollary 3.4 is very sensitive to minor perturbations. We should men... 11 See [78] for an example from complexity theory. 12 See also Young's version in [98] and our operator recursion theorem variant in [86]. 13 Just output every nth grammar. 14 For computable functions f, one can think of input x as coding a scientific experiment and the output f(x) as coding the corresponding experimental result. In t...

20 | Tradeoffs in inductive inference of nearly minimal sized programs - Chen - 1982
Citation Context: ...ve text [82]. 7 See further discussion in section 3 below. ...size programs began with [35] in the context of learning programs for functions (see also [52, 19, 20, 37]). Definition 2.10. TxtMfex^a_b = {L | (∃F)[F TxtMfex^a_b-identifies L]}. Similarly, we may define RecTxtMfex^a_b-identification and RecTxtMfex^a_b as TxtMfex^a_b-identification and TxtMfex^a_b, r...

20 | Characterizations of monotonic and dual monotonic language learning - Zeugmann, Lange, et al. - 1995
Citation Context: ...ly decidable classes of recursive languages in TxtFex^0_1. Reference [6] also provides a related characterization of the uniformly decidable classes of recursive languages in TxtFex^*_1. References [63, 57] contain characterizations of uniformly decidable classes of recursive languages in important special cases of TxtFex^0_1, and [26] contains characterizations of language learning with noisy texts. 1...

20 | A Connotational Theory of Program Structure - Royer
Citation Context: ...y order independent, and 2. (∀ r.e. L)[F RecTxtFex^a_b-identifies L ⇒ F′ TxtFex^a_b-identifies L]. Proof. Suppose that pad is a one-to-one computable function such that (∀n, p)[W_{pad(p,n)} = W_p] [65, 83]. Intuitively, pad(p, 0), pad(p, 1), pad(p, 2), ... are just padded variants of program p which have the same recognizing behavior as p but which differ from one another syntactically. Suppose F and ...

19 |
Parameters, Binding Theory, and learnability
- Manzini, Wexler
- 1987
(Show Context)
Citation Context ...alization in learning languages from positive data. See [54, 95] for discussion regarding the possible connection between this subset principle and a more traditionally linguistically oriented one in =-=[64]-=-. We let 2# def = #. Theorem 4.4. Suppose I # { Fex a b , Bc a } and F TxtI-identifies L. Then (#D finite # L)(#L # # L | D # L # # L # #= 2a L)[F does not TxtI-identify L # ]. (4.3) It would be inter... |