Results 1-10 of 13
The Power of Vacillation in Language Learning
1992
Cited by 44 (11 self)
Some extensions are considered of Gold's influential model of language learning by machine from positive data. Studied are criteria of successful learning featuring convergence in the limit to vacillation between several alternative correct grammars. The main theorem of this paper is that there are classes of languages that can be learned if convergence in the limit to up to (n+1) exactly correct grammars is allowed but which cannot be learned if convergence in the limit is to no more than n grammars, where the no more than n grammars can each make finitely many mistakes. This contrasts sharply with results of Barzdin and Podnieks and, later, Case and Smith, for learnability from both positive and negative data. A subset principle from a 1980 paper of Angluin is extended to the vacillatory and other criteria of this paper. This principle provides a necessary condition for circumventing overgeneralization in learning from positive data. It is applied to prove another theorem to the eff...
Computational Limits on Team Identification of Languages
1993
Cited by 17 (7 self)
A team of learning machines is essentially a multiset of learning machines.
Synthesizing Enumeration Techniques For Language Learning
In Proceedings of the Ninth Annual Conference on Computational Learning Theory, 1996
Cited by 16 (7 self)
... this paper we assume, without loss of generality, that for all σ ⊆ τ, [M(σ) ≠ ?] ⇒ [M(τ) ≠ ?].
The synthesis of language learners
Information and Computation, 1999
Cited by 12 (0 self)
An index for an r.e. class of languages (by definition) is a procedure which generates a sequence of grammars defining the class. An index for an indexed family of languages (by definition) is a procedure which generates a sequence of decision procedures defining the family. Studied is the metaproblem of synthesizing from indices for r.e. classes and for indexed families of languages various kinds of language learners for the corresponding classes or families indexed. Many positive results, as well as some negative results, are presented regarding the existence of such synthesizers. The negative results essentially provide lower bounds for the positive results. The proofs of some of the positive results yield, as pleasant corollaries, subset-principle or tell-tale style characterizations for the learnability of the corresponding classes or families indexed. For example, the indexed families of recursive languages that can be behaviorally correctly identified from positive data are surprisingly characterized by Angluin’s (1980b) Condition 2 (the subset principle for circumventing overgeneralization).
On learning limiting programs
International Journal of Foundations of Computer Science, 1992
Cited by 11 (5 self)
Machine learning of limit programs (i.e., programs allowed finitely many mind changes about their legitimate outputs) for computable functions is studied. Learning of iterated limit programs is also studied. To partially motivate these studies, it is shown that, in some cases, interesting global properties of computable functions can be proved from suitable (n + 1)-iterated limit programs for them which cannot be proved from any n-iterated limit programs for them. It is shown that learning power is increased when (n + 1)-iterated limit programs rather than n-iterated limit programs are to be learned. Many tradeoff results are obtained regarding learning power, number (possibly zero) of limits taken, program size constraints and information, and number of errors tolerated in final programs learned.
Learning in the presence of inaccurate information
In Proceedings of the 2nd Annual ACM Conference on Computational Learning Theory, 1989
Cited by 9 (3 self)
The present paper considers the effects of introducing inaccuracies in a learner’s environment in Gold’s learning model of identification in the limit. Three kinds of inaccuracies are considered: presence of spurious data is modeled as learning from a noisy environment, missing data is modeled as learning from an incomplete environment, and the presence of a mixture of both spurious and missing data is modeled as learning from an imperfect environment. Two learning domains are considered, namely, identification of programs from graphs of computable functions and identification of grammars from positive data about recursively enumerable languages. Many hierarchies and tradeoffs resulting from the interplay between the number of errors allowed in the final hypotheses, the number of inaccuracies in the data, the types of inaccuracies, and the type of success criteria are derived. An interesting result is that in the context of function learning, incomplete data is strictly worse for learning than noisy data.
Learning from Multiple Sources of Inaccurate Data
In Proceedings of the International Workshop on Analogical and Inductive Inference in Dagstuhl, 1992
Cited by 5 (2 self)
Most theoretical models of inductive inference make the idealized assumption that the data available to a learner is from a single and accurate source. The subject of inaccuracies in data emanating from a single source has been addressed by several authors. The present paper argues in favor of a more realistic learning model in which data emanates from multiple sources, some or all of which may be inaccurate. Three kinds of inaccuracies are considered: spurious data (modeled as noisy texts), missing data (modeled as incomplete texts), and a mixture of spurious and missing data (modeled as imperfect texts). Motivated by the above argument, the present paper introduces and theoretically analyzes a number of inference criteria in which a learning machine is fed data from multiple sources, some of which may be infected with inaccuracies. The learning situation modeled is the identification in the limit of programs from graphs of computable functions. The main parameters of the investigation are: kind of inaccuracy, total number of data sources, number of faulty data sources which produce data within an acceptable bound, and the bound on the number of errors allowed in the final hypothesis learned by the machine. Sufficient conditions are determined under which, for the same kind of inaccuracy, for the same
Vacillatory learning of nearly minimal size grammars
Journal of Computer and System Sciences, 1994
Cited by 4 (4 self)
In Gold’s influential language learning paradigm a learning machine converges in the limit to one correct grammar. In an attempt to generalize Gold’s paradigm, Case considered the question whether people might converge to vacillating between up to (some integer) n > 1 distinct, but equivalent, correct grammars. He showed that larger classes of languages can be algorithmically learned (in the limit) by converging to up to n + 1 rather than up to n correct grammars. He also argued that, for “small” n > 1, it is plausible that people might sometimes converge to vacillating between up to n grammars. The insistence on small n was motivated by the consideration that, for “large” n, at least one of n grammars would be too large to fit in people’s heads. Of course, even for Gold’s n = 1 case, the single grammar converged to in the limit may be infeasibly large. An interesting complexity restriction to make, then, on the final grammar(s) converged to in the limit is that they all have small size. In this paper we study some of the tradeoffs in learning power involved in making a well-defined version of this restriction. We show and exploit as a tool the desirable property that the learning power under our
Strong separation of learning classes
Journal of Experimental and Theoretical Artificial Intelligence, 1992
Cited by 2 (2 self)
Suppose LC1 and LC2 are two machine learning classes each based on a criterion of success. Suppose, for every machine which learns a class of functions according to the LC1 criterion of success, there is a machine which learns this class according to the LC2 criterion. In the case where the converse does not hold, LC1 is said to be separated from LC2. It is shown that for many such separated learning classes from the literature a much stronger separation holds: (∀C ∈ LC1)(∃C′ ∈ (LC2 − LC1))[C′ ⊃ C]. It is also shown that there is a pair of separated learning classes from the literature for which the stronger separation just above does not hold. A philosophical heuristic toward the design of artificially intelligent learning programs is presented with each strong separation result.
Parsimony Hierarchies for Inductive Inference
Journal of Symbolic Logic
Cited by 2 (1 self)
Freivalds defined an acceptable programming system independent criterion for learning programs for functions in which the final programs were required to be both correct and "nearly" minimal size, i.e., within a computable function of being purely minimal size. Kinber showed that this parsimony requirement on final programs limits learning power. However, in scientific inference, parsimony is considered highly desirable. A lim-computable function is (by definition) one calculable by a total procedure allowed to change its mind finitely many times about its output. Investigated is the possibility of assuaging somewhat the limitation on learning power resulting from requiring parsimonious final programs by use of criteria which require the final, correct programs to be "not-so-nearly" minimal size, e.g., to be within a lim-computable function of actual minimal size. It is shown that some parsimony in the final program is thereby retained, yet learning power strictly increases. Considered, then, are lim-computable functions as above but for which notations for constructive ordinals are used to bound the number of mind changes allowed regarding the output. This is a variant of an idea introduced by Freivalds and Smith. For this ordinal notation complexity bounded version of lim-computability, the power of the resultant learning criteria form finely graded, infinitely ramifying, infinite hierarchies intermediate between the computable and the lim-computable cases. Some of these hierarchies, for the natural notations determining them, are shown to be optimally tight.