The Power of Vacillation in Language Learning
1992
Abstract

Cited by 44 (11 self)
Some extensions of Gold's influential model of language learning by machine from positive data are considered. Studied are criteria of successful learning featuring convergence in the limit to vacillation between several alternative correct grammars. The main theorem of this paper is that there are classes of languages that can be learned if convergence in the limit to up to (n+1) exactly correct grammars is allowed, but which cannot be learned if convergence in the limit is to no more than n grammars, even when each of these n grammars may make finitely many mistakes. This contrasts sharply with results of Barzdin and Podnieks and, later, Case and Smith, for learnability from both positive and negative data. A subset principle from a 1980 paper of Angluin is extended to the vacillatory and other criteria of this paper. This principle provides a necessary condition for circumventing overgeneralization in learning from positive data. It is applied to prove another theorem to the eff...
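The subset principle can be illustrated with Angluin's 1980 tell-tale criterion for indexed families. The following is a minimal sketch for a hypothetical toy family L_k = {0, ..., k}, in which each language is given by a decidable membership test and a finite tell-tale set T_k = {k}. The function and variable names are illustrative. The sketch covers only ordinary convergence to a single grammar, not the vacillatory criteria of the paper. The learner conjectures the least index whose tell-tale has appeared and whose language contains all data seen so far. In this family, that means it conjectures the smallest language containing the data rather than overgeneralizing.

```python
from typing import Callable, List, Optional, Sequence, Set

Membership = Callable[[int], bool]


def telltale_learner(
    langs: Sequence[Membership],
    telltales: Sequence[Set[int]],
    data: Sequence[int],
) -> Optional[int]:
    """Return the least index i such that the finite tell-tale T_i has appeared
    in the data and every datum seen so far belongs to L_i; None plays '?'."""
    seen = set(data)
    for i, (member, telltale) in enumerate(zip(langs, telltales)):
        if telltale <= seen and all(member(x) for x in seen):
            return i
    return None


def conjectures_on_text(
    langs: Sequence[Membership],
    telltales: Sequence[Set[int]],
    text: Sequence[int],
) -> List[Optional[int]]:
    """Run the learner on every finite prefix of a positive-data text."""
    return [telltale_learner(langs, telltales, text[:n]) for n in range(len(text) + 1)]


if __name__ == "__main__":
    K = 5
    # Hypothetical family: L_k = {0, ..., k}, with tell-tale T_k = {k}.
    langs = [lambda x, k=k: 0 <= x <= k for k in range(K)]
    telltales = [{k} for k in range(K)]
    print(conjectures_on_text(langs, telltales, [0, 2, 1, 2, 4]))
    # [None, 0, 2, 2, 2, 4]
```

Once 2 has appeared, the learner stays at L_2 even though L_3 and L_4 are also consistent with the data. It moves on only when 4 is presented.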
On the Intrinsic Complexity of Learning
Information and Computation, 1995
Abstract

Cited by 25 (6 self)
A new view of learning is presented. The basis of this view is a natural notion of reduction. We prove completeness and relative difficulty results. An infinite hierarchy of intrinsically more and more difficult-to-learn concepts is presented. Our results indicate that the complexity notion captured by our new notion of reduction differs dramatically from the traditional studies of the complexity of the algorithms performing learning tasks.

1 Introduction

Traditional studies of inductive inference have focused on illuminating various strata of learnability based on varying the definition of learnability. The research following Valiant's PAC model [Val84] and Angluin's teacher/learner model [Ang88] paid very careful attention to calculating the complexity of the learning algorithm. We present a new view of learning, based on the notion of reduction, that captures a different perspective on learning complexity than all prior studies. Based on our preliminary reports, Jain...
Infinitary Self Reference in Learning Theory
1994
Abstract

Cited by 18 (6 self)
Kleene's Second Recursion Theorem provides a means for transforming any program p into a program e(p) which first creates a quiescent self-copy and then runs p on that self-copy together with any externally given input. e(p), in effect, has complete (low-level) self-knowledge, and p represents how e(p) uses its self-knowledge (and its knowledge of the external world). Infinite regress is not required since e(p) creates its self-copy outside of itself. One mechanism to achieve this creation is a self-replication trick isomorphic to that employed by single-celled organisms. Another is for e(p) to look in a mirror to see which program it is. In 1974 the author published an infinitary generalization of Kleene's theorem which he called the Operator Recursion Theorem. It provides a means for obtaining an (algorithmically) growing collection of programs which, in effect, share a common (also growing) mirror from which they can obtain complete low-level models of themselves and the other prog...
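The self-replication mechanism can be sketched in a few lines of Python. This is a minimal illustration, not the construction used in the paper. A fixed template, formatted with its own `repr`, yields the text of a program e(p). That program rebuilds its own source in a variable `src` and then calls p on that self-copy together with an external input `x`. As a simplification, p is supplied as a Python callable in the execution environment rather than being embedded in the text of e(p). The names `TEMPLATE`, `self_referential_program`, and `run` are hypothetical.

```python
# Fixed-point template: formatting it with itself yields a program that
# rebuilds its own text in `src` and then calls p(src, x).
TEMPLATE = "t = %r\nsrc = t %% t\nresult = p(src, x)"


def self_referential_program() -> str:
    """Return the text of e(p): a program that reconstructs its own source."""
    return TEMPLATE % TEMPLATE


def run(p, x):
    """Execute e(p) on external input x; p receives (own_source, x)."""
    src = self_referential_program()
    env = {"p": p, "x": x}  # simplification: p is passed in, not embedded
    exec(src, env)
    return env["result"]


if __name__ == "__main__":
    # p uses its self-knowledge: it checks that the copy it received is
    # exactly e(p)'s own text, and echoes the external input.
    print(run(lambda me, x: (me == self_referential_program(), x), 42))
    # (True, 42)
```

The self-copy is built from data held inside the program (the template string) rather than by e(p) inspecting itself, so no infinite regress is needed.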
Complexity issues for vacillatory function identification
Information and Computation, 1995
Abstract

Cited by 12 (9 self)
It was previously shown by Barzdin and Podnieks that one does not increase the power of learning programs for functions by allowing learning algorithms to converge to a finite set of correct programs instead of requiring them to converge to a single correct program. In this paper we define some new, subtle, but natural concepts of mind change complexity for function learning and show that, if one bounds this complexity for learning algorithms, then, by contrast with Barzdin and Podnieks' result, there are interesting and sometimes complicated tradeoffs between these complexity bounds, bounds on the number of final correct programs, and learning power. CR Classification Number: I.2.6 (Learning – Induction).
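For intuition about mind changes and final programs, the following minimal sketch does two things on a finite prefix of a learner's conjectures: it counts mind changes, and it lists the distinct programs output from some point on. It uses the plain mind-change count, an assumption made for illustration: a change is counted whenever a conjecture differs from the previous non-'?' conjecture, with `None` standing for '?'. It does not implement the refined mind change complexity notions introduced in the paper. No finite prefix can certify convergence in the limit, so the code only summarizes what has been observed.

```python
from typing import Hashable, Optional, Sequence, Set


def mind_changes(conjectures: Sequence[Optional[Hashable]]) -> int:
    """Count how often a conjecture differs from the most recent earlier
    non-'?' conjecture; None entries ('?') are skipped."""
    count = 0
    last: Optional[Hashable] = None
    for h in conjectures:
        if h is None:
            continue
        if last is not None and h != last:
            count += 1
        last = h
    return count


def programs_from(conjectures: Sequence[Optional[Hashable]], start: int) -> Set[Hashable]:
    """Distinct (non-'?') programs conjectured at or after position `start`."""
    return {h for h in conjectures[start:] if h is not None}


if __name__ == "__main__":
    # Hypothetical output sequence: from position 2 on, the learner
    # vacillates between the programs p1 and p2.
    outputs = [None, "p0", "p1", "p2", "p1", "p2", "p1"]
    print(mind_changes(outputs))       # 5
    print(programs_from(outputs, 2))   # the two programs p1 and p2
```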
On learning limiting programs
International Journal of Foundations of Computer Science, 1992
Abstract

Cited by 11 (5 self)
Machine learning of limit programs (i.e., programs allowed finitely many mind changes about their legitimate outputs) for computable functions is studied. Learning of iterated limit programs is also studied. To partially motivate these studies, it is shown that, in some cases, interesting global properties of computable functions can be proved from suitable (n + 1)-iterated limit programs for them which cannot be proved from any n-iterated limit programs for them. It is shown that learning power is increased when (n + 1)-iterated limit programs rather than n-iterated limit programs are to be learned. Many tradeoff results are obtained regarding learning power, number (possibly zero) of limits taken, program size constraints and information, and number of errors tolerated in final programs learned.
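A limit program computes its output as the eventual value of a sequence of guesses g(x, 0), g(x, 1), ..., and these guesses may change finitely often. The guess below is an illustration chosen here, not an example from the paper. It returns 1 if the Collatz iteration from x ≥ 1 reaches 1 within t steps, and 0 otherwise. For each x the guesses change at most once, from 0 to 1, so the limit exists. Whether that limit equals 1 for every x is the open Collatz conjecture. Iterated limits are not shown, and the function names are hypothetical.

```python
from typing import List


def guess(x: int, t: int) -> int:
    """Stage-t guess for x >= 1: 1 if the Collatz iteration from x reaches 1
    within t steps, else 0. Nondecreasing in t, so it changes at most once."""
    n = x
    for _ in range(t):
        if n == 1:
            return 1
        n = n // 2 if n % 2 == 0 else 3 * n + 1
    return 1 if n == 1 else 0


def guesses(x: int, stages: int) -> List[int]:
    """The first `stages` guesses g(x, 0), ..., g(x, stages - 1)."""
    return [guess(x, t) for t in range(stages)]


def count_changes(values: List[int]) -> int:
    """Number of positions where consecutive guesses differ (mind changes)."""
    return sum(1 for a, b in zip(values, values[1:]) if a != b)


if __name__ == "__main__":
    seq = guesses(27, 200)
    print(seq[-1], count_changes(seq))  # 1 1
```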
Vacillatory learning of nearly minimal size grammars
Journal of Computer and System Sciences, 1994
Abstract

Cited by 4 (4 self)
In Gold’s influential language learning paradigm a learning machine converges in the limit to one correct grammar. In an attempt to generalize Gold’s paradigm, Case considered the question whether people might converge to vacillating between up to (some integer) n > 1 distinct, but equivalent, correct grammars. He showed that larger classes of languages can be algorithmically learned (in the limit) by converging to up to n + 1 rather than up to n correct grammars. He also argued that, for “small” n > 1, it is plausible that people might sometimes converge to vacillating between up to n grammars. The insistence on small n was motivated by the consideration that, for “large” n, at least one of n grammars would be too large to fit in people’s heads. Of course, even for Gold’s n = 1 case, the single grammar converged to in the limit may be infeasibly large. An interesting complexity restriction to make, then, on the final grammar(s) converged to in the limit is that they all have small size. In this paper we study some of the tradeoffs in learning power involved in making a well-defined version of this restriction. We show and exploit as a tool the desirable property that the learning power under our ...
Machine induction without revolutionary changes in hypothesis size
Information and Computation, 1996
Abstract

Cited by 3 (2 self)
This paper provides a beginning study of the effects on inductive inference of paradigm shifts whose absence is approximately modeled by various formal approaches to forbidding large changes in the size of programs conjectured. One approach, called severely parsimonious, requires all the programs conjectured on the way to success to be nearly (i.e., within a recursive function of) minimal size. It is shown that this very conservative constraint allows learning infinite classes of functions, but not infinite r.e. classes of functions. Another approach, called nonrevolutionary, requires all conjectures to be nearly the same size as one another. This quite conservative constraint is, nonetheless, shown to permit learning some infinite r.e. classes of functions. Allowing up to one extra bounded-size mind change towards a final program learned certainly doesn’t appear revolutionary. However, somewhat surprisingly for scientific (inductive) inference, it is shown that there are classes learnable with the nonrevolutionary constraint (respectively, with severe parsimony), up to (i + 1) mind changes, and no anomalies, which classes cannot be learned with no size constraint, an unbounded, finite number of anomalies in the final program, but with no more than i mind changes. Hence, in some cases, the possibility of one extra mind change is considerably more liberating than removal of very conservative size shift constraints. The proofs of these results are also combinatorially interesting.
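The nonrevolutionary constraint can be pictured as a check on the sizes of the conjectures a learner has output. The sketch below assumes one possible formalization, chosen here for illustration: given a recursive bound h, every conjectured size must be at most h applied to every other conjectured size. The bound h and the function name are hypothetical, and the paper's formal definition may differ in detail. Like any check on a finite prefix, this says nothing about the learner's behavior in the limit.

```python
from typing import Callable, Sequence


def nonrevolutionary(sizes: Sequence[int], h: Callable[[int], int]) -> bool:
    """True iff size_i <= h(size_j) for every pair of conjecture sizes."""
    return all(a <= h(b) for a in sizes for b in sizes)


def h(n: int) -> int:
    """Hypothetical recursive bound on how far sizes may drift apart."""
    return 2 * n + 3


if __name__ == "__main__":
    print(nonrevolutionary([10, 12, 15], h))  # True
    print(nonrevolutionary([10, 40], h))      # False
```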
On a question about learning nearly minimal programs
Information Processing Letters, 1995
Abstract

Cited by 2 (2 self)
Identification of programs for computable functions from their graphs by algorithmic devices is a well-studied problem in learning theory. Freivalds and Chen consider identification of ‘minimal’ and ‘nearly minimal’ programs for functions from their graphs. The present paper solves the following question left open by Chen: Is it the case that for any collection of computable functions, C, such that some machine can finitely learn a nearly minimal (n + 1)-error program for every function in C, there exists another machine that can learn in the limit an n-error program (which need not be nearly minimal) for every function in C? We answer this question negatively.
Parsimony Hierarchies for Inductive Inference
Journal of Symbolic Logic
Abstract

Cited by 2 (1 self)
Freivalds defined a criterion, independent of the choice of acceptable programming system, for learning programs for functions in which the final programs were required to be both correct and "nearly" minimal size, i.e., within a computable function of being purely minimal size. Kinber showed that this parsimony requirement on final programs limits learning power. However, in scientific inference, parsimony is considered highly desirable. A lim-computable function is (by definition) one calculable by a total procedure allowed to change its mind finitely many times about its output. Investigated is the possibility of assuaging somewhat the limitation on learning power resulting from requiring parsimonious final programs by use of criteria which require the final, correct programs to be "not-so-nearly" minimal size, e.g., to be within a lim-computable function of actual minimal size. It is shown that some parsimony in the final program is thereby retained, yet learning power strictly increases. Considered, then, are lim-computable functions as above but for which notations for constructive ordinals are used to bound the number of mind changes allowed regarding the output. This is a variant of an idea introduced by Freivalds and Smith. For this ordinal notation complexity bounded version of lim-computability, the powers of the resultant learning criteria form finely graded, infinitely ramifying, infinite hierarchies intermediate between the computable and the lim-computable cases. Some of these hierarchies, for the natural notations determining them, are shown to be optimally tight.
Characterizing language identification by standardizing operations
Journal of Computer and System Sciences, 1994
Abstract

Cited by 1 (0 self)
Notions from formal language learning theory are characterized in terms of standardizing operations on classes of recursively enumerable languages. Algorithmic identification in the limit of grammars from text presentation of recursively enumerable languages is a central paradigm of language learning. A mapping F from the set of all grammars into the set of all grammars is a standardizing operation on a class 𝓛 of recursively enumerable languages just in case F maps any grammar for any language L ∈ 𝓛 to a canonical grammar for L. Investigating connections between these two notions is the subject of this paper.
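A standardizing operation is easiest to see on a toy class where it is trivially computable. In the minimal sketch below, a hypothetical "grammar" for a finite language is just a tuple listing its words, possibly repeated and in any order. F sends every grammar for a language L to the same canonical grammar for L, namely its distinct words in sorted order. For classes of recursively enumerable languages, as in the paper, grammars are programs, and such an F generally cannot be computed by a direct rewriting like this one. The representation and names here are assumptions for illustration only.

```python
from typing import FrozenSet, Tuple

Grammar = Tuple[str, ...]  # toy grammar: an explicit list of the words it generates


def language(g: Grammar) -> FrozenSet[str]:
    """The language generated by a toy grammar."""
    return frozenset(g)


def standardize(g: Grammar) -> Grammar:
    """Standardizing operation F: the canonical grammar for language(g)."""
    return tuple(sorted(set(g)))


if __name__ == "__main__":
    g1: Grammar = ("ab", "a", "ab")
    g2: Grammar = ("a", "ab")
    # Two different grammars for the same language map to one canonical grammar.
    print(standardize(g1) == standardize(g2), standardize(g1))  # True ('a', 'ab')
```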