## On the Impact of Forgetting on Learning Machines (1993)

Venue: Journal of the ACM

Citations: 10 (3 self)

### BibTeX

@ARTICLE{Freivalds93onthe,
  author = {Rusins Freivalds and Efim Kinber and Carl H. Smith},
  title = {On the Impact of Forgetting on Learning Machines},
  journal = {Journal of the ACM},
  year = {1993},
  pages = {165--174},
  publisher = {ACM Press}
}


### Abstract

This paper contributes toward the goal of understanding how a computer can be programmed to learn by isolating features of incremental learning algorithms that theoretically enhance their learning potential. In particular, we examine the effects of imposing a limit on the amount of information that a learning algorithm can hold in its memory as it attempts to learn. (This work was facilitated by an international agreement under NSF Grant 9119540.)

### Citations

2045 | An introduction to probability theory and its applications, volume I - Feller - 1968

Citation Context: ...man. Recall that σ_i denotes the length of the i-th block of 1's in the string representation of f. By the above discussion, if ∑_{n=1}^∞ 1/2^{σ_n} (3) diverges, then by the Borel-Cantelli lemma [17], state q_1, hence state q_3, will be entered just at the end of reading an entire block infinitely often with probability 1, and finitely often with probability 0. On the other hand, if (3) converges...
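The convergence test in this excerpt is easy to probe numerically. The sketch below is our own illustration, not the paper's construction: the two block-length sequences `slow` and `fast` are hypothetical choices of σ_n for which the series ∑_{n≥1} 2^{-σ_n} diverges and converges, respectively; by the Borel-Cantelli lemma (for independent events), the divergent case corresponds to the relevant states being entered infinitely often with probability 1.

```python
import math

def partial_sum(sigma, n_terms):
    """Partial sum of the series sum_{n>=1} 2^(-sigma(n)) from the excerpt."""
    return sum(2.0 ** -sigma(n) for n in range(1, n_terms + 1))

# Hypothetical block-length sequences (our choices, not the paper's):
# sigma(n) = floor(log2 n) gives terms >= 1/(2n), so the series diverges;
# the partial sums grow without bound (roughly log2 of the cutoff).
slow = lambda n: int(math.log2(n))
# sigma(n) = 2*ceil(log2 n) gives terms <= 1/n^2, so the series converges
# and the partial sums stay below a finite limit.
fast = lambda n: 2 * math.ceil(math.log2(n)) if n > 1 else 0

print(partial_sum(slow, 1 << 15))  # keeps growing as the cutoff grows
print(partial_sum(fast, 1 << 15))  # stays bounded
```

Raising the cutoff from 2^10 to 2^15 adds about 5 to the divergent sum while leaving the convergent one essentially unchanged, which is the dichotomy the lemma exploits.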

1695 | A Theory of the Learnable - Valiant - 1984

Citation Context: ...dy of work addressing the inference of minimal size programs. See [19] for a survey. There have been a few results concerning space limited learning in the PAC (probably approximately correct) model [57]. Haussler [29] shows how to PAC learn strictly ordered decision trees using space linear in the size of the smallest possible decision tree. Boucheron and Sallantin [10] show that some classes of boo...

924 | The Logic of Scientific Discovery - Popper - 1959

890 | Language identification in the limit - Gold - 1967

876 | The Architecture of Cognition - Anderson - 1983

665 | The strength of weak learnability - Schapire - 1990

Citation Context: ...riables) space. PAC learning while remembering only a fixed number of examples, each of a bounded size, is considered in [3, 18, 32]. The most general investigation on this line was the observation in [51] that the boosting algorithm can be made reasonably space efficient as well. Sample complexity gives only a very crude accounting of space utilization. Learning procedures may want to remember other i...

473 | Recursively Enumerable Sets and Degrees - Soare - 1987

306 | Inductive inference: theory and methods - Angluin, Smith - 1983

276 | Elements of the Theory of Computation - Lewis, Papadimitriou - 1981

259 | Toward a mathematical theory of inductive inference - Blum, Blum - 1975

168 | Formal Principles of Language Acquisition - Wexler - 1980

Citation Context: ...Consequently, items to be remembered that are not reinforced by subsequent inputs may be eliminated from some level of the memory before they become permanently fixed in memory. Wexler and Culicover [59] formalized many notions of language learning, including one where a device (essentially an inductive inference machine) was to learn having access to the most recently received data and the machines'...

163 | Comparison of identification criteria for machine inductive inference, Theoretical Computer Science - Case, Smith - 1983

158 | On the computational power of neural nets - Siegelmann, Sontag - 1995

Citation Context: ...ndependent simulations have verified that occasional "unlearning" aids in learning [33]. In a similar vein, neural networks with a limitation on the type of the weight in each node were considered in [54]. The types considered are integer, rational and real. Each successive type can, potentially, place higher demands on memory utilization within each node. Each type also expands the inherent capabilit...

120 | The Magic Number Seven Plus or Minus Two - Miller - 1956

117 | On formal properties of simple phrase structure grammars - Bar-Hillel, Perles, et al. - 1964

Citation Context: ...o store the constant 0. Clearly, U_1 ∈ EX:c. □ Proposition 2. U_0 ∉ EX:c. Proof: Suppose by way of contradiction that M is an IIM such that U_0 ∈ EX:c(M). A pumping lemma type argument is used [7], see [37]. The string representation of functions is used in this proof. Since M has a constant, finite amount of memory, there are strings σ and τ such that 1. σ and τ contain only 0's and 1's, and 2...

114 | Systems that Learn - Osherson, Stob, et al. - 1986

101 | An Introduction to the General Theory of Algorithms - Machtey, Young - 1978

Citation Context: ...al numbers (N) will serve as names for programs. The function computed by program i will be denoted by φ_i. It is assumed that φ_0, φ_1, ... forms an acceptable programming system [39, 49]. The quantifier ∀^∞ is read "for all but finitely many." Sometimes, it will be convenient to represent a function by a sequence of values from its range. Such a representation is called a string repr...

83 | On notation for ordinal numbers - Kleene - 1938

Citation Context: ...memory configuration after reading α_0 as it does after reading α_1. Since M's memory is assumed to be of constant size, two such strings will exist. By two applications of the recursion theorem [35], there are programs e_0 and e_1 such that φ_{e_0} = α_0(01^{e_0})^∞ and φ_{e_1} = α_1(01^{e_1})^∞. Clearly, both φ_{e_0} and φ_{e_1} are in U_0. M will exhibit the same limiting behavior on both function...

69 | A preliminary analysis of the SOAR architecture as a basis for general intelligence - Rosenbloom, Laird, et al. - 1991

69 | Theory of Formal Systems - Smullyan - 1962

Citation Context: ...after seeing the segments 1^a and 1^b. There are two cases to consider in order to verify the claim. Case 1: a = 2^{2m}. Then a is a perfect square and b is not. By the mutual recursion theorem [55] there are programs e_1 and e_2 such that φ_{e_1} = 1^a e_1 e_2 0^∞ and φ_{e_2} = 1^b e_1 e_2 0^∞. Notice that both φ_{e_1} and φ_{e_2} are in U_SQ. Since M's long term memory is assumed to be unable to di...

61 | Theory of Recursive Functions and Effective Computability - Rogers - 1967

Citation Context: ...al numbers (N) will serve as names for programs. The function computed by program i will be denoted by φ_i. It is assumed that φ_0, φ_1, ... forms an acceptable programming system [39, 49]. The quantifier ∀^∞ is read "for all but finitely many." Sometimes, it will be convenient to represent a function by a sequence of values from its range. Such a representation is called a string repr...

49 | Periodicity in generations of automata - Case - 1974

Citation Context: ...αe^∞, where α is some arbitrary initial segment and φ_e = f. Let σ_0, σ_1, ... be an effective enumeration of all the finite initial segments. By the operator recursion theorem [12] there is a recursively enumerable sequence of recursive functions, indexed by i, such that φ_{h(i)} = σ_i h(i)^∞. Clearly, {φ_{h(i)} | i ∈ N} ⊆ U is dense. Furthermore, it can be learned by an IIM that...

45 | Artificial intelligence - Shapiro - 1992

41 | Limeserkennung rekursiver Funktionen durch spezielle Strategien. Elektronische Informationsverarbeitung und Kybernetik - Wiehagen - 1976

Citation Context: ...d data item. This generalization was shown not to increase the potential of such mechanisms to learn languages. A study of learning functions with a limited amount of memory was initiated by Wiehagen [60]. He defined iterative strategies that, like the Wexler and Culicover model, had access to only the next data item and the current hypothesis. Also defined were feedback strategies that were allowed t...

40 | Learning nested differences of intersection-closed concept classes - Helmbold, Sloan, et al. - 1990

Citation Context: ...ions can be learned time efficiently using only logarithmic (in the number of variables) space. PAC learning while remembering only a fixed number of examples, each of a bounded size, is considered in [3, 18, 32]. The most general investigation on this line was the observation in [51] that the boosting algorithm can be made reasonably space efficient as well. Sample complexity gives only a very crude accounti...

40 | Probability and plurality for aggregations of learning machines - Pitt, Smith - 1988

Citation Context: ...least b bits. Hence, M′ requires at least linear long term memory. □ 6 Probabilistic Limited Memory Machines. Probabilistic inductive inference machines were introduced in [45] and studied further in [46]. A probabilistic inductive inference machine is an IIM that makes use of a fair coin in its deliberations. We say that f ∈ EX(M)⟨p⟩ if M learns f with probability p, 0 ≤ p ≤ 1. The collection EX⟨p⟩ is de...

38 | Unlearning has a stabilizing effect in collective memories - Hopfield, Feinstein, et al. - 1983

Citation Context: ...ns of rapid eye movement (REM) sleep is to discard some memories to keep from overloading our neural networks [16]. Independent simulations have verified that occasional "unlearning" aids in learning [33]. In a similar vein, neural networks with a limitation on the type of the weight in each node were considered in [54]. The types considered are integer, rational and real. Each successive type can, po...

36 | The function of dream sleep - Crick, Mitchison - 1983

34 | On the role of procrastination in machine learning - Freivalds, Smith - 1993

Citation Context: ...ative learning power of EX:c type inference, we will employ the set U_0 of functions of finite support and the set U_1 of self describing functions. These sets were introduced in [8, 9] and used in [15, 25] to separate various classes of learnable sets of functions. Let U_0 = {f | f is recursive and ∀^∞x (f(x) = 0)} and U_1 = {f | f is recursive and φ_{f(0)} = f}. Proposition 1. U_1 ∈ EX:c. Proof: The II...

33 | Probabilistic Inductive Inference - Pitt - 1985

Citation Context: ...h(n). This will require at least b bits. Hence, M′ requires at least linear long term memory. □ 6 Probabilistic Limited Memory Machines. Probabilistic inductive inference machines were introduced in [45] and studied further in [46]. A probabilistic inductive inference machine is an IIM that makes use of a fair coin in its deliberations. We say that f ∈ EX(M)⟨p⟩ if M learns f with probability p, 0 ≤ p ≤ 1...

32 | A Survey of Inductive Inference - Angluin, Smith - 1983

26 | Two theorems on the limiting synthesis of functions - Barzdins - 1974

Citation Context: ...ugh idea of the relative learning power of EX:c type inference, we will employ the set U_0 of functions of finite support and the set U_1 of self describing functions. These sets were introduced in [8, 9] and used in [15, 25] to separate various classes of learnable sets of functions. Let U_0 = {f | f is recursive and ∀^∞x (f(x) = 0)} and U_1 = {f | f is recursive and φ_{f(0)} = f}. Proposition 1. U_1 ∈ EX...

22 | On two types of models of the internalization of grammars - Braine - 1971

Citation Context: ...expands the inherent capabilities of the neural networks using that type of node weights. Linguists interested in how children learn language have hypothesized many mechanisms for remembering. Braine [11] suggested that human memory is organized as a cascading sequence of memories. The idea is that items to be remembered are initially entered in the first level of the memory and then later moved to su...

18 | Space-bounded learning and the Vapnik-Chervonenkis dimension - Floyd - 1989

Citation Context: ...ions can be learned time efficiently using only logarithmic (in the number of variables) space. PAC learning while remembering only a fixed number of examples, each of a bounded size, is considered in [3, 18, 32]. The most general investigation on this line was the observation in [51] that the boosting algorithm can be made reasonably space efficient as well. Sample complexity gives only a very crude accounti...

18 | Elementary Theory of Numbers - Griffin - 1954

Citation Context: ...note the least common multiple of {1, ..., n}. We would like to define b_n ≤ a_n + LCM(n). To do so, we must develop an upper bound for LCM(n). From elementary number theory (e.g. see [28]) the number of primes less than or equal to n is O(n/ln n). The largest possible factor of any prime in LCM(n) is n. Consequently, an upper bound for LCM(n) is O(n^{O(n/log n)}). Thus, we can choose...
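The bound quoted in this excerpt can be checked for small n. The sketch below uses helper names of our own choosing: it counts primes with a sieve and verifies LCM(n) ≤ n^π(n), which holds because each prime p ≤ n contributes at most p^⌊log_p n⌋ ≤ n to LCM(n); combined with π(n) = O(n/ln n) this yields the excerpt's O(n^{O(n/log n)}) bound.

```python
import math

def primes_upto(n):
    """Sieve of Eratosthenes; len(result) = pi(n), which is O(n / ln n)."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            for multiple in range(p * p, n + 1, p):
                sieve[multiple] = False
    return [p for p in range(2, n + 1) if sieve[p]]

def lcm_upto(n):
    """LCM(n) = least common multiple of {1, ..., n} (Python 3.9+ math.lcm)."""
    return math.lcm(*range(1, n + 1))

# Each prime p <= n appears in LCM(n) with exponent floor(log_p n),
# so its contribution is at most n; hence LCM(n) <= n ** pi(n).
n = 30
print(lcm_upto(n), n ** len(primes_upto(n)))
```

For n = 30, LCM(30) is about 2.3 x 10^12 while 30^π(30) = 30^10 is about 5.9 x 10^14, so the bound holds with plenty of room.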

15 | Introduction to Neural and Cognitive Modeling. Lawrence Earlbaum Associates - Levine - 1991

12 | Space efficient learning algorithms - Haussler - 1988

Citation Context: ...essing the inference of minimal size programs. See [19] for a survey. There have been a few results concerning space limited learning in the PAC (probably approximately correct) model [57]. Haussler [29] shows how to PAC learn strictly ordered decision trees using space linear in the size of the smallest possible decision tree. Boucheron and Sallantin [10] show that some classes of boolean functions...

10 | Combining postulates of naturalness in inductive inference. Elektronische Informationsverarbeitung und Kybernetik - Jantke, Beick - 1981

Citation Context: ...The strict inclusion of the class of sets learnable by iteratively working strategies in the similar class for all strategies (EX) was shown to hold when anomalies are allowed [43]. Jantke and Beick [34] considered order independent iterative strategies and showed that Wiehagen's results hold with order restrictions removed. The conclusion reached in the above mentioned work on learning functions was...

7 | The competitive chunking theory: Models of perception, learning, and memory. Unpublished doctoral dissertation - Servan-Schreiber - 1991

7 | Convergence to nearly minimal size grammars by vacillating learning machines - Case, Jain, et al. - 1989

Citation Context: ...For example, based on the observation that people do not have enough memory to learn an arbitrarily large grammar for a natural language, a study of learning minimal size grammars was initiated [13]. There has been a large body of work addressing the inference of minimal size programs. See [19] for a survey. There have been a few results concerning space limited learning in the PAC (probably app...

7 | A theory for memory-based learning - Lin, Vitter, et al. - 1993

Citation Context: ...he sample complexity metric neglects to count some of the long term storage employed by learning algorithms. Lin and Vitter consider memory requirements for learning sufficiently smooth distributions [38]. Since they assume that the inputs are in some readable form, the issue of how much space it takes to store a number never arises. We now describe the model investigated in this paper. To insure an...

4 | Some remarks about space-complexity of learning, and circuit complexity of recognizing - Boucheron, Sallantin - 1988

Citation Context: ...approximately correct) model [57]. Haussler [29] shows how to PAC learn strictly ordered decision trees using space linear in the size of the smallest possible decision tree. Boucheron and Sallantin [10] show that some classes of boolean functions can be learned time efficiently using only logarithmic (in the number of variables) space. PAC learning while remembering only a fixed number of examples...

4 | Refinements of inductive inference by Popperian machines - Case, Ngo-Manguelle - 1979

Citation Context: ...:c ⊆ EX. The inclusion is proper by Proposition 2 and the fact that U_0 ∈ EX [9]. □ Another type of inference that may be relevant to neural networks is the class PEX defined in [15] and studied in [14]. A set of functions U is in PEX just in case there is an IIM that outputs only programs for total recursive functions and U ⊆ EX(M). The collection of sets PEX is defined analogously. In this case t...

3 | Inductive inference of minimal size programs - Freivalds - 1990

Citation Context: ...itrarily large grammar for a natural language, a study of learning minimal size grammars was initiated [13]. There has been a large body of work addressing the inference of minimal size programs. See [19] for a survey. There have been a few results concerning space limited learning in the PAC (probably approximately correct) model [57]. Haussler [29] shows how to PAC learn strictly ordered decision t...

3 | Probabilistic versus Deterministic Memory Limited Learning. Algorithmic Learning for Knowledge-Based Systems - Freivalds, Kinber, et al. - 1995

Citation Context: ...ryland, College Park, MD 20912 USA. August 1, 1997. This work was facilitated by an international agreement under NSF Grant 9119540. Results collected in this paper were presented at various conferences [20, 21, 22, 23, 24]. † Supported by the Latvian Council of Science, grants No. 90.619 and 93.599. ‡ Supported in part by NSF Grants 9020079 and 9301339. Abstract: People tend not to have perfect memories when it comes...

2 | Why sometimes probabilistic algorithms can be more effective - Ablaev, Freivalds - 1986

Citation Context: ...g. the y_i's are points immediately following a point that immediately follows a block of 0's or 1's. Theorem 11. U ∈ EX:c⟨1⟩. Proof: The proof proceeds by constructing two probabilistic ω-automata [1, 56]. These ω-automata will process the string of values representing the range of functions from U. Consequently, they will only have to recognize symbols as being either 0 or 1 or other. The state tran...

1 | Memory limited inductive inference machines - Freivalds, Smith

1 | Trial and error: a new approach to space-bounded learning - Ameur, Fischer, et al. - 1993

Citation Context: ...ions can be learned time efficiently using only logarithmic (in the number of variables) space. PAC learning while remembering only a fixed number of examples, each of a bounded size, is considered in [3, 18, 32]. The most general investigation on this line was the observation in [51] that the boosting algorithm can be made reasonably space efficient as well. Sample complexity gives only a very crude accounti...

1 | Learning with a limited memory - Freivalds, Kinber, et al. - 1993

Citation Context: ...ryland, College Park, MD 20912 USA. August 1, 1997. This work was facilitated by an international agreement under NSF Grant 9119540. Results collected in this paper were presented at various conferences [20, 21, 22, 23, 24]. † Supported by the Latvian Council of Science, grants No. 90.619 and 93.599. ‡ Supported in part by NSF Grants 9020079 and 9301339. Abstract: People tend not to have perfect memories when it comes...

1 | Quantifying the amount of relevant information - Freivalds, Kinber, et al. - 1994

Citation Context: ...ryland, College Park, MD 20912 USA. August 1, 1997. This work was facilitated by an international agreement under NSF Grant 9119540. Results collected in this paper were presented at various conferences [20, 21, 22, 23, 24]. † Supported by the Latvian Council of Science, grants No. 90.619 and 93.599. ‡ Supported in part by NSF Grants 9020079 and 9301339. Abstract: People tend not to have perfect memories when it comes...