## On the Impact of Forgetting on Learning Machines (1993)

Venue: | Journal of the ACM |

Citations: | 10 - 3 self |

### BibTeX

@INPROCEEDINGS{Freivalds93onthe,
  author    = {Rusins Freivalds and Efim Kinber and Carl H. Smith},
  title     = {On the Impact of Forgetting on Learning Machines},
  booktitle = {Journal of the ACM},
  year      = {1993},
  pages     = {165--174},
  publisher = {ACM Press}
}

### Abstract

This paper contributes toward the goal of understanding how a computer can be programmed to learn by isolating features of incremental learning algorithms that theoretically enhance their learning potential. In particular, we examine the effects of imposing a limit on the amount of information that a learning algorithm can hold in its memory as it attempts to learn. (This work was facilitated by an international agreement under NSF Grant 9119540.)

### Citations

2316 | An Introduction to Probability Theory and its Applications, volume II - Feller - 1971

Citation Context: ...Recall that σ_i denotes the length of the i-th block of 1's in the string representation of f. By the above discussion, if ∑_{n=1}^∞ 1/2^{σ_n} (3) diverges, then by the Borel–Cantelli lemma [17], state q_1, hence state q_3, will be entered just at the end of reading an entire block infinitely often with probability 1, and finitely often with probability 0. On the other hand, if (3) converges... |

1227 | The logic of scientific discovery - Popper - 1934 |

1113 | The Architecture of Cognition - Anderson - 1983 |

953 | Language identification in the limit - Gold - 1967 |

701 | The strength of weak learnability - Schapire - 1990

Citation Context: ...(in the number of variables) space. PAC learning while remembering only a fixed number of examples, each of a bounded size, is considered in [3, 18, 32]. The most general investigation on this line was the observation in [51] that the boosting algorithm can be made reasonably space efficient as well. Sample complexity gives only a very crude accounting of space utilization. Learning procedures may want to remember other i... |

505 | Recursively Enumerable Sets and Degrees - Soare - 1987 |

316 | Inductive inference: theory and methods - Angluin, Smith - 1983 |

289 | Elements of the theory of computation - Lewis, Papadimitriou - 1981 |

269 | Toward a mathematical theory of inductive inference - Blum, Blum - 1975 |

165 | Comparison of identification criteria for machine inductive inference - Case, Smith - 1983 |

161 | On the computational power of neural nets - Siegelmann, Sontag - 1995

Citation Context: ...Independent simulations have verified that occasional "unlearning" aids in learning [33]. In a similar vein, neural networks with a limitation on the type of the weight in each node were considered in [54]. The types considered are integer, rational, and real. Each successive type can, potentially, place higher demands on memory utilization within each node. Each type also expands the inherent capabilit... |

141 | The magical number seven, plus or minus two - Miller - 1956 |

126 | Systems that learn - Osherson, Stob, et al. - 1986 |

123 | On Formal Properties of Simple Phrase Structure Grammars - Bar-Hillel, Perles, et al. - 1964

Citation Context: ...to store the constant 0. Clearly, U_1 ∈ EX:c. □ Proposition 2. U_0 ∉ EX:c. Proof: Suppose by way of contradiction that M is an IIM such that U_0 ∈ EX:c(M). A pumping-lemma-type argument is used [7]; see [37]. The string representation of functions is used in this proof. Since M has a constant, finite amount of memory, there are strings σ and τ such that 1. σ and τ contain only 0's and 1's, and 2.... |

104 | An Introduction to the General Theory of Algorithms - Machtey, Young - 1978

Citation Context: ...The natural numbers (N) will serve as names for programs. The function computed by program i will be denoted by φ_i. It is assumed that φ_0, φ_1, ··· forms an acceptable programming system [39, 49]. The quantifier ∀^∞ is read "for all but finitely many." Sometimes, it will be convenient to represent a function by a sequence of values from its range. Such a representation is called a string repr... |

94 | On the Notation of Ordinal Numbers - Kleene - 1938

Citation Context: ...memory configuration after reading α_0 as it does after reading α_1. Since M's memory is assumed to be of constant size, two such strings will exist. By two applications of the recursion theorem [35], there are programs e_0 and e_1 such that φ_{e_0} = α_0(01^{e_0})^∞ and φ_{e_1} = α_1(01^{e_1})^∞. Clearly, both φ_{e_0} and φ_{e_1} are in U_0. M will exhibit the same limiting behavior on both function... |

75 | A preliminary analysis of the Soar architecture as a basis for general intelligence - Rosenbloom, Laird, et al. - 1991 |

75 | Theory of formal systems - Smullyan - 1961

Citation Context: ...after seeing the segments 1^a and 1^b. There are two cases to consider in order to verify the claim. Case 1: a = 2^{2m}. Then a is a perfect square and b is not. By the mutual recursion theorem [55] there are programs e_1 and e_2 such that φ_{e_1} = 1^a e_1 e_2 0^∞ and φ_{e_2} = 1^b e_1 e_2 0^∞. Notice that both φ_{e_1} and φ_{e_2} are in U_SQ. Since M's long term memory is assumed to be unable to di... |

63 | Theory of recursive functions and effective computability, McGraw-Hill - Rogers - 1967

Citation Context: ...The natural numbers (N) will serve as names for programs. The function computed by program i will be denoted by φ_i. It is assumed that φ_0, φ_1, ··· forms an acceptable programming system [39, 49]. The quantifier ∀^∞ is read "for all but finitely many." Sometimes, it will be convenient to represent a function by a sequence of values from its range. Such a representation is called a string repr... |

52 | Periodicity in generations of automata - Case - 1974

Citation Context: ...α e^∞, where α is some arbitrary initial segment and φ_e = f. Let σ_0, σ_1, ··· be an effective enumeration of all the finite initial segments. By the operator recursion theorem [12] there is a recursively enumerable sequence of recursive functions, indexed by i, such that φ_{h(i)} = σ_i h(i)^∞. Clearly, {φ_{h(i)} | i ∈ N} ⊆ U is dense. Furthermore, it can be learned by an IIM that... |

51 | Artificial Intelligence - Shapiro - 1992 |

42 | The function of dream sleep - Crick, Mitchison - 1983 |

40 | Learning nested differences of intersection-closed concept classes - Helmbold, Sloan, et al. - 1990

Citation Context: ...functions can be learned time efficiently using only logarithmic (in the number of variables) space. PAC learning while remembering only a fixed number of examples, each of a bounded size, is considered in [3, 18, 32]. The most general investigation on this line was the observation in [51] that the boosting algorithm can be made reasonably space efficient as well. Sample complexity gives only a very crude accounti... |

40 | 'Unlearning' has a stabilizing effect in collective memories, Nature - Hopfield, Feinstein, et al. - 1983

Citation Context: ...one of the functions of rapid eye movement (REM) sleep is to discard some memories to keep from overloading our neural networks [16]. Independent simulations have verified that occasional "unlearning" aids in learning [33]. In a similar vein, neural networks with a limitation on the type of the weight in each node were considered in [54]. The types considered are integer, rational, and real. Each successive type can, po... |

40 | Probability and plurality for aggregations of learning machines - Pitt, Smith - 1988

Citation Context: ...least b bits. Hence, M′ requires at least linear long term memory. □ 6. Probabilistic Limited Memory Machines. Probabilistic inductive inference machines were introduced in [45] and studied further in [46]. A probabilistic inductive inference machine is an IIM that makes use of a fair coin in its deliberations. We say that f ∈ EX(M)⟨p⟩ if M learns f with probability p, 0 ≤ p ≤ 1. The collection EX⟨p⟩ is de... |

34 | On the role of procrastination in machine learning - Freivalds, Smith - 1993

Citation Context: ...relative learning power of EX:c type inference, we will employ the set U_0 of functions of finite support and the set U_1 of self-describing functions. These sets were introduced in [8, 9] and used in [15, 25] to separate various classes of learnable sets of functions. Let U_0 = {f | f is recursive and ∀^∞ x (f(x) = 0)} and U_1 = {f | f is recursive and φ_{f(0)} = f}. Proposition 1. U_1 ∈ EX:c. Proof: The II... |

34 | Probabilistic inductive inference - Pitt - 1989

Citation Context: ...h(n). This will require at least b bits. Hence, M′ requires at least linear long term memory. □ 6. Probabilistic Limited Memory Machines. Probabilistic inductive inference machines were introduced in [45] and studied further in [46]. A probabilistic inductive inference machine is an IIM that makes use of a fair coin in its deliberations. We say that f ∈ EX(M)⟨p⟩ if M learns f with probability p, 0 ≤ p ≤ 1... |

33 | Formal inductive inference - Angluin, Smith |

28 | Two theorems on the limiting synthesis of functions. Theory of Algorithms and Programs - Bārzdiņš - 1974

Citation Context: ...rough idea of the relative learning power of EX:c type inference, we will employ the set U_0 of functions of finite support and the set U_1 of self-describing functions. These sets were introduced in [8, 9] and used in [15, 25] to separate various classes of learnable sets of functions. Let U_0 = {f | f is recursive and ∀^∞ x (f(x) = 0)} and U_1 = {f | f is recursive and φ_{f(0)} = f}. Proposition 1. U_1 ∈ EX... |

24 | On two types of models of the internalization of grammars - Braine - 1971

Citation Context: ...expands the inherent capabilities of the neural networks using that type of node weights. Linguists interested in how children learn language have hypothesized many mechanisms for remembering. Braine [11] suggested that human memory is organized as a cascading sequence of memories. The idea is that items to be remembered are initially entered in the first level of the memory and then later moved to su... |
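Braine's cascaded-memory idea quoted above can be sketched as a toy data structure. This is purely illustrative: the class name, the number of levels, and the promote-on-repetition rule are my own assumptions, not anything specified by Braine or by the paper.

```python
class CascadingMemory:
    """Toy sketch of a cascading sequence of memories (hypothetical rules):
    new items enter level 0; each time an item is observed again it is
    promoted one level deeper, modeling consolidation over time."""

    def __init__(self, levels=3):
        self.levels = [set() for _ in range(levels)]

    def observe(self, item):
        # If the item is already stored, move it one level deeper.
        for depth, level in enumerate(self.levels):
            if item in level:
                level.remove(item)
                target = min(depth + 1, len(self.levels) - 1)
                self.levels[target].add(item)
                return
        self.levels[0].add(item)  # first encounter: shallowest level

    def depth_of(self, item):
        for depth, level in enumerate(self.levels):
            if item in level:
                return depth
        return None  # forgotten / never seen

mem = CascadingMemory()
for _ in range(3):
    mem.observe("dog")   # repeated item cascades to the deepest level
mem.observe("cat")       # seen once: stays at level 0
print(mem.depth_of("dog"), mem.depth_of("cat"))  # 2 0
```

Under this reading, forgetting could be modeled by evicting items from shallow levels while deep levels persist, which is the flavor of limitation the paper formalizes.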

19 | Elementary theory of numbers - Griffin - 1954

Citation Context: ...note the least common multiple of {1, ···, n}. We would like to define b_n ≥ a_n + LCM(n). To do so, we must develop an upper bound for LCM(n). From elementary number theory (e.g., see [28]) the number of primes less than or equal to n is O(n/ln n). The largest possible factor of any prime in LCM(n) is n. Consequently, an upper bound for LCM(n) is O(n^{O(n/log n)}). Thus, we can choose... |
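The counting argument quoted above says each of the π(n) = O(n/ln n) primes contributes at most a factor n to LCM(n), giving LCM(n) ≤ n^{π(n)} = n^{O(n/log n)}. A small sketch checking that inequality numerically (function names are mine; trial-division prime counting is only meant for small n):

```python
import math

def lcm_up_to(n):
    """lcm(1, ..., n): the LCM(n) of the quoted passage."""
    acc = 1
    for k in range(2, n + 1):
        acc = acc * k // math.gcd(acc, k)
    return acc

def prime_count(n):
    """pi(n): number of primes <= n, via trial division."""
    return sum(all(p % d for d in range(2, math.isqrt(p) + 1))
               for p in range(2, n + 1))

# Each prime p <= n contributes at most its largest power p^k <= n,
# hence LCM(n) <= n^{pi(n)}, matching the O(n^{O(n/log n)}) bound.
for n in (10, 50, 100):
    assert lcm_up_to(n) <= n ** prime_count(n)

print(lcm_up_to(10))  # 2520 = 2^3 * 3^2 * 5 * 7
```

The bound is loose (the true growth is LCM(n) = e^{(1+o(1))n}), but it is computable, which is all the quoted construction of b_n needs.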

18 | On Space-bounded Learning and the Vapnik-Chervonenkis Dimension - Floyd - 1989

Citation Context: ...functions can be learned time efficiently using only logarithmic (in the number of variables) space. PAC learning while remembering only a fixed number of examples, each of a bounded size, is considered in [3, 18, 32]. The most general investigation on this line was the observation in [51] that the boosting algorithm can be made reasonably space efficient as well. Sample complexity gives only a very crude accounti... |

15 | Introduction to Neural and Cognitive Modeling. Lawrence Erlbaum Associates - Levine - 1991 |

12 | Space efficient learning algorithms - Haussler - 1988

Citation Context: ...addressing the inference of minimal size programs. See [19] for a survey. There have been a few results concerning space limited learning in the PAC (probably approximately correct) model [57]. Haussler [29] shows how to PAC learn strictly ordered decision trees using space linear in the size of the smallest possible decision tree. Boucheron and Sallantin [10] show that some classes of boolean functions... |

10 | Combining postulates of naturalness in inductive inference. Elektronische Informationsverarbeitung und Kybernetik - Jantke, Beick - 1981

Citation Context: ...The strict inclusion of the class of sets learnable by iteratively working strategies in the similar class for all strategies (EX) was shown to hold when anomalies are allowed [43]. Jantke and Beick [34] considered order independent iterative strategies and showed that Wiehagen's results hold with order restrictions removed. The conclusion reached in the above mentioned work on learning functions was... |

7 | The competitive chunking theory: Models of perception, learning, and memory. Unpublished doctoral dissertation - Servan-Schreiber - 1991 |

7 | Convergence to nearly minimal size grammars by vacillating learning machines - Case, Jain, et al. - 1989

Citation Context: ...For example, based on the observation that people do not have enough memory to learn an arbitrarily large grammar for a natural language, a study of learning minimal size grammars was initiated [13]. There has been a large body of work addressing the inference of minimal size programs. See [19] for a survey. There have been a few results concerning space limited learning in the PAC (probably app... |

7 | A theory for memory-based learning - Lin, Vitter, et al. - 1993

Citation Context: ...the sample complexity metric neglects to count some of the long term storage employed by learning algorithms. Lin and Vitter consider memory requirements for learning sufficiently smooth distributions [38]. Since they assume that the inputs are in some readable form, the issue of how much space it takes to store a number never arises. We now describe the model investigated in this paper. To insure an... |

4 | Some remarks about space-complexity of learning, and circuit complexity of recognizing - Boucheron, Sallantin - 1988

Citation Context: ...approximately correct) model [57]. Haussler [29] shows how to PAC learn strictly ordered decision trees using space linear in the size of the smallest possible decision tree. Boucheron and Sallantin [10] show that some classes of boolean functions can be learned time efficiently using only logarithmic (in the number of variables) space. PAC learning while remembering only a fixed number of examples,... |

4 | Refinements of inductive inference by Popperian machines - Case, Ngo-Manguelle - 1979

Citation Context: ...EX:c ⊆ EX. The inclusion is proper by Proposition 2 and the fact that U_0 ∈ EX [9]. □ Another type of inference that may be relevant to neural networks is the class PEX defined in [15] and studied in [14]. A set of functions U is in PEX just in case there is an IIM that outputs only programs for total recursive functions and U ⊆ EX(M). The collection of sets PEX is defined analogously. In this case t... |

3 | Inductive inference of minimal size programs - Freivalds - 1990

Citation Context: ...arbitrarily large grammar for a natural language, a study of learning minimal size grammars was initiated [13]. There has been a large body of work addressing the inference of minimal size programs. See [19] for a survey. There have been a few results concerning space limited learning in the PAC (probably approximately correct) model [57]. Haussler [29] shows how to PAC learn strictly ordered decision t... |

3 | Probabilistic versus Deterministic Memory Limited Learning. Algorithmic Learning for Knowledge-Based Systems - Freivalds, Kinber, et al. - 1995

Citation Context: ...Maryland, College Park, MD 20912 USA. August 1, 1997. This work was facilitated by an international agreement under NSF Grant 9119540. Results collected in this paper were presented at various conferences [20, 21, 22, 23, 24]. † Supported by the Latvian Council of Science, grants No. 90.619 and 93.599. ‡ Supported in part by NSF Grants 9020079 and 9301339. Abstract: People tend not to have perfect memories when it comes... |

1 | Memory limited inductive inference machines - Freivalds, Smith |

1 | Why sometimes probabilistic algorithms can be more effective - Ablaev, Freivalds - 1986

Citation Context: ...e.g. the y_i's are points immediately following a point that immediately follows a block of 0's or 1's. Theorem 11. U ∈ EX:c⟨1⟩. Proof: The proof proceeds by constructing two probabilistic ω-automata [1, 56]. These ω-automata will process the string of values representing the range of functions from U. Consequently, they will only have to recognize symbols as being either 0 or 1 or other. The state tran... |

1 | Trial and error: a new approach to space-bounded learning - Ameur, Fischer, et al. - 1993

Citation Context: ...functions can be learned time efficiently using only logarithmic (in the number of variables) space. PAC learning while remembering only a fixed number of examples, each of a bounded size, is considered in [3, 18, 32]. The most general investigation on this line was the observation in [51] that the boosting algorithm can be made reasonably space efficient as well. Sample complexity gives only a very crude accounti... |

1 | Learning with a limited memory - Freivalds, Kinber, et al. - 1993

Citation Context: ...Maryland, College Park, MD 20912 USA. August 1, 1997. This work was facilitated by an international agreement under NSF Grant 9119540. Results collected in this paper were presented at various conferences [20, 21, 22, 23, 24]. † Supported by the Latvian Council of Science, grants No. 90.619 and 93.599. ‡ Supported in part by NSF Grants 9020079 and 9301339. Abstract: People tend not to have perfect memories when it comes... |

1 | Quantifying the amount of relevant information - Freivalds, Kinber, et al. - 1994

Citation Context: ...Maryland, College Park, MD 20912 USA. August 1, 1997. This work was facilitated by an international agreement under NSF Grant 9119540. Results collected in this paper were presented at various conferences [20, 21, 22, 23, 24]. † Supported by the Latvian Council of Science, grants No. 90.619 and 93.599. ‡ Supported in part by NSF Grants 9020079 and 9301339. Abstract: People tend not to have perfect memories when it comes... |

1 | Learning nested concept classes with limited storage - Heath, Kasif, et al. - 1991

Citation Context: ...mentioned work on learning functions was that restricting the data available to the inference machine also reduces its learning potential. A different approach to memory limited learning was investigated in [31]. The issue addressed in their work is to calculate how many passes through the data are needed in order to learn. In our model, the decision to retain data must be made when the data is first encount... |

1 | Inductive inference by iteratively working and consistent strategies with anomalies - Miyahara - 1987

Citation Context: ...allowed to remember the n most recent conjectures. As was the case for learning languages, it makes no difference how many previous hypotheses are remembered for learning functions; one is enough [42]. Furthermore, the hierarchies based on the number of admissible errors in the final answer, as discovered in [15], were shown to hold for a variety of types of iterative learning. The strict inclusion... |

1 | A note on iteratively working strategies in inductive inference - Miyahara - 1989

Citation Context: ...of iterative learning. The strict inclusion of the class of sets learnable by iteratively working strategies in the similar class for all strategies (EX) was shown to hold when anomalies are allowed [43]. Jantke and Beick [34] considered order independent iterative strategies and showed that Wiehagen's results hold with order restrictions removed. The conclusion reached in the above mentioned work on... |