## Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm (1988)

### Download Links

- [www.cs.utsa.edu]
- [ai.stanford.edu]
- [www.iua.upf.es]
- [www.cse.ucsc.edu]
- DBLP

### Other Repositories/Bibliography

Venue: Machine Learning

Citations: 672 (5 self)

### BibTeX

```bibtex
@ARTICLE{Littlestone88learningquickly,
  author  = {Nick Littlestone},
  title   = {Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm},
  journal = {Machine Learning},
  volume  = {2},
  year    = {1988},
  pages   = {285--318}
}
```

### Abstract

Keywords: learning Boolean functions, linear-threshold algorithms

Abstract. Valiant (1984) and others have studied the problem of learning various classes of Boolean functions from examples. Here we discuss incremental learning of these functions. We consider a setting in which the learner responds to each example according to a current hypothesis. Then the learner updates the hypothesis, if necessary, based on the correct classification of the example. One natural measure of the quality of learning in this setting is the number of mistakes the learner makes. For suitable classes of functions, learning algorithms are available that make a bounded number of mistakes, with the bound independent of the number of examples seen by the learner. We present one such algorithm that learns disjunctive Boolean functions, along with variants for learning other classes of Boolean functions. The basic method can be expressed as a linear-threshold algorithm. A primary advantage of this algorithm is that the number of mistakes grows only logarithmically with the number of irrelevant attributes in the examples. At the same time, the algorithm is computationally efficient in both time and space.
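
The update scheme the abstract describes (the algorithm now widely known as Winnow) can be sketched as follows. This is a minimal illustrative version for monotone disjunctions, with the common textbook parameter choices of promotion factor α = 2 and threshold θ = n; it is not necessarily the paper's exact formulation.

```python
def winnow(examples, n, alpha=2.0):
    """Winnow-style learner for monotone disjunctions over n Boolean
    attributes.  Weights start at 1 and the threshold is n.  On a
    false negative, weights of active attributes are promoted
    (multiplied by alpha); on a false positive they are demoted
    (set to 0 here, as in the simplest variant).  Returns the
    number of mistakes made on the example sequence."""
    w = [1.0] * n
    theta = float(n)
    mistakes = 0
    for x, label in examples:  # x: list of 0/1 values, label: 0/1
        pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0
        if pred != label:
            mistakes += 1
            for i in range(n):
                if x[i] == 1:
                    w[i] = w[i] * alpha if label == 1 else 0.0
    return mistakes
```

On data consistent with a monotone disjunction, a relevant attribute is never active in a negative example, so demotion-to-zero never eliminates it; only the promoted weights matter, and the mistake count grows only logarithmically with the total number of attributes, which is the paper's headline property.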

### Citations

3921 | Pattern Classification and Scene Analysis - Duda, Hart - 1973

Citation Context: ...ber of mistakes grows linearly with the number of irrelevant attributes. This is in keeping with theoretical bounds from the perceptron convergence theorems (Hampson & Volper, 1986; Duda & Hart, 1973; Nilsson, 1965). We know of no evidence that any other standard perceptron algorithm does better. In contrast, we will prove that the number of mistakes that our algorithm makes grows only logarithmi...

1695 | A Theory of the Learnable - Valiant - 1984

Citation Context: ... not knowing which few will prove useful. For another example, consider an environment in which the learner builds new concepts as Boolean functions of old concepts (Banerji, 1985; Valiant, 1984). Here the learner may need to sift through a large library of available concepts to find the suitable ones to use in expressing each new concept. In a special case of this situation, one may design ...

803 | Estimation of Dependencies Based on Empirical Data - Vapnik - 1979

Citation Context: ...ive a lower bound for opt(C) in terms of the Vapnik-Chervonenkis (Vapnik & Chervonenkis, 1971) dimension of C, which is a combinatorial parameter that has proven useful in other studies of learning (Vapnik, 1982; Blumer et al., 1987a; Haussler, Littlestone, & Warmuth, 1987). To define the Vapnik-Chervonenkis dimension, we use the notion of a shattered set. Definition 3: A set S ⊆ X is shattered by a target c...
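
The shattering definition quoted in this context admits a direct brute-force check, useful for small cases. In this illustrative sketch the class C is represented as a plain list of 0/1-valued predicates, an assumption made for the example:

```python
from itertools import combinations

def is_shattered(S, C):
    """Brute-force check of the shattering definition: S is shattered
    by the class C (a list of 0/1-valued predicates) if for every
    subset U of S some f in C is true exactly on U within S."""
    S = list(S)
    for r in range(len(S) + 1):
        for U in combinations(S, r):
            U = set(U)
            if not any(all((f(x) == 1) == (x in U) for x in S) for f in C):
                return False
    return True
```

With C a class of threshold functions on the integers, any single point is shattered but no two-point set is, illustrating a class of Vapnik-Chervonenkis dimension 1.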

647 | Queries and concept learning - Angluin - 1988

624 | Learnability and the Vapnik-Chervonenkis dimension - Blumer, Ehrenfeucht, et al. - 1989

Citation Context: ...und for opt(C) in terms of the Vapnik-Chervonenkis (Vapnik & Chervonenkis, 1971) dimension of C, which is a combinatorial parameter that has proven useful in other studies of learning (Vapnik, 1982; Blumer et al., 1987a; Haussler, Littlestone, & Warmuth, 1987). To define the Vapnik-Chervonenkis dimension, we use the notion of a shattered set. Definition 3: A set S ⊆ X is shattered by a target class C if for every U...

552 | Generalization as Search - Mitchell - 1982

471 | Parallel distributed processing: explorations in the microstructure of cognition - Rumelhart, McClelland - 1986

Citation Context: ...g an algorithm for learning k-DNF. Our main result is an algorithm that deals efficiently with large numbers of irrelevant attributes. If desired, it can be implemented within a neural net framework (Rumelhart & McClelland, 1986) as a simple linear-threshold algorithm. The method learns certain classes of functions that can be computed by a one-layer linear-threshold network; these include, among other functions, disjunctions...

306 | Inductive inference: theory and methods - Angluin, Smith - 1983

168 | On the learnability of Boolean formulae - Kearns, Li, et al. - 1987

Citation Context: ...es of functions that can be computed by a one-layer linear-threshold network; these include, among other functions, disjunctions, conjunctions, and r-of-k threshold functions (Hampson & Volper, 1986; Kearns, Li, Pitt, & Valiant, 1987a). (The latter functions are true if at least r out of k designated variables are true.) Preprocessing techniques can be used to extend the algorithm to classes of Boolean functions that are not line...
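
The r-of-k threshold functions mentioned in this context are linear-threshold by construction: weight 1 on each of the k designated attributes, weight 0 elsewhere, and threshold r. A small illustrative helper (names hypothetical):

```python
def r_of_k(r, designated):
    """An r-of-k threshold function in the linear-threshold form
    discussed above: true iff at least r of the designated
    attributes (given as a list of indices) are true."""
    def f(x):  # x: list of 0/1 attribute values
        return 1 if sum(x[i] for i in designated) >= r else 0
    return f
```

Setting r = 1 yields the disjunction of the designated variables and r = k their conjunction, which is why all three classes appear together in the passage.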

164 | Threshold Logic and Its Applications - Muroga - 1971

Citation Context: ...e r-of-k threshold functions are contained in F({0,1}^n, δ). There exist other classes of linearly-separable Boolean functions for which 1/δ grows exponentially with n when the instance space is {0,1}^n (Muroga, 1971; Hampson & Volper, 1986). One example of a set of functions with exponentially small δ consists of as n varies. For such functions, the mistake bound that we will derive grows exponentially with n. ...

150 | Learning Machines - Nilsson - 1965

Citation Context: ...ws linearly with the number of irrelevant attributes. This is in keeping with theoretical bounds from the perceptron convergence theorems (Hampson & Volper, 1986; Duda & Hart, 1973; Nilsson, 1965). We know of no evidence that any other standard perceptron algorithm does better. In contrast, we will prove that the number of mistakes that our algorithm makes grows only logarithmically with the ...

118 | Learning disjunction of conjunctions - Valiant - 1985

Citation Context: ...ibrary will just be Boolean functions themselves. For example, consider k-DNF, the class of Boolean functions that can be represented in disjunctive normal form with no more than k literals per term (Valiant, 1985). If one has available intermediate concepts that include all conjunctions of no more than k literals, then any k-DNF function can be represented as a simple disjunction of these concepts. We will re...
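
The preprocessing idea in this context (replace the original attributes by one feature per conjunction of at most k literals, so that any k-DNF target becomes a plain disjunction over the new features) can be sketched as follows; the function name and feature ordering are illustrative:

```python
from itertools import combinations, product

def conjunction_features(x, k):
    """Expand an example x (a list of 0/1 attribute values) into one
    0/1 feature per conjunction of at most k literals.  A k-DNF
    target over the original attributes is then a plain disjunction
    over these features, so a disjunction learner can be run on the
    expanded examples."""
    feats = []
    n = len(x)
    for r in range(1, k + 1):
        for idxs in combinations(range(n), r):
            for signs in product((0, 1), repeat=r):
                # literal i is x_i if its sign is 1, not-x_i if 0
                feats.append(int(all(x[i] == s for i, s in zip(idxs, signs))))
    return feats
```

The expanded examples can then be fed to a disjunction learner; the price is a feature count on the order of (2n)^k, which is exactly why a mistake bound that is only logarithmic in the number of attributes matters here.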

52 | Predicting {0, 1} functions on randomly drawn points - Haussler, Littlestone, et al.

Citation Context: ...s of the Vapnik-Chervonenkis (Vapnik & Chervonenkis, 1971) dimension of C, which is a combinatorial parameter that has proven useful in other studies of learning (Vapnik, 1982; Blumer et al., 1987a; Haussler, Littlestone, & Warmuth, 1987). To define the Vapnik-Chervonenkis dimension, we use the notion of a shattered set. Definition 3: A set S ⊆ X is shattered by a target class C if for every U ⊆ S there exists a function f ∈ C such t...

47 | On the prediction of general recursive functions - Barzdin, Freivald - 1972

34 | Linear function neurons: Structure and training - Hampson, Volper - 1986

Citation Context: ...hod learns certain classes of functions that can be computed by a one-layer linear-threshold network; these include, among other functions, disjunctions, conjunctions, and r-of-k threshold functions (Hampson & Volper, 1986; Kearns, Li, Pitt, & Valiant, 1987a). (The latter functions are true if at least r out of k designated variables are true.) Preprocessing techniques can be used to extend the algorithm to classes of ...

32 | Recent results on Boolean concept learning - Kearns, Li, et al. - 1987

Citation Context: ...es of functions that can be computed by a one-layer linear-threshold network; these include, among other functions, disjunctions, conjunctions, and r-of-k threshold functions (Hampson & Volper, 1986; Kearns, Li, Pitt, & Valiant, 1987a). (The latter functions are true if at least r out of k designated variables are true.) Preprocessing techniques can be used to extend the algorithm to classes of Boolean functions that are not line...

9 | Quantifying the inductive bias in concept learning - Haussler - 1986

2 | Space Efficient Learning Algorithms. Unpublished manuscript - Haussler - 1986

1 | The logic of learning: A basis for pattern recognition and for improvement of performance - Banerji - 1985

Citation Context: ... consideration, not knowing which few will prove useful. For another example, consider an environment in which the learner builds new concepts as Boolean functions of old concepts (Banerji, 1985; Valiant, 1984). Here the learner may need to sift through a large library of available concepts to find the suitable ones to use in expressing each new concept. In a special case of this situation, ...

1 | The programmer's guide to the Connection Machine (Technical Report) - Slade - 1987