## A neuroidal architecture for cognitive computation (2000)

Venue: Journal of the ACM

Citations: 35 (4 self)

### BibTeX

```bibtex
@ARTICLE{Valiant00aneuroidal,
  author  = {Leslie G. Valiant},
  title   = {A neuroidal architecture for cognitive computation},
  journal = {Journal of the ACM},
  year    = {2000},
  volume  = {47},
  pages   = {642--669}
}
```

### Abstract

An architecture is described for designing systems that acquire and manipulate large amounts of unsystematized, or so-called commonsense, knowledge. Its aim is to exploit to the full those aspects of computational learning that are known to offer powerful solutions in the acquisition and maintenance of robust knowledge bases. The architecture makes explicit the requirements on the basic computational tasks that are to be performed and is designed to make these computationally tractable even for very large databases. The main claims are that (i) the basic learning and deduction tasks are provably tractable and (ii) tractable learning offers viable approaches to a range of issues that have been previously identified as problematic for artificial intelligence systems that are programmed. Among the issues that learning offers to resolve are robustness to inconsistencies, robustness to incomplete information and resolving among alternatives. Attribute-efficient learning algorithms, which allow learning from few examples in large dimensional systems, are fundamental to the approach. Underpinning the overall architecture is a new principled approach to manipulating relations in learning systems. This approach, of independently quantified arguments, allows propositional learning algorithms to be applied systematically to learning relational concepts in polynomial time and in a modular fashion.

### Citations

7493 | Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference - Pearl - 1988 |

2072 | The magical number seven plus or minus two: Some limits on our capacity for processing information - Miller - 1956 |

1754 | A theory of the learnable - Valiant - 1984 |

Citation Context: ...e semantics we shall describe here, PAC circuit semantics, or PAC semantics for short, is based on the notion of computationally feasible learning of functions that are probably approximately correct [50]. To explain the contrast in viewpoints consider the situation calculus described in McCarthy and Hayes [30]. There, a situation is "the complete state of the world", and general facts are relations a...

1616 | Unified theories of cognition - Newell - 1990 |

Citation Context: ...one gives inductive learning a suitable central role. In this respect the proposed architecture differs from other general approaches to cognitive architectures that have been described, such as [1], [37], and [38], in which inductive learning plays a much smaller role. We note that our use of PAC semantics suggests a modified Turing test. His basic criterion for whether a machine could think was that...

1586 | Some philosophical problems from the standpoint of artificial intelligence - McCarthy, Hayes - 1969 |

Citation Context: ...n of computationally feasible learning of functions that are probably approximately correct [50]. To explain the contrast in viewpoints consider the situation calculus described in McCarthy and Hayes [30]. There, a situation is "the complete state of the world", and general facts are relations among situations. Thus P ⇒ Q means that for all situations for which P holds Q holds also. This is an all emb...

1204 | Human problem solving - Newell, Simon - 1972 |

Citation Context: ...resentations are insufficient. An area in which complex representations may need to be learned is that of strategies and plans. Here production systems are widely believed to have useful expressivity [38]. Khardon has shown that a rich class of these can be learned using decision list algorithms [13]. The dilemma here is that no attribute-efficient learning algorithm is known for decision lists, unles...

1117 | The architecture of cognition - Anderson - 1983 |

Citation Context: ...d if one gives inductive learning a suitable central role. In this respect the proposed architecture differs from other general approaches to cognitive architectures that have been described, such as [1], [37], and [38], in which inductive learning plays a much smaller role. We note that our use of PAC semantics suggests a modified Turing test. His basic criterion for whether a machine could think wa...

1105 | Principles of Database and Knowledgebase Systems, volume 2 - Ullman - 1989 |

891 | A framework for representing knowledge - Minsky - 1975 |

Citation Context: ...ful commonsense knowledge often need qualification -- they hold only in certain contexts. Thus a rule Q ⇒ R may hold if context P is true, but not necessarily otherwise. Frames in the sense of Minsky [33] can be viewed as contexts that have a rich set of associated rules. The PAC semantics of such a rule is that on the subdomain in which P holds, Q ⇒ R is the case, at least with high probability. In o...

865 | Circumscription—a form of non-monotonic reasoning - McCarthy - 1980 |

Citation Context: ... to take a generic view of how to treat incomplete information -- they need a uniform theory, such as circumscription or the closed world assumption that takes positions on how to resolve the unknown [10, 29, 31]. PAC circuit semantics offers an advantage here -- it resolves the unknown separately in each case by using information learned from past experience of cases where similar features to the case in han...

830 | Computing Machinery and Intelligence - Turing - 1950 |

Citation Context: ...test. His basic criterion for whether a machine could think was that the performance of the machine should be indistinguishable from that of a human to an interrogator communicating via a teleprinter [47]. The significance of this informally stated criterion is that it is a purely behavioral one. What PAC semantics offers is a precise way of formulating such behavioral criteria. In particular it insis...

687 | Learning quickly when irrelevant attributes abound: A new linear threshold algorithm - Littlestone - 1988 |

Citation Context: ...or how the circuits can change in the process of knowledge acquisition. For the gates, linear threshold units in our case, some update rules are given, such as the perceptron algorithm [42] or Winnow [26], that specify the supervised inductive learning process. In addition some further rules are given to allow the acquisition of programmed knowledge. In particular new output gates may be added and the...
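The Winnow rule cited here admits a compact sketch. The following is a minimal illustration of Littlestone's Winnow1 (promotion by doubling, demotion by elimination) on a toy monotone disjunction, not the paper's neuroidal implementation; the threshold choice and example stream are illustrative.

```python
def winnow1(examples, n, alpha=2.0):
    """Winnow1 sketch (after Littlestone, 1988): a linear threshold unit
    with multiplicative weight updates. A mistake on a positive example
    doubles the weights of the active attributes; a mistake on a negative
    example eliminates them (sets the weights to zero)."""
    theta = n / 2.0                 # a standard threshold choice
    w = [1.0] * n
    mistakes = 0
    for x, label in examples:
        predicted = 1 if sum(w[i] for i in range(n) if x[i]) >= theta else 0
        if predicted != label:
            mistakes += 1
            for i in range(n):
                if x[i]:
                    w[i] = w[i] * alpha if label == 1 else 0.0
    return w, mistakes

# Learn the disjunction x_0 OR x_1 over n = 64 attributes from unit vectors.
n = 64
stream = []
for i in range(n):
    x = [0] * n
    x[i] = 1
    stream.append((x, 1 if i < 2 else 0))
w, mistakes = winnow1(stream * 6, n)   # six passes suffice in this toy run
```

The mistake count here is driven only by the two relevant attributes (five promotions each before their weights reach the threshold); the 62 irrelevant attributes never cause an update, which is the attribute-efficiency the surrounding text describes.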

616 | An Introduction to Computational Learning Theory - Kearns, Vazirani - 1994 |

392 | Learning Decision Lists - Rivest - 1987 |

Citation Context: ...last case we would have Σ_{i=1}^{n} z_i ≤ k where z_i is a variable over the reals that is given value 1 if x_i = 1, and zero otherwise. A further class that can be so expressed is that of 1-decision lists [41]. These test for a sequence of literals y_{i_1}, …, y_{i_m} where y_{i_j} ∈ {x_1, …, x_n, ¬x_1, …, ¬x_n}. If the literal y_{i_j} is true then...
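A 1-decision list of the kind described is evaluated by scanning for the first satisfied literal and emitting that rule's output. A minimal sketch (the tuple representation and names are my own, not the paper's):

```python
def eval_decision_list(rules, default, x):
    """Evaluate a 1-decision list (Rivest, 1987): rules is a sequence of
    ((index, polarity), output) pairs; the first literal y_{i_j} satisfied
    by x determines the output, otherwise the default applies."""
    for (i, polarity), out in rules:
        if x[i] == polarity:        # literal x_i (polarity 1) or ¬x_i (polarity 0)
            return out
    return default

# "if ¬x_2 then 1, else if x_0 then 0, else 1"
rules = [((2, 0), 1), ((0, 1), 0)]
```

For example, with the list above, an input with x_2 = 1 falls through the first rule and is decided by the second.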

349 | Programs with Common Sense - McCarthy - 1968 |

323 | How to use expert advice - Cesa-Bianchi, Freund, et al. - 1997 |

315 | Principles of Neurodynamics - Rosenblatt - 1962 |

Citation Context: ...her specified for how the circuits can change in the process of knowledge acquisition. For the gates, linear threshold units in our case, some update rules are given, such as the perceptron algorithm [42] or Winnow [26], that specify the supervised inductive learning process. In addition some further rules are given to allow the acquisition of programmed knowledge. In particular new output gates may b...
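In contrast to Winnow's multiplicative updates, the perceptron rule named in this context is additive. A minimal single-step sketch, assuming ±1 labels (function and parameter names are illustrative):

```python
def perceptron_update(w, b, x, label, lr=1.0):
    """One step of the perceptron rule (after Rosenblatt): on a mistake,
    shift the weights and bias toward the misclassified example.
    label is +1 or -1; returns the updated (w, b) and a mistake flag."""
    activation = sum(wi * xi for wi, xi in zip(w, x)) + b
    predicted = 1 if activation >= 0 else -1
    if predicted != label:
        w = [wi + lr * label * xi for wi, xi in zip(w, x)]
        b = b + lr * label
        return w, b, True
    return w, b, False
```

Iterating this update over an example stream yields the supervised learning process the context describes for a single linear threshold gate.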

314 | Quantitative results concerning the utility of explanation-based learning - Minton - 1990 |

301 | Efficient noise-tolerant learning from statistical queries - Kearns - 1999 |

Citation Context: ...classification of the examples and this is random [3]. At least for the more benign models there are some powerful general techniques for making learning algorithms cope with noise in some generality [16]. For the problem of learning linear separators there exist theoretical results that show that there is no fundamental computational impediment to overcoming random classification noise [5, 8]. Curren...

280 | An analysis of first-order logics of probability - Halpern - 1990 |

238 | Non-monotonic logic - McDermott, Doyle - 1980 |

Citation Context: ... to take a generic view of how to treat incomplete information -- they need a uniform theory, such as circumscription or the closed world assumption that takes positions on how to resolve the unknown [10, 29, 31]. PAC circuit semantics offers an advantage here -- it resolves the unknown separately in each case by using information learned from past experience of cases where similar features to the case in han...

230 | Quantifying Inductive Bias: AI Learning Algorithms and Valiant’s Learning Framework - Haussler - 1989 |

Citation Context: ...l situation from dimensionality arguments, but much more slowly, sometimes logarithmically. The phenomenon of attribute-efficient learning in the PAC sense was first pointed out by Haussler [13]. A striking and remarkable embodiment of this idea followed in the form of Littlestone's Winnow algorithm [26] for learning linear threshold functions. The algorithm is similar in form to the classic...

228 | Learning from Noisy Examples - Angluin, Laird - 1988 |

Citation Context: ...se. In the case of inductive learning the issue of noise has been studied extensively. On the theoretical side a range of noise models have been considered, ranging from a malicious adversarial model [3] to the more benign random classification noise model, where the only noise is in the classification of the examples and this is random [3]. At least for the more benign models there are some powerful...

196 | A general lower bound on the number of examples needed for learning - Ehrenfeucht, Hanssler, et al. - 1989 |

192 | Computational limitations on learning from examples - Pitt, Valiant - 1988 |

Citation Context: ...hreshold functions makes the original class polynomial time learnable, while restricting the learner to the minimal representation needed for expressing these functions would make the task NP-complete [40]. Our richer representation, therefore, has not only the obvious advantage of being able to express more, but has the additional computational benefits of making Boolean domains potentially easier to ...

169 | Learning in the presence of malicious errors - KEARNS, LI - 1993 |

169 | Learning to resolve natural language ambiguities: a unified approach - Roth - 1998 |

157 | CYC: Toward Programs with Common Sense - Lenat, Guha, et al. - 1990 |

Citation Context: ...owledge base starting from a blank slate. Facts and reasoning about the most universal concepts such as time and space are then formalized in the same framework as is more specialized knowledge (e.g. [25]). In the neuroidal framework there is room reserved for treating the universal concepts differently from the others. In particular, the features and inverse features of the image units can be used to...

130 | A SNoW-based face detector - MH, Roth, et al. - 2000 |

123 | The Art of Computer Programming, volume 1 - Knuth - 1968 |

Citation Context: ... the gate evaluate to one. To do this we shall, for simplicity, impose here the constraint that aggregates of circuit units are acyclic. We can then form a topological sort of their nodes (e.g. Knuth [23]) and for one such topologically sorted order evaluate each node in succession. The evaluation of each node is for all the α^d bindings θ, where d ≤ α is the arity of the relation R at that node...
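The evaluation scheme this context describes — topologically sort the acyclic aggregate of circuit units, then evaluate each node in succession — can be sketched with Kahn's algorithm, one of the standard topological-sort methods Knuth covers. Node names and the edge representation are illustrative:

```python
from collections import deque

def topological_order(nodes, edges):
    """Kahn's algorithm: return a topological sort of an acyclic circuit
    graph, where edges maps each node to the list of its successors.
    Evaluating nodes in this order guarantees every node's inputs are
    already computed when it is reached."""
    indegree = {v: 0 for v in nodes}
    for u in edges:
        for v in edges[u]:
            indegree[v] += 1
    queue = deque(v for v in nodes if indegree[v] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in edges.get(u, []):
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    if len(order) != len(nodes):
        raise ValueError("graph has a cycle")  # acyclicity assumption violated
    return order
```

Per the context, the per-node work then consists of evaluating the node for each of its argument bindings, which is where the arity bound enters the complexity analysis.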

117 | Learning Disjunctions of Conjunctions - Valiant - 1985 |

Citation Context: ...ons of all pairs of them. Another choice is to create conjunctions of just those pairs that occur with high correlation. More generally one can generate some set of polynomially generable combinations [51]. The intention is that large numbers of variables, even if most are irrelevant, will not degrade performance in the presence of attribute-efficient learning algorithms. We are suggesting that the way...
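The simplest of the generation policies mentioned — conjunctions of all pairs of variables — is a quadratic feature expansion. A sketch (the paper's actual generation policy may differ, and correlation-based filtering is omitted):

```python
from itertools import combinations

def expand_with_pair_conjunctions(x):
    """Augment a Boolean example with the conjunctions of all pairs of
    its variables. The blow-up from n to n + n(n-1)/2 features is
    tolerable precisely because attribute-efficient learners degrade
    only logarithmically with the number of irrelevant attributes."""
    pairs = [xi & xj for xi, xj in combinations(x, 2)]
    return list(x) + pairs
```

Feeding the expanded examples to a linear threshold learner such as Winnow lets it express disjunctions of pairwise conjunctions while still learning from few examples.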

92 | A Winnow-based approach to context-sensitive spelling correction - Golding, Roth - 1999 |

90 | From on-line to batch learning - Littlestone - 1989 |

Citation Context: ...nction over {x_1, …, x_n} the number of examples needed for convergence, whether in the PAC or mistake-bounded sense, is upper bounded by ck log_2 n, where c is a small constant [26, 27]. Thus the sample complexity is linear in k, the number of relevant variables, and logarithmic in the number of irrelevant ones. Littlestone's Theorem 9 [26] adapted to the case when coefficients can ...

86 | Circuits of the Mind - Valiant - 1994 |

Citation Context: ...be a candidate for such an architecture. This candidate emerged from a study that attempted to look at the issues of functional adequacy, computational feasibility and biological constraints together [52]. We call the architecture neuroidal since it respects the most basic constraints imposed by that model of neural computation. One feature of that study was that it was a "whole systems" study. It add...

78 | Applying winnow to context-sensitive spelling correction - Golding, Roth - 1996 |

Citation Context: ...tem. The effectiveness of Winnow itself has been demonstrated in a variety of experiments. A striking example in the cognitive domain is offered in the work of Golding and Roth on spelling correction [11]. Even in the presence of tens of thousands of variables, Winnow is able to learn accurately from few examples, sometimes fewer than 100. The question arises whether attribute-efficient learning is po...

78 | Learning conjunctive concepts in structural domains - Haussler - 1989 |

75 | Learning boolean functions in an infinite attribute space - Blum - 1992 |

Citation Context: ...set (x_1, …, x_n, 1−x_1, …, 1−x_n) makes at most the following number of mistakes:

(16/δ²)(n/θ) + (5/δ + (14 ln θ)/δ²) Σ_{i=1}^{n} (s_i + s̄_i).   (4)

Here θ and δ are parameters of the algorithm and δ, which quantifies the margin by which positive and negative examples are separated, is a parameter of the distribution of examples. For a monoto...

61 | The problem of expensive chunks and its solution by restricting expressiveness - Tambe, Newell, et al. - 1990 |

Citation Context: ...ds to be addressed in a different way here because of the circuit orientation. An overriding concern for us throughout is, of course, that the complexity of manipulating relations be controlled (c.f. [46]). Suppose we have a connection R_1(x_1, …, x_n) → R_2(y_1, …, y_k) and an associated connection binding that specifies x_1 = y_2 and x_2 = y_3. What doe...
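The connection binding in the example (x_1 = y_2, x_2 = y_3) can be read as pulling an argument tuple for R_2 back into the constrained positions of R_1. A toy sketch with hypothetical names (the paper's mechanism operates on neuroidal circuit units, not Python tuples):

```python
def pull_back_arguments(n, binding_map, y):
    """Connection-binding sketch: for a connection R1(x_1..x_n) -> R2(y_1..y_k)
    with binding_map = {i: j} meaning x_i = y_j (1-indexed, as in the text),
    compute R1's argument tuple from a concrete argument tuple y for R2.
    Positions of R1 not mentioned in the binding stay free (None)."""
    x = [None] * n
    for i, j in binding_map.items():
        x[i - 1] = y[j - 1]
    return x

# x_1 = y_2 and x_2 = y_3, as in the text's example, with R1 of arity 3:
args = pull_back_arguments(3, {1: 2, 2: 3}, ("a", "b", "c"))
```

The free positions are exactly where the independently quantified arguments of the architecture come into play; this sketch shows only the bookkeeping, not the quantification semantics.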

57 | Learning to reason - Khardon, Roth - 1997 |

57 | The perceptron algorithm vs. winnow: Linear vs. logarithmic mistake bounds when few input variables are relevant - Kivinen, Warmuth, et al. - 1997 |

53 | Learning to take actions - Khardon - 1999 |

Citation Context: ...hat is attribute efficient in a weaker sense. Consider a sequential covering algorithm, as in Rivest [41], for learning decision lists, or Khardon's extension to propositional production rule systems [19]. In the simplest case such a covering algorithm works as follows: it looks successively for a literal, say ¬x_3, that is the most predictive single literal, in some sense, of the function f being ...
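The sequential covering scheme this context begins to describe — repeatedly find a predictive literal, emit it as a rule, and discard the examples it covers — can be illustrated for 1-decision lists. This greedy sketch uses exact purity (all covered examples share one label) in place of the "most predictive in some sense" criterion the text leaves open:

```python
def learn_1_decision_list(examples, n):
    """Sequential covering sketch (after Rivest, 1987): repeatedly pick a
    literal that is pure on the remaining examples, append it as a rule
    with the covered examples' common label, and discard those examples."""
    rules = []
    remaining = list(examples)
    literals = [(i, p) for i in range(n) for p in (0, 1)]
    while remaining:
        progress = False
        for i, p in literals:
            covered = [(x, y) for x, y in remaining if x[i] == p]
            labels = {y for _, y in covered}
            if covered and len(labels) == 1:
                rules.append(((i, p), labels.pop()))
                remaining = [(x, y) for x, y in remaining if x[i] != p]
                progress = True
                break
        if not progress:
            raise ValueError("no consistent 1-decision list found")
    return rules

# Target "if x_0 then 1, else if not x_1 then 1, else 0", all four inputs:
examples = [((0, 0), 1), ((0, 1), 0), ((1, 0), 1), ((1, 1), 1)]
rules = learn_1_decision_list(examples, 2)
```

Because each example is removed only when the first rule it satisfies is emitted, the resulting list is consistent with the training data whenever the loop terminates.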

47 | Computing machinery and intelligence, Mind 59 - Turing - 1950 |

43 | Readings in Nonmonotonic Reasoning - Ginsberg - 1987 |

Citation Context: ... to take a generic view of how to treat incomplete information -- they need a uniform theory, such as circumscription or the closed world assumption that takes positions on how to resolve the unknown [10, 29, 31]. PAC circuit semantics offers an advantage here -- it resolves the unknown separately in each case by using information learned from past experience of cases where similar features to the case in han...

37 | Production system models of learning and development - Klahr, Langley, et al. - 1987 |

34 | Learning Noisy Perceptrons by a Perceptron in Polynomial Time - Cohen - 1997 |

Citation Context: ...enerality [16]. For the problem of learning linear separators there exist theoretical results that show that there is no fundamental computational impediment to overcoming random classification noise [5, 8]. Currently somewhat complex algorithms are needed to establish this rigorously. In practice, fortunately, natural algorithms such as the perceptron algorithm and Winnow, or the linear discriminant al...

33 | Learning Linear Threshold Functions in the Presence of Classification Noise - BYLANDER - 1994 |

31 | Inductive Logic Programming: derivations, successes and shortcomings - Muggleton - 1994 |

31 | Robust logics - Valiant - 2000 |

30 | Artificial intelligence, logic, and formalizing common sense - McCarthy - 1989 |

29 | Learning to reason with a restricted view - Khardon, Roth - 1995 |