Operations for Learning with Graphical Models
 Journal of Artificial Intelligence Research
, 1994
This paper is a multidisciplinary review of empirical, statistical learning from a graphical model perspective. Well-known examples of graphical models include Bayesian networks, directed graphs representing a Markov chain, and undirected networks representing a Markov field. These graphical models are extended to model data analysis and empirical learning using the notation of plates. Graphical operations for simplifying and manipulating a problem are provided including decomposition, differentiation, and the manipulation of probability models from the exponential family. Two standard algorithm schemas for learning are reviewed in a graphical framework: Gibbs sampling and the expectation maximization algorithm. Using these operations and schemas, some popular algorithms can be synthesized from their graphical specification. This includes versions of linear regression, techniques for feedforward networks, and learning Gaussian and discrete Bayesian networks from data. The paper conclu...
A neuropsychological theory of multiple systems in category learning
 PSYCHOLOGICAL REVIEW
, 1998
A neuropsychological theory is proposed that assumes category learning is a competition between separate verbal and implicit (i.e., procedural-learning-based) categorization systems. The theory assumes that the caudate nucleus is an important component of the implicit system and that the anterior cingulate and prefrontal cortices are critical to the verbal system. In addition to making predictions for normal human adults, the theory makes specific predictions for children, elderly people, and patients suffering from Parkinson's disease, Huntington's disease, major depression, amnesia, or lesions of the prefrontal cortex. Two separate formal descriptions of the theory are also provided. One describes trial-by-trial learning, and the other describes global dynamics. The theory is tested on published neuropsychological data and on category learning data with normal adults.
Deictic Codes for the Embodiment of Cognition
 Behavioral and Brain Sciences
, 1995
To describe phenomena that occur at different time scales, computational models of the brain must necessarily incorporate different levels of abstraction. We argue that at time scales of approximately one-third of a second, orienting movements of the body play a crucial role in cognition and form a useful computational level, termed the embodiment level. At this level, the constraints of the body determine the nature of cognitive operations, since the natural sequentiality of body movements can be matched to the natural computational economies of sequential decision systems. The way this is done is through a system of implicit reference termed deictic, whereby pointing movements are used to bind objects in the world to cognitive programs. We show how deictic bindings enable the solution of natural tasks and argue that one of the central features of cognition, working memory, can be related to moment-by-moment dispositions of body features such as eye movements and hand movements. Keyw...
Computational Models of Sensorimotor Integration
 SCIENCE
, 1997
The sensorimotor integration system can be viewed as an observer attempting to estimate its own state and the state of the environment by integrating multiple sources of information. We describe a computational framework capturing this notion, and some specific models of integration and adaptation that result from it. Psychophysical results from two sensorimotor systems, subserving the integration and adaptation of visuo-auditory maps, and estimation of the state of the hand during arm movements, are presented and analyzed within this framework. These results suggest that: (1) Spatial information from visual and auditory systems is integrated so as to reduce the variance in localization. (2) The effects of a remapping in the relation between visual and auditory space can be predicted from a simple learning rule. (3) The temporal propagation of errors in estimating the hand's state is captured by a linear dynamic observer, providing evidence for the existence of an intern...
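Result (1) above, integration that reduces localization variance, can be illustrated with the textbook minimum-variance combination of two cues. This is a minimal sketch that assumes the visual and auditory estimates are independent and unbiased with known variances; the function name `combine_cues` and its parameters are hypothetical and are not taken from the paper, whose own model may differ.

```python
def combine_cues(x_visual, var_visual, x_auditory, var_auditory):
    """Minimum-variance linear combination of two independent, unbiased cues.

    Assumption (not from the abstract): each cue is a noisy location estimate
    with a known, strictly positive variance.
    """
    if var_visual <= 0 or var_auditory <= 0:
        raise ValueError("variances must be strictly positive")
    # Each cue is weighted by the other cue's share of the total variance,
    # which is equivalent to weighting by inverse variance.
    w_visual = var_auditory / (var_visual + var_auditory)
    w_auditory = 1.0 - w_visual
    estimate = w_visual * x_visual + w_auditory * x_auditory
    # The combined variance is never larger than either input variance.
    variance = (var_visual * var_auditory) / (var_visual + var_auditory)
    return estimate, variance
```

Under these assumptions, two equally reliable cues are averaged and the variance is halved; a much noisier cue receives a correspondingly small weight.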
Removing the Genetics from the Standard Genetic Algorithm
, 1995
We present an abstraction of the genetic algorithm (GA), termed population-based incremental learning (PBIL), that explicitly maintains the statistics contained in a GA's population, but which abstracts away the crossover operator and redefines the role of the population. This results in PBIL being simpler, both computationally and theoretically, than the GA. Empirical results reported elsewhere show that PBIL is faster and more effective than the GA on a large set of commonly used benchmark problems. Here we present results on a problem custom designed to benefit both from the GA's crossover operator and from its use of a population. The results show that PBIL performs as well as, or better than, GAs carefully tuned to do well on this problem. This suggests that even on problems custom designed for GAs, much of the power of the GA may derive from the statistics maintained implicitly in its population, and not from the population itself nor from the crossover operator. Removing the Ge...
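The idea of replacing the population with explicit statistics can be sketched as follows. This is a simplified illustration for bit-string problems, not the authors' exact procedure: it keeps one probability per bit, samples candidates from it, and shifts the probabilities toward the best sample. The function name `pbil`, its parameters, and the default values are hypothetical, and refinements such as mutation of the probability vector are omitted.

```python
import random


def pbil(fitness, n_bits, pop_size=50, learning_rate=0.1, generations=100, seed=0):
    """Simplified PBIL sketch: maximize `fitness` over bit lists of length n_bits."""
    rng = random.Random(seed)
    # The probability vector stands in for the GA population: probs[i] is the
    # probability that bit i is 1 in a sampled candidate.
    probs = [0.5] * n_bits
    best, best_fit = None, float("-inf")
    for _ in range(generations):
        # Sample a temporary population from the current statistics.
        samples = [
            [1 if rng.random() < p else 0 for p in probs]
            for _ in range(pop_size)
        ]
        winner = max(samples, key=fitness)
        winner_fit = fitness(winner)
        if winner_fit > best_fit:
            best, best_fit = winner, winner_fit
        # Move each probability a small step toward the winner's bit value.
        probs = [
            (1.0 - learning_rate) * p + learning_rate * bit
            for p, bit in zip(probs, winner)
        ]
    return best, best_fit, probs
```

For example, `pbil(sum, n_bits=20)` runs the sketch on the count-the-ones problem, where the fitness of a candidate is simply the number of 1 bits. No crossover operator appears anywhere; the only state carried between generations is the probability vector.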
The Sample Complexity of Pattern Classification With Neural Networks: The Size of the Weights is More Important Than the Size of the Network
, 1997
Sample complexity results from computational learning theory, when applied to neural network learning for pattern classification problems, suggest that for good generalization performance the number of training examples should grow at least linearly with the number of adjustable parameters in the network. Results in this paper show that if a large neural network is used for a pattern classification problem and the learning algorithm finds a network with small weights that has small squared error on the training patterns, then the generalization performance depends on the size of the weights rather than the number of weights. For example, consider a two-layer feedforward network of sigmoid units, in which the sum of the magnitudes of the weights associated with each unit is bounded by $A$ and the input dimension is $n$. We show that the misclassification probability is no more than a certain error estimate (that is related to squared error on the training set) plus $A^3 \sqrt{(\log n)/m}$ (ignori...
A Guide to the Literature on Learning Probabilistic Networks From Data
, 1996
This literature review discusses different methods under the general rubric of learning Bayesian networks from data, and includes some overlapping work on more general probabilistic networks. Connections are drawn between the statistical, neural network, and uncertainty communities, and between the different methodological communities, such as Bayesian, description length, and classical statistics. Basic concepts for learning and Bayesian networks are introduced and methods are then reviewed. Methods are discussed for learning parameters of a probabilistic network, for learning the structure, and for learning hidden variables. The presentation avoids formal definitions and theorems, as these are plentiful in the literature, and instead illustrates key concepts with simplified examples. Keywords Bayesian networks, graphical models, hidden variables, learning, learning structure, probabilistic networks, knowledge discovery. I. Introduction Probabilistic networks or probabilistic gra...
Simulated annealing: Practice versus theory
 Mathl. Comput. Modelling
, 1993
this paper "ergodic" is used in a very weak sense, as it is not proposed, theoretically or practically, that all states of the system are actually to be visited
Data Mining using MLC++: A Machine Learning Library in C++
 INTERNATIONAL JOURNAL ON ARTIFICIAL INTELLIGENCE TOOLS
, 1997
Data mining algorithms, including machine learning, statistical analysis, and pattern recognition techniques, can greatly improve our understanding of data warehouses that are now becoming more widespread. In this paper, we focus on classification algorithms and review the need for multiple classification algorithms. We describe a system called MLC++, which was designed to help choose the appropriate classification algorithm for a given dataset by making it easy to compare the utility of different algorithms on a specific dataset of interest. MLC++ not only provides a workbench for such comparisons, but also provides a library of C++ classes to aid in the development of new algorithms, especially hybrid algorithms and multistrategy algorithms. Such algorithms are generally hard to code from scratch. We discuss design issues, interfaces to other programs, and visualization of the resulting classifiers. 1 Introduction Data warehouses containing massive amounts of data have been b...
A Review of Evolutionary Artificial Neural Networks
, 1993
Research on potential interactions between connectionist learning systems, i.e., artificial neural networks (ANNs), and evolutionary search procedures, like genetic algorithms (GAs), has attracted a lot of attention recently. Evolutionary ANNs (EANNs) can be considered as the combination of ANNs and evolutionary search procedures. This paper first distinguishes among three kinds of evolution in EANNs, i.e., the evolution of connection weights, of architectures and of learning rules. Then it reviews each kind of evolution in detail and analyses critical issues related to different evolutions. The review shows that although a lot of work has been done on the evolution of connection weights and of architectures, few attempts have been made to understand the evolution of learning rules. Interactions among different evolutions are seldom mentioned in current research. However, the evolution of learning rules and its interactions with other kinds of evolution play a vital role in EANNs. As t...