Results 11-20 of 75
Why the Logistic Function? A tutorial discussion on probabilities and neural networks
 Massachusetts Institute of Technology
, 1995
Abstract

Cited by 55 (0 self)
This paper presents a tutorial introduction to the logistic function as a statistical object. Beyond the discussion of the whys and wherefores of the logistic function, I also hope to illuminate the general distinction between the "generative/causal/class-conditional" and the "discriminative/diagnostic/predictive" directions for the modeling of data. Crudely put, the belief network community has tended to focus on the former while the neural network community has tended to focus on the latter (although there are numerous papers in both communities going against their respective grains). It is the author's view that these two directions are two sides of the same coin, a corollary of which is that the two network-based communities are in closer contact than one might otherwise think. To illustrate some of the issues involved, I discuss the simplest nonlinear neural network: a logistic function of a linear combination of the input variables (also known in statistics as logistic regression). The logistic function has had a lengthy history in classical statistics and in neural networks. In statistics it plays a leading role in the methodology of logistic regression, where it makes an important contribution to the literature on classification. The logistic function has also appeared in many guises in neural network research. In early work, in which continuous-time formalisms tended to dominate, it was justified via its being the solution to a particular differential equation. In later work, with the emphasis on discrete time, it was generally used more heuristically as one of the many possible smooth, monotonic "squashing" functions that map real values into a bounded interval. More recently, however, with the increasing focus on learning, the probabilistic properties of the logistic function have begun to ...
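As a minimal illustrative sketch of the object this abstract discusses (not code from the paper), a logistic regression is simply the logistic "squashing" function applied to a linear combination of the inputs; the weights, bias, and input values below are hypothetical:

```python
import math

def sigmoid(z):
    # Logistic function: maps any real z into the bounded interval (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, x):
    # Logistic regression: a logistic function of a linear combination
    # of the input variables; the result can be read as P(class = 1 | x).
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(z)

# Hypothetical weights and input, for illustration only.
p = predict([1.0, -2.0], 0.5, [0.3, 0.1])
```

Read generatively, the same sigmoid arises as the posterior class probability under Gaussian class-conditional densities with shared covariance, which is one way the "two sides of the same coin" view can be made concrete.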
Doing without schema hierarchies: A recurrent connectionist approach to normal and impaired routine sequential action
 Psychological Review
, 2004
Abstract

Cited by 54 (11 self)
In everyday tasks, selecting actions in the proper sequence requires a continuously updated representation of temporal context. Many existing models address this problem by positing a hierarchy of processing units, mirroring the roughly hierarchical structure of naturalistic tasks themselves. Such an approach has led to a number of difficulties, including a reliance on overly rigid sequencing mechanisms, an inability to account for context sensitivity in behavior, and a failure to address learning. We consider here an alternative framework, according to which the representation of temporal context is facilitated by recurrent connections within a network mapping from environmental inputs to actions. Applying this approach to a specific, and in many ways prototypical, everyday task (coffee-making), we examine its ability to account for several central characteristics of normal and impaired human performance. The model we consider learns to deal flexibly with a complex set of sequencing constraints, encoding contextual information at multiple timescales within a single, distributed internal representation. Mildly degrading this context representation leads ...
Individual and Developmental Differences in Semantic Priming: Empirical and Computational Support for a Single-Mechanism Account of Lexical Processing
, 2000
Abstract

Cited by 47 (10 self)
the properties of distributed network models, and support this account by demonstrating that an implemented simulation closely approximates the empirical findings despite the absence of expectancy-based processes and post-lexical semantic matching. The results suggest that distributed network models can provide a viable single-mechanism account of lexical processing. Introduction: It is well-established that people are faster and more accurate to read a word (e.g., BUTTER) when it is preceded by a related word (e.g., BREAD) compared with when it is preceded by an unrelated word (e.g., DOCTOR). The research was supported by an NIMH FIRST award (MH55628) to the first author and by NIMH Training Grant 5T32MH19102 and NICHD Grant 80258. The computational simulation was run using customized software written within the Xerion simulator (version 3.1) developed by Drew van Camp, Tony Plate, and Geoff Hinton at the Univers ...
Neural networks for classification: a survey
 and Cybernetics  Part C: Applications and Reviews
, 2000
Abstract

Cited by 45 (0 self)
Abstract—Classification is one of the most active research and application areas of neural networks. The literature is vast and growing. This paper summarizes some of the most important developments in neural network classification research. Specifically, the issues of posterior probability estimation, the link between neural and conventional classifiers, the learning and generalization tradeoff in classification, and feature variable selection, as well as the effect of misclassification costs, are examined. Our purpose is to provide a synthesis of the published research in this area and to stimulate further research interest and effort in the identified topics. Index Terms—Bayesian classifier, classification, ensemble methods, feature variable selection, learning and generalization, misclassification costs, neural networks.
Predicting protein disorder for N, C, and internal regions
 Genome Informatics
, 1999
Abstract

Cited by 33 (6 self)
Logistic regression (LR), discriminant analysis (DA), and neural networks (NN) were used to predict ordered and disordered regions in proteins. Training data were from a set of non-redundant X-ray crystal structures, with the data being partitioned into N-terminal (N), C-terminal (C), and internal (I) regions. The DA and LR methods gave almost identical 5-fold cross-validation accuracies that averaged to the following values: 75.9 ± 3.1% (N-regions), 70.7 ± 1.5% (I-regions), and 74.6 ± 4.4% (C-regions). NN predictions gave slightly higher scores: 78.8 ± 1.2% (N-regions), 72.5 ± 1.2% (I-regions), and 75.3 ± 3.3% (C-regions). Predictions improved with the length of the disordered regions. Averaged over the three methods, values ranged from 52% to 78% for length = 9-14 to ≥ 21, respectively, for I-regions, from 72% to 81% for length = 5 to 12-15, respectively, for N-regions, and from 70% to 80% for length = 5 to 12-15, respectively, for C-regions. These data support the hypothesis that disorder is encoded by the amino acid sequence.
An Attractor Model of Lexical Conceptual Processing: Simulating Semantic Priming
 COGNITIVE SCIENCE
, 1999
Computing Second Derivatives in Feed-Forward Networks: a Review
 IEEE Transactions on Neural Networks
, 1994
Abstract

Cited by 27 (4 self)
The calculation of second derivatives is required by recent training and analysis techniques for connectionist networks, such as the elimination of superfluous weights and the estimation of confidence intervals both for weights and network outputs. We here review and develop exact and approximate algorithms for calculating second derivatives. For networks with |w| weights, simply writing the full matrix of second derivatives requires O(|w|^2) operations. For networks of radial basis units or sigmoid units, exact calculation of the necessary intermediate terms requires on the order of 2h + 2 backward/forward-propagation passes, where h is the number of hidden units in the network. We also review and compare three approximations (ignoring some components of the second derivative, numerical differentiation, and scoring). Our algorithms apply to arbitrary activation functions, networks, and error functions (for instance, with connections that skip layers, or radial basis functions, or ...)
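One of the approximations the abstract names, numerical differentiation, can be sketched generically (this is an illustrative central-difference scheme, not the paper's algorithm): each of the O(|w|^2) entries of the second-derivative matrix is estimated from four perturbed evaluations of the error function. The two-weight error function below is hypothetical.

```python
def hessian(f, w, eps=1e-5):
    # Approximate the |w| x |w| matrix of second derivatives of f at w
    # by central finite differences; filling it costs O(|w|^2) evaluations.
    n = len(w)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            wpp = list(w); wpp[i] += eps; wpp[j] += eps
            wpm = list(w); wpm[i] += eps; wpm[j] -= eps
            wmp = list(w); wmp[i] -= eps; wmp[j] += eps
            wmm = list(w); wmm[i] -= eps; wmm[j] -= eps
            H[i][j] = (f(wpp) - f(wpm) - f(wmp) + f(wmm)) / (4 * eps * eps)
    return H

# Hypothetical error function of two weights: E(w) = w0^2 + 3*w0*w1,
# whose exact Hessian is [[2, 3], [3, 0]].
E = lambda w: w[0] ** 2 + 3 * w[0] * w[1]
H = hessian(E, [1.0, 2.0])
```

The tradeoff the review examines is visible even here: the scheme needs no extra propagation machinery, but its cost grows quadratically in |w| and its accuracy depends on the step size eps.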
Locally Bayesian Learning with Applications to Retrospective Revaluation and Highlighting
 Psychological Review
, 2006
Abstract

Cited by 26 (7 self)
A scheme is described for locally Bayesian parameter updating in models structured as successions of component functions. The essential idea is to backpropagate the target data to interior modules, such that an interior component’s target is the input to the next component that maximizes the probability of the next component’s target. Each layer then does locally Bayesian learning. The approach assumes online, trial-by-trial learning. The resulting parameter updating is not globally Bayesian but can better capture human behavior. The approach is implemented for an associative learning model that first maps inputs to attentionally filtered inputs and then maps attentionally filtered inputs to outputs. The Bayesian updating allows the associative model to exhibit retrospective revaluation effects such as backward blocking and unovershadowing, which have been challenging for associative learning models. The backpropagation of target values to attention allows the model to show trial-order effects, including highlighting and differences in magnitude of forward and backward blocking, which have been challenging for Bayesian learning models.
Short-term memory for serial order: A recurrent neural network model
 Psychological Review
, 2006
Abstract

Cited by 26 (3 self)
Despite a century of research, the mechanisms underlying short-term or working memory for serial order remain uncertain. Recent theoretical models have converged on a particular account, based on transient associations between independent item and context representations. In the present article, the authors present an alternative model, according to which sequence information is encoded through sustained patterns of activation within a recurrent neural network architecture. As demonstrated through a series of computer simulations, the model provides a parsimonious account for numerous benchmark characteristics of immediate serial recall, including data that have been considered to preclude the application of recurrent neural networks in this domain. Unlike most competing accounts, the model deals naturally with findings concerning the role of background knowledge in serial recall and makes contact with relevant neuroscientific data. Furthermore, the model gives rise to numerous testable predictions that differentiate it from competing theories. Taken together, the results presented indicate that recurrent neural networks may offer a useful framework for understanding short-term memory for serial order.
Learning in Dynamic Decision Tasks: Computational Model and Empirical Evidence
, 1997
Abstract

Cited by 24 (1 self)
In this article, we have presented evidence that a computational model that instantiates approximate, local learning with graded transfer provides a good account of how subjects learn online from outcome feedback in the SPF, a simple dynamic task. We base this conclusion on the model's ability to predict subjects' performance during training and on two subsequent tests of their ability to generalize: the control questions and the transfer task. We now explore the limitations of our efforts and discuss two alternative approaches to understanding human performance before concluding on our own approach's merits.