Results 1 - 10
of
32
A neuropsychological theory of multiple systems in category learning
- PSYCHOLOGICAL REVIEW
, 1998
"... A neuropsychological theory is proposed that assumes category learning is a competition between separate verbal and implicit (i.e., procedural-learning-based) categorization systems. The theory assumes that the caudate nucleus is an important component of the implicit system and that the anterior ci ..."
Abstract
-
Cited by 131 (12 self)
- Add to MetaCart
A neuropsychological theory is proposed that assumes category learning is a competition between separate verbal and implicit (i.e., procedural-learning-based) categorization systems. The theory assumes that the caudate nucleus is an important component of the implicit system and that the anterior cingulate and prefrontal cortices are critical to the verbal system. In addition to making predictions for normal human adults, the theory makes specific predictions for children, elderly people, and patients suffering from Parkinson's disease, Huntington's disease, major depression, amnesia, or lesions of the prefrontal cortex. Two separate formal descriptions of the theory are also provided. One describes trial-by-trial learning, and the other describes global dynamics. The theory is tested on published neuropsychological data and on category learning data with normal adults.
Rules and Exemplars in Category Learning
- Journal of Experimental Psychology: General
, 1998
"... haracterized by descriptions of each module and how each serves in those tasks for which it is best suited. However, these theories often do not emphasize how modules interact in producing responses and in learning. In this article we will develop a modular theory of categorization that follows fro ..."
Abstract
-
Cited by 92 (3 self)
- Add to MetaCart
haracterized by descriptions of each module and how each serves in those tasks for which it is best suited. However, these theories often do not emphasize how modules interact in producing responses and in learning. In this article we will develop a modular theory of categorization that follows from two distinct accounts of this behavior. The first account is that of rule-based theories of categorization. These theories emerge from a philosophical tradition in which concepts and categorization are described in terms of definitional rules. For example, if a living thing has a wide, flat tail and constructs dams by cutting down trees with its This work was supported by Indiana University Cognitive Science Program Fellowships and by NIMH ResearchTraining Grant PHS-T32-MH19879-03 to Erickson, and in part by NIMH FIRST Award 1-R29-MH51572-01 to Kruschke. This research was reported as a poster at the 1996 Cognitive Science Society Conference in San Diego, CA. We than
Vector Quantization with Complexity Costs
, 1993
"... Vector quantization is a data compression method where a set of data points is encoded by a reduced set of reference vectors, the codebook. We discuss a vector quantization strategy which jointly optimizes distortion errors and the codebook complexity, thereby, determining the size of the codebook. ..."
Abstract
-
Cited by 52 (17 self)
- Add to MetaCart
Vector quantization is a data compression method where a set of data points is encoded by a reduced set of reference vectors, the codebook. We discuss a vector quantization strategy which jointly optimizes distortion errors and the codebook complexity, thereby, determining the size of the codebook. A maximum entropy estimation of the cost function yields an optimal number of reference vectors, their positions and their assignment probabilities. The dependence of the codebook density on the data density for different complexity functions is investigated in the limit of asymptotic quantization levels. How different complexity measures influence the efficiency of vector quantizers is studied for the task of image compression, i.e., we quantize the wavelet coefficients of gray level images and measure the reconstruction error. Our approach establishes a unifying framework for different quantization methods like K-means clustering and its fuzzy version, entropy constrained vector quantizati...
Learning Rate Schedules For Faster Stochastic Gradient Search
, 1992
"... . Stochastic gradient descent is a general algorithm that includes LMS, on-line backpropagation, and adaptive k-means clustering as special cases. The standard choices of the learning rate j (both adaptive and fixed functions of time) often perform quite poorly. In contrast, our recently proposed cl ..."
Abstract
-
Cited by 36 (0 self)
- Add to MetaCart
. Stochastic gradient descent is a general algorithm that includes LMS, on-line backpropagation, and adaptive k-means clustering as special cases. The standard choices of the learning rate j (both adaptive and fixed functions of time) often perform quite poorly. In contrast, our recently proposed class of "search then converge" (STC) learning rate schedules (Darken and Moody, 1990b, 1991) display the theoretically optimal asymptotic convergence rate and a superior ability to escape from poor local minima However, the user is responsible for setting a key parameter. We propose here a new methodology for creating the first automatically adapting learning rates that achieve the optimal rate of convergence. INTRODUCTION The stochastic gradient descent algorithm is \DeltaW (t) = \Gammajr W f [W (t); X(t)]; (1) where j is the learning rate, t is the "time", and X(t) is the independent random exemplar chosen at time t. The purpose of the algorithm is to find a parameter vector W that minim...
Scaling EM (Expectation-Maximization) Clustering to Large Databases
, 1999
"... Practical statistical clustering algorithms typically center upon an iterative refinement optimization procedure to compute a locally optimal clustering solution that maximizes the fit to data. These algorithms typically require many database scans to converge, and within each scan they require the ..."
Abstract
-
Cited by 35 (0 self)
- Add to MetaCart
Practical statistical clustering algorithms typically center upon an iterative refinement optimization procedure to compute a locally optimal clustering solution that maximizes the fit to data. These algorithms typically require many database scans to converge, and within each scan they require the access to every record in the data table. For large databases, the scans become prohibitively expensive. We present a scalable implementation of the Expectation-Maximization (EM) algorithm. The database community has focused on distance-based clustering schemes and methods have been developed to cluster either numerical or categorical data. Unlike distancebased algorithms (such as K-Means), EM constructs proper statistical models of the underlying data source and naturally generalizes to cluster databases containing both discrete-valued and continuous-valued data. The scalable method is based on a decomposition of the basic statistics the algorithm needs: identifying regions of the data that...
Rule Inference for Financial Prediction using Recurrent Neural Networks
, 1997
"... This paper considers the prediction of noisy time series data, specifically, the prediction of foreign exchange rate data. A novel hybrid neural network algorithm for noisy time series prediction is presented which exhibits excellent performance on the problem. The method is motivated by considerati ..."
Abstract
-
Cited by 22 (0 self)
- Add to MetaCart
This paper considers the prediction of noisy time series data, specifically, the prediction of foreign exchange rate data. A novel hybrid neural network algorithm for noisy time series prediction is presented which exhibits excellent performance on the problem. The method is motivated by consideration of how neural networks work, and by fundamental difficulties with random correlations when dealing with small sample sizes and high noise data. The method permits the inference and extraction of rules. One of the greatest complaints against neural networks is that it is hard to figure out exactly what they are doing -- this work provides one answer for the internal workings of the network. Furthermore, these rules can be used to gain insight into both the real world system and the predictor. This paper focuses on noisy time series prediction and rule inference -- use of the system in trading would typically involve the utilization of other financial indicators and domain knowledge. 1 Intr...
Using curvature information for fast stochastic search
- In Advances in Neural Information Processing Systems 9
, 1996
"... We present an algorithm for fast stochastic gradient descent that uses a nonlinear adaptive momentum scheme to optimize the late time convergence rate. The algorithm makes e ective use of curvature information, requires only O(n) storage and computation, and delivers convergence rates close to the t ..."
Abstract
-
Cited by 12 (1 self)
- Add to MetaCart
We present an algorithm for fast stochastic gradient descent that uses a nonlinear adaptive momentum scheme to optimize the late time convergence rate. The algorithm makes e ective use of curvature information, requires only O(n) storage and computation, and delivers convergence rates close to the theoretical optimum. We demonstrate the technique on linear and large nonlinear backprop networks. Improving Stochastic Search Learning algorithms that perform gradient descent on a cost function can be formulated in either stochastic (on-line) or batch form. The stochastic version takes the form!t+1 =!t + t G (!t�xt) (1) where!t is the current weight estimate, t is the learning rate, G is minus the instantaneous gradient estimate, and xt is the input at time t1. One obtains the corresponding batch mode learning rule by takingconstant and averaging G over
On the Applicability of Neural Network and Machine Learning Methodologies to Natural Language Processing
, 1995
"... We examine the inductive inference of a complex grammar - specifically, we consider the task of training a model to classify natural language sentences as grammatical or ungrammatical, thereby exhibiting the same kind of discriminatory power provided by the Principles and Parameters linguistic frame ..."
Abstract
-
Cited by 8 (3 self)
- Add to MetaCart
We examine the inductive inference of a complex grammar - specifically, we consider the task of training a model to classify natural language sentences as grammatical or ungrammatical, thereby exhibiting the same kind of discriminatory power provided by the Principles and Parameters linguistic framework, or Government-and-Binding theory. We investigate the following models: feed-forward neural networks, Fransconi-Gori-Soda and Back-Tsoi locally recurrent networks, Elman, Narendra & Parthasarathy, and Williams & Zipser recurrent networks, Euclidean and edit-distance nearest-neighbors, simulated annealing, and decision trees. The feed-forward neural networks and non-neural network machine learning models are included primarily for comparison. We address the question: How can a neural network, with its distributed nature and gradient descent based iterative calculations, possess linguistic capability which is traditionally handled with symbolic computation and recursive processes? Initial...
Clustering in Massive Data Sets
- Handbook of massive data sets
, 1999
"... We review the time and storage costs of search and clustering algorithms. We exemplify these, based on case-studies in astronomy, information retrieval, visual user interfaces, chemical databases, and other areas. Sections 2 to 6 relate to nearest neighbor searching, an elemental form of clustering, ..."
Abstract
-
Cited by 8 (0 self)
- Add to MetaCart
We review the time and storage costs of search and clustering algorithms. We exemplify these, based on case-studies in astronomy, information retrieval, visual user interfaces, chemical databases, and other areas. Sections 2 to 6 relate to nearest neighbor searching, an elemental form of clustering, and a basis for clustering algorithms to follow. Sections 7 to 11 review a number of families of clustering algorithm. Sections 12 to 14 relate to visual or image representations of data sets, from which a number of interesting algorithmic developments arise.
On-line Step Size Adaptation
- INESC. 9 Rua Alves Redol, 1000
, 1997
"... Sub-category: online learning algorithms ..."

