GAMLS: A Generalized framework for Associative Modular Learning Systems
- In Proceedings of the Applications and Science of Computational Intelligence II, 1999
Cited by 9 (8 self)
Learning a large number of simple local concepts is both faster and easier than learning a single global concept. Inspired by this principle of divide and conquer, a number of modular learning approaches have been proposed by the computational intelligence community. In modular learning, the classification/regression/clustering problem is first decomposed into a number of simpler subproblems, a module is learned for each of these subproblems, and finally their results are integrated by a suitable combining method. Mixtures of experts and clustering are two of the techniques that are describable in this paradigm. In this paper we present a broad framework for Generalized Associative Modular Learning Systems (GAMLS). Modularity is introduced through soft association of each training pattern with every module. The coupled problems of learning the module parameters and learning associations are solved iteratively using deterministic annealing. Starting at a high temperature with only one modu...
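The soft association and annealing loop described in this abstract can be illustrated with a minimal sketch. It is not the GAMLS algorithm itself: it assumes one-dimensional data, modules that are simple prototypes (means), a squared-error cost, a fixed number of modules, and a geometric cooling schedule. All function names and parameter values are hypothetical.

```python
import math

def soft_associations(x, centers, temperature):
    """Gibbs-style association of one pattern with every module (assumed form)."""
    costs = [(x - c) ** 2 for c in centers]
    lowest = min(costs)  # subtract the minimum cost for numerical stability
    weights = [math.exp(-(cost - lowest) / temperature) for cost in costs]
    total = sum(weights)
    return [w / total for w in weights]

def anneal(data, centers, t_start=10.0, t_end=0.01, cooling=0.8, sweeps=20):
    """Alternate association and module updates while lowering the temperature."""
    centers = list(centers)
    temperature = t_start
    while temperature > t_end:
        for _ in range(sweeps):
            # Step 1: associate every pattern softly with every module.
            assoc = [soft_associations(x, centers, temperature) for x in data]
            # Step 2: refit each module as the association-weighted mean.
            for j in range(len(centers)):
                mass = sum(a[j] for a in assoc)
                if mass > 1e-12:
                    centers[j] = sum(a[j] * x for a, x in zip(assoc, data)) / mass
        temperature *= cooling  # assumed geometric cooling schedule
    return centers

data = [-2.1, -1.9, -2.0, 1.9, 2.0, 2.1]
centers = anneal(data, centers=[-0.1, 0.1])
print(sorted(centers))
```

At high temperature every pattern is associated almost equally with both modules, so the prototypes stay close together; as the temperature drops the associations harden and the prototypes separate toward the two groups in the data.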
Generalization Properties of Modular Networks: Implementing the Parity Function
2001
Cited by 7 (5 self)
The parity function is one of the most widely used Boolean functions for testing learning algorithms because of both its simple definition and its great complexity. Because parity is one of the hardest problems, many different architectures have been constructed to compute it, essentially by adding neurons in the hidden layer in order to reduce the number of local minima where gradient-descent learning algorithms could get stuck. We construct a family of modular architectures that implement the parity function, in which every member of the family can be characterized by the maximum fan-in of the network, i.e., the maximum number of connections that a neuron can receive. We analyze the generalization ability of the modular networks first by computing analytically the minimum number of examples needed for perfect generalization and second by numerical simulations. Both results show that the generalization ability of these networks improves systematically with the degree of modularity of the network. We also analyze the influence of the selection of examples on the emergence of generalization ability, by comparing the learning curves obtained through a random selection of examples to those obtained through examples selected according to a general algorithm we recently proposed.
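To make the idea of a fan-in-limited modular parity network concrete, here is a small sketch. It assumes a tree arrangement in which each module computes the parity of at most `max_fan_in` inputs; the paper's actual family of architectures may differ, and the function names are hypothetical.

```python
def parity_module(bits):
    """One module: parity of a small group of inputs."""
    return sum(bits) % 2

def modular_parity(bits, max_fan_in):
    """Compute parity with a tree of modules, each receiving at most max_fan_in inputs."""
    if max_fan_in < 2:
        raise ValueError("max_fan_in must be at least 2")
    layer = list(bits)
    if not layer:
        return 0  # parity of an empty input is taken to be even
    while len(layer) > 1:
        # Each module in the next layer reads one chunk of the current layer.
        layer = [parity_module(layer[i:i + max_fan_in])
                 for i in range(0, len(layer), max_fan_in)]
    return layer[0]

print(modular_parity([1, 0, 1, 1, 0, 1, 1], max_fan_in=3))
```

A smaller `max_fan_in` yields more, simpler modules and a deeper tree, which is one way to read "degree of modularity" in this sketch.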
Neural Network Task Decomposition Based on Output Partitioning
- Journal of the Institution of Engineers Singapore
Cited by 1 (1 self)
In this paper, we propose a new method for task decomposition based on output partitioning. The proposed method is able to find appropriate architectures for large-scale real-world problems automatically and efficiently. With this method, a problem can be divided flexibly into several sub-problems as chosen, each of which is composed of the whole input vector and a fraction of the output vector. Each module (one per sub-problem) is responsible for producing a fraction of the output vector of the original problem. Hence, the hidden structure for the original problem's output units is decoupled. These modules can be grown and trained in sequence or in parallel. Combined with the constructive learning algorithm, our method requires neither excessive computation nor any prior knowledge concerning decomposition. The feasibility of output partitioning is analyzed and proved. Several benchmarks are implemented to test the validity of this method. Their results show that this method can reduce computation time, increase learning speed, and improve generalization accuracy for both classification and regression problems.
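The decomposition described above (whole input vector, fraction of the output vector per module) can be sketched as follows. The sketch makes two assumptions that are not from the paper: outputs are split into contiguous groups, and each module is a 1-nearest-neighbour lookup standing in for the constructively grown neural network. All names are hypothetical.

```python
def partition_outputs(n_outputs, n_modules):
    """Split output indices 0..n_outputs-1 into contiguous groups (assumed scheme)."""
    base, extra = divmod(n_outputs, n_modules)
    groups, start = [], 0
    for m in range(n_modules):
        size = base + (1 if m < extra else 0)
        groups.append(list(range(start, start + size)))
        start += size
    return groups

class NearestNeighbourModule:
    """Stand-in learner: returns the stored target of the closest training input."""
    def fit(self, inputs, targets):
        self.inputs = [list(x) for x in inputs]
        self.targets = [list(t) for t in targets]
        return self

    def predict(self, x):
        def squared_distance(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))
        best = min(range(len(self.inputs)),
                   key=lambda i: squared_distance(self.inputs[i], x))
        return self.targets[best]

def train_modules(inputs, targets, groups):
    """Train one module per output group; every module sees the full input."""
    modules = []
    for group in groups:
        sub_targets = [[row[k] for k in group] for row in targets]
        modules.append(NearestNeighbourModule().fit(inputs, sub_targets))
    return modules

def predict(modules, groups, x, n_outputs):
    """Reassemble the full output vector from the modules' partial outputs."""
    output = [None] * n_outputs
    for module, group in zip(modules, groups):
        for k, value in zip(group, module.predict(x)):
            output[k] = value
    return output

inputs = [[0, 0], [0, 1], [1, 0], [1, 1]]
targets = [[0, 0, 0], [0, 1, 1], [0, 1, 1], [1, 1, 0]]  # columns: AND, OR, XOR
groups = partition_outputs(3, 2)
modules = train_modules(inputs, targets, groups)
print(predict(modules, groups, [1, 1], 3))
```

Because the modules share no parameters, the loop in `train_modules` could equally run in parallel, which matches the abstract's remark that modules can be trained in sequence or in parallel.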
A Neural Network Facial Expression Recognition System using Unsupervised Local Processing
- Local Processing, Cognitive Neuroscience Sector – SISSA
Cited by 1 (0 self)
A local unsupervised processing stage is inserted within a neural network constructed to recognize facial expressions. The stage is applied in order to reduce the dimensionality of the input data while preserving some topological structure. The receptive fields of the neurons in the first hidden layer self-organize according to a local energy function, taking into account the variance of the input pixels. There is just one synapse going out from every input pixel, and these weights, connecting the first two layers, are trained with a Hebbian algorithm. The structure of the network is completed with specialized modules, trained with backpropagation, that classify the data into the different expression categories. Thus, the neural net architecture includes four layers of neurons, which we train and test with images from the Yale Faces Database. We obtain a generalization rate of 84.5% on unseen faces, similar to the 83.2% rate obtained when using a similar system but implementing PCA processing at the initial stage.
Data Classification for Unsupervised Learning of Multiple Models: Convergence Results
1999
In this paper we examine a problem which arises in connection with the application of the Lainiotis Partition Algorithm to tasks of signal classification, prediction and parameter estimation. We are particularly interested in tasks which involve composite systems comprising a finite number of switched sub-systems. The problem we consider arises in situations of unsupervised, online classification and modeling and can be characterized as a problem of data allocation, i.e. how to partition observed data into separate training sets and use the members of each set for training the model of a particular sub-system. We propose an algorithm that effects unsupervised, online data allocation and prove that under mild separability conditions the algorithm converges to the "correct" solution. The proposed algorithm is also tested by numerical experiments.
Short Term Load Forecasting Using Predictive Modular Neural Networks
- Bakirtzis, 2000
In this paper we present an application of predictive modular neural networks (PREMONN) to short term load forecasting. PREMONNs are a family of probabilistically motivated algorithms which can be used for time series prediction, classification and identification. PREMONNs utilize local predictors of several types (e.g. linear predictors or artificial neural networks) and produce a final prediction which is a weighted combination of the local predictions; the weights can be interpreted as Bayesian posterior probabilities and are computed online. The method is applied to short term load forecasting for the Greek Public Power Corporation dispatching center of Crete, where PREMONN outperforms conventional prediction techniques.
2 Problem Formulation
We are given a sequence y_t, t = 1, 2, ..., where (for each t) y_t has dimensions 24 × 1; each component of y_t corresponds to the load of a particular hour of the day on day t. The predictors have the general form y_t = f(y_{t-1}, y_{t-2}...
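The weighted combination with online posterior weights mentioned in the abstract can be sketched as a recursive Bayesian update. This is an illustration under stated assumptions, not the PREMONN implementation: it uses a scalar series rather than 24-component daily vectors, a Gaussian error model with a hypothetical `sigma`, a small probability floor so that no predictor is ruled out permanently, and two toy predictors.

```python
import math

def update_posteriors(posteriors, errors, sigma=1.0, floor=1e-6):
    """Recursive Bayes update: prior weight times Gaussian likelihood of the error."""
    likelihoods = [math.exp(-(e * e) / (2.0 * sigma * sigma)) for e in errors]
    unnormalized = [max(p * l, floor) for p, l in zip(posteriors, likelihoods)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

def combine(posteriors, predictions):
    """Final prediction: posterior-weighted sum of the local predictions."""
    return sum(p * y for p, y in zip(posteriors, predictions))

def run(series, predictors, sigma=1.0):
    """Forecast each point from its history, then update the weights online."""
    posteriors = [1.0 / len(predictors)] * len(predictors)
    forecasts = []
    for t in range(1, len(series)):
        predictions = [f(series[:t]) for f in predictors]
        forecasts.append(combine(posteriors, predictions))
        errors = [series[t] - p for p in predictions]
        posteriors = update_posteriors(posteriors, errors, sigma)
    return forecasts, posteriors

def persistence(history):
    """Toy local predictor: repeat the last observed value."""
    return history[-1]

def linear_trend(history):
    """Toy local predictor: extrapolate the last difference."""
    if len(history) < 2:
        return history[-1]
    return 2 * history[-1] - history[-2]

series = [float(v) for v in range(1, 11)]
forecasts, posteriors = run(series, [persistence, linear_trend])
print(posteriors)
```

On this linearly increasing toy series the trend predictor makes smaller errors, so its posterior weight grows at each step and the combined forecast moves toward its predictions.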
of Switching Time Series: the Data Allocation Problem
2001
In this paper we explore some aspects of the problem of on-line, unsupervised learning of a switching time series, i.e. a time series which is generated by a combination of several, alternately activated sources. This learning problem can be solved by a two-stage approach: (a) separating the incoming data into several datasets (one dataset corresponding to each source); (b) developing one model per dataset (i.e. one model per source). We introduce a general data allocation methodology which combines the two steps into an iterative scheme: existing models compete for the incoming data; data assigned to each model are used to refine the model. We distinguish between two modes of data allocation: in parallel data allocation every incoming datablock is allocated to the model with lowest prediction error; in serial data allocation the incoming datablock is allocated to the first model with prediction error below a prespecified threshold. We present sufficient conditions for asymptotically correct allocation of the data. We also present numerical experiments to support our theoretical analysis.
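The two allocation modes described above can be contrasted in a short sketch. The models here are deliberately trivial (each predicts a constant, its running mean) and the error is the mean squared error over a datablock; both are assumptions for illustration. The abstract does not say what happens in serial mode when no model passes the threshold, so this sketch simply leaves the block unallocated and returns `None`. All names are hypothetical.

```python
class MeanModel:
    """Toy source model: predicts a constant equal to its running mean."""
    def __init__(self, initial):
        self.mean = float(initial)
        self.count = 1  # the initial guess counts as one pseudo-observation

    def error(self, block):
        """Mean squared prediction error of this model on a datablock."""
        return sum((y - self.mean) ** 2 for y in block) / len(block)

    def refine(self, block):
        """Update the running mean with the data allocated to this model."""
        for y in block:
            self.count += 1
            self.mean += (y - self.mean) / self.count

def allocate_parallel(models, block):
    """Parallel mode: the model with the lowest prediction error wins the block."""
    errors = [m.error(block) for m in models]
    winner = errors.index(min(errors))
    models[winner].refine(block)
    return winner

def allocate_serial(models, block, threshold):
    """Serial mode: the first model whose error is below the threshold wins."""
    for index, model in enumerate(models):
        if model.error(block) < threshold:
            model.refine(block)
            return index
    return None  # assumption: the block stays unallocated

blocks = [[0.1, -0.1], [5.2, 4.8], [0.2, 0.0], [4.9, 5.1]]
models = [MeanModel(0.0), MeanModel(5.0)]
print([allocate_parallel(models, b) for b in blocks])
```

Parallel allocation evaluates every model on every block, while serial allocation can stop at the first acceptable model but depends on the model ordering and on the chosen threshold.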

