Results 1–10 of 59
Neural networks for classification: a survey
IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 2000
Cited by 132 (0 self)
Abstract—Classification is one of the most active research and application areas of neural networks. The literature is vast and growing. This paper summarizes some of the most important developments in neural network classification research. Specifically, the issues of posterior probability estimation, the link between neural and conventional classifiers, the learning and generalization tradeoff in classification, feature variable selection, and the effect of misclassification costs are examined. Our purpose is to provide a synthesis of the published research in this area and to stimulate further research interest and effort on the identified topics. Index Terms—Bayesian classifier, classification, ensemble methods, feature variable selection, learning and generalization, misclassification costs, neural networks.
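As a minimal illustration of the posterior-probability estimation mentioned in this abstract, the sketch below shows how softmax outputs of a classifier sum to one and can be read as estimates of P(class | input). The logits are hypothetical, not taken from the paper:

```python
import numpy as np

def softmax(scores):
    """Convert raw network outputs (logits) to posterior-probability estimates."""
    z = scores - scores.max()   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Hypothetical logits for a 3-class problem.
logits = np.array([2.0, 1.0, 0.1])
posteriors = softmax(logits)
# The estimates are positive and sum to one, as posterior probabilities must.
```

A network trained with a cross-entropy loss on one-hot targets drives these outputs toward the true class posteriors, which is the link to Bayesian classifiers surveyed in the paper.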
Making Use of Population Information in Evolutionary Artificial Neural Networks
1998
Cited by 87 (25 self)
This paper is concerned with the simultaneous evolution of artificial neural network (ANN) architectures and weights. The current practice in evolving ANNs is to choose the best ANN in the last generation as the final result. This paper proposes a different approach to form the final result by combining all the individuals in the last generation in order to make best use of all the information contained in the whole population. This approach regards a population of ANNs as an ensemble and uses a combination method to integrate them. Although there has been some work on integrating ANN modules [2], [3], little has been done in evolutionary learning to make best use of its population information. Four linear combination methods have been investigated in this paper to illustrate our ideas. Three real world data sets have been used in our experimental studies, which show that the recursive least square (RLS) algorithm always produces an integrated system that outperforms the best individua...
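The linear-combination idea in this abstract can be sketched as follows. Note this uses a batch least-squares solve as a stand-in for the recursive least square (RLS) update the paper actually studies, and the member outputs are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical outputs of 4 ensemble members on 50 samples (one column per member),
# each a noisy version of the target.
targets = rng.normal(size=50)
member_preds = np.column_stack(
    [targets + 0.3 * rng.normal(size=50) for _ in range(4)]
)

# Batch least-squares combination weights (stand-in for the recursive RLS update).
w, *_ = np.linalg.lstsq(member_preds, targets, rcond=None)
combined = member_preds @ w

# The fitted combination is at least as good (in MSE) as any single member,
# since "use member j alone" is itself one of the candidate linear combinations.
mse_combined = np.mean((combined - targets) ** 2)
mse_best_single = min(np.mean((member_preds[:, j] - targets) ** 2) for j in range(4))
```

This illustrates why combining the whole final population can beat picking the single best individual: the least-squares solution can never do worse on the fitting data than the best lone member.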
Geometric Methods for Feature Extraction and Dimensional Reduction
In L. Rokach and O. Maimon (Eds.), Data, 2005
Cited by 41 (1 self)
Abstract We give a tutorial overview of several geometric methods for feature extraction and dimensional reduction. We divide the methods into projective methods and methods that model the manifold on which the data lies. For projective methods, we review projection pursuit, principal component analysis (PCA), kernel PCA, probabilistic PCA, and oriented PCA; and for the manifold methods, we review multidimensional scaling (MDS), landmark MDS, Isomap, locally linear embedding, Laplacian eigenmaps and spectral clustering. The Nyström method, which links several of the algorithms, is also reviewed. The goal is to provide a self-contained review of the concepts and mathematics underlying these algorithms.
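A minimal sketch of the first projective method this tutorial reviews, PCA, on synthetic data (the data and dimensions are hypothetical, not from the survey):

```python
import numpy as np

def pca(X, k):
    """Project X (n_samples x n_features) onto its top-k principal components."""
    Xc = X - X.mean(axis=0)                 # center the data
    cov = Xc.T @ Xc / (len(X) - 1)          # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    top = eigvecs[:, ::-1][:, :k]           # top-k eigenvectors
    return Xc @ top

rng = np.random.default_rng(0)
# Hypothetical data: 3-D points lying close to a 2-D plane.
X = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 3)) + 0.01 * rng.normal(size=(200, 3))
Y = pca(X, 2)   # 2-D coordinates capturing almost all the variance
```

Kernel PCA and the manifold methods reviewed in the chapter generalize exactly this step: replace the covariance eigendecomposition with an eigendecomposition of a kernel or graph matrix.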
Computational Intelligence in Wireless Sensor Networks: A Survey
IEEE Communications Surveys & Tutorials, 2011
Cited by 37 (0 self)
Wireless sensor networks (WSNs) are networks of distributed autonomous devices that can sense or monitor physical or environmental conditions cooperatively. WSNs face many challenges, mainly caused by communication failures, storage and computational constraints, and a limited power supply. Paradigms of computational intelligence (CI) have been used successfully in recent years to address various challenges such as data aggregation and fusion, energy-aware routing, task scheduling, security, optimal deployment, and localization. CI provides adaptive mechanisms that exhibit intelligent behavior in complex and dynamic environments like WSNs. CI brings flexibility, autonomous behavior, and robustness against topology changes, communication failures, and scenario changes. However, WSN developers are often not, or only partly, aware of the potential CI algorithms offer. On the other hand, CI researchers are not familiar with all the real problems and subtle requirements of WSNs. This mismatch makes collaboration and development difficult. This paper intends to close this gap and foster collaboration by offering a detailed introduction to WSNs and their properties. An extensive survey of CI applications to various problems in WSNs, drawn from various research areas and publication venues, is presented. A discussion of the advantages and disadvantages of CI algorithms relative to traditional WSN solutions is also offered, together with a general evaluation of CI algorithms that can serve as a guide for applying CI to WSNs.
Low Entropy Coding with Unsupervised Neural Networks
Cited by 31 (0 self)
ed on visual and speech data. The ability of the network to automatically generate wavelet codes from natural images is demonstrated. These bear a close resemblance to 2D Gabor functions, which have previously been used to describe physiological receptive fields and as a means of producing compact image representations. Keywords: neural networks, unsupervised learning, self-organisation, feature extraction, information theory, redundancy reduction, sparse coding, imaging models, occlusion, image coding, speech coding.
Fast subspace tracking and neural network learning by a novel information criterion
IEEE Trans. Signal Processing, 1998
Cited by 17 (2 self)
Abstract — We introduce a novel information criterion (NIC) for searching for the optimum weights of a two-layer linear neural network (NN). The NIC exhibits a single global maximum, attained if and only if the weights span the (desired) principal subspace of a covariance matrix. The other stationary points of the NIC are (unstable) saddle points. We develop an adaptive algorithm based on the NIC for estimating and tracking the principal subspace of a vector sequence. The NIC algorithm provides fast online learning of the optimum weights for the two-layer linear NN. We establish the connections between the NIC algorithm and conventional mean-square-error (MSE) based algorithms such as Oja’s algorithm, LMSER, PAST, APEX, and GHA. The NIC algorithm has several key advantages, such as faster convergence, which is illustrated through analysis and simulation.
Generalization Error of Linear Neural Networks in Unidentifiable Cases
In Algorithmic Learning Theory, 1999
Cited by 14 (3 self)
Statistical asymptotic theory underlies many theoretical results in computational and statistical learning theory. It describes the limiting distribution of the maximum likelihood estimator as a normal distribution. However, in layered models such as neural networks, the regularity condition of the asymptotic theory is not necessarily satisfied. If the true function is realized by a smaller-sized network than the model, the target parameter is not identifiable because it consists of a union of high-dimensional submanifolds. In such cases, the maximum likelihood estimator is not subject to the asymptotic theory, and little is known about the behavior of neural networks. In this paper, we analyze the expectation of the generalization error of three-layer linear neural networks in asymptotic situations, and elucidate a strange behavior in unidentifiable cases. We show that the expectation of the generalization error in the unidentifiable cases is larger tha...
Low-complexity principal component analysis for hyperspectral image compression
Int. J. High Performance Comput. Appl., 2008
Cited by 13 (3 self)
Abstract—Principal component analysis (PCA) is an effective tool for spectral decorrelation of hyperspectral imagery, and PCA-based spectral transforms have been employed successfully in conjunction with JPEG2000 for hyperspectral-image compression. However, the computational cost of determining the data-dependent PCA transform is high due to its traditional eigendecomposition implementation, which requires calculation of a covariance matrix across the data. Several strategies for reducing the computational burden of PCA are explored, including both spatial and spectral subsampling in the covariance calculation, as well as an iterative algorithm that circumvents determination of the covariance matrix entirely. Experimental results investigate the impacts of such low-complexity PCA on JPEG2000 compression of hyperspectral images, focusing on rate-distortion performance as well as data-analysis performance at an anomaly-detection task. Index Terms—principal component analysis, hyperspectral image compression, JPEG2000, spectral decorrelation, anomaly detection.
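The spatial-subsampling strategy can be sketched as follows, on a synthetic "hyperspectral" cube; this illustrates only the idea (estimate the covariance from a fraction of the pixel vectors), not the authors' implementation, and all dimensions and fractions are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical cube flattened to pixel vectors: 10000 pixels x 30 bands, with
# most spectral energy concentrated in a 5-dimensional subspace plus weak noise.
basis = rng.normal(size=(5, 30))
pixels = rng.normal(size=(10000, 5)) @ basis + 0.05 * rng.normal(size=(10000, 30))

def principal_axes(X, k):
    """Top-k eigenvectors of the sample covariance of X."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (len(X) - 1)
    _, vecs = np.linalg.eigh(cov)
    return vecs[:, ::-1][:, :k]

# Full covariance vs. covariance estimated from a 5% spatial subsample.
full_axes = principal_axes(pixels, 5)
subset = pixels[rng.choice(len(pixels), size=500, replace=False)]
sub_axes = principal_axes(subset, 5)

# Compare the spanned subspaces via principal angles: singular values of the
# cross-projection near 1 mean the cheap transform closely matches full PCA.
cosines = np.linalg.svd(full_axes.T @ sub_axes, compute_uv=False)
```

When the spectral signal dominates the noise, even a small spatial subsample recovers essentially the same principal subspace at a fraction of the covariance-computation cost, which is the premise behind the strategies explored in the paper.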
Global convergence of Oja's subspace algorithm for principal component extraction
IEEE Trans. Neural Networks, 1998
Cited by 10 (3 self)
Abstract—Oja’s principal subspace algorithm is a well-known and powerful technique for learning and tracking principal information in time series. A thorough investigation of the convergence properties of Oja’s algorithm is undertaken in this paper. The asymptotic convergence rates of the algorithm are derived. The dependence of the algorithm on its initial weight matrix and on the singularity of the data covariance matrix is comprehensively addressed. Index Terms—Convergence rate, global convergence, principal components extraction.
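A minimal sketch of the Oja subspace rule whose convergence this paper analyzes; the step size, dimensions, and data here are hypothetical choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data with a dominant 2-D principal subspace (axes e1, e2) in 5-D.
C = np.diag([5.0, 4.0, 0.1, 0.1, 0.1])          # data covariance
samples = rng.multivariate_normal(np.zeros(5), C, size=4000)

# Oja's subspace rule: W <- W + eta * (x x^T W - W W^T x x^T W)
W = 0.1 * rng.normal(size=(5, 2))               # random initial weight matrix
eta = 0.01                                      # fixed small step size
for x in samples:
    y = W.T @ x
    W += eta * (np.outer(x, y) - W @ np.outer(y, y))

# At convergence W is (approximately) orthonormal and spans the top-2 subspace.
ortho_err = np.linalg.norm(W.T @ W - np.eye(2))
# Since the true top-2 axes are e1 and e2, the first two rows of W should form
# a near-orthogonal 2x2 block: singular values near 1 indicate subspace recovery.
overlap = np.linalg.svd(W[:2, :], compute_uv=False)
```

The paper's results concern exactly this iteration: how fast W approaches the principal subspace, and how that depends on the initial W and on singularity of C.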
Theoretical analysis of Bayesian matrix factorization
 Journal of Machine Learning Research
Cited by 9 (6 self)
Recently, variational Bayesian (VB) techniques have been applied to probabilistic matrix factorization and shown to perform very well in experiments. In this paper, we theoretically elucidate properties of the VB matrix factorization (VBMF) method. Through finite-sample analysis of the VBMF estimator, we show that two types of shrinkage factors exist in the VBMF estimator: the positive-part James-Stein (PJS) shrinkage and the trace-norm shrinkage, both acting on each singular component separately to produce low-rank solutions. The trace-norm shrinkage is simply induced by non-flat prior information, similarly to the maximum a posteriori (MAP) approach. Thus, no trace-norm shrinkage remains when priors are non-informative. On the other hand, we show the counterintuitive fact that the PJS shrinkage factor remains active even with flat priors. This is shown to be induced by the non-identifiability of the matrix factorization model, that is, the mapping between the target matrix and the factorized matrices is not one-to-one. We call this model-induced regularization. We further extend our analysis to empirical Bayes scenarios where hyperparameters are also learned based on the VB free energy. Throughout the paper, we assume no missing entries in the observed matrix, and therefore collaborative filtering is out of scope.
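Trace-norm shrinkage acting on each singular component separately can be illustrated with a generic singular-value soft-thresholding sketch. Note this is not the VBMF estimator itself (which combines it with PJS shrinkage); the threshold and data are hypothetical:

```python
import numpy as np

def trace_norm_shrink(V, lam):
    """Soft-threshold the singular values of V by lam: each singular component
    is shrunk separately, and components below lam are zeroed, yielding a
    low-rank result (generic sketch, not the VBMF estimator)."""
    U, s, Vt = np.linalg.svd(V, full_matrices=False)
    s_shrunk = np.maximum(s - lam, 0.0)
    return U @ np.diag(s_shrunk) @ Vt

rng = np.random.default_rng(0)
# Hypothetical observation: a rank-2 target matrix corrupted by noise.
truth = rng.normal(size=(20, 2)) @ rng.normal(size=(2, 15))
noisy = truth + 0.1 * rng.normal(size=(20, 15))

# Shrinkage zeroes the small noise-only singular values, recovering rank 2.
low_rank = trace_norm_shrink(noisy, lam=2.0)
```

This per-component shrinkage is what both the MAP approach (via non-flat priors) and, counterintuitively, flat-prior VBMF (via model-induced regularization) exert on the factorized matrix.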