Results 1-10 of 34
Improved heterogeneous distance functions
Journal of Artificial Intelligence Research, 1997

Cited by 285 (9 self)
Instance-based learning techniques typically handle continuous and linear input values well, but often do not handle nominal input attributes appropriately. The Value Difference Metric (VDM) was designed to find reasonable distance values between nominal attribute values, but it largely ignores continuous attributes, requiring discretization to map continuous values into nominal values. This paper proposes three new heterogeneous distance functions, called the Heterogeneous Value Difference Metric (HVDM), the Interpolated Value Difference Metric (IVDM), and the Windowed Value Difference Metric (WVDM). These new distance functions are designed to handle applications with nominal attributes, continuous attributes, or both. In experiments on 48 applications the new distance metrics achieve higher classification accuracy on average than three previous distance functions on those datasets that have both nominal and continuous attributes.
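To make the idea of a heterogeneous distance concrete, here is a minimal sketch of HVDM as we read its definition: linear attributes contribute |x - y| / (4σ), nominal attributes contribute a normalized VDM term comparing the class-conditional distributions P(c | value), missing values contribute 1, and the per-attribute terms are combined Euclidean-style. The helper names `fit_hvdm` and `hvdm`, the use of the population standard deviation, and the treatment of unseen nominal values as zero probability are our assumptions, not details taken from the abstract.

```python
import math
import statistics
from collections import Counter, defaultdict

def fit_hvdm(X, y, nominal):
    """Collect the per-attribute statistics HVDM needs (hypothetical helper).

    X: list of rows; y: class labels; nominal: set of column indices treated
    as nominal (all other columns are treated as linear). Missing values are None.
    """
    classes = sorted(set(y))
    sigma, counts = {}, {}
    for a in range(len(X[0])):
        if a in nominal:
            table = defaultdict(Counter)  # attribute value -> class-label counts
            for row, label in zip(X, y):
                if row[a] is not None:
                    table[row[a]][label] += 1
            counts[a] = table
        else:
            # assumption: population standard deviation of the known values
            sigma[a] = statistics.pstdev([row[a] for row in X if row[a] is not None])
    return {"classes": classes, "sigma": sigma, "counts": counts}

def hvdm(x, z, model, nominal):
    """Heterogeneous distance between rows x and z."""
    total = 0.0
    for a, (u, v) in enumerate(zip(x, z)):
        if u is None or v is None:
            d = 1.0  # unknown value: treated as maximally distant
        elif a in nominal:
            cu = model["counts"][a].get(u, Counter())
            cv = model["counts"][a].get(v, Counter())
            nu, nv = sum(cu.values()), sum(cv.values())
            # normalized VDM: distance between the class distributions of the two values
            d = math.sqrt(sum(
                ((cu[c] / nu if nu else 0.0) - (cv[c] / nv if nv else 0.0)) ** 2
                for c in model["classes"]))
        else:
            s = model["sigma"][a]
            d = abs(u - v) / (4 * s) if s > 0 else 0.0  # scaled by four standard deviations
        total += d * d
    return math.sqrt(total)
```

With `X = [[1.0, "red"], [2.0, "red"], [3.0, "blue"], [4.0, "blue"]]` and labels `["a", "a", "b", "b"]`, "red" and "blue" predict opposite classes, so they end up sqrt(2) apart, while a one-unit change in the continuous column costs only 1 / (4 · 1.118).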
A Theory of Networks for Approximation and Learning
Laboratory, Massachusetts Institute of Technology, 1989

Cited by 237 (25 self)
Learning an input-output mapping from a set of examples, of the type that many neural networks have been constructed to perform, can be regarded as synthesizing an approximation of a multidimensional function, that is, solving the problem of hypersurface reconstruction. From this point of view, this form of learning is closely related to classical approximation techniques, such as generalized splines and regularization theory. This paper considers the problems of an exact representation and, in more detail, of the approximation of linear and nonlinear mappings in terms of simpler functions of fewer variables. Kolmogorov's theorem concerning the representation of functions of several variables in terms of functions of one variable turns out to be almost irrelevant in the context of networks for learning. We develop a theoretical framework for approximation based on regularization techniques that leads to a class of three-layer networks that we call Generalized Radial Basis Functions (GRBF), since they are mathematically related to the well-known Radial Basis Functions, mainly used for strict interpolation tasks. GRBF networks are not only equivalent to generalized splines, but are also closely related to pattern recognition methods such as Parzen windows and potential functions and to several neural network algorithms, such as Kanerva's associative memory, backpropagation and Kohonen's topology preserving map. They also have an interesting interpretation in terms of prototypes that are synthesized and optimally combined during the learning stage. The paper introduces several extensions and applications of the technique and discusses intriguing analogies with neurobiological data.
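The link between regularization and radial basis functions can be illustrated with the simplest case, a regularization network: one Gaussian unit centred on every training example, with output coefficients c solving (G + λI)c = y, where G holds the Gaussian responses between training inputs. This is a minimal sketch under our own assumptions (1-D inputs, a Gaussian basis, hand-picked width and λ, a small pure-Python solver in place of a numerics library); it is not the paper's general GRBF construction, which also allows fewer centres than examples.

```python
import math

def gaussian(r, width):
    """Gaussian basis function of the distance r."""
    return math.exp(-(r / width) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_regularization_network(xs, ys, width=1.0, lam=1e-3):
    """One Gaussian unit per training input; coefficients solve (G + lam*I) c = y."""
    n = len(xs)
    G = [[gaussian(abs(xs[i] - xs[j]), width) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    c = solve(G, ys)
    return lambda x: sum(ci * gaussian(abs(x - xi), width) for ci, xi in zip(c, xs))
```

With `lam=0.0` the network interpolates the training points exactly (the "strict interpolation" use of RBFs the abstract mentions); a positive `lam` trades fidelity for smoothness, so the fitted curve no longer passes through every point.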
Reduction Techniques for Instance-Based Learning Algorithms
Machine Learning, 2000

Cited by 209 (3 self)
Instance-based learning algorithms are often faced with the problem of deciding which instances to store for use during generalization. Storing too many instances can result in large memory requirements and slow execution speed, and can cause an oversensitivity to noise. This paper has two main purposes. First, it provides a survey of existing algorithms used to reduce storage requirements in instance-based learning algorithms and other exemplar-based algorithms. Second, it proposes six additional reduction algorithms called DROP1–DROP5 and DEL (three of which were first described in Wilson & Martinez, 1997c, as RT1–RT3) that can be used to remove instances from the concept description. These algorithms and 10 algorithms from the survey are compared on 31 classification tasks. Of those algorithms that provide substantial storage reduction, the DROP algorithms have the highest average generalization accuracy in these experiments, especially in the presence of uniform class noise.
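As we understand it, the simplest DROP rule removes an instance p when at least as many of its associates (instances that have p among their k+1 nearest neighbours) are classified correctly without p as with it. The sketch below is our own simplified paraphrase of that rule, not the authors' code. It assumes 1-D numeric inputs, absolute-difference distance, instances visited in index order, and neighbour lists recomputed from scratch, which is far too slow for anything but toy data. The helper names are hypothetical.

```python
from collections import Counter

def nearest(i, pool, X, k):
    """Indices of the k members of pool closest to X[i], excluding i (ties broken by index)."""
    return sorted((j for j in pool if j != i), key=lambda j: (abs(X[i] - X[j]), j))[:k]

def knn_label(i, pool, X, y, k):
    """Majority label among the k nearest neighbours of instance i within pool."""
    votes = Counter(y[j] for j in nearest(i, pool, X, k))
    return votes.most_common(1)[0][0] if votes else None

def drop1_sketch(X, y, k=3):
    """Simplified DROP1-style pass: drop p when its associates do at least as well without it."""
    S = list(range(len(X)))
    for p in range(len(X)):
        # associates of p: instances that have p among their k+1 nearest neighbours in S
        assoc = [a for a in S if a != p and p in nearest(a, S, X, k + 1)]
        without = [j for j in S if j != p]
        with_p = sum(knn_label(a, S, X, y, k) == y[a] for a in assoc)
        without_p = sum(knn_label(a, without, X, y, k) == y[a] for a in assoc)
        if without_p >= with_p:
            S = without
    return S
```

On two clusters (0 to 4 labelled "a", 10 to 14 labelled "b") plus a mislabelled "b" at 2.5 placed first, the noisy point is dropped immediately, interior points are pruned, and only the instances nearest the class boundary survive: indices `[3, 4, 5, 8, 9, 10]`.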
Connectionist Probability Estimation in HMM Speech Recognition
IEEE Transactions on Speech and Audio Processing, 1992

Cited by 92 (24 self)
This report is concerned with integrating connectionist networks into a hidden Markov model (HMM) speech recognition system. This is achieved through a statistical understanding of connectionist networks as probability estimators, first elucidated by Herve Bourlard. We review the basis of HMM speech recognition and point out the possible benefits of incorporating connectionist networks. We discuss some issues necessary to the construction of a connectionist HMM recognition system, and describe the performance of such a system, including evaluations on the DARPA database, in collaboration with Mike Cohen and Horacio Franco of SRI International. In conclusion, we show that a connectionist component improves a state-of-the-art HMM system.
Part I: Introduction. Over the past few years, connectionist models have been widely proposed as a potentially powerful approach to speech recognition (e.g. Makino et al. (1983), Huang et al. (1988) and Waibel et al. (1989)). However, whilst connec...
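A common way to use a network that estimates state posteriors P(q | x) inside an HMM, in the probability-estimator view the abstract refers to, is to divide by the state priors P(q). By Bayes' rule this gives p(x | q) / p(x), a likelihood scaled by a factor that is identical for every state at a given frame, so it does not change which path Viterbi decoding selects. The function name and the `floor` guard against log(0) below are our own illustrative choices, not details from the report.

```python
import math

def scaled_log_likelihoods(posteriors, priors, floor=1e-8):
    """Turn network outputs P(q|x) into scaled log-likelihoods log(P(q|x) / P(q)).

    P(q|x) / P(q) = p(x|q) / p(x); p(x) is shared by all states q at this frame,
    so these scores can stand in for HMM emission log-likelihoods during decoding.
    """
    return [math.log(max(p, floor) / pr) for p, pr in zip(posteriors, priors)]
```

For a frame where the network outputs `[0.5, 0.5]` and the priors are `[0.25, 0.75]`, the rarer state receives the higher score, log 2 versus log(2/3), because equal posteriors are more surprising for a state that is seldom visited.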
StatLog: Comparison of Classification Algorithms on Large Real-World Problems
1995

Cited by 68 (0 self)
This paper describes work in the StatLog project comparing classification algorithms on large real-world problems. The algorithms compared were from: symbolic learning (CART, C4.5, NewID, AC², ITrule, Cal5, CN2), statistics (Naive Bayes, k-nearest neighbor, kernel density, linear discriminant, quadratic discriminant, logistic regression, projection pursuit, Bayesian networks), and neural networks (backpropagation, radial basis functions). Twelve datasets were used: five from image analysis, three from medicine, and two each from engineering and finance. We found that which algorithm performed best depended critically on the dataset investigated. We therefore developed a set of dataset descriptors to help decide which algorithms are suited to particular datasets. For example, datasets with extreme distributions (skew > 1 and kurtosis > 7) and with many binary/categorical attributes (> 38%) tend to favor symbolic learning algorithms. We suggest how classification algorith...
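The "extreme distribution" descriptor can be checked directly from data. The sketch below computes sample skewness and kurtosis for a single attribute and applies the thresholds quoted in the abstract. Several details are our assumptions rather than StatLog's exact definitions: population moments, kurtosis as the plain standardized fourth moment (so a normal distribution scores 3, not 0), the absolute value of skew, and applying the test per attribute rather than to an average over attributes.

```python
import math

def _moments(xs):
    """Count, mean and population standard deviation of xs."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return n, mean, sd

def skewness(xs):
    """Standardized third moment."""
    n, mean, sd = _moments(xs)
    return sum(((x - mean) / sd) ** 3 for x in xs) / n

def kurtosis(xs):
    """Standardized fourth moment (a normal distribution gives 3, not 0)."""
    n, mean, sd = _moments(xs)
    return sum(((x - mean) / sd) ** 4 for x in xs) / n

def extreme_distribution(xs, skew_limit=1.0, kurt_limit=7.0):
    """Hypothetical helper applying the skew > 1 and kurtosis > 7 thresholds."""
    return abs(skewness(xs)) > skew_limit and kurtosis(xs) > kurt_limit
```

An evenly spread attribute such as `[1, 2, 3, 4, 5]` has zero skew and kurtosis 1.7, while a mostly constant attribute with one large outlier, `[0] * 99 + [100]`, exceeds both thresholds.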
Identification and Control of Nonlinear Systems Using Neural Network Models: Design and Stability Analysis
Electrical Engineering-Systems Rep, 1991

Cited by 31 (2 self)
The feasibility of applying neural network learning techniques in problems of system identification and control has been demonstrated through several empirical studies. These studies are based for the most part on gradient techniques for deriving parameter adjustment laws. While such schemes perform well in many cases, in general, problems arise in attempting to prove stability of the overall system, or convergence of the output error to zero. This paper presents a stability theory approach to synthesizing and analyzing identification and control schemes for nonlinear dynamical systems using neural network models. The nonlinearities of the dynamical system are assumed to be unknown and are modelled by neural network architectures. Multilayer networks with sigmoidal activation functions and radial basis function networks are the two types of neural network models that are considered. These static network architectures are combined with dynamical elements, in the form of stable filters, to construct a type of recurrent network configuration which is shown to be capable of approximating a large class of dynamical systems.
An Integrated Instance-Based Learning Algorithm
Computational Intelligence, 2000

Cited by 30 (1 self)
The basic nearest-neighbor rule generalizes well in many domains but has several shortcomings, including inappropriate distance functions, large storage requirements, slow execution time, sensitivity to noise, and an inability to adjust its decision boundaries after storing the training data. This paper proposes methods for overcoming each of these weaknesses and combines these methods into a comprehensive learning system called the Integrated Decremental Instance-Based Learning Algorithm (IDIBL) that seeks to reduce storage, improve execution speed, and increase generalization accuracy, when compared to the basic nearest-neighbor algorithm and other learning models. IDIBL tunes its own parameters using a new measure of fitness that combines confidence and cross-validation (CVC) accuracy in order to avoid discretization problems with more traditional leave-one-out cross-validation (LCV). In our experiments IDIBL achieves higher generalization accuracy than other less comprehensive instance-based learning algorithms, while requiring less than one-fourth the storage of the nearest-neighbor algorithm and improving execution speed by a corresponding factor. In experiments on 21 datasets, IDIBL also achieves higher generalization accuracy than those reported for 16 major machine learning and neural network models.
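The discretization problem with plain leave-one-out accuracy is that many parameter settings tie, since each instance scores only 0 or 1. One way to break such ties with confidence, which we believe reflects the idea behind CVC but present here as an assumed form, is to score each instance as (n · cv + conf) / (n + 1). Here cv is 1 if leave-one-out classification is correct, conf is the share of votes for the true class, and n is the number of instances, so correctness always dominates and confidence only separates otherwise equal settings. The function name and vote format are hypothetical.

```python
def cvc_fitness(votes, labels):
    """Average per-instance CVC-style score for one candidate parameter setting.

    votes[i] maps class label -> (weighted) vote total obtained when instance i is
    classified leave-one-out; labels[i] is its true class.
    """
    n = len(labels)
    total = 0.0
    for v, label in zip(votes, labels):
        cv = 1.0 if max(v, key=v.get) == label else 0.0  # leave-one-out correctness
        conf = v.get(label, 0.0) / sum(v.values())       # share of votes for the true class
        total += (n * cv + conf) / (n + 1)               # correctness dominates, confidence breaks ties
    return total / n
```

Two settings that both classify an instance correctly, one with votes `{"a": 3, "b": 1}` and one with `{"a": 2, "b": 1}`, would tie under plain LCV, but the first scores higher here because it is more confident.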
Reduction Techniques for Exemplar-Based Learning Algorithms
Machine Learning, 2000

Cited by 29 (4 self)
Exemplar-based learning algorithms are often faced with the problem of deciding which instances or other exemplars to store for use during generalization. Storing too many exemplars can result in large memory requirements and slow execution speed, and can cause an oversensitivity to noise. This paper has two main purposes. First, it provides a survey of existing algorithms used to reduce the number of exemplars retained in exemplar-based learning models. Second, it proposes six new reduction algorithms called DROP1–DROP5 and DEL that can be used to prune instances from the concept description. These algorithms and 10 algorithms from the survey are compared on 31 datasets. Of those algorithms that provide substantial storage reduction, the DROP algorithms have the highest generalization accuracy in these experiments, especially in the presence of noise.
A neural network architecture that computes its own reliability
Computers in Chemical Engineering, 1992

Cited by 28 (0 self)
Artificial neural networks (ANNs) have been used to construct empirical nonlinear models of process data. Because network models are not based on physical theory and contain nonlinearities, their predictions are suspect when extrapolating beyond the range of the original training data. With multiple correlated inputs, it is difficult to recognize when the network is extrapolating. Furthermore, due to nonuniform distribution of the training examples and noise over the domain, the network may have local areas of poor fit even when not extrapolating. Standard measures of network performance give no indication of regions of locally poor fit or possible errors due to extrapolation. This paper introduces the "validity index network" (VI-net), an extension of radial basis function networks (RBFN), that calculates the reliability and the confidence of its output and indicates local regions of poor fit and extrapolation. Because RBFNs use a composition of local fits to the data, they are readily adapted to predict local fitting accuracy. The VI-net can also detect novel input patterns in classification problems, provided that the inputs to the classifier are real values. The reliability measures of the VI-net are implemented as additional output nodes of the underlying RBFN. Weights associated with the reliability nodes are given analytically based on training statistics from the fitting of the target function, and thus the reliability measures can be added to a standard RBFN with no additional training effort.
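The abstract does not give the VI-net formulas, so the following is only a loose illustration of why an RBF network lends itself to extrapolation detection. Because each Gaussian unit responds only near its centre, weighting each unit's response by the number of training points it covers yields a density-like score, and a low score suggests the input lies outside the training data. The functions `local_density` and `is_extrapolating`, the per-unit counts, and the threshold are our own simplified stand-in, not the paper's definitions.

```python
import math

def local_density(x, centres, counts, width):
    """Density-like score near x from Gaussian units.

    Each unit contributes its share of the training points, weighted by how
    strongly it responds to x.
    """
    n_total = sum(counts)
    return sum(
        n * math.exp(-sum((a - b) ** 2 for a, b in zip(x, c)) / (2 * width ** 2))
        for c, n in zip(centres, counts)) / n_total

def is_extrapolating(x, centres, counts, width, threshold=0.05):
    """Flag inputs where little training data lies nearby (hypothetical threshold)."""
    return local_density(x, centres, counts, width) < threshold
```

With two units at (0, 0) and (1, 0), each covering 10 training points and width 0.5, an input at (0, 0) scores about 0.57, while one at (10, 10) scores essentially zero and is flagged.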
Spatial Coherence as an Internal Teacher for a Neural Network
1995

Cited by 18 (2 self)
Supervised learning procedures for neural networks have recently met with considerable success in learning difficult mappings. So far, however, they have been limited by their poor scaling behaviour, particularly for networks with many hidden layers. A promising alternative is to develop unsupervised learning algorithms by defining objective functions that characterize the quality of an internal representation without requiring knowledge of the desired outputs of the system. Our major goal is to build self-organizing network modules which capture important regularities in the environment in a simple form. A layered hierarchy of such modules should be able to learn in a time roughly linear in the number of layers. We propose that a good objective for perceptual learning is to extract higher-order features that exhibit simple coherence across time or space. This can be done by transforming the input representation into an underlying representation in which the mutual information between ...
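As we understand the authors' Gaussian formulation, two modules that see adjacent patches produce outputs a and b. If both carry a shared signal plus independent noise, the mutual information between them can be approximated as 0.5 · log(Var(a + b) / Var(a − b)). This quantity is large when the outputs agree and small when they do not. The sketch below only evaluates that objective on paired outputs; it does not train the modules, and the function name is hypothetical. The ratio is undefined when a − b has zero variance, so a real implementation would need a guard.

```python
import math
import statistics

def coherence_objective(a, b):
    """0.5 * log(Var(a + b) / Var(a - b)) for paired outputs of two modules."""
    s = [x + y for x, y in zip(a, b)]
    d = [x - y for x, y in zip(a, b)]
    return 0.5 * math.log(statistics.pvariance(s) / statistics.pvariance(d))
```

Outputs that track each other, such as `[1, 2, 3, 4]` and `[1, 2, 3, 5]`, give a positive score of about 1.79, while the same values paired in a scrambled order, `[4, 1, 3, 2]`, give a negative score. Gradient ascent on this objective would push the modules toward features that are coherent across space.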