Results 1–10 of 30
A tutorial on support vector regression, 2004
Cited by 473 (2 self)
In this tutorial we give an overview of the basic ideas underlying Support Vector (SV) machines for function estimation. Furthermore, we include a summary of currently used algorithms for training SV machines, covering both the quadratic (or convex) programming part and advanced methods for dealing with large datasets. Finally, we mention some modifications and extensions that have been applied to the standard SV algorithm, and discuss the aspect of regularization from an SV perspective.
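The tube-shaped loss at the core of SV regression is easy to state directly; a minimal sketch (the function name and eps value are illustrative, not from the tutorial):

```python
import numpy as np

def eps_insensitive_loss(y_true, y_pred, eps=0.1):
    """The eps-insensitive loss used by SV regression: residuals inside
    the eps-tube cost nothing, larger ones grow linearly."""
    return np.maximum(0.0, np.abs(y_true - y_pred) - eps)

# Residual 0.05 lies inside the tube (loss 0); residual 0.5 costs 0.5 - 0.1 = 0.4.
losses = eps_insensitive_loss(np.array([1.0, 1.0]), np.array([1.05, 1.5]))
```

Minimizing this loss plus a norm penalty on the weights yields the quadratic program the tutorial describes.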
The Sample Complexity of Pattern Classification With Neural Networks: The Size of the Weights is More Important Than the Size of the Network, 1997
Cited by 177 (15 self)
Sample complexity results from computational learning theory, when applied to neural network learning for pattern classification problems, suggest that for good generalization performance the number of training examples should grow at least linearly with the number of adjustable parameters in the network. Results in this paper show that if a large neural network is used for a pattern classification problem and the learning algorithm finds a network with small weights that has small squared error on the training patterns, then the generalization performance depends on the size of the weights rather than the number of weights. For example, consider a two-layer feedforward network of sigmoid units, in which the sum of the magnitudes of the weights associated with each unit is bounded by A and the input dimension is n. We show that the misclassification probability is no more than a certain error estimate (that is related to squared error on the training set) plus A³√((log n)/m) (ignori...
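The key behavior of the quoted bound, that it depends on the weight-magnitude bound A and sample size m rather than on the parameter count, can be seen by evaluating its dominant term; the helper name below is mine:

```python
import math

def weight_bound_term(A, n, m):
    """Dominant term A**3 * sqrt(log(n) / m) of the misclassification bound:
    it grows with the per-unit weight magnitude bound A and the input
    dimension n (only logarithmically), never with the number of weights."""
    return A ** 3 * math.sqrt(math.log(n) / m)

# Doubling the training set size m shrinks the term by sqrt(2),
# regardless of how many weights the network has.
t1 = weight_bound_term(A=2.0, n=100, m=1000)
t2 = weight_bound_term(A=2.0, n=100, m=2000)
```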
Building Text Classifiers using Positive and Unlabeled Examples
In Proc. of the ICDM'03, 2003
Cited by 67 (11 self)
This paper studies the problem of building text classifiers using positive and unlabeled examples. The key feature of this problem is that there is no negative example for learning. Recently, a few techniques for solving this problem were proposed in the literature. These techniques are based on the same idea, which builds a classifier in two steps. Each existing technique uses a different method for each step. In this paper, we first introduce some new methods for the two steps, and perform a comprehensive evaluation of all possible combinations of methods of the two steps. We then propose a more principled approach to solving the problem based on a biased formulation of SVM, and show experimentally that it is more accurate than the existing techniques.
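The biased-SVM idea, penalizing errors on labeled positives more heavily than on unlabeled examples treated as negatives, can be sketched with a simple subgradient trainer; all names, constants, and the toy data below are illustrative, not the paper's implementation:

```python
import numpy as np

def train_biased_svm(X, y, c_pos=10.0, c_unl=1.0, lr=0.05, lam=0.01, epochs=200):
    """Subgradient descent on a hinge loss where margin violations on
    positives (y=+1) cost c_pos and violations on unlabeled-as-negative
    examples (y=-1) cost only c_unl (asymmetric C values, as in biased SVM)."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            c = c_pos if yi > 0 else c_unl
            if yi * (xi @ w + b) < 1.0:            # margin violated
                w += lr * (c * yi * xi - lam * w)
                b += lr * c * yi
            else:                                   # only apply regularization
                w -= lr * lam * w
    return w, b

X = np.array([[2., 2.], [3., 2.], [2., 3.],        # labeled positives
              [-2., -2.], [-3., -2.], [-2., -3.]]) # unlabeled, treated as negative
y = np.array([1, 1, 1, -1, -1, -1])
w, b = train_biased_svm(X, y)
```

With separable toy data like this, the learned hyperplane places the positives on the positive side and the unlabeled cluster on the negative side.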
Robust Decision Trees: Removing Outliers from Databases
In Knowledge Discovery and Data Mining, 1995
Cited by 57 (0 self)
Finding and removing outliers is an important problem in data mining. Errors in large databases can be extremely common, so an important property of a data mining algorithm is robustness with respect to errors in the database. Most sophisticated methods in machine learning address this problem to some extent, but not fully, and can be improved by addressing the problem more directly. In this paper we examine C4.5, a decision tree algorithm that is already quite robust; few algorithms have been shown to consistently achieve higher accuracy. C4.5 incorporates a pruning scheme that partially addresses the outlier removal problem. In our RobustC4.5 algorithm we extend the pruning method to fully remove the effect of outliers, and this results in improvement on many databases. In U. M. Fayyad and R. Uthurusamy, editors, Proceedings of the First International Conference on Knowledge Discovery and Data Mining, pages 174-179, AAAI Press, Menlo Park, CA, 1995. Introduction...
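The filtering idea behind this line of work, fit a model, then drop the training examples the model itself misclassifies, can be sketched with a trivial one-dimensional threshold classifier standing in for C4.5; everything here is illustrative:

```python
def fit_stump(data):
    """Threshold classifier for 1-D points: midpoint between the class means."""
    xs0 = [x for x, label in data if label == 0]
    xs1 = [x for x, label in data if label == 1]
    return (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2.0

def predict(thr, x):
    return 1 if x > thr else 0

def remove_outliers(data):
    """One filtering round: fit on all examples, drop the misclassified ones."""
    thr = fit_stump(data)
    return [(x, label) for x, label in data if predict(thr, x) == label]

# The point x=10 labeled 0 sits deep inside class 1's region: an outlier
# (e.g. a labeling error) that the fitted stump itself rejects.
data = [(0., 0), (1., 0), (2., 0), (10., 0), (8., 1), (9., 1), (10., 1)]
clean = remove_outliers(data)
```

RobustC4.5 is considerably more careful (it works through the pruning scheme rather than raw misclassification), but the fit-then-filter loop above is the core shape.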
Discovering Informative Patterns and Data Cleaning, 1996
Cited by 49 (1 self)
We present a method for discovering informative patterns from data. With this method, large databases can be reduced to only a few representative data entries. Our framework also encompasses methods for cleaning databases containing corrupted data. Both online and offline algorithms are proposed and tested experimentally on databases of handwritten images. The generality of the framework makes it an attractive candidate for new applications in knowledge discovery. Keywords: knowledge discovery, machine learning, informative patterns, data cleaning, information gain.
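Information gain, listed among the keywords, is the standard measure of how "informative" a pattern or split is; a minimal sketch:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def info_gain(labels, groups):
    """Entropy reduction from partitioning `labels` into `groups`."""
    n = len(labels)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups)

# A split that perfectly separates two balanced classes gains the full 1 bit.
gain = info_gain([0, 0, 1, 1], [[0, 0], [1, 1]])
```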
Reducing communication for distributed learning in neural networks
In Proc. of the International Conference on Artificial Neural Networks (ICANN 2002), 2002
Cited by 13 (6 self)
Abstract. A learning algorithm is presented for circuits consisting of a single layer of perceptrons. We refer to such circuits as parallel perceptrons. In spite of their simplicity, these circuits are universal approximators for arbitrary boolean and continuous functions. In contrast to backprop for multilayer perceptrons, our new learning algorithm, the parallel delta rule (p-delta rule), only has to tune a single layer of weights, and it does not require the computation and communication of analog values with high precision. Reduced communication also distinguishes our new learning rule from other learning rules for such circuits, such as those traditionally used for MADALINE. A theoretical analysis shows that the p-delta rule does in fact implement gradient descent with respect to a suitable error measure, although it does not require the computation of derivatives. Furthermore, it is shown through experiments on common real-world benchmark datasets that its performance is competitive with that of other learning approaches from neural networks and machine learning. Thus our algorithm also provides an interesting new hypothesis for the organization of learning in biological neural systems.
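The circuit itself is just a vote over independent perceptrons; a forward-pass sketch (majority vote is one common choice of squashing function, and the weights below are illustrative):

```python
import numpy as np

def parallel_perceptron(W, x):
    """Output of a single layer of perceptrons W (one row per unit),
    aggregated by a fixed squashing function: here, a majority vote.
    Only the +1/-1 votes need to be communicated, not analog values."""
    votes = np.where(W @ x >= 0, 1, -1)   # each perceptron fires +1 or -1
    return 1 if votes.sum() >= 0 else -1  # circuit output is the vote sign

W = np.array([[1., 0.], [0., 1.], [-1., -1.]])
out = parallel_perceptron(W, np.array([2., 2.]))
```

The p-delta rule trains exactly this structure by nudging individual perceptrons toward or away from their decision boundary, without any derivative computation.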
Robust Linear Discriminant Trees
In AI & Statistics 95
Cited by 8 (2 self)
We present a new method for the induction of classification trees with linear discriminants as the partitioning function at each internal node. This paper presents two main contributions: first, a novel objective function called soft entropy which is used to identify optimal coefficients for the linear discriminants, and second, a novel method for removing outliers called iterative refiltering which boosts performance on many datasets. These two ideas are presented in the context of a single learning algorithm called DT-SEPIR, which is compared with the CART and OC1 algorithms. Introduction: Recursive partitioning classifiers, or decision trees, are an important nonparametric function representation in statistics and machine learning (Friedman 1977, Breiman, Friedman, Olshen & Stone 1984, Quinlan 1986, Quinlan 1993). Their wide and successful use in fielded applications and their simple intuitive appeal make decision tree learning algorithms an important area of study. In this p...
Support Vector Machines for Phoneme Classification, 2001
Cited by 8 (0 self)
In this thesis, Support Vector Machines (SVMs) are applied to the problem of phoneme classification. Given a sequence of acoustic observations and 40 phoneme targets, the task is to classify each observation as one of these targets. Since this task involves multiple classes, one of the main hurdles SVMs must overcome is to extend the inherently binary SVM to the multiclass case. To do this, several methods are proposed, and their generalisation abilities are measured. It is found that even though some generalisation is lost in the transition, this can still lead to effective classifiers. In addition, a refinement to the SVMs is made to derive estimated posterior probabilities from classifications. Since almost all speech recognition systems are based on statistical models, this is necessary if SVMs are to be used in a full speech recognition system. The best accuracy found was 71.4%, which is competitive with the best results found in the literature.
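One-vs-rest is among the usual binary-to-multiclass reductions such a thesis would evaluate; a sketch in which a centroid-difference "trainer" stands in for an SVM (all names and the toy data are mine):

```python
import numpy as np

def train_one_vs_rest(X, y, num_classes, train_binary):
    """Reduce a K-class problem to K binary ones: model k separates
    class k (+1) from all other classes (-1)."""
    return [train_binary(X, np.where(y == k, 1, -1)) for k in range(num_classes)]

def predict(models, score, x):
    """Pick the class whose binary model scores x highest."""
    return int(np.argmax([score(m, x) for m in models]))

def centroid_trainer(X, t):
    """Stand-in for an SVM trainer: direction from the rest-centroid
    to the class-centroid, scored by a dot product."""
    return X[t == 1].mean(axis=0) - X[t == -1].mean(axis=0)

X = np.array([[0., 0.], [0., 1.], [5., 5.], [5., 6.], [10., 0.], [10., 1.]])
y = np.array([0, 0, 1, 1, 2, 2])
models = train_one_vs_rest(X, y, 3, centroid_trainer)
label = predict(models, lambda m, x: m @ x, np.array([5., 5.5]))
```

One-vs-one (training a classifier per pair of classes) is the other standard reduction; with 40 phonemes it needs 780 binary classifiers instead of 40, each trained on much less data.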
Support Vector Machines for Automated Gait Classification, 2005
Cited by 7 (1 self)
Ageing influences gait patterns, posing constant threats to the control of locomotor balance. Automated recognition of gait changes has many advantages, including early identification of at-risk gait and monitoring the progress of treatment outcomes. In this paper, we apply an artificial intelligence technique, support vector machines (SVM), for the automatic recognition of young/old gait types from their respective gait patterns. Minimum foot clearance (MFC) data of 30 young and 28 elderly participants were analyzed using a PEAK 2D motion analysis system during a 20-min continuous walk on a treadmill at self-selected walking speed. Gait features extracted from individual MFC histogram-plot and Poincaré-plot images were used to train the SVM. Cross-validation test results indicate that the generalization performance of the SVM was on average 83.3% in recognizing young and elderly gait patterns, compared with the accuracy obtained with a neural network. A "hill-climbing" feature selection algorithm demonstrated that a small subset (3–5) of gait features extracted from MFC plots could differentiate the gait patterns with 90% accuracy. Performance of the gait classifier was evaluated using areas under the receiver operating characteristic plots. Improved performance of the classifier was evident when trained with a reduced number of selected good features and with a radial basis function kernel. These results suggest that SVMs can function as an efficient gait classifier for recognition of young and elderly gait patterns, and have the potential for wider applications in gait identification for falls-risk minimization in the elderly.
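Hill-climbing feature selection in its greedy forward form can be sketched as below; the feature names and the synthetic accuracy function are invented for illustration, not taken from the paper:

```python
def hill_climb_features(features, evaluate, max_k=5):
    """Greedy forward selection: repeatedly add the single feature that most
    improves the score; stop when no candidate helps or max_k is reached."""
    selected, best = [], evaluate([])
    while len(selected) < max_k:
        score, feat = max((evaluate(selected + [f]), f)
                          for f in features if f not in selected)
        if score <= best:
            break
        selected.append(feat)
        best = score
    return selected

# Synthetic scorer standing in for cross-validated classifier accuracy:
# 'mfc_min' and 'mfc_skew' help, every extra feature costs a little.
def fake_accuracy(subset):
    return (0.5 + 0.2 * ('mfc_min' in subset)
                + 0.1 * ('mfc_skew' in subset)
                - 0.01 * len(subset))

chosen = hill_climb_features(['mfc_min', 'mfc_skew', 'mfc_mean'], fake_accuracy)
```

In practice `evaluate` would be cross-validated SVM accuracy on the candidate feature subset, which is what makes a small 3-5 feature subset discoverable.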
Computation in a Single Neuron: Hodgkin and Huxley Revisited, 2003
Cited by 6 (2 self)
A spiking neuron "computes" by transforming a complex dynamical input into a train of action potentials, or spikes. The computation performed by the neuron can be formulated as dimensional reduction, or feature detection, followed by a nonlinear decision function over the low-dimensional space. Generalizations of the reverse correlation technique with white noise input provide a numerical strategy for extracting the relevant low-dimensional features from experimental data, and information theory can be used to evaluate the quality of the low-dimensional approximation. We apply these methods to analyze the simplest biophysically realistic model neuron, the Hodgkin-Huxley (HH) model, using this system to illustrate the general methodological issues. We focus on the features in the stimulus that trigger a spike, explicitly eliminating the effects of interactions between spikes. One can approximate this triggering "feature space" as a two-dimensional linear subspace in the high-dimensional space of input histories, capturing in this way a substantial fraction of the mutual information between inputs and spike time. We find that an even better approximation, however, is to describe the relevant subspace as two dimensional but curved; in this way, we can capture 90% of the mutual information even at high time resolution. Our analysis provides a new understanding of the computational properties of the HH model. While it is common to approximate neural behavior as "integrate and fire," the HH model is neither an integrator nor well described by a single threshold.
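Reverse correlation in its simplest form is the spike-triggered average; a sketch with a toy stimulus (all values illustrative):

```python
import numpy as np

def spike_triggered_average(stimulus, spike_times, window):
    """Average the stimulus segments immediately preceding each spike.
    The STA is the simplest one-dimensional 'relevant feature'; the paper's
    covariance-based generalizations extract additional dimensions."""
    segments = [stimulus[t - window:t] for t in spike_times if t >= window]
    return np.mean(segments, axis=0)

# Toy stimulus that ramps up just before each "spike" at t = 5 and t = 11,
# so the recovered feature is the ramp itself.
stim = np.array([0., 0., 0., 1., 2., 0., 0., 0., 0., 1., 2., 0.])
sta = spike_triggered_average(stim, spike_times=[5, 11], window=2)
```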