Correlation-based feature selection for machine learning
, 1998
Cited by 155 (3 self)
A central problem in machine learning is identifying a representative set of features from which to construct a classification model for a particular task. This thesis addresses the problem of feature selection for machine learning through a correlation-based approach. The central hypothesis is that good feature sets contain features that are highly correlated with the class, yet uncorrelated with each other. A feature evaluation formula, based on ideas from test theory, provides an operational definition of this hypothesis. CFS (Correlation-based Feature Selection) is an algorithm that couples this evaluation formula with an appropriate correlation measure and a heuristic search strategy. CFS was evaluated by experiments on artificial and natural datasets. Three machine learning algorithms were used: C4.5 (a decision tree learner), IB1 (an instance-based learner), and naive Bayes. Experiments on artificial datasets showed that CFS quickly identifies and screens irrelevant, redundant, and noisy features, and identifies relevant features as long as their relevance does not strongly depend on other features. On natural domains, CFS typically eliminated well over half the features. In most cases, classification accuracy using the reduced feature set equaled or bettered accuracy using the complete feature set.
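The abstract above does not spell out the evaluation formula. As a minimal sketch, the merit heuristic commonly associated with CFS is k * r_cf / sqrt(k + k * (k - 1) * r_ff), where k is the subset size, r_cf the mean feature-class correlation, and r_ff the mean feature-feature correlation. The function name `cfs_merit` and the example correlation values are hypothetical, and the correlations are assumed to be precomputed by some suitable measure.

```python
import math


def cfs_merit(class_corrs, inter_corrs):
    """Merit of a feature subset: k * r_cf / sqrt(k + k * (k - 1) * r_ff).

    class_corrs: one feature-class correlation per feature in the subset.
    inter_corrs: one correlation per pair of features in the subset.
    Both are assumed to be precomputed (hypothetical inputs).
    """
    k = len(class_corrs)
    if k == 0:
        return 0.0
    # Mean absolute correlation between each feature and the class.
    r_cf = sum(abs(r) for r in class_corrs) / k
    # Mean absolute correlation between feature pairs (0 if there are none).
    r_ff = sum(abs(r) for r in inter_corrs) / len(inter_corrs) if inter_corrs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


# Two equally predictive features: uncorrelated versus fully redundant.
print(cfs_merit([0.8, 0.8], [0.0]))
print(cfs_merit([0.8, 0.8], [1.0]))
```

Under this form, two uncorrelated features score higher than two redundant ones, which matches the hypothesis that good subsets are correlated with the class yet uncorrelated with each other.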
Wrappers For Performance Enhancement And Oblivious Decision Graphs
, 1995
Cited by 108 (8 self)
In this doctoral dissertation, we study three basic problems in machine learning and two new hypothesis spaces with corresponding learning algorithms. The problems we investigate are: accuracy estimation, feature subset selection, and parameter tuning. The latter two problems are related and are studied under the wrapper approach. The hypothesis spaces we investigate are: decision tables with a default majority rule (DTMs) and oblivious read-once decision graphs (OODGs).
Prediction risk and architecture selection for neural networks
, 1994
Cited by 73 (2 self)
We describe two important sets of tools for neural network modeling: prediction risk estimation and network architecture selection. Prediction risk is defined as the expected performance of an estimator in predicting new observations. Estimated prediction risk can be used both for estimating the quality of model predictions and for model selection. Prediction risk estimation and model selection are especially important for problems with limited data. Techniques for estimating prediction risk include data resampling algorithms such as nonlinear cross-validation (NCV) and algebraic formulae such as the predicted squared error (PSE) and generalized prediction error (GPE). We show that exhaustive search over the space of network architectures is computationally infeasible even for networks of modest size. This motivates the use of heuristic strategies that dramatically reduce the search complexity. These strategies employ directed search algorithms, such as selecting the number of nodes via sequential network construction (SNC) and pruning inputs and weights via sensitivity-based pruning (SBP) and optimal brain damage (OBD), respectively.
Selecting a Classification Method by Cross-Validation
 Machine Learning
, 1993
Cited by 66 (0 self)
If we lack relevant problem-specific knowledge, cross-validation methods may be used to select a classification method empirically. We examine this idea here to show in what senses cross-validation does and does not solve the selection problem. As illustrated empirically, cross-validation may lead to higher average performance than application of any single classification strategy and it also cuts the risk of poor performance. On the other hand, cross-validation is no more or less a form of bias than simpler strategies and applying it appropriately ultimately depends in the same way on prior knowledge. In fact, cross-validation may be seen as a way of applying partial information about the applicability of alternative classification strategies.
Keywords: Cross-validation, classification, decision trees, neural networks.
1 Introduction Machine learning researchers and statisticians have produced a host of approaches to the problem of classification including methods for inducing rul...
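The idea of selecting a classification method empirically by cross-validation can be sketched as follows. This is only an illustration under stated assumptions: the two toy learners (a majority-class rule and a one-dimensional nearest-neighbor rule), the function names, and the synthetic data are all hypothetical and are not taken from the paper, which considers methods such as decision trees and neural networks.

```python
from collections import Counter


def majority_fit(train):
    """Toy learner: always predict the most common training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


def nearest_neighbor_fit(train):
    """Toy learner: predict the label of the closest training point (1-D inputs)."""
    return lambda x: min(train, key=lambda p: abs(p[0] - x))[1]


def cv_accuracy(fit, data, k):
    """k-fold cross-validated accuracy, with folds taken as every k-th example."""
    correct = 0
    for fold in range(k):
        test = data[fold::k]
        train = [p for i, p in enumerate(data) if i % k != fold]
        predict = fit(train)
        correct += sum(predict(x) == y for x, y in test)
    return correct / len(data)


def select_method(methods, data, k=5):
    """Return the name of the method with the highest cross-validated accuracy."""
    scores = {name: cv_accuracy(fit, data, k) for name, fit in methods.items()}
    return max(scores, key=scores.get), scores


# Hypothetical data: the label is 1 exactly when x >= 10.
data = [(x, int(x >= 10)) for x in range(20)]
methods = {"majority": majority_fit, "1-NN": nearest_neighbor_fit}
best, scores = select_method(methods, data, k=5)
print(best, scores)
```

The selection step is itself a learning strategy with its own bias, which is the point the abstract makes: the candidate set passed to `select_method` encodes prior knowledge about which methods might apply.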
Blur Identification by the Method of Generalized Cross-Validation
 IEEE Trans. Image Processing
, 1991
Cited by 48 (1 self)
The point-spread function (PSF) of a blurred image is often unknown a priori; the blur must first be identified from the degraded image data before restoring the image. We introduce generalized cross-validation (GCV) to address the blur identification problem. Motivated by the success of GCV in identifying optimal smoothing parameters for image restoration, we have extended the method to the problem of identifying blur parameters as well. The GCV criterion identifies model parameters for the blur, the image, and the regularization parameter, providing all the information necessary to restore the image. Experiments are presented which show that GCV is capable of yielding good identification results. Furthermore, a comparison of the GCV criterion to maximum likelihood (ML) estimation shows that GCV often outperforms ML in identifying the blur and image model parameters. To appear in IEEE Transactions on Image Processing. This work was supported in part by the Joint Services Electroni...
Data-driven calibration of penalties for least-squares regression
, 2008
Cited by 31 (10 self)
Penalization procedures often suffer from their dependence on multiplying factors, whose optimal values are either unknown or hard to estimate from data. We propose a completely data-driven calibration algorithm for these parameters in the least-squares regression framework, without assuming a particular shape for the penalty. Our algorithm relies on the concept of minimal penalty, recently introduced by Birgé and Massart (2007) in the context of penalized least squares for Gaussian homoscedastic regression. On the positive side, the minimal penalty can be evaluated from the data themselves, leading to a data-driven estimation of an optimal penalty which can be used in practice; on the negative side, their approach heavily relies on the homoscedastic Gaussian nature of their stochastic framework. The purpose of this paper is twofold: stating a more general heuristic for designing a data-driven penalty (the slope heuristics) and proving that it works for penalized least-squares regression with a random design, even for heteroscedastic non-Gaussian data. For technical reasons, some exact mathematical results will be proved only for regressogram bin-width selection. This is at least a first step towards further results, since the approach and the method that we use are indeed general.
Bayesian Model Assessment and Comparison Using Cross-Validation Predictive Densities
 Neural Computation
, 2002
Cited by 28 (11 self)
In this work, we discuss practical methods for the assessment, comparison, and selection of complex hierarchical Bayesian models. A natural way to assess the goodness of the model is to estimate its future predictive capability by estimating expected utilities. Instead of just making a point estimate, it is important to obtain the distribution of the expected utility estimate, as it describes the uncertainty in the estimate. The distributions of the expected utility estimates can also be used to compare models, for example, by computing the probability of one model having a better expected utility than some other model. We propose an approach using cross-validation predictive densities to obtain expected utility estimates and Bayesian bootstrap to obtain samples from their distributions. We also discuss the probabilistic assumptions made and properties of two practical cross-validation methods, importance sampling and k-fold cross-validation. As illustrative examples, we use MLP neural networks and Gaussian Processes (GP) with Markov chain Monte Carlo sampling in one toy problem and two challenging real-world problems.
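The model comparison described above can be sketched with a plain Bayesian bootstrap. The sketch assumes that per-observation utilities (for example, cross-validated log predictive densities) have already been computed for two models on the same observations. The function names and the utility values are hypothetical, and the paper's actual procedure may differ in its details.

```python
import random


def bayesian_bootstrap_means(values, draws=2000, seed=0):
    """Sample the posterior of the mean by reweighting observations.

    Each draw uses Dirichlet(1, ..., 1) weights, generated here as
    normalized unit-rate exponential variates.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(draws):
        g = [rng.expovariate(1.0) for _ in values]
        total = sum(g)
        means.append(sum(w * v for w, v in zip(g, values)) / total)
    return means


def prob_a_better(util_a, util_b, draws=2000, seed=0):
    """Estimate P(expected utility of A > expected utility of B).

    The paired per-observation differences are resampled so that both
    models are compared on the same reweighted data.
    """
    diffs = [a - b for a, b in zip(util_a, util_b)]
    means = bayesian_bootstrap_means(diffs, draws, seed)
    return sum(m > 0 for m in means) / len(means)


# Hypothetical per-observation utilities for two models.
util_a = [-0.9, -1.1, -0.7, -1.3, -0.8]
util_b = [-1.0, -1.0, -0.9, -1.2, -1.1]
print(prob_a_better(util_a, util_b))
```

Resampling the paired differences rather than each model separately keeps the comparison on identical weights, which is one reasonable design choice for this kind of estimate.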
Bayesian Approach for Neural Networks - Review and Case Studies
 Neural Networks
, 2001
Cited by 20 (10 self)
We give a short review on the Bayesian approach for neural network learning and demonstrate the advantages of the approach in three real applications. We discuss the Bayesian approach with emphasis on the role of prior knowledge in Bayesian models and in classical error minimization approaches. The generalization capability of a statistical model, classical or Bayesian, is ultimately based on the prior assumptions. The Bayesian approach permits propagation of uncertainty in quantities which are unknown to other assumptions in the model, which may be more generally valid or easier to guess in the problem. The case problems studied in this paper include a regression, a classification, and an inverse problem. In the most thoroughly analyzed regression problem, the best models were those with less restrictive priors. This emphasizes the major advantage of the Bayesian approach, that we are not forced to guess attributes that are unknown, such as the number of degrees of freedom in the model, nonlinearity of the model with respect to each input variable, or the exact form for the distribution of the model residuals.
Supervised model-based visualization of high-dimensional data
, 2000
Cited by 19 (9 self)
When high-dimensional data vectors are visualized on a two- or three-dimensional display, the goal is that two vectors close to each other in the multidimensional space should also be close to each other in the low-dimensional space. Traditionally, closeness is defined in terms of some standard geometric distance measure, such as the Euclidean distance, based on a more or less straightforward comparison between the contents of the data vectors. However, such distances do not generally reflect properly the properties of complex problem domains, where changing one bit in a vector may completely change the relevance of the vector. What is more, in real-world situations the similarity of two vectors is not a universal property: even if two vectors can be regarded as similar from one point of view, from another point of view they may appear quite dissimilar. In order to capture these requirements for building a pragmatic and flexible similarity measure, we propose a data visualization scheme where the similarity of two vectors is determined indirectly by using a formal model of the problem domain; in our case, a Bayesian network model. In this scheme, two vectors are considered similar if they lead to similar predictions, when given as input to a Bayesian network model. The scheme is supervised in the sense that different perspectives can be taken into account by using different predictive distributions, i.e., by changing what is to be predicted. In addition, the modeling framework can also be used for validating the rationality of the resulting visualization. This model-based visualization scheme has been implemented and tested on real-world domains with encouraging results.