Results 1 - 10 of 44
Toward integrating feature selection algorithms for classification and clustering
IEEE Transactions on Knowledge and Data Engineering, 2005
Cited by 71 (6 self)
This paper introduces concepts and algorithms of feature selection, surveys existing feature selection algorithms for classification and clustering, groups and compares different algorithms with a categorizing framework based on search strategies, evaluation criteria, and data mining tasks, reveals unattempted combinations, and provides guidelines in selecting feature selection algorithms. With the categorizing framework, we continue our efforts toward building an integrated system for intelligent feature selection. A unifying platform is proposed as an intermediate step. An illustrative example is presented to show how existing feature selection algorithms can be integrated into a meta algorithm that can take advantage of individual algorithms. An added advantage of doing so is to help a user employ a suitable algorithm without knowing details of each algorithm. Some real-world applications are included to demonstrate the use of feature selection in data mining. We conclude this work by identifying trends and challenges of feature selection research and development.
Efficient feature selection via analysis of relevance and redundancy
Journal of Machine Learning Research, 2004
Cited by 56 (2 self)
Feature selection is applied to reduce the number of features in many applications where data has hundreds or thousands of features. Existing feature selection methods mainly focus on finding relevant features. In this paper, we show that feature relevance alone is insufficient for efficient feature selection of high-dimensional data. We define feature redundancy and propose to perform explicit redundancy analysis in feature selection. A new framework is introduced that decouples relevance analysis and redundancy analysis. We develop a correlation-based method for relevance and redundancy analysis, and conduct an empirical study of its efficiency and effectiveness comparing with representative methods.
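The decoupling of relevance analysis from redundancy analysis described in this abstract can be illustrated with a minimal sketch. It assumes absolute Pearson correlation as the correlation measure and treats a feature as redundant when it correlates with an already kept feature at least as strongly as with the target; the abstract fixes neither choice, and the names `pearson`, `select_features`, and `min_relevance` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_features(features, target, min_relevance=0.1):
    """Two-stage filter: relevance ranking first, then redundancy removal.

    Assumption (not from the abstract): |Pearson r| is the correlation
    measure, and a feature is dropped when its correlation with a kept,
    more relevant feature is at least its own correlation with the target.
    """
    # Stage 1: relevance analysis, one score per feature.
    relevance = {name: abs(pearson(col, target)) for name, col in features.items()}
    ranked = sorted(
        (name for name in relevance if relevance[name] >= min_relevance),
        key=lambda name: relevance[name],
        reverse=True,
    )
    # Stage 2: redundancy analysis against features already kept.
    selected = []
    for name in ranked:
        redundant = any(
            abs(pearson(features[name], features[kept])) >= relevance[name]
            for kept in selected
        )
        if not redundant:
            selected.append(name)
    return selected


target = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "a": [1, 2, 1, 2, 5, 6, 5, 6],        # strongly relevant
    "b": [2, 4, 2, 4, 10, 12, 10, 13],    # near copy of "a", hence redundant
    "c": [2, 1, 2, 1, 4, 3, 4, 3],        # relevant, carries different detail
    "d": [1, 0, 0, 1, 1, 0, 0, 1],        # uncorrelated with the target
}
print(select_features(features, target))
```

In this toy data, "b" passes the relevance test but is removed in the redundancy stage, which is the case that relevance ranking alone would miss.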
Redundancy based feature selection for microarray data
In Proc. of SIGKDD, 2004
Cited by 25 (1 self)
In gene expression microarray data analysis, selecting a small number of discriminative genes from thousands of genes is an important problem for accurate classification of diseases or phenotypes. The problem becomes particularly challenging due to the large number of features (genes) and small sample size. Traditional gene selection methods often select the top-ranked genes according to their individual discriminative power without handling the high degree of redundancy among the genes. Recent research shows that removing redundant genes among selected ones can achieve a better representation of the characteristics of the targeted phenotypes and lead to improved classification accuracy. Hence, we study in this paper the relationship between feature relevance and redundancy and propose an efficient method that can effectively remove redundant genes. The efficiency and effectiveness of our method in comparison with representative methods have been demonstrated through an empirical study using public microarray data sets.
Using AUC and accuracy in evaluating learning algorithms
IEEE Transactions on Knowledge and Data Engineering, 2005
Cited by 20 (1 self)
The area under the ROC (Receiver Operating Characteristics) curve, or simply AUC, has been recently proposed as an alternative single-number measure for evaluating the predictive ability of learning algorithms. However, no formal arguments were given as to why AUC should be preferred over accuracy. In this paper, we establish formal criteria for comparing two different measures for learning algorithms, and we show theoretically and empirically that AUC is, in general, a better measure (defined precisely) than accuracy. We then reevaluate well-established claims in machine learning based on accuracy using AUC, and obtain interesting and surprising new results. We also show that AUC is more directly associated with the net profit than accuracy in direct marketing, suggesting that learning algorithms should optimize AUC instead of accuracy in real-world applications. The conclusions drawn in this paper may make a significant impact to machine learning and data mining applications.
Note: This paper integrates results in our papers published in IJCAI 2003 [22] and ICDM 2003 [15]. It also includes many new results. For example, the concept of indifferency in Section II-B is new, and Sections III-B, III-C, IV-A, IV-D, and V are all new and unpublished.
Index Terms: Evaluation of learning algorithms, AUC vs accuracy, ROC
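To make the contrast between the two measures concrete, here is a small sketch that computes both on toy data. It uses the standard rank interpretation of AUC (the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counted as one half); the 0.5 threshold, the function names, and the scores are illustrative assumptions, not material from the paper.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples whose thresholded score agrees with the 0/1 label."""
    correct = sum((s >= threshold) == bool(y) for y, s in zip(labels, scores))
    return correct / len(labels)


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


labels = [1, 1, 1, 0, 0, 0]
# Two hypothetical classifiers that make the same thresholded mistakes
# but rank the misclassified positive very differently.
scores_a = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
scores_b = [0.9, 0.8, 0.1, 0.6, 0.3, 0.2]

print(accuracy(labels, scores_a), accuracy(labels, scores_b))
print(auc(labels, scores_a), auc(labels, scores_b))
```

Both score lists yield the same accuracy, while the AUC values differ, so AUC separates two rankings that accuracy cannot tell apart in this example.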
Compression-based averaging of selective naive Bayes classifiers
Journal of Machine Learning Research, 2007
Cited by 13 (3 self)
The naive Bayes classifier has proved to be very effective on many real data applications. Its performance usually benefits from an accurate estimation of univariate conditional probabilities and from variable selection. However, although variable selection is a desirable feature, it is prone to overfitting. In this paper, we introduce a Bayesian regularization technique to select the most probable subset of variables compliant with the naive Bayes assumption. We also study the limits of Bayesian model averaging in the case of the naive Bayes assumption and introduce a new weighting scheme based on the ability of the models to conditionally compress the class labels. The weighting scheme on the models reduces to a weighting scheme on the variables, and finally results in a naive Bayes classifier with “soft variable selection”. Extensive experiments show that the compression-based averaged classifier outperforms the Bayesian model averaging scheme.
Optimizing time series discretization for knowledge discovery
Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’05), 2005
Cited by 12 (3 self)
Knowledge Discovery in time series usually requires symbolic time series. Many discretization methods that convert numeric time series to symbolic time series ignore the temporal order of values. This often leads to symbols that do not correspond to states of the process generating the time series and cannot be interpreted meaningfully. We propose a new method for meaningful unsupervised discretization of numeric time series called Persist. The algorithm is based on the Kullback-Leibler divergence between the marginal and the self-transition probability distributions of the discretization symbols. Its performance is evaluated on both artificial and real life data in comparison to the most common discretization methods. Persist achieves significantly higher accuracy than existing static methods and is robust against noise. It also outperforms Hidden Markov Models for all but very simple cases.
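The idea of comparing marginal and self-transition probabilities of the discretization symbols can be sketched as follows. This is an illustrative score only, not the Persist algorithm itself: it assumes each symbol is scored by the Kullback-Leibler divergence between two Bernoulli distributions (self-transition vs. marginal probability), signed so that symbols that tend to repeat score positive. The names `bernoulli_kl` and `persistence_scores` and the clamping constant are hypothetical.

```python
import math
from collections import Counter


def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q), with clamping to avoid log(0)."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def persistence_scores(symbols):
    """Signed divergence between self-transition and marginal probability, per symbol.

    Assumption: the sign is positive when a symbol follows itself more
    often than its marginal frequency would suggest, negative otherwise.
    """
    n = len(symbols)
    marginal = Counter(symbols)
    outgoing = Counter()   # transitions leaving each symbol
    stays = Counter()      # transitions where the symbol repeats
    for current, following in zip(symbols, symbols[1:]):
        outgoing[current] += 1
        if current == following:
            stays[current] += 1
    scores = {}
    for symbol, count in outgoing.items():
        p_self = stays[symbol] / count
        p_marginal = marginal[symbol] / n
        sign = 1.0 if p_self >= p_marginal else -1.0
        scores[symbol] = sign * bernoulli_kl(p_self, p_marginal)
    return scores


persistent = list("aaaabbbbaaaabbbb")   # symbols form long runs
alternating = list("abababababababab")  # symbols never repeat

print(persistence_scores(persistent))
print(persistence_scores(alternating))
```

A discretization whose symbols form long runs gets positive scores, while one whose symbols flip at every step gets negative scores, which matches the intuition that symbols should correspond to lasting states of the underlying process.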
Discretization for naive-Bayes learning: managing discretization bias and variance
2003
Cited by 9 (5 self)
Quantitative attributes are usually discretized in naive-Bayes learning. We prove a theorem that explains why discretization can be effective for naive-Bayes learning. The use of different discretization techniques can be expected to affect the classification bias and variance of generated naive-Bayes classifiers, effects we name discretization bias and variance. We argue that by properly managing discretization bias and variance, we can effectively reduce naive-Bayes classification error. In particular, we propose proportional k-interval discretization and equal size discretization, two efficient heuristic discretization methods that are able to effectively manage discretization bias and variance by tuning discretized interval size and interval number. We empirically evaluate our new techniques against five key discretization methods for naive-Bayes classifiers. The experimental results support our theoretical arguments by showing that naive-Bayes classifiers trained on data discretized by our new methods are able to achieve lower classification error than those trained on data discretized by alternative discretization methods.
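The trade-off between interval size and interval number can be illustrated with a short sketch. It assumes, beyond what the abstract states, that proportional k-interval discretization sets both the number of intervals and the number of values per interval to roughly the square root of the training-set size, and it uses equal-frequency cut points placed midway between neighbouring sorted values. The function names are hypothetical.

```python
import bisect
import math


def proportional_cut_points(values):
    """Equal-frequency cut points with about sqrt(n) intervals of about sqrt(n) values each.

    Assumption: interval count t = floor(sqrt(n)), so size and number
    grow together as more training data becomes available.
    Heavily duplicated values can yield repeated cut points.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one value")
    t = max(1, math.isqrt(n))
    cuts = []
    for i in range(1, t):
        idx = i * n // t
        # Place the boundary halfway between the two neighbouring values.
        cuts.append((ordered[idx - 1] + ordered[idx]) / 2)
    return cuts


def discretize(value, cuts):
    """Index of the interval that contains value, given sorted cut points."""
    return bisect.bisect_right(cuts, value)


training_values = list(range(1, 17))  # 16 values, so 4 intervals of 4
cuts = proportional_cut_points(training_values)
print(cuts)
print([discretize(v, cuts) for v in (1, 5, 9, 16)])
```

With 16 training values the sketch produces 4 intervals of 4 values; with 10,000 values it would produce 100 intervals of 100 values, so both quantities increase with the amount of data.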
Coordination Number Prediction Using Learning Classifier Systems: Performance and interpretability
2006
Cited by 9 (8 self)
The prediction of the coordination number (CN) of an amino acid in a protein structure has recently received renewed attention. In a recent paper, Kinjo et al. proposed a real-valued definition of CN and a criterion to map it onto a finite set of classes, in order to predict it using classification approaches. The literature reports several kinds of input information used for CN prediction. The aim of this paper is to assess the performance of a state-of-the-art learning method, Learning Classifier Systems (LCS), on this CN definition, with various degrees of precision, based on several combinations of input attributes. Moreover, we will compare the LCS performance to other well-known learning techniques. Our experiments are also intended to determine the minimum set of input information needed to achieve good predictive performance, so as to generate competent yet simple and interpretable classification rules. Thus, the generated predictors (rule sets) are analyzed for their interpretability.
On Why Discretization Works for Naive-Bayes Classifiers
In Proceedings of the 16th Australian Joint Conference on Artificial Intelligence (AI, 2003
Cited by 8 (2 self)
We investigate why discretization is effective in naive-Bayes learning. We prove a theorem that identifies particular conditions under which discretization will result in naive-Bayes classifiers delivering the same probability estimates as would be obtained if the correct probability density functions were employed.
On Issues of Instance Selection
2002
Cited by 8 (0 self)
The digital technologies and computer advances with the booming internet uses have led to massive data collection (corporate data, data warehouses, webs, just to name a few) and information (or misinformation) explosion. Szalay and Gray described this phenomenon as “drowning in data” (Szalay and Gray, 1999). They reported that each year the detectors at the CERN particle collider in Switzerland record 1 petabyte of data; and researchers in areas of science from astronomy to the human genome are facing the same problems and choking on information. A very natural question is “now that we have gathered so much data, what do we do with it?” Raw data is rarely of direct use and manual analysis simply cannot keep pace with the fast growth of data. Data mining and knowledge discovery (KDD), as a new emerging field comprising disciplines such as databases, statistics, machine learning, comes to the rescue. KDD attempts to turn raw data into nuggets and create special edges in this ever competitive world for science discovery and business intelligence. The KDD process is defined in Fayyad et al. (1996) as the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data. Data Mining processes include data selection, preprocessing, data mining, interpretation and evaluation.

