A Comparative Study of Discretization Methods for Naive-Bayes Classifiers
- In Proceedings of PKAW 2002: The 2002 Pacific Rim Knowledge Acquisition Workshop
, 2002
Cited by 13 (0 self)
Discretization is a popular approach to handling numeric attributes in machine learning. We argue that the requirements for effective discretization differ between naive-Bayes learning and many other learning algorithms. We evaluate the effectiveness with naive-Bayes classifiers of nine discretization methods: equal width discretization (EWD), equal frequency discretization (EFD), fuzzy discretization (FD), entropy minimization discretization (EMD), iterative discretization (ID), proportional k-interval discretization (PKID), lazy discretization (LD), non-disjoint discretization (NDD) and weighted proportional k-interval discretization (WPKID). We find that, in general, naive-Bayes classifiers trained on data preprocessed by LD, NDD or WPKID achieve lower classification error than those trained on data preprocessed by the other discretization methods. However, LD cannot scale to large data. This study leads to a new discretization method, weighted non-disjoint discretization (WNDD), which combines the advantages of WPKID and NDD. Our experiments show that, among all the rival discretization methods, WNDD best helps naive-Bayes classifiers reduce average classification error.
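As a rough illustration of the two simplest methods named in this abstract, the sketch below computes cut points for equal width discretization (EWD) and equal frequency discretization (EFD). The function names and the choice to return cut points are assumptions made for this example, not the authors' code.

```python
def equal_width_cuts(values, k):
    """EWD: split the observed range [min, max] into k intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]


def equal_frequency_cuts(values, k):
    """EFD: choose cut points so each interval holds roughly len(values) / k values."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[i * n // k] for i in range(1, k)]


# Hypothetical skewed sample: one outlier stretches the range.
sample = [1, 2, 3, 4, 100]
print(equal_width_cuts(sample, 2))      # cut placed at the middle of the range
print(equal_frequency_cuts(sample, 2))  # cut placed near the median value
```

On skewed data the two methods place cuts very differently: EWD follows the range, while EFD follows the distribution of the values.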
Proportional k-interval discretization for naive-Bayes classifiers
- Proc. of the Twelfth European Conf. on Machine Learning
, 2001
Cited by 12 (5 self)
This paper argues that two commonly used discretization approaches, fixed k-interval discretization and entropy-based discretization, have sub-optimal characteristics for naive-Bayes classification. This analysis leads to a new discretization method, Proportional k-Interval Discretization (PKID), which adjusts the number and size of discretized intervals to the number of training instances, thereby seeking an appropriate trade-off between the bias and variance of the probability estimation for naive-Bayes classifiers. We justify PKID in theory and test it on a wide cross-section of datasets. Our experimental results suggest that, in comparison to its alternatives, PKID provides naive-Bayes classifiers with competitive classification performance for smaller datasets and better classification performance for larger datasets.
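The abstract says only that the number and size of intervals follow the number of training instances. A minimal sketch of that idea, assuming the square-root rule usually attributed to PKID (about sqrt(n) intervals holding about sqrt(n) values each), might look like this. The function names and the midpoint cut placement are illustrative assumptions.

```python
import bisect
import math


def pkid_cut_points(values):
    """Equal-frequency cut points with roughly sqrt(n) intervals.

    Assumption: interval count t = floor(sqrt(n)), so interval size is also
    close to sqrt(n). Both grow as more training instances arrive.
    """
    ordered = sorted(values)
    n = len(ordered)
    t = max(1, math.isqrt(n))
    cuts = []
    for i in range(1, t):
        idx = i * n // t
        # Place each cut midway between the two neighbouring sorted values.
        cuts.append((ordered[idx - 1] + ordered[idx]) / 2)
    return cuts


def interval_index(cuts, value):
    """Map a numeric value to the index of the interval that contains it."""
    return bisect.bisect_right(cuts, value)


training_values = [float(v) for v in range(100)]  # hypothetical attribute values
cuts = pkid_cut_points(training_values)
print(len(cuts) + 1, interval_index(cuts, 42.0))
```

With 100 training values this yields 10 intervals of 10 values; with 10,000 values it would yield 100 intervals of 100 values, so both quantities scale together.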
Spatial Learning and Localization in Animals: A Computational Model and its Implications for Mobile Robots
, 1997
Cited by 8 (2 self)
The ability to acquire a representation of the spatial environment and the ability to localize within it are essential for successful navigation in a priori unknown environments. The hippocampal formation is believed to play a key role in spatial learning and navigation in animals. This paper briefly reviews the relevant neurobiological and cognitive data and their relation to computational models of spatial learning and localization used in mobile robots. It also describes a hippocampal model of spatial learning and navigation and analyzes it using Kalman-filter-based tools for information fusion from multiple uncertain sources. The resulting model allows a robot to learn a place-based, metric representation of space in a priori unknown environments and to localize itself in a stochastically optimal manner. The paper also describes an algorithmic implementation of the model and the results of several experiments that demonstrate its capabilities.
Weighted Proportional k-Interval Discretization for Naive-Bayes Classifiers
- in: Proc. of the PAKDD
, 2003
Cited by 7 (1 self)
The use of different discretization techniques can be expected to affect the classification bias and variance of naive-Bayes classifiers. We call such an effect discretization bias and variance. Proportional k-interval discretization (PKID) tunes discretization bias and variance by adjusting discretized interval size and number in proportion to the number of training instances. Theoretical analysis suggests that this is desirable for naive-Bayes classifiers. However, PKID is sub-optimal when learning from small training sets. We argue that this is because PKID weighs bias reduction and variance reduction equally. For small data, however, variance reduction can contribute more to lowering learning error and thus should be given greater weight than bias reduction. Accordingly, we propose weighted proportional k-interval discretization (WPKID), which establishes a more suitable bias and variance trade-off for small data while allowing additional training data to be used to reduce both bias and variance. Our experiments demonstrate that, for naive-Bayes classifiers, WPKID improves upon PKID for smaller datasets with significant frequency, and that WPKID delivers lower classification error significantly more often than not in comparison to three other leading alternative discretization techniques studied.
Non-disjoint discretization for naive-Bayes classifiers
- Proc. Nineteenth International Conference on Machine Learning
, 2002
Cited by 3 (1 self)
Previous discretization techniques have discretized numeric attributes into disjoint intervals. We argue that this is neither necessary nor appropriate for naive-Bayes classifiers. The analysis leads to a new discretization method, Non-Disjoint Discretization (NDD). NDD forms overlapping intervals for a numeric attribute, always locating a value toward the middle of an interval to obtain more reliable probability estimation. It also adjusts the number and size of discretized intervals to the number of training instances, seeking an appropriate trade-off between bias and variance of probability estimation. We justify NDD in theory and test it on a wide cross-section of datasets. Our experimental results suggest that for naive-Bayes classifiers, NDD works better than alternative discretization approaches.
Incremental Communication for Multilayer Neural Networks: Error Analysis
, 1995
Cited by 2 (1 self)
Artificial neural networks (ANNs) involve a large amount of inter-node communications. To reduce the communication cost as well as the time of learning process in ANNs, we earlier proposed an incremental inter-node communication method. In the incremental communication method, instead of communicating the full magnitude of the output value of a node, only the increment or decrement to its previous value is sent on a communication link. In this paper, the effects of the limited precision incremental communication method on the convergence behavior and performance of multilayer neural networks are investigated. The nonlinear aspects of representing the incremental values with reduced (limited) precision for the commonly used error backpropagation training algorithm are analyzed. It is shown that the nonlinear effect of small perturbations in the input(s)/output of a node does not enforce instability. The analysis is supported by simulation studies of two problems. The simulation results ...
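The incremental scheme described above can be sketched as follows: a link transmits only the change in a node's output, rounded to a limited precision, and the receiver accumulates the changes. The class name, the fixed quantization step, and the choice to measure each increment against the receiver's reconstructed value are assumptions for this illustration, not details taken from the paper.

```python
class IncrementalLink:
    """Hypothetical link that transmits quantized increments instead of full values."""

    def __init__(self, step):
        self.step = step      # smallest increment the link can represent
        self.received = 0.0   # value reconstructed on the receiving side

    def send(self, value):
        # Increment relative to what the receiver currently holds.
        delta = value - self.received
        # Limited precision: round the increment to a multiple of the step.
        quantized = round(delta / self.step) * self.step
        self.received += quantized
        return quantized


link = IncrementalLink(step=0.25)
for output in (1.1, 1.2, 0.4):  # hypothetical node outputs over three iterations
    sent = link.send(output)
    print(sent, link.received)
```

Because each increment is computed against the receiver's reconstruction, the quantization error in this sketch stays within half a step rather than accumulating across iterations.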
Nonlinear Estimation and Modeling of fMRI Data using Spatio-Temporal Support Vector Regression
Cited by 1 (0 self)
This paper presents a new and general nonlinear framework for fMRI data analysis based on statistical learning methodology: support vector machines. Unlike most current methods, which assume a linear model for simplicity, the estimation and analysis of the fMRI signal within the proposed framework is nonlinear, which matches recent findings on the dynamics underlying neural activity and hemodynamic physiology. The approach utilizes spatio-temporal support vector regression (SVR), within which the intrinsic spatio-temporal autocorrelations in fMRI data are reflected. The novel formulation of the problem allows merging model-driven with data-driven methods, and therefore unifies these two currently separate modes of fMRI analysis. In addition, multiresolution signal analysis is achieved and developed. Other advantages of the approach are: avoidance of interpolation after motion estimation, embedded removal of low-frequency noise components, and easy incorporation of multi-run, multi-subject, and multi-task studies into the framework.
Universidade Federal de Minas Gerais
Constructive processes for teaching Psychophysiology were applied and evaluated by using the classification of Bloom and coworkers. A total of 238 undergraduate students taking the Psychology course at the Federal University of Minas Gerais (Brazil) were considered in this study. The majority of the students were between 20 and 23 years old and female (77%). Significant cognitive gains were observed in the majority of questions in the objective test of knowledge and in the subjective test of mental habits. The topics of the affective domain were subjectively evaluated, and the majority showed significant variations or high performances. Course structure, pedagogic process and some points of metascience showed significant results. However, science application, metascience related to the peripheral aspects of the brain, metascience of the superior mental functions and philosophy of science were significant only in a few cases. The consistency of the answers to subjective topics was checked by verifying the agreement of randomly ordered, pre-chosen matched pairs of affirmative and negative questions. This analysis revealed a lack of consolidation of learning in many cases. In conclusion, this work has shown important gains in several pedagogic domains that call for more attention to the subjectivity and affection of the student, including personality maturation emphasized by a constructive method. Some limitations of the students and teachers related to the scarcity of a scientific culture in the country are also discussed.
SAS Global Forum 2008 Posters
Global warming is currently a focus of attention. This paper shows how to use SAS to create effective approaches for local measures of global warming. Based on analysis of a large data set, almost 100 years' daily temperatures from a local weather station, it gives detailed procedures for testing the equality of variance and mean, and for finding the trend of a time plot using the "loess smoother" technique. The results suggest that warming is present at the local site, and they also provide evidence that may support global warming.
Designing Dispatching Rules to Minimize Total Tardiness
We approximate optimal solutions to the Flexible Job-Shop Problem by using dispatching rules discovered through Genetic Programming. While Simple Priority Rules have been widely applied in practice, their efficacy remains poor due to the lack of a global view. Composite Dispatching Rules have been shown to be more effective as they are constructed through human experience. In this work, we employ suitable parameter and operator spaces for evolving Composite Dispatching Rules using Genetic Programming, with an aim towards greater scalability and flexibility. Experimental results show that Composite Dispatching Rules generated by our Genetic Programming framework outperform the Single and Composite Dispatching Rules selected from the literature over large validation sets with respect to total tardiness. Further results on sensitivity to changes (in coefficient values and terminals) among the evolved rules indicate that their designs are optimal.

