Results 1 - 10 of 12
CAIM Discretization Algorithm
, 2003
Cited by 42 (2 self)
The task of extracting knowledge from databases is quite often performed by machine learning algorithms. The majority of these algorithms can be applied only to data described by discrete numerical or nominal attributes (features). In the case of continuous attributes, there is a need for a discretization algorithm that transforms continuous attributes into discrete ones. This paper describes such an algorithm, called CAIM (class-attribute interdependence maximization), which is designed to work with supervised data. The goal of the CAIM algorithm is to maximize the class-attribute interdependence and to generate a (possibly) minimal number of discrete intervals. The algorithm does not require the user to predefine the number of intervals, as opposed to some other discretization algorithms. Tests performed using CAIM and six other state-of-the-art discretization algorithms show that discrete attributes generated by the CAIM algorithm almost always have the lowest number of intervals and the highest class-attribute interdependency. Two machine learning algorithms, the CLIP4 rule algorithm and a decision tree algorithm, are used to generate classification rules from data discretized by CAIM. For both the CLIP4 and decision tree algorithms, the accuracy of the generated rules is higher and the number of rules is lower for data discretized using CAIM than for data discretized using the six other discretization algorithms. The highest classification accuracy was always achieved for datasets discretized using CAIM, as compared with the other six algorithms. Of the four supervised algorithms used for comparison, CAIM is comparable in speed to the two fastest.
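The class-attribute interdependence criterion the abstract refers to can be sketched as an average, over the candidate intervals, of the squared count of each interval's dominant class divided by the interval's total count. The function below is a minimal illustration only; the variable names, the half-open interval convention, and the helper itself are ours, not taken from the paper:

```python
# Sketch of a CAIM-style criterion for a candidate discretization scheme.
# Higher values favor intervals dominated by a single class, and the
# division by the number of intervals penalizes over-splitting.
from collections import Counter

def caim(values, labels, boundaries):
    """CAIM-style value of a discretization scheme.

    boundaries: sorted interval edges [b0, b1, ..., bn] covering the data;
    interval r is the half-open range (b_r, b_{r+1}].
    """
    n = len(boundaries) - 1
    total = 0.0
    for r in range(n):
        lo, hi = boundaries[r], boundaries[r + 1]
        # class counts of the examples falling into interval r
        counts = Counter(l for v, l in zip(values, labels) if lo < v <= hi)
        m_r = sum(counts.values())
        if m_r:
            max_r = max(counts.values())
            total += max_r * max_r / m_r
    return total / n

# Splitting at the class boundary scores higher than a single interval:
print(caim([1, 2, 3, 4], ['a', 'a', 'b', 'b'], [0, 2, 4]))  # 2.0
print(caim([1, 2, 3, 4], ['a', 'a', 'b', 'b'], [0, 4]))     # 1.0
```

A greedy search over candidate boundaries that keeps the split maximizing this value, as the abstract describes, then needs no user-supplied interval count.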
A Discretization Algorithm Based on a Heterogeneity Criterion
- IEEE Transactions on Knowledge and Data Engineering
, 2005
Cited by 9 (2 self)
Discretization, as a preprocessing step for data mining, is the process of converting the continuous attributes of a data set into discrete ones so that they can be treated as nominal features by machine learning algorithms. The various discretization methods that use entropy-based criteria form a large class of algorithms. However, as a measure of class homogeneity, entropy cannot always accurately reflect the degree of class homogeneity of an interval. In this paper, we therefore propose a new measure of the class heterogeneity of intervals from the viewpoint of the class probabilities themselves. Based on this definition of heterogeneity, we present a new criterion to evaluate a discretization scheme and analyze its properties theoretically. A heuristic method is also proposed to find an approximately optimal discretization scheme. Finally, our method is compared, in terms of predictive error rate and tree size, with Ent-MDLC, a representative entropy-based discretization method well known for its good performance. Our method is shown to produce better results than Ent-MDLC, although the improvement is not significant, and it can be a good alternative to entropy-based discretization methods.
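The abstract's claim that entropy does not always track class homogeneity can be illustrated with a toy example. The paper's own heterogeneity measure is not reproduced here; the dominant-class probability merely stands in for a probability-based view of homogeneity:

```python
import math

def entropy(p):
    """Shannon entropy (in bits) of a class-probability vector."""
    return -sum(x * math.log2(x) for x in p if x > 0)

# Interval A: two classes nearly tie; interval B: one class clearly dominates.
p_a = [0.45, 0.45, 0.10]
p_b = [0.60, 0.20, 0.20]

# Entropy ranks A as (slightly) more homogeneous, while the dominant-class
# probability ranks B as more homogeneous -- the two views can disagree.
print(entropy(p_a), entropy(p_b))  # ~1.369 vs ~1.371
print(max(p_a), max(p_b))          # 0.45 vs 0.60
```

Any criterion built directly on the class probabilities of an interval, rather than on their entropy, can therefore order candidate intervals differently from an entropy-based criterion.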
A global optimal algorithm for class-dependent discretization of continuous data
Cited by 5 (1 self)
This paper presents a new method to convert continuous variables into discrete variables for inductive machine learning. The method can be applied to pattern classification problems in machine learning and data mining. The discretization process is formulated as an optimization problem: we use the normalized mutual information between the class labels and the variable to be discretized as the objective function, and then apply fractional programming (iterative dynamic programming) to find its optimum. Unlike the majority of class-dependent discretization methods in the literature, which find only a local optimum of their objective functions, the proposed method, OCDD (Optimal Class-Dependent Discretization), finds the global optimum. The experimental results demonstrate that the algorithm is very effective in classification when coupled with popular learning systems such as C4.5 decision trees and the naive Bayes classifier. It can be used to discretize continuous variables for many existing inductive learning systems.
CLIP4: Hybrid inductive machine learning algorithm that generates inequality rules
, 2004
Cited by 5 (2 self)
The paper describes a hybrid inductive machine learning algorithm called CLIP4. The algorithm first partitions data into subsets using a tree structure and then generates production rules only from the subsets stored at the leaf nodes. The unique feature of the algorithm is the generation of rules that involve inequalities. The algorithm works with data that have a large number of examples and attributes, can cope with noisy data, and can use numerical, nominal, continuous, and missing-value attributes. The algorithm's flexibility and efficiency are shown on several well-known benchmark data sets, and the results are compared with other machine learning algorithms. The benchmarking results report CLIP4's accuracy, CPU time, and rule complexity in each instance. CLIP4 has built-in features such as tree pruning, methods for partitioning the data (for data with a large number of examples and attributes, and for data containing noise), a data-independent mechanism for dealing with missing values, genetic operators to improve accuracy on small data, and discretization schemes. CLIP4 generates a model of the data that consists of well-generalized rules, and it ranks attributes and selectors, which can be used for feature selection.
An Approach to Qualitative Radial Basis Function Networks over Orders of Magnitude
- Proceedings of 18th International Workshop on Qualitative Reasoning
, 2004
Cited by 4 (2 self)
This paper lies within the domain of supervised learning algorithms based on neural networks whose architecture corresponds to radial basis functions (RBF). A methodology is developed for using RBF networks when the descriptors of the patterns are given by means of their orders of magnitude. A qualitative distance is constructed over the discrete structure of absolute orders-of-magnitude spaces. This distance relies on a metric structure defined in R^n. The aim is to capture the remoteness between the components of the patterns by locating labels with respect to extreme magnitudes. An application of the described learning method to a financial problem is given, which makes it possible to compare the results obtained from a qualitative treatment with those from a quantitative one.
Afify, Online Discretization of Continuous-Valued Attributes in Rule Induction
- Proceedings of the Institution of Mechanical Engineers, Part C: Journal of Mechanical Engineering Science
, 2005
Cited by 1 (0 self)
Machine learning algorithms designed for engineering applications must be able to handle numerical attributes, particularly attributes with real (continuous) values. Many algorithms deal with continuous-valued attributes by discretizing them before the learning process starts. This paper describes a new approach in which continuous-valued attributes are discretized during the learning process. Incorporating discretization within the learning process has the advantage of taking into account both the bias inherent in the learning system and the interactions between the different attributes. Experiments have demonstrated that the proposed method, when used in conjunction with the SRI rule induction algorithm developed by the authors, improves the accuracy of the induced model.
Financial Credit Risk Measurement Prediction Using Innovative Soft-computing Techniques
- International Conference on Computational Finance & its Applications
, 2004
Project Coordinator
The MERITO co-ordinated project aims at studying and developing several innovative Artificial Intelligence techniques in order to analyse and evaluate problems that involve qualitative information and are defined in changing environments. The study is oriented towards the modelling and resolution of a financial problem: the measurement of credit or default risk. The methodology is based on improving soft-computing techniques, such as support vector machines (SVM) and radial basis functions (RBF), by using orders-of-magnitude qualitative models. The use of these techniques will, on the one hand, permit the introduction of experts' knowledge and, on the other, allow knowledge to be extracted from results. It is planned to develop a methodology based on the study findings in order to measure firms' credit risk using their financial data together with market and environment information. As a consequence of this project, there is the intention of launching a software tool that measures the risk of default from financial and qualitative information. This product would provide a valuable decision support system and give firms and financial bodies a significant competitive advantage.
Jornada de Seguimiento de Proyectos
The calls for projects of the Plan Nacional de Ciencia y Tecnología include an obligation on the part of the Subdirección de Proyectos de Investigación to monitor and evaluate the results of each funded project. In the spirit of this rule, the Programa Nacional de Tecnologías Informáticas (TIN) periodically holds project monitoring workshops whose purpose is both this evaluation and the dissemination of the activities in this area to other scientists and to the industrial sector. These proceedings compile the reports submitted by the coordinators and principal investigators of the projects selected for the 2004 monitoring workshop. This selection presents a significant sample of national research in computer science, bringing together projects of different kinds and natures. The workshop was held in Málaga on 11 November, coinciding with two other events: the IX Jornadas de Ingeniería del Software y Bases de Datos and the IV Jornadas de Programación y Lenguajes. We wish to thank the members of the Technical Committee who formed the two panels into which the monitoring was organized. This committee was composed of researchers from both academia and industry, including some members from other countries. We also want to thank all the members of the local organizing committee, whose work was key to the realization of the workshops.