MVC - A Preprocessing Method to Deal With Missing Values
- Knowledge-Based Systems
- Cited by 20 (5 self)
Many analysis tasks have to deal with missing values and have developed specific, internal treatments to guess them. In this paper we present an external method, MVC (Missing Values Completion), that improves completion performance as well as declarativity and interaction with the user for this problem. These qualities make it suitable for the data cleaning step of the KDD process [6]. The core of MVC is the RAR algorithm that we proposed in [15]. This algorithm extends the concept of association rules [1] to databases with multiple missing values. It makes MVC an efficient preprocessing method: in our experiments with the c4.5 [13] decision tree program, MVC reduced the classification error rate by up to a factor of two, quite apart from a significant gain in declarativity.
Data mining: Manufacturing and service applications
- International Journal of Production Research
, 2006
- Cited by 9 (6 self)
In this paper, basic concepts of machine learning and data mining are introduced. Machine learning algorithms extract knowledge from diverse databases that can be used to build decision-making systems. For example, based on operational engineering data, equipment faults can be detected, the number of items to be ordered can be predicted, and optimal control parameters can be determined. A framework for organizing and applying knowledge for decision-making in manufacturing and service applications is presented. The framework uses decision-making constructs such as decision tables, decision maps, and atlases. It offers a new data-driven paradigm of importance to modern manufacturing and service organisations. Examples of data mining applications in industrial, medical, and pharmaceutical domains are presented. It is envisioned that the data-driven framework presented in the paper will enhance these applications.
An integrated data preparation scheme for neural network data analysis
- IEEE Transactions on Knowledge and Data Engineering
, 2006
- Cited by 4 (1 self)
Data preparation is a critical step in neural network modeling for complex data analysis, and it has a huge impact on the success of a wide variety of complex data analysis tasks, such as data mining and knowledge discovery. Although data preparation in neural network data analysis is important, the existing literature on neural network data preparation is scattered, and there is no systematic study of data preparation for neural network data analysis. In this study, we first propose an integrated data preparation scheme as a systematic study for neural network data analysis. Within the integrated scheme, a survey of data preparation, focusing on problems with the data and corresponding processing techniques, is then provided. Meanwhile, intelligent data preparation solutions to some important issues and dilemmas of the integrated scheme are discussed in detail. Subsequently, a cost-benefit analysis framework for this integrated scheme is presented to analyze the effect of data preparation on complex data analysis. Finally, a typical example of complex data analysis from the financial domain is provided to show the application of data preparation techniques and to demonstrate the impact of data preparation on complex data analysis. Index Terms: Data preparation, neural networks, complex data analysis, cost-benefit analysis.
An Inclusion-Exclusion Result For Boolean Polynomials And Its Applications In Data Mining
- In Proceedings of the Discrete Mathematics in Data Mining Workshop, SIAM Datamining Conference
, 2002
- Cited by 2 (1 self)
We characterize measures on free Boolean algebras and we examine the relationships that exist between measures and binary tables in relational databases. It is shown that these measures are completely defined by their values on positive conjunctions, and a formula that obtains this value is given by using the method of indicators. We also obtain Bonferroni-type inequalities that allow approximate evaluations of these measures. Finally, we present a measure extending the notion of support that is well suited for tables with missing values.
Data Farming Methods for Temporal Data Mining
- Cited by 1 (0 self)
Many temporal data mining projects use existing data collected for various purposes, ranging from routinely collected data to process improvement projects and data required for regulatory purposes. In some cases, the set of considered features might be large (a wide data set) and sufficient for extraction of knowledge. In other cases, the data set might be narrow and insufficient to extract meaningful knowledge, or the data may not even exist. Mining wide data sets has received wide attention in the literature, and many models and algorithms for feature selection have been developed for them. Determining the features for which data should be collected when an existing data set is absent or only partially available (a narrow set) has not been sufficiently addressed in the literature, especially for temporal data. Yet this issue is of paramount importance as the interest in data mining is growing. The process and methods used to determine the most appropriate features for data collection and subsequent data analysis are referred to as data farming. This paper provides a foundation for the development of data farming science for temporal analysis of data.
An Interactive and Understandable Method to Treat Missing Values: Application to a Medical Data Set
Many analysis tasks have to deal with missing values, and some of them have developed specific, internal treatments to guess them. In this paper we present the use of a new method, called MVC (Missing Values Completion), for this problem: MVC is based on data preprocessing that gives prominence to understandable associations and gives the user a central part. These qualities make it suitable for the data cleaning step of the Knowledge Discovery in Databases process. The efficiency of this method rests on the Robust Association Rules algorithm that we have proposed. This algorithm extends the concept of association rules to databases with multiple missing values. We give some examples of the use of MVC on a real-world data set (in medicine), highlighting typical use of this method. Keywords: Association rules, Missing Values, Preprocessing, Elastic Stocking.
SPoID: Do Not Throw Meaningful Incomplete Sequences Away!
Industrial databases often contain a large amount of unfilled information. During the knowledge discovery process, a processing step is often necessary to handle these incomplete data, either by deleting or assessing them. When the data mining task consists of mining for frequent sequences, incomplete data are, most of the time, deleted, which leads to a significant loss of information. Extracted knowledge then becomes less representative of the database. We therefore propose a method that uses the partial information contained in incomplete records, only temporarily ignoring the missing part of the record. Experiments run on various synthetic datasets show the validity of our proposal, both in terms of quality and in terms of robustness to the rate of missing values.

