Using Rough Sets with Heuristics for Feature Selection
- Journal of Intelligent Information Systems, 2001
- Cited by 19 (1 self)
Practical machine learning algorithms are known to degrade in performance (prediction accuracy) when faced with many features (the term attribute is sometimes used instead of feature) that are not necessary for rule discovery. To cope with this problem, many methods for selecting a subset of features have been proposed. Two typical ones are the filter approach, which selects a feature subset in a preprocessing step, and the wrapper approach, which selects an optimal feature subset from the space of possible subsets using the induction algorithm itself as part of the evaluation function. The filter approach is faster, but it selects features blindly, in that the performance of induction is not considered. The wrapper approach, on the other hand, can obtain optimal feature subsets, but it is not easy to use because of its time and space complexity. In this paper, we propose an algorithm that uses rough set theory with greedy heuristics for feature selection. Feature selection proceeds as in the filter approach, but the evaluation criterion is related to the performance of induction. That is, we select the features that do not damage the performance of induction.
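To make the idea of rough-set-based greedy selection concrete, here is a minimal sketch. It is not the algorithm from the paper, whose details the abstract does not give. It assumes a common textbook formulation: features are added one at a time, each time choosing the one that most increases the rough-set dependency degree (the fraction of rows in the positive region). The function names, the dictionary row format, and the sample data are all hypothetical.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Group row indices into blocks of rows that agree on every attribute in attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())

def dependency(rows, attrs, decision):
    """Fraction of rows lying in blocks whose rows all share one decision value."""
    if not rows:
        return 0.0
    consistent = 0
    for block in partition(rows, attrs):
        if len({rows[i][decision] for i in block}) == 1:
            consistent += len(block)
    return consistent / len(rows)

def greedy_select(rows, candidates, decision):
    """Add features greedily until the subset matches the dependency of all candidates."""
    selected = []
    target = dependency(rows, candidates, decision)
    current = dependency(rows, selected, decision)
    while current < target:
        remaining = [a for a in candidates if a not in selected]
        best = max(remaining,
                   key=lambda a: dependency(rows, selected + [a], decision))
        selected.append(best)
        current = dependency(rows, selected, decision)
    return selected

# Hypothetical toy data: feature "b" alone determines the decision "d".
rows = [
    {"a": 0, "b": 0, "c": 1, "d": "no"},
    {"a": 0, "b": 1, "c": 1, "d": "yes"},
    {"a": 1, "b": 0, "c": 0, "d": "no"},
    {"a": 1, "b": 1, "c": 0, "d": "yes"},
]
chosen = greedy_select(rows, ["a", "b", "c"], "d")
```

Like a filter, the sketch never runs an induction algorithm, but its stopping criterion is tied to how well the selected features still discriminate the decision classes.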
Granular Computing Using Information Tables
- In: Data Mining, Rough Sets and Granular Computing, 2002
- Cited by 12 (8 self)
A simple and more concrete granular computing model may be developed using the notion of information tables. In this framework, each object in a finite nonempty universe is described by a finite set of attributes. Based on attribute values of objects, one may decompose the universe into parts called granules. Objects in each granule share the same or similar description in terms of their attribute values. Studies along this line have been carried out in the theories of rough sets and databases. Within the proposed model, this paper reviews the pertinent existing results and presents their generalizations and applications.
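The decomposition of a universe into granules can be illustrated with a small sketch. It covers only the simplest case, where objects in a granule share exactly the same values on the chosen attributes (the "similar description" case is not handled). The table layout, the function name `granulate`, and the sample objects are assumptions made for illustration, not taken from the paper.

```python
from collections import defaultdict

def granulate(table, attributes):
    """Map each combination of attribute values to the set of objects having it."""
    granules = defaultdict(set)
    for obj, description in table.items():
        key = tuple(description[a] for a in attributes)
        granules[key].add(obj)
    return dict(granules)

# Hypothetical information table: object name -> attribute values.
table = {
    "o1": {"colour": "red", "size": "small"},
    "o2": {"colour": "red", "size": "large"},
    "o3": {"colour": "blue", "size": "small"},
    "o4": {"colour": "red", "size": "small"},
}

by_colour = granulate(table, ["colour"])
by_both = granulate(table, ["colour", "size"])
```

Using more attributes can only split granules further, so `by_both` is a refinement of `by_colour`.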
Heuristic Measures of Interestingness
- Proceedings of the Third European Conference on the Principles of Data Mining and Knowledge Discovery (PKDD'99), 1999
- Cited by 10 (3 self)
The tuples in a generalized relation (i.e., a summary generated from a database) are unique, and therefore, can be considered to be a population with a structure that can be described by some probability distribution. In this paper, we present and empirically compare sixteen heuristic measures that evaluate the structure of a summary to assign a single real-valued index that represents its interestingness relative to other summaries generated from the same database. The heuristics are based upon well-known measures of diversity, dispersion, dominance, and inequality used in several areas of the physical, social, ecological, management, information, and computer sciences. Their use for ranking summaries generated from databases is a new application area. All sixteen heuristics rank less complex summaries (i.e., those with few tuples and/or few non-ANY attributes) as most interesting. We demonstrate that for sample data sets, the order in which some of the measures rank summaries is highly correlated.
Information-theoretic measures for knowledge discovery and data mining
- In: Entropy Measures, Maximum Entropy and Emerging Applications, Karmeshu (Ed.), 2003
- Cited by 10 (5 self)
A database may be considered as a statistical population, and an attribute as a statistical variable taking values from its domain. One can carry out statistical and information-theoretic analysis on a database. Based on the attribute values, a database can be partitioned into smaller populations. An attribute is deemed important if it partitions the database such that previously unknown regularities and patterns are observable. Many information-theoretic measures have been proposed and applied to quantify the importance of attributes and relationships between attributes in various fields. In the context of knowledge discovery and data mining (KDD), we present a critical review and analysis of information-theoretic measures of attribute importance and attribute association, with emphasis on their interpretations and connections.
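As an illustration of the kind of measure reviewed here, the sketch below computes Shannon entropy for one attribute and mutual information between two attributes, treating each column as a sample of a statistical variable. These are standard textbook definitions. Which measures the chapter actually covers is not stated in the abstract, and the function names and sample columns are hypothetical.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def conditional_entropy(xs, ys):
    """H(Y | X): entropy of ys within each group of equal xs, weighted by group size."""
    n = len(xs)
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(x, []).append(y)
    return sum(len(g) / n * entropy(g) for g in groups.values())

def mutual_information(xs, ys):
    """I(X; Y) = H(Y) - H(Y | X), a measure of association between two attributes."""
    return entropy(ys) - conditional_entropy(xs, ys)

# Hypothetical columns: ys is fully determined by xs, zs is independent of xs.
xs = ["a", "a", "b", "b"]
ys = ["p", "p", "q", "q"]
zs = ["p", "q", "p", "q"]

mi_dependent = mutual_information(xs, ys)
mi_independent = mutual_information(xs, zs)
```

Partitioning by `xs` removes all uncertainty about `ys` but none about `zs`, which is the sense in which one attribute can be more informative than another.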
Ranking the Interestingness of Summaries from Data Mining Systems
- In Proceedings of the 12th Annual Florida Artificial Intelligence Research Symposium (FLAIRS'99), 1999
- Cited by 4 (3 self)
We study data mining where the task is description by summarization, the representation language is generalized relations, the evaluation criteria are based on heuristic measures of interestingness, and the method for searching is the Multi-Attribute Generalization algorithm for domain generalization graphs. We present and empirically compare four heuristics for ranking the interestingness of generalized relations (or summaries). The measures are based on common measures of the diversity of a population, statistical variance, the Simpson index, and the Shannon index. All four measures rank less complex summaries (i.e., those with few tuples and/or non-ANY attributes) as most interesting. Highly ranked summaries provide a reasonable starting point for further analysis of discovered knowledge.
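The measures named above can be sketched for a summary represented by its tuple counts. The abstract does not give the paper's formulas, so the code uses common textbook forms as an assumption: variance as the population variance of the tuple proportions, the Simpson index as the sum of squared proportions, and the Shannon index as the entropy in bits. The function names and counts are hypothetical, and the sketch does not reproduce how the paper turns these values into a ranking.

```python
import math
import statistics

def proportions(counts):
    """Convert per-tuple counts of a summary into proportions that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]

def variance_index(counts):
    """Population variance of the proportions (assumed normalisation)."""
    return statistics.pvariance(proportions(counts))

def simpson_index(counts):
    """Sum of squared proportions; larger when a few tuples dominate."""
    return sum(p * p for p in proportions(counts))

def shannon_index(counts):
    """Entropy of the proportions in bits; larger when tuples are evenly spread."""
    return -sum(p * math.log2(p) for p in proportions(counts) if p > 0)

# Hypothetical summaries, each given as the counts attached to its tuples.
even = [5, 5]
skewed = [9, 1]

scores = {
    name: (variance_index(c), simpson_index(c), shannon_index(c))
    for name, c in {"even": even, "skewed": skewed}.items()
}
```

Each function assigns a single real value to a summary, so sorting summaries by any one of them yields a ranking that can then be compared across measures.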

