Results 11 - 20 of 231
Search-Intensive Concept Induction
1995
Cited by 71 (3 self)
This paper describes REGAL, a distributed genetic algorithm-based system designed for learning First Order Logic concept descriptions from examples. The system is a hybrid between the Pittsburgh and the Michigan approaches, as the population constitutes a redundant set of partial concept descriptions, each evolved separately. In order to increase effectiveness, REGAL is specifically tailored to the concept learning task; hence, REGAL is task-dependent but domain-independent. The system proved to be particularly robust with respect to parameter setting across a variety of different application domains. REGAL is based on a selection operator, called the Universal Suffrage operator, which provably allows the population to converge asymptotically, on average, to an equilibrium state in which several species coexist. The system is presented in both a serial and a parallel version, and a new distributed computational model is proposed and discussed. The system has been test...
Pruning and Grouping Discovered Association Rules
1995
Cited by 70 (4 self)
Association rules are statements of the form "for 90% of the rows of the relation, if the row has value 1 in the columns in set X, then it has 1 also in the columns in set Y". Efficient methods exist for discovering association rules from large collections of data. The number of discovered rules can, however, be so large that the rules cannot be presented to the user. We show how the set of rules can be pruned by forming rule covers. A rule cover is a subset of the original set of rules such that for each row in the relation there is an applicable rule in the cover if and only if there is an applicable rule in the original set. We also discuss grouping of association rules by clustering, and present some experimental results of both pruning and grouping. Keywords: data mining, association rules, covers, clustering. 1 Introduction Association rules are an interesting class of database regularities, introduced by Agrawal, Imielinski, and Swami [AIS93]. An association rule is an expres...
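The rule-cover definition in the abstract above can be illustrated with a small sketch. This is not the authors' algorithm: it is a generic greedy set-cover heuristic, and it assumes that a rule X => Y is "applicable" to a row when the row has value 1 in every column of X. The names `applies` and `greedy_rule_cover` are hypothetical.

```python
from typing import FrozenSet, List, Tuple

Row = FrozenSet[str]                            # columns whose value is 1
Rule = Tuple[FrozenSet[str], FrozenSet[str]]    # (X, Y) for the rule X => Y


def applies(rule: Rule, row: Row) -> bool:
    """Assumption: a rule is applicable when the row has 1 in all columns of X."""
    antecedent, _ = rule
    return antecedent <= row


def greedy_rule_cover(rules: List[Rule], rows: List[Row]) -> List[Rule]:
    """Pick a subset of rules that applies to every row the full set applies to."""
    # For each rule, the indices of the rows it applies to.
    matched = [{i for i, row in enumerate(rows) if applies(rule, row)}
               for rule in rules]
    # Rows that at least one rule in the original set applies to.
    uncovered = set().union(*matched)
    cover: List[Rule] = []
    while uncovered:
        # Greedy step: take the rule that applies to the most uncovered rows.
        best = max(range(len(rules)), key=lambda k: len(matched[k] & uncovered))
        cover.append(rules[best])
        uncovered -= matched[best]
    return cover


rows = [frozenset("AB"), frozenset("A"), frozenset("C")]
rules = [
    (frozenset("A"), frozenset("B")),
    (frozenset("AB"), frozenset("C")),
    (frozenset("C"), frozenset("A")),
]
cover = greedy_rule_cover(rules, rows)
```

In this toy relation the second rule is redundant: every row it applies to is already handled by the first rule, so the greedy cover keeps only the first and third rules. A greedy choice does not guarantee a minimum-size cover.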
Feature construction with Inductive Logic Programming: a study of quantitative predictions of chemical activity aided by structural attributes
Data Mining and Knowledge Discovery, 1996
Cited by 62 (9 self)
Recently, computer programs developed within the field of Inductive Logic Programming have received some attention for their ability to construct restricted first-order logic solutions using problem-specific background knowledge. Prominent applications of such programs have been concerned with determining "structure-activity" relationships in the areas of molecular biology and chemistry. Typically the task here is to predict the "activity" of a compound, like toxicity, from its chemical structure.
Inductive and Bayesian learning in medical diagnosis
Applied Artificial Intelligence, 1993
Cited by 56 (9 self)
Although successful in medical diagnostic problems, inductive learning systems were not widely accepted in medical practice. In this paper two different approaches to machine learning in medical applications are compared: the system for inductive learning of decision trees Assistant, and the naive Bayesian classifier. Both methodologies were tested in four medical diagnostic problems: localization of primary tumor, prognostics of recurrence of breast cancer, diagnosis of thyroid diseases, and rheumatology. The accuracy of automatically acquired diagnostic knowledge from stored data records is compared and the interpretation of the knowledge and the explanation ability of the classification process of each system is discussed. Surprisingly, the naive Bayesian classifier is superior to Assistant in classification accuracy and explanation ability, while the interpretation of the acquired knowledge seems to be equally valuable. In addition, two extensions to the naive Bayesian classifier are briefly described: dealing with continuous attributes, and discovering the dependencies among attributes.
Rule Induction and Instance-Based Learning: A Unified Approach
1995
Cited by 52 (5 self)
This paper presents a new approach to inductive learning that combines aspects of instance-based learning and rule induction in a single simple algorithm. The RISE system searches for rules in a specific-to-general fashion, starting with one rule per training example, and avoids some of the difficulties of separate-and-conquer approaches by evaluating each proposed induction step globally, i.e., through an efficient procedure that is equivalent to checking the accuracy of the rule set as a whole on every training example. Classification is performed using a best-match strategy, and reduces to nearest-neighbor if all generalizations of instances were rejected. An extensive empirical study shows that RISE consistently achieves higher accuracies than state-of-the-art representatives of its "parent" paradigms (PEBLS and CN2), and also outperforms a decision-tree learner (C4.5) in 13 out of 15 test domains (in 10 with 95% confidence). 1 Introduction Several well-developed approaches to indu...
Learning from Examples: Generation and Evaluation of Decision Trees for Software Resource Analysis
IEEE Trans. Software Eng., 1988
Cited by 51 (5 self)
Solutions to the problem of learning from examples will have far-reaching benefits, and therefore, the problem is one of the most widely studied in the field of machine learning. The purpose of this study is to investigate a general solution method for the problem, the automatic generation of decision (or classification) trees. The approach is to provide insights through in-depth empirical characterization and evaluation of decision trees for one problem domain, software resource data analysis. The purpose of the decision trees is to identify classes of objects (software modules) that had high development effort or faults, where "high" was defined to be in the uppermost quartile relative to past data. Sixteen software systems ranging from 3000 to 112,000 source lines have been selected for analysis from a NASA production environment. The collection and analysis of 74 attributes (or metrics), for over 4700 objects, capture a multitude of information about the objects: development effort...
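The class definition used in the abstract above, where "high" means falling in the uppermost quartile relative to past data, can be sketched as follows. This is only an illustration: the function name `label_high`, the sample numbers, and the choice of the "inclusive" quantile convention are assumptions, not details taken from the paper.

```python
import statistics
from typing import List


def label_high(values: List[float], past_values: List[float]) -> List[bool]:
    """Mark a value as 'high' when it exceeds the third quartile of past data."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    # The 'inclusive' method is an assumed convention; others shift the threshold.
    _, _, upper_quartile = statistics.quantiles(past_values, n=4, method="inclusive")
    return [value > upper_quartile for value in values]


# Hypothetical effort figures for previously completed modules.
past_effort = [1, 2, 3, 4, 5, 6, 7, 8, 9]
labels = label_high([7, 8, 10], past_effort)
```

Labels produced this way could then serve as the target class for a decision-tree learner, with the measured attributes of each module as inputs.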
User-system cooperation in document annotation based on information extraction
In Proceedings of the 13th International Conference on Knowledge Engineering and Knowledge Management, EKAW02, 2002
Cited by 49 (13 self)
The process of document annotation for the Semantic Web is complex and time consuming, as it requires a great deal of manual annotation. Information extraction from texts (IE) is a technology used by some very recent systems for reducing the burden of annotation. The integration of IE systems in annotation tools is quite a new development, and there is still a need to think through the impact of the IE system on the whole annotation process. In this paper we first discuss a number of requirements for the use of IE as support for annotation. Then we present and discuss a model of interaction that addresses such issues, and Melita, an annotation framework that implements a methodology for active annotation for the Semantic Web based on IE. Finally we present an experiment that quantifies the gain in using IE as support to human annotators.
A new methodology of extraction, optimization and application of crisp and fuzzy logical rules
IEEE Transactions on Neural Networks, 2001
Cited by 46 (23 self)
A new methodology of extraction, optimization, and application of sets of logical rules is described. Neural networks are used for initial rule extraction, local or global minimization procedures for optimization, and Gaussian uncertainties of measurements are assumed during application of logical rules. Algorithms for extraction of logical rules from data with real-valued features require determination of linguistic variables or membership functions. Context-dependent membership functions for crisp and fuzzy linguistic variables are introduced and methods of their determination described. Several neural and machine learning methods of logical rule extraction generating initial rules are described, based on constrained multilayer perceptrons, networks with localized transfer functions, or separability criteria for determination of linguistic variables. A tradeoff between accuracy and simplicity is explored at the rule extraction stage, and between rejection and error level at the optimization stage. Gaussian uncertainties of measurements are assumed during application of crisp logical rules, leading to "soft trapezoidal" membership functions and allowing the linguistic variables to be optimized using gradient procedures. Numerous applications of this methodology to benchmark and real-life problems are reported, and very simple crisp logical rules for many datasets are provided.
Knowledge discovery and interestingness measures: A survey
1999
Cited by 44 (1 self)
Knowledge discovery in databases, also known as data mining, is the efficient discovery of previously unknown, valid, novel, potentially useful, and understandable patterns in large databases. It encompasses many different techniques and algorithms which differ in the kinds of data that can be analyzed and the form of knowledge representation used to convey the discovered knowledge. An important problem in the area of data mining is the development of effective measures of interestingness for ranking the discovered knowledge. In this report, we provide a general overview of the more successful and widely known data mining techniques and algorithms, and survey seventeen interestingness measures from the literature that have been successfully employed in data mining applications.
A Comparison of Dynamic and non-Dynamic Rough Set Methods for Extracting Laws from Decision Tables
1998
Cited by 44 (3 self)
We report results of experiments on several data sets, in particular: Monk's problems data (see [58]), medical data (lymphography, breast cancer, primary tumor; see [30]) and StatLog's data (see [32]). We compare standard methods for extracting laws from decision tables (see [43], [52]), based on rough sets (see [42]) and boolean reasoning (see [8]), with the method based on dynamic reducts and dynamic rules (see [3], [4], [5], [6]). We also compare the results of computer experiments on those data sets obtained by applying our system based on rough set methods with the results on the same data sets obtained with the help of several data analysis systems known from the literature.

