Results 1–10 of 11
Levelwise Search and Borders of Theories in Knowledge Discovery
, 1997
Abstract

Cited by 211 (13 self)
One of the basic problems in knowledge discovery in databases (KDD) is the following: given a data set r, a class L of sentences for defining subgroups of r, and a selection predicate, find all sentences of L deemed interesting by the selection predicate. We analyze the simple levelwise algorithm for finding all such descriptions. We give bounds for the number of database accesses that the algorithm makes. For this, we introduce the concept of the border of a theory, a notion that turns out to be surprisingly powerful in analyzing the algorithm. We also consider the verification problem of a KDD process: given r and a set of sentences S ⊆ L, determine whether S is exactly the set of interesting statements about r. We show strong connections between the verification problem and the hypergraph transversal problem. The verification problem arises in a natural way when using sampling to speed up the pattern discovery step in KDD.
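The levelwise strategy this abstract analyzes can be sketched for its most familiar instantiation, frequent itemset mining, where the interestingness predicate is a minimum support threshold. The function below is an illustrative sketch under those assumptions, not the paper's notation; it also computes the positive border (the maximal interesting sets) the paper introduces.

```python
from itertools import combinations

def levelwise_search(transactions, items, min_support):
    """Levelwise search: evaluate candidates one level (size) at a time,
    keeping a candidate only if all of its generalizations (subsets)
    were already found interesting."""
    def support(itemset):
        # one pass over the data per candidate, counting containing rows
        return sum(1 for t in transactions if itemset <= t)

    interesting = []
    level = [frozenset([i]) for i in items]          # level 1: singletons
    while level:
        frequent = [c for c in level if support(c) >= min_support]
        interesting.extend(frequent)
        freq = set(frequent)
        # next level: unions of size k+1 whose every k-subset is frequent
        # (the Apriori-style pruning step)
        k = len(frequent[0]) + 1 if frequent else 0
        level = sorted({a | b for a in frequent for b in frequent
                        if len(a | b) == k
                        and all(frozenset(s) in freq
                                for s in combinations(a | b, k - 1))},
                       key=sorted)
    # positive border: the maximal interesting sets
    border = [s for s in interesting if not any(s < t for t in interesting)]
    return interesting, border
```

On a toy database of three transactions with support threshold 2, the search stops after level 2 because the only level-3 candidate has an infrequent subset.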
On an algorithm for finding all interesting sentences (Extended Abstract)
 In Cybernetics and Systems, Volume II, The Thirteenth European Meeting on Cybernetics and Systems Research
, 1996
Abstract

Cited by 33 (9 self)
Knowledge discovery in databases (KDD), also called data mining, has recently received wide attention from practitioners and researchers. One of the basic problems in KDD is the following: given a data set r, a class L of sentences defining subgroups or properties of r, and an interestingness predicate, find all sentences of L deemed interesting by the interestingness predicate. In this paper we analyze a simple and well-known levelwise algorithm for finding all such descriptions. We give bounds for the number of database accesses that the algorithm makes. We also consider the verification problem of a KDD process: given r and a set of sentences T ⊆ L, determine whether T is exactly the set of interesting statements about r. We show strong connections between the verification problem and the hypergraph transversal problem. The verification problem arises in a natural way when using sampling to speed up the pattern discovery step in KDD.
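The hypergraph transversal problem that the verification result connects to can be stated concretely: find the minimal vertex sets that intersect every hyperedge. The brute-force routine below is an illustrative sketch of that subproblem, not the paper's algorithm; it enumerates subsets by increasing size so that minimality falls out of the enumeration order.

```python
from itertools import combinations

def minimal_transversals(edges, vertices):
    """Enumerate minimal transversals by increasing size: a subset is kept
    if it intersects every hyperedge and contains no smaller kept set."""
    edges = [frozenset(e) for e in edges]
    found = []
    for k in range(len(vertices) + 1):
        for cand in combinations(sorted(vertices), k):
            s = frozenset(cand)
            if all(s & e for e in edges) and not any(t < s for t in found):
                found.append(s)
    return found
```

For the two-edge hypergraph {a, b}, {b, c}, the minimal transversals are {b} and {a, c}; every superset of {b} is rejected as non-minimal.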
First Order Theory Refinement
, 1996
Abstract

Cited by 22 (0 self)
This paper summarizes the current state of the art in the topics of first-order theory revision and theory restructuring. The various tasks involved in first-order theory refinement (revision vs. restructuring) are defined and then discussed in turn. For theory revision, the issue of minimality is discussed in detail, and a general outline algorithm is given which identifies the major places of difference between different algorithms. The paper then describes the various options that have been explored by different researchers. For theory restructuring, two approaches developed in the ILP project are discussed, one devoted to understandability, and the other devoted to efficiency. First-order methods of inductive learning have witnessed considerable interest in recent years, and the field of Inductive Logic Programming (ILP) [26] has developed rapidly. While the original ILP learning task deals with learning first-order clausal theories from scratch given examples and background kno...
Information-theoretic measures for knowledge discovery and data mining, in: Entropy Measures, Maximum Entropy and Emerging Applications, Karmeshu (Ed.)
 in Entropy Measures, Maximum Entropy and Emerging Applications
, 2003
Abstract

Cited by 20 (6 self)
A database may be considered as a statistical population, and an attribute as a statistical variable taking values from its domain. One can carry out statistical and information-theoretic analysis on a database. Based on the attribute values, a database can be partitioned into smaller populations. An attribute is deemed important if it partitions the database such that previously unknown regularities and patterns are observable. Many information-theoretic measures have been proposed and applied to quantify the importance of attributes and relationships between attributes in various fields. In the context of knowledge discovery and data mining (KDD), we present a critical review and analysis of information-theoretic measures of attribute importance and attribute association, with emphasis on their interpretations and connections.
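Two of the standard measures surveyed in this line of work, attribute entropy and mutual information between attributes, are easy to state on a table's columns. The following is a minimal sketch assuming categorical attribute values, treating the table as the population as the abstract describes.

```python
from collections import Counter
from math import log2

def entropy(column):
    """Shannon entropy of one attribute over the table-as-population."""
    n = len(column)
    return -sum((c / n) * log2(c / n) for c in Counter(column).values())

def mutual_information(col_x, col_y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y): the strength of the association
    between two attributes; zero when their partitions are independent."""
    return entropy(col_x) + entropy(col_y) - entropy(list(zip(col_x, col_y)))
```

An attribute whose partition of the database carries high mutual information with a target attribute is, in this sense, an important one.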
A Multistrategy Approach to Relational Knowledge Discovery in Databases
 Machine Learning Journal
, 1996
Abstract

Cited by 13 (9 self)
When learning from very large databases, the reduction of complexity is extremely important. Two extremes of making knowledge discovery in databases (KDD) feasible have been put forward. One extreme is to choose a very simple hypothesis language, thereby being capable of very fast learning on real-world databases. The opposite extreme is to select a small data set, thereby being able to learn very expressive (first-order logic) hypotheses. A multistrategy approach allows one to include most of these advantages and exclude most of the disadvantages. Simpler learning algorithms detect hierarchies which are used to structure the hypothesis space for a more complex learning algorithm. The better structured the hypothesis space is, the more effectively learning can prune away uninteresting or unpromising hypotheses, and the faster it becomes. We have combined inductive logic programming (ILP) directly with a relational database management system. The ILP algorithm is controlled in a model-driven way by t...
Discovering Robust Knowledge from Databases that Change
 DATA MINING AND KNOWLEDGE DISCOVERY
, 1998
Abstract

Cited by 7 (1 self)
Many applications of knowledge discovery and data mining, such as rule discovery for semantic query optimization, database integration and decision support, require the knowledge to be consistent with data. However, databases usually change over time and make machine-discovered knowledge inconsistent. Useful knowledge should be robust against database changes so that it is unlikely to become inconsistent after database changes. This paper defines this notion of robustness in the context of relational databases that contain multiple relations and describes how robustness of first-order Horn-clause rules can be estimated and applied in knowledge discovery. Our experiments show that the estimation approach can accurately predict the robustness of a rule.
Deciding Distinctness of Query Results by Discovered Constraints
 Proc. of the 2nd International Conf. on the Practical Application of Constraint Technology
Abstract

Cited by 5 (0 self)
The aim of query optimization is to produce an equivalent query that is less expensive to process than the original query. Semantic query optimization involves the use of semantic knowledge during the optimization process; its success strongly depends on the availability of this knowledge. We present a rule-based approach to semantic query optimization which overcomes the limitation of predefined knowledge by discovering constraints. We show that unnecessary distinct keywords and group by parts of an SQL query can be detected and removed with the use of discovered constraints.

1 Introduction

Semantic query optimization (SQO) provides the same kind of transparency with respect to semantic knowledge as relational optimizers do with respect to physical representation. Since semantically equivalent queries can differ significantly in their evaluation costs, our major goal is to free the user from finding the most effective query. Finding semantically equivalent queries requires s...
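As an illustration of the kind of discovered constraint this abstract refers to: a uniqueness constraint that holds in the data licenses dropping a DISTINCT from a query projecting those attributes. The function below is a hypothetical sketch of that check, not the paper's system; rows are assumed to be dicts keyed by attribute name.

```python
def holds_unique(rows, attrs):
    """Test whether a discovered uniqueness constraint holds: no two rows
    agree on all of `attrs`.  If it holds for a query's projection list,
    SELECT DISTINCT over those attributes is redundant and can be removed."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False        # duplicate projection value: DISTINCT needed
        seen.add(key)
    return True
```

For example, if `holds_unique(emp_rows, ('emp_id',))` is true, `SELECT DISTINCT emp_id FROM emp` can be rewritten without the DISTINCT. Since such a constraint is discovered from the current database state, later updates can invalidate it, which is exactly the robustness concern raised in the previous entry.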
On Schema Discovery
 IEEE Data Engineering Bulletin
, 2003
Abstract

Cited by 5 (2 self)
Structured data is distinguished from unstructured data by the presence of a schema describing the logical structure and semantics of the data. The schema is the means through which we understand and query the underlying data, and it permits more sophisticated structured queries that are not possible over schemaless data. Most systems assume that the schema is predefined and accurately reflects the data. This assumption often fails in networked databases, which may contain data originating from many sources, and in legacy databases, where the semantics of data have evolved over time. As a result, querying and tasks that depend on structured queries (including data integration and schema mapping) may not be effective. In this paper, we consider the problem of discovering schemas from data. We focus on discovering properties of data that can be exploited in querying and transforming data. Finally, we very briefly consider the suitability of mining approaches to the task of schema discovery.
Information Tables with Neighborhood Semantics
 Tools, and Technology II, B.V. Dasarathy (Ed.), The International Society for Optical Engineering
, 2000
Abstract

Cited by 5 (4 self)
Information tables provide a convenient and useful tool for representing a set of objects using a group of attributes. This notion is enriched by introducing neighborhood systems on attribute values. The neighborhood systems represent the semantic relationships between, and knowledge about, attribute values. With added semantics, neighborhood-based information tables may provide a more general framework for knowledge discovery, data mining, and information retrieval.
Mining High Order Decision Rules
 Tsumoto (Eds.), Rough Set Theory and Granular Computing
, 2003
Abstract

Cited by 3 (2 self)
We introduce the notion of high order decision rules. While a standard decision rule expresses connections between attribute values of the same object, a high order decision rule expresses connections of different objects in terms of their attribute values. An example of high order decision rules may state that "if an object x is related to another object y with respect to an attribute a, then x is related to y with respect to another attribute b." The problem of mining high order decision rules is formulated as a process of finding connections of objects as expressed in terms of their attribute values. In order to mine high order decision rules, we use relationships between values of attributes. Various types of relationships can be used, such as ordering relations, closeness relations, similarity relations, and neighborhood systems on attribute values. The introduction of semantic information on attribute values leads to information tables with added semantics. Depending on the decision rules to be mined, one can transform the original table into another information table, in which each new entity is a pair of objects. Any standard data mining algorithm can then be used. As an example to illustrate the basic idea, we discuss in detail the mining of ordering rules.
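The table transformation sketched in this abstract, turning a table over objects into a table over object pairs, can be made concrete for ordering relations. The function and data layout below are illustrative assumptions (tables as lists of dicts, the relation defaulting to <), not the authors' formulation.

```python
from itertools import permutations
from operator import lt

def pair_table(table, attributes, relation=lt):
    """Transform an information table (list of dicts) into a table whose
    entities are ordered pairs of objects; each entry records whether the
    pair satisfies the relation on that attribute's values."""
    return [{a: relation(table[x][a], table[y][a]) for a in attributes}
            for x, y in permutations(range(len(table)), 2)]
```

On the resulting pair table, a standard rule miner can then find high order rules such as "if x precedes y on price, then x precedes y on size", exactly the kind of ordering rule the abstract discusses.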