Discovery and maintenance of functional dependencies by independencies (1995)

by A Bell
Venue:Proceedings of KDD-95

Citations: results 1 - 10 of 11

Levelwise Search and Borders of Theories in Knowledge Discovery

by Heikki Mannila, Hannu Toivonen, 1997
Cited by 177 (12 self)
One of the basic problems in knowledge discovery in databases (KDD) is the following: given a data set r, a class L of sentences for defining subgroups of r, and a selection predicate, find all sentences of L deemed interesting by the selection predicate. We analyze the simple levelwise algorithm for finding all such descriptions. We give bounds for the number of database accesses that the algorithm makes. For this, we introduce the concept of the border of a theory, a notion that turns out to be surprisingly powerful in analyzing the algorithm. We also consider the verification problem of a KDD process: given r and a set of sentences S ⊆ L, determine whether S is exactly the set of interesting statements about r. We show strong connections between the verification problem and the hypergraph transversal problem. The verification problem arises in a natural way when using sampling to speed up the pattern discovery step in KDD.
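A minimal sketch of the levelwise algorithm for one concrete instance, frequent itemsets, assuming the selection predicate is "support is at least `min_support`". This predicate is monotone in the required sense: every subset of an interesting itemset is interesting. The code also records the negative border, i.e. the candidates that were evaluated and rejected. The names `levelwise` and `min_support` are illustrative, not taken from the paper.

```python
from itertools import combinations


def levelwise(rows, min_support):
    """Levelwise search for frequent itemsets (illustrative sketch).

    rows: list of sets of items; min_support: minimum number of supporting rows.
    Returns (theory, negative_border): all interesting itemsets, and the
    rejected candidates (minimal non-interesting sets whose subsets all qualify).
    """
    def support(itemset):
        # Counts rows containing the itemset (one scan of the data per call here).
        return sum(1 for r in rows if itemset <= r)

    items = sorted({i for r in rows for i in r})
    candidates = [frozenset([i]) for i in items]
    theory, negative_border = set(), set()
    size = 1
    while candidates:
        frequent = set()
        for c in candidates:
            if support(c) >= min_support:
                frequent.add(c)
            else:
                negative_border.add(c)
        theory |= frequent
        size += 1
        # Next level: sets of the new size whose (size-1)-subsets are all frequent.
        next_candidates = set()
        for a in frequent:
            for b in frequent:
                u = a | b
                if len(u) == size and all(
                    frozenset(s) in frequent for s in combinations(u, size - 1)
                ):
                    next_candidates.add(u)
        candidates = list(next_candidates)
    return theory, negative_border


if __name__ == "__main__":
    data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a"}]
    th, border = levelwise(data, 2)
    print(sorted("".join(sorted(s)) for s in th))
    print(sorted("".join(sorted(s)) for s in border))
```

A practical implementation would count all candidates of one level in a single pass over the data. Here each `support` call rescans `rows` separately for clarity.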

On an algorithm for finding all interesting sentences (Extended Abstract)

by Heikki Mannila, Hannu Toivonen - In Cybernetics and Systems, Volume II, The Thirteenth European Meeting on Cybernetics and Systems Research, 1996
Cited by 31 (9 self)
Knowledge discovery in databases (KDD), also called data mining, has recently received wide attention from practitioners and researchers. One of the basic problems in KDD is the following: given a data set r, a class L of sentences defining subgroups or properties of r, and an interestingness predicate, find all sentences of L deemed interesting by the interestingness predicate. In this paper we analyze a simple and well-known levelwise algorithm for finding all such descriptions. We give bounds for the number of database accesses that the algorithm makes. We also consider the verification problem of a KDD process: given r and a set of sentences T ⊆ L, determine whether T is exactly the set of interesting statements about r. We show strong connections between the verification problem and the hypergraph transversal problem. The verification problem arises in a natural way when using sampling to speed up the pattern discovery step in KDD.
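The abstract does not spell out the construction. As a sketch, assuming the special case where sentences are itemsets over a finite universe: the negative border of a subset-closed collection equals the minimal transversals of the hypergraph whose edges are the complements of its maximal sets. A candidate theory T can then be verified by evaluating the predicate only on its maximal sets and on that border. The function names are hypothetical, and the brute-force transversal routine is only suitable for toy sizes.

```python
from itertools import combinations


def minimal_transversals(universe, edges):
    """Minimal hitting sets of a hypergraph, by brute force over increasing size.

    Exponential in len(universe); only suitable for toy examples.
    """
    found = []
    for k in range(len(universe) + 1):
        for combo in combinations(sorted(universe), k):
            s = frozenset(combo)
            # s is minimal if it hits every edge and contains no smaller transversal.
            if all(s & e for e in edges) and not any(t <= s for t in found):
                found.append(s)
    return set(found)


def negative_border(universe, maximal_sets):
    # Hypergraph edges: complements of the maximal interesting sets.
    edges = [frozenset(universe) - m for m in maximal_sets]
    return minimal_transversals(universe, edges)


def verify_theory(universe, candidate, is_interesting):
    """True iff candidate is exactly the collection of interesting sets.

    Assumes candidate is closed under subsets and is_interesting is monotone
    (every subset of an interesting set is interesting).
    """
    maximal = {x for x in candidate if not any(x < y for y in candidate)}
    border = negative_border(universe, maximal)
    return all(is_interesting(m) for m in maximal) and not any(
        is_interesting(b) for b in border
    )
```

For example, if the maximal interesting sets over {a, b, c} are ab, ac and bc, the complements are {c}, {b} and {a}, whose only minimal transversal is abc, so verification evaluates the predicate on just four sets.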

First Order Theory Refinement

by Stefan Wrobel, 1996
Cited by 16 (0 self)
This paper summarizes the current state of the art in the topics of first-order theory revision and theory restructuring. The various tasks involved in first-order theory refinement (revision vs. restructuring) are defined and then discussed in turn. For theory revision, the issue of minimality is discussed in detail, and a general outline algorithm is given which identifies the major places of difference between different algorithms. The paper then describes the various options that have been explored by different researchers. For theory restructuring, two approaches developed in the ILP project are discussed, one devoted to understandability, and the other devoted to efficiency. First-order methods of inductive learning have witnessed considerable interest in recent years, and the field of Inductive Logic Programming (ILP) [26] has developed rapidly. While the original ILP learning task deals with learning first-order clausal theories from scratch given examples and background kno...

A Multistrategy Approach to Relational Knowledge Discovery in Databases

by Katharina Morik, Peter Brockhausen - Machine Learning Journal, 1996
Cited by 12 (8 self)
When learning from very large databases, the reduction of complexity is extremely important. Two extremes of making knowledge discovery in databases (KDD) feasible have been put forward. One extreme is to choose a very simple hypothesis language, thereby being capable of very fast learning on real-world databases. The opposite extreme is to select a small data set, thereby being able to learn very expressive (first-order logic) hypotheses. A multistrategy approach allows one to include most of these advantages and exclude most of the disadvantages. Simpler learning algorithms detect hierarchies which are used to structure the hypothesis space for a more complex learning algorithm. The better structured the hypothesis space is, the better learning can prune away uninteresting or losing hypotheses and the faster it becomes. We have combined inductive logic programming (ILP) directly with a relational database management system. The ILP algorithm is controlled in a model-driven way by t...

Information-theoretic measures for knowledge discovery and data mining

by Y. Y. Yao - in Entropy Measures, Maximum Entropy and Emerging Applications, Karmeshu (Ed.), 2003
Cited by 10 (5 self)
A database may be considered as a statistical population, and an attribute as a statistical variable taking values from its domain. One can carry out statistical and information-theoretic analysis on a database. Based on the attribute values, a database can be partitioned into smaller populations. An attribute is deemed important if it partitions the database such that previously unknown regularities and patterns are observable. Many information-theoretic measures have been proposed and applied to quantify the importance of attributes and relationships between attributes in various fields. In the context of knowledge discovery and data mining (KDD), we present a critical review and analysis of information-theoretic measures of attribute importance and attribute association, with emphasis on their interpretations and connections.
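As an illustration of the kind of measures such a review covers, the sketch below computes, from value frequencies alone, the Shannon entropy of an attribute, the conditional entropy of one attribute given another, and their mutual information. Treating each attribute as a column of values is an assumption of this sketch, and the function names are hypothetical; the review itself discusses many more measures and their interpretations.

```python
from collections import Counter
from math import log2


def entropy(column):
    """Shannon entropy (in bits) of the empirical distribution of an attribute."""
    n = len(column)
    return -sum((c / n) * log2(c / n) for c in Counter(column).values())


def conditional_entropy(target, given):
    """H(target | given) = H(given, target) - H(given)."""
    return entropy(list(zip(given, target))) - entropy(given)


def mutual_information(col_a, col_b):
    """I(a; b) = H(a) + H(b) - H(a, b); zero when the attributes are independent."""
    return entropy(col_a) + entropy(col_b) - entropy(list(zip(col_a, col_b)))


if __name__ == "__main__":
    a = ["x", "x", "y", "y"]
    b = ["p", "p", "q", "q"]  # fully determined by a
    c = ["p", "q", "p", "q"]  # independent of a in this toy table
    print(mutual_information(a, b), mutual_information(a, c))
```

Partitioning the table by `a` here removes all uncertainty about `b` (the conditional entropy is 0), which is the sense in which an attribute "partitions the database such that regularities are observable".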

Discovering Robust Knowledge from Databases that Change

by Chun-Nan Hsu, Craig A. Knoblock - Data Mining and Knowledge Discovery, 1998
Cited by 7 (1 self)
Many applications of knowledge discovery and data mining, such as rule discovery for semantic query optimization, database integration, and decision support, require the knowledge to be consistent with data. However, databases usually change over time and make machine-discovered knowledge inconsistent. Useful knowledge should be robust against database changes so that it is unlikely to become inconsistent after database changes. This paper defines this notion of robustness in the context of relational databases that contain multiple relations and describes how robustness of first-order Horn-clause rules can be estimated and applied in knowledge discovery. Our experiments show that the estimation approach can accurately predict the robustness of a rule.

Information Tables with Neighborhood Semantics

by Y. Y. Yao - Tools, and Technology II, B.V. Dasarathy (Ed.), The International Society for Optical Engineering, 2000
Cited by 5 (4 self)
Information tables provide a convenient and useful tool for representing a set of objects using a group of attributes. This notion is enriched by introducing neighborhood systems on attribute values. The neighborhood systems represent the semantic relationships between, and knowledge about, attribute values. With added semantics, neighborhood-based information tables may provide a more general framework for knowledge discovery, data mining, and information retrieval.
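To make "neighborhood systems on attribute values" concrete, here is a minimal sketch that assumes the simplest case: each numeric attribute value gets a single neighborhood, namely all values in the attribute's domain within a distance `delta`. The threshold-based neighborhood, the dictionary layout of the table, and the function names are hypothetical choices for illustration; the paper defines neighborhood systems more generally.

```python
def value_neighborhood(value, domain, delta):
    # Hypothetical single neighborhood: all domain values within distance delta.
    return {v for v in domain if abs(v - value) <= delta}


def neighbors_of(table, attr, obj, delta):
    """Objects whose value on attr lies in the neighborhood of obj's value.

    table: dict mapping object id -> dict of attribute -> numeric value.
    """
    domain = {row[attr] for row in table.values()}
    nbhd = value_neighborhood(table[obj][attr], domain, delta)
    return {o for o, row in table.items() if row[attr] in nbhd}


if __name__ == "__main__":
    people = {"o1": {"age": 20}, "o2": {"age": 22}, "o3": {"age": 40}}
    print(sorted(neighbors_of(people, "age", "o1", 5)))
```

With plain equality in place of the neighborhood, `neighbors_of` would return only objects sharing the exact value; the neighborhood relaxes this so that "close" values are treated as related.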

Deciding Distinctness of Query Results by Discovered Constraints

by Siegfried Bell - Proc. of the 2nd International Conf. on the Practical Application of Constraint Technology
Cited by 5 (0 self)
The aim of query optimization is to produce an equivalent query which is less expensive to process than the original query. Semantic query optimization involves the use of semantic knowledge during the optimization process. Its success strongly depends on the availability of this knowledge. We present a rule-based approach to semantic query optimization which overcomes the limitation of predefined knowledge by discovering constraints. We show that unnecessary DISTINCT keywords and GROUP BY clauses of an SQL query can be detected and removed with the use of discovered constraints. Introduction: Semantic query optimization (SQO) provides the same kind of transparency with respect to semantic knowledge as relational optimizers do with respect to physical representation. Since semantically equivalent queries can differ significantly in their evaluation costs, it is our major goal to free the user from finding the most effective query. Finding semantically equivalent queries requires s...
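The paper's rule-based method is not described in enough detail here to reproduce. As a swapped-in illustration of why discovered constraints can make DISTINCT redundant, the sketch below uses the standard attribute-closure check: for a single-table query over a relation with no duplicate rows, if the selected attributes functionally determine every attribute (i.e. they contain a key), no two result rows can coincide. The schema, the dependencies, and the function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under functional dependencies.

    fds: list of (lhs, rhs) pairs of attribute sets, meaning lhs -> rhs.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def distinct_is_redundant(select_attrs, all_attrs, fds):
    """Sufficient check for a single-table SELECT over a duplicate-free table:
    DISTINCT can be dropped when the selected attributes determine all attributes."""
    return closure(select_attrs, fds) >= set(all_attrs)


if __name__ == "__main__":
    schema = {"emp_id", "name", "dept", "dept_name"}
    deps = [({"emp_id"}, {"name", "dept"}), ({"dept"}, {"dept_name"})]
    print(distinct_is_redundant({"emp_id", "dept_name"}, schema, deps))
    print(distinct_is_redundant({"name", "dept"}, schema, deps))
```

The check is only sufficient: DISTINCT may also be unnecessary for other reasons (e.g. selection predicates fixing key values), which a richer constraint-based optimizer could exploit.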

On Schema Discovery

by Renee J. Miller, Periklis Andritsos - IEEE Data Engineering Bulletin, 2003
Cited by 4 (1 self)
Structured data is distinguished from unstructured data by the presence of a schema describing the logical structure and semantics of the data. The schema is the means through which we understand and query the underlying data. The schema permits the more sophisticated structured queries that are not possible over schema-less data. Most systems assume that the schema is predefined and is an accurate reflection of the data. This assumption is often not valid in networked databases that may contain data originating from many sources and may not be valid within legacy databases where the semantics of data have evolved over time. As a result, querying and tasks that depend on structured queries (including data integration and schema mapping) may not be effective. In this paper, we consider the problem of discovering schemas from data. We focus on discovering properties of data that can be exploited in querying and transforming data. Finally, we very briefly consider the suitability of mining approaches to the task of schema discovery.
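One property of data that schema discovery commonly targets is functional dependencies, which is also the subject of the cited paper by Bell. The sketch below is a brute-force illustration, not the method of either paper: X -> Y holds in an instance exactly when projecting onto X and onto X ∪ Y yields equally many distinct tuples, and the search keeps only minimal left-hand sides. Dependencies found this way hold in the given rows, not necessarily in the intended schema. The function names and the toy data are hypothetical.

```python
from itertools import combinations


def fd_holds(rows, lhs, rhs):
    """X -> Y holds in this instance iff X and X ∪ Y have equally many distinct projections."""
    def distinct_count(attrs):
        return len({tuple(r[a] for a in attrs) for r in rows})

    x = sorted(lhs)
    return distinct_count(x) == distinct_count(sorted(set(x) | set(rhs)))


def minimal_fds(rows, attributes):
    """Brute force: for each attribute A, every minimal X (A not in X) with X -> A."""
    found = []
    for target in attributes:
        others = [a for a in attributes if a != target]
        minimal = []
        for k in range(len(others) + 1):
            for combo in combinations(others, k):
                lhs = frozenset(combo)
                # Skip supersets of an already-found left-hand side.
                if any(m <= lhs for m in minimal):
                    continue
                if fd_holds(rows, lhs, {target}):
                    minimal.append(lhs)
        found.extend((m, target) for m in minimal)
    return found


if __name__ == "__main__":
    data = [
        {"emp": 1, "dept": "x", "mgr": "m1"},
        {"emp": 2, "dept": "x", "mgr": "m1"},
        {"emp": 3, "dept": "y", "mgr": "m2"},
    ]
    for lhs, a in minimal_fds(data, ["emp", "dept", "mgr"]):
        print(sorted(lhs), "->", a)
```

The search is exponential in the number of attributes; practical discovery algorithms prune the attribute lattice levelwise instead of enumerating every subset.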

Mining High Order Decision Rules

by Y. Y. Yao - Tsumoto (Eds.), Rough Set Theory and Granular Computing, 2003
Cited by 2 (2 self)
We introduce the notion of high order decision rules. While a standard decision rule expresses connections between attribute values of the same object, a high order decision rule expresses connections of different objects in terms of their attribute values. An example of high order decision rules may state that "if an object x is related to another object y with respect to an attribute a, then x is related to y with respect to another attribute b." The problem of mining high order decision rules is formulated as a process of finding connections of objects as expressed in terms of their attribute values. In order to mine high order decision rules, we use relationships between values of attributes. Various types of relationships can be used, such as ordering relations, closeness relations, similarity relations, and neighborhood systems on attribute values. The introduction of semantic information on attribute values leads to information tables with added semantics. Depending on the decision rules to be mined, one can transform the original table into another information table, in which each new entity is a pair of objects. Any standard data mining algorithm can then be used. As an example to illustrate the basic idea, we discuss in detail the mining of ordering rules.
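The table transformation described above can be sketched as follows. Each entity of the new table is an ordered pair (x, y) of distinct objects, and each attribute records whether x stands in a given relation to y on that attribute. This sketch assumes strict greater-than as the ordering relation and plain confidence as the rule-quality measure; both are illustrative choices, not necessarily the paper's, and the function names are hypothetical.

```python
from itertools import permutations
from operator import gt


def pair_table(objects, attrs, relation=gt):
    """Transform an information table into one whose entities are ordered
    pairs (x, y) of distinct objects; each attribute holds relation(x[a], y[a])."""
    return [
        {a: relation(objects[x][a], objects[y][a]) for a in attrs}
        for x, y in permutations(objects, 2)
    ]


def rule_confidence(pairs, a, b):
    """Confidence of the high order rule 'if x R_a y then x R_b y'."""
    antecedent = [p for p in pairs if p[a]]
    if not antecedent:
        return None  # rule never applies
    return sum(1 for p in antecedent if p[b]) / len(antecedent)


if __name__ == "__main__":
    students = {
        "s1": {"math": 90, "physics": 85},
        "s2": {"math": 70, "physics": 60},
        "s3": {"math": 80, "physics": 90},
    }
    pairs = pair_table(students, ["math", "physics"])
    print(rule_confidence(pairs, "math", "physics"))
```

Because the pair table is an ordinary table of Boolean attributes, any standard rule miner could be run on it in place of `rule_confidence`, which is the point the abstract makes.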
