Results 1 - 10
of
103
Beyond Market Baskets: Generalizing Association Rules To Dependence Rules
, 1998
"... One of the more well-studied problems in data mining is the search for association rules in market basket data. Association rules are intended to identify patterns of the type: “A customer purchasing item A often also purchases item B. Motivated partly by the goal of generalizing beyond market bask ..."
Abstract
-
Cited by 415 (5 self)
- Add to MetaCart
One of the more well-studied problems in data mining is the search for association rules in market basket data. Association rules are intended to identify patterns of the type: “A customer purchasing item A often also purchases item B. Motivated partly by the goal of generalizing beyond market basket data and partly by the goal of ironing out some problems in the definition of association rules, we develop the notion of dependence rules that identify statistical dependence in both the presence and absence of items in itemsets. We propose measuring significance of dependence via the chi-squared test for independence from classical statistics. This leads to a measure that is upward-closed in the itemset lattice, enabling us to reduce the mining problem to the search for a border between dependent and independent itemsets in the lattice. We develop pruning strategies based on the closure property and thereby devise an efficient algorithm for discovering dependence rules. We demonstrate our algorithm’s effectiveness by testing it on census data, text data (wherein we seek term dependence), and synthetic data.
Sampling Large Databases for Association Rules
, 1996
"... Discovery of association rules is an important database mining problem. Current algorithms for nding association rules require several passes over the analyzed database, and obviously the role of I/O overhead is very signi cant for very large databases. We present new algorithms that reduce the data ..."
Abstract
-
Cited by 329 (4 self)
- Add to MetaCart
Discovery of association rules is an important database mining problem. Current algorithms for nding association rules require several passes over the analyzed database, and obviously the role of I/O overhead is very signi cant for very large databases. We present new algorithms that reduce the database activity considerably. Theidea is to pick a random sample, to ndusingthis sample all association rules that probably hold in the whole database, and then to verify the results with the restofthe database. The algorithms thus produce exact association rules, not approximations based on a sample. The approach is, however, probabilistic, and inthose rare cases where our sampling method does not produce all association rules, the missing rules can be found inasecond pass. Our experiments show that the proposed algorithms can nd association rules very e ciently in only onedatabase pass. 1
Data Mining: An Overview from Database Perspective
- IEEE Transactions on Knowledge and Data Engineering
, 1996
"... Mining information and knowledge from large databases has been recognized by many researchers as a key research topic in database systems and machine learning, and by many industrial companies as an important area with an opportunity of major revenues. Researchers in many different fields have sh ..."
Abstract
-
Cited by 314 (23 self)
- Add to MetaCart
Mining information and knowledge from large databases has been recognized by many researchers as a key research topic in database systems and machine learning, and by many industrial companies as an important area with an opportunity of major revenues. Researchers in many different fields have shown great interest in data mining. Several emerging applications in information providing services, such as data warehousing and on-line services over the Internet, also call for various data mining techniques to better understand user behavior, to improve the service provided, and to increase the business opportunities. In response to such a demand, this article is to provide a survey, from a database researcher's point of view, on the data mining techniques developed recently. A classification of the available data mining techniques is provided and a comparative study of such techniques is presented.
Maintenance of Discovered Association Rules in Large Databases: An Incremental Updating Technique
, 1996
"... An incremental updating technique is developed for maintenance of the association rules discovered by database mining. There have been many studies on efficient discovery of association rules in large databases. However, it is nontrivial to maintain such discovered rules in large databases because a ..."
Abstract
-
Cited by 166 (18 self)
- Add to MetaCart
An incremental updating technique is developed for maintenance of the association rules discovered by database mining. There have been many studies on efficient discovery of association rules in large databases. However, it is nontrivial to maintain such discovered rules in large databases because a database may allow frequent or occasional updates and such updates may not only invalidate some existing strong association rules but also turn some weak rules into strong ones. In this study, an incremental updating technique is proposed for efficient maintenance of dis- covered association rules when new transaction data are added to a transaction database.
DMQL: A Data Mining Query Language for Relational Databases
, 1996
"... The emerging data mining tools and systems lead naturally to the demand of a powerful data mining query language, on top of which manyinteractive and #exible graphical user interfaces can be developed. This motivates us to design a data mining query language, DMQL, for mining di#erent kinds of knowl ..."
Abstract
-
Cited by 109 (6 self)
- Add to MetaCart
The emerging data mining tools and systems lead naturally to the demand of a powerful data mining query language, on top of which manyinteractive and #exible graphical user interfaces can be developed. This motivates us to design a data mining query language, DMQL, for mining di#erent kinds of knowledge in relational databases. Portions of the proposed DMQL language have been implemented in our DBMiner system for interactive mining of multiple-level knowledge in relational databases. 1 Introduction Data mining is a promising #eld with #ourishing R
Efficient Mining of Association Rules in Distributed Databases
, 1996
"... Many sequential algorithms have been proposed for mining of association rules. However, very little work has been done in mining association rules in distributed databases. A direct application of sequential algorithms to distributed databases is not effective, because it requires a large amount of ..."
Abstract
-
Cited by 67 (3 self)
- Add to MetaCart
Many sequential algorithms have been proposed for mining of association rules. However, very little work has been done in mining association rules in distributed databases. A direct application of sequential algorithms to distributed databases is not effective, because it requires a large amount of communication overhead. In this study, an efficient algorithm, DMA, is proposed. It generates a small number of candidate sets and requires only O(n) messages for support count exchange for each candidate set, where n is the number of sites in a distributed database. The algorithm has been implemented on an experimental test bed and its performance is studied. The results show that DMA has superior performance when comparing with the direct application of a popular sequential algorithm in distributed databases.
A lattice conceptual clustering system and its application to browsing retrieval
- Machine Learning
, 1996
"... Abstract. The theory of concept (or Galois) lattices provides a simple and formal approach to conceptual clustering. In this paper we present GALOIS, a system that automates and applies this theory. The algorithm utilized by GALOIS to build a concept lattice is incremental and efficient, each update ..."
Abstract
-
Cited by 66 (6 self)
- Add to MetaCart
Abstract. The theory of concept (or Galois) lattices provides a simple and formal approach to conceptual clustering. In this paper we present GALOIS, a system that automates and applies this theory. The algorithm utilized by GALOIS to build a concept lattice is incremental and efficient, each update being done in time at most quadratic in the number of objects in the lattice. Also, the algorithm may incorporate background information into the lattice, and through clustering, extend the scope of the theory. The application we present is concerned with information retrieval via browsing, for which we argue that concept lattices may represent major support structures. We describe a prototype user interface for browsing through the concept lattice of a document-term relation, possibly enriched with a thesaurus of terms. An experimental evaluation of the system performed on a medium-sized bibliographic database shows good retrieval performance and a significant improvement after the introduction of background knowledge.
Dynamic Generation and Refinement of Concept Hierarchies for Knowledge Discovery in Databases
- IN PROC. AAAI'94 WORKSHOP ON KNOWLEDGE DISCOVERY IN DATABASES (KDD'94
, 1994
"... Concept hierarchies organize data and concepts in hierarchical forms or in certain partial order, which helps expressing knowledge and data relationships in databases in concise, high level terms, and thus, plays an important role in knowledge discovery processes. Concept hierarchies could be prov ..."
Abstract
-
Cited by 57 (13 self)
- Add to MetaCart
Concept hierarchies organize data and concepts in hierarchical forms or in certain partial order, which helps expressing knowledge and data relationships in databases in concise, high level terms, and thus, plays an important role in knowledge discovery processes. Concept hierarchies could be provided by knowledge engineers, domain experts or users, or embedded in some data relations. However, it is sometimes desirable to automatically generate some concept hierarchies or adjust some given hierarchies for particular learning tasks. In this paper, the issues of dynamic generation and refinement of concept hierarchies are studied. The study leads to some algorithms for automatic generation of concept hierarchies for numerical attributes based on data distributions and for dynamic refinement of a given or generated concept hierarchy based on a learning request, the relevant set of data and database statistics. These algorithms have been implemented in the DBkearn knowledge discovery system and tested against large relational databases. The experimental results show that the algorithms are efficient and effective for knowledge discovery in large databases.
CLARANS: A Method for Clustering Objects for Spatial Data Mining
- IEEE Transactions on Knowledge and Data Engineering
, 2005
"... Abstract—Spatial data mining is the discovery of interesting relationships and characteristics that may exist implicitly in spatial databases. To this end, this paper has three main contributions. First, we propose a new clustering method called CLARANS, whose aim is to identify spatial structures t ..."
Abstract
-
Cited by 56 (0 self)
- Add to MetaCart
Abstract—Spatial data mining is the discovery of interesting relationships and characteristics that may exist implicitly in spatial databases. To this end, this paper has three main contributions. First, we propose a new clustering method called CLARANS, whose aim is to identify spatial structures that may be present in the data. Experimental results indicate that, when compared with existing clustering methods, CLARANS is very efficient and effective. Second, we investigate how CLARANS can handle not only points objects, but also polygon objects efficiently. One of the methods considered, called the IR-approximation, is very efficient in clustering convex and nonconvex polygon objects. Third, building on top of CLARANS, we develop two spatial data mining algorithms that aim to discover relationships between spatial and nonspatial attributes. Both algorithms can discover knowledge that is difficult to find with existing spatial data mining algorithms. Index Terms—Spatial data mining, clustering algorithms, randomized search, computational geometry. æ 1
Set-oriented mining for association rules in relational databases
- In 11th Intl. Conf. Data Engineering
, 1995
"... hou t sma @ t rc. nl We describe set-oriented algorithms for mining as-sociation rules. Such algorithms imply performing multiple joins and may appear to be inherently less escient than special-purpose algorithms. We develop new algorithms that can be expressed as SQL queries, and discuss optimizati ..."
Abstract
-
Cited by 52 (0 self)
- Add to MetaCart
hou t sma @ t rc. nl We describe set-oriented algorithms for mining as-sociation rules. Such algorithms imply performing multiple joins and may appear to be inherently less escient than special-purpose algorithms. We develop new algorithms that can be expressed as SQL queries, and discuss optimization of these algorithms. Af-ter analytical evaluation, an algorithm named SETM emerges as the algorithm of choice. Algorithm SETM uses only simple database primitives, viz., sorting and merge-scan join. Algorithm SETM is simple, fast, and stable over the mnge of pammeter values. The major contribution of this paper is that it shows that at least some aspects of data mining can be cam’ed out by using general query languages such as SQL, mther than by developing specialized black box algo-rithms. The set-oriented nature of Algorithm SETM facilitates the development of extensions. 1

