Web mining: Information and pattern discovery on the world wide web
, 1997
Cited by 207 (18 self)
Application of data mining techniques to the World Wide Web, referred to as Web mining, has been the focus of several recent research projects and papers. However, there is no established vocabulary, leading to confusion when comparing research efforts. The term Web mining has been used in two distinct ways. The first, called Web content mining in this paper, is the process of information discovery from sources across the World Wide Web. The second, called Web usage mining, is the process of mining for user browsing and access patterns. In this paper we define Web mining and present an overview of the various research issues, techniques, and development efforts. We briefly describe WEBMINER, a system for Web usage mining, and conclude this paper by listing research issues.
Integrating Association Rule Mining with Relational Database Systems: Alternatives and Implications
- In SIGMOD
, 1998
Cited by 101 (5 self)
Data mining on large data warehouses is becoming increasingly important. In support of this trend, we consider a spectrum of architectural alternatives for coupling mining with database systems. These alternatives include: loose coupling through a SQL cursor interface; encapsulation of a mining algorithm in a stored procedure; caching the data to a file system on the fly and mining; tight coupling using primarily user-defined functions; and SQL implementations for processing in the DBMS. We comprehensively study the option of expressing the mining algorithm in the form of SQL queries, using association rule mining as a case in point. We consider four options in SQL-92 and six options in SQL enhanced with object-relational extensions (SQL-OR). Our evaluation of the different architectural alternatives shows that, from a performance perspective, the Cache-Mine option is superior, although the performance of the SQL-OR option is within a factor of two. Both the Cache-Mine and the SQL-OR app...
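The abstract mentions expressing association rule mining as SQL-92 queries. A minimal sketch, assuming a normalized `(tid, item)` table named `T` (a hypothetical schema) and using Python's built-in `sqlite3` module, counts the support of candidate 2-itemsets with a single self-join. This illustrates generic join-based support counting only; it does not reproduce any of the paper's four SQL-92 options.

```python
import sqlite3

def count_pair_support(transactions, min_support):
    """Count support of item pairs with one SQL-92 self-join.

    transactions: dict mapping a transaction id to an iterable of items.
    Returns a list of (item1, item2, support) rows with support >= min_support.
    """
    conn = sqlite3.connect(":memory:")
    # Hypothetical normalized layout: one row per (transaction, item).
    conn.execute("CREATE TABLE T (tid INTEGER, item TEXT)")
    conn.executemany(
        "INSERT INTO T VALUES (?, ?)",
        [(tid, item) for tid, items in transactions.items() for item in set(items)],
    )
    # Joining T with itself on tid pairs up items bought together;
    # t1.item < t2.item counts each unordered pair exactly once.
    rows = conn.execute(
        """
        SELECT t1.item, t2.item, COUNT(*) AS support
        FROM T t1 JOIN T t2 ON t1.tid = t2.tid AND t1.item < t2.item
        GROUP BY t1.item, t2.item
        HAVING COUNT(*) >= ?
        ORDER BY t1.item, t2.item
        """,
        (min_support,),
    ).fetchall()
    conn.close()
    return rows
```

For example, with transactions `{1: ["bread", "milk"], 2: ["bread", "butter", "milk"], 3: ["butter", "milk"]}` and a minimum support of 2, the query returns the pairs (bread, milk) and (butter, milk). Larger itemsets would need one more join per additional item, which is why join order and join method matter for performance.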
Web Mining: Pattern discovery from World Wide Web transactions
, 1996
Cited by 67 (5 self)
Web-based organizations often generate and collect large volumes of data in their daily operations. Analyzing such data can help these organizations determine the lifetime value of clients, design cross-marketing strategies across products and services, evaluate the effectiveness of promotional campaigns, and find the most effective logical structure for their Web space. This type of analysis involves the discovery of meaningful relationships from a large collection of primarily unstructured data, often stored in Web server access logs. We propose a framework for Web mining, the application of data mining and knowledge discovery techniques to data collected in World Wide Web transactions. We present data and transaction models for various Web mining tasks, such as the discovery of association rules and sequential patterns from Web data. We also present a Web mining system, WEBMINER, which has been implemented based on the proposed framework, and discuss our experimental results on real-world Web data using WEBMINER.
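A transaction model turns raw access-log lines into units that association rule and sequential pattern algorithms can consume. A minimal sketch, assuming log entries have already been parsed into `(client, timestamp, url)` tuples and using a 30-minute inactivity timeout (a common heuristic, not necessarily what WEBMINER uses), groups requests into per-client sessions:

```python
from datetime import datetime, timedelta

def sessionize(log_entries, timeout=timedelta(minutes=30)):
    """Group (client, timestamp, url) log entries into per-client sessions.

    A new session starts when the gap since the same client's previous
    request exceeds `timeout`. Returns a list of sessions, each a list of
    URLs in request order; sessions are grouped by client.
    """
    sessions = []
    last_seen = {}  # client -> timestamp of its previous request
    current = {}    # client -> index of its open session in `sessions`
    for client, ts, url in sorted(log_entries, key=lambda e: (e[0], e[1])):
        prev = last_seen.get(client)
        if prev is None or ts - prev > timeout:
            sessions.append([])
            current[client] = len(sessions) - 1
        sessions[current[client]].append(url)
        last_seen[client] = ts
    return sessions
```

Each resulting session can then serve as one "transaction" (a set of URLs for association rules, or an ordered list for sequential patterns). The timeout value is a tunable assumption.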
Analyzing the Subjective Interestingness of Association Rules
, 2000
Cited by 35 (1 self)
Association rules are a class of important regularities in databases. They have been found to be very useful in practical applications. However, association rule mining algorithms tend to produce a huge number of rules, most of which are of no interest to the user. Due to the large number of rules, it is very difficult for the user to analyze them manually in order to identify the truly interesting ones. In this paper, we propose a new approach to assist the user in finding interesting rules (in particular, unexpected rules) from a set of discovered association rules. This technique is characterized by analyzing the discovered association rules using the user's existing knowledge about the domain and then ranking the discovered rules according to various interestingness criteria, e.g., conformity and various types of unexpectedness. This technique has been implemented and successfully used in a number of applications. Keywords: subjective interestingness, association rules, interestingness analysis in data mining.
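The abstract does not define its conformity and unexpectedness measures. As a hypothetical simplification only, the sketch below represents both the user's expectations and the discovered rules as `(antecedent, consequent)` item sets, scores similarity with Jaccard overlap, and ranks first the rules whose antecedent matches an expectation but whose consequent does not (an "unexpected consequent"). The function names and scoring formulas are assumptions, not the paper's definitions.

```python
def jaccard(a, b):
    """Jaccard overlap of two item collections (1.0 if both are empty)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

def score_rule(rule, expectations):
    """Return (conformity, unexpected_consequent) for one rule.

    rule and each expectation are (antecedent, consequent) pairs of item sets.
    conformity: both sides match some expectation.
    unexpected_consequent: antecedent matches but consequent does not.
    """
    cond, cons = rule
    conformity = 0.0
    unexpected = 0.0
    for e_cond, e_cons in expectations:
        lhs = jaccard(cond, e_cond)
        rhs = jaccard(cons, e_cons)
        conformity = max(conformity, min(lhs, rhs))
        unexpected = max(unexpected, lhs * (1.0 - rhs))
    return conformity, unexpected

def rank_by_unexpectedness(rules, expectations):
    """Sort rules so the most unexpected (by the score above) come first."""
    return sorted(rules, key=lambda r: score_rule(r, expectations)[1], reverse=True)
```

For instance, if the user expects `{age<30} -> {buys_game}`, a discovered rule `{age<30} -> {buys_book}` receives the highest unexpected-consequent score, while the expected rule itself scores as fully conforming.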
Querying Multiple Sets of Discovered Rules
- In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’02)
, 2002
Cited by 15 (4 self)
Rule mining is an important data mining task that has been applied to numerous real-world applications. Often a rule mining system generates a large number of rules, and only a small subset of them is really useful in applications. Although there exist some systems that allow the user to query the discovered rules, they are less suitable for complex ad hoc querying of multiple data mining rulebases to retrieve interesting rules. In this paper, we propose a new, powerful rule query language, Rule-QL, for querying multiple rulebases; it is modeled after SQL and has rigorous theoretical foundations in a rule-based calculus. In particular, we first propose a rule-based calculus, RC, based on first-order logic, and then present the language Rule-QL, which is at least as expressive as the safe fragment of RC. We also propose a number of efficient query evaluation techniques for Rule-QL and test them experimentally on some representative queries to demonstrate the feasibility of Rule-QL.
Is Pushing Constraints Deeply into the Mining Algorithms really what we want? - An Alternative Approach for Association Rule Mining
, 2002
Cited by 12 (0 self)
The common approach to exploiting mining constraints is to push them deeply into the mining algorithms. In our paper we argue that this approach is based on an understanding of KDD that is no longer up to date. In fact, KDD is today seen as a human-centered, highly interactive and iterative process. Blindly enforcing constraints already during the mining runs neglects the process character of KDD and is therefore no longer state of the art. Constraints can make a single algorithm run faster, but we are still far from response times that would allow true interactivity in KDD. In addition, we pay the price of repeated mining runs and, moreover, risk reducing data mining to a kind of hypothesis testing. Taking all of the above into consideration, we propose to do exactly the contrary of constrained mining: we accept an initial (nearly) unconstrained and costly mining run, but instead of a sequence of subsequent, still expensive constrained mining runs, we answer all further mining queries from this initial result set. Whereas this is straightforward for constraints that can be implemented as filters on the result set, things get more complicated when we restrict the underlying mining data. In practice such constraints are very important, e.g. the generation of rules for certain days of the week, or for families, singles, or male or female customers. We show how to postpone such row-restriction constraints on the transactions from rule generation to rule retrieval from the initial result set.
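The abstract does not spell out how row-restriction constraints are postponed. One plausible way to realize the idea, sketched here under that assumption (not necessarily the authors' approach), is to keep the transaction ids (TID sets) of every itemset found in the initial run and recount supports over only the restricted rows at retrieval time. Itemsets pruned by the initial run cannot be recovered this way, which is why that run has to be (nearly) unconstrained. The transaction attribute `day` and the function names are hypothetical, and itemsets are capped at size 2 for brevity.

```python
from itertools import combinations

def initial_run(transactions, min_count, max_size=2):
    """Unconstrained initial run: map each itemset (up to max_size items) that
    occurs in at least min_count transactions to the set of its transaction ids."""
    tidsets = {}
    for tid, t in transactions.items():
        items = sorted(set(t["items"]))
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                tidsets.setdefault(combo, set()).add(tid)
    return {iset: tids for iset, tids in tidsets.items() if len(tids) >= min_count}

def restricted_rules(tidsets, transactions, predicate, min_conf):
    """Answer a row-restricted query from the stored result without re-mining:
    supports are recounted only over transactions that satisfy `predicate`.
    Returns sorted (lhs, rhs, confidence) rules derived from 2-itemsets."""
    allowed = {tid for tid, t in transactions.items() if predicate(t)}
    support = {iset: len(tids & allowed) for iset, tids in tidsets.items()}
    rules = []
    for iset, sup in support.items():
        if len(iset) != 2 or sup == 0:
            continue
        for lhs, rhs in ((iset[0], iset[1]), (iset[1], iset[0])):
            lhs_sup = support.get((lhs,), 0)
            if lhs_sup and sup / lhs_sup >= min_conf:
                rules.append((lhs, rhs, sup / lhs_sup))
    return sorted(rules)
```

With this layout, a query such as "rules for Saturdays only" becomes a set intersection per stored itemset rather than a new mining run; a rule like beer → chips can reach full confidence on the restricted rows even though it falls below the confidence threshold on the full data.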
Performance Evaluation and Optimization of Join Queries for Association Rule Mining
- In Proc. DaWaK
, 1998
Cited by 12 (1 self)
The explosive growth in data collection in business organizations introduces the problem of turning these rapidly expanding data stores into nuggets of actionable knowledge. The state-of-the-art data mining tools available for this integrate loosely with data stored in DBMSs, typically through a cursor interface. In this paper, we consider several formulations of association rule mining (a typical data mining problem) using SQL-92 queries and study the performance of different join orders and join methods for executing them. We analyze the cost of the different execution plans, which provides a basis for incorporating the semantics of association rule mining into future query optimizers. Based on this analysis, we identify certain optimizations and develop the Set-oriented Apriori approach. This work is an initial step towards developing "SQL-aware" mining algorithms and exploring the enhancements to current relational DBMSs that would make them "mining-aware", thereby bridging the gap between ...
Query languages supporting descriptive rule mining: A comparative study
- In Database Technologies for Data Mining - Discovering Knowledge with Inductive Queries, volume 2682 of LNCS
, 2004
Cited by 11 (4 self)
Recently, inductive databases (IDBs) have been proposed to tackle the problem of knowledge discovery from huge databases. With an IDB, the user/analyst performs a set of very different operations on data using a query language powerful enough to support all the required manipulations, such as data preprocessing, pattern discovery and pattern post-processing. We provide a comparison between three query languages (MSQL, DMQL and MINE RULE) that have been proposed for descriptive rule mining and discuss their common features and differences. These query languages look like extensions of SQL. We present them using a set of examples taken from the real practice of rule mining. We also discuss OLE DB for Data Mining and the Predictive Model Markup Language, two recent proposals that, respectively, provide native support for data mining primitives (as the first three query languages do) and describe statistical and data mining models in a standard language.
Integrating pattern mining in relational databases
- In Proc. 10th European Conference on Principles and Practice of Knowledge Discovery in Databases, LNCS
, 2006
Cited by 10 (4 self)
Almost a decade ago, Imielinski and Mannila introduced the notion of Inductive Databases to manage KDD applications just as DBMSs successfully manage business applications. The goal is to follow one of the key DBMS paradigms: building optimizing compilers for ad hoc queries. During the past decade, several researchers proposed extensions to the popular relational query language, SQL, in order to express such mining queries. In this paper, we propose a completely new approach, which extends the DBMS itself, not the query language, and integrates the mining algorithms into the database query optimizer. To this end, we introduce virtual mining views, which can be queried as if they were traditional relational tables (or views). Every time the database system accesses one of these virtual mining views, a mining algorithm is triggered to materialize all tuples needed to answer the query. We show how this can be done effectively for the popular association rule and frequent set mining problems.
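To make the idea of a virtual mining view concrete, here is a toy sketch: the "view" stores no tuples, and every access triggers a mining step that materializes the `(itemset, support)` rows the query needs. The class and method names are hypothetical, the only supported selection is a minimum-support predicate, and brute-force enumeration (pruned by anti-monotonicity) stands in for a real frequent-set algorithm. It does not show the paper's integration with a query optimizer.

```python
from itertools import combinations

class VirtualFrequentSets:
    """Hypothetical virtual mining view with columns (itemset, support).

    No tuples are stored up front; each query triggers mining that
    materializes only the tuples needed to answer it."""

    def __init__(self, transactions):
        self.transactions = [frozenset(t) for t in transactions]
        self.mining_runs = 0  # how many times the mining step was triggered

    def _mine(self, min_support):
        # Brute-force enumeration for brevity; a real system would invoke an
        # Apriori- or FP-growth-style algorithm here.
        self.mining_runs += 1
        items = sorted(set().union(*self.transactions))
        rows = []
        for k in range(1, len(items) + 1):
            found = False
            for combo in combinations(items, k):
                sup = sum(1 for t in self.transactions if t.issuperset(combo))
                if sup >= min_support:
                    rows.append((combo, sup))
                    found = True
            if not found:  # anti-monotonicity: no larger itemset can qualify
                break
        return rows

    def select(self, min_support):
        """Roughly: SELECT itemset, support FROM view WHERE support >= min_support."""
        return self._mine(min_support)
```

Because the selection predicate is passed down to the mining step, only itemsets that satisfy it are ever materialized, which mirrors the abstract's point that mining is triggered on access rather than precomputed.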
Concept Hierarchy in Data Mining: Specification, Generation and Implementation
, 1997
Cited by 9 (0 self)
Data mining is the nontrivial extraction of implicit, previously unknown, and potentially useful information from data. As one of the most important forms of background knowledge, concept hierarchies play a fundamentally important role in data mining. The purpose of this thesis is to study some aspects of concept hierarchies, such as automatic generation and encoding techniques, in the context of data mining. After a discussion of the basic terminology and categorization, automatic generation of concept hierarchies is studied for both nominal and numerical hierarchies. One algorithm is designed for determining the partial order on a given set of nominal attributes. The resulting partial order is a useful guide for users finalizing the concept hierarchy for their particular data mining tasks. Based on hierarchical and partitioning clustering methods, two algorithms are proposed for the automatic generation of numerical hierarchies. The quality and performance comparisons indicate that the ...
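The abstract names hierarchical clustering as one basis for generating numerical hierarchies but does not give the algorithm. A generic sketch, assuming single-linkage agglomerative clustering on a single numeric attribute and a user-supplied number of intervals per level (both assumptions, not the thesis's algorithm), builds the hierarchy bottom-up:

```python
def numeric_hierarchy(values, levels):
    """Build a concept hierarchy for a numeric attribute by agglomerative
    (single-linkage) clustering in one dimension.

    Starts from one interval per distinct value and repeatedly merges the two
    adjacent intervals separated by the smallest gap. `levels` gives the number
    of intervals wanted at each level (each >= 1), e.g. [3, 2, 1]. Returns a
    list of levels, finest first, each a list of (low, high) intervals."""
    intervals = [(v, v) for v in sorted(set(values))]
    hierarchy = []
    for target in sorted(levels, reverse=True):
        while len(intervals) > target:
            # Gap between the upper end of interval i and the lower end of i+1.
            gaps = [intervals[i + 1][0] - intervals[i][1] for i in range(len(intervals) - 1)]
            i = gaps.index(min(gaps))
            intervals[i:i + 2] = [(intervals[i][0], intervals[i + 1][1])]
        hierarchy.append(list(intervals))
    return hierarchy
```

For ages `[1, 2, 3, 10, 11, 12, 30]` with levels `[3, 2, 1]`, this yields `[(1, 3), (10, 12), (30, 30)]`, then `[(1, 12), (30, 30)]`, then `[(1, 30)]`. Because each level is formed by merging intervals of the level below, the result is a proper hierarchy that a user could then label and refine.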

