Results 1 - 7 of 7
A fast apriori implementation
Proceedings of the IEEE ICDM Workshop on Frequent Itemset Mining Implementations (FIMI’03), volume 90 of Workshop Proceedings, 2003
Cited by 55 (2 self)
The efficiency of frequent itemset mining algorithms is determined mainly by three factors: the way candidates are generated, the data structure that is used, and the implementation details. Most papers focus on the first factor, some describe the underlying data structures, but implementation details are almost always neglected. In this paper we show that the effect of implementation can be more important than the selection of the algorithm. Ideas that seem quite promising may turn out to be ineffective once we descend to the implementation level. We theoretically and experimentally analyze APRIORI, which is the most established algorithm for frequent itemset mining. Several implementations of the algorithm have been put forward in the last decade. Although they are implementations of the very same algorithm, they display large differences in running time and memory requirements. In this paper we describe an implementation of APRIORI that outperforms all implementations known to us. We analyze, theoretically and experimentally, the principal data structure of our solution. This data structure is the main factor in the efficiency of our implementation. Moreover, we present a simple modification of APRIORI that appears to be faster than the original algorithm.
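For orientation, the level-wise scheme that APRIORI follows (generate candidates from the previous level, prune by subset, count support in one scan) can be sketched as below. This is a minimal illustration only, not the implementation described in the paper: it stores itemsets as plain `frozenset` keys in a dictionary rather than in the paper's principal data structure, and the names `apriori`, `transactions`, and `min_support` are assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset: support count} for every frequent itemset.

    Illustrative sketch: min_support is an absolute count (an assumption).
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items with one scan.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Candidate generation: join frequent (k-1)-itemsets, then prune any
        # candidate that has an infrequent (k-1)-subset.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in frequent
                for sub in combinations(union, k - 1)
            ):
                candidates.add(union)

        # Support counting: one scan of the transactions per level.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1

        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result
```

The inner counting loop tests every candidate against every transaction, which is exactly the kind of implementation detail the abstract argues can dominate running time; a real implementation would replace it with a more selective lookup structure.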
Fast Algorithm for Mining Association Rules
Cited by 7 (0 self)
One of the important problems in data mining is discovering association rules from databases of transactions, where each transaction consists of a set of items. The most time-consuming operation in this discovery process is computing the frequency of occurrence of interesting subsets of items (called candidates) in the database of transactions. Can one develop a method that avoids or reduces candidate generation and testing, and uses novel data structures to reduce the cost of frequent pattern mining? This is the motivation of my study. A fast algorithm has been proposed for solving this problem. Our algorithm uses the "TreeMap", which is a structure in the Java language. We also present an "ArrayList" technique that greatly reduces the need to traverse the database. Moreover, we present experimental results which show that our structure outperforms all existing available algorithms in all common data mining problems. Keywords: data mining, association rules, TreeMap, ArrayList.
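The abstract does not say how the sorted map and the lists are combined, so the following is only one plausible reading, written in Python rather than Java: a key-ordered map from each item to the list of transaction ids that contain it, built in a single scan, after which the support of any itemset is found by intersecting lists instead of traversing the database again. The function names `build_index` and `support` are hypothetical.

```python
def build_index(transactions):
    """Map each item to the ascending list of transaction ids containing it.

    Built with a single pass over the data; keys are returned in sorted
    order to mimic an ordered map (an assumption, not the paper's design).
    """
    index = {}
    for tid, items in enumerate(transactions):
        for item in set(items):
            index.setdefault(item, []).append(tid)
    return dict(sorted(index.items()))


def support(index, itemset):
    """Count transactions containing every item, without rescanning the data."""
    tid_sets = [set(index.get(item, ())) for item in itemset]
    if not tid_sets:
        return 0
    return len(set.intersection(*tid_sets))
```

Once the index exists, every later support query touches only the id lists of the items involved, which is the sense in which such a structure reduces database traversals.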
Statistical Approaches to Predictive Modeling in Large Databases
1998
Cited by 6 (0 self)
Prediction, i.e., predicting the potential values or value distributions of certain attributes for objects in a database or data warehouse, is an attractive goal in data mining. Predicting, with high quality, future events not shown in databases can help users make smart business decisions. With both scalability and high prediction quality in mind, we propose a predictive modeling algorithm for interactive prediction in large databases and data warehouses. The algorithm consists of three steps: (1) data generalization, which converts data in relational databases or data warehouses into a multidimensional database to which efficient analysis techniques can be applied; (2) relevance analysis, which identifies the attributes that are highly relevant to the prediction, reducing the number of attributes used and thereby improving both the efficiency and the reliability of prediction; and (3) a statistical regression model, called generalized linear model, is constructed ...
Periodic Pattern Search on Time-Related Data Sets
1997
Cited by 5 (0 self)
For many applications such as accounting, banking, business transaction processing systems, geographical information systems, medical record keeping, etc., the changes made to their databases over time are a valuable source of information which can direct the future operation of the enterprise. In this thesis, we focus on relational databases with historical data or, in other words, time-related data, and try to extract from them some useful knowledge about their periodic behavior. The discovered knowledge could provide users with some future guidance, to which end techniques in knowledge discovery and data warehousing become important. Knowledge discovery and data warehousing have become increasingly important in handling and analyzing large databases efficiently and effectively. We can take advantage of existing online analytical processing techniques widely used in knowledge discovery and data warehousing, and apply them to time-related data to solve periodic pattern search probl...
Mining Exceptions and Quantitative Association Rules in OLAP Data Cube
1999
Cited by 2 (0 self)
People nowadays are relying more and more on OLAP data to find business solutions. A typical OLAP data cube usually contains four to eight dimensions, with two to six hierarchical levels and tens to hundreds of categories for each dimension. It is often too large and has too many levels for users to browse effectively. In this thesis we propose a system prototype which guides users to efficiently explore exceptions in data cubes. It automatically computes the degree of exception for cube cells at different aggregation levels. When a user browses the cube, exceptional cells, as well as interesting drill-down paths that lead to lower-level exceptions, are highlighted according to their interestingness. Different statistical methods, such as the log-linear model, the adapted linear model, and Z-tests, are used to compute the degree of exception. We present algorithms and address the issue of improving performance on large data sets. Our study on exceptions leads to mining quantitati...
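To make the idea of a "degree of exception" concrete, here is a minimal sketch of the simplest of the listed approaches, a z-score test over the cells of one cube slice. It is an assumption-laden illustration rather than the thesis's method: the expected value of a cell is taken to be the plain mean of its slice, and the function name `exceptional_cells` and the default `threshold` are hypothetical.

```python
from statistics import mean, pstdev


def exceptional_cells(cells, threshold=2.0):
    """Return {cell: z-score} for cells whose |z| exceeds the threshold.

    `cells` maps a cell label to its aggregated measure. The slice mean is
    used as the expected value (a simplifying assumption).
    """
    values = list(cells.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All cells are identical, so none is exceptional.
        return {}
    flagged = {}
    for label, value in cells.items():
        z = (value - mu) / sigma
        if abs(z) > threshold:
            flagged[label] = z
    return flagged
```

The returned z-scores can double as an ordering for highlighting, with larger magnitudes marking cells that deviate more from the rest of the slice.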
One Time Mining by MultiCore Preprocessing on Generalized Dataset
One of the important problems in data mining is discovering association rules from databases of transactions, where each transaction consists of a set of items. Many industries are interested in deriving association rules from their databases because of the continuous retrieval and storage of huge amounts of data. The discovery of interesting association relationships among business transaction records helps in many business decision-making processes, such as catalog decision, cross-marketing, and loss-leader analysis. The datasets typically available as input to the problem of association rule discovery are enormous and high-dimensional, and the time-consuming operation in this discovery process is computing the frequency of interesting subsets of items (called candidates) in the database of transactions. Hence, it has become vital to develop a method that speeds up the preprocessing computation. In this paper, we propose an integrated approach of parallel computing and ARM for mining association rules in a generalized dataset. It is fundamentally different from all previous algorithms in that multi-core preprocessing is done and, by avoiding recurring scans of the dataset, the number of passes required is reduced. The response time is calculated on a space-delimited text dataset.