Results 1 - 10
of
36
Efficient data mining for path traversal patterns
- IEEE Transactions on Knowledge and Data Engineering
, 1998
"... Abstract—In this paper, we explore a new data mining capability that involves mining path traversal patterns in a distributed information-providing environment where documents or objects are linked together to facilitate interactive access. Our solution procedure consists of two steps. First, we der ..."
Abstract
-
Cited by 128 (10 self)
- Add to MetaCart
Abstract—In this paper, we explore a new data mining capability that involves mining path traversal patterns in a distributed information-providing environment where documents or objects are linked together to facilitate interactive access. Our solution procedure consists of two steps. First, we derive an algorithm to convert the original sequence of log data into a set of maximal forward references. By doing so, we can filter out the effect of some backward references, which are mainly made for ease of traveling and concentrate on mining meaningful user access sequences. Second, we derive algorithms to determine the frequent traversal patterns¦i.e., large reference sequences¦from the maximal forward references obtained. Two algorithms are devised for determining large reference sequences; one is based on some hashing and pruning techniques, and the other is further improved with the option of determining large reference sequences in batch so as to reduce the number of database scans required. Performance of these two methods is comparatively analyzed. It is shown that the option of selective scan is very advantageous and can lead to prominent performance improvement. Sensitivity analysis on various parameters is conducted. Index Terms—Data mining, traversal patterns, distributed information system, World Wide Web, performance analysis.
Predicting source code changes by mining change history
- IEEE Transactions on Software Engineering
"... ©2004 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted ..."
Abstract
-
Cited by 86 (0 self)
- Add to MetaCart
©2004 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted
Discovering Spatial Co-location Patterns: A Summary of Results
- Lecture Notes in Computer Science
, 2001
"... Given a collection of boolean spatial features, the co-location pattern discovery process finds the subsets of features frequently located together. For example, the analysis of an ecology dataset may reveal the frequent co-location of a fire ignition source feature with a needle vegetation type fea ..."
Abstract
-
Cited by 44 (5 self)
- Add to MetaCart
Given a collection of boolean spatial features, the co-location pattern discovery process finds the subsets of features frequently located together. For example, the analysis of an ecology dataset may reveal the frequent co-location of a fire ignition source feature with a needle vegetation type feature and a drought feature. The spatial co-location rule problem is different from the association rule problem. Even though boolean spatial feature types (also called spatial events) may correspond to items in association rules over market-basket datasets, there is no natural notion of transactions. This creates difficulty in using traditional measures (e.g. support, confidence) and applying association rule mining algorithms which use support based pruning. We propose a notion of user-specified neighborhoods in place of transactions to specify groups of items. New interest measures for spatial co-location patterns are proposed which are robust in the face of potentially infinite overlapping neighborhoods. We also propose an algorithm to mine frequent spatial co-location patterns and analyze its correctness, and completeness. We plan to carry out experimental evaluations and performance tuning in the near future.
Mining Patterns from Graph Traversals
- Data and Knowledge Engineering
, 2001
"... In data models that have graph representations, users navigate following the links of the graph structure. Conducting data mining on collected information about user accesses in such models, involves the determination of frequently occurring access sequences. In this paper, we examine the problem of ..."
Abstract
-
Cited by 21 (3 self)
- Add to MetaCart
In data models that have graph representations, users navigate following the links of the graph structure. Conducting data mining on collected information about user accesses in such models, involves the determination of frequently occurring access sequences. In this paper, we examine the problem of finding traversal patterns from such collections. The determination of patterns is based on the graph structure of the model. For this purpose, we present three algorithms, one which is level-wise with respect to the lengths of the patterns and two which are not. Additionally, we consider the fact that accesses within patterns may be interleaved with random accesses due to navigational purposes. The definition of the pattern type generalizes existing ones in order to take into account this fact. The performance of all algorithms and their sensitivity to several parameters is examined experimentally.
Fast algorithms for frequent itemset mining using FP-trees
- IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
, 2005
"... Efficient algorithms for mining frequent itemsets are crucial for mining association rules as well as for many other data mining tasks. Methods for mining frequent itemsets have been implemented using a prefix-tree structure, known as an FP-tree, for storing compressed information about frequent it ..."
Abstract
-
Cited by 20 (0 self)
- Add to MetaCart
Efficient algorithms for mining frequent itemsets are crucial for mining association rules as well as for many other data mining tasks. Methods for mining frequent itemsets have been implemented using a prefix-tree structure, known as an FP-tree, for storing compressed information about frequent itemsets. Numerous experimental results have demonstrated that these algorithms perform extremely well. In this paper, we present a novel FP-array technique that greatly reduces the need to traverse FP-trees, thus obtaining significantly improved performance for FP-tree-based algorithms. Our technique works especially well for sparse data sets. Furthermore, we present new algorithms for mining all, maximal, and closed frequent itemsets. Our algorithms use the FP-tree data structure in combination with the FP-array technique efficiently and incorporate various optimization techniques. We also present experimental results comparing our methods with existing algorithms. The results show that our methods are the fastest for many cases. Even though the algorithms consume much memory when the data sets are sparse, they are still the fastest ones when the minimum support is low. Moreover, they are always among the fastest algorithms and consume less memory than other methods when the data sets are dense.
Finding Generalized Path Patterns for Web Log Data Mining
, 2000
"... Conducting data mining on logs of web servers involves the determination of frequently occurring access sequences. We examine the problem of finding traversal patterns from web logs by considering the fact that irrelevant accesses to web documents may be interleaved within access patterns due to nav ..."
Abstract
-
Cited by 20 (6 self)
- Add to MetaCart
Conducting data mining on logs of web servers involves the determination of frequently occurring access sequences. We examine the problem of finding traversal patterns from web logs by considering the fact that irrelevant accesses to web documents may be interleaved within access patterns due to navigational purposes. We define a general type of pattern that takes into account this fact and also, we present a level-wise algorithm for the determination of these patterns, which is based on the underlying structure of the web site. The performance of the algorithm and its sensitivity to several parameters is examined experimentally with synthetic data.
Developing data allocation schemes by incremental mining of user moving patterns in a mobile computing system
- IEEE Transactions on Knowledge and Data Engineering
, 2003
"... Abstract—In this paper, we present a new data mining algorithm which involves incremental mining for user moving patterns in a mobile computing environment and exploit the mining results to develop data allocation schemes so as to improve the overall performance of a mobile system. First, we propose ..."
Abstract
-
Cited by 15 (6 self)
- Add to MetaCart
Abstract—In this paper, we present a new data mining algorithm which involves incremental mining for user moving patterns in a mobile computing environment and exploit the mining results to develop data allocation schemes so as to improve the overall performance of a mobile system. First, we propose an algorithm to capture the frequent user moving patterns from a set of log data in a mobile environment. The algorithm proposed is enhanced with the incremental mining capability and is able to discover new moving patterns efficiently without compromising the quality of results obtained. Then, in light of mining results of user moving patterns and the properties of data objects, we develop data allocation schemes that can utilize the knowledge of user moving patterns for proper allocation of both personal and shared data. By employing the data allocation schemes, the occurrences of costly remote accesses can be minimized and the performance of a mobile computing system is thus improved. For personal data allocation, two data allocation schemes, which explore different levels of mining results, are devised: one utilizes the set level of moving patterns and the other utilizes the path level of moving patterns. As can be seen later, the former is useful for the allocation of read-intensive data objects, whereas the latter is good for the allocation of update-intensive data objects. The data allocation schemes for shared data, which are able to achieve local optimization and global optimization, are also developed. Performance of these data allocation schemes is comparatively analyzed. It is shown by our simulation results that the knowledge obtained from the user moving patterns is very important in devising effective data allocation schemes which can lead to significant performance improvement in a mobile computing system. Index Terms—Data mining, mobile computing, user moving patterns, data allocation scheme, mobile database. 1
On Mining General Temporal Association Rules in a Publication Database
- In Proceedings of ICDM’2001
, 2001
"... In this paper, we explore a new problem of mining general temporal association rules in publication databases. In essence, a publication database is a set of transactions where each transaction T is a set of items of which each item contains an individual exhibition period. The current model of asso ..."
Abstract
-
Cited by 15 (2 self)
- Add to MetaCart
In this paper, we explore a new problem of mining general temporal association rules in publication databases. In essence, a publication database is a set of transactions where each transaction T is a set of items of which each item contains an individual exhibition period. The current model of association rule mining is not able to handle the publication database due to the following fundamental problems, i.e., (1) lack of consideration of the exhibition period of each individual item; (2) lack of an equitable support counting basis for each item. To remedy this, we propose an innovative algorithm Progressive-Partition-Miner (abbreviatedly as PPM) to discover general temporal association rules in a publication database. The basic idea of PPM is to first partition the publication database in light of exhibition periods of items and then progressively accumulate the occurrence count of each candidate 2-itemset based on the intrinsic partitioning characteristics. Algorithm PPM is also designed to employ a filtering threshold in each partition to early prune out those cumulatively infrequent 2-itemsets. Explicitly, the execution time of PPM is, in orders of magnitude, smaller than those required by the schemes which are directly extended from existing methods. 1
Efficient similarity search for market basket data
- VLDB Journal
, 2002
"... Abstract. Several organizations have developed very large market basket databases for the maintenance of customer transactions. New applications, e.g., Web recommendation systems, present the requirement for processing similarity queries in market basket databases. In this paper, we propose a novel ..."
Abstract
-
Cited by 14 (4 self)
- Add to MetaCart
Abstract. Several organizations have developed very large market basket databases for the maintenance of customer transactions. New applications, e.g., Web recommendation systems, present the requirement for processing similarity queries in market basket databases. In this paper, we propose a novel scheme for similarity search queries in basket data. We develop a new representation method, which, in contrast to existing approaches, is proven to provide correct results. New algorithms are proposed for the processing of similarity queries. Extensive experimental results, for a variety of factors, illustrate the superiority of the proposed scheme over the state-of-the-art method.

