SPIRIT: Sequential Pattern Mining with Regular Expression Constraints
, 1999
Cited by 130 (2 self)
Discovering sequential patterns is an important problem in data mining with a host of application domains including medicine, telecommunications, and the World Wide Web. Conventional
Efficient Adaptive-Support Association Rule Mining for Recommender Systems
- Data Mining and Knowledge Discovery
, 2002
Cited by 66 (1 self)
Collaborative recommender systems allow personalization for e-commerce by exploiting similarities and dissimilarities among customers' preferences. We investigate the use of association rule mining as an underlying technology for collaborative recommender systems. Association rules have been used with success in other domains. However, most currently existing association rule mining algorithms were designed with market basket analysis in mind. Such algorithms are inefficient for collaborative recommendation because they mine many rules that are not relevant to a given user. Also, it is necessary to specify the minimum support of the mined rules in advance, often leading to either too many or too few rules; this negatively impacts the performance of the overall system. We describe a collaborative recommendation technique based on a new algorithm specifically designed to mine association rules for this purpose. Our algorithm does not require the minimum support to be specified in advance. Rather, a target range is given for the number of rules, and the algorithm adjusts the minimum support for each user in order to obtain a ruleset whose size is in the desired range. Rules are mined for a specific target user, reducing the time required for the mining process. We employ associations between users as well as associations between items in making recommendations. Experimental evaluation of a system based on our algorithm reveals performance that is significantly better than that of traditional correlation-based approaches.
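The idea of adjusting the minimum support until the number of rules falls in a target range can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the function names, the restriction to single-antecedent rules, and the use of bisection over the support threshold are all assumptions made for brevity.

```python
def mine_rules(transactions, target, min_support, min_confidence):
    """Return rules (item -> target) that meet both thresholds.

    Hypothetical helper: only single-item antecedents are considered.
    """
    n = len(transactions)
    items = {i for t in transactions for i in t if i != target}
    rules = []
    for item in sorted(items):
        with_item = [t for t in transactions if item in t]
        both = sum(1 for t in with_item if target in t)
        support = both / n                  # fraction containing item and target
        confidence = both / len(with_item)  # fraction of item-holders with target
        if support >= min_support and confidence >= min_confidence:
            rules.append((item, target, support, confidence))
    return rules


def adaptive_rules(transactions, target, min_rules, max_rules,
                   min_confidence=0.5, max_iters=20):
    """Bisect on the minimum support until the rule count is in range.

    If no threshold yields a count in [min_rules, max_rules] within
    max_iters steps, the last attempt is returned as-is.
    """
    lo, hi = 0.0, 1.0
    support, rules = 0.0, []
    for _ in range(max_iters):
        support = (lo + hi) / 2
        rules = mine_rules(transactions, target, support, min_confidence)
        if len(rules) > max_rules:
            lo = support   # too many rules: raise the threshold
        elif len(rules) < min_rules:
            hi = support   # too few rules: lower the threshold
        else:
            break
    return support, rules


transactions = [
    {"a", "b", "t"}, {"a", "t"}, {"b", "t"}, {"a", "c"}, {"c", "t"},
]
support, rules = adaptive_rules(transactions, "t", min_rules=1, max_rules=2)
print(support, [r[0] for r in rules])
```

Mining only rules whose consequent is the target reflects the abstract's point that rules are mined for a specific target, which avoids generating rules irrelevant to that target.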
Complete mining of frequent patterns from graphs: Mining graph data
- Machine Learning
, 2003
Cited by 52 (4 self)
Basket Analysis, which is a standard method for data mining, derives frequent itemsets from a database. However, its mining ability is limited to transaction data consisting of items. In reality, there are many applications where data are described in a more structural way, e.g. chemical compounds and Web browsing history. There are a few approaches in the field of machine learning that can discover characteristic patterns from graph-structured data. However, almost none of them are suitable for applications that require a complete search for all frequent subgraph patterns in the data. In this paper, we propose a novel principle and its algorithm that derive the characteristic patterns which frequently appear in graph-structured data. Our algorithm can derive all frequent induced subgraphs from both directed and undirected graph-structured data having loops (including self-loops) with labeled or unlabeled nodes and links. Its performance is evaluated through applications to Web browsing pattern analysis and chemical carcinogenesis analysis.
A Data Mining Algorithm for Generalized Web Prefetching
- IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
, 2003
Cited by 51 (16 self)
Predictive Web prefetching refers to the mechanism of deducing the forthcoming page accesses of a client based on its past accesses. In this paper, we present a new context for the interpretation of Web prefetching algorithms as Markov predictors. We identify the factors that affect the performance of Web prefetching algorithms. We propose a new algorithm called WMo, which is based on data mining and is proven to be a generalization of existing ones. It was designed to address their specific limitations and its characteristics include all the above factors. It compares favorably with previously proposed algorithms. Further, the algorithm efficiently addresses the increased number of candidates. We present a detailed performance evaluation of WMo with synthetic and real data. The experimental results show that WMo can provide significant improvements over previously proposed Web prefetching algorithms.
Web Prefetching Using Partial Match Prediction
, 1998
Cited by 47 (1 self)
Web traffic is now one of the major components of Internet traffic. One of the main directions of research in this area is to reduce the latency users experience when navigating through Web sites. Caching is already used for this purpose, yet the characteristics of the Web cause caching in this medium to perform poorly. Therefore, prefetching is now being studied in the Web context. This study investigates the use of partial match prediction, a technique taken from the data compression literature, for prefetching in the Web. The main concern when employing prefetching is to predict as many future requests as possible while keeping false predictions to a minimum. The simulation results suggest that a high fraction of the predictions are accurate (e.g., the technique predicts 18%-23% of the requests with 90%-80% accuracy), so that additional network traffic is kept low. Furthermore, the simulations show that prefetching can substantially increase cache hit rates.
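A rough sketch of how partial match prediction might drive prefetching is given below. It is an illustration under stated assumptions rather than the study's implementation: the class name, the `max_order` and `min_confidence` parameters, and the rule of falling back to a shorter context when the longer one is unseen or not confident enough are all hypothetical choices.

```python
from collections import Counter, defaultdict


class PartialMatchPredictor:
    """Order-k context model over page-request sequences (illustrative only)."""

    def __init__(self, max_order=2):
        self.max_order = max_order
        # Maps a context (tuple of recent pages) to counts of the next page.
        self.counts = defaultdict(Counter)

    def train(self, session):
        """Record, for every position, the contexts of length 1..max_order."""
        for i in range(1, len(session)):
            for k in range(1, self.max_order + 1):
                if i - k < 0:
                    break
                context = tuple(session[i - k:i])
                self.counts[context][session[i]] += 1

    def predict(self, history, min_confidence=0.5):
        """Try the longest matching context first, then shorter ones.

        Returns (page, confidence), or None when no context yields a
        successor whose relative frequency reaches min_confidence.
        """
        for k in range(min(self.max_order, len(history)), 0, -1):
            successors = self.counts.get(tuple(history[-k:]))
            if not successors:
                continue
            page, count = successors.most_common(1)[0]
            confidence = count / sum(successors.values())
            if confidence >= min_confidence:
                return page, confidence
        return None


model = PartialMatchPredictor(max_order=2)
for s in (["A", "B", "C"], ["A", "B", "C"], ["D", "B", "E"]):
    model.train(s)
print(model.predict(["A", "B"]))
print(model.predict(["X", "B"], min_confidence=0.9))
```

Raising `min_confidence` trades coverage for accuracy, which mirrors the trade-off the abstract describes between predicting many requests and limiting false predictions.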
Clustering of Web Users Based on Access Patterns
- In Proceedings of the 1999 KDD Workshop on Web Mining
, 1999
Cited by 37 (0 self)
The clustering of the Web users based on their access patterns is studied. Access patterns of the Web users are extracted from Web servers' log files, and then organized into sessions which represent episodes of interaction between Web users and the Web server. Using attribute-oriented induction, the sessions are then generalized according to the page hierarchy which organizes pages according to their generalities. The generalized sessions are finally clustered using a hierarchical clustering method. Our experiments on a large real data set show that the method is efficient and practical for Web mining applications. 1 Introduction With the rapid development of the World Wide Web (WWW), or the Web, many organizations now put their information on the Web and provide Web-based services such as on-line shopping, user feedback, technical support, etc. Web mining, the knowledge discovery in the Web, has become an important research area [2]. Research in Web Mining can be broadly classified...
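The pipeline described above (generalize sessions along the page hierarchy, then cluster them hierarchically) could be sketched as below. Everything specific here is an assumption for illustration: URL path depth stands in for the page hierarchy, Jaccard distance stands in for the paper's similarity measure, and a naive single-linkage merge loop stands in for its clustering method.

```python
def generalize(session, depth=1):
    """Replace each page by its ancestor at the given depth of the URL path."""
    pages = set()
    for page in session:
        parts = [p for p in page.split("/") if p]
        pages.add("/" + "/".join(parts[:depth]))
    return frozenset(pages)


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|, with two empty sets treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def cluster_sessions(sessions, depth=1, max_distance=0.5):
    """Agglomerative single-linkage clustering of generalized sessions.

    Returns lists of session indices. Merging stops once the closest pair
    of clusters is farther apart than max_distance.
    """
    gen = [generalize(s, depth) for s in sessions]
    clusters = [[i] for i in range(len(sessions))]
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = min(jaccard_distance(gen[i], gen[j])
                        for i in clusters[x] for j in clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        if d > max_distance:
            break
        clusters[x].extend(clusters[y])
        del clusters[y]
    return [sorted(c) for c in clusters]


sessions = [
    ["/products/shoes/1", "/products/hats/2"],
    ["/products/bags/9"],
    ["/support/faq", "/support/contact"],
    ["/support/faq"],
]
print(cluster_sessions(sessions, depth=1))
```

Generalizing before clustering lets sessions that touch different leaf pages under the same branch count as similar; a larger `depth` keeps more detail and tends to produce more, smaller clusters.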
Sliding-Window Filtering: An Efficient Algorithm for Incremental Mining
- IN PROC. OF THE INT’L CONF. ON INFO. AND KNOWLEDGE MANAGEMENT
, 2001
Cited by 28 (7 self)
We explore in this paper an effective sliding-window filtering (abbreviated as SWF) algorithm for incremental mining of association rules. In essence, by partitioning a transaction database into several partitions, algorithm SWF employs a filtering threshold in each partition to deal with the candidate itemset generation. Under SWF, the cumulative information of mining previous partitions is selectively carried over toward the generation of candidate itemsets for the subsequent partitions. Algorithm SWF not only significantly reduces I/O and CPU cost by the concepts of cumulative filtering and scan reduction techniques but also effectively controls memory utilization by the technique of sliding-window partition. Algorithm SWF is particularly powerful for efficient incremental mining for an ongoing time-variant transaction database. By utilizing proper scan reduction techniques, only one scan of the incremented dataset is needed by algorithm SWF. The I/O cost of SWF is orders of magnitude smaller than that required by prior methods, thus resolving the performance bottleneck. Experimental studies are performed to evaluate the performance of algorithm SWF. It is noted that the improvement achieved by algorithm SWF is even more prominent as the incremented portion of the dataset increases and also as the size of the database increases.
Multi-Dimensional Sequential Pattern Mining
, 2001
Cited by 27 (4 self)
With our recently developed sequential pattern mining algorithms, such as PrefixSpan, it is possible to mine sequential user-access patterns from Web-logs. While this information is very useful when redesigning web-sites for easier perusal and fewer network traffic bottlenecks, it would be so much richer if we could incorporate multiple dimensions of information. For example, if you knew the referral site that users frequently come from, you might be able to determine what information on your own web-site is of interest to them, and enhance or separate this information as needed. Similarly, if you knew on what weekday and at what time certain access patterns frequently occur, you could ensure updated information is ready and available for these users. This thesis proposes and explores two different techniques, HYBRID and PSFP, to incorporate additional dimensions of information into the process of mining sequential patterns. It investigates the strengths and limitations of each approach. The HYBRID method first finds frequent dimension value combinations, and then mines sequential patterns from the set of sequences that satisfy each of these combinations. PSFP approaches the problem from the opposite direction. It mines the sequential patterns for the whole dataset only once (using PrefixSpan), and mines the corresponding frequent dimension patterns alongside each sequential pattern (using the existing association algorithm FP-growth). Experiments show that HYBRID is most effective at low support in datasets that are sparse with respect to dimension value combinations but dense with respect to the sequential patterns present. PSFP is the better alternative in every other case, including datasets that are dense with respect to both dimension value combinations and sequential ...
On mining Web Access Logs
- In ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery
, 2000
Cited by 27 (0 self)
The proliferation of information on the world wide web has made the personalization of this information space a necessity. One possible approach to web personalization is to mine typical user profiles from the vast amount of historical data stored in access logs. In the absence of any a priori knowledge, unsupervised classification or clustering methods seem to be ideally suited to analyze the semi-structured log data of user accesses. In this paper, we define the notion of a “user session”, as well as a dissimilarity measure between two web sessions that captures the organization of a web site. To extract a user access profile, we cluster the user sessions based on the pair-wise dissimilarities using a robust fuzzy clustering algorithm that we have developed. We report the results of experiments with our algorithm and show that this leads to extraction of interesting user profiles. We also show that it outperforms association rule based approaches for this task.
Collaborative Recommendation via Adaptive Association Rule Mining
- Data Mining and Knowledge Discovery
, 2000
Cited by 26 (2 self)
Collaborative recommender systems allow personalization for e-commerce by exploiting similarities and dissimilarities among users' preferences. We investigate the use of association rule mining as an underlying technology for collaborative recommender systems. Association rules have been used with success in other domains. However, most currently existing association rule mining algorithms were designed with market basket analysis in mind. Such algorithms are inefficient for collaborative recommendation because they mine many rules that are not relevant to a given user. Also, it is necessary to specify the minimum support of the mined rules in advance, often leading to either too many or too few rules; this negatively impacts the performance of the overall system. We describe a collaborative recommendation technique based on a new algorithm specifically designed to mine association rules for this purpose. Our algorithm does not require the minimum support to be specified in advance. Ra...

