Results 1 - 10 of 217
SPIRIT: Sequential Pattern Mining with Regular Expression Constraints
- Bell Labs Tech. Memorandum BL0112370990223-03TM
, 1999
Cited by 190 (2 self)
Discovering sequential patterns is an important problem in data mining with a host of application domains including medicine, telecommunications, and the World Wide Web. Conventional mining systems provide users with only a very restricted mechanism (based on minimum support) for specifying patterns of interest. In this paper, we propose the use of Regular Expressions (REs) as a flexible constraint specification tool that enables user-controlled focus to be incorporated into the pattern mining process. We develop a family of novel algorithms (termed SPIRIT – Sequential Pattern mIning with Regular expressIon consTraints) for mining frequent sequential patterns that also satisfy user-specified RE constraints. The main distinguishing factor among the proposed schemes is the degree to which the RE constraints are enforced to prune the search space of patterns during computation. Our solutions provide valuable insights into the tradeoffs that arise when constraints that do not subscribe to nice properties (like anti-monotonicity) are integrated into the mining process. A quantitative exploration of these tradeoffs is conducted through an extensive experimental study on synthetic and real-life data sets.
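As a rough illustration of what an RE constraint on sequential patterns means, the sketch below enumerates short subsequences, keeps those whose concatenation matches a regular expression, and counts their support. It is a naive post-filter, not one of the SPIRIT algorithms (which enforce the constraint during mining to varying degrees). The function names, the single-character item encoding, and the `max_len` cap are assumptions made for this example.

```python
import re
from itertools import combinations


def is_subsequence(pattern, sequence):
    """True if the items of `pattern` occur in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def constrained_frequent_patterns(sequences, regex, min_support, max_len=3):
    """Return {pattern: support} for patterns that match `regex` and are frequent.

    Assumption: every item is a single character, so a pattern can be
    checked against the regex by joining its items into a string.
    """
    compiled = re.compile(regex)

    # Enumerate every subsequence of length <= max_len as a candidate.
    candidates = set()
    for seq in sequences:
        for n in range(1, max_len + 1):
            for idx in combinations(range(len(seq)), n):
                candidates.add(tuple(seq[i] for i in idx))

    result = {}
    for cand in candidates:
        # Constraint check first: discard patterns the RE does not accept.
        if not compiled.fullmatch("".join(cand)):
            continue
        # Support = number of input sequences containing the pattern.
        support = sum(is_subsequence(cand, seq) for seq in sequences)
        if support >= min_support:
            result[cand] = support
    return result
```

For example, with the sequences `"abcd"`, `"acd"`, `"abd"`, `"bcd"`, the constraint `a.*d`, and a minimum support of 3, only the pattern `a, d` survives both the constraint and the support threshold.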
Complete mining of frequent patterns from graphs: Mining graph data
- MACHINE LEARNING
, 2003
Cited by 100 (5 self)
Basket Analysis, which is a standard method for data mining, derives frequent itemsets from a database. However, its mining ability is limited to transaction data consisting of items. In reality, there are many applications where data are described in a more structural way, e.g. chemical compounds and Web browsing history. There are a few approaches in the field of machine learning that can discover characteristic patterns from graph-structured data. However, almost none of them is suitable for applications that require a complete search for all frequent subgraph patterns in the data. In this paper, we propose a novel principle and its algorithm that derive the characteristic patterns which frequently appear in graph-structured data. Our algorithm can derive all frequent induced subgraphs from both directed and undirected graph-structured data having loops (including self-loops) with labeled or unlabeled nodes and links. Its performance is evaluated through applications to Web browsing pattern analysis and chemical carcinogenesis analysis.
Efficient Adaptive-Support Association Rule Mining for Recommender Systems
- Data Mining and Knowledge Discovery
, 2002
Cited by 97 (1 self)
Collaborative recommender systems allow personalization for e-commerce by exploiting similarities and dissimilarities among customers' preferences. We investigate the use of association rule mining as an underlying technology for collaborative recommender systems. Association rules have been used with success in other domains. However, most currently existing association rule mining algorithms were designed with market basket analysis in mind. Such algorithms are inefficient for collaborative recommendation because they mine many rules that are not relevant to a given user. Also, it is necessary to specify the minimum support of the mined rules in advance, often leading to either too many or too few rules; this negatively impacts the performance of the overall system. We describe a collaborative recommendation technique based on a new algorithm specifically designed to mine association rules for this purpose. Our algorithm does not require the minimum support to be specified in advance. Rather, a target range is given for the number of rules, and the algorithm adjusts the minimum support for each user in order to obtain a ruleset whose size is in the desired range. Rules are mined for a specific target user, reducing the time required for the mining process. We employ associations between users as well as associations between items in making recommendations. Experimental evaluation of a system based on our algorithm reveals performance that is significantly better than that of traditional correlation-based approaches.
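The idea of adjusting the minimum support until the rule count falls in a target range can be sketched as a simple bisection over the threshold. This is only an illustration under strong simplifying assumptions (single-item rules of the form `item -> target`, support as the only criterion); it is not the algorithm from the paper, and both function names are hypothetical.

```python
def pair_rule_supports(transactions, target):
    """Support of each hypothetical rule `item -> target` (co-occurrence fraction)."""
    n = len(transactions)
    counts = {}
    for t in transactions:
        if target in t:
            for item in t:
                if item != target:
                    counts[item] = counts.get(item, 0) + 1
    return {item: c / n for item, c in counts.items()}


def adapt_min_support(supports, min_rules, max_rules, iterations=20):
    """Bisect the support threshold until the number of kept rules is in range.

    Returns (threshold, sorted list of kept antecedent items). If no
    threshold in range is found, the last threshold tried is returned.
    """
    lo, hi = 0.0, 1.0
    threshold = 0.5
    kept = []
    for _ in range(iterations):
        threshold = (lo + hi) / 2
        kept = sorted(i for i, s in supports.items() if s >= threshold)
        if len(kept) > max_rules:
            lo = threshold  # too many rules: raise the threshold
        elif len(kept) < min_rules:
            hi = threshold  # too few rules: lower the threshold
        else:
            break
    return threshold, kept
```

A caller would compute `pair_rule_supports` once per target and then request, say, between 2 and 2 rules; the threshold is found without the user ever naming a support value.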
A Data Mining Algorithm for Generalized Web Prefetching
- IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
, 2003
Cited by 76 (16 self)
Predictive Web prefetching refers to the mechanism of deducing the forthcoming page accesses of a client based on its past accesses. In this paper, we present a new context for the interpretation of Web prefetching algorithms as Markov predictors. We identify the factors that affect the performance of Web prefetching algorithms. We propose a new algorithm called WMo, which is based on data mining and is proven to be a generalization of existing ones. It was designed to address their specific limitations and its characteristics include all the above factors. It compares favorably with previously proposed algorithms. Further, the algorithm efficiently addresses the increased number of candidates. We present a detailed performance evaluation of WMo with synthetic and real data. The experimental results show that WMo can provide significant improvements over previously proposed Web prefetching algorithms.
Web Prefetching Using Partial Match Prediction
, 1998
Cited by 67 (1 self)
Web traffic is now one of the major components of Internet traffic. One of the main directions of research in this area is to reduce the latency users experience when navigating through Web sites. Caching is already being used for this purpose, yet the characteristics of the Web cause caching to perform poorly in this medium. Therefore, prefetching is now being studied in the Web context. This study investigates the use of partial match prediction, a technique taken from the data compression literature, for prefetching in the Web. The main concern when employing prefetching is to predict as many future requests as possible while keeping false predictions to a minimum. The simulation results suggest that a high fraction of the predictions are accurate (e.g., the method predicts 18%-23% of the requests with 90%-80% accuracy), so that additional network traffic is kept low. Furthermore, the simulations show that prefetching can substantially increase cache hit rates.
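A minimal sketch of partial match prediction applied to page requests: count which page follows each context of the last one or two pages, then predict from the longest context seen in training, keeping only pages above a confidence threshold. The function names, the `max_order` and `min_confidence` parameters, and the rule of stopping at the longest known context are assumptions for illustration; the study's own predictor may differ.

```python
from collections import Counter, defaultdict


def train_ppm(sessions, max_order=2):
    """Count followers of every context of length 1..max_order."""
    model = defaultdict(Counter)
    for session in sessions:
        for i in range(1, len(session)):
            for order in range(1, max_order + 1):
                if i - order < 0:
                    break
                context = tuple(session[i - order:i])
                model[context][session[i]] += 1
    return model


def predict(model, history, max_order=2, min_confidence=0.5):
    """Predict next pages from the longest context that appears in the model."""
    for order in range(min(max_order, len(history)), 0, -1):
        context = tuple(history[-order:])
        followers = model.get(context)
        if not followers:
            continue  # unseen context: fall back to a shorter one
        total = sum(followers.values())
        return sorted(
            page for page, count in followers.items()
            if count / total >= min_confidence
        )
    return []
```

Raising `min_confidence` trades coverage for accuracy, which mirrors the tension the abstract describes between predicting many requests and limiting false predictions.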
Clustering of Web Users Based on Access Patterns
- In Proceedings of the 1999 KDD Workshop on Web Mining
, 1999
Cited by 57 (0 self)
The clustering of the Web users based on their access patterns is studied. Access patterns of the Web users are extracted from Web servers' log files, and then organized into sessions which represent episodes of interaction between Web users and the Web server. Using attribute-oriented induction, the sessions are then generalized according to the page hierarchy which organizes pages according to their generalities. The generalized sessions are finally clustered using a hierarchical clustering method. Our experiments on a large real data set show that the method is efficient and practical for Web mining applications.
1 Introduction
With the rapid development of the World Wide Web (WWW), or the Web, many organizations now put their information on the Web and provide Web-based services such as on-line shopping, user feedback, technical support, etc. Web mining, the knowledge discovery in the Web, has become an important research area [2]. Research in Web Mining can be broadly classified...
Low-complexity fuzzy relational clustering algorithms for web mining
- IEEE TRANSACTIONS ON FUZZY SYSTEMS
, 2001
Cited by 50 (2 self)
This paper presents new algorithms, fuzzy c-medoids (FCMdd) and robust fuzzy c-medoids (RFCMdd), for fuzzy clustering of relational data. The objective functions are based on selecting c representative objects (medoids) from the data set in such a way that the total fuzzy dissimilarity within each cluster is minimized. A comparison of FCMdd with the well-known relational fuzzy c-means algorithm (RFCM) shows that FCMdd is more efficient. We present several applications of these algorithms to Web mining, including Web document clustering, snippet clustering, and Web access log analysis.
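The medoid-selection idea can be sketched on a plain dissimilarity matrix. The abstract gives no formulas, so this sketch assumes the usual fuzzy c-means style membership update (inverse dissimilarity raised to 1/(m-1), normalized per object) and picks, for each cluster, the object that minimizes the membership-weighted dissimilarity. Function names, the fuzzifier default `m=2.0`, and the `eps` guard for zero dissimilarity are assumptions; the robust variant (RFCMdd) is not covered.

```python
def memberships(dissim, medoids, m=2.0, eps=1e-12):
    """Fuzzy membership of every object in every cluster (rows sum to 1)."""
    exponent = 1.0 / (m - 1.0)
    rows = []
    for j in range(len(dissim)):
        # eps avoids division by zero when object j is itself a medoid.
        weights = [(1.0 / max(dissim[j][v], eps)) ** exponent for v in medoids]
        total = sum(weights)
        rows.append([w / total for w in weights])
    return rows


def fuzzy_c_medoids(dissim, medoids, m=2.0, max_iter=50):
    """Alternate membership and medoid updates until the medoids stop changing."""
    n = len(dissim)
    medoids = list(medoids)
    for _ in range(max_iter):
        u = memberships(dissim, medoids, m)
        new_medoids = []
        for i in range(len(medoids)):
            # Cost of choosing object k as the medoid of cluster i.
            costs = [
                sum((u[j][i] ** m) * dissim[k][j] for j in range(n))
                for k in range(n)
            ]
            new_medoids.append(min(range(n), key=costs.__getitem__))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, memberships(dissim, medoids, m)
```

Because only pairwise dissimilarities are needed, the same routine applies to relational data such as sessions or documents for which no vector representation is available.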
Sliding-Window Filtering: An Efficient Algorithm for Incremental Mining
- IN PROC. OF THE INT’L CONF. ON INFO. AND KNOWLEDGE MANAGEMENT
, 2001
Cited by 49 (8 self)
We explore in this paper an effective sliding-window filtering (abbreviated as SWF) algorithm for incremental mining of association rules. In essence, by partitioning a transaction database into several partitions, algorithm SWF employs a filtering threshold in each partition to deal with the candidate itemset generation. Under SWF, the cumulative information of mining previous partitions is selectively carried over toward the generation of candidate itemsets for the subsequent partitions. Algorithm SWF not only significantly reduces I/O and CPU cost by the concepts of cumulative filtering and scan reduction techniques but also effectively controls memory utilization by the technique of sliding-window partition. Algorithm SWF is particularly powerful for efficient incremental mining for an ongoing time-variant transaction database. By utilizing proper scan reduction techniques, only one scan of the incremented dataset is needed by algorithm SWF. The I/O cost of SWF is orders of magnitude smaller than that required by prior methods, thus resolving the performance bottleneck. Experimental studies are performed to evaluate the performance of algorithm SWF. It is noted that the improvement achieved by algorithm SWF is even more prominent as the incremented portion of the dataset increases and also as the size of the database increases.
Multi-Dimensional Sequential Pattern Mining
, 2001
Cited by 49 (7 self)
With our recently developed sequential pattern mining algorithms, such as PrefixSpan, it is possible to mine sequential user-access patterns from Web logs. While this information is very useful when redesigning web sites for easier perusal and fewer network traffic bottlenecks, it would be so much richer if we could incorporate multiple dimensions of information. For example, if you knew the referral site that users frequently come from, you might be able to determine what information on your own web site is of interest to them, and enhance or separate this information as needed. Similarly, if you knew on what weekday and at what time certain access patterns frequently occur, you could ensure updated information is ready and available for these users. This thesis proposes and explores two different techniques, HYBRID and PSFP, to incorporate additional dimensions of information into the process of mining sequential patterns. It investigates the strengths and limitations of each approach. The HYBRID method first finds frequent dimension value combinations, and then mines sequential patterns from the set of sequences that satisfy each of these combinations. PSFP approaches the problem from the opposite direction. It mines the sequential patterns for the whole dataset only once (using PrefixSpan), and mines the corresponding frequent dimension patterns alongside each sequential pattern (using the existing association algorithm FP-growth). Experiments show that HYBRID is most effective at low support in datasets that are sparse with respect to dimension value combinations but dense with respect to the sequential patterns present. PSFP is the better alternative in every other case, including datasets that are dense with respect to both dimension value combinations and sequential ...
On mining Web Access Logs
- In ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery
, 2000
Cited by 48 (0 self)
The proliferation of information on the world wide web has made the personalization of this information space a necessity. One possible approach to web personalization is to mine typical user profiles from the vast amount of historical data stored in access logs. In the absence of any a priori knowledge, unsupervised classification or clustering methods seem to be ideally suited to analyze the semi-structured log data of user accesses. In this paper, we define the notion of a “user session”, as well as a dissimilarity measure between two web sessions that captures the organization of a web site. To extract a user access profile, we cluster the user sessions based on the pair-wise dissimilarities using a robust fuzzy clustering algorithm that we have developed. We report the results of experiments with our algorithm and show that this leads to extraction of interesting user profiles. We also show that it outperforms association rule based approaches for this task.