WebSIFT: The Web Site Information Filter System
In Proceedings of the Web Usage Analysis and User Profiling Workshop, 1999
Cited by 36 (1 self)
Web Usage Mining is the application of data mining techniques to large Web data repositories in order to extract usage patterns. As with many data mining application domains, the identification of patterns that are considered interesting is a problem that must be solved in addition to simply generating them. A necessary step in identifying interesting results is quantifying what is considered uninteresting in order to form a basis for comparison. Several research efforts have relied on manually generated sets of uninteresting rules. However, manual generation of a comprehensive set of evidence about beliefs for a particular domain is impractical in many cases. Generally, domain knowledge can be used to automatically create evidence for or against a set of beliefs. For Web Usage Mining, there are three types of domain information available: usage, content, and structure. The Web Site Information Filter (WebSIFT) system uses the content and structure information from a Web site...
Mining Web Logs to Improve Website Organization
In Proceedings of the 10th International World Wide Web Conference, Hong Kong, 2001
Cited by 33 (0 self)
Many websites have a hierarchical organization of content. This organization may be quite different from the organization expected by visitors to the website. In particular, it is often unclear where a specific document is located. In this paper, we propose an algorithm to automatically find pages in a website whose location is different from where visitors expect to find them. The key insight is that visitors will backtrack if they do not find the information where they expect it: the point from where they backtrack is the expected location for the page. We present an algorithm for discovering such expected locations that can handle page caching by the browser. Expected locations with a significant number of hits are then presented to the website administrator. We also present algorithms for selecting expected locations (for adding navigation links) to optimize the benefit to the website or the visitor. We ran our algorithm on the Wharton business school website and found that even on this small website, there were many pages with expected locations different from their actual location.
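The backtracking idea in this abstract can be illustrated with a small sketch. It is not the paper's algorithm: it assumes every page view, including back navigation, appears in the log (so it ignores the browser caching the paper says it handles), it treats the last page of a session as the target, and the function names `backtrack_points` and `expected_location_counts` are hypothetical.

```python
from collections import Counter


def backtrack_points(session):
    """Return pages from which the visitor returned to the page they came from.

    Assumption: a visit pattern A -> B -> A means the visitor backtracked
    from B, so B is reported as a backtrack point.
    """
    points = []
    for i in range(1, len(session) - 1):
        if session[i - 1] == session[i + 1]:
            points.append(session[i])
    return points


def expected_location_counts(sessions):
    """Count (target, expected_location) pairs over all sessions.

    Assumption: the target of a session is its last page.
    """
    counts = Counter()
    for session in sessions:
        if not session:
            continue
        target = session[-1]
        # Count each backtrack point at most once per session.
        for page in set(backtrack_points(session)):
            if page != target:
                counts[(target, page)] += 1
    return counts


sessions = [
    ["home", "A", "home", "B", "target"],
    ["home", "A", "home", "B", "target"],
    ["home", "B", "target"],
]
counts = expected_location_counts(sessions)
# Pairs seen often enough could be reported to the site administrator.
frequent = {pair: n for pair, n in counts.items() if n >= 2}
```

In this toy log, page `A` is the only backtrack point, so `("target", "A")` is the only candidate pair.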
Extracting Web User Profiles Using Relational Competitive Fuzzy Clustering
2000
Cited by 30 (13 self)
The proliferation of information on the web... In this paper, we define the notion of a "user session" as being a temporally compact sequence of Web accesses by a user. We also define a new distance measure between two Web sessions that captures the organization of a Web site. The Competitive Agglomeration clustering algorithm, which can automatically cluster data into the optimal number of components, is extended so that it can work on relational data. The resulting Competitive Agglomeration for Relational Data (CARD) algorithm can deal with complex, non-Euclidean distance/similarity measures. This algorithm was used to analyze Web server access logs successfully and obtain typical session profiles of users.
On mining Web Access Logs
In ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, 2000
Cited by 27 (0 self)
The proliferation of information on the world wide web has made the personalization of this information space a necessity. One possible approach to web personalization is to mine typical user profiles from the vast amount of historical data stored in access logs. In the absence of any a priori knowledge, unsupervised classification or clustering methods seem to be ideally suited to analyze the semi-structured log data of user accesses. In this paper, we define the notion of a “user session”, as well as a dissimilarity measure between two web sessions that captures the organization of a web site. To extract a user access profile, we cluster the user sessions based on the pair-wise dissimilarities using a robust fuzzy clustering algorithm that we have developed. We report the results of experiments with our algorithm and show that this leads to extraction of interesting user profiles. We also show that it outperforms association-rule-based approaches for this task.
A Web Page Prediction Model Based on Click-Stream Tree Representation Of User
In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2003
Cited by 24 (1 self)
Predicting the next request of a user as she visits Web pages has gained importance as Web-based activity increases. Markov models and their variations, or models based on sequence mining have been found well suited for this problem. However, higher order Markov models are extremely complicated due to their large number of states whereas lower order Markov models do not capture the entire behavior of a user in a session. The models that are based on sequential pattern mining only consider the frequent sequences in the data set, making it difficult to predict the next request following a page that is not in the sequential pattern. Furthermore, it is hard to find models for mining two different kinds of information of a user session. We propose a new model that considers both the order information of pages in a session and the time spent on them. We cluster user sessions based on their pair-wise similarity and represent the resulting clusters by a click-stream tree. The new user session is then assigned to a cluster based on a similarity measure. The click-stream tree of that cluster is used to generate the recommendation set. The model can be used as part of a cache prefetching system as well as a recommendation model.
Mining Association Rules in Hypertext Databases
In Proc. of the Fourth Int. Conf. on Knowledge Discovery and Data Mining, 1999
Cited by 23 (5 self)
Association rule techniques traditionally aim to mine information from databases consisting of a set of flat transaction records. In this work we propose a generalisation of the notion of association rule in the context of flat transactions to that of a composite association rule in the context of a structured directed graph, such as the world-wide-web. The techniques proposed aim at finding patterns in the user behaviour when traversing such a hypertext system. We redefine the concepts of confidence and support for composite association rules, which are trails of links representing a user's navigation session; the actual data may be obtained from log files. Two algorithms to mine composite association rules are exhibited: one is a modification of the directed graph Depth-First-Search algorithm, and the other uses an incremental approach to build the set of composite rules of size n + 1 from the set of composite rules of size n. Extensive experiments with random data were conducted in o...
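The abstract does not give the redefined confidence and support measures, so the following is only one plausible reading, offered as an illustration rather than the authors' definition: each link's confidence is estimated as the fraction of traversals leaving its source page that follow that link, and a trail's confidence is the product of its link confidences. The names `link_confidences` and `trail_confidence` are hypothetical.

```python
from collections import Counter


def link_confidences(sessions):
    """Estimate per-link confidence from navigation sessions.

    Assumption: confidence(a -> b) = traversals of (a, b) divided by
    all traversals that leave page a.
    """
    link_counts = Counter()
    out_counts = Counter()
    for session in sessions:
        for a, b in zip(session, session[1:]):
            link_counts[(a, b)] += 1
            out_counts[a] += 1
    return {link: n / out_counts[link[0]] for link, n in link_counts.items()}


def trail_confidence(trail, conf):
    """Confidence of a trail, taken here as the product of its link confidences."""
    result = 1.0
    for link in zip(trail, trail[1:]):
        # A link never observed in the log contributes zero confidence.
        result *= conf.get(link, 0.0)
    return result


sessions = [["A", "B", "C"], ["A", "B", "D"], ["A", "C"]]
conf = link_confidences(sessions)
abc = trail_confidence(["A", "B", "C"], conf)  # (2/3) * (1/2)
```

Under this reading, a depth-first search over the link graph could extend a trail only while its running product stays above a chosen threshold, which matches the abstract's mention of a modified Depth-First-Search in spirit only.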
Making Recommender Systems Work for Organizations
In Proceedings of PAAM'99, 1999
Cited by 22 (6 self)
For the past two years, we have been investigating the use of recommender systems as a technology in support of knowledge sharing in organizations. Recommender systems are a way of extending the natural process of recommendation by word-of-mouth to networked groups of people. They are able to provide personalized recommendations that take into account similarities between people based on their user profiles. The community around recommender systems that has emerged in the past five or so years has focused on methods for constructing and learning user profiles, the exploration and testing of various recommendation algorithms, and the design of user interfaces, with applications primarily in the domains of electronic commerce and leisure/entertainment. Thus far, we have...
Low-complexity fuzzy relational clustering algorithms for web mining
IEEE Transactions on Fuzzy Systems, 2001
Cited by 21 (2 self)
This paper presents new algorithms, fuzzy c-medoids (FCMdd) and robust fuzzy c-medoids (RFCMdd), for fuzzy clustering of relational data. The objective functions are based on selecting c representative objects (medoids) from the data set in such a way that the total fuzzy dissimilarity within each cluster is minimized. A comparison of FCMdd with the well-known relational fuzzy c-means algorithm (RFCM) shows that FCMdd is more efficient. We present several applications of these algorithms to Web mining, including Web document clustering, snippet clustering, and Web access log analysis.
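A minimal sketch of a fuzzy c-medoids style iteration follows. The abstract gives no update formulas, so the membership rule here (inverse-dissimilarity weighting with fuzzifier `m`, as in fuzzy c-means) is an assumption, as are the function names and the stopping rule. The input `D` is a square matrix of pairwise dissimilarities, which is what makes the method applicable to relational data.

```python
def memberships(D, medoids, m=2.0):
    """Fuzzy membership of every object in each medoid's cluster.

    Assumption: membership is proportional to (1 / dissimilarity)^(1/(m-1)).
    """
    U = []
    for j in range(len(D)):
        dists = [D[j][v] for v in medoids]
        if 0.0 in dists:
            # The object coincides with a medoid: full membership there.
            hit = dists.index(0.0)
            row = [1.0 if i == hit else 0.0 for i in range(len(medoids))]
        else:
            weights = [(1.0 / d) ** (1.0 / (m - 1.0)) for d in dists]
            total = sum(weights)
            row = [w / total for w in weights]
        U.append(row)
    return U


def fuzzy_c_medoids(D, medoids, m=2.0, max_iter=50):
    """Alternate membership and medoid updates until the medoids stop changing."""
    medoids = list(medoids)
    n = len(D)
    for _ in range(max_iter):
        U = memberships(D, medoids, m)
        new_medoids = []
        for i in range(len(medoids)):
            def cost(k, i=i):
                # Total fuzzy dissimilarity of cluster i if object k were its medoid.
                return sum((U[j][i] ** m) * D[k][j] for j in range(n))
            new_medoids.append(min(range(n), key=cost))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, memberships(D, medoids, m)


# Toy relational data: pairwise distances between six points on a line.
values = [0, 1, 2, 10, 11, 12]
D = [[float(abs(a - b)) for b in values] for a in values]
medoids, U = fuzzy_c_medoids(D, medoids=[0, 3])
```

This sketch does not prevent two clusters from selecting the same medoid and does not include the robust variant (RFCMdd) mentioned in the abstract.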
Robust Fuzzy Clustering Methods to Support Web Mining
Proc. Workshop in Data Mining and Knowledge Discovery, SIGMOD, 1998
Cited by 19 (5 self)
this paper is to describe some of the challenges of web mining, point out why existing techniques of data mining may be inadequate, outline a possible direction for research, and present our preliminary work.
Separating the Swarm: Categorization Methods for User Sessions on the Web
In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems: Changing Our World, Changing Ourselves, 2002
Cited by 18 (0 self)
Understanding user behaviors on Web sites enables site owners to make sites more usable, ultimately helping users to achieve their goals more quickly. Accordingly, researchers have devised methods for categorizing user sessions in hopes of revealing user interests. These techniques build user profiles by combining users' navigation paths with other data features, such as page viewing time, hyperlink structure, and page content. Previously, we have presented complex techniques of combining many of these data features to cluster user profiles. In this paper, we introduce a user study and a systematic evaluation of these different data features and their associated weighting schemes. We present the results of our study, including accuracy measures for a number of clustering approaches, and offer recommendations for Web analysts. While further investigation over more sites is needed to definitively settle on a robust scheme, we have characterized this analytic space.

