Results 1 - 8 of 8
Model-based clustering and visualization of navigation patterns on a web site
- Data Mining and Knowledge Discovery
, 2003
Cited by 36 (0 self)
We present a new methodology for exploring and analyzing navigation patterns on a web site. The patterns that can be analyzed consist of sequences of URL categories traversed by users. In our approach, we first partition site users into clusters such that users with similar navigation paths through the site are placed into the same cluster. Then, for each cluster, we display these paths for users within that cluster. The clustering approach we employ is model-based (as opposed to distance-based) and partitions users according to the order in which they request web pages. In particular, we cluster users by learning a mixture of first-order Markov models using the Expectation-Maximization algorithm. The runtime of our algorithm scales linearly with the number of clusters and with the size of the data, and our implementation easily handles hundreds of thousands of user sessions in memory. In the paper, we describe the details of our method and a visualization tool based on it called WebCANVAS. We illustrate the use of our approach on user-traffic data from msnbc.com.
Keywords: Model-based clustering, sequence clustering, data visualization, Internet, web
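The clustering step described in this abstract can be illustrated with a minimal sketch, assuming sessions are non-empty lists of integer category indices. The function name `fit_markov_mixture`, the smoothing constant `alpha`, and the random soft initialization are illustrative assumptions, not details taken from the paper or from WebCANVAS.

```python
import math
import random


def fit_markov_mixture(sessions, n_states, n_clusters, n_iter=50, seed=0, alpha=0.01):
    """EM for a mixture of first-order Markov chains (illustrative sketch).

    sessions: list of non-empty lists of state indices in range(n_states).
    Returns (weights, init, trans, resp), where resp[i][k] is the posterior
    probability that session i belongs to cluster k.
    """
    rng = random.Random(seed)
    # Start from random soft assignments (an assumed initialization).
    resp = []
    for _ in sessions:
        row = [rng.random() + 1e-3 for _ in range(n_clusters)]
        total = sum(row)
        resp.append([r / total for r in row])

    for _ in range(n_iter):
        # M-step: responsibility-weighted counts with additive smoothing alpha.
        weights = [alpha] * n_clusters
        init = [[alpha] * n_states for _ in range(n_clusters)]
        trans = [[[alpha] * n_states for _ in range(n_states)]
                 for _ in range(n_clusters)]
        for seq, r in zip(sessions, resp):
            for k in range(n_clusters):
                weights[k] += r[k]
                init[k][seq[0]] += r[k]
                for a, b in zip(seq, seq[1:]):
                    trans[k][a][b] += r[k]
        total_w = sum(weights)
        weights = [w / total_w for w in weights]
        for k in range(n_clusters):
            s = sum(init[k])
            init[k] = [v / s for v in init[k]]
            for a in range(n_states):
                s = sum(trans[k][a])
                trans[k][a] = [v / s for v in trans[k][a]]

        # E-step: posterior cluster probabilities, computed in log space.
        for i, seq in enumerate(sessions):
            logp = []
            for k in range(n_clusters):
                lp = math.log(weights[k]) + math.log(init[k][seq[0]])
                for a, b in zip(seq, seq[1:]):
                    lp += math.log(trans[k][a][b])
                logp.append(lp)
            m = max(logp)
            probs = [math.exp(lp - m) for lp in logp]
            total = sum(probs)
            resp[i] = [p / total for p in probs]

    return weights, init, trans, resp
```

Each iteration touches every transition of every session once per cluster, which is consistent with the linear scaling in data size and number of clusters that the abstract reports. A hard clustering can be read off by taking the most probable cluster in each row of `resp`.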
Web path recommendations based on page ranking and Markov models
- In WIDM ’05: Proceedings of the 7th annual ACM international workshop on Web information and data management, 2–9
, 2005
Cited by 6 (0 self)
Markov models have been widely used for modelling users' navigational behaviour in the Web graph, using the transitional probabilities between web pages, as recorded in the web logs. The recorded users' navigation is used to extract popular web paths and predict current users' next steps. Such purely usage-based probabilistic models, however, present certain shortcomings. Since the prediction of users' navigational behaviour is based solely on the usage data, structural properties of the Web graph are ignored. Thus paths that are important in terms of PageRank authority score may be underrated. In this paper we present a hybrid probabilistic predictive model extending the properties of Markov models by incorporating link analysis methods. More specifically, we propose the use of a PageRank-style algorithm for assigning prior probabilities to the web pages based on their importance in the web site's graph. We prove, through experimentation, that this approach results in more objective and representative predictions than the ones produced from the pure usage-based approaches.
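The idea of using link-based priors in a Markov model can be sketched as follows. The abstract does not specify the paper's exact algorithm, so this sketch substitutes standard PageRank with power iteration and plugs the resulting scores in as the probability of a path's first page. The names `pagerank` and `path_probability`, the damping factor 0.85, and the uniform handling of dangling pages are assumptions made for illustration.

```python
def pagerank(links, damping=0.85, n_iter=100):
    """Standard PageRank by power iteration (assumed stand-in for the
    paper's PageRank-style algorithm).

    links: dict mapping each page to the list of pages it links to.
    Returns a dict of scores that sum to 1.
    """
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(n_iter):
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            outs = links.get(p, [])
            if outs:
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
            else:
                # Dangling page: spread its rank uniformly over all pages.
                for q in pages:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank


def path_probability(path, prior, transitions):
    """Probability of a path under a first-order Markov model whose
    initial-page probability is the link-based prior.

    transitions: dict of dicts with usage-based transition probabilities.
    """
    prob = prior[path[0]]
    for a, b in zip(path, path[1:]):
        prob *= transitions.get(a, {}).get(b, 0.0)
    return prob
```

In a purely usage-based model the first factor would be the observed frequency of the starting page in the logs; replacing it with a link-based score is one simple way to let the site's structure influence which paths are ranked highly.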
Internet Data Analysis for the Undergraduate Statistics Curriculum
, 2003
Cited by 1 (0 self)
KEY WORDS: Inverse Gaussian; maximum likelihood; web server log data; internet survey data; internet traffic
Statistics textbooks for undergraduates have not caught up with the enormous amount of analysis of Internet data that is taking place these days. Case studies that use Web server log data, Internet survey data or Internet network traffic data are rare in undergraduate Statistics education. This paper summarizes the results of research in three areas of Internet data analysis: users' web browsing behavior, user demographics, and network performance. We present some of the main questions analyzed in the literature, some unsolved problems, and some typical data analysis methods used. We illustrate the questions and the methods with large data sets. The data sets were obtained from the publicly available pool of data. Those data sets had to be processed and transformed to make them available for classroom exercises. The processed data sets, as well as more material for classes, are available at a web site whose address can be obtained from the main author.
Data URL
Computer Scientists are being called more and more frequently to provide computer log data that can be used to find out how users interact with the Internet (see, for example, Sen and Hansen (2003), Cadez et al. (forthcoming), Huberman et al. (1998)). In addition to that, the number of Internet user surveys is growing at an amazing speed (see, for example, Wellman and Haythornthwaite (2002)). The following problem uses a data set published in the UCI KDD Archive (Kederman, 2003) to obtain data on the number of different pages visited by users who entered the msnbc.com page on September 28, 1999 and other information. The variables we analyze in this activity are usually investigated in the references provided above.
Data Description
The data comes from Internet Information Server (IIS) logs for msnbc.com and news-related portions of msn.com for the entire day of September 28, 1999 (Pacific Standard Time). Each sequence in the dataset corresponds to page views of a user during that twenty-four-hour period. Each event in the sequence corresponds to a user's request for a page. Requests are not recorded at the finest level of detail, that is, at the level of URL, but rather, they are recorded at the level of page category (as determined by a site administra-
SearchGen: a Synthetic Workload Generator for Scientific Literature Digital Libraries and Search Engines
Due to the popularity of web applications and their heavy usage, it is important to obtain a good understanding of their workloads in order to improve the performance of search services. Existing works have typically focused on generic web workloads without putting emphasis on specific domains. In this paper, we analyze the usage logs of CiteSeer, a scientific literature digital library and search engine, to characterize workloads for both robots and users. Essential ingredients that contribute to workloads are proposed. Among them, we find that the access intervals show high variance, and thus cannot be predicted well with time-series models. On the other hand, client visiting paths and semantics can be well captured with probabilistic models and Zipf's law. Based on the findings, we propose SearchGen, a synthetic workload generator to output traces for scientific literature digital libraries and search engines. A comparison between synthetic workloads and actual logged traces suggests that the synthetic workload fits well.
A Comparison of Scoring Metrics for Predicting the Next Navigation Step
The problem of predicting the next request during a user's navigation session has been extensively studied. In this context, higher-order Markov models have been widely used to model navigation sessions and for predicting the next navigation step, while prediction accuracy has been mainly evaluated with the hit and miss score. We claim that this score, although useful, is not sufficient for evaluating next link prediction models with the aim of finding a sufficient order of the model, the size of a recommendation set and assessing the impact of unexpected events on the prediction accuracy. Herein, we make use of a variable length Markov model to compare the usefulness of three alternatives to the hit and miss score: the Mean Absolute Error, the Ignorance Score and the Brier score. We present an extensive evaluation of the methods on real data sets and a comprehensive comparison of the scoring methods.
Key words: Web usage mining; variable length Markov model; sequential prediction; scoring metrics
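To make the difference between these scores concrete, here is a minimal sketch using the common textbook definitions of the hit and miss score, the ignorance score (negative base-2 log of the probability assigned to the observed outcome) and the multi-class Brier score. These definitions are assumptions and may differ in detail from the ones used in the paper. The Mean Absolute Error is omitted because the abstract does not say how it is defined for next-step prediction. The function names and the `floor` parameter are illustrative.

```python
import math


def hit_and_miss(forecast, observed):
    """1 if the most probable predicted page is the one requested, else 0."""
    best = max(forecast, key=forecast.get)
    return 1 if best == observed else 0


def ignorance_score(forecast, observed, floor=1e-12):
    """Negative log2 of the probability given to the observed page.

    floor (an assumed safeguard) avoids log(0) for unexpected pages.
    """
    return -math.log2(max(forecast.get(observed, 0.0), floor))


def brier_score(forecast, observed):
    """Sum of squared differences between the forecast and the outcome."""
    outcomes = set(forecast) | {observed}
    return sum(
        (forecast.get(o, 0.0) - (1.0 if o == observed else 0.0)) ** 2
        for o in outcomes
    )
```

For a forecast `{"a": 0.5, "b": 0.25, "c": 0.25}`, the hit and miss score only records whether "a" was requested, whereas the ignorance and Brier scores also reflect how much probability the model placed on the page that was actually requested.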
Are Web Users Really Markovian?
, 2012
User modeling on the Web has rested on the fundamental assumption of Markovian behavior — a user’s next action depends only on her current state, and not the history leading up to the current state. This forms the underpinning of PageRank web ranking, as well as a number of techniques for targeting advertising to users. In this work we examine the validity of this assumption, using data from a number of Web settings. Our main result invokes statistical order estimation tests for Markov chains to establish that Web users are not, in fact, Markovian. We study the extent to which the Markovian assumption is invalid, and derive a number of avenues for further research.
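The abstract does not say which order estimation tests are used. As a rough illustration of what estimating the order of a Markov chain involves, the sketch below swaps in a different, simpler technique: it selects the order with the lowest Bayesian Information Criterion (BIC). The function names and the choice to score every order on the same positions (those at index `max_order` and later) are assumptions.

```python
import math
from collections import Counter, defaultdict


def markov_bic(sequences, order, n_states, start):
    """BIC of a maximum-likelihood Markov chain of the given order.

    Only positions i >= start are scored, so different orders are compared
    on the same observations. Sequences must be longer than start.
    """
    counts = defaultdict(Counter)
    n_obs = 0
    for seq in sequences:
        for i in range(start, len(seq)):
            context = tuple(seq[i - order:i])  # empty tuple when order == 0
            counts[context][seq[i]] += 1
            n_obs += 1
    loglik = 0.0
    for ctx_counts in counts.values():
        total = sum(ctx_counts.values())
        for c in ctx_counts.values():
            loglik += c * math.log(c / total)
    n_params = (n_states ** order) * (n_states - 1)
    return -2.0 * loglik + n_params * math.log(n_obs)


def estimate_order(sequences, n_states, max_order):
    """Return the order in 0..max_order with the smallest BIC."""
    scores = {
        k: markov_bic(sequences, k, n_states, max_order)
        for k in range(max_order + 1)
    }
    return min(scores, key=scores.get)
```

If users were first-order Markovian, an estimator of this kind would be expected to settle on order 1; the paper's claim is that its tests point to longer dependence on browsing history.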
Chapter 6 Web Usage Mining
Abstract. In recent years, e-businesses have been profiting from recent advances in the analysis of web customer behaviour. For decades, experts in the web intelligence community have debated ways of presenting the content or structure of a web site in order to captivate the attention of the web user. A solution to this could help boost sales in an e-commerce site. Web Usage Mining (WUM) is the extraction of the web user browsing behaviour using data mining techniques on web data. Accordingly, several models of data analysis have been used to characterize the Web User Browsing Behaviour. Nevertheless, outstanding techniques have recently been developed in order to improve the conventional success rates for behavioural pattern extraction. In this chapter different approaches for WUM are presented, considering their main insights, results, and applications to web behaviour systems. The Internet has become a regular channel for communication, most of all for business transactions. Commerce over the Internet has grown to higher levels in recent years. For instance, e-shopping sales have been increasing drastically in the previous

