## Web Page Prediction Based on Conditional Random Fields

Citations: 2 - 1 self

### BibTeX

@MISC{Guo_webpage,

author = {Yong Zhen Guo and Kotagiri Ramamohanarao and Laurence A. F. Park},

title = {Web Page Prediction Based on Conditional Random Fields},

year = {}

}

### Abstract

Web page prefetching is used to reduce the access latency of the Internet. However, if most prefetched Web pages are not visited by the users in their subsequent accesses, the limited network bandwidth and server resources are used inefficiently, which can even worsen the access delay problem. Therefore, enhancing the Web page prediction accuracy is a main problem of Web page prefetching. Conditional Random Fields (CRFs), which are popular sequential learning models, have already been successfully used for many Natural Language Processing (NLP) tasks such as POS tagging, named entity recognition (NER) and segmentation. In this paper, we propose the use of CRFs in the field of Web page prediction. We treat the access sessions of previous Web users as observation sequences and label each element of these observation sequences to obtain the corresponding label sequences. Based on these observation and label sequences, we use CRFs to train a prediction model and predict the probable subsequent Web pages for the current users. Our experimental results show that CRFs achieve higher Web page prediction accuracy than other popular techniques such as plain Markov chains and Hidden Markov Models (HMMs).
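The labeling step described in the abstract can be illustrated with a minimal sketch. It assumes the scheme quoted in the citation contexts further down this page, where each page view is labeled with the page view that follows it. The function name `label_session` and the toy page names are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def label_session(session: Sequence[str]) -> Tuple[List[str], List[str]]:
    """Turn one access session into (observations, labels).

    Assumption: the label of each page view is the next page view in
    the same session, so the last page view has no label and is dropped
    from the observation sequence.
    """
    observations = list(session[:-1])
    labels = list(session[1:])
    return observations, labels


# Hypothetical sessions; sessions with fewer than two page views
# carry no (observation, label) pair and are skipped.
sessions = [["frontpage", "news", "sports"], ["frontpage", "weather"], ["misc"]]
training_pairs = [label_session(s) for s in sessions if len(s) >= 2]
```

Each resulting pair of equal-length sequences is the kind of input a linear-chain sequence labeler is trained on.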

### Citations

2310 | Conditional random fields: probabilistic models for segmenting and labeling sequence data
- Lafferty, McCallum, et al.
- 2001
Citation Context ...blem. Therefore, the success of a prefetching method relies mainly on the prediction accuracy. In this paper, we propose a novel Web page prediction approach based on Conditional Random Fields (CRFs) [1] to improve the prediction accuracy. CRFs are a powerful probabilistic framework for labeling and segmenting sequential data. Owing to their conditional nature, CRFs have the ability to model the depend...

1164 | Error bounds for convolutional codes and an asymptotically optimum decoding algorithm
- Viterbi
- 1967
Citation Context ...j(yt, X, t). The parameters $\theta = (\lambda_i, \mu_j)$ can be estimated from training data using many different approaches such as GIS [20], IIS [24] and L-BFGS [22, 23]. After the parameters are trained, the Viterbi [21] algorithm can be used to label the testing data and perform the prediction. 4 Experiments In this section we present a set of experiments that we performed to evaluate the performance of using CRFs i...
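The decoding step mentioned in this context, using the Viterbi algorithm to label test data, can be sketched as follows. This is a generic log-domain Viterbi decoder for a first-order chain, not the authors' implementation. The dictionaries `trans_score` and `emit_score` are hypothetical stand-ins for the summed, weighted transition and state feature functions of a trained model.

```python
from typing import Dict, List, Sequence, Tuple


def viterbi(
    obs: Sequence[str],
    labels: Sequence[str],
    trans_score: Dict[Tuple[str, str], float],
    emit_score: Dict[Tuple[str, str], float],
) -> List[str]:
    """Return the highest-scoring label sequence for `obs`.

    Scores are log-domain; missing dictionary entries count as 0.0.
    """
    if not obs:
        return []
    # best[t][y]: score of the best partial path ending in label y at step t
    best = [{y: emit_score.get((y, obs[0]), 0.0) for y in labels}]
    back: List[Dict[str, str]] = []
    for x in obs[1:]:
        scores, pointers = {}, {}
        for y in labels:
            prev = max(labels, key=lambda p: best[-1][p] + trans_score.get((p, y), 0.0))
            scores[y] = (
                best[-1][prev]
                + trans_score.get((prev, y), 0.0)
                + emit_score.get((y, x), 0.0)
            )
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final label.
    path = [max(labels, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]
```

For page prediction under the labeling scheme described on this page, the last decoded label would be read as the predicted next page.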

488 | On the limited memory BFGS method for large scale optimization
- Liu, Nocedal
- 1989
Citation Context ...$\sum_{Y} \exp\left[\sum_{t=1}^{T}\left(\sum_{i} \lambda_i f_i(y_{t-1}, y_t, X, t) + \sum_{j} \mu_j s_j(y_t, X, t)\right)\right]$. The parameters $\theta = (\lambda_i, \mu_j)$ can be estimated from training data using many different approaches such as GIS [20], IIS [24] and L-BFGS [22, 23]. After the parameters are trained, the Viterbi [21] algorithm can be used to label the testing data and perform the prediction. 4 Experiments In this section we present a set of experiments that we p...

439 | Maximum entropy markov models for information extraction and segmentation
- McCallum, Freitag, et al.
- 2000
Citation Context ...etween them. Therefore, discriminative models can overcome the inherent shortcomings of generative models and obtain higher labeling and prediction accuracy. The Maximum Entropy Markov Models (MEMMs) [18] are discriminative models. Because MEMMs conduct per-state normalization for the conditional probability of every next state given the current state and the observation sequence, they achieve a local...

431 | Generalized iterative scaling for log-linear models
- Darroch, Ratcliff
- 1972
Citation Context ...llowing format: $Z(X) = \sum_{Y} \exp\left[\sum_{t=1}^{T}\left(\sum_{i} \lambda_i f_i(y_{t-1}, y_t, X, t) + \sum_{j} \mu_j s_j(y_t, X, t)\right)\right]$. The parameters $\theta = (\lambda_i, \mu_j)$ can be estimated from training data using many different approaches such as GIS [20], IIS [24] and L-BFGS [22, 23]. After the parameters are trained, the Viterbi [21] algorithm can be used to label the testing data and perform the prediction. 4 Experiments In this section we present a set of experiments that we p...
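The normalization factor Z(X) in this context sums the exponentiated total score over every possible label sequence Y. A small sketch can show that this sum does not need explicit enumeration: the forward recursion gives the same value in time linear in the sequence length. The callback `local_score(prev, y, t)` is a hypothetical stand-in for the combined weighted transition and state features at position t, and `toy_score` is invented purely for the comparison.

```python
import itertools
import math
from typing import Callable, Optional, Sequence

LocalScore = Callable[[Optional[str], str, int], float]


def brute_force_Z(labels: Sequence[str], T: int, local_score: LocalScore) -> float:
    """Sum exp(total score) over all len(labels)**T label sequences."""
    total = 0.0
    for ys in itertools.product(labels, repeat=T):
        score, prev = 0.0, None
        for t, y in enumerate(ys):
            score += local_score(prev, y, t)
            prev = y
        total += math.exp(score)
    return total


def forward_Z(labels: Sequence[str], T: int, local_score: LocalScore) -> float:
    """Same quantity via the forward recursion, O(T * len(labels)**2)."""
    alpha = {y: math.exp(local_score(None, y, 0)) for y in labels}
    for t in range(1, T):
        alpha = {
            y: sum(alpha[p] * math.exp(local_score(p, y, t)) for p in labels)
            for y in labels
        }
    return sum(alpha.values())


def toy_score(prev: Optional[str], y: str, t: int) -> float:
    """Hypothetical local score: reward repeated labels and label 'a'."""
    return (0.5 if prev == y else 0.0) + (0.1 * t if y == "a" else 0.0)


z_slow = brute_force_Z(["a", "b"], 3, toy_score)
z_fast = forward_Z(["a", "b"], 3, toy_score)
```

For long sequences the sums would normally be carried out in the log domain to avoid overflow; that refinement is omitted here for brevity.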

364 | A Tutorial on
- Rabiner
- 1989
Citation Context ...cting Web users’ browsing behaviors, and presented three schemes to eliminate the state space complexity of higher-order Markov models without influencing the performance. A Hidden Markov Model (HMM) [15] is a dual-stochastic process which is very popular for labeling sequences: one stochastic process is an invisible Markov chain that describes the transition between states (labels) while the other re...

215 | Updating quasi-Newton matrices with limited storage
- Nocedal
- 1980
Citation Context ...$\sum_{Y} \exp\left[\sum_{t=1}^{T}\left(\sum_{i} \lambda_i f_i(y_{t-1}, y_t, X, t) + \sum_{j} \mu_j s_j(y_t, X, t)\right)\right]$. The parameters $\theta = (\lambda_i, \mu_j)$ can be estimated from training data using many different approaches such as GIS [20], IIS [24] and L-BFGS [22, 23]. After the parameters are trained, the Viterbi [21] algorithm can be used to label the testing data and perform the prediction. 4 Experiments In this section we present a set of experiments that we p...

122 | Dynamic conditional random fields: Factorized probabilistic models for labeling and segmenting sequence data
- Sutton, Rohanimanesh, et al.
- 2004
Citation Context ...ey can also incorporate various features from observation sequences to increase the prediction accuracy. CRFs have already been applied with success to many labeling-related tasks, such as text chunking [4], part-of-speech (POS) tagging [1], intrusion detection [5] and even predicting the secondary structures of protein sequences [6]. If we consider the access sessions of previous Internet users as obse...

116 | Link prediction and path analysis using Markov chains
- Sarukkai
- 2000
Citation Context ...th. In this way the user’s following access requests can be predicted, but the construction of path trees and the matching of history paths are expensive in terms of both computing and storage. Sarukkai [12] employed a 1st-order Markov model to analyze access paths and make predictions. In this model, every Web page is considered as a different state, and one state can transfer to another state with a ...
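The first-order Markov baseline described in this context treats every page as a state and predicts from transition probabilities. A minimal sketch, assuming plain maximum-likelihood estimates from transition counts, is given below; the function names and the toy sessions are hypothetical and are not taken from the cited work.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


def fit_first_order(sessions: Sequence[Sequence[str]]) -> Dict[str, Dict[str, float]]:
    """Estimate P(next page | current page) from transition counts."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for session in sessions:
        for cur, nxt in zip(session, session[1:]):
            counts[cur][nxt] += 1
    return {
        cur: {nxt: c / sum(ctr.values()) for nxt, c in ctr.items()}
        for cur, ctr in counts.items()
    }


def predict_next(model: Dict[str, Dict[str, float]], current: str, k: int = 1) -> List[str]:
    """Return the k most probable next pages (ties broken alphabetically)."""
    probs = model.get(current, {})
    return sorted(probs, key=lambda page: (-probs[page], page))[:k]


# Hypothetical sessions for illustration.
model = fit_first_order([["a", "b", "c"], ["a", "b"], ["a", "c"]])
top_page = predict_next(model, "a")
```

Because only the current page is used, two users who arrive at the same page through different paths receive the same prediction, which is the limitation the higher-order models discussed on this page try to address.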

109 | Selective Markov models for predicting Web-page accesses,” presented at the
- Deshpande, Karypis
Citation Context ...count more states when computing the transition probability, and thus improve the prediction accuracy. However, the increase of the order will increase the state space complexity. M. Deshpande et al. [14] discussed the shortcomings of higher-order Markov models in predicting Web users’ browsing behaviors, and presented three schemes to eliminate the state space complexity of higher-order Markov models...

107 | Data mining for path traversal patterns in a web environment
- Chen
- 1996
Citation Context ...page prediction approaches are presented in Section 4 along with the experimental results and evaluations. Finally, we conclude in Section 5 with our future work. 2 Related Works Ming Syan Chen et al. [7] introduced the notion of “maximal forward reference (MFR)” to identify users’ transactions and employed data mining techniques (such as association rules discovery) to mine frequently-accessed paths ...

105 | Using path profiles to predict http requests
- Schechter, Krishnan, et al.
- 1998
Citation Context ...ng values. Yong Zhen Guo et al. [10] extended the UPR approach by introducing the access time duration of each Web page as another biasing factor, which will yield more accurate prediction. Schechter [11] constructed an access path tree for the current user and used the longest-match method to find a history path which matched the user’s current navigational path. In this way the user’s following acce...

96 | The network effects of prefetching
- Crovella, Barford
- 1998

74 | Kernel conditional random fields: representation and clique selection
- Lafferty
- 2004
Citation Context ...sed with success to many labeling-related tasks, such as text chunking [4], part-of-speech (POS) tagging [1], intrusion detection [5] and even predicting the secondary structures of protein sequences [6]. If we consider the access sessions of previous Internet users as observation sequences, and in each observation sequence we use each pageview’s subsequent pageview as its label to get the correspond...

42 | The improved iterative scaling algorithm: A gentle introduction
- Berger
- 1997
Citation Context ...ormat: $Z(X) = \sum_{Y} \exp\left[\sum_{t=1}^{T}\left(\sum_{i} \lambda_i f_i(y_{t-1}, y_t, X, t) + \sum_{j} \mu_j s_j(y_t, X, t)\right)\right]$. The parameters $\theta = (\lambda_i, \mu_j)$ can be estimated from training data using many different approaches such as GIS [20], IIS [24] and L-BFGS [22, 23]. After the parameters are trained, the Viterbi [21] algorithm can be used to label the testing data and perform the prediction. 4 Experiments In this section we present a set of exp...

22 | Web usage mining: Discovery and applications of usage patterns from web data
- Srivastava, Cooley, Deshpande, Tan
- 2000
Citation Context ...ly takes users’ current access requests into consideration but not the whole access paths, which will influence the prediction accuracy. In order to deal with this problem, higher-order Markov models [13] are proposed, which take into account more states when computing the transition probability, and thus improve the prediction accuracy. However, the increase of the order will increase the state space...

17 | Improving World Wide Web Latency
- Venkata, Padmanabhan
- 1995
Citation Context ...ions that the Web page prefetching technique is able to decrease a user’s access delay dramatically and thus enhance the service quality of the World Wide Web [2]. The results from the simulations in [3] show that a 36% reduction in the latency perceived by an Internet user can be achieved at the cost of a 40% increase in the network traffic. Moreover, the studies in [2] indicate that by using the “r...

17 | Layered Approach Using Conditional Random Fields for Intrusion Detection
- Gupta, Nath
- 2010
Citation Context ...equences to increase the prediction accuracy. CRFs have already been applied with success to many labeling-related tasks, such as text chunking [4], part-of-speech (POS) tagging [1], intrusion detection [5] and even predicting the secondary structures of protein sequences [6]. If we consider the access sessions of previous Internet users as observation sequences, and in each observation sequence we use ...

8 | UPR: Usage-based Page Ranking for Web Personalization
- Eirinaki, Vazirgiannis
- 2006
Citation Context ...ty of key words, this model builds a predictor for every different category of Web pages, which enhances the prediction accuracy but also decreases the applicability of this model. M. Eirinaki et al. [9] proposed a novel Web personalization approach: Usage-based PageRank (UPR), which combines both Web usage information and Web link structure information to conduct Web page ranking and prediction. Thi...

6 | Personalized PageRank for Web Page Prediction Based on Access Time-Length and Frequency
- Guo, Ramamohanarao, et al.
Citation Context ...ng and prediction. This approach employs UPR to rank the Web pages in a relevant personalized navigational graph and predicts the probable pages in terms of their ranking values. Yong Zhen Guo et al. [10] extended the UPR approach by introducing the access time duration of each Web page as another biasing factor, which will yield more accurate prediction. Schechter [11] constructed an access path tree...

6 | Computationally efficient M-estimation of log-linear structure models
- Smith, Vail, et al.
- 2007
Citation Context ...performance of CRFs models. A good prediction model has poor performance without good features, while a less powerful prediction model may also perform well with a set of deliberately chosen features [25]. In our experiments, we only used the current and previous observations as the unigram features; more useful features can be incorporated to enhance the prediction accuracy, e.g., “the length of the ...
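The unigram features mentioned in this context, built from the current and the previous observation, might be encoded roughly as shown below. The feature-name strings (`cur=`, `prev=`) and the `<START>` placeholder are assumptions made for illustration; the paper's actual feature templates are not given in this excerpt.

```python
from typing import Dict, List, Sequence


def unigram_features(observations: Sequence[str], t: int) -> Dict[str, float]:
    """Indicator features for position t: current and previous page view.

    Assumption: the first position uses a '<START>' placeholder in
    place of a previous observation.
    """
    previous = observations[t - 1] if t > 0 else "<START>"
    return {"cur=" + observations[t]: 1.0, "prev=" + previous: 1.0}


def featurize(observations: Sequence[str]) -> List[Dict[str, float]]:
    """One feature dictionary per position of an observation sequence."""
    return [unigram_features(observations, t) for t in range(len(observations))]


# Hypothetical observation sequence.
example = featurize(["frontpage", "news"])
```

Richer features, such as the session-length feature the context begins to describe, would be added as further keys in the same dictionaries.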

2 | An Approach to Intelligent Web Pre-fetching Based on
- Jin, Xu
Citation Context ...hastic process is an invisible Markov chain that describes the transition between states (labels) while the other reflects the statistical relationship between states and observations. Xin Jin et al. [16] proposed an HMM-based prefetching model in which they employed HMM to capture and mine the latent concepts of information requirement implied by Web users’ access paths, and then used the obtained info...

1 | Neural Nets Based Predictive Prefetching to Tolerate WWW Latency
- Ibrahim, Xu
- 2000
Citation Context ...then they presented algorithms to recognize the frequent traversal patterns from the maximal forward references obtained, which can be used to predict the user’s future requests. T. I. Ibrahim et al. [8] introduced a neural networks model to implement the semantics-based Web page prediction. This model extracts the semantics of a Web page according to the keywords of its URL anchor text. It employs t...

1 | Introduction to Statistical Relational Learning: An Introduction to Conditional Random Fields for Relational Learning
- Sutton, McCallum
- 2006
Citation Context ...limited, it is difficult to enumerate all possible observation sequences, thus the calculation of p(x) is only an approximation to the real distribution, which will decrease the accuracy of the model [17]. Furthermore, the calculation of p(x) also requires strict independence assumptions over observation elements, which is not always possible in reality since most observation sequences in reality cont...

1 | Conditional Random Fields: An Introduction, CIS
- Wallach
- 2004
Citation Context ...graph that obeys the Markov property. For labeling tasks, the most common graphical structure is an undirected first-order linear chain over the label sequence Y, which can be seen in Figure 1 [19]. In our experiments, we made use of this linear chain model for the implementation of CRFs, where $X = (x_1, x_2, \cdots, x_n)$ denotes an observation of a user’s accessing session of length n and Y = (y1...

1 | Archive: msnbc.com anonymous Web data. http://kdd.ics.uci.edu/databases/msnbc/msnbc.html, last accessible on February 10
- KDD
- 2008
Citation Context ...al results show an overall enhancement in the prediction accuracy by using CRF-based measures. 4.1 Experimental Dataset and Preprocessings We used the publicly accessible msnbc.com anonymous Web data [26] as the dataset in our experiments. The msnbc dataset is obtained from the Web logs of www.msnbc.com and contains page visits of users who visited this website on September 28, 1999. All the user visi...