Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions
- IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
, 2005
Cited by 381 (2 self)
This paper presents an overview of the field of recommender systems and describes the current generation of recommendation methods, which are usually classified into three main categories: content-based, collaborative, and hybrid recommendation approaches. This paper also describes various limitations of current recommendation methods and discusses possible extensions that can improve recommendation capabilities and make recommender systems applicable to an even broader range of applications. These extensions include, among others, an improved understanding of users and items, incorporation of contextual information into the recommendation process, support for multicriteria ratings, and the provision of more flexible and less intrusive types of recommendations.
CubeSVD: A Novel Approach to Personalized Web Search
- In Proc. of the 14th International World Wide Web Conference (WWW)
, 2005
Cited by 47 (3 self)
As competition in the Web search market increases, there is a high demand for personalized Web search that conducts retrieval incorporating Web users' information needs. This paper focuses on utilizing clickthrough data to improve Web search. Since millions of searches are conducted every day, a search engine accumulates a large volume of clickthrough data, which records who submits queries and which pages he/she clicks on. The clickthrough data is highly sparse and contains different types of objects (user, query and Web page), and the relationships among these objects are also very complicated. By analyzing these data, we attempt to discover Web users' interests and the patterns by which users locate information. In this paper, a novel approach, CubeSVD, is proposed to improve Web search. The clickthrough data is represented by a third-order tensor, on which we perform 3-mode analysis using the higher-order singular value decomposition technique to automatically capture the latent factors that govern the relations among these multi-type objects: users, queries and Web pages. A tensor reconstructed from the CubeSVD analysis reflects both the observed interactions among these objects and the implicit associations among them. Therefore, Web search activities can be carried out based on CubeSVD analysis. Experimental evaluations using a real-world data set collected from an MSN search engine show that CubeSVD achieves encouraging search results in comparison with some standard methods.
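The first step of the approach above, turning clickthrough records into a third-order tensor, can be sketched as follows. This is a minimal illustration assuming clicks arrive as hypothetical `(user, query, page)` triples; the functions `build_click_tensor` and `unfold` are our own names. HOSVD then takes an ordinary SVD of each mode-n unfolding; that SVD step (normally done with a linear-algebra library) and the truncated reconstruction are omitted here.

```python
def build_click_tensor(clicks):
    """Build a dense user x query x page count tensor from (user, query, page) triples.

    The triple format is an assumption for illustration, not the paper's data layout.
    """
    users = sorted({u for u, _, _ in clicks})
    queries = sorted({q for _, q, _ in clicks})
    pages = sorted({p for _, _, p in clicks})
    ui = {u: i for i, u in enumerate(users)}
    qi = {q: i for i, q in enumerate(queries)}
    pi = {p: i for i, p in enumerate(pages)}
    tensor = [[[0] * len(pages) for _ in queries] for _ in users]
    for u, q, p in clicks:
        tensor[ui[u]][qi[q]][pi[p]] += 1  # count repeated clicks
    return tensor, users, queries, pages


def unfold(tensor, mode):
    """Mode-n unfolding: one row per index of the chosen mode.

    Column order follows one fixed convention; permuting columns does not
    change the left singular vectors that HOSVD uses.
    """
    n_i, n_j, n_k = len(tensor), len(tensor[0]), len(tensor[0][0])
    if mode == 0:  # users x (queries * pages)
        return [[tensor[i][j][k] for j in range(n_j) for k in range(n_k)] for i in range(n_i)]
    if mode == 1:  # queries x (pages * users)
        return [[tensor[i][j][k] for k in range(n_k) for i in range(n_i)] for j in range(n_j)]
    if mode == 2:  # pages x (users * queries)
        return [[tensor[i][j][k] for i in range(n_i) for j in range(n_j)] for k in range(n_k)]
    raise ValueError("mode must be 0, 1 or 2")
```

Each unfolding is an ordinary matrix, which is why the higher-order decomposition reduces to three matrix SVDs plus a core-tensor computation.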
Web Usage Mining Based on Probabilistic Latent Semantic Analysis
, 2004
Cited by 29 (5 self)
The primary goal of Web usage mining is the discovery of patterns in the navigational behavior of Web users. Standard approaches, such as clustering of user sessions and discovering association rules or frequent navigational paths, do not generally provide the ability to automatically characterize or quantify the unobservable factors that lead to common navigational patterns. It is, therefore, necessary to develop techniques that can automatically identify the users' underlying navigational objectives and discover hidden semantic relationships among users as well as between users and Web objects. Probabilistic Latent Semantic Analysis (PLSA) is particularly useful in this context, since it can uncover latent semantic associations among users and pages based on the co-occurrence patterns of these pages in user sessions. In this paper, we develop a unified framework for the discovery and analysis of Web navigational patterns based on PLSA. We show the flexibility of this framework in characterizing various relationships among users and Web objects. Since these relationships are measured in terms of probabilities, we are able to use probabilistic inference to perform a variety of analysis tasks, such as user segmentation and page classification, as well as predictive tasks such as collaborative recommendation. We demonstrate the effectiveness of our approach through experiments performed on several real-world data sets.
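To make the PLSA idea concrete, here is a minimal sketch of the standard EM procedure for the aspect model P(u, p) = Σ_z P(z) P(u|z) P(p|z), fitted to a user-by-page co-occurrence count matrix. It is the textbook algorithm, not necessarily the authors' exact implementation; the function name `plsa`, the random initialization, and the fixed iteration count are assumptions.

```python
import math
import random


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit P(z), P(u|z), P(p|z) to a user x page count matrix with plain EM.

    Initialization and stopping rule (fixed n_iter) are illustrative assumptions.
    Returns the parameters and the log-likelihood recorded at each iteration.
    """
    rng = random.Random(seed)
    n_users, n_pages = len(counts), len(counts[0])

    def normalized(values):
        total = sum(values)
        return [v / total for v in values]

    p_z = [1.0 / n_topics] * n_topics
    # Strictly positive random starts break the symmetry between latent factors.
    p_u_z = [normalized([rng.random() + 0.1 for _ in range(n_users)]) for _ in range(n_topics)]
    p_p_z = [normalized([rng.random() + 0.1 for _ in range(n_pages)]) for _ in range(n_topics)]
    log_liks = []
    for _ in range(n_iter):
        new_u = [[0.0] * n_users for _ in range(n_topics)]
        new_p = [[0.0] * n_pages for _ in range(n_topics)]
        new_z = [0.0] * n_topics
        ll = 0.0
        for u in range(n_users):
            for p in range(n_pages):
                n = counts[u][p]
                if n == 0:
                    continue
                # E-step: P(z | u, p) is proportional to P(z) P(u|z) P(p|z).
                joint = [p_z[z] * p_u_z[z][u] * p_p_z[z][p] for z in range(n_topics)]
                total = sum(joint)
                ll += n * math.log(total)
                for z in range(n_topics):
                    w = n * joint[z] / total  # expected count assigned to factor z
                    new_u[z][u] += w
                    new_p[z][p] += w
                    new_z[z] += w
        # M-step: renormalize the expected counts into probabilities.
        p_u_z = [normalized(row) for row in new_u]
        p_p_z = [normalized(row) for row in new_p]
        p_z = normalized(new_z)
        log_liks.append(ll)
    return p_z, p_u_z, p_p_z, log_liks
```

The recorded log-likelihood should never decrease from one iteration to the next, which is a convenient sanity check. Tasks such as user segmentation then follow from posteriors like P(z | u) ∝ P(z) P(u|z).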
The impact of site structure and user environment on session reconstruction in web usage analysis
, 2002
Cited by 27 (4 self)
The analysis of user behavior on the Web presupposes a reliable reconstruction of the users' navigational activities. Cookies and server-generated session identifiers have been designed to allow a faithful session reconstruction. However, in the absence of reliable methods, analysts must rely on heuristic methods (a) to identify unique visitors to a site, and (b) to distinguish among the activities of such users during independent sessions. The characteristics of the site, such as the site structure, as well as the methods used for data collection (e.g., the existence of cookies and reliable synchronization across multiple servers) may necessitate the use of different types of heuristics. In this study, we extend our work on the reliability of sessionizing mechanisms by investigating the impact of site structure on the quality of constructed sessions. Specifically, we juxtapose sessionizing on a frame-based and a frame-free version of a site. We investigate the behavior of cookies, server-generated session identification, and heuristics that exploit session duration, page stay time and page linkage. Different measures of session reconstruction quality, as well as experiments on the impact on the prediction of frequent entry and exit pages, show that different reconstruction heuristics can be recommended depending on the characteristics of the site. We also present first results on the impact of session reconstruction heuristics on predictive applications such as Web personalization.
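The two time-based heuristics mentioned above (session duration and page stay time) can be sketched for a single visitor's request timestamps. This is a minimal illustration: the thresholds of 30 minutes total duration and 10 minutes per page are assumed, commonly used defaults, not values taken from this abstract, and the linkage-based heuristic is not shown.

```python
def sessionize_duration(timestamps, max_duration=30 * 60):
    """Session-duration heuristic: a session may span at most max_duration seconds.

    The 30-minute default is an assumption for illustration.
    """
    sessions = []
    for t in sorted(timestamps):
        # Compare against the first request of the current session.
        if sessions and t - sessions[-1][0] <= max_duration:
            sessions[-1].append(t)
        else:
            sessions.append([t])
    return sessions


def sessionize_page_stay(timestamps, max_stay=10 * 60):
    """Page-stay heuristic: a gap longer than max_stay seconds starts a new session.

    The 10-minute default is an assumption for illustration.
    """
    sessions = []
    for t in sorted(timestamps):
        # Compare against the most recent request of the current session.
        if sessions and t - sessions[-1][-1] <= max_stay:
            sessions[-1].append(t)
        else:
            sessions.append([t])
    return sessions
```

On the same log the two heuristics can disagree: requests spaced every 5 to 10 minutes form one long session under the page-stay rule but are split by the duration rule once 30 minutes have elapsed, which is one reason the choice of heuristic matters.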
Semantically Enhanced Collaborative Filtering on the Web
- Web Mining: From Web to Semantic Web. LNAI 3209. Springer-Verlag (2004)
, 2004
Cited by 17 (1 self)
Item-based Collaborative Filtering (CF) algorithms have been designed to deal with the scalability problems associated with traditional user-based CF approaches without sacrificing recommendation or prediction accuracy. Item-based algorithms avoid the bottleneck in computing user-user correlations by first considering the relationships among items and performing similarity computations in a reduced space. Because the computation of item similarities is independent of the methods used for generating predictions, multiple knowledge sources, including structured semantic information about items, can be brought to bear in determining similarities among items. The integration of semantic similarities for items with rating- or usage-based similarities allows the system to make inferences based on the underlying reasons for which a user may or may not be interested in a particular item. Furthermore, in cases where little or no rating (or usage) information is available (such as newly added items or very sparse data sets), the system can still use the semantic similarities to provide reasonable recommendations for users. In this paper, we introduce an approach for semantically enhanced collaborative filtering in which structured semantic knowledge about items, extracted automatically from the Web based on domain-specific reference ontologies, is used in conjunction with user-item mappings to create a combined similarity measure and generate predictions. Our experimental results demonstrate that the integrated approach yields significant advantages both in improving accuracy and in dealing with very sparse data sets or new items.
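A combined similarity measure of the kind described above can be sketched as a linear blend of a semantic (attribute-based) similarity and a rating-based similarity, fed into a standard item-based weighted-average prediction. This is a minimal sketch under stated assumptions: cosine similarity for both parts, the blending weight `alpha`, and the dictionary-based data layout are our own illustrative choices, not details confirmed by the abstract.

```python
import math


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as {key: value} dicts."""
    common = set(a) & set(b)
    num = sum(a[k] * b[k] for k in common)
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0


def combined_similarity(item_a, item_b, ratings_by_item, attributes_by_item, alpha=0.5):
    """Assumed linear blend: alpha * semantic similarity + (1 - alpha) * rating similarity."""
    sem = cosine(attributes_by_item.get(item_a, {}), attributes_by_item.get(item_b, {}))
    rat = cosine(ratings_by_item.get(item_a, {}), ratings_by_item.get(item_b, {}))
    return alpha * sem + (1 - alpha) * rat


def predict(user, target, user_ratings, ratings_by_item, attributes_by_item, alpha=0.5):
    """Item-based prediction: similarity-weighted average of the user's other ratings.

    Returns None when no positively similar rated item exists.
    """
    num = den = 0.0
    for item, rating in user_ratings[user].items():
        if item == target:
            continue
        s = combined_similarity(target, item, ratings_by_item, attributes_by_item, alpha)
        if s > 0:
            num += s * rating
            den += s
    return num / den if den else None
```

With `alpha=0` this reduces to pure rating-based item CF, which cannot score a newly added item that nobody has rated; any `alpha > 0` lets the semantic attributes fill that gap, mirroring the new-item argument in the abstract.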
Model-based collaborative filtering as a defense against profile injection attacks
- In Proceedings of the 21st National Conference on Artificial Intelligence
, 2006
Cited by 15 (9 self)
The open nature of collaborative recommender systems allows attackers who inject biased profile data to have a significant impact on the recommendations produced. Standard memory-based collaborative filtering algorithms, such as k-nearest neighbor, have been shown to be quite vulnerable to such attacks. In this paper, we examine the robustness of model-based recommendation algorithms in the face of profile injection attacks. In particular, we consider two recommendation algorithms, one based on k-means clustering and the other based on Probabilistic Latent Semantic Analysis (PLSA). These algorithms aggregate similar users into user segments that are compared to the profile of an active user to generate recommendations. Traditionally, model-based algorithms have been used to alleviate the scalability problems associated with memory-based recommender systems. We show, empirically, that these algorithms also offer significant improvements in stability and robustness over the standard k-nearest neighbor approach when attacked. Furthermore, our results show that the PLSA-based approach in particular can achieve comparable recommendation accuracy.
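The segment-based recommendation step described above (aggregate users into segments, then compare an active user's profile against them) can be sketched with plain k-means. This is a minimal sketch: Euclidean distance, seeding with the first k profiles, dense item vectors, and recommending the segment's highest-weighted unseen items are all illustrative assumptions rather than the paper's exact configuration.

```python
def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(profiles, k, n_iter=20):
    """Plain k-means; seeding with the first k profiles (an assumption) keeps it deterministic."""
    centroids = [list(p) for p in profiles[:k]]
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in profiles:
            best = min(range(k), key=lambda c: dist2(p, centroids[c]))
            clusters[best].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def recommend(active, centroids, top_n=2):
    """Match the active user to the closest segment and return its strongest unseen item indices."""
    segment = min(centroids, key=lambda c: dist2(active, c))
    unseen = [i for i, v in enumerate(active) if v == 0]
    return sorted(unseen, key=lambda i: -segment[i])[:top_n]
```

Because recommendations come from segment centroids rather than from a handful of nearest neighbors, a few injected profiles shift an averaged centroid less than they can dominate a neighbor list, which gives some intuition for the robustness result reported above.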
Data Mining for Web Personalization
- The Adaptive Web: Methods and Strategies of Web Personalization. Lecture
, 2006
Cited by 13 (0 self)
In this chapter we present an overview of the Web personalization process, viewed as an application of data mining that requires support for all the phases of a typical data mining cycle. These phases include data collection and preprocessing, pattern discovery and evaluation, and finally applying the discovered knowledge in real time to mediate between the user and the Web. This view of the personalization process provides added flexibility in leveraging multiple data sources and in effectively using the discovered models in an automatic personalization system. The chapter provides a detailed discussion of a host of activities and techniques used at different stages of this cycle, including the preprocessing and integration of data from multiple sources, as well as pattern discovery techniques that are typically applied to this data. We consider a number of classes of data mining algorithms used particularly for Web personalization, including techniques based on clustering, association rule discovery, sequential pattern mining, Markov models, and probabilistic mixture and hidden (latent) variable models. Finally, we discuss hybrid data mining frameworks that leverage data from a variety ...
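Of the pattern discovery techniques listed above, Markov models are the simplest to illustrate. The following is a minimal sketch of a first-order Markov next-page predictor, assuming sessions have already been reconstructed as ordered lists of page identifiers (a hypothetical input format); higher-order models and smoothing are left out.

```python
from collections import Counter, defaultdict


def train_markov(sessions):
    """Count first-order transitions (page -> next page) across all sessions."""
    transitions = defaultdict(Counter)
    for session in sessions:
        for current, nxt in zip(session, session[1:]):
            transitions[current][nxt] += 1
    return transitions


def predict_next(transitions, page, top_n=1):
    """Return up to top_n (next_page, estimated probability) pairs for the given page.

    Probabilities are maximum-likelihood estimates: count / total outgoing count.
    """
    counts = transitions.get(page)  # .get avoids inserting unseen pages into the defaultdict
    if not counts:
        return []
    total = sum(counts.values())
    return [(p, c / total) for p, c in counts.most_common(top_n)]
```

In a real-time personalization setting, the trained transition table would be consulted with the user's current page to pick candidate links to recommend.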
Intelligent techniques for web personalization
- IJCAI 2003 Workshop, ITWP 2003
, 2005
Cited by 13 (0 self)
In this chapter we provide a comprehensive overview of the topic of Intelligent Techniques for Web Personalization. Web personalization is viewed as an application of data mining and machine learning techniques to build models of user behaviour that can be applied to the task of predicting user needs and adapting future interactions, with the ultimate goal of improved user satisfaction. This chapter surveys the state of the art in Web personalization. We start by providing a description of the personalization process and a classification of the current approaches to Web personalization. We discuss the various sources of data available to personalization systems, the modelling approaches employed and the current approaches to evaluating these systems. A number of challenges faced by researchers developing these systems are described, as are solutions to these challenges proposed in the literature. The chapter concludes with a discussion of the open challenges that must be addressed by the research community if this technology is to make a positive impact on user satisfaction with the Web.
A Customer Purchase Incidence Model Applied to Recommender Services
- WEBKDD 2001 Mining Web Log Data Across All Customer Touch Points. Volume 2356 of LNAI
, 2001
Cited by 12 (5 self)
In this contribution we transfer a customer purchase incidence model for consumer products, which is based on Ehrenberg's repeat-buying theory, to Web-based information products. Ehrenberg's repeat-buying theory successfully describes regularities in a large number of consumer product markets. We show that these regularities exist in electronic markets for information goods too, and that purchase incidence models provide a well-founded theoretical basis for recommender and alert services.
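Ehrenberg's repeat-buying theory is classically expressed through the negative binomial distribution (NBD) of the number of purchases per buyer in a period. The abstract does not say which variant the authors transfer, so the following is only the textbook NBD probability mass function, parameterized by a mean `m` and a shape `k` (our own parameter names). Computing it in log space with `math.lgamma` avoids overflow for large purchase counts.

```python
import math


def nbd_pmf(r, m, k):
    """P(r purchases in a period) under an NBD with mean m > 0 and shape k > 0.

    P(r) = Gamma(k + r) / (Gamma(r + 1) Gamma(k)) * (k / (k + m))**k * (m / (k + m))**r
    This is the textbook NBD, assumed here for illustration.
    """
    log_p = (math.lgamma(k + r) - math.lgamma(r + 1) - math.lgamma(k)
             + k * math.log(k / (k + m)) + r * math.log(m / (k + m)))
    return math.exp(log_p)


def expected_buyers(n_buyers, r, m, k):
    """Expected number of buyers making exactly r purchases, out of n_buyers."""
    return n_buyers * nbd_pmf(r, m, k)
```

For a recommender built on this theory, the idea is to compare observed purchase frequencies with what the fitted model predicts under independent, random repeat-buying; frequencies that clearly exceed the expectation are candidates for recommendations.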
Web usage mining: extracting unexpected periods from web logs
- DATA MINING AND KNOWLEDGE DISCOVERY
, 2008
Cited by 12 (3 self)
Existing Web Usage Mining techniques are currently based on an arbitrary division of the data (e.g. "one log per month") or guided by presumed results (e.g. "what is the customers' behaviour for the period of Christmas purchases?"). These approaches have two main drawbacks. First, they depend on this arbitrary organization of the data. Second, they cannot automatically extract seasonal peaks from the stored data. In this paper, we propose to perform a specific data mining process (in particular, the extraction of frequent behaviours) in order to automatically discover the densest periods. Our method extracts, among the whole set of possible combinations, the frequent sequential patterns related to the extracted periods. A period is considered dense if it contains at least one frequent sequential pattern for the set of users connected to the Web site in that period. Our experiments show that the extracted periods are relevant and that our approach is able to extract both frequent sequential patterns and the associated dense periods.
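The density criterion stated above (a period is dense if it contains at least one frequent sequential pattern for the users connected in that period) can be illustrated with a deliberately simplified sketch. Unlike the paper, which searches among all possible periods, this assumes fixed, non-overlapping candidate windows of `period_length` time units and only considers length-2 patterns (page a visited before page b). The `(user, timestamp, page)` input format and all function names are hypothetical.

```python
from itertools import combinations


def frequent_pairs(sequences, min_support):
    """Ordered page pairs (a before b) occurring in at least min_support of the sequences."""
    counts = {}
    for seq in sequences:
        seen = set()  # count each pair at most once per user sequence
        for i, j in combinations(range(len(seq)), 2):
            seen.add((seq[i], seq[j]))
        for pair in seen:
            counts[pair] = counts.get(pair, 0) + 1
    n = len(sequences)
    return {pair for pair, c in counts.items() if c / n >= min_support}


def dense_periods(visits, period_length, min_support):
    """Map each dense fixed window index to its frequent length-2 patterns.

    visits: iterable of (user, timestamp, page); windows are an illustrative assumption.
    """
    by_period = {}
    for user, ts, page in sorted(visits, key=lambda v: v[1]):
        window = by_period.setdefault(ts // period_length, {})
        window.setdefault(user, []).append(page)  # per-user sequence within the window
    result = {}
    for period, seqs in sorted(by_period.items()):
        patterns = frequent_pairs(list(seqs.values()), min_support)
        if patterns:  # dense: at least one frequent pattern among the connected users
            result[period] = patterns
    return result
```

Support is measured relative to the users connected in each window, not to all users in the log, which is what lets a short seasonal peak stand out even when the same behaviour would be infrequent over the whole year.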

