Web dynamics and their ramifications for the development of web search engines (2006)

by Y Ke, L Deng, W Ng, D-L Lee
Venue: Computer Networks

Results 1 - 10 of 10

Observing Linked Data Dynamics

by Tobias Käfer, Ahmed Abdelrahman, Jürgen Umbrich, Aidan Hogan - In ESWC, 2013
Cited by 20 (3 self)
In this paper, we present the design and first results of the Dynamic Linked Data Observatory: a long-term experiment to monitor the two-hop neighbourhood of a core set of eighty thousand diverse Linked Data documents on a weekly basis. We present the methodology used for sampling the URIs to monitor, retrieving the documents, and further crawling part of the two-hop neighbourhood. Having now run this experiment for six months, we analyse the dynamics of the monitored documents over the data collected thus far. We look at the estimated lifespan of the core documents, how often they go on-line or offline, how often they change; we further investigate domain-level trends. Next we look at changes within the RDF content of the core documents across the weekly snapshots, examining the elements (i.e., triples, subjects, predicates, objects, classes) that are most frequently added or removed. Thereafter, we look at how the links between dereferenceable documents evolve over time in the two-hop neighbourhood.

Citation Context

...change (e.g., time of day, day of the week), growth rate of the Web, dynamicity of links, the relation between top-level domains and dynamicity, etc. We refer readers to the broad survey by Ke et al. [6] about Web dynamics. As opposed to these related works, we focus specifically on the dynamicity of RDF documents in the context of Linked Data. Few papers specifically analyse RDF or Linked Data dynam...

Mining User Preference Using Spy Voting for Search Engine Personalization

by Wilfred Ng, Lin Deng, Dik Lun Lee
Cited by 14 (2 self)
This paper addresses search engine personalization. We present a new approach to mining a user’s preferences on the search results from clickthrough data and using the discovered preferences to adapt the search engine’s ranking function for improving search quality. We develop a new preference mining technique called SpyNB, which is based on the practical assumption that the search results clicked on by the user reflect the user’s preferences, but it does not draw any conclusions about the results that the user did not click on. As such, SpyNB is still valid even if the user does not follow any order in reading the search results or does not click on all relevant results. Our extensive offline experiments demonstrate that SpyNB discovers many more accurate preferences than existing algorithms do. The interactive online experiments further confirm that SpyNB and our personalization approach are effective in practice. We also show that the efficiency of SpyNB is comparable to existing simple preference mining algorithms.

Citation Context

...ge, a round of search engine adaptation is finished. The adaptation process can be repeated regularly to determine the most updated user preferences. It is worth mentioning that in our recent survey [Ke et al. 2005], we classified search engine adaptation into three categories, namely, content-based personalization, link-based personalization and function-based personalization. Our current approach falls in the...

imecho: an associative memory based desktop search system

by Jidong Chen, Hang Guo, Wentao Wu, Wei Wang - In CIKM, 2009
Cited by 11 (1 self)
Abstract not found

Citation Context

...ified ways for web search. In a nutshell, the techniques can be classified into three categories, namely, content based personalization, link-based personalization, and function-based personalization [14]. Content-based personalization deals with the “relevance” measure of Web pages and the user’s queries. In order to manage user interests, a content-based personalization technique is used to construc...

A method for measuring the evolution of a topic on the Web: The case of “informetrics”

by Judit Bar-Ilan, Bluma C. Peritz - Journal of the American Society for Information Science and Technology, 2009
Cited by 6 (0 self)
The World Wide Web is growing at an enormous speed, and has become an indispensable source for information and research. New pages are being added to the Web, but there are additional processes as well: pages are moved or removed and/or their content changes. In order to obtain a better understanding of these processes, we developed a method for tracking topics on the Web for long periods of time. We use multiple data collection methods that allow us to discover new pages related to the topic, to identify changes to existing pages, and to detect previously existing pages that have been removed or whose content is no longer relevant to the specified topic. The method is demonstrated through monitoring Web pages that contain the term “informetrics” for a period of eight years. The data collection method also allowed us to analyze the dynamic changes in search engine coverage; here we illustrate these changes on Google, the search engine used for the longest period of time for data collection in this project.

Citation Context

...s on data from 2003-4 (three data collection points). Ortega et al. [34] crawled about a thousand sites twice, once in 1997 and once in 2004. Additional studies are covered in the survey by Ke et al. [26]. All previous studies that we were able to locate used a single data collection method. They either monitored a fixed data set (e.g., [22, 28]) or crawled in a pre-specified manner a fixed number of ...

A three-year study on the freshness of Web search engine databases

by D Lewandowski, 2008
Cited by 6 (2 self)
Abstract not found

Citation Context

...he indexing policy of the search engines or in the size of the databases. These are important factors that could explain inconsistencies in the results, as we will report below. Ke, Deng, Ng, and Lee [16] give a good overview of the problems for search engines resulting from Web dynamics. Crawling and indexing problems resulting from Web dynamics from a commercial search engine’s point of view can be ...

Time series analysis of the dynamics of news websites

by Maria Carla Calzarossa, Daniele Tessera
The content of news websites changes frequently and rapidly and its relevance tends to decay with time. To be of any value to users, tools such as search engines have to cope with these evolving websites and detect their changes in a timely manner. In this paper we apply time series analysis to study the properties and the temporal patterns of the change rates of the content of three news websites. Our investigation shows that changes are characterized by large fluctuations with periodic patterns and time-dependent behavior. The time series describing the change rate is decomposed into trend, seasonal and irregular components and models of each component are then identified. The trend and seasonal components describe the daily and weekly patterns of the change rates. Trigonometric polynomials best fit these deterministic components, whereas the class of ARMA models represents the irregular component. The resulting models can be used to describe the dynamics of the changes and predict future change rates.

Citation Context

... follows. After an overview of the literature on Web dynamics given in Section II, the methodological approach followed for the analysis of the evolution of the news websites is introduced in Sect. III. The experimental results are described in Section IV. Finally, Section V summarizes the major findings and outlines future research directions. II. RELATED WORK The design of search engines has to take into account the characteristics of websites and in particular their dynamics. In the literature, Web dynamics have been studied under some different perspectives, see, e.g., [1], [2], [3], [4], [5], [6], [7], [8]. For example, in [3], authors focus on growth and update dynamics, whereas a survey of the major research challenges on Web dynamics in the framework of four dimensions, namely, size, pages, link structures, and user interests, is presented in [5]. Other aspects related to how much a Web document changes and whether changes are clustered, are investigated in [1] where authors define two measures that quantify these changes and discuss the implications of this approach for Web information system maintenance. A fine grain characterization of the evolution of Web content is presen...
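
The abstract of this entry describes fitting trigonometric polynomials to the periodic part of a change-rate series and treating what remains as an irregular component. The following is a minimal sketch of that general idea, not the authors' procedure: it assumes evenly spaced samples that cover a whole number of periods, reduces the trend to a constant level, and uses hypothetical names (`fit_seasonal_harmonics`, `seasonal_value`, `hourly_changes`) and synthetic data.

```python
import math


def fit_seasonal_harmonics(values, period, order):
    """Fit level + sum_k [a_k cos(2*pi*k*i/period) + b_k sin(2*pi*k*i/period)].

    Assumes evenly spaced samples covering a whole number of periods, so the
    harmonics are orthogonal and the least-squares fit reduces to averages.
    """
    n = len(values)
    if n == 0 or n % period != 0:
        raise ValueError("series must cover a whole number of periods")
    if 2 * order >= period:
        raise ValueError("order must be below period / 2")
    level = sum(values) / n
    harmonics = []
    for k in range(1, order + 1):
        w = 2.0 * math.pi * k / period
        a = 2.0 / n * sum(v * math.cos(w * i) for i, v in enumerate(values))
        b = 2.0 / n * sum(v * math.sin(w * i) for i, v in enumerate(values))
        harmonics.append((a, b))
    return level, harmonics


def seasonal_value(level, harmonics, period, i):
    """Evaluate the fitted trigonometric polynomial at sample index i."""
    total = level
    for k, (a, b) in enumerate(harmonics, start=1):
        w = 2.0 * math.pi * k / period
        total += a * math.cos(w * i) + b * math.sin(w * i)
    return total


# Synthetic (hypothetical) hourly change counts: two days with a daily cycle.
hourly_changes = [5.0 + 2.0 * math.sin(2.0 * math.pi * i / 24) for i in range(48)]
level, harmonics = fit_seasonal_harmonics(hourly_changes, period=24, order=2)

# Whatever the deterministic fit does not explain is the irregular component.
residuals = [v - seasonal_value(level, harmonics, 24, i)
             for i, v in enumerate(hourly_changes)]
```

On real data the residuals would not vanish, and they are the series to which an ARMA-type model would then be fitted; that step is outside this sketch.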

Chapter 2 Advanced Techniques in Web Data Pre-processing and Cleaning

by Pablo E. Román, Robert F. Dell, Juan D. Velásquez
Central to successful e-business is the construction of web sites that attract users, capture user preferences, and entice them into making a purchase. Web mining is diverse data mining applied to categorize both the content and structure of web sites with the goal of aiding e-business. Web mining requires knowledge of the web site structure (hyperlink graph), the web content (vector model) and user sessions (the sequence of pages visited by each user to a site). Much of the data for web mining can be noisy. The noise comes from many sources, for example, undocumented changes to the web site structure and content, a different understanding of the text and media semantics, and web logs without individual user identification. There may not be any record of the number of times a specific page has been visited in a session, as the page is stored on a proxy or web browser cache. Such noise presents a challenge for web mining. This chapter presents issues with and approaches for cleaning web data in preparation for web mining analysis. The Web has become the primary communication channel for many financial, trading,

Citation Context

... nodes and hyperlinks as edges. However, the link structure is as dynamic as content and develops exponentially [6]. Despite the changing structure, large-scale statistical analyses of the hyperlinks [5, 53] show a power law distribution (p(x) ∼ x^(−α)) has a good fit for several structural measures (categorized by the exponent α). The study [5] suggests that the number of pages per web site is a power law ...
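
The citation context above characterizes structural measures by the exponent α of a power law p(x) ∼ x^(−α). As a hedged illustration of how such an exponent can be estimated from data, the sketch below uses the standard continuous maximum-likelihood estimator, α̂ = 1 + n / Σ ln(x_i / x_min). This is not taken from the cited studies: counts such as pages per site are discrete, so the continuous form is only an approximation, and the function names, `x_min` and the synthetic sample are assumptions made for the example.

```python
import math
import random


def estimate_power_law_exponent(samples, x_min):
    """Continuous maximum-likelihood estimate of alpha for p(x) ~ x^(-alpha), x >= x_min."""
    tail = [x for x in samples if x >= x_min]
    if not tail:
        raise ValueError("no samples at or above x_min")
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum == 0.0:
        raise ValueError("all samples equal x_min; exponent is undefined")
    return 1.0 + len(tail) / log_sum


def sample_power_law(alpha, x_min, count, rng):
    """Draw synthetic power-law samples by inverting the cumulative distribution."""
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
            for _ in range(count)]


# Synthetic check: generate data with a known exponent and estimate it back.
rng = random.Random(42)
synthetic = sample_power_law(alpha=2.5, x_min=1.0, count=100_000, rng=rng)
alpha_hat = estimate_power_law_exponent(synthetic, x_min=1.0)
```

With a sample this large, `alpha_hat` should land close to the generating value of 2.5; on real link or site-size data the choice of `x_min` matters and would need to be justified separately.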

Context-aware Search for Personal Information Management Systems

by Jidong Chen
With the fast growth of disk capacity in personal computers, keyword search over personal data (a.k.a. desktop search) is becoming increasingly important. Nonetheless, desktop search has been shown to be more challenging than traditional Web search. Modern commercial Web search engines heavily rely on structural information (i.e., hyperlinks between Web pages) to rank their search results. However, such information is not available in desktop search. Therefore, state-of-the-art desktop search systems such as Google Desktop Search usually leverage pure text-based ranking approaches (e.g., TF-IDF), which often fail to give promising rankings due to the misinterpretation of user intention. We observed that in desktop search, the semantics of keyword queries are often context-aware, i.e., they are related to the current activity state (e.g., writing a paper, navigating a website, etc.) of the user. In this paper, we present a novel context-aware search framework by taking this activity information into consideration. Specifically, we use a Hidden Markov Model (HMM) to capture the relationships between the user’s access actions (e.g., opening/closing files, sending/receiving emails, etc.) and activity states. The model is learned from the user’s past access history and is used to predict the user’s current activity upon the submission of a keyword query. We further propose a ranking scheme that incorporates this predicted context information. Experimental evaluation demonstrates both the effectiveness of the proposed context-aware search method and the enhancement to the user’s search experience.

Citation Context

...ified ways for web search. In a nutshell, the techniques can be classified into three categories, namely, content based personalization, link-based personalization, and function-based personalization [17]. Content-based personalization deals with the “relevance” measure of Web pages and the user’s queries. In this approach, the query is modified to adapt the search results for the specific user. In or...

SEE PROFILE

by Roman Y. Shtykh, Qun Jin
Harnessing user contributions and dynamic profiling to better satisfy individual information search needs

Citation Context

...ture of the World Wide Web (the web) in terms of its constantly increasing size, the great diversity of web contents, the hyperlink structure variability and the changeability of users’ search needs (Ke et al., 2006) poses a tremendous challenge for Information Retrieval (IR) on the web. Nowadays, in the era of massive information overload, no longer is web search considered solely as a match of terms contained ...

UNIVERSIDAD DE CHILE Web User Behavior Analysis

by Pablo Enrique Román Asenjo, 2011
“Scientia vincere tenebras”

Citation Context

...evisit schedule, a politeness policy, and parallel processing. Several strategies are available based on incomplete information: Breadth search ordering, Page rank based, prediction of changing pages [KDNL06]. Other issues relate to the imperfect mapping between URL and the page that is visually seen by users [RHGJ06]. A common HTML structure like frameset, groups a set of URLs in the same visual presente...
