Results 1 - 10 of 10
Observing Linked Data Dynamics - In ESWC, 2013
"... Abstract. In this paper, we present the design and first results of the Dynamic Linked Data Observatory: a long-term experiment to monitor the two-hop neighbourhood of a core set of eighty thousand diverse Linked Data documents on a weekly basis. We present the methodology used for sampling the URIs ..."
Abstract - Cited by 20 (3 self)
In this paper, we present the design and first results of the Dynamic Linked Data Observatory: a long-term experiment to monitor the two-hop neighbourhood of a core set of eighty thousand diverse Linked Data documents on a weekly basis. We present the methodology used for sampling the URIs to monitor, retrieving the documents, and further crawling part of the two-hop neighbourhood. Having now run this experiment for six months, we analyse the dynamics of the monitored documents over the data collected thus far. We look at the estimated lifespan of the core documents, how often they go online or offline, and how often they change; we further investigate domain-level trends. Next we look at changes within the RDF content of the core documents across the weekly snapshots, examining the elements (i.e., triples, subjects, predicates, objects, classes) that are most frequently added or removed. Thereafter, we look at how the links between dereferenceable documents evolve over time in the two-hop neighbourhood.
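The snapshot comparison at the heart of this setup can be pictured with a small sketch (hypothetical code, not the authors' tooling; the file names are placeholders): load two weekly RDF snapshots of a document with rdflib and report the triples added and removed between them.

```python
# Hypothetical sketch: compare two weekly RDF snapshots of one Linked Data
# document and report which triples were added or removed. The snapshot
# file names are placeholders, not part of the original study.
from rdflib import Graph

def load_triples(path):
    """Parse an N-Triples snapshot into a plain set of (s, p, o) tuples."""
    g = Graph()
    g.parse(path, format="nt")
    return set(g)

week1 = load_triples("snapshot-week1.nt")
week2 = load_triples("snapshot-week2.nt")

added = week2 - week1      # triples present only in the later snapshot
removed = week1 - week2    # triples dropped since the earlier snapshot

print(f"{len(added)} triples added, {len(removed)} triples removed")
for s, p, o in list(added)[:10]:
    print("+", s, p, o)
```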
Mining User Preference Using Spy Voting for Search Engine Personalization
"... This paper addresses search engine personalization. We present a new approach to mining a user’s preferences on the search results from clickthrough data and using the discovered preferences to adapt the search engine’s ranking function for improving search quality. We develop a new preference minin ..."
Abstract - Cited by 14 (2 self)
This paper addresses search engine personalization. We present a new approach to mining a user’s preferences on the search results from clickthrough data and using the discovered preferences to adapt the search engine’s ranking function for improving search quality. We develop a new preference mining technique called SpyNB, which is based on the practical assumption that the search results clicked on by the user reflect the user’s preferences, but it does not draw any conclusions about the results that the user did not click on. As such, SpyNB is still valid even if the user does not follow any order in reading the search results or does not click on all relevant results. Our extensive offline experiments demonstrate that SpyNB discovers many more accurate preferences than existing algorithms do. The interactive online experiments further confirm that SpyNB and our personalization approach are effective in practice. We also show that the efficiency of SpyNB is comparable to existing simple preference mining algorithms.
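The spy idea that SpyNB builds on, treating clicked results as positives and unclicked results as unlabeled, can be sketched roughly as follows (a toy illustration with invented queries, not the paper's implementation): a few positives are planted as "spies" in the unlabeled set, a Naive Bayes classifier is trained, and the spies' scores calibrate a threshold for spotting likely non-preferred results.

```python
# Hedged sketch of the spy trick used in positive/unlabeled learning, which
# SpyNB-style preference mining builds on (not the authors' implementation):
# clicked results are positives, unclicked results are unlabeled, and "spy"
# positives hidden in the unlabeled set calibrate a negativity threshold.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

clicked = ["cheap flights to rome", "rome budget airline deals",
           "low cost flights rome"]                         # positives (clicked)
unclicked = ["rome history museum", "flights to rome timetable",
             "ancient rome documentary"]                    # unlabeled (unclicked)

rng = np.random.default_rng(0)
spy_idx = set(rng.choice(len(clicked), size=max(1, len(clicked) // 5), replace=False))
spies = [clicked[i] for i in spy_idx]
positives = [d for i, d in enumerate(clicked) if i not in spy_idx]

docs = positives + unclicked + spies
labels = [1] * len(positives) + [0] * (len(unclicked) + len(spies))  # spies posed as unlabeled

vec = CountVectorizer()
X = vec.fit_transform(docs)
nb = MultinomialNB().fit(X, labels)

# Threshold: the lowest P(positive) any spy receives; unlabeled results
# scoring below it are treated as likely non-preferred.
p_pos = nb.predict_proba(X)[:, 1]
threshold = p_pos[len(positives) + len(unclicked):].min()
unlabeled_scores = p_pos[len(positives):len(positives) + len(unclicked)]
likely_negative = [d for d, p in zip(unclicked, unlabeled_scores) if p < threshold]
print("Likely non-preferred results:", likely_negative)
```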
A method for measuring the evolution of a topic on the Web: The case of “informetrics” - Journal of the American Society for Information Science and Technology, 2009
"... The World Wide Web is growing at an enormous speed, and has become an indispensable source for information and research. New pages are being added to the Web, but there are additional processes as well: pages are moved or removed and/or their content changes. In order to obtain a better understandin ..."
Abstract - Cited by 6 (0 self)
The World Wide Web is growing at an enormous speed and has become an indispensable source for information and research. New pages are being added to the Web, but there are additional processes as well: pages are moved or removed, and/or their content changes. In order to obtain a better understanding of these processes, we developed a method for tracking topics on the Web for long periods of time. We use multiple data collection methods that allow us to discover new pages related to the topic, to identify changes to existing pages, and to detect previously existing pages that have been removed or whose content is no longer relevant to the specified topic. The method is demonstrated by monitoring Web pages that contain the term “informetrics” for a period of eight years. The data collection method also allowed us to analyze the dynamic changes in search engine coverage; here we illustrate these changes on Google, the search engine used for the longest period of time for data collection in this project.
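The bookkeeping such long-term tracking requires can be illustrated with a minimal sketch (hypothetical, with made-up URLs and content hashes): compare the sets of pages collected for the topic at two dates and classify each page as new, removed, or changed.

```python
# Hypothetical sketch of topic-tracking bookkeeping: compare the URL sets
# collected for a topic query at two points in time and classify each page
# as new, disappeared, or (via content hashes) changed.
def compare_snapshots(old, new):
    """old, new: dicts mapping URL -> content hash for one collection round."""
    appeared = set(new) - set(old)
    disappeared = set(old) - set(new)
    surviving = set(old) & set(new)
    changed = {u for u in surviving if old[u] != new[u]}
    return appeared, disappeared, changed

# Toy example with made-up URLs and hashes:
old = {"http://example.org/a": "h1", "http://example.org/b": "h2"}
new = {"http://example.org/b": "h9", "http://example.org/c": "h3"}
print(compare_snapshots(old, new))
# -> ({'http://example.org/c'}, {'http://example.org/a'}, {'http://example.org/b'})
```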
A three-year study on the freshness of Web search engine databases, 2008
Time series analysis of the dynamics of news websites
"... Abstract-The content of news websites changes frequently and rapidly and its relevance tends to decay with time. To be of any value to the users, tools, such as, search engines, have to cope with these evolving websites and detect in a timely manner their changes. In this paper we apply time series ..."
Abstract
The content of news websites changes frequently and rapidly, and its relevance tends to decay with time. To be of any value to users, tools such as search engines have to cope with these evolving websites and detect their changes in a timely manner. In this paper we apply time series analysis to study the properties and the temporal patterns of the change rates of the content of three news websites. Our investigation shows that changes are characterized by large fluctuations with periodic patterns and time-dependent behavior. The time series describing the change rate is decomposed into trend, seasonal and irregular components, and models of each component are then identified. The trend and seasonal components describe the daily and weekly patterns of the change rates. Trigonometric polynomials best fit these deterministic components, whereas the class of ARMA models represents the irregular component. The resulting models can be used to describe the dynamics of the changes and predict future change rates.
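A minimal sketch of the modelling pipeline described above, assuming an hourly series of change counts (the data below are synthetic and the ARMA order is chosen only for illustration): fit daily and weekly trigonometric terms by least squares, then fit an ARMA model to the irregular remainder with statsmodels.

```python
# Minimal sketch, assuming an hourly series of change counts (synthetic here):
# fit daily and weekly trigonometric terms by least squares, then model the
# irregular remainder with an ARMA process.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

hours = np.arange(24 * 7 * 8)                        # eight weeks of hourly data
rng = np.random.default_rng(1)
changes = (5 + 3 * np.sin(2 * np.pi * hours / 24)    # synthetic daily cycle
             + 2 * np.sin(2 * np.pi * hours / (24 * 7))
             + rng.normal(0, 1, hours.size))

def harmonics(t, period):
    """Sine/cosine pair at the given period (a trigonometric regressor)."""
    w = 2 * np.pi * t / period
    return np.column_stack([np.sin(w), np.cos(w)])

# Design matrix: intercept plus harmonics at daily and weekly periods.
X = np.column_stack([np.ones_like(hours, dtype=float),
                     harmonics(hours, 24), harmonics(hours, 24 * 7)])
coef, *_ = np.linalg.lstsq(X, changes, rcond=None)
deterministic = X @ coef                             # trend + seasonal part
irregular = changes - deterministic                  # what remains to model

# ARMA(2,1) for the irregular component (order chosen for illustration only).
arma = ARIMA(irregular, order=(2, 0, 1)).fit()
print(arma.params)
```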
Chapter 2 Advanced Techniques in Web Data Pre-processing and Cleaning
"... Abstract. Central to successful e-business is the construction of web sites that attract users, capture user preferences, and entice them into making a purchase. Web mining is diverse data mining applied to categorize both the content and structure of web sites with the goal of aiding e-business. We ..."
Abstract
Central to successful e-business is the construction of web sites that attract users, capture user preferences, and entice them into making a purchase. Web mining is diverse data mining applied to categorize both the content and structure of web sites with the goal of aiding e-business. Web mining requires knowledge of the web site structure (hyperlink graph), the web content (vector model) and user sessions (the sequence of pages each user visits on a site). Much of the data for web mining can be noisy. The noise comes from many sources, for example, undocumented changes to the web site structure and content, differing interpretations of text and media semantics, and web logs without individual user identification. There may be no record of the number of times a specific page has been visited in a session when the page is served from a proxy or web browser cache. Such noise presents a challenge for web mining. This chapter presents issues with and approaches for cleaning web data in preparation for web mining analysis. The Web has become the primary communication channel for many financial, trading,
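One cleaning step of the kind discussed here, reconstructing user sessions from a raw access log, can be sketched as follows (a simplified illustration assuming a 30-minute inactivity timeout; the field names are invented, and this is not the chapter's exact procedure).

```python
# Simplified sketch of log sessionization: group a user's page requests into
# sessions, starting a new session after 30 minutes of inactivity. The field
# names and the timeout value are illustrative assumptions.
from datetime import datetime, timedelta

TIMEOUT = timedelta(minutes=30)

def sessionize(requests):
    """requests: list of (user_id, timestamp, url), sorted by timestamp."""
    sessions = {}                  # user_id -> list of sessions (lists of urls)
    last_seen = {}                 # user_id -> timestamp of previous request
    for user, ts, url in requests:
        if user not in sessions or ts - last_seen[user] > TIMEOUT:
            sessions.setdefault(user, []).append([])   # open a new session
        sessions[user][-1].append(url)
        last_seen[user] = ts
    return sessions

log = [("u1", datetime(2024, 1, 1, 9, 0), "/home"),
       ("u1", datetime(2024, 1, 1, 9, 5), "/products"),
       ("u1", datetime(2024, 1, 1, 10, 0), "/home")]   # 55-minute gap -> new session
print(sessionize(log))   # {'u1': [['/home', '/products'], ['/home']]}
```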
Context-aware Search for Personal Information Management Systems
"... With the fast growth of disk capacity in personal computers, keyword search over personal data (a.k.a. desktop search) is becoming increasingly important. Nonetheless, desktop search has been shown to be more challenging than traditional Web search. Modern commercial Web search engines heavily rely ..."
Abstract
With the fast growth of disk capacity in personal computers, keyword search over personal data (a.k.a. desktop search) is becoming increasingly important. Nonetheless, desktop search has been shown to be more challenging than traditional Web search. Modern commercial Web search engines heavily rely on structural information (i.e., hyperlinks between Web pages) to rank their search results. However, such information is not available in the setting of desktop search. Therefore, state-of-the-art desktop search systems such as Google Desktop Search usually leverage purely text-based ranking approaches (e.g., TF-IDF), which often fail to give promising rankings due to the misinterpretation of user intention. We observed that in desktop search, the semantics of keyword queries are often context-aware, i.e., they are related to the current activity state (e.g., writing a paper, navigating a website, etc.) of the user. In this paper, we present a novel context-aware search framework that takes this activity information into consideration. Specifically, we use a Hidden Markov Model (HMM) to capture the relationships between the user's access actions (e.g., opening/closing files, sending/receiving emails, etc.) and activity states. The model is learned from the user's past access history and is used to predict the user's current activity upon the submission of a keyword query. We further propose a ranking scheme that incorporates this predicted context information. Experimental evaluation demonstrates both the effectiveness of the proposed context-aware search method and the enhancement to the user's search experience.
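The role the HMM plays can be illustrated with a toy forward-algorithm pass (the states, actions, and probabilities are invented for the example, and this is not the paper's model): given a sequence of observed access actions, compute a belief about the user's current activity, which a ranker could then use to boost matching results.

```python
# Toy illustration (invented states, actions, and probabilities) of how an HMM
# turns a sequence of observed access actions into a belief about the user's
# current activity: a plain forward-algorithm pass with numpy.
import numpy as np

states = ["writing_paper", "browsing_web"]
actions = ["open_pdf", "open_editor", "open_browser"]

start = np.array([0.5, 0.5])                    # P(initial state)
trans = np.array([[0.8, 0.2],                   # P(next state | current state)
                  [0.3, 0.7]])
emit = np.array([[0.5, 0.4, 0.1],               # P(action | state)
                 [0.1, 0.1, 0.8]])

def current_activity(observed):
    """Forward algorithm: return P(state | observed action sequence)."""
    idx = [actions.index(a) for a in observed]
    belief = start * emit[:, idx[0]]
    for o in idx[1:]:
        belief = (belief @ trans) * emit[:, o]
    belief /= belief.sum()                      # normalise to a distribution
    return dict(zip(states, belief))

print(current_activity(["open_editor", "open_pdf", "open_pdf"]))
# "writing_paper" dominates here, so its related terms could be boosted in ranking.
```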
Harnessing user contributions and dynamic profiling to better satisfy individual information search needs
Web User Behavior Analysis - UNIVERSIDAD DE CHILE, 2011
"... “Scientia vincere tenebras” ..."