A method for measuring the evolution of a topic on the Web: The case of “informetrics”
Journal of the American Society for Information Science and Technology, 2009
Cited by 6 (0 self)
The World Wide Web is growing at an enormous speed and has become an indispensable source for information and research. New pages are being added to the Web, but other processes are at work as well: pages are moved or removed and/or their content changes. To obtain a better understanding of these processes, we developed a method for tracking topics on the Web over long periods of time. We use multiple data collection methods that allow us to discover new pages related to the topic, to identify changes to existing pages, and to detect previously existing pages that have been removed or whose content is no longer relevant to the specified topic. The method is demonstrated by monitoring Web pages that contain the term “informetrics” over a period of eight years. The data collection method also allowed us to analyze dynamic changes in search engine coverage; here we illustrate these changes for Google, the search engine used for the longest period of data collection in this project.
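The tracking method in this abstract distinguishes pages that appear, change, disappear, or stop being relevant between data collection rounds. Below is a minimal sketch of that classification step. It assumes each snapshot is a mapping from URL to page text and that relevance is a simple case-insensitive match on the topic term; the abstract does not specify the relevance test, and `compare_snapshots`, `SnapshotDiff`, and the toy data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SnapshotDiff:
    new: set[str]                 # relevant now, not relevant (or absent) before
    changed: set[str]             # relevant in both rounds, text differs
    unchanged: set[str]           # relevant in both rounds, identical text
    removed: set[str]             # relevant before, URL gone from the new round
    no_longer_relevant: set[str]  # URL still present, but the term is gone


def is_relevant(text: str, term: str) -> bool:
    # Assumed relevance test: case-insensitive substring match on the topic term.
    return term.lower() in text.lower()


def compare_snapshots(old: dict[str, str], new: dict[str, str], term: str) -> SnapshotDiff:
    """Classify URLs (URL -> page text) between two data collection rounds."""
    old_rel = {u for u, t in old.items() if is_relevant(t, term)}
    new_rel = {u for u, t in new.items() if is_relevant(t, term)}
    kept = old_rel & new_rel
    changed = {u for u in kept if old[u] != new[u]}
    dropped = old_rel - new_rel
    return SnapshotDiff(
        new=new_rel - old_rel,
        changed=changed,
        unchanged=kept - changed,
        removed={u for u in dropped if u not in new},
        no_longer_relevant={u for u in dropped if u in new},
    )


# Toy example with hypothetical URLs and texts.
old = {
    "a.html": "Informetrics intro",
    "b.html": "informetrics data",
    "c.html": "informetrics course",
    "e.html": "informetrics and bibliometrics",
}
new = {
    "a.html": "Informetrics intro",
    "b.html": "informetrics data, revised",
    "d.html": "A new informetrics page",
    "e.html": "bibliometrics only",
}
diff = compare_snapshots(old, new, "informetrics")
print(diff)
```

This sketch treats a moved page (same content under a new URL) as one removal plus one new page; detecting moves would need a content comparison across URLs.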
Semantic Web Mining and the Representation, Analysis, and Evolution of Web Space
In Proceedings of RAWS 2005 Workshop, 2005
Cited by 5 (0 self)
Semantic Web Mining aims at combining the two fast-developing research areas Semantic Web and Web Mining. This survey analyzes the convergence of trends from both areas: Growing numbers of researchers work on improving the results of Web Mining by exploiting semantic structures in the Web, and they use Web Mining techniques for building the Semantic Web. Last but not least, these techniques can be used for mining the Semantic Web itself. The second aim of this paper is to use these concepts to circumscribe what Web space is, what it represents and how it can be represented and analyzed. This is used to sketch the role that Semantic Web Mining and the software agents and human agents involved in it can play in the evolution of Web space.
Understanding Content Reuse on the Web: Static and Dynamic Analyses
Cited by 2 (1 self)
In this paper we present static and dynamic studies of duplicate and near-duplicate documents on the Web. The static study analyzes similar content among pages within a given snapshot of the Web; the dynamic study analyzes how pages in an old snapshot are reused to compose new documents in a more recent snapshot. We ran a series of experiments using four snapshots of the Chilean Web. In the static study, we identify duplicates in both parts of the Web graph, the reachable component (pages connected by links) and the unreachable components (unconnected pages), to determine where duplicates occur more frequently. We show that the number of duplicates on the Web seems to be much higher than previously reported (about 50% higher), and that in our data the number of duplicates in the unreachable Web is 74.6% higher than in the reachable component of the Web graph. In the dynamic study, we show that some old content is used to compose new pages. If a page in a newer snapshot contains content from a page in an older snapshot, we say that the source is a parent of the new page. We hypothesize that people use search engines to find pages and republish their content as new documents, and we present evidence that this happens for some of the pages that have parents. If so, part of the Web content is biased by the ranking function of search engines.
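The abstract does not say how duplicates or parents are detected. One common approach, swapped in here purely for illustration, is w-shingling. Near-duplicates within a snapshot are pairs whose shingle sets have high Jaccard similarity. An old page is treated as a parent of a new page when a large share of the new page's shingles (its containment) also appears in the old page. The shingle width, both thresholds, the rule that a URL cannot be its own parent, and all function names are assumptions, not details from the paper.

```python
def shingles(text: str, w: int = 4) -> set[tuple[str, ...]]:
    """Contiguous w-word shingles of a document (lowercased, whitespace-split)."""
    words = text.lower().split()
    if len(words) < w:
        # Short documents become a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(snapshot: dict[str, str], threshold: float = 0.9,
                    w: int = 4) -> list[tuple[str, str]]:
    """Static study: pairs of URLs in one snapshot with near-identical content."""
    sh = {url: shingles(text, w) for url, text in snapshot.items()}
    urls = sorted(sh)
    return [
        (u, v)
        for i, u in enumerate(urls)
        for v in urls[i + 1:]
        if jaccard(sh[u], sh[v]) >= threshold
    ]


def find_parents(old: dict[str, str], new: dict[str, str],
                 min_containment: float = 0.5, w: int = 4) -> dict[str, set[str]]:
    """Dynamic study: map each new page to the old pages it reuses content from.

    An old page counts as a parent when at least `min_containment` of the new
    page's shingles also occur in it (an assumed rule, not the paper's).
    """
    old_sh = {url: shingles(text, w) for url, text in old.items()}
    parents: dict[str, set[str]] = {}
    for url, text in new.items():
        s = shingles(text, w)
        if not s:
            continue
        found = {
            p for p, ps in old_sh.items()
            # Assumption: a URL is never counted as its own parent.
            if p != url and len(s & ps) / len(s) >= min_containment
        }
        if found:
            parents[url] = found
    return parents


# Toy example with hypothetical URLs and texts.
old = {"src.html": "the quick brown fox jumps over the lazy dog today"}
new = {
    "copy.html": "the quick brown fox jumps over the lazy dog today",
    "mirror.html": "the quick brown fox jumps over the lazy dog today",
    "fresh.html": "an entirely different page about web crawling and search engines",
}
print(near_duplicates(new))
print(find_parents(old, new))
```

The pairwise comparison is quadratic in the number of pages. At the scale of a national Web snapshot, MinHash signatures with locality-sensitive hashing are the usual way to avoid comparing every pair.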
unknown title
An approach to semantic indexing and information retrieval (Spanish: “Una aproximación a la indexación semántica y a la recuperación de información”)
Genealogical Trees on the Web: A Search Engine User Perspective
Beijing, China
This paper presents an extensive study of the evolution of textual content on the Web, showing how some new pages are created from scratch while others are created from already existing content. We show that a significant fraction of the Web is a byproduct of the latter case. We introduce the concept of a Web genealogical tree, in which every page in a Web snapshot is classified into a component. We study these components in detail, characterizing the copies and identifying the relation between a source of content and a search engine by comparing page relevance measures, documents returned by real queries performed in the past, and click-through data. We observe that sources of copies are returned by queries more frequently and clicked more often than other documents.
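The last comparison in this abstract, whether sources of copies attract more clicks than other documents, can be sketched as follows. The sketch assumes a parent mapping (each new page URL mapped to the set of URLs it copied from) and a hypothetical click log of per-URL click counts. It covers only the click-through comparison, not the relevance-measure or query-result analyses. `mean_clicks_by_role` and the toy data are hypothetical.

```python
from statistics import mean


def mean_clicks_by_role(parents: dict[str, set[str]],
                        clicks: dict[str, int]) -> tuple[float, float]:
    """Return (mean clicks of source pages, mean clicks of all other pages)."""
    # Every URL copied by at least one newer page counts as a source.
    sources = set().union(*parents.values())
    source_clicks = [n for url, n in clicks.items() if url in sources]
    other_clicks = [n for url, n in clicks.items() if url not in sources]
    # Guard against empty groups, for which the mean is undefined.
    src_mean = mean(source_clicks) if source_clicks else 0.0
    other_mean = mean(other_clicks) if other_clicks else 0.0
    return src_mean, other_mean


# Toy data (hypothetical): both copies reuse "a.html"; copy2 also reuses "b.html".
parents = {"copy1.html": {"a.html"}, "copy2.html": {"a.html", "b.html"}}
clicks = {"a.html": 40, "b.html": 20, "copy1.html": 5, "copy2.html": 3, "c.html": 4}
src_mean, other_mean = mean_clicks_by_role(parents, clicks)
print(src_mean, other_mean)
```

A raw difference in means like this says nothing about significance. A study would compare distributions, for example with a rank-based test, before drawing conclusions.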
“CRAWL.PL”: Measuring Statistical and Structural Properties of the Polish Web
This document summarizes the results of an experiment made in the
Semantic Web Mining and the Representation, Semantic Analysis, and Evolution of Web Space
Semantic Web Mining aims at combining the two fast-developing research areas Semantic Web and Web Mining. This survey analyzes the convergence of trends from both areas: Growing numbers of researchers work on improving the results of Web Mining by exploiting semantic structures in the Web, and they use Web Mining techniques for building the Semantic Web. Last but not least, these techniques can be used for mining the Semantic Web itself. The second aim of this paper is to use these concepts to circumscribe what Web space is, what it represents and how it can be represented and analyzed. This is used to sketch the role that Semantic Web Mining and the software agents and human agents involved in it can play in the evolution of Web space.