Human performance on clustering web pages: a preliminary study
- In Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining (KDD-98), 1998
- Cited by 26 (1 self)
With the increase in information on the World Wide Web, it has become difficult to quickly find desired information without using multiple queries or a topic-specific search engine. One way to help in the search is by grouping together HTML pages that appear in some way to be related. In order to better understand this task, we performed an initial study of human clustering of web pages, in the hope that it would provide some insight into the difficulty of automating this task. Our results show that subjects did not cluster identically; in fact, on average, any two subjects had little similarity in their web-page clusters. We also found that subjects generally created rather small clusters, and those with access only to URLs created fewer clusters than those with access to the full text of each web page. Generally, the overlap of documents between clusters for any given subject increased when the subject was given the full text, as did the percentage of documents clustered. When analyzing individual subjects, we found that each behaved differently across queries in terms of overlap, size of clusters, and number of clusters. These results provide a sobering note on any quest for a single clearly correct clustering method for web pages.
Using Mobile Crawlers to Search the Web Efficiently
- International Journal of Computer and Information Science, 2000
- Cited by 9 (0 self)
Due to the enormous growth of the World Wide Web, search engines have become indispensable tools for Web navigation. In order to provide powerful search facilities, search engines maintain comprehensive indices for documents and their contents on the Web by continuously downloading Web pages for processing. In this paper, we demonstrate an alternative, more efficient approach to the "download-first process-later" strategy of existing search engines by using mobile crawlers. The major advantage of the mobile approach is that the analysis portion of the crawling process is done locally where the data resides rather than remotely inside the Web search engine. This can significantly reduce network load, which, in turn, can improve the performance of the crawling process.
An Approach to Mobile Software Robots for the WWW
- IEEE Transactions on Knowledge and Data Engineering, 1999
- Cited by 5 (3 self)
The paper describes a framework for developing mobile software robots by using the Planet mobile object system, which is characterized by a language-neutral layered architecture, the native code execution of mobile objects, and asynchronous object passing. We propose an approach to implementing mobile Web search robots that takes full advantage of these characteristics, and we base our discussion of its effectiveness on experiments conducted in the Internet environment.
Human performance on clustering web pages
- 1998
- Cited by 2 (1 self)
With the increase in information on the World Wide Web, it has become difficult to find desired information quickly without using multiple queries or a topic-specific search engine. One way to help in the search is by grouping together HTML pages that appear in some way to be related. In order to better understand this task, we performed an initial study of human clustering of web pages, in the hope that it would provide some insight into the difficulty of automating this task. Our results show that subjects did not cluster identically; in fact, on average, any two subjects had little similarity in their web-page clusters. We also found that subjects generally created rather small clusters, and those with access only to URLs created fewer clusters than those with access to the full text of each web page. Generally, the overlap of documents between clusters for any given subject increased when the subject was given the full text, as did the percentage of documents clustered. When analyzing individual subjects, we found that each behaved differently across queries in terms of overlap, size of clusters, and number of clusters. These results provide a sobering note on any quest for a single clearly correct clustering method for web pages.
Indexing and Searching Virtual Libraries
It is well known that selectivity leaves a lot to be desired in searching for information resources on the Internet with existing search systems [DESA4]. This has prompted a number of researchers to turn their attention to the development and implementation of models for indexing and searching information resources on the Internet. In this white paper we examine briefly the results of a simple query on a number of existing search systems and then discuss two proposed index metadata structures for indexing and supporting search and discovery: the Dublin Core Elements List and the Semantic Header.
Introduction
Access to relevant information is one of the most important requirements of all human endeavours. This need has been recognized and has resulted in the continuing effort to describe and organize information so as to facilitate its expected discovery and ready access. An increasing number of research institutes, universities and business organizations are currently providing th...

