Results 1 - 5 of 5
Topical web crawlers: Evaluating adaptive algorithms
- ACM Transactions on Internet Technology, 2004
Cited by 35 (11 self)
Topical crawlers are increasingly seen as a way to address the scalability limitations of universal search engines, by distributing the crawling process across users, queries, or even client computers. The context available to such crawlers can guide the navigation of links with the goal of efficiently locating highly relevant target pages. We developed a framework to fairly evaluate topical crawling algorithms under a number of performance metrics. Such a framework is employed here to evaluate different algorithms that have proven highly competitive among those proposed in the literature and in our own previous research. In particular we focus on the tradeoff between exploration and exploitation of the cues available to a crawler, and on adaptive crawlers that use machine learning techniques to guide their search. We find that the best performance is achieved by a novel combination of explorative and exploitative bias, and introduce an evolutionary crawler that surpasses the performance of the best non-adaptive crawler after sufficiently long crawls. We also analyze the computational complexity of the various crawlers and discuss how performance and complexity scale with available resources. Evolutionary crawlers achieve high efficiency and scalability by distributing the work across concurrent agents, resulting in the best performance/cost ratio.
Personalized and Focused Web Spiders
- YAO (EDS.), WEB INTELLIGENCE, 2003
Cited by 11 (7 self)
As the size of the Web continues to grow, searching it for useful information has become increasingly difficult. Researchers have studied different ways to search the Web automatically using programs that have been known as spiders, crawlers, Web robots, Web agents, Webbots, etc. In this chapter, we will review research in this area, present two case studies, and suggest some future research directions.
The Use of Dynamic Contexts to Improve Casual Internet Searching
- ACM TRANSACTIONS ON INFORMATION SYSTEMS, 2003
World Wide Web Search Technologies
Cited by 5 (0 self)
With over 800 million pages covering most areas of human endeavor, the World Wide Web is fertile ground for information retrieval. Numerous search technologies have been applied to Web searches, and the dominant search method has yet to be identified. This chapter provides an overview of existing Web search technologies and classifies them into six categories: (i) hyperlink exploration, (ii) information retrieval, (iii) metasearches, (iv) SQL approaches, (v) content-based multimedia searches, and (vi) others. A comparative study of some major commercial and experimental search services is presented, and some future research directions for Web searches are suggested.
An Agent for Web Information Dissemination
- 2003
We present and evaluate the architecture of a personal agent that mines web information sources and retrieves documents according to the user's interests. The agent represents and retrieves documents using classical information retrieval techniques and uses the user's feedback and genetic algorithms to learn and adapt to changes in the user's interests. The agent was customized to help professionals who need to stay up to date with the Brazilian Tax Legislation, stored at the Brazilian Federal Revenue site. Experiments discussed in the paper show that the agent has good precision and recall rates when retrieving documents, and is also able to successfully detect changes in the user's interests.

