Topical web crawlers: Evaluating adaptive algorithms
ACM Transactions on Internet Technology, 2004
Cited by 35 (11 self)
Topical crawlers are increasingly seen as a way to address the scalability limitations of universal search engines, by distributing the crawling process across users, queries, or even client computers. The context available to such crawlers can guide the navigation of links with the goal of efficiently locating highly relevant target pages. We developed a framework to fairly evaluate topical crawling algorithms under a number of performance metrics. Such a framework is employed here to evaluate different algorithms that have proven highly competitive among those proposed in the literature and in our own previous research. In particular we focus on the tradeoff between exploration and exploitation of the cues available to a crawler, and on adaptive crawlers that use machine learning techniques to guide their search. We find that the best performance is achieved by a novel combination of explorative and exploitative bias, and introduce an evolutionary crawler that surpasses the performance of the best non-adaptive crawler after sufficiently long crawls. We also analyze the computational complexity of the various crawlers and discuss how performance and complexity scale with available resources. Evolutionary crawlers achieve high efficiency and scalability by distributing the work across concurrent agents, resulting in the best performance/cost ratio.
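The exploration/exploitation tradeoff mentioned in this abstract can be illustrated with a toy best-first crawler. The following is a minimal sketch under stated assumptions, not the algorithms evaluated in the paper: the function name `best_n_first`, the in-memory `graph`, and the per-URL `scores` are all hypothetical, and link scores are attached directly to target URLs for simplicity. The parameter `n` controls how many frontier links are expanded per step; `n = 1` is purely greedy, while larger values explore more.

```python
import heapq

def best_n_first(graph, scores, seeds, n, max_pages):
    """Crawl a toy link graph, expanding the n best-scored frontier links per step.

    graph:  dict mapping a URL to the list of URLs it links to (hypothetical data)
    scores: dict mapping a URL to an estimated relevance in [0, 1] (hypothetical)
    """
    # heapq is a min-heap, so scores are negated to pop the best link first.
    frontier = [(-scores.get(url, 0.0), url) for url in seeds]
    heapq.heapify(frontier)
    queued = set(seeds)
    visited = []
    while frontier and len(visited) < max_pages:
        batch = [heapq.heappop(frontier) for _ in range(min(n, len(frontier)))]
        for _, url in batch:
            if len(visited) >= max_pages:
                break
            visited.append(url)
            for link in graph.get(url, []):
                if link not in queued:
                    queued.add(link)
                    heapq.heappush(frontier, (-scores.get(link, 0.0), link))
    return visited

# Toy example: the most relevant page "d" sits behind the mediocre link "b".
graph = {"a": ["b", "c"], "b": ["d"], "c": ["e"], "d": [], "e": []}
scores = {"a": 1.0, "b": 0.5, "c": 0.9, "d": 0.95, "e": 0.8}
print(best_n_first(graph, scores, ["a"], n=1, max_pages=4))  # ['a', 'c', 'e', 'b']
print(best_n_first(graph, scores, ["a"], n=2, max_pages=4))  # ['a', 'c', 'b', 'd']
```

In this toy graph the greedy setting spends its four-page budget without reaching "d", while the more explorative setting reaches it. This is only meant to show why some explorative bias can help.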
MySpiders: Evolve your own intelligent Web crawlers
2002
Cited by 24 (8 self)
The dynamic nature of the World Wide Web makes it a challenge to find information that is both relevant and recent. Intelligent agents can complement the power of search engines to meet this challenge. We present a Web tool called MySpiders, which implements an evolutionary algorithm managing a population of adaptive crawlers that browse the Web autonomously. Each agent acts as an intelligent client on behalf of the user, driven by a user query and by textual and linkage clues in the crawled pages. Agents autonomously decide which links to follow, which clues to internalize, when to spawn offspring to focus the search near a relevant source, and when to starve. The tool is available to the public as a threaded Java applet. We discuss the development and deployment of such a system.
Efficient and Scalable Pareto Optimization by Evolutionary Local Selection Algorithms
2000
Cited by 20 (9 self)
Local selection is a simple selection scheme in evolutionary computation. Individual fitnesses are accumulated over time and compared to a fixed threshold, rather than to each other, to decide who gets to reproduce. Local selection, coupled with fitness functions stemming from the consumption of finite shared environmental resources, maintains diversity in a way similar to fitness sharing. However, it is more efficient than fitness sharing and lends itself to parallel implementations for distributed tasks. While local selection is not prone to premature convergence, it applies minimal selection pressure to the population.
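The selection rule described in this abstract (accumulated fitness compared to a fixed threshold rather than to other individuals) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `local_selection_step` and `mutate`, the per-step `cost`, the even split of energy between parent and offspring, and death at non-positive energy are all assumptions made for the example.

```python
import random

def mutate(genome, rng, sigma=0.1):
    # Hypothetical mutation: Gaussian perturbation of a real-valued genome.
    return genome + rng.gauss(0.0, sigma)

def local_selection_step(population, gain, threshold=1.0, cost=0.2, rng=None):
    """Apply one step of threshold-based (local) selection.

    population: list of (genome, energy) pairs; energy is the accumulated fitness.
    gain:       function genome -> energy collected this step (assumed to be
                supplied by the environment).
    """
    rng = rng or random.Random(0)
    next_population = []
    for genome, energy in population:
        energy += gain(genome) - cost
        if energy > threshold:
            # Reproduce: the decision depends only on this individual's own
            # energy, never on a comparison with other individuals.
            child = mutate(genome, rng)
            next_population.append((genome, energy / 2))
            next_population.append((child, energy / 2))
        elif energy > 0:
            next_population.append((genome, energy))
        # Individuals whose energy is exhausted are dropped (assumption).
    return next_population

population = [(0.0, 0.5), (0.0, 0.1), (2.0, 0.9)]
new_population = local_selection_step(population, gain=lambda g: 0.5 * g)
print(len(new_population))  # 3: one survivor, one death, one parent plus offspring
```

Because each decision uses only local information, the loop body could run independently for each individual, which is consistent with the abstract's remark that the scheme lends itself to parallel implementations.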
Complementing Search Engines with Online Web Mining Agents
2002
Cited by 18 (6 self)
While search engines have become the major decision support tools for the Internet, there is a growing disparity between the image of the World Wide Web stored in search engine repositories and the actual dynamic, distributed nature of Web data. We propose to attack this problem using an adaptive population of intelligent agents mining the Web online at query time. We discuss the benefits and shortcomings of using dynamic search strategies versus the traditional static methods in which search and retrieval are disjoint. This paper presents a public Web intelligence tool called MySpiders, a threaded multiagent system designed for information discovery. The performance of the system is evaluated by comparing its effectiveness in locating recent, relevant documents with that of search engines. We present results suggesting that augmenting search engines with adaptive populations of intelligent search agents can lead to a significant competitive advantage. We also discuss some of the challenges of evaluating such a system on current Web data, introduce three novel metrics for this purpose, and outline some of the lessons learned in the process.
Evolving Heterogeneous Neural Agents by Local Selection
2000
Cited by 10 (5 self)
Evolutionary algorithms have been applied to the synthesis of neural architectures...
Topic-Driven Crawlers: Machine Learning Issues
ACM TOIT, Submitted, 2002
Cited by 9 (2 self)
Topic-driven crawlers are increasingly seen as a way to address the scalability limitations of universal search engines, by distributing the crawling process across users, queries, or even client computers.
Embodiment of Evolutionary Computation in General Agents
2001
Cited by 7 (1 self)
Holland's Adaptation in Natural and Artificial Systems largely dealt with how systems composed of many self-interested entities can and should adapt as a whole. This seminal book led to the last 25 years of work in genetic algorithms (GAs) and related forms of evolutionary computation (EC). In recent years, the expansion of the Internet, other telecommunications technologies, and other large-scale networks has led to a world where large numbers of semi-autonomous software entities (i.e., agents) will be interacting in an open, universal system. This development casts the importance of Holland's legacy in a new light. This paper argues that Holland's fundamental arguments, and the years of developments that have followed, have a direct impact on systems of general network agents, regardless of whether they explicitly exploit EC. However, it also argues that the techniques and theories of EC cannot be directly transferred to the world of general (rather than EC-specific) agents without examination of effects that are embodied in general software agents. This paper introduces a framework for EC interchanges between general-purpose software agents. Preliminary results are shown that illustrate the EC effects of asynchronous actions of agents within this framework.
Complementing Search Engines with Online Web Mining Agents
2000
Cited by 2 (1 self)
There is a mismatch between the static image of the World Wide Web stored in search engine repositories and the actual dynamic, distributed nature of Web data. We propose to attack this problem using an adaptive population of intelligent agents mining the Web online at query time. We discuss the benefits and shortcomings of using dynamic search strategies versus the traditional static methods in which search and retrieval are disjoint. This paper presents a new implementation of InfoSpiders, a threaded multi-agent system designed for information discovery. The performance of the system is evaluated by comparing its effectiveness in locating recent, relevant documents with that of search engines. Preliminary results suggest that augmenting search engines with adaptive populations of intelligent search agents can lead to a competitive advantage. We also discuss some of the challenges of evaluating such a system on actual, current Web data and outline some of the lessons learned in th...
Nootropia: a Self-Organising Agent for Adaptive Document Filtering
Cited by 2 (2 self)
This paper presents Nootropia, a self-organising information agent, capable of evaluating documents according to a user’s multiple and changing interests. In Nootropia, a hierarchical term network that takes into account term dependencies is used to represent a user’s multiple topics of interest. Non-linear document evaluation is established on that network based on a directed spreading activation model. We then introduce a process for adjusting the network in response to changes in user feedback. We argue that Nootropia exhibits self-organising characteristics, which, as demonstrated experimentally, allow Nootropia to adapt to a variety of simulated interest changes.
Learnable Crawling: An Efficient Approach to Topic-specific Web Resource Discovery
2002
Cited by 2 (0 self)
The rapid growth of the Internet makes it difficult to find information in such a large network of databases. At present, topic-specific web crawlers have become one way to seek the needed information. The main characteristic of a topic-specific web crawler is that it selects and retrieves only relevant web pages in each crawl. Much previous research has focused on topic-specific web crawling. However, none of it addresses how the crawler should behave in subsequent crawls. In this paper, we present an algorithm that covers both the first crawl and subsequent crawls. To make the next crawl more efficient, we keep a log of the previous crawl and use it to build knowledge bases of seed URLs, topic keywords, and URL predictions. These knowledge bases give the topic-specific web crawler the experience it needs to produce the results of the next crawl more efficiently.

