Optimizing Search Engines using Clickthrough Data
2002
Cited by 568 (20 self)
This paper presents an approach to automatically optimizing the retrieval quality of search engines using clickthrough data. Intuitively, a good information retrieval system should present relevant documents high in the ranking, with less relevant documents following below. While previous approaches to learning retrieval functions from examples exist, they typically require training data generated from relevance judgments by experts. This makes them difficult and expensive to apply. The goal of this paper is to develop a method that utilizes clickthrough data for training, namely the query-log of the search engine in connection with the log of links the users clicked on in the presented ranking. Such clickthrough data is available in abundance and can be recorded at very low cost. Taking a Support Vector Machine (SVM) approach, this paper presents a method for learning retrieval functions. From a theoretical perspective, this method is shown to be well-founded in a risk minimization framework. Furthermore, it is shown to be feasible even for large sets of queries and features. The theoretical results are verified in a controlled experiment. It shows that the method can effectively adapt the retrieval function of a meta-search engine to a particular group of users, outperforming Google in terms of retrieval quality after only a couple of hundred training examples.
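The abstract does not spell out how clicks become training examples. The following is a minimal sketch of one plausible reading, assuming that a clicked result is preferred over every unclicked result ranked above it. The function name `preference_pairs` and the data layout are hypothetical, and the SVM training step itself is not shown.

```python
def preference_pairs(ranking, clicked):
    """Derive pairwise preferences from one logged query.

    ranking: list of document ids in the order they were presented.
    clicked: iterable of document ids the user clicked on.
    Returns (preferred, less_preferred) tuples. Assumption: a clicked
    document is preferred over each unclicked document shown above it.
    """
    clicked = set(clicked)
    pairs = []
    for position, doc in enumerate(ranking):
        if doc not in clicked:
            continue
        # Every unclicked document ranked higher was presumably seen and skipped.
        for skipped in ranking[:position]:
            if skipped not in clicked:
                pairs.append((doc, skipped))
    return pairs
```

Pairs of this kind could then serve as constraints for a pairwise ranking learner; a query with no clicks, or with a click only on the top result, contributes no pairs under this assumption.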
Data Preparation for Mining World Wide Web Browsing Patterns
Knowledge and Information Systems, 1999
Cited by 367 (39 self)
The World Wide Web (WWW) continues to grow at an astounding rate in both the sheer volume of traffic and the size and complexity of Web sites. The complexity of tasks such as Web site design, Web server design, and simply navigating through a Web site has increased along with this growth. An important input to these design tasks is the analysis of how a Web site is being used. Usage analysis includes straightforward statistics, such as page access frequency, as well as more sophisticated forms of analysis, such as finding the common traversal paths through a Web site. Web Usage Mining is the application of data mining techniques to usage logs of large Web data repositories in order to produce results that can be used in the design tasks mentioned above. However, there are several preprocessing tasks that must be performed prior to applying data mining algorithms to the data collected from server logs. This paper presents several data preparation techniques in order to identify unique users and user sessions. Also, a method to divide user sessions into semantically meaningful transactions is defined and successfully tested against two other methods. Transactions identified by the proposed methods are used to discover association rules from real world data using the WEBMINER system [15].
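As an illustration of the session-identification step, here is a minimal sketch that splits each user's requests into sessions whenever the gap between consecutive requests exceeds a timeout. The 30-minute default, the `(user_key, timestamp, url)` record layout, and the function name `sessionize` are assumptions made for this example, not details taken from the abstract.

```python
from collections import defaultdict

def sessionize(entries, timeout=1800):
    """Group log entries into per-user sessions.

    entries: iterable of (user_key, timestamp_seconds, url) tuples.
    timeout: assumed maximum gap, in seconds, between requests in one session.
    Returns a list of (user_key, [urls in request order]) tuples.
    """
    by_user = defaultdict(list)
    for user, ts, url in entries:
        by_user[user].append((ts, url))

    sessions = []
    for user in sorted(by_user):
        hits = sorted(by_user[user])  # order each user's requests by time
        current = [hits[0][1]]
        last_ts = hits[0][0]
        for ts, url in hits[1:]:
            if ts - last_ts > timeout:
                # Gap too long: close the current session and start a new one.
                sessions.append((user, current))
                current = []
            current.append(url)
            last_ts = ts
        sessions.append((user, current))
    return sessions
```

In practice the user key would itself have to be inferred, for example from IP address and user agent, which is part of the preprocessing problem the abstract describes.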
A Roadmap of Agent Research and Development
Int Journal of Autonomous Agents and Multi-Agent Systems, 1998
Cited by 331 (8 self)
This paper provides an overview of research and development activities in the field of autonomous agents and multi-agent systems. It aims to identify key concepts and applications, and to indicate how they relate to one another. Some historical context to the field of agent-based computing is given, and contemporary research directions are presented. Finally, a range of open issues and future challenges are highlighted.
WebWatcher: A Tour Guide for the World Wide Web
Proceedings of IJCAI97, 1997
Cited by 290 (7 self)
We explore the notion of a tour guide software agent for assisting users browsing the World Wide Web. A Web tour guide agent provides assistance similar to that provided by a human tour guide in a museum -- it guides the user along an appropriate path through the collection, based on its knowledge of the user's interests, of the location and relevance of various items in the collection, and of the way in which others have interacted with the collection in the past. This paper describes a simple but operational tour guide, called WebWatcher, which has given over 5000 tours to people browsing CMU's School of Computer Science Web pages. WebWatcher accompanies users from page to page, suggests appropriate hyperlinks, and learns from experience to improve its advice-giving skills. We describe the learning algorithms used by WebWatcher, experimental results showing their effectiveness, and lessons learned from this case study in Web tour guide agents.
Learning to Extract Symbolic Knowledge from the World Wide Web
1998
Cited by 290 (24 self)
The World Wide Web is a vast source of information accessible to computers, but understandable only to humans. The goal of the research described here is to automatically create a computer understandable world wide knowledge base whose content mirrors that of the World Wide Web. Such a ...
A Scalable Comparison-Shopping Agent for the World-Wide Web
In Proceedings of the First International Conference on Autonomous Agents, 1997
Cited by 279 (18 self)
The Web is less agent-friendly than we might hope. Most information on the Web is presented in loosely structured natural language text with no agent-readable semantics. HTML annotations structure the display of Web pages, but provide virtually no insight into their content. Thus, the designers of intelligent Web agents need to address the following questions: (1) To what extent can an agent understand information published at Web sites? (2) Is the agent's understanding sufficient to provide genuinely useful assistance to users? (3) Is site-specific hand-coding necessary, or can the agent automatically extract information from unfamiliar Web sites? (4) What aspects of the Web facilitate this competence? In this paper we investigate these issues with a case study using the ShopBot. ShopBot is a fully implemented, domain-independent comparison-shopping agent. Given the home pages of several on-line stores, ShopBot autonomously learns how to shop at those vendors. After its learning is com...
Software agents: An overview
Knowledge Engineering Review, 1996
Cited by 272 (4 self)
Agent software is a rapidly developing area of research. However, the overuse of the word ‘agent’ has tended to mask the fact that, in reality, there is a truly heterogeneous body of research being carried out under this banner. This overview paper presents a typology of agents. Next, it places agents in context, defines them and then goes on, inter alia, to overview critically the rationales, hypotheses, goals, challenges and state-of-the-art demonstrators of the various agent types in our typology. Hence, it attempts to make explicit much of what is usually implicit in the agents literature. It also proceeds to overview some other general issues which pertain to all the types of agents in the typology. This paper largely reviews software agents, and it also contains some strong opinions that are not necessarily widely accepted by the agent community.
Learning and Revising User Profiles: The Identification of Interesting Web Sites
Machine Learning, 1997
Cited by 228 (14 self)
We discuss algorithms for learning and revising user profiles that can determine which World Wide Web sites on a given topic would be interesting to a user. We describe the use of a naive Bayesian classifier for this task, and demonstrate that it can incrementally learn profiles from user feedback on the interestingness of Web sites. Furthermore, the Bayesian classifier may easily be extended to revise user-provided profiles. In an experimental evaluation we compare the Bayesian classifier to computationally more intensive alternatives, and show that it performs at least as well as these approaches throughout a range of different domains. In addition, we empirically analyze the effects of providing the classifier with background knowledge in the form of user-defined profiles and examine the use of lexical knowledge for feature selection. We find that both approaches can substantially increase the prediction accuracy. Keywords: Information filtering, intelligent agents, multistrategy lea...
Learning to Construct Knowledge Bases from the World Wide Web
2000
Cited by 187 (3 self)
The World Wide Web is a vast source of information accessible to computers, but understandable only to humans. The goal of the research described here is to automatically create a computer understandable knowledge base whose content mirrors that of the World Wide Web. Such a knowledge base would enable much more effective retrieval of Web information, and promote new uses of the Web to support knowledge-based inference and problem solving. Our approach is to develop a trainable information extraction system that takes two inputs. The first is an ontology that defines the classes (e.g., company, person, employee, product) and relations (e.g., employed_by, produced_by) of interest when creating the knowledge base. The second is a set of training data consisting of labeled regions of hypertext that represent instances of these classes and relations. Given these inputs, the system learns to extract information from other pages and hyperlinks on the Web. This article describes our general a...
WebMate: A Personal Agent for Browsing and Searching
In Proceedings of the Second International Conference on Autonomous Agents, 1998
Cited by 164 (9 self)
The World-Wide Web is developing very fast. Currently, finding useful information on the Web is a time-consuming process. In this paper, we present WebMate, an agent that helps users to effectively browse and search the Web. WebMate extends the state of the art in Web-based information retrieval in many ways. First, it uses multiple TF-IDF vectors to keep track of user interests in different domains. These domains are automatically learned by WebMate. Second, WebMate uses the Trigger Pair Model to automatically extract keywords for refining document search. Third, during search, the user can provide multiple pages as similarity/relevance guidance for the search. The system extracts and combines relevant keywords from these relevant pages and uses them for keyword refinement. Using these techniques, WebMate provides effective browsing and searching help and also compiles and sends users a personal newspaper by automatically spidering news sources. We have experimentally evaluated the per...
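To make the TF-IDF idea concrete, the sketch below builds one TF-IDF vector per document and compares vectors by cosine similarity. It is a generic illustration, not WebMate's implementation: the whitespace tokenizer, the `tf * log(N / df)` weighting, and the function names are all assumptions.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict term -> weight) per document.

    Assumed weighting: raw term frequency times log(N / document frequency).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)
```

An agent tracking several interest domains could keep one such vector per domain and assign a new page to the domain whose vector it is most similar to; how WebMate actually learns and updates its domains is not described in the visible part of the abstract.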

