Results 1 - 10 of 139
Letizia: An agent that assists web browsing
- Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence (IJCAI-95), 1995
"... ..."
Data Preparation for Mining World Wide Web Browsing Patterns
- Knowledge and Information Systems, 1999
"... The World Wide Web (WWW) continues to grow at an astounding rate in both the sheer volume of tra#c and the size and complexity of Web sites. The complexity of tasks such as Web site design, Web server design, and of simply navigating through a Web site have increased along with this growth. An i ..."
Abstract
-
Cited by 567 (43 self)
- Add to MetaCart
(Show Context)
The World Wide Web (WWW) continues to grow at an astounding rate in both the sheer volume of traffic and the size and complexity of Web sites. The complexity of tasks such as Web site design, Web server design, and of simply navigating through a Web site has increased along with this growth. An important input to these design tasks is the analysis of how a Web site is being used. Usage analysis includes straightforward statistics, such as page access frequency, as well as more sophisticated forms of analysis, such as finding the common traversal paths through a Web site. Web Usage Mining is the application of data mining techniques to usage logs of large Web data repositories in order to produce results that can be used in the design tasks mentioned above. However, there are several preprocessing tasks that must be performed prior to applying data mining algorithms to the data collected from server logs. This paper presents several data preparation techniques in order to identify unique users and user sessions. Also, a method to divide user sessions into semantically meaningful transactions is defined and successfully tested against two other methods. Transactions identified by the proposed methods are used to discover association rules from real-world data using the WEBMINER system [15].
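The sessionization step this abstract refers to can be sketched in a few lines. A common heuristic (the paper evaluates several techniques; this is not necessarily its exact method) starts a new session whenever a user's consecutive requests are separated by more than a fixed timeout, often 30 minutes; the tuple format below is an assumption for illustration.

```python
from collections import defaultdict
from datetime import timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # common heuristic threshold

def sessionize(entries):
    """Split server-log entries into per-user sessions.

    `entries` is assumed to be an iterable of (user_id, timestamp, url)
    tuples already parsed from the server log and sorted by timestamp;
    a "user" here is whatever identifier the cleaning step produced
    (e.g. IP address plus user agent).
    """
    sessions = defaultdict(list)   # user_id -> list of sessions
    last_seen = {}                 # user_id -> timestamp of last request
    for user, ts, url in entries:
        # Start a new session on first sight or after a long silence.
        if user not in last_seen or ts - last_seen[user] > SESSION_TIMEOUT:
            sessions[user].append([])
        sessions[user][-1].append((ts, url))
        last_seen[user] = ts
    return sessions
```

The resulting sessions are what a transaction-identification method would then split further into semantically meaningful units.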
A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization
1997
"... The Rocchio relevance feedback algorithm is one of the most popular and widely applied learning methods from information retrieval. Here, a probabilistic analysis of this algorithm is presented in a text categorization framework. The analysis gives theoretical insight into the heuristics used in the ..."
Abstract
-
Cited by 456 (1 self)
- Add to MetaCart
The Rocchio relevance feedback algorithm is one of the most popular and widely applied learning methods from information retrieval. Here, a probabilistic analysis of this algorithm is presented in a text categorization framework. The analysis gives theoretical insight into the heuristics used in the Rocchio algorithm, particularly the word weighting scheme and the similarity metric. It also suggests improvements which lead to a probabilistic variant of the Rocchio classifier. The Rocchio classifier, its probabilistic variant, and a naive Bayes classifier are compared on six text categorization tasks. The results show that the probabilistic algorithms are preferable to the heuristic Rocchio classifier not only because they are more well-founded, but also because they achieve better performance.
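The heuristic Rocchio/TFIDF classifier the analysis targets can be stated compactly: build one TF-IDF centroid per class and assign new documents to the nearest centroid by cosine similarity. The sketch below uses scikit-learn's vectorizer as a convenience; it shows the textbook classifier, not the paper's probabilistic variant.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def train_rocchio(docs, labels):
    """Build one TF-IDF prototype (centroid) vector per class."""
    vec = TfidfVectorizer()
    X = vec.fit_transform(docs).toarray()
    prototypes = {}
    for c in set(labels):
        rows = X[[i for i, y in enumerate(labels) if y == c]]
        p = rows.mean(axis=0)
        prototypes[c] = p / (np.linalg.norm(p) or 1.0)  # unit length
    return vec, prototypes

def classify(vec, prototypes, doc):
    """Assign the class whose prototype is most cosine-similar."""
    x = vec.transform([doc]).toarray()[0]
    x = x / (np.linalg.norm(x) or 1.0)
    return max(prototypes, key=lambda c: x @ prototypes[c])
```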
Learning to Extract Symbolic Knowledge from the World Wide Web
1998
"... The World Wide Web is a vast source of information accessible to computers, but understandable only to humans. The goal of the research described here is to automatically create a computer understandable world wide knowledge base whose content mirrors that of the World Wide Web. Such a ..."
Abstract
-
Cited by 403 (29 self)
- Add to MetaCart
(Show Context)
The World Wide Web is a vast source of information accessible to computers, but understandable only to humans. The goal of the research described here is to automatically create a computer understandable world wide knowledge base whose content mirrors that of the World Wide Web. Such a ...
WebWatcher: A Tour Guide for the World Wide Web
- Proceedings of IJCAI-97, 1997
"... We explore the notion of a tour guide software agent for assisting users browsing the World Wide Web. A Web tour guide agent provides assistance similar to that provided by ahuman tour guide in a museum -- it guides the user along an appropriate path through the collection, based on its knowledge of ..."
Abstract
-
Cited by 359 (8 self)
- Add to MetaCart
We explore the notion of a tour guide software agent for assisting users browsing the World Wide Web. A Web tour guide agent provides assistance similar to that provided by a human tour guide in a museum -- it guides the user along an appropriate path through the collection, based on its knowledge of the user's interests, of the location and relevance of various items in the collection, and of the way in which others have interacted with the collection in the past. This paper describes a simple but operational tour guide, called WebWatcher, which has given over 5000 tours to people browsing CMU's School of Computer Science Web pages. WebWatcher accompanies users from page to page, suggests appropriate hyperlinks, and learns from experience to improve its advice-giving skills. We describe the learning algorithms used by WebWatcher, experimental results showing their effectiveness, and lessons learned from this case study in Web tour guide agents.
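As a rough illustration of link suggestion (WebWatcher's actual advice comes from the learning algorithms the paper describes, not from this rule), one could rank the hyperlinks on the current page by keyword overlap with the user's stated interests; all names and data below are hypothetical.

```python
def suggest_links(links, interest_keywords, top_k=3):
    """Rank hyperlinks on a page against a user's interest keywords.

    `links` is assumed to be a list of (url, anchor_text) pairs; the
    score is plain keyword overlap, a stand-in for a learned ranking.
    """
    interests = {w.lower() for w in interest_keywords}

    def score(link):
        url, anchor = link
        return len(set(anchor.lower().split()) & interests)

    ranked = sorted(links, key=score, reverse=True)
    return [url for url, _ in ranked[:top_k]]

# Example: a user touring a CS department site, interested in machine learning.
links = [("/courses/ml.html", "Machine Learning Course"),
         ("/people/index.html", "Faculty Directory"),
         ("/research/learning.html", "Research on Learning Agents")]
print(suggest_links(links, ["machine", "learning"]))
```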
Learning to Construct Knowledge Bases from the World Wide Web
2000
"... The World Wide Web is a vast source of information accessible to computers, but understandable only to humans. The goal of the research described here is to automatically create a computer understandable knowledge base whose content mirrors that of the World Wide Web. Such a knowledge base would ena ..."
Abstract
-
Cited by 242 (5 self)
- Add to MetaCart
(Show Context)
The World Wide Web is a vast source of information accessible to computers, but understandable only to humans. The goal of the research described here is to automatically create a computer understandable knowledge base whose content mirrors that of the World Wide Web. Such a knowledge base would enable much more effective retrieval of Web information, and promote new uses of the Web to support knowledge-based inference and problem solving. Our approach is to develop a trainable information extraction system that takes two inputs. The first is an ontology that defines the classes (e.g., company, person, employee, product) and relations (e.g., employed_by, produced_by) of interest when creating the knowledge base. The second is a set of training data consisting of labeled regions of hypertext that represent instances of these classes and relations. Given these inputs, the system learns to extract information from other pages and hyperlinks on the Web. This article describes our general a...
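The two inputs can be made concrete. The class and relation names below are taken from the abstract's own examples; the data structures and URLs are illustrative assumptions, not the system's actual representation.

```python
from dataclasses import dataclass

# Input 1: an ontology of classes and relations of interest.
ontology = {
    "classes": ["company", "person", "employee", "product"],
    "relations": [("employed_by", "person", "company"),
                  ("produced_by", "product", "company")],
}

# Input 2: training data -- labeled regions of hypertext that
# represent instances of those classes and relations.
@dataclass
class LabeledRegion:
    url: str    # page the region came from (hypothetical URLs below)
    text: str   # the region of hypertext itself
    label: str  # class or relation it instantiates

training_data = [
    LabeledRegion("http://example.com/staff/jane", "Jane Doe", "person"),
    LabeledRegion("http://example.com/about", "Acme Corp", "company"),
]
```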
WebMate: A Personal Agent for Browsing and Searching
- Proceedings of the Second International Conference on Autonomous Agents, 1998
"... The World-Wide Web is developing very fast. Currently, finding useful information on the Web is a time consuming process. In this paper, we present WebMate, an agent that helps users to effectively browse and search the Web. WebMate extends the state of the art in Web-based information retrieval in ..."
Abstract
-
Cited by 239 (10 self)
- Add to MetaCart
(Show Context)
The World-Wide Web is developing very fast. Currently, finding useful information on the Web is a time-consuming process. In this paper, we present WebMate, an agent that helps users to effectively browse and search the Web. WebMate extends the state of the art in Web-based information retrieval in many ways. First, it uses multiple TF-IDF vectors to keep track of user interests in different domains. These domains are automatically learned by WebMate. Second, WebMate uses the Trigger Pair Model to automatically extract keywords for refining document search. Third, during search, the user can provide multiple pages as similarity/relevance guidance for the search. The system extracts and combines relevant keywords from these relevant pages and uses them for keyword refinement. Using these techniques, WebMate provides effective browsing and searching help and also compiles and sends users a personal newspaper by automatically spidering news sources. We have experimentally evaluated the per...
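WebMate's first technique, maintaining multiple TF-IDF vectors for distinct interest domains, might look roughly like this: each positively rated page either reinforces the closest existing domain vector or, below a similarity threshold, opens a new domain. The threshold value, the background-corpus fitting step, and the class shape are assumptions for illustration (WebMate itself learns incrementally).

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

class InterestProfile:
    """Track user interests as multiple TF-IDF vectors, one per domain."""

    def __init__(self, corpus, new_domain_threshold=0.2):
        # Fit a vocabulary on a background corpus (an assumption made
        # here so the vectorizer is usable from the start).
        self.vec = TfidfVectorizer().fit(corpus)
        self.threshold = new_domain_threshold
        self.domains = []  # list of unit-length interest vectors

    def add_positive_page(self, text):
        x = self.vec.transform([text]).toarray()[0]
        x = x / (np.linalg.norm(x) or 1.0)
        if self.domains:
            sims = [x @ d for d in self.domains]
            best = int(np.argmax(sims))
            if sims[best] >= self.threshold:
                # Merge into the closest existing domain vector.
                d = self.domains[best] + x
                self.domains[best] = d / np.linalg.norm(d)
                return
        # Otherwise the page starts a new interest domain.
        self.domains.append(x)
```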
Autonomous Interface Agents
1997
"... Two branches of the trend towards "agents" that are gaining currency are interface agents, software that actively assists a user in operating an interactive interface, and autonomous agents, software that takes action without user intervention and operates concurrently, either while the us ..."
Abstract
-
Cited by 217 (11 self)
- Add to MetaCart
(Show Context)
Two branches of the trend towards "agents" that are gaining currency are interface agents, software that actively assists a user in operating an interactive interface, and autonomous agents, software that takes action without user intervention and operates concurrently, either while the user is idle or taking other actions. These two branches are related, but not identical, and are often lumped together under the single term "agent". Much agent work can be classified as either being an interface agent, but not autonomous, or as an autonomous agent, but not operating directly in the interface. We show why it is important to have agents that are both interface agents and autonomous agents. We explore some design principles for such agents, and illustrate these principles with a description of Letizia, an autonomous interface agent that makes real-time suggestions for Web pages that a user might be interested in browsing.
Keywords: Agents, interface agents, autonomous agents, Web, browsi...
One-Class SVMs for Document Classification
- Journal of Machine Learning Research, 2001
"... We implemented versions of the SVM appropriate for one-class classification in the context of information retrieval. The experiments were conducted on the standard Reuters data set. For the SVM implementation we used both a version of Schölkopf et al. and a somewhat different version of one-class SV ..."
Abstract
-
Cited by 185 (3 self)
- Add to MetaCart
(Show Context)
We implemented versions of the SVM appropriate for one-class classification in the context of information retrieval. The experiments were conducted on the standard Reuters data set. For the SVM implementation we used both a version of Schölkopf et al. and a somewhat different version of one-class SVM based on identifying “outlier” data as representative of the second class. We report on experiments with different kernels for both of these implementations and with different representations of the data, including binary vectors, tf-idf representation, and a modification called “Hadamard” representation. We then compared these with one-class versions of the prototype (Rocchio), nearest neighbor, and naive Bayes algorithms, and finally with a natural one-class neural network classification method based on “bottleneck” compression-generated filters. The SVM approach as represented by Schölkopf was superior to all the methods except the neural network one, where it was essentially comparable, although occasionally worse. However, the SVM methods turned out to be quite sensitive to the choice of representation and kernel in ways which are not well understood; for the time being this leaves the neural network approach as the most robust.
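For readers who want to reproduce the general setup, a minimal modern equivalent uses scikit-learn (not the authors' implementation): train a one-class SVM on tf-idf vectors of target-class documents only, then flag test documents as members or outliers. The kernel, the nu value, and the toy documents here are arbitrary illustrative choices.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import OneClassSVM

# Train only on documents from the target class, as in one-class
# classification; test documents may come from any class.
train_docs = ["grain exports rose sharply", "wheat harvest forecasts"]
test_docs = ["corn and grain shipments", "central bank interest rates"]

vec = TfidfVectorizer()
X_train = vec.fit_transform(train_docs)

# nu bounds the fraction of training points treated as outliers.
clf = OneClassSVM(kernel="linear", nu=0.1).fit(X_train)

# +1 = predicted member of the target class, -1 = outlier.
print(clf.predict(vec.transform(test_docs)))
```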
An Adaptive Web Page Recommendation Service
1997
"... An adaptive recommendation service seeks to adapt to its users, providing increasingly personalized recommendations over time. In this paper we introduce the "Fab" adaptive web page recommendation service. There has been much research on analyzing document content in order to improve recom ..."
Abstract
-
Cited by 124 (0 self)
- Add to MetaCart
An adaptive recommendation service seeks to adapt to its users, providing increasingly personalized recommendations over time. In this paper we introduce the "Fab" adaptive web page recommendation service. There has been much research on analyzing document content in order to improve recommendations or search results. More recently researchers have begun to explore how the similarities between users can be exploited to the same ends. The Fab system strikes a balance between these two approaches, taking advantage of the shared interests among users without losing the benefits of the representations provided by content analysis. Running since March 1996, it has been populated with a collection of agents for the collection and selection of web pages, whose interaction fosters emergent collaborative properties. In this paper we explain the design of the system architecture and report the results of our first experiment, evaluating recommendations provided to a group of test users.
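The balance Fab strikes between the two approaches can be caricatured in a few lines: blend a content score (the match between a page and the user's own profile) with a collaborative score (ratings of that page from users with similar profiles). The weighting and data shapes below are assumptions for illustration; the actual system is a population of collection and selection agents.

```python
import numpy as np

def hybrid_score(page_vec, user_profile, peer_ratings, alpha=0.5):
    """Blend content-based and collaborative signals for one page.

    page_vec, user_profile: unit-length term-weight vectors (numpy arrays).
    peer_ratings: ratings of this page by users with similar profiles,
    scaled to [0, 1]. alpha weights content vs. collaboration and is
    an arbitrary choice here.
    """
    content = float(page_vec @ user_profile)  # cosine similarity
    collab = float(np.mean(peer_ratings)) if len(peer_ratings) else 0.0
    return alpha * content + (1 - alpha) * collab

# Example: a page moderately close to the user's profile that
# similar users rated highly.
page = np.array([0.6, 0.8, 0.0])
profile = np.array([1.0, 0.0, 0.0])
print(hybrid_score(page, profile, [0.9, 0.8]))
```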