A Scalable Comparison-Shopping Agent for the World-Wide Web
- In Proceedings of the First International Conference on Autonomous Agents
, 1997
Cited by 279 (18 self)
The Web is less agent-friendly than we might hope. Most information on the Web is presented in loosely structured natural language text with no agent-readable semantics. HTML annotations structure the display of Web pages, but provide virtually no insight into their content. Thus, the designers of intelligent Web agents need to address the following questions: (1) To what extent can an agent understand information published at Web sites? (2) Is the agent's understanding sufficient to provide genuinely useful assistance to users? (3) Is site-specific hand-coding necessary, or can the agent automatically extract information from unfamiliar Web sites? (4) What aspects of the Web facilitate this competence? In this paper we investigate these issues with a case study using the ShopBot. ShopBot is a fully implemented, domain-independent comparison-shopping agent. Given the home pages of several on-line stores, ShopBot autonomously learns how to shop at those vendors. After its learning is com...
The World Wide Web: quagmire or gold mine?
- COMMUNICATIONS OF THE ACM
, 1996
Cited by 62 (0 self)
This article considers the question: is effective Web mining possible?
Learning to Understand Information on the Internet: An Example-Based Approach
- JOURNAL OF INTELLIGENT INFORMATION SYSTEMS
Cited by 48 (2 self)
The explosive growth of the Web has made intelligent software assistants increasingly necessary for ordinary computer users. Both traditional approaches -- search engines, hierarchical indices -- and intelligent software agents require significant amounts of human effort to keep up with the Web. As an alternative, we investigate the problem of automatically learning to interact with information sources on the Internet. We report on ShopBot and ILA, two implemented agents that learn to use such resources. ShopBot learns how to extract information from online vendors using only minimal knowledge about product domains. Given the home pages of several online stores, ShopBot autonomously learns how to shop at those vendors. After its learning is complete, ShopBot is able to speedily visit over a dozen software stores and CD vendors, extract product information, and summarize the results for the user. ILA learns to translate information from Internet sources into its own internal concept...
A Case-Based Approach to Knowledge Navigation
, 1994
Cited by 33 (1 self)
The problem: The Find-Me systems are designed to allow a user to navigate through a set of possible solutions or products that fit their needs. The class of problems addressed by the Find-Me systems is best explained through an example: You want to rent a video. In particular, you'd like something like Back to the Future, which you've seen and liked. How do you go about finding something? Do you want to see Back to the Future II? Do you want to see another Michael J. Fox movie? Do you want to see Crocodile Dundee, another movie about a person dropped into an unfamiliar setting? Time After Time, another time travel film? Who Framed Roger Rabbit?, another movie by the same director? The goal of the Find-Me project is to develop systems that deal with this sort of search problem. These problems relate to domains with the following features:
Building task-specific interfaces to high volume conversational data
- In Proceedings of CHI 1997 (Atlanta GA)
, 1997
Cited by 19 (5 self)
As people participate in the thousands of global conversations that comprise Usenet news, one thing they do is post their opinions of web resources. Phoaks is a collaborative filtering system that continuously parses, classifies, abstracts and tallies those opinions. About 3,500 users per day consult Phoaks web pages that reflect the results. Phoaks also features a general architecture for building similar collaborative filtering interfaces to conversational data. We report here on the Phoaks resource recommendation interface, the architecture, and the issues and experience that make up its rationale. Keywords: human-computer interaction, human interface, computer-supported cooperative work, organizational computing, social filtering, collaborative filtering, data mining,
Text Classification in USENET Newsgroups: A Progress Report
, 1996
Cited by 9 (0 self)
We report on our investigations into topic classification with USENET newsgroups. Our framework is to determine the newsgroup that a new document should be posted to. We train our system by forming "metadocuments" that represent each topic. We discuss our experiments with this method, and provide evidence that choosing particular documents or words to use in these models degrades classification accuracy. We also describe a technique called classification-based retrieval for finding documents similar to a query document. A Domain For Text Classification: Most work in classification has involved articles taken off of a newswire, or from a medical database (Lewis 1992). In these cases, correct topic labels are chosen by human experts. The domain of USENET newsgroup postings is another interesting testbed for classification. The "labels" here are just the newsgroups to which the documents were originally posted. Since users of the Internet must make this classification decision every time...
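The metadocument idea in the abstract above can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the abstract does not say how metadocuments are represented or compared, so this sketch uses plain term counts and cosine similarity, and the function names and toy posts are hypothetical. All words are kept, in line with the abstract's remark that selecting particular words degraded accuracy.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_metadocuments(posts_by_group):
    """Merge all posts of each newsgroup into one term-count vector."""
    metadocs = {}
    for group, posts in posts_by_group.items():
        counts = Counter()
        for post in posts:
            counts.update(tokenize(post))
        metadocs[group] = counts
    return metadocs


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(document, metadocs):
    """Return the newsgroup whose metadocument is most similar to the document."""
    doc_vec = Counter(tokenize(document))
    return max(metadocs, key=lambda g: cosine(doc_vec, metadocs[g]))


# Toy training data (invented for illustration only).
training = {
    "rec.autos": ["my car engine stalls", "best tires for a sports car"],
    "sci.space": ["the shuttle launch was delayed", "orbit of the new satellite"],
}
metadocs = build_metadocuments(training)
predicted = classify("which engine oil for my car", metadocs)
```

Here `predicted` names the newsgroup whose merged term counts overlap most with the new post; a real system would need far more training text per group than this toy data.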
Semantic Structuring and Visual Querying of Document Abstracts in Digital Libraries
- In Proc. of the Second European Conference on Research and Advanced Technology for Digital Libraries (LNCS 1513)
, 1998
Cited by 7 (7 self)
Digital libraries offer a vast source of very different information. To enable users to fruitfully browse through a collection of documents without necessarily having to state a complex query, advanced retrieval techniques have to be developed. Those methods have to be able to structure information in a semantic manner. This work presents some first steps in semantically organizing thematically pre-selected documents of a digital library. The semantic structure of the document collection will be expressively visualized by the proposed system. We illustrate our ideas using a database of medical abstracts from the field of oncology as a walking example.
Template-Based Information Mining from HTML Documents
, 1997
Cited by 7 (1 self)
Tools for mining information from data can create added value for the Internet. As the majority of electronic documents available over the network are in unstructured textual form, extracting useful information from a document usually involves information retrieval techniques or manual processing. This paper presents a novel approach to mining information from HTML documents using tree-structured templates. In addition to syntactic and semantic descriptions, each template is designed to capture the logical structure of a class of documents. Experiments have been conducted to extract FAQ information automatically from over one hundred HTML documents collected from the Web. Using two basic templates, the prototype FAQ Miner has accurately analyzed 65% of the collection of FAQ documents. With additional processing to handle "near-pass"es, the success rate is approximately 75%. The preliminary results have demonstrated the utility of structural templates for mining information from semi-st...
Finding semantically similar questions based on their answers
- In Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
, 2005
Cited by 7 (1 self)
A large number of question and answer pairs can be collected from question and answer boards and FAQ pages on the Web. This paper proposes an automatic method of finding the questions that have the same meaning. The method can detect semantically similar questions that have little word overlap because it calculates question-question similarities by using the corresponding answers as well as the questions. We develop two different similarity measures based on language modeling and compare them with the traditional similarity measures. Experimental results show that semantically similar question pairs can be effectively found with the proposed similarity measures.
The World Wide Web: Quagmire or Goldmine?
, 1996
Cited by 7 (0 self)
This article argues for the structured Web hypothesis: Information on the Web is sufficiently structured to facilitate effective Web mining.

