Results 1 - 3 of 3
Glean: using syntactic information in document filtering
- Inf. Process. Manage., 1998
Abstract - Cited by 3 (0 self)
In the networked world of the information age, we are exposed to inordinate amounts of information. Search engines and information retrieval systems seek to discern the relevant from the irrelevant information given the context of a user's query. In this paper, we describe a system named Glean, which is based on the idea that coherent text contains significant latent information, such as syntactic structure and patterns of language use, which can be used to enhance the performance of information retrieval systems. We propose a trainable approach that makes use of syntactic information to increase the precision of information retrieval systems. We present results on these improvements to precision under different scenarios: using syntactic information at different granularities, and using different sizes of syntactic context.
Catering to the needs of Web users: Integrating Retrieval and Browsing
Abstract - Cited by 1 (1 self)
We propose a new approach to querying hypermedia documents on the Web based on information retrieval (IR), browsing, and database techniques so as to provide maximum flexibility to the user. We present a model based on object representation where an identity does not correspond to a source HTML page but to a fragment of it. A fragment is identified using the explicit structure provided by the HTML tags as well as the implicit structure extracted using IR techniques. Our fragmentation provides access to different heterogeneous components (text, image, audio, video, etc.) of a given document, and to their relationships (implicit or explicit through hyperlinks). Our language expresses browsing and restructuring based on IR techniques in a unified framework. All these are integral components of the AKIRA system, currently under development.
Keywords: multimedia, hypermedia, Web, views, data model, query language, information retrieval, agents
1 Introduction
The Web invades our lives. Whi...
Precise Environmental Searches: Integrating Hierarchical Information Search with EnviroDaemon
Abstract
Information retrieval has evolved from searches of references, to abstracts, to documents. Search on the Web involves search engines that promise to parse full text and other files: audio, video, and multimedia. With the indexable Web at 320 million pages and growing, difficulties with locating relevant information have become apparent. The most prevalent means of information retrieval relies on syntax-based methods: keywords or strings of characters are presented to a search engine, and it returns all the matches in the available documents. This method is satisfactory and easy to implement, but it has some inherent limitations that make it unsuitable for many tasks. Instead of looking for syntactical patterns, the user is often interested in keyword meaning or the location of a particular word in a title or header. This paper describes some precise search approaches in the environmental domain that locate information according to syntactic criteria, augmented by the use of information in a certain context. The main emphasis of this paper lies in the treatment of structured knowledge, where essential aspects of the topic of interest are encoded not only by the individual items, but also by their relationships to one another. Examples of such structured knowledge are hypertext documents, diagrams, and logical and chemical formulae. Benefits of this approach are enhanced precision and approximate search in an already focused, context-specific search engine for the environment: EnviroDaemon.

