Results 1 - 10 of 11
Enhanced Hypertext Categorization Using Hyperlinks
, 1998
Cited by 326 (8 self)
A major challenge in indexing unstructured hypertext databases is to automatically extract meta-data that enables structured search using topic taxonomies, circumvents keyword ambiguity, and improves the quality of search and profile-based routing and filtering. Therefore, an accurate classifier is an essential component of a hypertext database. Hyperlinks pose new problems not addressed in the extensive text classification literature. Links clearly contain high-quality semantic clues that are lost upon a purely term-based classifier, but exploiting link information is non-trivial because it is noisy. Naive use of terms in the link neighborhood of a document can even degrade accuracy. Our contribution is to propose robust statistical models and a relaxation labeling technique for better classification by exploiting link information in a small neighborhood around documents. Our technique also adapts gracefully to the fraction of neighboring documents having known topics. We experimented ...
Evaluation of Learning Schemes Used in Information Retrieval
, 1996
Cited by 9 (1 self)
Searching within the context of information retrieval may be viewed as a communication process between the users and the indexers (or the authors). It is known that in expressing the same concept or idea, different people tend to use different words or phrases, and also that the meaning of words attached to document surrogates tends to change over time. To overcome these phenomena, various learning schemes have been designed so as to automatically infer knowledge about document content from the relevance assessments of past queries. Thus, in contrast to most retrieval models that represent the semantic content of documents as static entities, these adaptive search models might change the descriptions of documents through an inductive learning scheme. The evaluation of such dynamic document space strategies may be based on retrospective tests within which the same set of queries is applied to train and test the system. Based on cross-validation principles, this paper suggests a more "ho...
Applications of Machine Learning in Information Retrieval
, 1997
Cited by 8 (0 self)
Information retrieval systems provide access to collections of thousands, or millions, of documents, from which, by providing an appropriate description, users can recover any one. Typically, users iteratively refine the descriptions they provide to satisfy their needs, and retrieval systems can utilize user feedback on selected documents to indicate the accuracy of
Modelling Hypermedia Retrieval in Datalog
- Hypertext - Information Retrieval - Multimedia, Synergieeffekte elektronischer Informationssysteme
Cited by 7 (4 self)
In this paper, we take the logical approach to information retrieval in order to identify and describe new concepts required for performing hypermedia retrieval. For this purpose, we consider hypertext linking of nodes, hierarchical structure of documents and document type hierarchies. These concepts are described in Datalog, a Horn logic without functions. Furthermore, we discuss terminological inference and propose a new approach for its application in retrieval, for which we also describe the mapping into Datalog formulas. It turns out that this logic is able to express most of the concepts, but that a higher-level language would be more appropriate for hypermedia retrieval.
1 Introduction
In the logical approach to information retrieval (IR), retrieval is interpreted as inference. For a query q, the system is searching for documents d which imply the query logically, i.e. for which the logical formula q ← d is true. Due to the intrinsic vagueness and imprecision of IR, a logic tha...
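To make the idea of stating hypertext links as Datalog rules more concrete, here is a minimal Python sketch of naive bottom-up evaluation for a hypothetical two-rule program. The predicate names `link` and `reachable`, the rules, and the sample facts are illustrative assumptions and are not taken from the paper.

```python
# Hypothetical Datalog program being evaluated (an assumption for illustration):
#   reachable(X, Y) :- link(X, Y).
#   reachable(X, Z) :- link(X, Y), reachable(Y, Z).

def reachable(links):
    """Naive bottom-up evaluation: apply the rules until no new fact appears."""
    closure = set(links)  # first rule: every link is a reachable pair
    while True:
        # second rule: join link(X, Y) with reachable(Y, Z)
        derived = {(x, z)
                   for (x, y) in links
                   for (y2, z) in closure
                   if y == y2}
        if derived <= closure:  # fixpoint reached, nothing new was derived
            return closure
        closure |= derived


# Toy hypertext: node a links to b, and b links to c.
facts = {("a", "b"), ("b", "c")}
result = reachable(facts)
print(sorted(result))
```

Because Datalog has no function symbols, the set of derivable facts over a finite set of nodes is finite, so this loop always terminates.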
Hypertext Versions of Journal Articles: Computer-aided linking and realistic human-based evaluation
, 1999
Cited by 6 (0 self)
My overall objective is to develop and evaluate ways of automatically incorporating hypertext links into pre-existing scholarly journal articles. I describe a rule-based approach for making three types of links (structural, definition, and semantic). Structural links are a way of making explicit some connections between parts of the text. Definition links connect the use of a term, defined elsewhere in the document, to that definition. Links that connect parts of text that discuss similar things are semantic links. I distinguish several types of semantic links. I use two information retrieval (IR) systems (Cornell's SMART system and Bellcore's Latent Semantic Indexing) to select links based on the content of the articles. I conducted an experiment to compare the performance of the links forged using these two systems. The effectiveness of the links (and the rules used to make them) is tested by people reading the hypertext versions for information under a time constraint. A within-subj...
Using LSI to evaluate the quality of hypertext links
- IR and Automatic Construction of Hypermedia: A Research Workshop. ACM SIGIR
, 1995
Cited by 4 (0 self)
Introduction
We present a new method for evaluating the usefulness of hypertext links and we present experiments using this method. Each experiment made use of the same corpus of 1608 documents from approximately 320 authors. The documents were drawn from the main Usenet newsgroup about computer graphics. A typical document in the corpus had 200 words. Hypertext is any form of `non-sequential writing --- text that branches and allows choices to the reader, best read at an interactive screen' [1]. Linked hypertext is the most prevalent form of hypertext today. In such a document, users navigate between chunks of text by following links. Our goal is to create linked hypertext that will be useful for browsing. We compare the semantic closeness of documents with the number of links in the sho
Presented at ACM SIGIR IR and Automatic Construction of Hypermedia: a research workshop, Maristella Agosti and James Allan, eds. July 1995.
Probabilistic Logical Information Retrieval for Content, Hypertext, and Database Querying
, 1997
Cited by 4 (0 self)
Classical retrieval models support content-oriented searching for documents using a set of words as the data model. However, in hypertext and database applications we want to consider the link structure and attribute values of documents in addition to the pure content. In this paper, we present a framework based on probabilistic logical retrieval for describing the retrieval function for a query which refers to the content of documents, to the hypertext structure of documents, and to the database attribute values of documents. The challenge is to find a retrieval function which yields well-defined retrieval weights for ranking the documents with respect to a combination of the query criteria. We demonstrate the implementation and evaluation of our approach using HySpirit, a prototypical system of a probabilistic deductive database.
1 Introduction
Today's information retrieval (IR) applications for searching hypertext and multimedia documents require a more powerful data model than the class...
Browsing Document Collections: Automatically Organizing Digital Libraries and Hypermedia using the Gray Code
- Information Processing and Management
, 1998
Cited by 3 (2 self)
Relevance and economic feedback may be used to produce an ordering of documents that supports browsing in hypermedia and digital libraries. Document classification based on the Gray code provides paths through the entire collection, each path traversing each node in the set of documents exactly once. Systems organizing documents based on weighted and unweighted Gray codes are examined. Relevance feedback is used to conceptually organize the collection for an individual to browse, based on that individual's interests and information needs, as reflected by their relevance judgements and user-supplied economic preferences. We apply Bayesian learning theory to estimating the characteristics of documents of interest to the user and supply an analytic model of browsing performance, based on minimizing the Expected Browsing Distance (EBD). Economic feedback may be used to change the ordering of documents to benefit the user. Using these techniques, a hypermedia or digital library ma...
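As a rough illustration of the Gray-code ordering idea in the abstract above, the following Python sketch assumes each document is described by a tuple of binary features and sorts documents by the position of that tuple in the binary reflected Gray code. The function names and the toy collection are hypothetical; this covers only an unweighted ordering and is not the paper's weighted scheme or its EBD model.

```python
def to_gray(i: int) -> int:
    """Return the i-th codeword of the binary reflected Gray code."""
    return i ^ (i >> 1)


def from_gray(g: int) -> int:
    """Return the position of codeword g in the Gray code sequence."""
    i = 0
    while g:
        i ^= g
        g >>= 1
    return i


def browse_order(docs):
    """Order document names by the Gray-code position of their feature bits.

    docs maps a document name to a tuple of 0/1 features (an assumed
    representation, chosen only for this sketch).
    """
    def position(item):
        _, bits = item
        codeword = int("".join(str(b) for b in bits), 2)
        return from_gray(codeword)

    return [name for name, _ in sorted(docs.items(), key=position)]


# Toy collection with two binary features per document.
docs = {"a": (0, 0), "b": (1, 0), "c": (1, 1), "d": (0, 1)}
print(browse_order(docs))  # ['a', 'd', 'c', 'b']
```

Consecutive Gray codewords differ in exactly one bit, so when every feature combination is present, each document in the resulting path differs from its neighbour in a single feature.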
Logic and Uncertainty in Information Retrieval
- Lectures in Information Retrieval, Lecture Notes in Computer Science
, 2001
Cited by 2 (0 self)
The use of logic in Information Retrieval (IR) enables one to formulate models that are more general than other well known IR models.
Enhancing Retrieval with Hyperlinks: A General Model Based On Propositional . . .
- J. AM. SOC. INF. SCI. TECHNOL
, 2003
Cited by 2 (0 self)
... This article proposes a general model for using hyperlinks based on Probabilistic Argumentation Systems, in which each of the above-mentioned techniques can be stated. This model makes it possible to discover some inconsistencies in the mentioned techniques, and to take a higher-level, systematic approach to using hyperlinks for retrieval.

