Results 1 - 10 of 11
Enhanced Hypertext Categorization Using Hyperlinks
, 1998
Cited by 326 (8 self)
A major challenge in indexing unstructured hypertext databases is to automatically extract meta-data that enables structured search using topic taxonomies, circumvents keyword ambiguity, and improves the quality of search and profile-based routing and filtering. Therefore, an accurate classifier is an essential component of a hypertext database. Hyperlinks pose new problems not addressed in the extensive text classification literature. Links clearly contain high-quality semantic clues that are lost upon a purely term-based classifier, but exploiting link information is non-trivial because it is noisy. Naive use of terms in the link neighborhood of a document can even degrade accuracy. Our contribution is to propose robust statistical models and a relaxation labeling technique for better classification by exploiting link information in a small neighborhood around documents. Our technique also adapts gracefully to the fraction of neighboring documents having known topics. We experimented ...
Cha-Cha: A system for organizing intranet search results
- In Proceedings of the 2nd USENIX Symposium on Internet Technologies and Systems
, 1999
Cited by 32 (2 self)
Although search over World Wide Web pages has recently received much academic and commercial attention, surprisingly little research has been done on how to search the web pages within large, diverse intranets. Intranets contain the information associated with the internal workings of an organization. A standard search engine retrieves web pages that fall within a widely diverse range of information contexts, but presents these results uniformly, in a ranked list. As an alternative, the Cha-Cha system organizes web search results in such a way as to reflect the underlying structure of the intranet. In our approach, an “outline” or “table of contents” is created by first recording the shortest paths in hyperlinks from root pages to every page within the web intranet. After the user issues a query, these shortest paths are dynamically combined to form a hierarchical outline of the context in which the search results occur. The system is designed to be helpful for users with a wide range of computer skills. Preliminary user study and survey results suggest that some users find the resulting structure more helpful than the standard retrieval results display for intranet search.
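The two steps this abstract describes (recording shortest hyperlink paths from a root page, then combining the paths of the search hits into an outline) could be sketched roughly as follows. This is a minimal illustration and not the Cha-Cha implementation: the link graph, the page names, and the helper names shortest_paths, path_to and build_outline are all hypothetical, and a single root page is assumed.

```python
from collections import deque

def shortest_paths(links, root):
    """Breadth-first search over the link graph.

    Returns a dict mapping each reachable page to its parent on one
    shortest path from the root (the root maps to None).
    """
    parent = {root: None}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in parent:  # first visit is a shortest path
                parent[target] = page
                queue.append(target)
    return parent

def path_to(parent, page):
    """Rebuild the root-to-page path by following parent pointers."""
    path = []
    while page is not None:
        path.append(page)
        page = parent[page]
    return path[::-1]

def build_outline(parent, hits):
    """Merge the shortest paths of all hits into one nested dict."""
    outline = {}
    for hit in hits:
        if hit not in parent:  # page not reachable from the root
            continue
        node = outline
        for page in path_to(parent, hit):
            node = node.setdefault(page, {})
    return outline

# Hypothetical intranet link graph: page -> pages it links to.
links = {
    "home": ["research", "people"],
    "research": ["ir-group", "db-group"],
    "people": ["alice"],
    "ir-group": ["alice"],
}
parent = shortest_paths(links, "home")
outline = build_outline(parent, ["alice", "db-group"])
```

Here the hits "alice" and "db-group" end up under "people" and "research" respectively, both nested below "home", which is the kind of hierarchical context the abstract refers to.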
Augmenting a Characterization Network with Semantic Information
, 1997
Cited by 13 (9 self)
A searcher in a list of phrases which serves as an index to a set of documents often has problems finding the right words when the information sought for has to be described. Offering alternative phrasings and pointing to related concepts in the index could be a great help in this difficult process of query formulation. Usually the index is obtained by characterizing the set of documents. This paper examines the effect of adding semantic relations to the index. Various ways in which nodes in an index can be related are discussed, and criteria for adding new index entries are introduced. The effects of adding relations on the process of offering support during the formulation process are treated as well. Constructing a representation of the information need can be done in many ways. In this paper we adopt a process called Query by Navigation. A searcher can use this process in order to browse an index, selecting relevant items along the way. Keywords: information retrieval; user modelli...
Personalized Search Support For Networked Document Retrieval Using Link Inference
- Proceedings of the 7th International Conference DEXA'96 on Data Base and Expert System Applications, volume 1134 of Lecture Notes in Computer Science
, 1996
Cited by 12 (10 self)
Constructing a query consisting of a set of terms or descriptors is often an iterative process. To the user, the starting query and the final result could be strongly related. These two queries could even be worthy of a link between them. This paper presents a method for deciding when a link between two descriptors is justified. The decision hinges on the way in which the user has moved from one to the other. In order to allow for users with different levels of experience and different backgrounds, we introduce a number of parameters with which the inference process can be controlled.
1 Introduction
Document retrieval becomes more and more important as the World Wide Web is frequented by searchers from a multitude of backgrounds and with a full spectrum of experience. When a person has to formulate a query in the context of document retrieval, this usually is an iterative process, where to an observer the end result very often only slightly resembles the original query. To the user ...
Annotation-based Document Retrieval with Four-Valued Probabilistic Datalog
- In WIRD ’04: Proceedings of the first SIGIR Workshop on the Integration of Information Retrieval and Databases
, 2004
Cited by 6 (0 self)
The COLLATE system (collaboratory for annotation, indexing and retrieval of digitized historical archive material) provides film researchers with a collaborative environment in which historic documents about European films can be analysed, interpreted and discussed, using nested annotations and discourse structure relations among them. Annotations are metadata, and annotation threads form a hypertext containing positive and negative links, constituting a certain kind of context exploitable for document retrieval. In this paper, we discuss a solution for using annotations for information retrieval. To exploit annotation threads, which consist of nested annotations and typed links between them, an annotation-based retrieval approach has to cope with negative and contradictory statements. The nested annotation retrieval approach (NARA) addresses these issues. Based on this, we present NARAlog, an implementation using four-valued probabilistic datalog (FVPD), able to perform an in-depth analysis of annotation threads and to deal with contradictory statements.
The Hypertext Concordance: A Better Back-of-the-Book Index
- Proceedings of First Workshop on Computational Terminology
, 1998
Cited by 4 (0 self)
This paper describes a tool that creates a usable index for organizing and hyperlinking related web pages. This type of hypertext construction is an important application of terminology extraction. The tool is based on a new type of hypertext: the hypertext concordance (HC). An HC is a hypertext back-of-the-book index that lists terms with context the same way a concordance does. An HC has the following characteristics:
1. Index terms are selected using a terminology extraction algorithm.
2. Occurrences of indexed terms in documents are hyperlinked to the index.
3. Term references in the index are listed with surrounding context in the manner of a concordance.
4. Each term reference in the index is hyperlinked to its document occurrence (thereby making the page numbers of paper indices unnecessary).
The HC is designed for look-up (the traditional use of a book index), but also for browsing.
1 Introduction
The power of large hypertext systems such as the World Wide Web derives from hy...
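As a rough illustration of characteristics 3 and 4 of the hypertext concordance described in this entry, the sketch below lists every occurrence of an index term together with its surrounding context and the location it would link back to. It is not the tool from the paper: the term list is simply given (no terminology extraction is performed), and build_concordance, the document names, and the width parameter are hypothetical.

```python
import re

def build_concordance(docs, terms, width=20):
    """Map each term to its occurrences, each with context and location.

    docs  : dict of document id -> plain text
    terms : index terms, assumed to be already selected
    width : number of context characters kept on each side
    """
    index = {}
    for term in terms:
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        entries = []
        for doc_id, text in docs.items():
            for match in pattern.finditer(text):
                start = max(0, match.start() - width)
                end = min(len(text), match.end() + width)
                entries.append({
                    "doc": doc_id,            # target document of the link
                    "offset": match.start(),  # position of the occurrence
                    "context": text[start:end],
                })
        index[term] = entries
    return index

# Hypothetical two-page collection.
docs = {
    "a.html": "Hypertext links connect pages.",
    "b.html": "An index lists hypertext terms.",
}
index = build_concordance(docs, ["hypertext"])
```

Each entry carries a document id and a character offset, which a renderer could turn into a hyperlink to the occurrence, in place of the page number used in a paper index.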
Browsing Document Collections: Automatically Organizing Digital Libraries and Hypermedia using the Gray Code
- Information Processing and Management
, 1998
Cited by 3 (2 self)
Relevance and economic feedback may be used to produce an ordering of documents that supports browsing in hypermedia and digital libraries. Document classification based on the Gray code provides paths through the entire collection, each path traversing each node in the set of documents exactly once. Systems organizing documents based on weighted and unweighted Gray codes are examined. Relevance feedback is used to conceptually organize the collection for an individual to browse, based on that individual's interests and information needs, as reflected by their relevance judgements and user-supplied economic preferences. We apply Bayesian learning theory to estimating the characteristics of documents of interest to the user and supply an analytic model of browsing performance, based on minimizing the Expected Browsing Distance (EBD). Economic feedback may be used to change the ordering of documents to benefit the user. Using these techniques, a hypermedia or digital library ma...
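To make the idea of a Gray-code ordering concrete, the sketch below places documents along a single path by the position of their feature vector in the standard reflected binary Gray code. It is only an illustration under assumptions that are not stated in the abstract: each document is represented by a small binary feature vector packed into an integer, the unweighted reflected code is used, and gray_rank and the sample documents are hypothetical.

```python
def gray_rank(codeword):
    """Position of a codeword in the reflected binary Gray code sequence.

    This inverts gray(i) = i ^ (i >> 1) by XOR-ing in every right shift.
    """
    rank = codeword
    shift = codeword >> 1
    while shift:
        rank ^= shift
        shift >>= 1
    return rank

# Hypothetical documents, each with a 3-bit binary feature vector.
docs = {"d1": 0b110, "d2": 0b001, "d3": 0b010, "d4": 0b011}

# Sorting by Gray rank gives one path that visits every document once.
order = sorted(docs, key=lambda d: gray_rank(docs[d]))
```

Consecutive codewords in the full Gray sequence differ in exactly one bit, so documents that end up next to each other on this path tend to have similar feature vectors; in this small example every adjacent pair differs in a single feature.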
Document-centered collaboration for scholars in the humanities - the COLLATE system
- In Koch and Sølvberg [12
, 2003
Cited by 3 (2 self)
In contrast to electronic document collections we find in contemporary digital libraries, systems applied in a cultural domain have to satisfy specific requirements with respect to data ingest, management, and access. Such systems should as well be able to support the collaborative work of domain experts and furthermore offer mechanisms to exploit the value-added information resulting from a collaborative process like scientific discussions. In this paper, we present the solutions to these requirements developed and realized in the COLLATE system, where advanced methods for document classification, content management, and a new kind of context-based retrieval using scientific discourses are applied.
Logic as a tool in a term matching information retrieval system
- In Proceedings of the Workshop on Logical and Uncertainty Models for Information Systems
, 1999
Cited by 2 (0 self)
Information retrieval can be seen both as an inference process under uncertainty involving complex relationships between information items, and as a task of proper assessment of uncertainty. Probabilistic argumentation systems are a technique for reasoning under uncertainty which emphasize both aspects, by clearly distinguishing the qualitative and quantitative aspects of uncertainty. This paper presents the use of probabilistic argumentation systems for (1) taking into account hypertext links in order to improve an initial ranking of documents and (2) considering statistical similarities between query terms to improve their weighting. These two applications can be easily integrated in a retrieval system based on term matching.
Ranking Strategies for Navigation based Query Formulation
- Journal of Intelligent Information Systems
, 1996
Cited by 2 (1 self)
Navigating through a hypermedia retrieval system bears the problem of selecting an item from a large number of options which are available to continue the trajectory. Ranking these options according to some criterion is a method to ease the task of navigation. A number of ranking strategies have already been proposed. This paper presents a formalization of the concept of ranking, and of the aforementioned strategies. Furthermore we propose two strategies which allow a personalized approach to ranking. Submitted for publication in Journal of Intelligent Information Systems.
1 Introduction
In some ways, the introduction of mass storage devices like CD-ROM has been a mixed blessing. True, we can offer large amounts of information, be it sound, video or text. However, the task of finding the right information has become increasingly difficult. Although indexing the information somewhat reduces the complexity, the user may not have a clear overview of the indices which are resident in the i...

