Results 1 - 10 of 15
Enhanced Hypertext Categorization Using Hyperlinks
, 1998
Cited by 326 (8 self)
A major challenge in indexing unstructured hypertext databases is to automatically extract meta-data that enables structured search using topic taxonomies, circumvents keyword ambiguity, and improves the quality of search and profile-based routing and filtering. Therefore, an accurate classifier is an essential component of a hypertext database. Hyperlinks pose new problems not addressed in the extensive text classification literature. Links clearly contain high-quality semantic clues that are lost upon a purely term-based classifier, but exploiting link information is non-trivial because it is noisy. Naive use of terms in the link neighborhood of a document can even degrade accuracy. Our contribution is to propose robust statistical models and a relaxation labeling technique for better classification by exploiting link information in a small neighborhood around documents. Our technique also adapts gracefully to the fraction of neighboring documents having known topics. We experimented ...
Clustering Hypertext With Applications To Web Searching
- In Proceedings of the 11th ACM Conference on Hypertext and Hypermedia
, 2000
Cited by 45 (0 self)
Clustering separates unrelated documents and groups related documents, and is useful for discrimination, disambiguation, summarization, organization, and navigation of unstructured collections of hypertext documents. We propose a novel clustering algorithm that clusters hypertext documents using words (contained in the document), out-links (from the document), and in-links (to the document). The algorithm automatically determines the relative importance of words, out-links, and in-links for a given collection of hypertext documents. We annotate each cluster using six information nuggets: summary, breakthrough, review, keywords, citation, and reference. These nuggets constitute high-quality information resources that are representatives of the content of the clusters, and are extremely effective in compactly summarizing and navigating the collection of hypertext documents. We employ web searching as an application to illustrate our results. Keywords: cluster annotation, feature comb...
A Gonio-photometric Analysis of Ink Jet Bronzing
, 2008
Cited by 43 (1 self)
If I have seen farther, it is by standing on the shoulders of giants.
Fuzzy Clustering of Semantic Spaces
, 2001
Cited by 3 (0 self)
In this paper the GK# model for the construction of thesaurus classes based on a fuzzy semantic association measure between index terms and concepts (thesaurus classes) is presented. The association measure is obtained on the basis of fuzzy semantic relations between index terms, and it is used to cluster index terms into concepts. A hierarchical algorithm is introduced which runs on a simple numerical example. © 2001 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.
A comparative study of citations and links in document classification
- In Proceedings of the 6th ACM/IEEE-CS Joint Conference on Digital Libraries, Chapel
, 2006
Cited by 3 (1 self)
It is well known that links are an important source of information when dealing with Web collections. However, the question remains whether the same techniques that are used on the Web can be applied to collections of documents containing citations between scientific papers. In this work we present a comparative study of digital library citations and Web links, in the context of automatic text classification. We show that there are in fact differences between citations and links in this context. For the comparison, we run a series of experiments using a digital library of computer science papers and a Web directory. In our reference collections, measures based on co-citation tend to perform better for pages in the Web directory, with gains up to 37% over text-based classifiers, while measures based on bibliographic coupling perform better in a digital library. We also propose a simple and effective way of combining a traditional text-based classifier with a citation-link-based classifier. This combination is based on the notion of classifier reliability and presented gains of up to 14% in micro-averaged F1 in the Web collection. However, no significant gain was obtained in the digital library. Finally, a user study was performed to further investigate the causes for these results. We discovered that misclassifications by the citation-link-based classifiers are in fact difficult cases, hard to classify even for humans.
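The two link-based measures named in this abstract have standard textbook definitions: co-citation counts the documents that cite both items, and bibliographic coupling counts the references the two items share. The sketch below illustrates only those raw counts. It is not the paper's classifier, and the toy graph, the document names, and the function names are hypothetical.

```python
def cocitation(cites, a, b):
    """Count documents whose reference set contains both a and b."""
    return sum(1 for refs in cites.values() if a in refs and b in refs)


def bibliographic_coupling(cites, a, b):
    """Count references that a and b have in common."""
    return len(cites.get(a, set()) & cites.get(b, set()))


# Hypothetical toy citation graph: document -> set of documents it cites.
cites = {
    "p1": {"p3", "p4"},
    "p2": {"p3", "p4", "p5"},
    "p3": {"p5"},
    "p4": {"p5"},
}

print(cocitation(cites, "p3", "p4"))              # p1 and p2 both cite p3 and p4
print(bibliographic_coupling(cites, "p1", "p2"))  # p1 and p2 share p3 and p4
```

Co-citation depends on incoming links, which accumulate over time, while bibliographic coupling depends only on a document's own outgoing references.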
S.N.: Improving search on the semantic desktop using associative retrieval techniques
- In: Proceedings of I-MEDIA 2007 and I-SEMANTICS 2007
, 2007
Cited by 3 (3 self)
While it is agreed that semantic enrichment of resources would lead to better search results, at present the low coverage of resources on the web with semantic information presents a major hurdle in realizing the vision of search on the Semantic Web. To address this problem we investigate how to improve retrieval performance in a setting where resources are sparsely annotated with semantic information. We suggest employing techniques from associative information retrieval to find relevant material that was not originally annotated with the concepts used in a query. We present an associative retrieval system for the Semantic Desktop and show how the use of associative retrieval increased retrieval performance. Key Words: semantic desktop, associative information retrieval. Category: H.3.3, I.2.4, I.2.6, I.2.11
TSSP: A Reinforcement Algorithm to Find Related Papers
- In Proceedings of the Web Intelligence
, 2004
Cited by 2 (0 self)
Content analysis and citation analysis are two common methods in recommender systems. Compared with content analysis, citation analysis can discover more implicitly related papers. However, citation-based methods may introduce more noise into the citation graph and cause topic drift. Some work combines content with citation to improve similarity measurement. The problem is that the two features are not used to reinforce each other to get a better result. To solve the problem, we propose a new algorithm, Topic Sensitive Similarity Propagation (TSSP), to effectively integrate content similarity into similarity propagation. TSSP has two parts: citation-context-based propagation and iterative reinforcement. First, citation contexts provide clues about which papers are topically related and filter out less relevant citations. Second, iteratively integrating content and citation similarity enables them to reinforce each other during the propagation. The experimental results of a user study show that TSSP outperforms other algorithms in almost all cases.
Exploiting Coauthorship to Infer Topicality in a Digital Library of Computer Science Technical Reports
, 1996
Cited by 1 (0 self)
We propose a method of mapping the topical content of distributed digital libraries and demonstrate the technique using data from the Networked Computer Science Technical Report Library (NCSTRL) digital library project. This method seeks to exploit information derived from document coauthorship to produce improved automatic subject classifications of the documents. In a distributed digital library, these subject classifications are useful in characterizing both intra-site and inter-site content. They are also helpful in providing secondary retrieval services. We present the method and describe an experiment and results showing that improved clusterings can be achieved relative to traditional document clustering. 1 Introduction Digital library projects[6] place heavy demands on existing infrastructure and hence, the performance of these systems is an object of considerable study. With the increased availability of digital libraries and distributed information retrieval systems, we can...
Exploiting Multiple Sources of Evidence of Document Relatedness in Hybrid Search Engines: A Unifying Model and Design Proposal
"... This is a draft of an article submitted (August 2001) for publication in ..."
Adaptive Retrieval Agents: Internalizing Local . . .
- MACHINE LEARNING
, 2000
This paper discusses a novel distributed adaptive algorithm and representation used to construct populations of adaptive Web agents. These InfoSpiders browse networked information environments on-line in search of pages relevant to the user, by traversing hyperlinks in an autonomous and intelligent fashion. Each agent adapts to the spatial and temporal regularities of its local context thanks to a combination of machine learning techniques inspired by ecological models: evolutionary adaptation with local selection, reinforcement learning and selective query expansion by internalization of environmental signals, and optional relevance feedback. We evaluate the feasibility and performance of these methods in three domains: a general class of artificial graph environments, a controlled subset of the Web, and (preliminarily) the full Web. Our results suggest that InfoSpiders could take advantage of the starting points provided by search engines, based on global word statistics, and then use linkage topology to guide their search on-line. We show how this approach can complement the current state of the art, especially with respect to the scalability challenge.

