Document Cards: A Top Trumps Visualization for Documents
Cited by 1 (0 self)
Fig. 1. Document Cards help to display the important key terms and images of a document in a single compact view. Abstract: Finding suitable, space-saving views of a document’s main content is crucial for providing convenient access to large document collections on display devices of different sizes. We present a novel compact visualization that represents the document’s key semantics as a mixture of images and important key terms, similar to cards in a Top Trumps game. The key terms are extracted using an advanced text mining approach based on fully automatic document structure extraction. The images and their captions are extracted using a graphical heuristic, and the captions are used for a semi-semantic image weighting. Furthermore, we use the image color histogram for classification and show at least one representative from each non-empty image class. The approach is demonstrated for the IEEE InfoVis publications of a complete year. The method can easily be applied to other publication collections and sets of documents that contain images. Index Terms: document visualization, visual summary, content extraction, document collection browsing
Partners NEPOMUK 08.01.2007
2007
0.3 24.11.06 Integrated info from wiki and LaTeX files; marked open questions
0.4 29.11.06 Added draft texts for introduction + semantic wiki state of the art
0.5 29.11.06 Added language processing and semantic annotation parts
Approximate Matching for Evaluating Keyphrase Extraction
We propose a new evaluation strategy for keyphrase extraction based on approximate keyphrase matching. It corresponds well with human judgments and is better suited to assess the performance of keyphrase extraction approaches. Additionally, we propose a generalized framework for comprehensive analysis of keyphrase extraction that subsumes most existing approaches, which allows for fair testing conditions. For the first time, we compare the results of state-of-the-art unsupervised and supervised keyphrase extraction approaches on three evaluation datasets and show that the relative performance of the approaches heavily depends on the evaluation metric as well as on the properties of the evaluation dataset. Keywords: keyphrase extraction; approximate matching
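The abstract does not spell out its matching rules, so the following is only a minimal sketch of what approximate keyphrase matching could look like. It assumes (this is not taken from the source) that an extracted phrase counts as a match when it equals a gold phrase, contains it, or is contained in it at the token level. The function names and the three match labels are hypothetical.

```python
def tokens(phrase):
    """Lowercase a phrase and split it on whitespace."""
    return phrase.lower().split()


def contains(longer, shorter):
    """True if token list `shorter` occurs contiguously inside `longer`."""
    n = len(shorter)
    return n > 0 and any(
        longer[i:i + n] == shorter for i in range(len(longer) - n + 1)
    )


def match_type(extracted, gold):
    """Classify a phrase pair as 'exact', 'includes', 'part-of', or None.

    These rules are an illustrative assumption, not the paper's definition.
    """
    e, g = tokens(extracted), tokens(gold)
    if e == g:
        return "exact"
    if contains(e, g):
        return "includes"  # extracted phrase includes the gold phrase
    if contains(g, e):
        return "part-of"   # extracted phrase is part of the gold phrase
    return None


def precision(extracted, gold, approximate=True):
    """Fraction of extracted phrases that match at least one gold phrase."""
    allowed = {"exact", "includes", "part-of"} if approximate else {"exact"}
    hits = sum(
        any(match_type(e, g) in allowed for g in gold) for e in extracted
    )
    return hits / len(extracted) if extracted else 0.0
```

With exact matching only, a near-miss such as "approximate keyphrase matching" against the gold phrase "keyphrase matching" scores zero; with the approximate rules above it counts as a hit, which is the kind of difference the abstract says can change the relative ranking of approaches.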
The Use of Topic Representative Words in Text Categorization
• Goal: evaluate the empirical utility of various “topic representative words” for topic classification
• Motivation: terms such as keyphrases and named entities are highly indicative of particular topics
unknown title
Abstract: The interdisciplinary ViFaChem II project aims at providing a chemical digital library infrastructure for creating personalized information spaces. The value-added services and scientific Web 2.0 techniques actively support chemical scientists and researchers in retrieval tasks as well as in deriving new knowledge from the collected information in a highly personalized fashion. The complex requirements of a digital library for chemists are described, and an overall architecture tackling these requirements is presented. Preliminary results regarding chemical entity recognition and automatically, dynamically generated document facets are also presented and discussed.
Evaluating N-gram based Evaluation Metrics for Automatic Keyphrase Extraction
This paper describes a feasibility study of n-gram-based evaluation metrics for automatic keyphrase extraction. To account for near-misses currently ignored by standard evaluation metrics, we adapt various evaluation metrics developed for machine translation and summarization, and also the R-precision evaluation metric from keyphrase evaluation. In evaluation, the R-precision metric is found to achieve the highest correlation with human annotations. We also provide evidence that the degree of semantic similarity varies with the location of the partially-matching component words.
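As a rough illustration of the metric named above, the sketch below computes R-precision in its usual form: precision over the top R ranked candidates, where R is the number of gold keyphrases. The second function gives partial credit to near-misses using a Dice coefficient over word sets; that scoring rule is an assumption chosen for illustration and is not the adaptation evaluated in the paper.

```python
def r_precision(ranked_candidates, gold):
    """Exact-match R-precision: hits in the top R candidates, divided by R."""
    gold_set = {g.lower() for g in gold}
    r = len(gold_set)
    if r == 0:
        return 0.0
    top_r = [c.lower() for c in ranked_candidates[:r]]
    return sum(c in gold_set for c in top_r) / r


def token_overlap(a, b):
    """Dice coefficient over the word sets of two phrases (assumed scoring)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return 2 * len(ta & tb) / (len(ta) + len(tb))


def soft_r_precision(ranked_candidates, gold):
    """R-precision where each candidate earns its best overlap with any gold phrase."""
    gold_set = {g.lower() for g in gold}
    r = len(gold_set)
    if r == 0:
        return 0.0
    top_r = ranked_candidates[:r]
    return sum(max(token_overlap(c, g) for g in gold_set) for c in top_r) / r
```

Under the exact variant, "effective keyphrase extraction" earns nothing against the gold phrase "keyphrase extraction"; under the soft variant it earns partial credit, which is the near-miss behaviour the abstract is concerned with.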
Biomimetic Information Retrieval with Spreading-Activation Networks
2007
Information management systems act as a prosthetic scaffold for human memory. They retain and organize information objects to be conveniently recalled in support of knowledge-based tasks. We note a striking similarity between the functions of human memory and the processes in computational information retrieval. For this reason, we ask whether it is viable to purposely design information management systems biomimetically, i.e., in a manner inspired by biological systems. Based on a comparison of cognitive models of human memory and computational information retrieval algorithms, we propose the Principles of Mnemonic Associative Knowledge (P-MAK) to describe the necessary components of biomimetic systems: the constraints of computing machines, the properties of human memory, how semantic knowledge representations are constructed, and the contexts in which information is usefully retrieved. The goal of P-MAK is to describe systems that are simple, inspectable, comprehensible, and easy to use. Since human memory as described by cognitive network models is analogous to a large associative hypertext repository, P-MAK’s principles suggest that networks would be an appropriate representation format. Therefore, we build a semantic similarity network from a document corpus using information retrieval (IR) algorithms, and describe how these processes are comparable to the functions of human semantic memory.
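The abstract does not say which IR algorithms are used to build the network. As one plausible reading, the sketch below links documents whose TF-IDF vectors have a cosine similarity at or above a threshold. The weighting scheme, the whitespace tokenizer, the threshold value, and all function names are assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) from whitespace-tokenized documents."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_network(docs, threshold=0.1):
    """Return weighted edges {(i, j): similarity} for sufficiently similar pairs."""
    vectors = tfidf_vectors(docs)
    edges = {}
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            sim = cosine(vectors[i], vectors[j])
            if sim >= threshold:
                edges[(i, j)] = sim
    return edges
```

The resulting edge dictionary is an associative structure over documents in which related items are directly linked, which is the kind of network representation the abstract argues for.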
Information Filtering using Index Word Selection based on the Topics
Abstract: We have proposed an information filtering system that selects index words from a document set based on the topics the set contains. The method narrows the vocabulary down to the particularly characteristic words of the document set, and the topics are obtained by Sparse Non-negative Matrix Factorization. In information filtering, a document is often represented as a vector whose elements correspond to the weights of the index words, and the dimension of the vector grows as the number of documents increases. As a result, words that are useless as index words for filtering may be included. To address this problem, the dimension needs to be reduced. Our proposal reduces the dimension by selecting index words based on the topics included in the document set, which we obtain by applying Sparse Non-negative Matrix Factorization to the document set. Filtering is carried out based on the centroid of the learning document set, which is regarded as the user’s interest. The centroid is represented as a document vector whose elements are the weights of the selected index words. Using the English test collection MEDLINE, we confirm the effectiveness of our proposal: when an appropriate number of index words is selected, recommendation accuracy improves over previous methods. In addition, we examine the index words selected by our proposal and find that they cover some minor topics included in the document set.
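The centroid-based filtering step described above can be sketched as follows. This sketch assumes the index words have already been selected; the topic extraction with Sparse Non-negative Matrix Factorization is not shown. Raw term frequencies as weights, cosine similarity as the comparison, and the threshold value are all assumptions, since the abstract does not specify them.

```python
import math
from collections import Counter


def doc_vector(text, index_words):
    """Term-frequency vector restricted to the selected index words."""
    counts = Counter(text.lower().split())
    return [float(counts[w]) for w in index_words]


def centroid(vectors):
    """Component-wise mean of the vectors; stands in for the user's interest."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def filter_documents(candidates, training_docs, index_words, threshold=0.5):
    """Keep the candidates whose similarity to the training centroid meets the threshold."""
    center = centroid([doc_vector(d, index_words) for d in training_docs])
    return [
        doc for doc in candidates
        if cosine(doc_vector(doc, index_words), center) >= threshold
    ]
```

Because every vector has one element per selected index word, choosing fewer index words directly shrinks the dimension of both the documents and the centroid, which is the reduction the abstract motivates.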
Keyword Extraction and Multi-view Clustering Trees for Search and Retrieval in Customer Product Forums
Customers of large enterprises post threads in company-supported online forums to discuss solutions to product malfunctions, errors and problems. The ability to retrieve the most relevant threads in response to a customer’s search query in the product forums requires robust search capabilities. Unfortunately, the lack of (recommendation) links between the threads in the product discussion forums makes it infeasible to use web search algorithms such as PageRank in these forums. We provide a keyword extraction technique based on term co-occurrences that performs better than traditional tf-idf-based techniques. We further provide a search and retrieval algorithm to retrieve the relevant threads in response to a search query in product discussion forums. Our approach is based on a hierarchical, multi-view (thread title and thread content) clustering of the threads. We evaluate our approach and compare it with existing statistical algorithms on threads selected from HP’s customer support forum.
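The abstract names term co-occurrence as the basis of the keyword extraction but gives no scoring formula. The sketch below is therefore only one hypothetical instance: it scores each term by the number of distinct terms it co-occurs with in the same thread text and returns the highest-scoring terms. The stopword list, the tokenizer, and the scoring rule are assumptions, not the paper's method.

```python
from collections import defaultdict
from itertools import combinations

# Small illustrative stopword list (an assumption, not from the source).
STOPWORDS = {"the", "a", "an", "is", "to", "of", "and", "in", "my", "it", "on", "not"}


def cooccurrence_keywords(texts, top_k=3):
    """Rank terms by how many distinct terms they co-occur with across texts."""
    neighbors = defaultdict(set)
    for text in texts:
        terms = sorted({t for t in text.lower().split() if t not in STOPWORDS})
        # Every pair of distinct terms in the same text co-occurs once.
        for a, b in combinations(terms, 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    # Higher co-occurrence degree first; ties are broken alphabetically.
    ranked = sorted(neighbors, key=lambda t: (-len(neighbors[t]), t))
    return ranked[:top_k]
```

Unlike tf-idf, this score depends on how widely a term connects to other terms rather than on its frequency alone, which is one way co-occurrence information can surface forum keywords.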
Supervised by:
Business processes in modern enterprises require interlinking critical business information distributed across structured and unstructured data sources. This interlinking process is often initiated by a user’s information need represented as a keyword query, either created manually by the user or automatically

