Results 1 - 10
of
44
Coherent keyphrase extraction via web mining
- In Proceedings of IJCAI
, 2003
"... Keyphrases are useful for a variety of purposes, including summarizing, indexing, labeling, categorizing, clustering, highlighting, browsing, and searching. The task of automatic keyphrase extraction is to select keyphrases from within the text of a given document. Automatic keyphrase extraction mak ..."
Abstract
-
Cited by 39 (1 self)
- Add to MetaCart
Keyphrases are useful for a variety of purposes, including summarizing, indexing, labeling, categorizing, clustering, highlighting, browsing, and searching. The task of automatic keyphrase extraction is to select keyphrases from within the text of a given document. Automatic keyphrase extraction makes it feasible to generate keyphrases for the huge number of documents that do not have manually assigned keyphrases. A limitation of previous keyphrase extraction algorithms is that the selected keyphrases are occasionally incoherent. That is, the majority of the output keyphrases may fit together well, but there may be a minority that appear to be outliers, with no clear semantic relation to the majority or to each other. This paper presents enhancements to the Kea keyphrase extraction algorithm that are designed to increase the coherence of the extracted keyphrases. The approach is to use the degree of statistical association among candidate keyphrases as evidence that they may be semantically related. The statistical association is measured using web mining. Experiments demonstrate that the enhancements improve the quality of the extracted keyphrases. Furthermore, the enhancements are not domain-specific: the algorithm generalizes well when it is trained on one domain (computer science documents) and tested on another (physics documents). 1
Thesaurus based automatic keyphrase indexing
- In: Proceedings of the 6th ACM/IEEE-CS joint conference on Digital libraries
, 2006
"... We propose a new method that enhances automatic keyphrase extraction by using semantic information on terms and phrases gleaned from a domain-specific thesaurus. We evaluate the results against keyphrase sets assigned by a state-of-the-art keyphrase extraction system and those assigned by six profes ..."
Abstract
-
Cited by 24 (1 self)
- Add to MetaCart
We propose a new method that enhances automatic keyphrase extraction by using semantic information on terms and phrases gleaned from a domain-specific thesaurus. We evaluate the results against keyphrase sets assigned by a state-of-the-art keyphrase extraction system and those assigned by six professional indexers. Categories and Subject Descriptors H.3.1 [Content Analysis and Indexing]: Indexing methods, linguistic processing, thesauruses.
Using keyphrases as search result surrogates on small screen devices
- Personal and Ubiquitous Computing
, 2004
"... This paper investigates user interpretation of search result displays on small screen devices. Such devices present interesting design challenges given their limited display capabilities, particularly in relation to screen size. Our aim is to provide users with succinct yet useful representations of ..."
Abstract
-
Cited by 10 (0 self)
- Add to MetaCart
This paper investigates user interpretation of search result displays on small screen devices. Such devices present interesting design challenges given their limited display capabilities, particularly in relation to screen size. Our aim is to provide users with succinct yet useful representations of search results that allow rapid and accurate decisions to be made about the utility of result documents, yet minimize user actions (such as scrolling), the use of device resources, and the volume of data to be downloaded. Our hypothesis is that keyphrases that are automatically extracted from documents can support this aim. We report on a user study that compared how accurately users categorized result documents on small screens when the document surrogates consisted of either keyphrases only, or document titles. We found no significant performance differences between the two conditions. In addition to these encouraging results, keyphrases have the benefit that they can be extracted and presented when no other document metadata can be identified.
Human Evaluation of Kea, an Automatic Keyphrasing System
, 2001
"... This paper describes an evaluation of the Kea automatic keyphrase extraction algorithm. Tools that automatically identify keyphrases are desirable because document keyphrases have numerous applications in digital library systems, but are costly and time consuming to manually assign. Keyphrase extrac ..."
Abstract
-
Cited by 10 (2 self)
- Add to MetaCart
This paper describes an evaluation of the Kea automatic keyphrase extraction algorithm. Tools that automatically identify keyphrases are desirable because document keyphrases have numerous applications in digital library systems, but are costly and time consuming to manually assign. Keyphrase extraction algorithms are usually evaluated by comparison to author-specified keywords, but this methodology has several well-known shortcomings. The results presented in this paper are based on subjective evaluations of the quality and appropriateness of keyphrases by human assessors, and make a number of contributions. First, they validate previous evaluations of Kea that rely on author keywords. Second, they show Kea's performance is comparable to that of similar systems that have been evaluated by human assessors. Finally, they justify the use of author keyphrases as a performance metric by showing that authors generally choose good keywords.
Towards an Iterative Reinforcement Approach for Simultaneous Document Summarization and Keyword Extraction
"... Though both document summarization and keyword extraction aim to extract concise representations from documents, these two tasks have usually been investigated independently. This paper proposes a novel iterative reinforcement approach to simultaneously ..."
Abstract
-
Cited by 9 (1 self)
- Add to MetaCart
Though both document summarization and keyword extraction aim to extract concise representations from documents, these two tasks have usually been investigated independently. This paper proposes a novel iterative reinforcement approach to simultaneously
Narrative Text Classification and Automatic Key Phrase Extraction in Web Document Corpora
- In 7th ACM Intern. Workshop on Web Information and Data Management (WIDM 2005
"... Automatic key phrase extraction is a useful tool in many text related applications such as clustering and summarization. State-of-the-art methods are aimed towards extracting key phrases from traditional text such as technical papers. Application of these methods on Web documents, which often contai ..."
Abstract
-
Cited by 9 (3 self)
- Add to MetaCart
Automatic key phrase extraction is a useful tool in many text related applications such as clustering and summarization. State-of-the-art methods are aimed towards extracting key phrases from traditional text such as technical papers. Application of these methods on Web documents, which often contain diverse and heterogeneous contents, is of particular interest and challenge in the information age. In this work, we investigate the significance of narrative text classification in the task of automatic key phrase extraction in Web document corpora. We benchmark three methods, TFIDF, KEA, and Keyterm, used to extract key phrases from all the plain text and from only the narrative text of Web pages. ANOVA tests are used to analyze the ranking data collected in a user study using quantitative measures of acceptable percentage and quality value. The evaluation shows that key phrases extracted from the narrative text only are significantly better than those obtained from all plain text of Web pages. This demonstrates that narrative text classification is indispensable for effective key phrase extraction in Web document corpora.
Language Models for Hierarchical Summarization
, 2003
"... Hierarchies have long been used for organization, summarization, and access to information. In this dissertation we define summarization in terms of a probabilistic language model and use this definition to explore a new technique for automatically generating topic hierarchies. We use the language ..."
Abstract
-
Cited by 8 (2 self)
- Add to MetaCart
Hierarchies have long been used for organization, summarization, and access to information. In this dissertation we define summarization in terms of a probabilistic language model and use this definition to explore a new technique for automatically generating topic hierarchies. We use the language model to characterize the documents that will be summarized and then apply a graph-theoretic algorithm to determine the best topic words for the hierarchical summary. This work is very different from previous attempts to generate topic hierarchies because it relies on statistical analysis and language modeling to identify descriptive words for a document and organize the words in a hierarchical structure. We compare
Automating Iterative Tasks With Programming By Demonstration
, 2000
"... iii Abstract Programming by demonstration is an end-user programming technique that allows people to create programs by showing the computer examples of what they want to do. Users do not need specialised programming skills. Instead, they instruct the computer by demonstrating examples, much as the ..."
Abstract
-
Cited by 8 (2 self)
- Add to MetaCart
iii Abstract Programming by demonstration is an end-user programming technique that allows people to create programs by showing the computer examples of what they want to do. Users do not need specialised programming skills. Instead, they instruct the computer by demonstrating examples, much as they might show another person how to do the task. Programming by demonstration empowers users to create programs that perform tedious and time-consuming computer chores. However, it is not in widespread use, and is instead confined to research applications that end users never see. This makes it difficult to evaluate programming by demonstration tools and techniques.
Mining the Web for Lexical Knowledge to Improve Keyphrase Extraction: Learning from Labeled and Unlabeled Data
- ERB-1096 NRC #44947, National Research Council, Institute for Information Technology
, 2002
"... A journal article is often accompanied by a list of keyphrases, composed of about five to fifteen important words and phrases that capture the article’s main topics. Keyphrases are useful for a variety of purposes, including summarizing, indexing, labeling, categorizing, clustering, highlighting, br ..."
Abstract
-
Cited by 7 (0 self)
- Add to MetaCart
A journal article is often accompanied by a list of keyphrases, composed of about five to fifteen important words and phrases that capture the article’s main topics. Keyphrases are useful for a variety of purposes, including summarizing, indexing, labeling, categorizing, clustering, highlighting, browsing, and searching. The task of automatic keyphrase extraction is to select keyphrases from within the text of a given document. Automatic keyphrase extraction makes it feasible to generate keyphrases for the huge number of documents that do not have manually assigned keyphrases. Good performance on this task has been obtained by approaching it as a supervised learning problem. An input document is treated as a set of candidate phrases that must be classified as either keyphrases or non-keyphrases. To classify a candidate phrase as a keyphrase, the most important features (attributes) appear to be the frequency and location of the candidate phrase in the document. Recent work has demonstrated that it is also useful to know the frequency of the candidate phrase as a manually assigned keyphrase for other documents in the same domain as the given document (e.g., the domain of computer science). Unfortunately, this keyphrase-frequency
Automatic document indexing in large medical collections
- In HIKM
, 2006
"... Term extraction relates to extracting the most characteristic or important terms (words or phrases) in a document. This information is commonly used for improving the accuracy of document indexing and retrieval in large text collections. It also allows for faster and better understanding of the cont ..."
Abstract
-
Cited by 5 (2 self)
- Add to MetaCart
Term extraction relates to extracting the most characteristic or important terms (words or phrases) in a document. This information is commonly used for improving the accuracy of document indexing and retrieval in large text collections. It also allows for faster and better understanding of the contents of a document collection without first browsing through the contents of its documents. This paper presents AMTEX, an automatic term extraction method, specifically designed for the automatic indexing of documents in large medical collections such as MEDLINE, the premier bibliographic database of the U.S. National Library of Medicine (NLM). AMTEX combines MeSH, the terminological thesaurus resource of NLM, with a well-established method for extraction of domain terms, the C/NC-value method. The performance evaluation of various AMTEX configurations in the indexing task is measured against the current state-of-the-art, the MMTx method. The experimental results on a subset of MEDLINE documents demonstrate that AMTEX achieves better precision and recall than MMTx.

