Results 1 - 8 of 8
Kea: Practical automatic keyphrase extraction
- In Proceedings of the 4th ACM Conference on Digital Libraries, 1998
Cited by 70 (8 self)
Keyphrases provide semantic metadata that summarize and characterize documents. This paper describes Kea, an algorithm for automatically extracting keyphrases from text. Kea identifies candidate keyphrases using lexical methods, calculates feature values for each candidate, and uses a machine-learning algorithm to predict which candidates are good keyphrases. The machine learning scheme first builds a prediction model using training documents with known keyphrases, and then uses the model to find keyphrases in new documents. We use a large test corpus to evaluate Kea’s effectiveness in terms of how many author-assigned keyphrases are correctly identified. The system is simple, robust, and available under the GNU General Public License; the paper gives instructions for use.
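The pipeline this abstract outlines (lexical candidate identification, then per-candidate feature values) can be illustrated with a minimal sketch. The abstract does not name the features or the learner, so everything below is an assumption made for illustration: the stopword list, the n-gram limit, the two features (TF×IDF and relative first-occurrence position), and the function names are hypothetical. The trained prediction model is replaced here by a plain ranking on TF×IDF, so this is not the paper's method.

```python
import math
import re
from collections import Counter

# Illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"a", "an", "and", "the", "of", "in", "to", "for",
             "is", "are", "that", "from", "on", "with"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def candidates(tokens, max_len=3):
    """Yield (phrase, start position) for n-grams of up to max_len words
    that neither begin nor end with a stopword."""
    for i in range(len(tokens)):
        for n in range(1, max_len + 1):
            gram = tokens[i:i + n]
            if len(gram) < n:
                break
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            yield " ".join(gram), i

def features(doc, corpus):
    """Compute two assumed features per candidate phrase of doc:
    TF×IDF against corpus and relative position of first occurrence."""
    tokens = tokenize(doc)
    counts = Counter()
    first = {}
    for phrase, pos in candidates(tokens):
        counts[phrase] += 1
        first.setdefault(phrase, pos)
    corpus_sets = [{p for p, _ in candidates(tokenize(d))} for d in corpus]
    feats = {}
    for phrase, tf in counts.items():
        df = sum(phrase in s for s in corpus_sets)
        idf = math.log((len(corpus) + 1) / (df + 1))  # smoothed IDF
        feats[phrase] = {
            "tfidf": (tf / len(tokens)) * idf,
            "first_occurrence": first[phrase] / len(tokens),
        }
    return feats

def top_keyphrases(doc, corpus, k=5):
    """Rank candidates by TF×IDF (a stand-in for a learned model),
    breaking ties by earlier first occurrence, then alphabetically."""
    feats = features(doc, corpus)
    ranked = sorted(
        feats,
        key=lambda p: (-feats[p]["tfidf"], feats[p]["first_occurrence"], p),
    )
    return ranked[:k]
```

In a supervised setting such as the one the abstract describes, the feature dictionaries returned by `features` would be passed to a classifier trained on documents with known keyphrases rather than sorted directly.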
Query-Free News Search
- 2005
Cited by 39 (0 self)
Many daily activities present information in the form of a stream of text, and often people can benefit from additional information on the topic discussed. TV broadcast news can be treated as one such stream of text; in this paper we discuss finding news articles on the web that are relevant to news currently being broadcast. We evaluated a variety of algorithms for this problem, looking at the impact of inverse document frequency, stemming, compounds, history, and query length on the relevance and coverage of news articles returned in real time during a broadcast. We also evaluated several postprocessing techniques for improving the precision, including reranking using additional terms, reranking by document similarity, and filtering on document similarity. For the best algorithm, 84–91% of the articles found were relevant, with at least 64% of the articles being on the exact topic of the broadcast. In addition, a relevant article was found for at least 70% of the topics.
A modified fuzzy art for soft document clustering
- In: Proc. International Joint Conference on Neural Networks, 2002
Cited by 9 (2 self)
Document clustering has become a very useful application, especially with the advent of the World Wide Web. Most existing document clustering algorithms either produce clusters of poor quality or are computationally very expensive. In this paper we propose a document-clustering algorithm, KMART, that uses an unsupervised Fuzzy Adaptive Resonance Theory (Fuzzy-ART) neural network. A modified version of Fuzzy ART is used to allow a document to belong to multiple clusters. The number of clusters is determined dynamically. Experiments are reported that compare the efficiency and execution time of our algorithm with other document-clustering algorithms such as Fuzzy c-Means. The results show that KMART is both effective and efficient.
Towards an Iterative Reinforcement Approach for Simultaneous Document Summarization and Keyword Extraction
Cited by 9 (1 self)
Though both document summarization and keyword extraction aim to extract concise representations from documents, these two tasks have usually been investigated independently. This paper proposes a novel iterative reinforcement approach to simultaneously
A Dynamic Grouping Technique for Distributing Codified-Knowledge in Large Organizations
- 2000
Cited by 3 (2 self)
Information overload is a major problem plaguing the "mailing lists" approach to knowledge distribution. A recent trend towards resolving the problem is to distribute information to organizational members according to their individual needs. In this paper, we propose a new technique, referred to as "dynamic grouping", for distributing codified knowledge in large organizations. To enable dynamic grouping, we develop a data structure called "organizational concept space" (OCS) consisting of a similarity network and an interest matrix. One major advantage of the dynamic grouping technique is that it can reduce information overload while avoiding information starvation by accommodating different levels of user demand for knowledge.
Shallow NLP techniques for Internet Search
- 2006
Cited by 3 (0 self)
Information Retrieval (IR) is a major component in many of our daily activities, with perhaps its most prominent role manifested in search engines. Today's most advanced engines use the keyword-based ("bag of words") paradigm, which has some inherent disadvantages. We believe that natural language (NL) is a more user-oriented, context-preserving and intuitive mechanism for web search.
Workflow-centric Information Distribution through Email
- 2000
Cited by 2 (1 self)
Organizations require ways to efficiently distribute information such as news releases, seminar announcements, and memos. While the machinery for information storage, manipulation, and retrieval exists, research dealing directly with its distribution in an organizational context is scarce. In this paper, we address this need by first examining the pros and cons of the conventional "mailing lists" approach and then proposing new workflow mechanisms that improve the efficiency and effectiveness of information distribution through email. The proposed approach is relevant to other information distribution approaches beyond e-mail. The main contributions of this study include: (1) offering a workflow perspective on organizational information distribution; (2) analysis of workflows in two new information distribution methods based on dynamic mailing lists and profile matching, respectively; and (3) proposing a new way of matching supply and demand of information that extends existing informa...
Copyright 1999 By
Many academic journals ask their authors to provide a list of about five to fifteen key words, to appear on the first page of each article. Since these key words are often phrases of two or more words, we prefer to call them keyphrases. There is a surprisingly wide variety of tasks for which keyphrases are useful, as we discuss in this paper. Recent commercial software, such as Microsoft's Word 97 and Verity's Search 97, includes algorithms that automatically extract keyphrases from documents.

