Recognizing acronyms and their definitions (1995)

by K Taghva, J Gilbreth

A Simple Algorithm For Identifying Abbreviation Definitions in Biomedical Text

by Ariel S. Schwartz, Marti A. Hearst, 2003
Cited by 116 (1 self)
The volume of biomedical text is growing at a fast rate, creating challenges for humans and computer systems alike. One of these challenges arises from the frequent use of novel abbreviations in these texts, thus requiring that biomedical lexical ontologies be continually updated. In this paper we show that the problem of identifying abbreviations' definitions can be solved with a much simpler algorithm than that proposed by other research efforts. The algorithm achieves 96% precision and 82% recall on a standard test collection, which is at least as good as existing approaches. It also achieves 95% precision and 82% recall on another, larger test set. A notable advantage of the algorithm is that, unlike other approaches, it does not require any training data.
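For readers unfamiliar with the two figures quoted in this abstract, the following is a minimal sketch of how precision and recall are typically computed for extracted abbreviation/definition pairs. The function name `precision_recall` and the toy data are assumptions made for illustration; they are not taken from the paper or its test collections.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for two collections of (short, long) pairs."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Precision: share of extracted pairs that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of the correct pairs that were extracted.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy data, invented for this example.
gold = {
    ("HMM", "hidden Markov model"),
    ("PCR", "polymerase chain reaction"),
    ("ATP", "adenosine triphosphate"),
    ("MRI", "magnetic resonance imaging"),
    ("CNS", "central nervous system"),
}
predicted = {
    ("HMM", "hidden Markov model"),
    ("PCR", "polymerase chain reaction"),
    ("ATP", "adenosine triphosphate"),
    ("MRI", "resonance imaging"),  # wrong boundary, counts as an error
}

precision, recall = precision_recall(predicted, gold)
```

With this toy data, three of the four extracted pairs are correct and three of the five gold pairs are found, so precision is 3/4 and recall is 3/5.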

Creating an online dictionary of abbreviations from MEDLINE

by Jeffrey T. Chang, Hinrich Schütze, Russ B. Altman - Journal of the American Medical Informatics Association, 2002
Cited by 56 (2 self)
The immense volume and rapid growth of biomedical literature present special challenges for humans as well as computer programs analyzing it. One such challenge comes from the common use of abbreviations that effectively augments the size of the vocabulary for the field. To cope with this, we have developed an algorithm to identify abbreviations in text. It uses a statistical learning algorithm, logistic regression, to score abbreviations based on their resemblance to previously identified ones, achieving up to 84% recall at 81% precision. We then scanned all of MEDLINE and found 781,632 high-scoring abbreviation definitions. We are making these available as a public abbreviation server
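As a rough illustration of scoring candidates with logistic regression, the sketch below maps a few features of a candidate abbreviation/definition pair to a score between 0 and 1. The feature names, weights, and bias are hypothetical values chosen for this example; the abstract does not list the features or parameters the authors used.

```python
import math

# Hypothetical weights, invented for illustration. A real system would
# learn them from previously identified abbreviation/definition pairs.
WEIGHTS = {
    "fraction_letters_matched": 4.0,
    "starts_at_word_boundary": 1.5,
    "extra_words_in_definition": -1.0,
}
BIAS = -3.0


def logistic_score(features, weights=WEIGHTS, bias=BIAS):
    """Map a feature dictionary to a score in (0, 1) via the logistic function."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


# Two hypothetical candidates: one that aligns well, one that does not.
good = logistic_score({
    "fraction_letters_matched": 1.0,
    "starts_at_word_boundary": 1.0,
    "extra_words_in_definition": 0.0,
})
poor = logistic_score({
    "fraction_letters_matched": 0.3,
    "starts_at_word_boundary": 0.0,
    "extra_words_in_definition": 2.0,
})
```

A threshold on the score then trades recall against precision: lowering it admits more candidates at the cost of more errors.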

Hybrid Text Mining for Finding Abbreviations and Their Definitions

by Youngja Park, Roy J. Byrd, 2001
Cited by 29 (0 self)
We present a hybrid text mining method for finding abbreviations and their definitions in free format texts. To deal with the problem, this method employs pattern-based abbreviation rules in addition to text markers and cue words. The pattern-based rules describe how abbreviations are formed from definitions. Rules can be generated automatically and/or manually and can be augmented when the system processes new documents. The proposed method has the advantages of high accuracy, high flexibility, wide coverage, and fast recognition.
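To make the idea of a pattern-based rule concrete, here is a minimal sketch of one such rule: the abbreviation is formed from the first letter of each content word in the definition. This single rule, the stopword list, and the name `first_letter_rule` are assumptions for illustration and are not the rule set described by the authors.

```python
# Hypothetical stopword list: short function words that are often
# skipped when an abbreviation is formed.
STOPWORDS = {"of", "the", "and", "for", "in", "a", "an"}


def first_letter_rule(abbreviation, definition):
    """Check whether the abbreviation equals the initials of the content words."""
    words = [
        word
        for word in definition.replace("-", " ").split()
        if word.lower() not in STOPWORDS
    ]
    initials = "".join(word[0] for word in words)
    return initials.upper() == abbreviation.upper()


matches = first_letter_rule("HMM", "hidden Markov model")
no_match = first_letter_rule("DNA", "deoxyribonucleic acid")
```

The second call fails because "DNA" draws letters from inside a word, which is the kind of case that motivates keeping several rules rather than one.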

Automatic Extraction of Acronyms from Text

by Stuart Yeates - University of Waikato, 1999
Cited by 24 (2 self)
A brief introduction to acronyms is given and motivation for extracting them in a digital library environment is discussed. A technique for extracting acronyms is given with an analysis of the results. The technique is found to have a low number of false negatives and a high number of false positives.
Introduction. Digital library research seeks to build tools to enable access of content, while making as few as possible assumptions about the content, since assumptions limit the range of applicability of the tools. Generally, the broader the assumptions the more widely applicable the tools. For example, keyword based indexing [5] is based on communications theory and applies to all natural human textual languages (allowances for differences in character sets and similar localisation issues notwithstanding). The algorithm described in this paper makes much stronger assumptions about the content. It assumes textual content that contains acronyms, an assumption which is known to hold for...

Acrophile: an automated acronym extractor and server

by Leah S. Larkey, Paul Ogilvie, M. Andrew Price, Brenden Tamilio - In ACM DL, 2000
Cited by 24 (3 self)
We implemented a web server for acronym and abbreviation lookup, containing a collection of acronyms and their expansions gathered from a large number of web pages by a heuristic extraction process. Several different extraction algorithms were evaluated and compared. The corpus resulting from the best algorithm is comparable to a high-quality hand-crafted site, but has the potential to be much more inclusive as data from more web pages are processed.

A supervised learning approach to acronym identification

by David Nadeau, Peter D. Turney - In 8th Canadian Conference on Artificial Intelligence (AI’2005) (LNAI 3501), 2005
Cited by 15 (1 self)
This paper addresses the task of finding acronym-definition pairs in text. Most of the previous work on the topic is about systems that involve manually generated rules or regular expressions. In this paper, we present a supervised learning approach to the acronym identification task. Our approach reduces the search space of the supervised learning system by putting some weak constraints on the kinds of acronym-definition pairs that can be identified. We obtain results comparable to hand-crafted systems that use stronger constraints. We describe our method for reducing the search space, the features used by our supervised learning system, and our experiments with various learning schemes.

Normalization of Non-Standard Words

by Richard Sproat, Alan W. Black, Stanley Chen, Shankar Kumar, Mari Ostendorf, Christopher Richards - WS'99 Final Report, 1999
Cited by 12 (1 self)
In addition to ordinary words and names, real text contains non-standard “words” (NSWs), including numbers, abbreviations, dates, currency amounts and acronyms. Typically, one cannot find NSWs in a dictionary, nor can one find their pronunciation by an application of ordinary “letter-to-sound” rules. Non-standard words also have a greater propensity than ordinary words to be ambiguous with respect to their interpretation or pronunciation. In many applications, it is desirable to “normalize” text by replacing the NSWs with the contextually appropriate ordinary word or sequence of words. Typical technology for text normalization involves sets of ad hoc rules tuned to handle one or two genres of text (often newspaper-style text) with the expected result that the techniques do not usually generalize well to new domains. The purpose of the work reported here is to take some initial steps towards addressing deficiencies in previous approaches to text normalization. We developed a taxonomy of NSWs on the basis of four rather distinct text

Sarad: A simple and robust abbreviation dictionary

by Eytan Adar - Bioinformatics, 2004
Cited by 10 (1 self)
Motivation: Due to recent interest in the use of textual material to augment traditional experiments, it has become necessary to automatically cluster, classify, and filter natural language information. Results: The Simple and Robust Abbreviation Dictionary (SaRAD) provides an easy-to-implement, high-performance tool for the construction of a biomedical symbol dictionary. The algorithms, applied to the MEDLINE document set, result in a high-quality dictionary and toolset to disambiguate abbreviation symbols automatically. Availability: The SaRAD dictionary is available as a web-based demonstration, and in pseudo-code form. Contact:

Biomedical term mapping databases

by Jonathan D. Wren, Jeffrey T. Chang, James Pustejovsky, Eytan Adar, Harold R. Garner, Russ B. Altman - Nucleic Acids Res, 2005
Cited by 9 (0 self)
Longer words and phrases are frequently mapped onto a shorter form such as abbreviations or acronyms for efficiency of communication. These abbreviations are pervasive in all aspects of biology and medicine and as the amount of biomedical literature grows, so does the number of abbreviations and the average number of definitions per abbreviation. Even more confusing, different authors will often abbreviate the same word/phrase differently. This ambiguity impedes our ability to retrieve information, integrate databases and mine textual databases for content. Efforts to standardize nomenclature, especially those doing so retrospectively, need to be aware of different abbreviatory mappings and spelling variations. To address this problem, there have been several efforts to develop computer algorithms to identify the mapping of terms between short and long form within a large body of literature. To date, four such algorithms have been applied to create online databases that comprehensively map biomedical terms and abbreviations within MEDLINE: ARGH

Automatic text summarization based on word clusters and ranking algorithms

by Massih R. Amini, Nicolas Usunier, Patrick Gallinari - In Proceedings of the 27th European Conference on Information Retrieval, 2005
Cited by 7 (6 self)
This paper investigates a new approach for Single Document Summarization based on a Machine Learning ranking algorithm. The use of machine learning techniques for this task allows one to adapt summaries to the user needs and to the corpus characteristics. These desirable properties have motivated an increasing amount of work in this field over the last few years. Most approaches attempt to generate summaries by extracting text-spans (sentences in our case) and adopt the classification framework, which consists of training a classifier in order to discriminate between relevant and irrelevant spans of a document. A set of features is first used to produce a vector of scores for each sentence in a given document and a classifier is trained in order to make a global combination of these scores. We believe that the classification criterion for training a classifier is not adapted for SDS and propose an original framework based on ranking for this task. A ranking algorithm also combines the scores of different features but its criterion tends to reduce the relative misordering of sentences within a document. Features we use here are either based on the state-of-the-art or built upon word-clusters. These clusters are groups of words which often co-occur with each other, and can serve to expand a query or to enrich the representation of the sentences of the documents. We analyze the performance of our ranking algorithm on two data sets: the Computation and Language (cmp lg) collection of TIPSTER SUMMAC and the WIPO collection. We perform comparisons with different baseline (non-learning) systems, and a reference trainable summarizer system based on the classification framework. The experiments show that the learning algorithms perform better than the non-learning systems while the ranking algorithm outperforms the classifier. The difference of performance between the two learning algorithms depends on the nature of datasets. We give an explanation of this fact by the different separability hypothesis of the data made by the two learning algorithms.
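The ranking criterion mentioned in this abstract, reducing the relative misordering of sentences within a document, can be illustrated with a small sketch. The function below counts the fraction of (relevant, irrelevant) sentence pairs in one document where the relevant sentence is not scored strictly higher. The name `pairwise_misordering` and the toy scores are assumptions for this example; the exact loss optimized in the paper is not given in the abstract.

```python
def pairwise_misordering(scores, relevant):
    """Fraction of (relevant, irrelevant) pairs that are ordered wrongly."""
    relevant_scores = [s for s, r in zip(scores, relevant) if r]
    irrelevant_scores = [s for s, r in zip(scores, relevant) if not r]
    if not relevant_scores or not irrelevant_scores:
        return 0.0  # no pairs to compare in this document
    misordered = sum(
        1
        for good in relevant_scores
        for bad in irrelevant_scores
        if good <= bad  # ties are counted as errors
    )
    return misordered / (len(relevant_scores) * len(irrelevant_scores))


# Hypothetical scores for four sentences of one document.
scores = [0.9, 0.2, 0.6, 0.4]
relevant = [True, False, False, True]
error = pairwise_misordering(scores, relevant)
```

Here only one of the four pairs is misordered (the relevant sentence scored 0.4 falls below the irrelevant one scored 0.6), so the error is 1/4. A classifier can misjudge the absolute score of every sentence and still achieve zero error under this criterion, as long as the ordering within the document is right.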