Towards routine automatic pathway discovery from on-line scientific text abstracts (1999)

by S-K Ng, M Wong
Venue: Proc. Workshop on Genome Informatics

Results 1 - 10 of 34

Accomplishments and Challenges in Literature Data Mining for Biology

by Lynette Hirschman, Jong C. Park, Junichi Tsujii, Limsoon Wong, Cathy H. Wu, 2002
Cited by 118 (8 self)
We review recent results in literature data mining for biology and discuss the need and the steps for a challenge evaluation for this field. Literature data mining has progressed from simple recognition of terms to extraction of interaction relationships from complex sentences, and has broadened from recognition of protein interactions to a range of problems such as improving homology search, identifying cellular location, and so on. To encourage participation and accelerate progress in this expanding field, we propose creating challenge evaluations, and we describe two specific applications in this context.

Associating Genes with Gene Ontology Codes Using a Maximum Entropy Analysis of Biomedical Literature

by Soumya Raychaudhuri, Jeffrey T. Chang, Patrick D. Sutphin, Russ B. Altman, 2002
Cited by 58 (3 self)
... this paper but has been provided elsewhere (Ratnaparkhi 1997; Manning and Schütze 1999).

Discovering patterns to extract protein-protein interactions from full texts

by Minlie Huang et al. - Bioinformatics, 2004
Cited by 44 (4 self)
Motivation: Although there are several databases storing protein–protein interactions, most such data still exist only in the scientific literature. They are scattered in scientific literature written in natural languages, defying data mining efforts. Much time and labor have to be spent on extracting protein pathways from the literature. Our aim is to develop a robust and powerful methodology to mine protein–protein interactions from biomedical texts. Results: We present a novel and robust approach for extracting protein–protein interactions from literature. Our method uses a dynamic programming algorithm to compute distinguishing patterns by aligning relevant sentences and key verbs that describe protein interactions. A matching algorithm is designed to extract the interactions between proteins. Equipped only with a dictionary of protein names, our system achieves a recall rate of 80.0% and a precision rate of 80.5%.
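The pattern-discovery idea can be sketched roughly as follows. This is a simplified stand-in, not the authors' algorithm: it aligns two sentences with a plain longest-common-subsequence dynamic program (no scoring of key verbs or gaps), after replacing dictionary protein names with a hypothetical PROT placeholder, and it "matches" a pattern by checking that the pattern's tokens occur in order in a new sentence. The protein dictionary and the sentences are made up for illustration.

```python
from typing import List, Set

def lcs_align(a: List[str], b: List[str]) -> List[str]:
    """Longest common subsequence of two token lists, computed by dynamic programming."""
    n, m = len(a), len(b)
    # dp[i][j] holds the LCS length of the suffixes a[i:] and b[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table forward to recover one optimal alignment
    out: List[str] = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out

def generalize(sentence: str, proteins: Set[str]) -> List[str]:
    """Lower-case tokens and replace dictionary protein names with a PROT placeholder."""
    return ["PROT" if tok in proteins else tok.lower() for tok in sentence.split()]

def matches(pattern: List[str], tokens: List[str]) -> bool:
    """True if the pattern occurs in order (gaps allowed) within the token list."""
    it = iter(tokens)
    return all(tok in it for tok in pattern)

# Hypothetical protein dictionary and sentences, for illustration only
proteins = {"RAD51", "BRCA2", "p53", "MDM2", "Ku70", "Ku80"}
s1 = generalize("RAD51 directly interacts with BRCA2 in vivo", proteins)
s2 = generalize("p53 interacts physically with MDM2", proteins)
pattern = lcs_align(s1, s2)
print(pattern)  # ['PROT', 'interacts', 'with', 'PROT']
print(matches(pattern, generalize("Ku70 interacts strongly with Ku80", proteins)))  # True
```

Because the alignment keeps only tokens shared by both sentences, modifiers such as "directly" and "physically" drop out while the interaction verb survives. A real system would also need to score and filter candidate patterns, which this sketch does not attempt.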

Mining MEDLINE: Abstracts, Sentences, or Phrases?

by J. Ding, D. Berleant, D. Nettleton, E. Wurtele, 2002
Cited by 38 (1 self)
[Table fragment: columns Sentence pair, Sentence, Phrase; weight ranges w > 0.511, 0.339 < w < 0.510, w < 0.338.]
Section 5, Discussion and Conclusion: In view of the results reported here, it is not surprising that researchers have reported interesting results for text mining in MEDLINE based on abstracts, sentences, and phrases. Tables 2 and 3 and the statistical significance summary in the preceding section indicate that each of these units has advantages and disadvantages compared to the others.
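The choice of unit matters most for co-occurrence-based mining, where two names are linked simply because they appear in the same unit. The sketch below assumes that kind of co-occurrence counting (it is not the authors' code) and counts name pairs per abstract and per sentence; the sentence splitter is deliberately naive and the gene names are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Set

def split_sentences(abstract: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", abstract.strip()) if s]

def cooccurrences(units: Iterable[str], names: Set[str]) -> Counter:
    """Count unordered pairs of names that appear together within the same text unit."""
    counts: Counter = Counter()
    for unit in units:
        tokens = set(re.findall(r"[A-Za-z0-9-]+", unit))
        present = sorted(names & tokens)
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts

# Hypothetical abstract and name list
abstract = "GeneA activates GeneB. GeneC is expressed in leaves."
names = {"GeneA", "GeneB", "GeneC"}
print(cooccurrences([abstract], names))                 # abstract as the unit: 3 pairs
print(cooccurrences(split_sentences(abstract), names))  # sentence as the unit: 1 pair
```

In this toy input, GeneC is paired with GeneA and GeneB only when the whole abstract is the unit, even though no sentence relates them. Smaller units avoid such spurious pairs at the risk of missing relations stated across sentences.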

A Shallow Parser Based on Closed-Class Words to Capture Relations in Biomedical Text

by Gondy Leroy, Hsinchun Chen, Jesse D. Martinez, 2003
Cited by 27 (4 self)
Natural language processing for biomedical text currently focuses mostly on entity and relation extraction. These are usually pre-specified entities, e.g., proteins, and pre-specified relations, e.g., inhibit relations. A shallow parser that automatically captures the relations between noun phrases in free text has been developed and evaluated. It uses heuristics and a noun phraser to capture entities of interest in the text. Cascaded finite state automata structure the relations between individual entities. The automata are based on closed-class English words and model generic relations not limited to specific words. The parser also recognizes coordinating conjunctions and captures negation in text, a feature usually ignored by others. Three cancer researchers evaluated 330 relations extracted from 26 abstracts of interest to them. Of these, 296 relations were correctly extracted, resulting in 90% precision and an average of 11 correct relations per abstract.
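As a rough illustration of parsing around closed-class words, the toy below uses short, hypothetical lists of prepositions, determiners, conjunctions, and negations. It is not Leroy et al.'s cascaded automata or noun phraser: every run of remaining words is treated as a chunk, coordinated chunks joined by "and"/"or" are expanded into separate relations, and any negation word marks all relations in the sentence as negated.

```python
from typing import List, NamedTuple

# Hypothetical, heavily abbreviated closed-class word lists
PREPOSITIONS = {"of", "in", "on", "with", "by", "for", "to", "from"}
DETERMINERS = {"a", "an", "the"}
CONJUNCTIONS = {"and", "or"}
NEGATIONS = {"no", "not"}

class Relation(NamedTuple):
    left: str
    connector: str
    right: str
    negated: bool

def shallow_relations(sentence: str) -> List[Relation]:
    """Toy cascade: chunk on closed-class words, then link chunks across prepositions."""
    tokens = [t.strip(".,;:()").lower() for t in sentence.split()]
    negated = any(t in NEGATIONS for t in tokens)  # sentence-level negation flag
    groups: List[List[str]] = [[]]  # coordinated chunks lying between prepositions
    preps: List[str] = []           # preps[i] links groups[i] to groups[i + 1]
    chunk: List[str] = []

    def close_chunk() -> None:
        if chunk:
            groups[-1].append(" ".join(chunk))
            chunk.clear()

    # Stage 1: split the token stream into chunks at closed-class words
    for tok in tokens:
        if tok in PREPOSITIONS:
            close_chunk()
            preps.append(tok)
            groups.append([])
        elif tok in CONJUNCTIONS:
            close_chunk()  # coordination: start a sibling chunk in the same group
        elif not tok or tok in DETERMINERS or tok in NEGATIONS:
            continue
        else:
            chunk.append(tok)
    close_chunk()

    # Stage 2: each chunk in the group before a preposition relates to each chunk after it
    return [
        Relation(left, prep, right, negated)
        for i, prep in enumerate(preps)
        for left in groups[i]
        for right in groups[i + 1]
    ]

for rel in shallow_relations("Inhibition of cyclin D1 and cyclin E in tumor cells."):
    print(rel)
```

With no part-of-speech information, a verb would be absorbed into a neighboring chunk (for example "p21 was observed"), which is one reason a practical parser needs something like the noun phraser described above.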

Learning Language in Logic - Genic Interaction Extraction Challenge

by C. Nédellec - Proceedings of the Learning Language in Logic 2005 Workshop at the International Conference on Machine Learning, 2005
Cited by 25 (0 self)
We describe here the context of the LLL challenge on genic interaction extraction, the background of its organization, and the data sets. We then discuss the results of the participating systems.

Predicting The Sub-Cellular Location Of Proteins From Text Using Support Vector Machines

by B.J. Stapley, L.A. Kelley, M.J.E. Sternberg, 2002
Cited by 24 (0 self)
... this paper is to treat the protein as a vector of terms from relevant Medline documents. This approach derives from the vector-based model common in information retrieval [8]. The term weights of a vector are functions of their frequencies within the document collection as a whole and within the relevant documents. Given a set of protein term vectors, the task is to find some function that partitions the space according to the localisation of the protein. For this task we employ support vector machines (SVMs) [9]. Support vector machines are a mathematical method for performing simultaneous dimension reduction and binary classification [9]. SVMs have been applied to the problems of pattern recognition [10], regression estimation [10], and information retrieval [?]. Because SVMs cope well with high dimensionality and are very fast to train, they are particularly suited to problems in text data mining and information retrieval. Kwok studied the use of SVMs in text categorization of Reuters newswire documents [2]. In this paper, we apply an analogous approach to Medline/SWISS-PROT documents.
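The approach described above, a term vector per protein fed to an SVM, might look like the following minimal sketch. Two assumptions are worth stating loudly: term weights use plain TF-IDF (tf * log(N/df)) as one instance of "a function of" collection and document frequencies, and the SVM is a from-scratch linear Pegasos-style sub-gradient solver rather than any package the authors used. The protein IDs, token bags, and nuclear/membrane labels are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

Vector = Dict[str, float]

def tfidf_vectors(docs: Dict[str, List[str]]) -> Dict[str, Vector]:
    """Turn each protein's pooled document tokens into a sparse TF-IDF vector (assumed weighting)."""
    n = len(docs)
    df = Counter(term for tokens in docs.values() for term in set(tokens))
    return {
        protein: {t: c * math.log(n / df[t]) for t, c in Counter(tokens).items()}
        for protein, tokens in docs.items()
    }

def dot(w: Vector, x: Vector) -> float:
    """Sparse dot product."""
    return sum(w.get(k, 0.0) * v for k, v in x.items())

def train_linear_svm(xs: List[Vector], ys: List[int], lam: float = 0.01, epochs: int = 50) -> Vector:
    """Pegasos-style sub-gradient descent on the L2-regularized hinge loss (no bias term)."""
    w: Vector = {}
    t = 0
    for _ in range(epochs):
        for x, y in zip(xs, ys):  # fixed order instead of random sampling, for reproducibility
            t += 1
            eta = 1.0 / (lam * t)
            violates = y * dot(w, x) < 1.0  # margin measured with the current weights
            for k in w:                     # regularization: shrink every weight
                w[k] *= 1.0 - eta * lam
            if violates:                    # hinge-loss step toward the correct side
                for k, v in x.items():
                    w[k] = w.get(k, 0.0) + eta * y * v
    return w

# Hypothetical token bags per protein and labels (+1 nuclear, -1 membrane)
docs = {
    "P1": ["nucleus", "dna", "binding", "transcription"],
    "P2": ["nuclear", "dna", "chromatin", "transcription"],
    "P3": ["membrane", "transport", "channel", "binding"],
    "P4": ["membrane", "receptor", "signaling", "channel"],
}
labels = {"P1": 1, "P2": 1, "P3": -1, "P4": -1}

vecs = tfidf_vectors(docs)
names = sorted(docs)
w = train_linear_svm([vecs[p] for p in names], [labels[p] for p in names])
for p in names:
    print(p, "nuclear" if dot(w, vecs[p]) > 0 else "membrane")
```

Pegasos proper samples one example at random per step; the fixed-order passes here just keep the toy deterministic. A bias term could be added by appending a constant feature to every vector, and a multi-location task would need several binary classifiers (for example one-vs-rest).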

GAPSCORE: finding gene and protein names one word at a time

by Jeffrey T. Chang, Hinrich Schütze, Russ B. Altman, 2004
Cited by 21 (0 self)
Motivation: New high-throughput technologies have accelerated the accumulation of knowledge about genes and proteins. However, much knowledge is still stored as written natural language text. Therefore, we have developed a new method, GAPSCORE, to identify gene and protein names in text. GAPSCORE scores words based on a statistical model of gene names that quantifies their appearance, morphology and context.
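GAPSCORE's statistical model is not described in this abstract, but the three kinds of evidence it names (appearance, morphology, and context) can be illustrated with a per-word feature extractor. This sketch rests on assumptions: the Greek-letter and context-cue word lists and the word-shape encoding are hypothetical choices, and the features would still need to be fed to a trained classifier, which is not shown.

```python
import re
from typing import Dict, List, Union

# Hypothetical cue lists, for illustration only
GREEK = {"alpha", "beta", "gamma", "delta", "kappa"}
CONTEXT_CUES = {"gene", "genes", "protein", "proteins", "receptor", "kinase"}

def word_shape(word: str) -> str:
    """Collapse runs of digits, capitals and lower-case letters, e.g. 'BRCA2' -> 'A0'."""
    shape = re.sub(r"[0-9]+", "0", word)
    shape = re.sub(r"[A-Z]+", "A", shape)
    return re.sub(r"[a-z]+", "a", shape)

def word_features(tokens: List[str], i: int) -> Dict[str, Union[bool, str]]:
    """Appearance, morphology and context features for tokens[i]."""
    word = tokens[i]
    prev_word = tokens[i - 1].lower() if i > 0 else "<s>"
    next_word = tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>"
    return {
        # appearance
        "has_digit": any(c.isdigit() for c in word),
        "all_caps": word.isupper(),
        "inner_capital": any(c.isupper() for c in word[1:]),
        "has_hyphen": "-" in word,
        "shape": word_shape(word),
        # morphology
        "suffix3": word[-3:].lower(),
        "is_greek": word.lower() in GREEK,
        # context
        "prev_is_cue": prev_word in CONTEXT_CUES,
        "next_is_cue": next_word in CONTEXT_CUES,
    }

print(word_features("the BRCA2 gene is mutated".split(), 1))
```

Scoring one word at a time, as the title suggests, means each token gets its own feature vector; multi-word names would have to be assembled from adjacent high-scoring words in a later step.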

Extracting biochemical interactions from MEDLINE using a link grammar parser

by Jing Ding, Daniel Berleant, Jun Xu, Andy W. Fulmer - In Proceedings of the 15th IEEE International Conference on Tools with Artificial Intelligence, edited by B. Werner. IEEE Computer Society
Cited by 21 (2 self)
Many natural language processing approaches at various complexity levels have been reported for extracting biochemical interactions from MEDLINE. While some algorithms using simple template matching are unable to deal with complex syntactic structures, others exploiting sophisticated parsing techniques are hindered by greater computational cost. This study investigates link grammar parsing for extracting biochemical interactions. Link grammar parsing can handle many syntactic structures and is relatively efficient computationally. We experimented on a sample MEDLINE corpus. Although the parser was originally developed for conversational English and made many mistakes in parsing sentences from the biochemical domain, it nevertheless achieved better overall performance than a co-occurrence-only method. Customizing the parser for the biomedical domain is expected to improve its performance further.

Discovering patterns to extract protein–protein interactions from the literature

by Yu Hao, Xiaoyan Zhu, Minlie Huang, Ming Li - Part II. Bioinformatics, 2005
Cited by 12 (0 self)
doi:10.1093/bioinformatics/bti493