Accomplishments and Challenges in Literature Data Mining for Biology
, 2002
Cited by 118 (8 self)
We review recent results in literature data mining for biology and discuss the need and the steps for a challenge evaluation for this field. Literature data mining has progressed from simple recognition of terms to extraction of interaction relationships from complex sentences, and has broadened from recognition of protein interactions to a range of problems such as improving homology search, identifying cellular location, and so on. To encourage participation and accelerate progress in this expanding field, we propose creating challenge evaluations, and we describe two specific applications in this context.
Associating Genes with Gene Ontology Codes Using a Maximum Entropy Analysis of Biomedical Literature
, 2002
Cited by 58 (3 self)
this paper but has been provided elsewhere (Ratnaparkhi 1997; Manning and Schutze 1999)
Discovering patterns to extract protein-protein interactions from full texts
- BIOINFORMATICS
, 2004
Cited by 44 (4 self)
Motivation: Although there are several databases storing protein–protein interactions, most such data still exist only in the scientific literature. They are scattered in scientific literature written in natural languages, defying data mining efforts. Much time and labor have to be spent on extracting protein pathways from literature. Our aim is to develop a robust and powerful methodology to mine protein–protein interactions from biomedical texts. Results: We present a novel and robust approach for extracting protein–protein interactions from literature. Our method uses a dynamic programming algorithm to compute distinguishing patterns by aligning relevant sentences and key verbs that describe protein interactions. A matching algorithm is designed to extract the interactions between proteins. Equipped only with a dictionary of protein names, our system achieves a recall rate of 80.0% and a precision rate of 80.5%.
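The abstract does not spell out the alignment step, so the following is only a rough sketch of the general idea: a standard global alignment (Needleman-Wunsch style) over two tokenized sentences, keeping the shared tokens and collapsing everything else into a wildcard. The function name `align_pattern`, the scoring values (+1 match, -1 mismatch or gap), the `PROTEIN` placeholder token, and the `*` wildcard are all assumptions made for illustration, not the authors' method.

```python
def align_pattern(a, b, gap="*"):
    """Globally align two token lists and return a shared pattern.

    Tokens that match in the alignment are kept; mismatches and gaps
    become `gap`, and runs of consecutive gaps are collapsed into one.
    """
    n, m = len(a), len(b)
    # score[i][j] = best alignment score of a[:i] against b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = -i
    for j in range(1, m + 1):
        score[0][j] = -j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = 1 if a[i - 1] == b[j - 1] else -1
            score[i][j] = max(
                score[i - 1][j - 1] + match,  # align a[i-1] with b[j-1]
                score[i - 1][j] - 1,          # skip a token of a
                score[i][j - 1] - 1,          # skip a token of b
            )

    # Trace back from the bottom-right corner to recover the pattern.
    pattern = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            match = 1 if a[i - 1] == b[j - 1] else -1
            if score[i][j] == score[i - 1][j - 1] + match:
                pattern.append(a[i - 1] if match == 1 else gap)
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] - 1:
            i -= 1
        else:
            j -= 1
        pattern.append(gap)
    pattern.reverse()

    # Collapse consecutive wildcards.
    collapsed = []
    for tok in pattern:
        if tok == gap and collapsed and collapsed[-1] == gap:
            continue
        collapsed.append(tok)
    return collapsed


s1 = "PROTEIN interacts with PROTEIN".split()
s2 = "PROTEIN strongly interacts with PROTEIN".split()
print(align_pattern(s1, s2))
# ['PROTEIN', '*', 'interacts', 'with', 'PROTEIN']
```

A pattern produced this way could then be matched against new sentences whose protein names have been replaced by the placeholder using a name dictionary; that matching step is not shown here.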
Mining Medline: Abstracts, Sentences, Or Phrases?
, 2002
Cited by 38 (1 self)
[Table residue: columns Sentence pair, Sentence, Phrase; ranges w > 0.511, 0.339 < w < 0.510, w < 0.338] 5 Discussion and Conclusion. In view of the results reported here, it is not surprising that researchers have reported interesting results for text mining in MEDLINE based on abstracts, sentences, and phrases. Tables 2 and 3 and the statistical significance summary in the preceding section indicate that each of these units has advantages and disadvantages compared to the others.
A Shallow Parser Based on Closed-Class Words to Capture Relations in Biomedical Text
, 2003
Cited by 27 (4 self)
Natural language processing for biomedical text currently focuses mostly on entity and relation extraction. These entities and relations are usually pre-specified entities, e.g., proteins, and pre-specified relations, e.g., inhibit relations. A shallow parser that captures the relations between noun phrases automatically from free text has been developed and evaluated. It uses heuristics and a noun phraser to capture entities of interest in the text. Cascaded finite state automata structure the relations between individual entities. The automata are based on closed-class English words and model generic relations not limited to specific words. The parser also recognizes coordinating conjunctions and captures negation in text, a feature usually ignored by others. Three cancer researchers evaluated 330 relations extracted from 26 abstracts of interest to them. There were 296 relations correctly extracted from the abstracts resulting in 90% precision of the relations and an average of 11 correct relations per abstract.
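The reported figures can be checked with simple arithmetic. The snippet below only recomputes precision and the per-abstract average from the counts given in the abstract; the variable names are illustrative.

```python
extracted = 330   # relations evaluated by the three researchers
correct = 296     # relations judged correctly extracted
abstracts = 26    # abstracts the relations came from

precision = correct / extracted
per_abstract = correct / abstracts

print(f"precision: {precision:.1%}")                        # precision: 89.7%
print(f"correct relations per abstract: {per_abstract:.1f}")  # correct relations per abstract: 11.4
```

Both values round to the figures stated in the abstract (90% and 11).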
Learning Language in Logic - Genic Interaction Extraction Challenge
- Proceedings of the Learning Language in Logic 2005 Workshop at the International Conference on Machine Learning
, 2005
Cited by 25 (0 self)
We describe here the context of the LLL challenge of Genic Interaction extraction, the background of its organization and the data sets. We then discuss the results of the participating systems.
Predicting The Sub-Cellular Location Of Proteins From Text Using Support Vector Machines
, 2002
Cited by 24 (0 self)
this paper is to treat the protein as a vector of terms from relevant Medline documents. This approach derives from the vector-based model common in information retrieval [8]. The term weights of a vector are a function of their frequencies within the document collection as a whole and the frequency within the relevant documents. Given a set of protein term-vectors, the task is to find some function that partitions the space according to the localisation of the protein. For this task we employ support vector machines (SVMs) [9]. Support vector machines are a mathematical method for performing simultaneous dimension reduction and binary classification [9]. SVMs have been applied to the problems of pattern recognition [10], regression estimation [10] and information retrieval [?]. Because SVMs cope well with high dimensionality and are very fast to train, they are particularly suited to problems in text data-mining/information retrieval. Kwok studied the use of SVMs in text categorization of Reuters newswire documents [2]. In this paper, we apply an analogous approach to Medline/SWISS-PROT documents
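The excerpt does not give the exact weighting formula, so the sketch below assumes a plain TF-IDF scheme as one common way to combine within-document frequency with collection-wide frequency. The function name `tfidf_vectors`, the whitespace tokenization, and the toy documents are hypothetical. The SVM training step is left out; the resulting sparse vectors would be the input to whichever classifier is used.

```python
import math
from collections import Counter


def tfidf_vectors(docs_by_protein):
    """Map each protein to a sparse {term: weight} vector.

    Assumed weighting: (term frequency within the protein's text)
    times log(N / number of proteins whose text contains the term).
    """
    n = len(docs_by_protein)
    counts = {
        protein: Counter(text.lower().split())
        for protein, text in docs_by_protein.items()
    }
    # Document frequency: in how many proteins' texts each term occurs.
    df = Counter()
    for term_counts in counts.values():
        df.update(term_counts.keys())

    vectors = {}
    for protein, term_counts in counts.items():
        total = sum(term_counts.values())
        vectors[protein] = {
            term: (freq / total) * math.log(n / df[term])
            for term, freq in term_counts.items()
        }
    return vectors


# Toy input standing in for text gathered from relevant Medline documents.
docs = {
    "P1": "nucleus nucleus membrane",
    "P2": "membrane cytoplasm",
}
vectors = tfidf_vectors(docs)
print(vectors["P1"]["membrane"])  # 0.0, since the term occurs for every protein
```

Terms shared by every protein get zero weight under this scheme, which leaves the location-specific terms to carry the signal for the classifier.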
GAPSCORE: finding gene and protein names one word at a time
, 2004
Cited by 21 (0 self)
Motivation: New high-throughput technologies have accelerated the accumulation of knowledge about genes and proteins. However, much knowledge is still stored as written natural language text. Therefore, we have developed a new method, GAPSCORE, to identify gene and protein names in text. GAPSCORE scores words based on a statistical model of gene names that quantifies their appearance, morphology and context.
Extracting biochemical interactions from MEDLINE using a link grammar parser
- In Proceedings of 15th IEEE International Conference on Tools with Artificial Intelligence Edited by: Werner, B. IEEE Computer Society
Cited by 21 (2 self)
Many natural language processing approaches at various complexity levels have been reported for extracting biochemical interactions from MEDLINE. While some algorithms using simple template matching are unable to deal with the complex syntactic structures, others exploiting sophisticated parsing techniques are hindered by greater computational cost. This study investigates link grammar parsing for extracting biochemical interactions. Link grammar parsing can handle many syntactic structures and is computationally relatively efficient. We experimented on a sample MEDLINE corpus. Although the parser was originally developed for conversational English and made many mistakes in parsing sentences from the biochemical domain, it nevertheless achieved better overall performance than a co-occurrence-only method. Customizing the parser for the biomedical domain is expected to improve its performance further.
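For context, a co-occurrence-only baseline of the kind the abstract compares against can be sketched in a few lines. This is an assumed, simplified version: the function name `cooccurrence_pairs`, the crude tokenization, and the example sentences and entity names are invented for illustration and do not come from the paper.

```python
from itertools import combinations


def cooccurrence_pairs(sentences, names):
    """Return every pair of known names that appear in the same sentence."""
    pairs = set()
    for sentence in sentences:
        # Crude tokenization: lowercase, strip periods and commas.
        tokens = set(
            sentence.lower().replace(".", " ").replace(",", " ").split()
        )
        present = sorted(name for name in names if name.lower() in tokens)
        pairs.update(combinations(present, 2))
    return pairs


names = {"ATP", "kinaseA", "glucose"}
sentences = [
    "ATP activates kinaseA.",
    "KinaseA does not bind glucose, but ATP is present.",
]
print(sorted(cooccurrence_pairs(sentences, names)))
# [('ATP', 'glucose'), ('ATP', 'kinaseA'), ('glucose', 'kinaseA')]
```

The second sentence shows the weakness of the baseline: it reports the glucose and kinaseA pair even though the sentence negates the interaction, which is the kind of error a syntactic parser is meant to avoid.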
Discovering patterns to extract protein–protein interactions from the literature: Part II
- Bioinformatics
, 2005
"... doi:10.1093/bioinformatics/bti493 ..."

