Results 1 - 10
of
16
A Graph Kernel for Protein-Protein Interaction Extraction
"... In this paper, we propose a graph kernel based approach for the automated extraction of protein-protein interactions (PPI) from scientific literature. In contrast to earlier approaches to PPI extraction, the introduced alldependency-paths kernel has the capability to consider full, general dependenc ..."
Abstract
-
Cited by 30 (5 self)
- Add to MetaCart
In this paper, we propose a graph kernel based approach for the automated extraction of protein-protein interactions (PPI) from scientific literature. In contrast to earlier approaches to PPI extraction, the introduced alldependency-paths kernel has the capability to consider full, general dependency graphs. We evaluate the proposed method across five publicly available PPI corpora providing the most comprehensive evaluation done for a machine learning based PPI-extraction system. Our method is shown to achieve state-of-theart performance with respect to comparable evaluations, achieving 56.4 F-score and 84.8 AUC on the AImed corpus. Further, we identify several pitfalls that can make evaluations of PPI-extraction systems incomparable, or even invalid. These include incorrect crossvalidation strategies and problems related to comparing F-score results achieved on different evaluation resources. 1
Event Extraction from Trimmed Dependency Graphs
"... We describe the approach to event extraction which the JULIELab Team from FSU Jena (Germany) pursued to solve Task 1 in the “BioNLP’09 Shared Task on Event Extraction”. We incorporate manually curated dictionaries and machine learning methodologies to sort out associated event triggers and arguments ..."
Abstract
-
Cited by 11 (3 self)
- Add to MetaCart
We describe the approach to event extraction which the JULIELab Team from FSU Jena (Germany) pursued to solve Task 1 in the “BioNLP’09 Shared Task on Event Extraction”. We incorporate manually curated dictionaries and machine learning methodologies to sort out associated event triggers and arguments on trimmed dependency graph structures. Trimming combines pruning irrelevant lexical material from a dependency graph and decorating particularly relevant lexical conceptual class information. Given that methodological framework, the JULIELab Team scored on 2nd rank among 24 competing teams, with 45.8 % precision, 47.5 % recall and 46.7 % F1-score on all 3,182 events. 1
Shift-reduce dependency DAG parsing
- In Proc. of COLING
, 2008
"... Abstract � Most data-driven dependency parsing approaches assume that sentence structure is represented as trees. Although trees have several desirable properties from both computational and linguistic perspectives, the structure of linguistic phenomena that goes beyond shallow syntax often cannot b ..."
Abstract
-
Cited by 8 (2 self)
- Add to MetaCart
Abstract � Most data-driven dependency parsing approaches assume that sentence structure is represented as trees. Although trees have several desirable properties from both computational and linguistic perspectives, the structure of linguistic phenomena that goes beyond shallow syntax often cannot be fully captured by tree representations. We present a parsing approach that is nearly as simple as current data-driven transition-based dependency parsing frameworks, but outputs directed acyclic graphs (DAGs). We demonstrate the benefits of DAG parsing in two experiments where its advantages over dependency tree parsing can be clearly observed: predicate-argument analysis of English and syntactic analysis of Danish with a representation that includes long-distance dependencies and anaphoric reference links. 1
Task-oriented Evaluation of Syntactic Parsers and Their Representations
- PROCEEDINGS OF THE 46TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES
, 2008
"... This paper presents a comparative evaluation of several state-of-the-art English parsers based on different frameworks. Our approach is to measure the impact of each parser when it is used as a component of an information extraction system that performs protein-protein interaction (PPI) identificati ..."
Abstract
-
Cited by 7 (2 self)
- Add to MetaCart
This paper presents a comparative evaluation of several state-of-the-art English parsers based on different frameworks. Our approach is to measure the impact of each parser when it is used as a component of an information extraction system that performs protein-protein interaction (PPI) identification in biomedical papers. We evaluate eight parsers (based on dependency parsing, phrase structure parsing, or deep parsing) using five different parse representations. We run a PPI system with several combinations of parser and parse representation, and examine their impact on PPI identification accuracy. Our experiments show that the levels of accuracy obtained with these different parsers are similar, but that accuracy improvements vary when the parsers are retrained with domain-specific data.
A rich feature vector for protein-protein interaction extraction from multiple corpora
- In: EMNLP
, 2009
"... Because of the importance of proteinprotein interaction (PPI) extraction from text, many corpora have been proposed with slightly differing definitions of proteins and PPI. Since no single corpus is large enough to saturate a machine learning system, it is necessary to learn from multiple different ..."
Abstract
-
Cited by 4 (0 self)
- Add to MetaCart
Because of the importance of proteinprotein interaction (PPI) extraction from text, many corpora have been proposed with slightly differing definitions of proteins and PPI. Since no single corpus is large enough to saturate a machine learning system, it is necessary to learn from multiple different corpora. In this paper, we propose a solution to this challenge. We designed a rich feature vector, and we applied a support vector machine modified for corpus weighting (SVM-CW) to complete the task of multiple corpora PPI extraction. The rich feature vector, made from multiple useful kernels, is used to express the important information for PPI extraction, and the system with our feature vector was shown to be both faster and more accurate than the original kernelbased system, even when using just a single corpus. SVM-CW learns from one corpus, while using other corpora for support. SVM-CW is simple, but it is more effective than other methods that have been successfully applied to other NLP tasks earlier. With the feature vector and SVM-CW, our system achieved the best performance among all state-of-the-art PPI extraction systems reported so far. 1
A Study on Dependency Tree Kernels for Automatic Extraction of Protein-Protein Interaction
"... Kernel methods are considered the most effective techniques for various relation extraction (RE) tasks as they provide higher accuracy than other approaches. In this paper, we introduce new dependency tree (DT) kernels for RE by improving on previously proposed dependency tree structures. These are ..."
Abstract
-
Cited by 4 (3 self)
- Add to MetaCart
Kernel methods are considered the most effective techniques for various relation extraction (RE) tasks as they provide higher accuracy than other approaches. In this paper, we introduce new dependency tree (DT) kernels for RE by improving on previously proposed dependency tree structures. These are further enhanced to design more effective approaches that we call mildly extended dependency tree (MEDT) kernels. The empirical results on the protein-protein interaction (PPI) extraction task on the AIMed corpus show that tree kernels based on our proposed DT structures achieve higher accuracy than previously proposed DT and phrase structure tree (PST) kernels. 1
Evaluating the Effects of Treebank Size in a Practical Application for Parsing
"... Natural language processing modules such as part-of-speech taggers, named-entity recognizers and syntactic parsers are commonly evaluated in isolation, under the assumption that artificial evaluation metrics for individual parts are predictive of practical performance of more complex language techno ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
Natural language processing modules such as part-of-speech taggers, named-entity recognizers and syntactic parsers are commonly evaluated in isolation, under the assumption that artificial evaluation metrics for individual parts are predictive of practical performance of more complex language technology systems that perform practical tasks. Although this is an important issue in the design and engineering of systems that use natural language input, it is often unclear how the accuracy of an end-user application is affected by parameters that affect individual NLP modules. We explore this issue in the context of a specific task by examining the relationship between the accuracy of a syntactic parser and the overall performance of an information extraction system for biomedical text that includes the parser as one of its components. We present an empirical investigation of the relationship between factors that affect the accuracy of syntactic analysis, and how the difference in parse accuracy affects the overall system. 1
Linguistic feature analysis for protein interaction extraction
, 2009
"... This is an Open Access article distributed under the terms of the Creative Commons Attribution License ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
This is an Open Access article distributed under the terms of the Creative Commons Attribution License

