and Media Informatics,
BibTeX
@MISC{Solt_andmedia,
author = {Illés Solt and Roman Klinger and Ulf Leser},
title = {and Media Informatics,},
year = {}
}
Abstract
Most relation extraction methods, especially in the domain of biology, rely on machine learning to classify a co-occurring pair of entities in a sentence as related or not. Such an approach requires a training corpus, which involves expert annotation and is tedious, time-consuming, and expensive. We overcome this problem by using existing knowledge in structured databases to automatically generate a training corpus for protein-protein interactions. An extensive evaluation of different instance selection strategies is performed to maximize robustness on this presumably noisy resource. Successful strategies that consistently improve performance include a majority voting ensemble of classifiers trained on subsets of the training corpus and the use of knowledge bases consisting of proven non-interactions. Our best configured model, built without manually annotated data, shows very competitive results on several publicly available benchmark corpora.
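
The two ideas named in the abstract, labeling co-occurring protein pairs from knowledge bases and combining classifiers by majority vote, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the input layout (a sentence paired with the protein mentions found in it), the choice to skip pairs found in neither knowledge base, and the tie-breaking rule are all assumptions made for illustration.

```python
from collections import Counter


def label_pairs(sentences, known_interactions, known_non_interactions):
    """Distantly label co-occurring protein pairs using two knowledge bases.

    sentences: iterable of (text, protein_mentions) tuples (assumed layout).
    known_interactions / known_non_interactions: sets of frozenset pairs.
    Returns a list of (text, protein_a, protein_b, label) tuples.
    """
    labeled = []
    for text, proteins in sentences:
        unique = sorted(set(proteins))
        for i, a in enumerate(unique):
            for b in unique[i + 1:]:
                pair = frozenset((a, b))
                if pair in known_interactions:
                    labeled.append((text, a, b, True))
                elif pair in known_non_interactions:
                    labeled.append((text, a, b, False))
                # Assumption: pairs found in neither knowledge base are skipped.
    return labeled


def majority_vote(predictions):
    """Combine boolean predictions from several classifiers.

    Assumption: ties resolve to False (no interaction).
    """
    counts = Counter(predictions)
    return counts[True] > counts[False]
```

In this sketch, each classifier in the ensemble would be trained on a different subset of the output of `label_pairs`, and `majority_vote` would be applied to their predictions for a given candidate pair.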







