Results 1 - 10 of 31
Extracting clinical relationships from patient narratives
BioNLP 2008: Current Trends in Biomedical Natural Language Processing, 2008
Cited by 12 (0 self)
The Clinical E-Science Framework (CLEF) project has built a system to extract clinically significant information from the textual component of medical records, for clinical research, evidence-based healthcare and genotype-meets-phenotype informatics. One part of this system is the identification of relationships between clinically important entities in the text. Typical approaches to relationship extraction in this domain have used full parses, domain-specific grammars and large knowledge bases encoding domain knowledge. In other areas of biomedical NLP, statistical machine learning approaches are now routinely applied to relationship extraction. We report on the novel application of these statistical techniques to clinical relationships. We describe a supervised machine learning system, trained on a corpus of oncology narratives hand-annotated with clinically important relationships. Various shallow features are extracted from these texts and used to train statistical classifiers. We compare the suitability of these features for clinical relationship extraction, examine how extraction varies between inter- and intra-sentential relationships, and assess the amount of training data needed to learn various relationships.
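To make the kind of pipeline described above concrete, the following is a minimal sketch of shallow-feature relation classification with an SVM, assuming scikit-learn. The feature names, entity types, relation labels and toy examples are illustrative assumptions, not the CLEF system's actual features or data.

```python
# Minimal sketch of shallow-feature relation classification with an SVM.
# Feature names and toy examples are illustrative, not the CLEF system's.
from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

def shallow_features(pair):
    """Simple surface features for a candidate entity pair."""
    e1, e2, tokens_between = pair
    return {
        "e1_type": e1["type"],
        "e2_type": e2["type"],
        "token_distance": len(tokens_between),
        "words_between": " ".join(tokens_between).lower(),
        "same_sentence": True,  # intra- vs inter-sentential flag
    }

# Toy training pairs: ((entity1, entity2, tokens between), relation label)
train = [
    (({"type": "Drug"}, {"type": "Condition"}, ["prescribed", "for"]), "has_indication"),
    (({"type": "Investigation"}, {"type": "Result"}, ["showed"]), "has_finding"),
    (({"type": "Drug"}, {"type": "Investigation"}, ["was", "given", "despite"]), "none"),
]

X = [shallow_features(p) for p, _ in train]
y = [label for _, label in train]

clf = Pipeline([("vec", DictVectorizer()), ("svm", LinearSVC())])
clf.fit(X, y)

test_pair = ({"type": "Drug"}, {"type": "Condition"}, ["given", "for"])
print(clf.predict([shallow_features(test_pair)]))
```

In the setting the abstract describes, the same machinery would be run separately over intra-sentential and inter-sentential candidate pairs, with the amount of annotated training data varied to study how much is needed per relationship type.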
Automatic extraction of hierarchical relations from text
2006
Cited by 11 (0 self)
Abstract. Automatic extraction of semantic relationships between entity instances in an ontology is useful for attaching richer semantic metadata to documents. In this paper we propose an SVM-based approach to hierarchical relation extraction, using features derived automatically from a number of GATE-based open-source language processing tools. In comparison to previous work, we use several new features, including the part-of-speech tag, entity subtype, entity class, entity role, a semantic representation of the sentence, and WordNet synonym sets. The impact of these features on performance is investigated, as is the impact of the relation classification hierarchy. The results show a trade-off among these factors for relation extraction, and that features carrying more information, such as the semantic ones, can improve performance on the ontological relation extraction task.
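As an illustration of how such heterogeneous features might be assembled for a classifier, here is a small sketch that uses NLTK's WordNet as a stand-in for the paper's resources. The feature names and the two example entities are assumptions made for illustration only.

```python
# Illustrative sketch of the kind of feature vector such an SVM approach might use;
# feature names are assumptions, and NLTK's WordNet stands in for the paper's resources.
import nltk
from nltk.corpus import wordnet as wn

nltk.download("wordnet", quiet=True)

def entity_pair_features(e1, e2, sentence_pos_tags):
    feats = {
        "e1_pos": e1["pos"], "e2_pos": e2["pos"],              # part-of-speech tags
        "e1_subtype": e1["subtype"], "e2_subtype": e2["subtype"],
        "e1_class": e1["cls"], "e2_class": e2["cls"],
        "e1_role": e1["role"], "e2_role": e2["role"],
        "pos_path": "-".join(sentence_pos_tags),               # crude sentence representation
    }
    # WordNet synonym-set feature: first synset of each head word, if any
    for name, ent in (("e1", e1), ("e2", e2)):
        synsets = wn.synsets(ent["head"])
        if synsets:
            feats[f"{name}_synset"] = synsets[0].name()
    return feats

e1 = {"pos": "NNP", "subtype": "Nation", "cls": "GPE", "role": "located_in", "head": "country"}
e2 = {"pos": "NNP", "subtype": "Individual", "cls": "Person", "role": "resident", "head": "citizen"}
print(entity_pair_features(e1, e2, ["DT", "NNP", "VBZ", "IN", "NNP"]))
```

The resulting dictionaries can then be one-hot encoded and fed to an SVM, with one classifier per level of the relation hierarchy if a hierarchical classification scheme is used.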
An Experimental Study on Boundary Classification Algorithms for Information Extraction using SVM
Proceedings of the EACL Workshop on Adaptive Text Extraction and Mining (ATEM), 2006
Cited by 7 (4 self)
This paper investigates the incorporation of diverse features into boundary classification algorithms for IE using SVM. Our study reveals that the use of rich data resources contributes greatly to the performance of IE systems, and is more likely to explain the differences in performance reported across systems than design decisions related to the learning model. Evaluation of our system shows an improvement over the state of the art on a standard dataset, using the same data resources but a much simpler learning model than the previously best-reported system.
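A minimal sketch of the boundary classification idea, assuming scikit-learn: each token is classified as the start of a target field, its end, or neither, from simple contextual features. The toy seminar-announcement example and the feature set are illustrative, not the paper's configuration.

```python
# Minimal sketch of SVM boundary classification for IE: each token is labelled as the
# start of a target field, the end of one, or neither. Features are illustrative only.
from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

def token_features(tokens, i):
    return {
        "word": tokens[i].lower(),
        "is_title": tokens[i].istitle(),
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }

# Toy example: extracting a speaker name from a seminar announcement
sent = "Speaker : Dr. Jane Smith will present at 3pm".split()
labels = ["O", "O", "START", "O", "END", "O", "O", "O", "O"]

X = [token_features(sent, i) for i in range(len(sent))]
clf = Pipeline([("vec", DictVectorizer()), ("svm", LinearSVC())]).fit(X, labels)
print(list(zip(sent, clf.predict(X))))
```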
Large-scale, parallel automatic patent annotation
In: Proceedings of the 1st ACM Workshop on Patent Information Retrieval (PaIR ’08), 30, 2008
Extracting Opinions and Facts for Business Intelligence
Cited by 6 (1 self)
Abstract. Finding information about companies from multiple sources on the Web has become increasingly important for business analysts. In particular, since the emergence of Web 2.0, opinions about companies and their services or products need to be found and distilled in order to create an accurate picture of a business entity. Without appropriate text mining tools, company analysts would have to read hundreds of textual reports, newspaper articles and forum postings and manually dig out factual as well as subjective information. This paper describes a series of experiments to assess the value of a number of lexical, morpho-syntactic and sentiment-based features, derived from linguistic processing and from an existing lexical database, for the classification of evaluative texts. The paper describes experiments carried out with two different Web sources: one contains positive and negative opinions, while the other contains fine-grained classifications on a 5-point qualitative scale. The results obtained are positive and in line with current research in the area. Our aim is to use the classification results in a practical application that combines factual and opinionated information in order to build the reputation of a business entity.
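The following hedged sketch shows one way lexical and sentiment-lexicon features can be combined for evaluative-text classification, assuming scikit-learn. The tiny lexicon, review snippets and binary labels are invented; a 5-point scale would simply use five labels instead of two.

```python
# Sketch of evaluative-text classification combining lexical (bag-of-words) and
# sentiment-lexicon features; the lexicon and example snippets are made up.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.svm import LinearSVC
import numpy as np

POSITIVE = {"excellent", "reliable", "great"}
NEGATIVE = {"poor", "overpriced", "terrible"}

def lexicon_counts(texts):
    rows = []
    for t in texts:
        words = t.lower().split()
        rows.append([sum(w in POSITIVE for w in words),
                     sum(w in NEGATIVE for w in words)])
    return np.array(rows)

features = FeatureUnion([
    ("bow", CountVectorizer(ngram_range=(1, 2))),
    ("lexicon", FunctionTransformer(lexicon_counts)),
])

texts = ["excellent and reliable customer service",
         "overpriced products and terrible support",
         "great value , would recommend",
         "poor quality , avoid this company"]
labels = ["positive", "negative", "positive", "negative"]

clf = Pipeline([("features", features), ("svm", LinearSVC())]).fit(texts, labels)
print(clf.predict(["reliable but overpriced"]))
```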
OrganismTagger: detection, normalization and grounding of organism entities in biomedical documents
Bioinformatics, 2011
Cited by 5 (0 self)
Motivation: Semantic tagging of organism mentions in full-text articles is an important part of literature mining and semantic enrichment solutions. Tagged organism mentions also play a pivotal role in disambiguating other entities in a text, such as proteins. A high-precision organism tagging system must be able to detect the numerous forms of organism mentions, including common names as well as the traditional taxonomic groups: genus, species, and strains. In addition, such a system must resolve abbreviations and acronyms, assign the scientific name, and if possible link the detected mention to the NCBI Taxonomy database for further semantic queries and literature navigation. Results: We present the OrganismTagger, a hybrid rule-based/machine learning system to extract organism mentions from the literature. It includes tools for automatically generating lexical and ontological resources from a copy of the NCBI Taxonomy database, thereby facilitating system updates by end-users. Its novel ontology-based resources can also be reused in other semantic mining and linked data tasks. Each detected organism mention is normalized to a canonical name through the resolution of acronyms and abbreviations and subsequently grounded with an NCBI Taxonomy database ID. In particular, our system combines a novel machine-learning approach with rule-based and lexical methods for detecting strain mentions in documents. On our manually annotated OT corpus, the OrganismTagger achieves a precision of 95%, a recall of 94% and a grounding accuracy of 97.5%. On the manually annotated corpus of Linnaeus-100, the results show a precision of 99%, recall of 97% and grounding accuracy of 97.4%. Availability: The OrganismTagger, including supporting tools, resources, training data and manual annotations, as well as end-user and developer documentation, is freely available under an open source license at
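A much-simplified sketch of the detect, normalize and ground steps described above. The tiny lexicon stands in for the resources the system generates from the NCBI Taxonomy database; the taxonomy IDs shown are real NCBI taxids, but the matching logic here is far cruder than the actual hybrid rule-based/machine-learning system.

```python
# Simplified sketch of a detect -> normalize -> ground pipeline for organism mentions.
# The small lexicon stands in for resources derived from the NCBI Taxonomy database.
import re

# canonical scientific name -> NCBI Taxonomy ID
TAXONOMY = {"Escherichia coli": 562, "Homo sapiens": 9606, "Mus musculus": 10090}
# surface form (common name, abbreviation) -> canonical scientific name
SYNONYMS = {"E. coli": "Escherichia coli", "human": "Homo sapiens",
            "mouse": "Mus musculus", "Escherichia coli": "Escherichia coli"}

pattern = re.compile("|".join(re.escape(s) for s in sorted(SYNONYMS, key=len, reverse=True)))

def tag_organisms(text):
    mentions = []
    for m in pattern.finditer(text):
        canonical = SYNONYMS[m.group(0)]
        mentions.append({"mention": m.group(0), "span": m.span(),
                         "canonical": canonical, "ncbi_taxid": TAXONOMY[canonical]})
    return mentions

text = "The gene was expressed in E. coli and later studied in mouse models."
for m in tag_organisms(text):
    print(m)
```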
Experiments of Opinion Analysis on the Corpora MPQA and NTCIR-6
Cited by 4 (0 self)
This paper describes the algorithms and linguistic features used in our participating system for the opinion analysis pilot task at NTCIR-6. It presents and discusses the results of our system on the opinion analysis task, and presents our experiments in opinion analysis on the two corpora, MPQA and NTCIR-6, using our learning-based system. Our system is based on SVM learning. It achieved state-of-the-art results on the MPQA corpus for the two problems of opinionated sentence recognition and opinion holder extraction. The results using the NTCIR-6 English corpus for both training and testing are among the first reported. Our results on the opinionated sentence recognition sub-task of NTCIR-6 were encouraging. The results on the English evaluation of the NTCIR-6 opinion analysis task were obtained from models learned from the MPQA corpus. The lower results on the NTCIR-6 opinion holder extraction sub-task, in comparison with those obtained when using each corpus for both training and testing, may indicate substantial differences between the MPQA corpus and the NTCIR-6 English corpus.
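A small sketch of the cross-corpus setup discussed above, assuming scikit-learn: an SVM-style classifier is trained on one opinion corpus and evaluated on another to see how well the model transfers. The handful of example sentences stand in for the MPQA and NTCIR-6 data, which are not reproduced here.

```python
# Sketch of cross-corpus opinion classification: train on corpus A, test on corpus B.
# Example sentences are invented stand-ins for the MPQA and NTCIR-6 corpora.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC
from sklearn.metrics import f1_score

corpus_a = [("this policy is a disgraceful failure", "opinionated"),
            ("officials announced the schedule on Monday", "factual"),
            ("critics praised the bold reform", "opinionated"),
            ("the report was released last year", "factual")]
corpus_b = [("many analysts condemned the decision", "opinionated"),
            ("the meeting took place in Geneva", "factual")]

clf = Pipeline([("tfidf", TfidfVectorizer()), ("svm", LinearSVC())])
clf.fit([t for t, _ in corpus_a], [l for _, l in corpus_a])

pred = clf.predict([t for t, _ in corpus_b])
gold = [l for _, l in corpus_b]
print(f1_score(gold, pred, pos_label="opinionated"))  # a drop here suggests corpus differences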
Evaluating Evaluation Metrics for Ontology-Based Applications: Infinite Reflection
Cited by 4 (0 self)
In this paper, we discuss methods of measuring the performance of ontology-based information extraction systems. We focus particularly on the Balanced Distance Metric (BDM), a new metric we have proposed which aims to take into account the more flexible nature of ontology-based applications. We first examine why traditional Precision and Recall metrics, as used for flat information extraction tasks, are inadequate when dealing with ontologies. We then describe the Balanced Distance Metric, which takes ontological similarity into account. Finally, we discuss a range of experiments designed to test the accuracy and usefulness of the BDM when compared with traditional metrics and with a standard distance-based metric.
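As a toy illustration of the general idea (this is not the BDM formula itself), the sketch below gives partial credit to responses whose class is ontologically close to the gold class and folds those credits into an augmented precision. The small class hierarchy and the 1/(1+distance) credit function are assumptions made for illustration.

```python
# Toy illustration of ontology-aware scoring (not the exact BDM formula): a response
# that picks an ontologically close class earns partial credit, which is folded into
# an augmented precision. The class hierarchy is invented.
PARENTS = {"City": "Location", "Country": "Location", "Location": "Entity",
           "Person": "Entity", "Entity": None}

def path_to_root(c):
    path = []
    while c is not None:
        path.append(c)
        c = PARENTS[c]
    return path

def credit(key, response):
    """1.0 for an exact match, decreasing with distance through the common ancestor."""
    kp, rp = path_to_root(key), path_to_root(response)
    common = next((c for c in kp if c in rp), None)
    if common is None:
        return 0.0
    dist = kp.index(common) + rp.index(common)
    return 1.0 / (1.0 + dist)

# (gold class, system class) for each annotation the system produced
pairs = [("City", "City"), ("City", "Location"), ("Country", "Person")]
credits = [credit(k, r) for k, r in pairs]
augmented_precision = sum(credits) / len(pairs)
print(credits, augmented_precision)
```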
Large Scale Semantic Annotation, Indexing, and Search at The National Archives
Cited by 4 (1 self)
This paper describes a tool developed to improve access to the enormous volume of data housed at the UK’s National Archives, both for the general public and for specialist researchers. The system we have developed, TNA-Search, enables a multi-paradigm search over the entire electronic archive (42TB of data in various formats). The search functionality allows queries that arbitrarily mix any combination of full-text, structural, linguistic and semantic constraints. The archive is annotated and indexed with respect to a massive semantic knowledge base containing data from the LOD cloud, data.gov.uk, related TNA projects, and a large geographical database. The semantic annotation component achieves approximately 83% F-measure, which is very reasonable considering the wide range of entities and document types and the open domain. The technologies are being adopted by real users at The National Archives and will form the core of their suite of search tools, with additional in-house interfaces.
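A much-simplified sketch of the multi-paradigm idea: a query that mixes a full-text condition with semantic conditions answered from a knowledge base. The documents, annotations and KB facts below are invented, and the real system works over a far larger semantic repository with a proper index.

```python
# Sketch of mixing a full-text condition with semantic restrictions drawn from a
# knowledge base of annotated entities. All data below is invented for illustration.
DOCS = {
    "doc1": "The regiment was stationed near Portsmouth during the war.",
    "doc2": "Budget figures were reported by the Treasury in London.",
}
# semantic annotations: document -> [(mention, KB entity)]
ANNOTATIONS = {
    "doc1": [("Portsmouth", "kb:Portsmouth")],
    "doc2": [("London", "kb:London"), ("Treasury", "kb:HM_Treasury")],
}
# knowledge-base facts about the annotated entities
KB = {
    "kb:Portsmouth": {"type": "Place", "region": "South East England"},
    "kb:London": {"type": "Place", "region": "Greater London"},
    "kb:HM_Treasury": {"type": "Organisation"},
}

def search(keyword, entity_type=None, region=None):
    """Full-text keyword match, optionally restricted by semantic properties of the
    entities mentioned in each document."""
    hits = []
    for doc_id, text in DOCS.items():
        if keyword.lower() not in text.lower():
            continue
        entities = [KB[e] for _, e in ANNOTATIONS[doc_id]]
        if entity_type and not any(e["type"] == entity_type for e in entities):
            continue
        if region and not any(e.get("region") == region for e in entities):
            continue
        hits.append(doc_id)
    return hits

print(search("stationed", entity_type="Place", region="South East England"))  # ['doc1']
```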
Towards Semantic Music Information Extraction from the Web Using Rule Patterns and Supervised Learning
Cited by 2 (0 self)
We present first steps towards automatic Music Information Extraction, i.e., methods to automatically extract semantic information and relations about musical entities from arbitrary textual sources. The corresponding approaches allow us to derive structured metadata from unstructured or semi-structured sources and can be used to build advanced recommendation systems and browsing interfaces. In this paper, several approaches to identify and extract two specific semantic relations from related Web documents are presented and evaluated. The addressed relations are the members of a music band (band-members) and artists' discographies (artist-albums, EPs, singles). In addition, the proposed methods are shown to be useful for relating (Web) documents to musical artists. For all purposes, supervised learning approaches and rule-based methods are systematically evaluated on two different sets of Web documents.
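As a hedged sketch of the rule-pattern side of such an approach, the snippet below applies a hand-written pattern over Web text to capture (band, member) pairs. The pattern and sentences are illustrative, not the paper's actual rules, and the supervised learning component is omitted.

```python
# Sketch of a rule-pattern extractor for the band-member relation; the pattern and
# example sentences are illustrative only.
import re

MEMBER_PATTERN = re.compile(
    r"(?P<member>[A-Z][a-z]+(?: [A-Z][a-z]+)+) (?:is|was) the "
    r"(?:lead singer|drummer|guitarist|bassist) of (?P<band>[A-Z][\w ]+)"
)

def extract_band_members(text):
    return [(m.group("band").strip(), m.group("member")) for m in MEMBER_PATTERN.finditer(text)]

page = ("Freddie Mercury was the lead singer of Queen. "
        "John Bonham was the drummer of Led Zeppelin.")
print(extract_band_members(page))
```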