Results 1 - 7 of 7
Cleaning Inconsistencies in Information Extraction via Prioritized Repairs
Cited by 4 (3 self)
The population of a predefined relational schema from textual content, commonly known as Information Extraction (IE), is a pervasive task in contemporary computational challenges associated with Big Data. Since the textual content varies widely in nature and structure (from machine logs to informal natural language), it is notoriously difficult to write IE programs that extract the sought information without any inconsistencies (e.g., a substring should not be annotated as both an address and a person name). Dealing with inconsistencies is hence of crucial importance in IE systems. Industrial-strength IE systems like GATE and IBM SystemT therefore provide a built-in collection of cleaning operations to remove inconsistencies from extracted relations. These operations, however, are collected in an ad-hoc fashion through use cases. Ideally, we would like to allow IE developers to declare their own poli- ...
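The overlap conflict mentioned in the abstract (a substring annotated as both an address and a person name) and a priority-based repair can be sketched as follows; the `Span` type, the labels, and the `resolve_by_priority` policy are illustrative assumptions, not the paper's actual cleaning operators:

```python
# Hypothetical sketch of prioritized repair of overlapping IE annotations.
from dataclasses import dataclass

@dataclass(frozen=True)
class Span:
    begin: int
    end: int
    label: str

def overlaps(a: Span, b: Span) -> bool:
    # Half-open character intervals [begin, end) overlap iff each starts
    # before the other ends.
    return a.begin < b.end and b.begin < a.end

def resolve_by_priority(spans, priority):
    """Keep higher-priority spans (lower number = higher priority) and
    drop any span that overlaps one already kept."""
    kept = []
    for s in sorted(spans, key=lambda s: (priority[s.label], s.begin)):
        if not any(overlaps(s, k) for k in kept):
            kept.append(s)
    return kept

# An address annotation and a person-name annotation claim overlapping text.
spans = [Span(0, 12, "PersonName"), Span(5, 20, "Address")]
clean = resolve_by_priority(spans, {"Address": 0, "PersonName": 1})
# Under this policy the address wins; the overlapping person name is removed.
```

Declaring the `priority` map per label is one simple way a developer could state such a policy; the paper's framework of prioritized repairs is more general.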
A Relational Framework for Information Extraction
Information Extraction commonly refers to the task of populating a relational schema, having predefined underlying semantics, from textual content. This task is pervasive in contemporary computational challenges associated with Big Data. In this article we provide an overview of our work on document spanners, a relational framework for Information Extraction that is inspired by rule-based systems such as IBM's SystemT.
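As a rough illustration of the span-relation view (not the formal spanner model from the article), a regex-based extractor can be seen as populating a relation of spans over a document; the names here are hypothetical:

```python
# Minimal regex-based "spanner" sketch: extract character spans into a
# relation of (begin, end, matched_text) tuples.
import re

def spanner(pattern: str, doc: str):
    """Return the span relation produced by a regex over a document."""
    return [(m.start(), m.end(), m.group()) for m in re.finditer(pattern, doc)]

doc = "Alice met Bob."
rows = spanner(r"[A-Z][a-z]+", doc)
# rows == [(0, 5, 'Alice'), (10, 13, 'Bob')]
```

Because the output is an ordinary relation, such extractors compose naturally with relational operators (joins, projections), which is the core idea the framework builds on.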
Database Principles in Information Extraction, 2014
Information Extraction commonly refers to the task of populating a relational schema, having predefined underlying semantics, from textual content. This task is pervasive in contemporary computational challenges associated with Big Data. This tutorial gives an overview of the algorithmic concepts and techniques used for performing Information Extraction tasks, and describes some of the declarative frameworks that provide abstractions and infrastructure for programming extractors. In addition, the tutorial highlights opportunities for research impact through principles of data management, illustrates these opportunities through recent work, and proposes directions for future research.
Querying and Learning in Probabilistic Databases
Probabilistic Databases (PDBs) lie at the expressive intersection of databases, first-order logic, and probability theory. PDBs employ logical deduction rules to process Select-Project-Join (SPJ) queries, which form the basis for a variety of declarative query languages such as Datalog, Relational Algebra, and SQL. They employ logical consistency constraints to resolve data inconsistencies, and they represent query answers via logical lineage formulas (a.k.a. "data provenance") to trace the dependencies between these answers and the input tuples that led to their derivation. While the literature on PDBs spans more than 25 years of research, only fairly recently was the key role of lineage discovered for establishing a closed and complete representation model of relational operations over this kind of probabilistic data. Although PDBs benefit from their efficient and scalable database infrastructures for data storage and indexing, they couple the data computation with probabilistic inference, and the latter remains a #P-hard problem in the context of PDBs as well. In this chapter, we provide a review of the key concepts of PDBs with a particular focus on our own recent research results related to this field. We highlight a number of ongoing research challenges related to PDBs, and we keep referring to an information extraction (IE) scenario as a running application to manage uncertain and temporal facts obtained from IE techniques directly inside a PDB setting.
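The confidence computation described above can be sketched for a tuple-independent PDB by brute-force enumeration of possible worlds; the exponential cost mirrors the #P-hardness mentioned in the abstract. Function and tuple names are illustrative:

```python
# Toy confidence computation: the marginal probability of a Boolean
# lineage formula over independent base tuples, by enumerating worlds.
from itertools import product

def marginal(lineage, probs):
    """probs: tuple id -> probability; lineage: assignment dict -> bool."""
    ids = list(probs)
    total = 0.0
    for world in product([False, True], repeat=len(ids)):
        assignment = dict(zip(ids, world))
        if lineage(assignment):
            # Weight of this possible world under tuple independence.
            w = 1.0
            for t in ids:
                w *= probs[t] if assignment[t] else 1.0 - probs[t]
            total += w
    return total

# Lineage of a join answer: (t1 AND t2) OR t3.
probs = {"t1": 0.9, "t2": 0.5, "t3": 0.2}
p = marginal(lambda a: (a["t1"] and a["t2"]) or a["t3"], probs)
# p = 1 - (1 - 0.9*0.5) * (1 - 0.2) = 0.56
```

Practical PDB systems avoid this enumeration where possible (e.g., via formula decomposition or approximation), precisely because exact inference is #P-hard in general.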
Analysis and Refinement of Temporal Relation Aggregation
To obtain a complete temporal picture of a relation it is necessary to aggregate fragments of temporal information across relation instances in text. This process is non-trivial even for humans because temporal information can be imprecise and inconsistent, and systems face the additional challenge that each of their classifications is potentially false. Even a small amount of incorrect proposed temporal information about a relation can severely affect the resulting aggregate temporal knowledge. We motivate and evaluate three methods to modify temporal relation information prior to aggregation to address this challenge.
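One hedged sketch of such a pre-aggregation repair (not necessarily one of the paper's three methods) is to drop outlier intervals before intersecting the remaining temporal evidence:

```python
# Illustrative aggregation of noisy per-mention year intervals for one
# relation instance: discard intervals that disagree with most others,
# then intersect what remains. The majority threshold is an assumption.
def aggregate(intervals):
    """intervals: list of (start_year, end_year) tuples."""
    def overlap(a, b):
        return a[0] <= b[1] and b[0] <= a[1]
    # Keep an interval only if it overlaps at least half of the others.
    kept = [iv for i, iv in enumerate(intervals)
            if sum(overlap(iv, other)
                   for j, other in enumerate(intervals) if j != i)
               >= len(intervals) // 2]
    # Intersect the surviving intervals.
    return (max(iv[0] for iv in kept), min(iv[1] for iv in kept))

# Three mentions agree on the late 1990s; one extraction error says 1950.
span = aggregate([(1995, 2000), (1996, 1999), (1994, 2001), (1950, 1950)])
# span == (1996, 1999): the erroneous mention no longer distorts the result
```

Without the outlier filter, a single wrong classification would make the intersection empty, which is exactly the fragility the abstract points out.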
An Infrastructure for Probabilistic Reasoning with Web Ontologies (IOS Press)
We present an infrastructure for probabilistic reasoning with ontologies that is based on our Markov logic engine ROCKIT. Markov logic is a template language that combines first-order logic with log-linear graphical models. We show how to translate OWL-EL as well as RDF Schema to Markov logic and how to use ROCKIT to apply MAP inference on the given set of formulas. The resulting system is an infrastructure for log-linear logics that can be used for probabilistic reasoning with both extended OWL-EL and RDF Schema. We describe our system and illustrate its benefits by presenting two application scenarios: ontology matching and knowledge base verification, with a special focus on temporal reasoning. Our results indicate that our system, which is based on a well-founded probabilistic semantics, is capable of solving relevant problems as well as or better than state-of-the-art systems that have been designed specifically for the respective problems.
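MAP inference over a ground Markov logic network can be illustrated in miniature by exhaustive search for the highest-weight world; ROCKIT itself uses far more scalable techniques, so this is only a toy sketch with hypothetical atoms and weights:

```python
# Toy MAP inference for a ground Markov logic network: brute-force search
# for the truth assignment maximizing the total weight of satisfied formulas.
from itertools import product

def map_state(atoms, weighted_formulas):
    """weighted_formulas: list of (weight, predicate over assignment dict)."""
    best, best_score = None, float("-inf")
    for values in product([False, True], repeat=len(atoms)):
        a = dict(zip(atoms, values))
        score = sum(w for w, f in weighted_formulas if f(a))
        if score > best_score:
            best, best_score = a, score
    return best

# Soft rule (w=1.5): Cat(felix) -> Mammal(felix); evidence (w=10): Cat(felix).
best = map_state(
    ["cat_felix", "mammal_felix"],
    [(1.5, lambda a: (not a["cat_felix"]) or a["mammal_felix"]),
     (10.0, lambda a: a["cat_felix"])],
)
# In the MAP state both atoms are true: the evidence forces Cat(felix),
# and satisfying the soft rule then adds its weight.
```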
Authors' Addresses, 2014
We would like to thank Floris Geerts and Rainer Gemulla for helpful technical discussions. We thank Radu Curticapean for pointing out Bezout's theorem. Learning the parameters of complex probabilistic-relational models from labeled training data is a standard technique in machine learning, which has been intensively studied in the subfield of Statistical Relational Learning (SRL), but, so far, this is still an under-investigated topic in the context of Probabilistic Databases (PDBs). In this paper, we focus on learning the probability values of base tuples in a PDB from query answers, the latter of which are represented as labeled lineage formulas. Specifically, we consider labels in the form of pairs, each consisting of a Boolean lineage formula and a marginal probability that comes attached to the corresponding query answer. The resulting learning problem can be viewed as the inverse problem to confidence computations in PDBs: given a set of labeled query answers, learn the probability values of the base tuples, such that the marginal prob- ...
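The inverse problem sketched in the abstract, fitting base-tuple probabilities so that lineage marginals match their labels, can be illustrated with a naive numeric-gradient fit; all names and the optimization method are illustrative assumptions, not the paper's algorithm:

```python
# Sketch: learn base-tuple probabilities from labeled lineage formulas by
# minimizing squared error between computed and labeled marginals.
from itertools import product

def marginal(lineage, probs):
    """Exact marginal of a Boolean lineage formula over independent tuples."""
    ids = list(probs)
    total = 0.0
    for world in product([False, True], repeat=len(ids)):
        a = dict(zip(ids, world))
        if lineage(a):
            w = 1.0
            for t in ids:
                w *= probs[t] if a[t] else 1.0 - probs[t]
            total += w
    return total

def learn(labels, tuple_ids, steps=2000, lr=0.5, eps=1e-5):
    """labels: list of (lineage_formula, target_marginal) pairs."""
    probs = {t: 0.5 for t in tuple_ids}
    def loss(p):
        return sum((marginal(f, p) - y) ** 2 for f, y in labels)
    for _ in range(steps):
        for t in tuple_ids:
            # Forward-difference gradient for one coordinate.
            bumped = dict(probs)
            bumped[t] = min(1.0, probs[t] + eps)
            g = (loss(bumped) - loss(probs)) / eps
            probs[t] = min(1.0, max(0.0, probs[t] - lr * g))
    return probs

# Labeled answers: lineage t1 AND t2 with marginal 0.18, and t1 alone
# with marginal 0.6; consistent with p(t1) = 0.6, p(t2) = 0.3.
labels = [(lambda a: a["t1"] and a["t2"], 0.18),
          (lambda a: a["t1"], 0.6)]
probs = learn(labels, ["t1", "t2"])
```

Each update direction is exactly a confidence computation away from the labels, which is why the abstract calls this the inverse of confidence computation in PDBs.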