A temporal-probabilistic database model for information extraction

by M. Dylla, I. Miliaraki, M. Theobald
Venue: PVLDB
Results 1 - 7 of 7

Cleaning Inconsistencies in Information Extraction via Prioritized Repairs

by Ronald Fagin, Benny Kimelfeld, Frederick Reiss, Stijn Vansummeren
Abstract (Cited by 4, 3 self)
The population of a predefined relational schema from textual content, commonly known as Information Extraction (IE), is a pervasive task in contemporary computational challenges associated with Big Data. Since the textual content varies widely in nature and structure (from machine logs to informal natural language), it is notoriously difficult to write IE programs that extract the sought information without any inconsistencies (e.g., a substring should not be annotated as both an address and a person name). Dealing with inconsistencies is hence of crucial importance in IE systems. Industrial-strength IE systems like GATE and IBM SystemT therefore provide a built-in collection of cleaning operations to remove inconsistencies from extracted relations. These operations, however, are collected in an ad-hoc fashion through use cases. Ideally, we would like to allow IE developers to declare their own poli-…

Citation Context

...ks include rule engineering [20,35,37], rule learning [11,28], probabilistic graph models [25,27,30], and other statistical models such as Markov Logic Networks [31,33] and probabilistic databases [12]. There are also general frameworks for the development and scalable execution of IE programs, such as UIMA [18], the General Architecture for Text Engineering (GATE) [9], Xlog [36] and SystemT [7]. G...

A Relational Framework for Information Extraction

by Ronald Fagin, Benny Kimelfeld, Frederick Reiss, Stijn Vansummeren
Abstract
Information Extraction commonly refers to the task of populating a relational schema, having predefined underlying semantics, from textual content. This task is pervasive in contemporary computational challenges associated with Big Data. In this article we provide an overview of our work on document spanners—a relational framework for Information Extraction that is inspired by rule-based systems such as IBM’s SystemT.

Database Principles in Information Extraction

by Benny Kimelfeld , 2014
Abstract
Information Extraction commonly refers to the task of populating a relational schema, having predefined underlying semantics, from textual content. This task is pervasive in contemporary computational challenges associated with Big Data. This tutorial gives an overview of the algorithmic concepts and techniques used for performing Information Extraction tasks, and describes some of the declarative frameworks that provide abstractions and infrastructure for programming extractors. In addition, the tutorial highlights opportunities for research impact through principles of data management, illustrates these opportunities through recent work, and proposes directions for future research.

Citation Context

...efined relationship, such as person-organization), event extraction [3] (find events of predefined types along with their key players, such as nomination and nominee), temporal information extraction [21,44] (associate mentions of facts with mentions of their validity period, such as nomination-date), and coreference resolution [53] (match between phrases that refer to the same entity, such as “Obama,” “...

Querying and Learning in Probabilistic Databases

by Maximilian Dylla, Martin Theobald, Iris Miliaraki
Abstract
Probabilistic Databases (PDBs) lie at the expressive intersection of databases, first-order logic, and probability theory. PDBs employ logical deduction rules to process Select-Project-Join (SPJ) queries, which form the basis for a variety of declarative query languages such as Datalog, Relational Algebra, and SQL. They employ logical consistency constraints to resolve data inconsistencies, and they represent query answers via logical lineage formulas (a.k.a. “data provenance”) to trace the dependencies between these answers and the input tuples that led to their derivation. While the literature on PDBs dates back more than 25 years, only fairly recently was the key role of lineage in establishing a closed and complete representation model of relational operations over this kind of probabilistic data discovered. Although PDBs benefit from their efficient and scalable database infrastructures for data storage and indexing, they couple the data computation with probabilistic inference, the latter of which remains a #P-hard problem also in the context of PDBs. In this chapter, we provide a review of the key concepts of PDBs with a particular focus on our own recent research results in this field. We highlight a number of ongoing research challenges related to PDBs, and we keep referring to an information extraction (IE) scenario as a running application to manage uncertain and temporal facts obtained from IE techniques directly inside a PDB setting.
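The coupling of lineage formulas with probabilistic inference that this abstract describes can be illustrated with a minimal brute-force sketch (the tuple identifiers and probabilities below are invented for the example): the marginal confidence of a query answer is the total weight of the possible worlds in which its Boolean lineage formula holds, which is exponential in the number of base tuples when computed naively, hinting at why confidence computation is #P-hard in general.

```python
from itertools import product

def marginal(lineage, probs):
    """Exact marginal probability of a Boolean lineage formula by
    enumerating all possible worlds over independent base tuples.
    `lineage` maps a world (dict: tuple id -> bool) to True/False;
    `probs` maps tuple id -> marginal probability."""
    ids = list(probs)
    total = 0.0
    for bits in product([False, True], repeat=len(ids)):
        world = dict(zip(ids, bits))
        if lineage(world):
            w = 1.0
            for t in ids:
                w *= probs[t] if world[t] else 1.0 - probs[t]
            total += w
    return total

# Lineage of a hypothetical join answer: (t1 AND t2) OR t3
probs = {"t1": 0.9, "t2": 0.5, "t3": 0.2}
p = marginal(lambda w: (w["t1"] and w["t2"]) or w["t3"], probs)
# 0.45 + 0.2 - 0.45*0.2 = 0.56
```

Real PDB engines avoid this enumeration where possible, e.g. via lineage compilation into OBDDs or Shannon expansions, as the later entries in this list discuss.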

Citation Context

...robabilistic databases, i.e., capturing data that is valid during a specific time interval with a given probability. In this context, we present a unified temporal-probabilistic database (TPDB) model [23] in which both time and probability are considered as first-class citizens. Top-k Query Processing. Query evaluation in PDBs involves—apart from the common data computation step, found also in determi...

Analysis and Refinement of Temporal Relation Aggregation

by Taylor Cassidy, Heng Ji
Abstract
To obtain a complete temporal picture of a relation it is necessary to aggregate fragments of temporal information across relation instances in text. This process is non-trivial even for humans because temporal information can be imprecise and inconsistent, and systems face the additional challenge that each of their classifications is potentially false. Even a small amount of incorrect proposed temporal information about a relation can severely affect the resulting aggregate temporal knowledge. We motivate and evaluate three methods to modify temporal relation information prior to aggregation to address this challenge.

An Infrastructure for Probabilistic Reasoning with Web Ontologies (IOS Press)

by unknown authors
Abstract
We present an infrastructure for probabilistic reasoning with ontologies that is based on our Markov logic engine ROCKIT. Markov logic is a template language that combines first-order logic with log-linear graphical models. We show how to translate OWL-EL as well as RDF schema to Markov logic and how to use ROCKIT to apply MAP inference to the given set of formulas. The resulting system is an infrastructure for log-linear logics that can be used for probabilistic reasoning with both extended OWL-EL and RDF schema. We describe our system and illustrate its benefits by presenting two application scenarios: ontology matching, and knowledge base verification, with a special focus on temporal reasoning. Our results indicate that our system, which is based on a well-founded probabilistic semantics, is capable of solving relevant problems as well as or better than state-of-the-art systems that have been specifically designed for the respective problem.
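As a rough illustration of the MAP inference this abstract mentions (the ground atoms and weights below are invented, and real engines like ROCKIT solve the problem with integer linear programming rather than enumeration), the MAP state in a Markov-logic-style model is the truth assignment that maximizes the total weight of satisfied weighted ground formulas:

```python
from itertools import product

# Weighted ground formulas over Boolean atoms (hypothetical example).
formulas = [
    (2.0, lambda w: not w["Bird(tweety)"] or w["Flies(tweety)"]),  # Bird => Flies
    (4.0, lambda w: w["Bird(tweety)"]),                            # evidence-like weight
    (1.5, lambda w: not w["Flies(tweety)"]),                       # penalty on flying
]
atoms = ["Bird(tweety)", "Flies(tweety)"]

def map_state(formulas, atoms):
    """Brute-force MAP: score each world by the summed weight of the
    formulas it satisfies and return the best-scoring assignment."""
    best, best_score = None, float("-inf")
    for bits in product([False, True], repeat=len(atoms)):
        world = dict(zip(atoms, bits))
        score = sum(wt for wt, f in formulas if f(world))
        if score > best_score:
            best, best_score = world, score
    return best

# Best-scoring world here: Bird and Flies both true (score 6.0).
state = map_state(formulas, atoms)
```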

Citation Context

...ible with a reasoner that supports DL-safe rules in order to detect inconsistencies. However, their system can only detect whether a knowledge base is consistent but cannot resolve the existing conflicts. [34,13,12] proposed different approaches to resolve temporal conflicts at query time. In particular, they define temporal constraints as Datalog rules. However, these approaches do not incorporate terminological...

Authors' Addresses

by Maximilian Dylla, Martin Theobald , 2014
Abstract
We would like to thank Floris Geerts and Rainer Gemulla for helpful technical discussions. We thank Radu Curticapean for pointing out Bezout's theorem. Learning the parameters of complex probabilistic-relational models from labeled training data is a standard technique in machine learning, which has been intensively studied in the subfield of Statistical Relational Learning (SRL), but, so far, this is still an under-investigated topic in the context of Probabilistic Databases (PDBs). In this paper, we focus on learning the probability values of base tuples in a PDB from query answers, the latter of which are represented as labeled lineage formulas. Specifically, we consider labels in the form of pairs, each consisting of a Boolean lineage formula and a marginal probability that comes attached to the corresponding query answer. The resulting learning problem can be viewed as the inverse problem to confidence computations in PDBs: given a set of labeled query answers, learn the probability values of the base tuples, such that the marginal prob-…

Citation Context

...al probabilities P(ϕi) are repeatedly computed, it is beneficial to preprocess the lineage formulas ϕi, e.g., by compiling them to OBDDs [27], or by flattening them via a few targeted Shannon expansions [16], the latter of which we also apply in this work. Next, Line 8 might yield a probability value that exceeds the interval [0, 1], which we counter by the logit function. It defines a mapping from probab...
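The out-of-range fix this excerpt alludes to can be sketched as an update performed in logit space (the learning rate and gradient values below are invented; this shows only the generic logit/sigmoid trick, not the authors' exact update rule): transforming a probability to the real line, stepping there, and mapping back through the sigmoid keeps every updated value strictly inside (0, 1).

```python
import math

def logit(p):
    """Map a probability in (0, 1) to the real line."""
    return math.log(p / (1.0 - p))

def sigmoid(x):
    """Inverse of logit: map any real number back into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def update(p, grad, lr=0.1):
    """Gradient step taken in logit space, so the result can never
    leave the open interval (0, 1), however large the step is."""
    return sigmoid(logit(p) - lr * grad)

# A large step that would overshoot past 1 or below 0 in p-space
# stays a valid probability after the round-trip through logit space.
p_new = update(0.9, 8.0)
```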

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2019 The Pennsylvania State University