Probabilistic Datalog: Implementing Logical Information Retrieval for Advanced Applications
Journal of the American Society for Information Science, 1999
Cited by 36 (6 self)
In the logical approach to information retrieval (IR), retrieval is considered as uncertain inference.
The dichotomy of conjunctive queries on probabilistic structures
In PODS, 2007
Cited by 35 (12 self)
We show that for every conjunctive query, the complexity of evaluating it on a probabilistic database is either PTIME or #P-complete, and we give an algorithm for deciding whether a given conjunctive query is PTIME or #P-complete. The dichotomy property is a fundamental result on query evaluation on probabilistic databases and it gives a complete classification of the complexity of conjunctive queries.
1. PROBLEM STATEMENT
Fix a relational vocabulary $R_1, \ldots, R_k$, denoted $R$. A tuple-independent probabilistic structure is a pair $(\mathbf{A}, p)$ where $\mathbf{A} = (A, R_1^{\mathbf{A}}, \ldots, R_k^{\mathbf{A}})$ is a first-order structure and $p$ is a function that associates to each tuple $t$ in $\mathbf{A}$ a rational number $p(t) \in [0, 1]$. A probabilistic structure $(\mathbf{A}, p)$ induces a probability distribution on the set of substructures $\mathbf{B}$ of $\mathbf{A}$ by:
$$p(\mathbf{B}) = \prod_{i=1}^{k} \Bigl( \prod_{t \in R_i^{\mathbf{B}}} p(t) \times \prod_{t \in R_i^{\mathbf{A}} - R_i^{\mathbf{B}}} (1 - p(t)) \Bigr) \qquad (1)$$
where $\mathbf{B} \subseteq \mathbf{A}$; more precisely, $\mathbf{B} = (A, R_1^{\mathbf{B}}, \ldots, R_k^{\mathbf{B}})$ is such that $R_i^{\mathbf{B}} \subseteq R_i^{\mathbf{A}}$ for $i = 1, \ldots, k$. A conjunctive query $q$ is a sentence of the form $\exists \bar{x}.(\varphi_1 \wedge \ldots \wedge \varphi_m)$, where each $\varphi_i$ is a positive atomic predicate $R(t)$, called a sub-goal, and the tuple $t$ consists of variables and/or constants. As usual, we drop the existential quantifiers and the $\wedge$, writing $q = \varphi_1, \varphi_2, \ldots, \varphi_m$. A conjunctive property is a property on structures defined by a conjunctive query $q$, and its probability on a probabilistic structure $(\mathbf{A}, p)$ is defined as:
$$p(q) = \sum_{\mathbf{B} \subseteq \mathbf{A} \,:\, \mathbf{B} \models q} p(\mathbf{B}) \qquad (2)$$
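As an illustration of Equations (1) and (2) in the problem statement above, the following is a minimal brute-force sketch in Python. It enumerates every substructure of a tiny tuple-independent database and sums the probabilities of those that satisfy a query. The toy relations `R` and `S`, their probabilities, and the query q = R(x), S(x, y) are hypothetical and do not come from the paper. The enumeration is exponential in the number of tuples, so it only illustrates the semantics and is not the paper's algorithm.

```python
from itertools import product

# Hypothetical toy instance: each relation maps a tuple t to its probability p(t).
DB = {
    "R": {("a",): 0.5, ("b",): 0.4},
    "S": {("a", "c"): 0.8, ("b", "c"): 0.5},
}

def worlds(db):
    """Yield (world, probability) for every substructure B of A, following Eq. (1)."""
    facts = [(rel, t, p) for rel, tuples in db.items() for t, p in tuples.items()]
    for keep in product([True, False], repeat=len(facts)):
        prob = 1.0
        world = {rel: set() for rel in db}
        for (rel, t, p), present in zip(facts, keep):
            # A present tuple contributes p(t), an absent one contributes 1 - p(t).
            prob *= p if present else 1.0 - p
            if present:
                world[rel].add(t)
        yield world, prob

def q(world):
    """Boolean conjunctive query q = R(x), S(x, y) (an assumed example query)."""
    return any((x,) in world["R"] for (x, _y) in world["S"])

def query_probability(db, query):
    """Sum p(B) over all worlds B that satisfy the query, following Eq. (2)."""
    return sum(prob for world, prob in worlds(db) if query(world))

p_q = query_probability(DB, q)
total_mass = sum(prob for _world, prob in worlds(DB))
```

For this toy instance the result can be checked by hand: the two ways of satisfying the query use disjoint tuples, so p(q) = 1 - (1 - 0.5 × 0.8)(1 - 0.4 × 0.5) = 0.52.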
AVATAR information extraction system
IEEE Data Engineering Bulletin, 2006
Cited by 32 (2 self)
The AVATAR Information Extraction System (IES) at the IBM Almaden Research Center enables high-precision, rule-based information extraction from text documents. Drawing from our experience, we propose the use of probabilistic database techniques as the formal underpinnings of information extraction systems so as to maintain high precision while increasing recall. This involves building a framework where rule-based annotators can be mapped to queries in a database system. We use examples from AVATAR IES to describe the challenges in achieving this goal. Finally, we show that deriving precision estimates in such a database system presents a significant challenge for probabilistic database systems.
Conditioning Probabilistic Databases
Cited by 32 (13 self)
Past research on probabilistic databases has studied the problem of answering queries on a static database. Application scenarios of probabilistic databases, however, often involve the conditioning of a database using additional information in the form of new evidence. The conditioning problem is thus to transform a probabilistic database of priors into a posterior probabilistic database which is materialized for subsequent query processing or further refinement. It turns out that the conditioning problem is closely related to the problem of computing exact tuple confidence values. It is known that exact confidence computation is an NP-hard problem. This has led researchers to consider approximation techniques for confidence computation. However, neither conditioning nor exact confidence computation can be solved using such techniques. In this paper we present efficient techniques for both problems. We study several problem decomposition methods and heuristics that are based on the most successful search techniques from constraint satisfaction, such as the variable elimination rule of the Davis-Putnam algorithm. We complement this with a thorough experimental evaluation of the algorithms proposed. Our experiments show that our exact algorithms scale well to realistic database sizes and can in some scenarios compete with the most efficient previous approximation algorithms.
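To make the conditioning problem in the abstract above concrete, here is a minimal Python sketch that turns prior tuple probabilities into posterior tuple confidences given some evidence. It uses naive enumeration of all possible worlds, not the decomposition methods or Davis-Putnam-style heuristics the abstract describes. The function name `condition`, the tuple ids `t1` and `t2`, and the evidence predicate are all hypothetical, and the tuples are assumed to be independent a priori.

```python
from itertools import product

def condition(priors, evidence):
    """Return posterior marginals P(t | evidence) by enumerating all worlds.

    priors:   dict mapping a tuple id to its prior probability
              (tuples are assumed independent; this is an assumption).
    evidence: predicate over the set of tuple ids present in a world.
    """
    ids = list(priors)
    mass = 0.0                          # total probability of the evidence
    marginals = {t: 0.0 for t in ids}   # unnormalised P(t and evidence)
    for keep in product([True, False], repeat=len(ids)):
        world = {t for t, present in zip(ids, keep) if present}
        prob = 1.0
        for t, present in zip(ids, keep):
            prob *= priors[t] if present else 1.0 - priors[t]
        if evidence(world):
            mass += prob
            for t in world:
                marginals[t] += prob
    if mass == 0.0:
        raise ValueError("evidence has probability zero")
    return {t: m / mass for t, m in marginals.items()}

# Hypothetical evidence: t1 and t2 cannot both be present (a key-like constraint).
priors = {"t1": 0.5, "t2": 0.4}
posterior = condition(priors, lambda w: not {"t1", "t2"} <= w)
```

In this example the evidence has probability 0.8, so the posterior confidences are 0.3 / 0.8 = 0.375 for `t1` and 0.2 / 0.8 = 0.25 for `t2`. The two tuples are no longer independent after conditioning, so these marginals alone do not describe the full posterior database.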
XIRQL: An XML Query Language Based on Information Retrieval Concepts
2001
Cited by 31 (2 self)
Most proposals for XML query languages are based on the data-centric view of XML and do not support uncertainty and vagueness, which makes them unsuitable for information retrieval (IR) of XML documents. Based on the document-centric view, we present the query language XIRQL, which implements IR-related features such as weighting and ranking, relevance-oriented search, datatypes with vague predicates, and structural relativism. XIRQL integrates these features by using ideas from logic-based probabilistic IR models, in combination with concepts from the database area. For processing XIRQL queries, a path algebra is presented which also serves as a starting point for query optimization.
Answering Queries from Statistics and Probabilistic Views
2005
Cited by 31 (3 self)
this paper, require complex correlations between tuples, for which the query semantics has not been previously studied
HySpirit - a Probabilistic Inference Engine for Hypermedia Retrieval in Large Databases
Proceedings of the 6th International Conference on Extending Database Technology (EDBT), 1998
Cited by 30 (8 self)
HySpirit is a retrieval engine for hypermedia retrieval integrating concepts from information retrieval (IR) and deductive databases. The logical view on IR models retrieval as uncertain inference, for which we use probabilistic reasoning. Since the expressiveness of classical IR models is not sufficient for hypermedia retrieval, HySpirit is based on a probabilistic version of Datalog. In hypermedia retrieval, different nodes may contain contradictory information; thus, we introduce probabilistic four-valued Datalog. In order to support fact queries as well as content-based retrieval, HySpirit is based on an open world assumption, but allows for predicate-specific closed world assumptions. For performing efficient retrieval on large databases, our system provides access to external data. We demonstrate the application of HySpirit by giving examples for retrieval on images, structured documents and large databases.
1 Introduction
Due to the advances in hardware, processing of multimed...
Towards correcting input data errors probabilistically using integrity constraints
In MobiDB, 2006
Cited by 30 (9 self)
Mobile and pervasive applications frequently rely on devices such as RFID antennas or sensors (light, temperature, motion) to provide them with information about the physical world. These devices, however, are unreliable. They produce streams of information where portions of data may be missing, duplicated, or erroneous. The current state of the art is to correct errors locally (e.g., range constraints for temperature readings) or use spatial/temporal correlations (e.g., smoothing temperature readings). However, errors are often apparent only in a global setting, e.g., missed readings of objects that are known to be present, or exit readings from a parking garage without matching entry readings. In this paper, we present StreamClean, a system for correcting input data errors automatically using application-defined global integrity constraints. Because it is frequently impossible to make corrections with certainty, we propose a probabilistic approach, where the system assigns to each input tuple the probability that it is correct. We show that StreamClean handles a large class of input data errors, and corrects them sufficiently fast to keep up with input rates of many mobile and pervasive applications. We also show that the probabilities assigned by StreamClean correspond to a user’s intuitive notion of correctness.
Querying Structured Text in an XML Database
2003
Cited by 29 (6 self)
XML databases often contain documents comprising structured text. Therefore, it is important to integrate "information retrieval style" query evaluation, which is well-suited for natural language text, with standard "database style" query evaluation, which handles structured queries efficiently. Relevance scoring is central to information retrieval. In the case of XML, this operation becomes more complex because the data required for scoring could reside not only directly in an element itself but also in its descendant elements.
A Unified Approach to Ranking in Probabilistic Databases
Cited by 26 (2 self)
The dramatic growth in the number of application domains that naturally generate probabilistic, uncertain data has resulted in a need for efficiently supporting complex querying and decision-making over such data. In this paper, we present a unified approach to ranking and top-k query processing in probabilistic databases by viewing it as a multi-criteria optimization problem, and by deriving a set of features that capture the key properties of a probabilistic dataset that dictate the ranked result. We contend that a single, specific ranking function may not suffice for probabilistic databases, and we instead propose two parameterized ranking functions, called PRF^ω and PRF^e, that generalize or can approximate many of the previously proposed ranking functions. We present novel generating functions-based algorithms for efficiently ranking large datasets according to these ranking functions, even if the datasets exhibit complex correlations modeled using probabilistic and/xor trees or Markov networks. We further propose that the parameters of the ranking function be learned from user preferences, and we develop an approach to learn those parameters. Finally, we present a comprehensive experimental study that illustrates the effectiveness of our parameterized ranking functions, especially PRF^e, at approximating other ranking functions and the scalability of our proposed algorithms for exact or approximate ranking.
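The abstract above mentions generating-function-based ranking without giving details, so the following Python sketch shows only one simple way a generating function can yield rank probabilities, and it may differ from the paper's algorithms. It assumes tuple-independent data with tuples already sorted by descending score. Under that assumption, the coefficient of x^j in the product of (1 - p_l + p_l·x) over higher-scored tuples l is the probability that exactly j of them are present. The names `rank_distribution` and `weighted_score`, the example probabilities, and the top-2 weight function are hypothetical and are not the paper's PRF definitions.

```python
def rank_distribution(probs):
    """Rank probabilities for independent tuples sorted by descending score.

    probs[i] is the existence probability of the i-th highest-scored tuple.
    Returns dist where dist[i][j] = Pr(tuple i exists and has rank j + 1).
    """
    dist = []
    poly = [1.0]  # coefficients of prod_{l < i} (1 - p_l + p_l * x)
    for p in probs:
        # Tuple i has rank j + 1 iff it exists and exactly j better tuples exist.
        dist.append([p * c for c in poly])
        # Multiply the running polynomial by (1 - p + p * x).
        nxt = [0.0] * (len(poly) + 1)
        for k, c in enumerate(poly):
            nxt[k] += c * (1.0 - p)
            nxt[k + 1] += c * p
        poly = nxt
    return dist

def weighted_score(row, weight):
    """Combine one tuple's rank probabilities with a positional weight function."""
    return sum(weight(j + 1) * pr for j, pr in enumerate(row))

# Hypothetical data: three tuples, best score first.
dist = rank_distribution([0.5, 0.4, 0.9])
# Hypothetical weight: 1 for ranks 1 and 2, else 0, i.e. Pr(tuple is in the top 2).
top2 = [weighted_score(row, lambda rank: 1.0 if rank <= 2 else 0.0) for row in dist]
```

Each row of `dist` sums to that tuple's existence probability, which is a quick sanity check. Swapping in a different weight function gives a different ranking of the same tuples, which is the general idea behind a parameterized ranking function.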

