A Probabilistic Relational Algebra for the Integration of Information Retrieval and Database Systems
ACM Transactions on Information Systems, 1994
Cited by 149 (28 self)
We present a probabilistic relational algebra (PRA) which is a generalization of standard relational algebra. Here tuples are assigned probabilistic weights giving the probability that a tuple belongs to a relation. Based on intensional semantics, the tuple weights of the result of a PRA expression always conform to the underlying probabilistic model. We also show for which expressions extensional semantics yields the same results. Furthermore, we discuss complexity issues and indicate possibilities for optimization. With regard to databases, the approach allows for representing imprecise attribute values, whereas for information retrieval, probabilistic document indexing and probabilistic search term weighting can be modelled. As an important extension, we introduce the concept of vague predicates, which yield a probabilistic weight instead of a Boolean value, thus allowing for queries with vague selection conditions. So PRA implements uncertainty and vagueness in combination with the...
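As a rough illustration of the idea of tuple weights, the following Python sketch evaluates selection, projection, and join extensionally, that is, directly on the weights. It assumes that all tuples are independent events, which is a simplification: the abstract describes intensional semantics and states that extensional evaluation agrees with it only for certain expressions. The function names, the dictionary representation, and the sample index are hypothetical and not taken from the paper.

```python
from collections import defaultdict

# A probabilistic relation is modelled here as a dict that maps a tuple to the
# probability that the tuple belongs to the relation (an assumed representation).

def select(rel, predicate):
    """Selection with a vague predicate: predicate(t) returns a weight in [0, 1]."""
    out = {}
    for t, p in rel.items():
        w = p * predicate(t)
        if w > 0:
            out[t] = w
    return out

def project(rel, indices):
    """Projection: tuples that collapse to the same key are combined as a
    disjunction of independent events, 1 - prod(1 - p)."""
    miss = defaultdict(lambda: 1.0)  # probability that no contributing tuple holds
    for t, p in rel.items():
        key = tuple(t[i] for i in indices)
        miss[key] *= 1.0 - p
    return {key: 1.0 - m for key, m in miss.items()}

def join(r, s, r_idx, s_idx):
    """Equi-join: the weight of a joined tuple is the product of both weights."""
    out = {}
    for t, p in r.items():
        for u, q in s.items():
            if t[r_idx] == u[s_idx]:
                out[t + u] = p * q
    return out

# Hypothetical probabilistic document index: (document, term) -> indexing weight.
index = {("d1", "db"): 0.8, ("d1", "ir"): 0.6, ("d2", "ir"): 0.5}

# Probability that each document is indexed with at least one of its terms.
docs = project(index, [0])

# A Boolean predicate is the special case of a vague predicate with weights 0 and 1.
about_ir = select(index, lambda t: 1.0 if t[1] == "ir" else 0.0)
```

Under the independence assumption, `docs` gives document `d1` a weight of about 0.92 (1 - 0.2 * 0.4). When the merged tuples share underlying events, this shortcut can give wrong weights, which is the case intensional semantics is meant to handle.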
Automated ranking of database query results
In CIDR, 2003
Cited by 67 (8 self)
We investigate the problem of ranking answers to a database query when many tuples are returned. We adapt and apply principles of probabilistic models from Information Retrieval for structured data. Our proposed solution is domain independent. It leverages data and workload statistics and correlations. Our ranking functions can be further customized for different applications. We present results of preliminary experiments which demonstrate the efficiency as well as the quality of our ranking system.
Selectivity Estimation using Probabilistic Models
2001
Cited by 65 (3 self)
Estimating the result size of complex queries that involve selection on multiple attributes and the join of several relations is a difficult but fundamental task in database query processing. It arises in cost-based query optimization, query profiling, and approximate query answering. In this paper, we show how probabilistic graphical models can be effectively used for this task as an accurate and compact approximation of the joint frequency distribution of multiple attributes across multiple relations. Probabilistic Relational Models (PRMs) are a recent development that extends graphical statistical models such as Bayesian Networks to relational domains. They represent the statistical dependencies between attributes within a table, and between attributes across foreign-key joins. We provide an efficient algorithm for constructing a PRM from a database, and show how a PRM can be used to compute selectivity estimates for a broad class of queries. One of the major contributions of this work is a unified framework for the estimation of queries involving both select and foreign-key join operations. Furthermore, our approach is not limited to answering a small set of predetermined queries; a single model can be used to effectively estimate the sizes of a wide collection of potential queries across multiple tables. We present results for our approach on several real-world databases. For both single-table multi-attribute queries and a general class of select-join queries, our approach produces more accurate estimates than standard approaches to selectivity estimation, using comparable space and time.
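To make the contrast with standard approaches concrete, the sketch below estimates the result size of a two-attribute selection in two ways: under the attribute independence assumption, and with a two-node chain A -> B that stores P(A) and P(B | A). The chain is only a toy stand-in for the graphical models the abstract refers to; it is not the paper's PRM construction algorithm, and the table, attribute values, and function names are invented for illustration.

```python
from collections import Counter

def estimate_independent(rows, a, b):
    """Estimated number of rows with A == a and B == b, assuming A and B are independent."""
    n = len(rows)
    p_a = sum(1 for r in rows if r[0] == a) / n
    p_b = sum(1 for r in rows if r[1] == b) / n
    return p_a * p_b * n

def estimate_chain(rows, a, b):
    """Same estimate from a two-node model A -> B, using P(A = a) * P(B = b | A = a)."""
    n = len(rows)
    count_a = Counter(r[0] for r in rows)
    count_ab = Counter(rows)
    if count_a[a] == 0:
        return 0.0
    p_a = count_a[a] / n
    p_b_given_a = count_ab[(a, b)] / count_a[a]
    return p_a * p_b_given_a * n

# Hypothetical table with strongly correlated attributes (income, housing).
rows = (
    [("low", "rent")] * 40
    + [("low", "own")] * 10
    + [("high", "rent")] * 10
    + [("high", "own")] * 40
)

true_size = sum(1 for r in rows if r == ("high", "own"))
independent = estimate_independent(rows, "high", "own")
chain = estimate_chain(rows, "high", "own")
```

On this made-up table the true result size is 40; the independence estimate is 25, while the chain estimate recovers roughly 40 because it captures the correlation between the two attributes. With only two attributes the chain is an exact summary; the potential space savings of a graphical model appear once many attributes are involved and only some of them depend on each other.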
"Is This Document Relevant? ...Probably": A Survey of Probabilistic Models in Information Retrieval
2001
Cited by 55 (12 self)
This article surveys probabilistic approaches to modeling information retrieval. The basic concepts of probabilistic approaches to information retrieval are outlined and the principles and assumptions upon which the approaches are based are presented. The various models proposed in the development of IR are described, classified, and compared using a common formalism. New approaches that constitute the basis of future research are described.
Probabilistic information retrieval approach for ranking of database query results
ACM Transactions on Database Systems (TODS), 2006
Cited by 18 (4 self)
We investigate the problem of ranking the answers to a database query when many tuples are returned. In particular, we present methodologies to tackle the problem for conjunctive and range queries, by adapting and applying principles of probabilistic models from Information Retrieval for structured data. Our solution is domain independent and leverages data and workload statistics and correlations. We evaluate the quality of our approach with a user survey on a real database. Furthermore, we present and experimentally evaluate algorithms to efficiently retrieve the top ranked results, which demonstrate the feasibility of our ranking system.
Document Retrieval Facilities for Repository-Based System Development Environments
1996
Cited by 10 (7 self)
Modern system development environments usually deploy the object management facilities of a so-called repository to store the documents created and maintained during system development. PCTE is the ISO and ECMA standard for a public tool interface for an open repository [23]. In this paper we present document retrieval extensions for an OQL-oriented query language for PCTE. The extensions proposed cover (1) pattern matching, (2) term based document retrieval with automatically generated document description vectors, (3) the flexible definition of what is addressed as a "document" in a given query, and (4) the integration of these facilities into a CASE tool. Whereas the integration of pattern matching facilities into query languages has been addressed by other authors before, the main contribution of our approach is the homogeneous integration of term based document retrieval and the flexible definition of documents. 1 Introduction Repository-based applications are in widespread use i...
Towards automatic association of relevant unstructured content with structured query results
In CIKM, 2005
Cited by 10 (2 self)
Faced with growing knowledge management needs, enterprises are increasingly realizing the importance of seamlessly integrating critical business information distributed across both structured and unstructured data sources. In existing information integration solutions, the application needs to formulate the SQL logic to retrieve the needed structured data on one hand, and identify a set of keywords to retrieve the related unstructured data on the other. This paper proposes a novel approach wherein the application specifies its information needs using only a SQL query on the structured data, and this query is automatically “translated” into a set of keywords that can be used to retrieve relevant unstructured data. We describe the techniques used for obtaining these keywords from (i) the query result, and (ii) additional related information in the underlying database. We further show that these techniques achieve high accuracy with very reasonable overheads.
Structured Document Handling - a Case for Integrating Databases and Information Retrieval
1994
Cited by 3 (0 self)
In this paper we discuss the structured multimedia documents that will be, or already are, to some degree the communication backbone of the so-called superhighways. It will be shown that storage and retrieval of such documents will best be handled by an integration of database and information retrieval technologies. We assume documents to be structured with the help of standards like SGML/HyTime and represented by the multitude of formats currently used for multimedia data. Starting with an approach based on object-oriented database technology we extend both their functionality on the cost models for query evaluation on one side with multimedia features and on the other with logic-based models of information retrieval to truly combine structure and content information about the documents in question. 1 Introduction As an effect of advances in information technology the size and amount of documents being available have increased dramatically. This affects all aspects of document h...
Query Evaluation with Soft-Key Constraints
Cited by 3 (0 self)
Key violations often occur in real-life datasets, especially in those integrated from different sources. Enforcing constraints strictly on these datasets is not feasible. In this paper we formalize the notion of soft-key constraints on probabilistic databases, which allow for violation of key constraints by penalizing every violating world by a quantity proportional to the violation. To represent our probabilistic database with constraints, we define a class of Markov networks, where we can do query evaluation in PTIME. We also study the evaluation of conjunctive queries on relations with soft keys and present a dichotomy that separates this set into those in PTIME and the rest which are #P-hard.
Decision-Guidance Management Systems (DGMS): Seamless Integration of Data Acquisition, Learning, Prediction, and Optimization
A Decision Guidance Management System (DGMS) is a productivity platform for fast development of applications that require closed-loop data acquisition, learning, prediction, and decision optimization. This paper introduces the DGMS concept, and the first DGMS data model with its query language, DG-SQL. The DGMS data model is an extension of the relational model with probability distributions over a set of attributes as random variables. DG-SQL supports a seamless integration of (1) querying the data collection and construction of learning sets; (2) learning from the learning sets, using parameterized transformers and optionally defining an estimation utility, such as sum of squares of errors, to be minimized; (3) probabilistic prediction and simulation, using expressions that involve random variables, such as expectation, variance and probability of a logical formula; and (4) stochastic or deterministic optimization, where search space is defined as a set of feasible non-deterministic query evaluations.

