A Probabilistic Relational Algebra for the Integration of Information Retrieval and Database Systems
 ACM Transactions on Information Systems
, 1994
Cited by 178 (31 self)
We present a probabilistic relational algebra (PRA) which is a generalization of standard relational algebra. Here tuples are assigned probabilistic weights giving the probability that a tuple belongs to a relation. Based on intensional semantics, the tuple weights of the result of a PRA expression always conform to the underlying probabilistic model. We also show for which expressions extensional semantics yields the same results. Furthermore, we discuss complexity issues and indicate possibilities for optimization. With regard to databases, the approach allows for representing imprecise attribute values, whereas for information retrieval, probabilistic document indexing and probabilistic search term weighting can be modelled. As an important extension, we introduce the concept of vague predicates which yields a probabilistic weight instead of a Boolean value, thus allowing for queries with vague selection conditions. So PRA implements uncertainty and vagueness in combination with the...
Representing and querying correlated tuples in probabilistic databases
 In ICDE
, 2007
Cited by 118 (11 self)
Probabilistic databases have received considerable attention recently due to the need for storing uncertain data produced by many real world applications. The widespread use of probabilistic databases is hampered by two limitations: (1) current probabilistic databases make simplistic assumptions about the data (e.g., complete independence among tuples) that make it difficult to use them in applications that naturally produce correlated data, and (2) most probabilistic databases can only answer a restricted subset of the queries that can be expressed using traditional query languages. We address both these limitations by proposing a framework that can represent not only probabilistic tuples, but also correlations that may be present among them. Our proposed framework naturally lends itself to the possible world semantics, thus preserving the precise query semantics extant in current probabilistic databases. We develop an efficient strategy for query evaluation over such probabilistic databases by casting the query processing problem as an inference problem in an appropriately constructed probabilistic graphical model. We present several optimizations specific to probabilistic databases that enable efficient query evaluation. We validate our approach by presenting an experimental evaluation that illustrates the effectiveness of our techniques at answering various queries using real and synthetic datasets.
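The following is a minimal sketch of possible-world semantics, included to illustrate why tuple correlations matter. It is not the paper's framework (which relies on inference in a graphical model); the function names `world_probabilities` and `query_probability` and the tuple ids `t1`, `t2` are hypothetical. A world is the set of tuples that are present, and a query's probability is the total probability of the worlds in which it holds.

```python
from itertools import product


def world_probabilities(tuple_probs):
    """Enumerate all possible worlds, ASSUMING independent tuples.

    tuple_probs maps a (hypothetical) tuple id to its marginal probability.
    Returns a dict mapping each world (frozenset of present ids) to its probability.
    """
    ids = sorted(tuple_probs)
    worlds = {}
    for bits in product([False, True], repeat=len(ids)):
        p = 1.0
        for tid, present in zip(ids, bits):
            p *= tuple_probs[tid] if present else 1.0 - tuple_probs[tid]
        worlds[frozenset(t for t, b in zip(ids, bits) if b)] = p
    return worlds


def query_probability(worlds, holds):
    """Sum the probability of every world in which the Boolean query holds."""
    return sum(p for world, p in worlds.items() if holds(world))


def both_present(world):
    """Example query: are t1 and t2 both in the world?"""
    return {"t1", "t2"} <= world


# Same marginals (0.5 each), two different joint distributions.
independent = world_probabilities({"t1": 0.5, "t2": 0.5})
# Mutually exclusive tuples: the joint distribution is written out by hand.
exclusive = {frozenset({"t1"}): 0.5, frozenset({"t2"}): 0.5}

p_independent = query_probability(independent, both_present)
p_exclusive = query_probability(exclusive, both_present)
```

With identical marginals, the query evaluates to 0.25 under independence and to 0 when the tuples are mutually exclusive, so a system that assumes independence would answer incorrectly on correlated data. Explicit enumeration is exponential in the number of tuples and is shown only to make the semantics concrete.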
Vague: a user interface to relational databases that permits vague queries
 ACM Transactions on Office Information Systems
, 1988
Cited by 103 (6 self)
A specific query establishes a rigid qualification and is concerned only with data that match it precisely. A vague query establishes a target qualification and is concerned also with data that are close to this target. Most conventional database systems cannot handle vague queries directly, forcing their users to retry specific queries repeatedly with minor modifications until they match data that are satisfactory. This article describes a system called VAGUE that can handle vague queries directly. The principal concept behind VAGUE is its extension to the relational data model with data metrics, which are definitions of distances between values of the same domain. A problem with implementing data distances is that different users may have different interpretations for the notion of distance. VAGUE incorporates several features that enable it to adapt itself to the individual views and priorities of its users.
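As an illustration of the data-metric idea, here is a small sketch of a vague selection that ranks rows by their distance to a target value. It is an assumption-laden toy, not the VAGUE system's interface: the function `vague_select`, the `radius` cutoff, and the sample rows are all hypothetical, and the metric is passed in as a parameter to reflect that different users may interpret distance differently.

```python
def vague_select(rows, attribute, target, metric, radius):
    """Return rows whose attribute lies within `radius` of `target`, closest first.

    `metric` is a user-supplied distance function over the attribute's domain.
    """
    scored = [(metric(row[attribute], target), row) for row in rows]
    close = [(d, row) for d, row in scored if d <= radius]
    return [row for d, row in sorted(close, key=lambda pair: pair[0])]


# Hypothetical sample data.
cars = [
    {"model": "A", "price": 9500},
    {"model": "B", "price": 12000},
    {"model": "C", "price": 10200},
]


def price_distance(x, y):
    """One possible metric: absolute difference in price."""
    return abs(x - y)


# "Price about 10000": an exact-match query would return nothing here.
matches = vague_select(cars, "price", 10000, price_distance, radius=1000)
```

Swapping in a different `metric` (for example, one that penalizes higher prices more than lower ones) changes the ranking without changing the query, which is the kind of per-user adaptation the abstract describes.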
Fuzzy functional dependencies and lossless join decomposition of fuzzy relational database systems
 ACM Transactions on Database Systems
, 1988
Cited by 75 (0 self)
This paper deals with the application of fuzzy logic in a relational database environment with the objective of capturing more meaning of the data. It is shown that with suitable interpretations for the fuzzy membership functions, a fuzzy relational data model can be used to represent ambiguities in data values as well as impreciseness in the association among them. Relational operators for fuzzy relations have been studied, and applicability of fuzzy logic in capturing integrity constraints has been investigated. By introducing a fuzzy resemblance measure EQUAL for comparing domain values, the definition of classical functional dependency has been generalized to fuzzy functional dependency (ffd). The implication problem of ffds has been examined and a set of sound and complete inference axioms has been proposed. Next, the problem of lossless join decomposition of fuzzy relations for a given set of fuzzy functional dependencies is investigated. It is proved that with a suitable restriction on EQUAL, the design theory of a classical relational database with functional dependencies can be extended to fuzzy relations satisfying fuzzy functional dependencies.
A probabilistic framework for vague queries and imprecise information in databases
 Proceedings of the 16th International Conference on Very Large Databases
, 1990
Cited by 57 (13 self)
A probabilistic learning model for vague queries and missing or imprecise information in databases is described. Instead of retrieving only a set of answers, our approach yields a ranking of objects from the database in response to a query. By using relevance judgements from the user about the objects retrieved, the ranking for the actual query as well as the overall retrieval quality of the system can be further improved. For specifying different kinds of conditions in vague queries, the notion of vague predicates is introduced. Based on the underlying probabilistic model, imprecise or missing attribute values can also be treated easily. In addition, the corresponding formulas can be applied in combination with standard predicates (from two-valued logic), thus extending standard database systems for coping with missing or imprecise data.
Current Approaches to Handling Imperfect Information in Data and Knowledge Bases
, 1996
Cited by 54 (1 self)
This paper surveys methods for representing and reasoning with imperfect information. It opens with an attempt to classify the different types of imperfection that may pervade data, and a discussion of the sources of such imperfections. The classification is then used as a framework for considering work that explicitly concerns the representation of imperfect information, and related work on how imperfect information may be used as a basis for reasoning. The work that is surveyed is drawn from both the field of databases and the field of artificial intelligence. Both of these areas have long been concerned with the problems caused by imperfect information, and this paper stresses the relationships between the approaches developed in each.
A Survey of Uncertain Data Algorithms and Applications
 IEEE Transactions on Knowledge and Data Engineering
, 2009
Cited by 33 (11 self)
In recent years, a number of indirect data collection methodologies have led to the proliferation of uncertain data. Such databases are much more complex because of the additional challenges of representing the probabilistic information. In this paper, we provide a survey of uncertain data mining and management applications. We will explore the various models utilized for uncertain data representation. In the field of uncertain data management, we will examine traditional database management methods such as join processing, query processing, selectivity estimation, OLAP queries, and indexing. In the field of uncertain data mining, we will examine traditional mining problems such as frequent pattern mining, outlier detection, classification, and clustering. We discuss different methodologies to process and mine uncertain data in a variety of forms.
Probabilistic Temporal Databases, I: Algebra
Cited by 26 (6 self)
... In this paper, we first introduce the syntax of Temporal-Probabilistic (TP) relations and then show how they can be converted to an explicit, significantly more space-consuming form called Annotated Relations. We then present a Theoretical Annotated Temporal Algebra (TATA). Being explicit, TATA is convenient for specifying how the algebraic operations should behave, but is impractical to use because annotated relations are overwhelmingly large. Next, we
An Introduction to the Fuzzy Set and Possibility Theory-Based Treatment of Soft Queries and Uncertain Or Imprecise Databases
, 1994
Cited by 24 (4 self)
In this paper, it is shown that fuzzy sets and possibility theory provide a homogeneous framework for the representation of both imprecise/uncertain information and soft queries with a flexible interpretation. Incompletely known information as well as flexible query handling capabilities are expected to extend the range of applications for future database management systems. The term fuzzy databases, which is extensively used in the specialized literature, covers several different meanings which are reviewed. A special emphasis is put on flexible queries addressed to regular databases. Such queries enable the user to easily express preferences among more or less admissible attribute values. Several approaches for introducing flexibility, including fuzzy sets, are compared. A query language based on SQL is outlined and some issues related to query processing are discussed. In addition, possibility theory proves to be useful for representing imperfectly known data and soft constraints. P...
A Probabilistic Relational Model for the Integration of IR and Databases
 In Proceedings of ACM SIGIR
, 1993
Cited by 20 (1 self)
In this paper, a probabilistic relational model is presented which combines relational algebra with probabilistic retrieval. Based on certain independence assumptions, the operators of the relational algebra are redefined such that the probabilistic algebra is a generalization of the standard relational algebra. Furthermore, a special join operator implementing probabilistic retrieval is proposed. When applied to typical document databases, queries can not only ask for documents, but for any kind of object in the database. In addition, an implicit ranking of these objects is provided in case the query relates to probabilistic indexing or uses the probabilistic join operator. The proposed algebra is intended as a standard interface to combined database and IR systems, as a basis for implementing user-friendly interfaces. 1 Introduction The fields of databases (DB) and information retrieval (IR) have been coexisting for a very long time, but with little influence on each other. IR peop...
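To make the idea of redefining relational operators under independence assumptions concrete, here is a minimal sketch. It is an illustration only and does not reproduce the paper's definitions: relations are modelled as dicts from tuples to probabilities, every function name and the sample data are hypothetical, and all tuples are assumed to be independent events. Setting every weight to 1.0 reduces the operators to ordinary selection, join, and projection.

```python
from collections import defaultdict


def select(rel, pred):
    """Keep tuples satisfying a Boolean predicate; weights are unchanged."""
    return {t: p for t, p in rel.items() if pred(t)}


def join(r, s, cond):
    """Pair up tuples that satisfy `cond`; independent events multiply."""
    return {tr + ts: pr * ps
            for tr, pr in r.items()
            for ts, ps in s.items()
            if cond(tr, ts)}


def project(rel, indices):
    """Merge tuples that agree on `indices` as a disjunction of independent events."""
    miss = defaultdict(lambda: 1.0)  # probability that no contributing tuple is present
    for t, p in rel.items():
        miss[tuple(t[i] for i in indices)] *= 1.0 - p
    return {key: 1.0 - m for key, m in miss.items()}


# Hypothetical probabilistic index: (document, term) -> indexing weight.
index = {("d1", "ir"): 0.8, ("d1", "db"): 0.5, ("d2", "ir"): 0.4}
# Hypothetical weighted query terms: (term,) -> query term weight.
query = {("ir",): 0.5, ("db",): 1.0}

# Match index entries to query terms, then rank documents by the projected weight.
matched = join(index, query, lambda d, q: d[1] == q[0])
ranking = project(matched, [0])
```

Here `d1` receives 1 - (1 - 0.4)(1 - 0.5) = 0.7 and `d2` receives 0.2, giving an implicit ranking of documents. Multiplying and combining weights this way is only valid while the combined tuples really are independent; when the same base tuple contributes to several intermediate results, that assumption breaks down.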