PXML: A probabilistic semistructured data model and algebra
In ICDE, 2003
Cited by 37 (3 self)
Despite the recent proliferation of work on semistructured data models, there has been little work to date on supporting uncertainty in these models. In this paper, we propose a model for probabilistic semistructured data (PSD). The advantage of our approach is that it supports a flexible representation that allows the specification of a wide class of distributions over semistructured instances. We provide two semantics for the model and show that the semantics are probabilistically coherent. Next, we develop an extension of the relational algebra to handle probabilistic semistructured data and describe efficient algorithms for answering queries that use this algebra. Finally, we present experimental results showing the efficiency of our algorithms.
On the complexity of managing probabilistic XML data
In PODS, 2007
Cited by 20 (5 self)
In [3], we introduced a framework for querying and updating probabilistic information over unordered labeled trees, the probabilistic tree model. The data model is based on trees where nodes are annotated with conjunctions of probabilistic event variables. We briefly described an implementation and scenarios of usage. We develop here a mathematical foundation for this model. In particular, we present complexity results. We identify a very large class of queries for which simple variations of querying and updating algorithms from [3] compute the correct answer. A main contribution is a full complexity analysis of queries and updates. We also exhibit a decision procedure for the equivalence of probabilistic trees and prove it is in co-RP. Furthermore, we study the issue of removing less probable possible worlds, and that of validating a probabilistic tree against a DTD. We show that these two problems are intractable in the most general case.
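The idea of a tree whose nodes carry conjunctions of probabilistic event variables can be illustrated with a minimal Python sketch. It is not the algorithm from the paper: the event names, probabilities, tree shape, and the rule that a node survives only when its own condition and those of all its ancestors hold are assumptions made for illustration, and events are assumed independent.

```python
from itertools import product

# Hypothetical event variables with independent probabilities (illustrative values).
EVENT_PROBS = {"w1": 0.8, "w2": 0.5}

# Hypothetical tree: node -> (parent, condition). A condition is a conjunction
# of literals, each literal being (event name, required truth value).
TREE = {
    "A": (None, []),
    "B": ("A", [("w1", True)]),
    "C": ("A", [("w1", False), ("w2", True)]),
    "D": ("B", [("w2", True)]),
}

def holds(condition, valuation):
    """True if every literal of the conjunction agrees with the valuation."""
    return all(valuation[event] == value for event, value in condition)

def present(node, valuation):
    """Assumed semantics: a node survives if its condition and all ancestors' conditions hold."""
    while node is not None:
        parent, condition = TREE[node]
        if not holds(condition, valuation):
            return False
        node = parent
    return True

def possible_worlds():
    """Enumerate all valuations and return {set of surviving nodes: probability}."""
    worlds = {}
    names = sorted(EVENT_PROBS)
    for values in product([True, False], repeat=len(names)):
        valuation = dict(zip(names, values))
        weight = 1.0
        for name in names:
            weight *= EVENT_PROBS[name] if valuation[name] else 1 - EVENT_PROBS[name]
        world = frozenset(n for n in TREE if present(n, valuation))
        worlds[world] = worlds.get(world, 0.0) + weight
    return worlds
```

Calling `possible_worlds()` returns a distribution over ordinary trees whose probabilities sum to 1. The enumeration is exponential in the number of event variables, so it only serves to make the semantics concrete on small inputs.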
Range Search on Multidimensional Uncertain Data
Cited by 16 (3 self)
In an uncertain database, every object o is associated with a probability density function, which describes the likelihood that o appears at each position in a multidimensional workspace. This article studies two types of range retrieval fundamental to many analytical tasks. Specifically, a nonfuzzy query returns all the objects that appear in a search region rq with at least a certain probability tq. On the other hand, given an uncertain object q, fuzzy search retrieves the set of objects that are within distance εq from q with no less than probability tq. The core of our methodology is a novel concept of “probabilistically constrained rectangle”, which permits effective pruning/validation of nonqualifying/qualifying data. We develop a new index structure called the U-tree for minimizing the query overhead. Our algorithmic findings are accompanied with a thorough theoretical analysis, which reveals valuable insight into the problem characteristics, and mathematically confirms the efficiency of our solutions. We verify the effectiveness of the proposed techniques with extensive
Ontology-Based User Context Management: The Challenges of Imperfection and Time-Dependence
In On the Move to Meaningful Internet Systems 2006: CoopIS, DOA, GADA, and ODBASE. Part I., ser. Lecture, 2006
Cited by 8 (3 self)
Robust and scalable user context management is the key enabler for the emerging context- and situation-aware applications, and ontology-based approaches have shown their usefulness for capturing especially context information on a high level of abstraction. But so far the problem has not been approached as a data management problem, which is key to scalability and robustness. The specific challenges lie in the imperfection of high-level context information, its time-dependence and the variability in the dynamics between its different elements.
Fusion rules for merging uncertain information
Information Fusion, 2006
Cited by 8 (2 self)
In previous papers, we have presented a logic-based framework based on fusion rules for merging structured news reports [Hun00, Hun02b, Hun02a, HS03, HS04]. Structured news reports are XML documents, where the textentries are restricted to individual words or simple phrases, such as names and domain-specific terminology, and numbers and units. We assume structured news reports do not require natural language processing. Fusion rules are a form of scripting language that define how structured news reports should be merged. The antecedent of a fusion rule is a call to investigate the information in the structured news reports and the background knowledge, and the consequent of a fusion rule is a formula specifying an action to be undertaken to form a merged report. It is expected that a set of fusion rules is defined for any given application. In this paper we extend the approach to handling probability values, degrees of beliefs, or necessity measures associated with textentries in the news reports. We present the formal definition for each of these types of uncertainty and explain how they can be handled using fusion rules. We also discuss the methods of detecting inconsistencies among sources.
Aggregate Queries for Discrete and Continuous Probabilistic XML
Cited by 6 (5 self)
Sources of data uncertainty and imprecision are numerous. A way to handle this uncertainty is to associate probabilistic annotations to data. Many such probabilistic database models have been proposed, both in the relational and in the semi-structured setting. The latter is particularly well adapted to the management of uncertain data coming from a variety of automatic processes. An important problem, in the context of probabilistic XML databases, is that of answering aggregate queries (count, sum, avg, etc.), which has received limited attention so far. In a model unifying the various (discrete) semi-structured probabilistic models studied up to now, we present algorithms to compute the distribution of the aggregation values (exploiting some regularity properties of the aggregate functions) and probabilistic moments (especially, expectation and variance) of this distribution. We also prove the intractability of some of these problems and investigate approximation techniques. We finally extend the discrete model to a continuous one, in order to take into account continuous data values, such as measurements from sensor networks, and present algorithms to compute distribution functions and moments for various classes of continuous distributions of data values.
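To make the notions of an aggregate's distribution and its moments concrete, here is a small Python sketch for the simplest case, a `count` over items that each exist independently with a known probability. The independence assumption and the function names are illustrative only; the model discussed in the abstract is more general, and this is not the paper's algorithm.

```python
def count_distribution(probs):
    """Distribution of the number of present items, by dynamic programming.

    probs[i] is the (assumed independent) probability that item i exists.
    Returns a list dist where dist[k] = P(count == k).
    """
    dist = [1.0]  # with no items seen yet, the count is 0 with certainty
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1 - p)   # item absent: count unchanged
            nxt[k + 1] += mass * p     # item present: count increases by one
        dist = nxt
    return dist

def moments(dist):
    """Expectation and variance of a distribution given as a list of masses."""
    mean = sum(k * m for k, m in enumerate(dist))
    var = sum((k - mean) ** 2 * m for k, m in enumerate(dist))
    return mean, var
```

For example, `moments(count_distribution([0.5, 0.2, 0.9]))` gives an expectation of about 1.6 and a variance of about 0.5, matching the closed forms sum(p) and sum(p(1 - p)) that hold under independence. The dynamic program runs in quadratic time in the number of items, whereas enumerating possible worlds would be exponential.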
Representing Probabilistic Information in XML
Department of Computer Science, University of Kentucky, 2003
Cited by 5 (1 self)
Traditional databases have been successful in managing large amounts of deterministic information. However, the real world is filled with uncertainty. Up until now, there has been limited success in managing probabilistic information. In 2001, Dekhtyar, Goldsmith and Hawkes [1] proposed a formal Semistructured Probabilistic Object (SPO) data model to represent probabilistic information and Semistructured Probabilistic Algebra (SP-Algebra) to query it. In this paper, we discuss the XML-based implementation of this model. We show how SPOs are represented in XML. We discuss our storage strategies for efficiently mapping XML representations of SPOs into relational tables. We also describe a query translation mechanism which automatically generates a set of SQL queries for evaluating SP-Algebra queries. Based on this framework, a prototype semistructured probabilistic DBMS (SPDBMS) has been implemented on top of a relational DBMS. We also report on the results of the experiments testing the performance of the SPDBMS.
Integration of IR into an XML Database
2002
Cited by 4 (0 self)
Structure matching has been the focus and strength of standard XML querying. However, textual content is still an essential component of XML data. It is therefore important to extend the standard XML database engine to allow for "Information Retrieval" style queries, namely, "keyword"-based retrieval and "result ranking". In this paper, we describe our effort in integrating information retrieval techniques into the Timber XML database system being developed at the University of Michigan, and our participation in the INitiative for the Evaluation of XML Retrieval (INEX).
A Framework for Management of Semistructured Probabilistic Data
Journal of Intelligent Information Systems, 2004
Cited by 3 (1 self)
This paper describes the theoretical framework and implementation of a database management system for storing and manipulating diverse probability distributions and associated information. A formal Semistructured Probabilistic Object (SPO) data model and a Semistructured Probabilistic Query Algebra (SP-algebra) are proposed. The SP-algebra supports standard database queries as well as some specific to probabilities, such as conditionalization and marginalization. Thus, the Semistructured Probabilistic Database may be used as a backend to any application that involves the management of large quantities of probabilistic information, such as building stochastic models. The implementation uses XML encoding of SPOs to facilitate communication with diverse applications. The database management system has been implemented on top of a relational DBMS. The translation of SP-algebra queries into relational queries is discussed here, and the results of initial experiments evaluating the system are reported.
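Marginalization and conditionalization, the two probability-specific queries named above, can be sketched in Python on a small joint distribution table. This shows the textbook operations only, not the SP-algebra operator definitions from the paper; the variable names, the table values, and the function signatures are hypothetical.

```python
from collections import defaultdict

# Hypothetical joint distribution over two variables (illustrative values).
VARS = ("weather", "traffic")
JOINT = {
    ("sun", "light"): 0.4,
    ("sun", "heavy"): 0.1,
    ("rain", "light"): 0.2,
    ("rain", "heavy"): 0.3,
}

def marginalize(variables, table, keep):
    """Sum out every variable not listed in keep."""
    idx = [variables.index(v) for v in keep]
    out = defaultdict(float)
    for row, p in table.items():
        out[tuple(row[i] for i in idx)] += p
    return tuple(keep), dict(out)

def conditionalize(variables, table, var, value):
    """Restrict to rows where var == value, renormalize, and drop var."""
    i = variables.index(var)
    selected = {row: p for row, p in table.items() if row[i] == value}
    total = sum(selected.values())
    if total == 0:
        raise ValueError("conditioning event has probability zero")
    rest = tuple(v for v in variables if v != var)
    return rest, {row[:i] + row[i + 1:]: p / total for row, p in selected.items()}
```

With this table, `marginalize(VARS, JOINT, ["weather"])` yields a distribution over `weather` alone, and `conditionalize(VARS, JOINT, "traffic", "heavy")` yields the distribution of `weather` given heavy traffic. Both return the remaining variable names together with the new table, so the operations can be chained.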
Query Selectivity Estimation for Uncertain Data
In 20th Intl. Conf. on Scientific and Statistical Database Management, 2008
Cited by 3 (2 self)
Applications requiring the handling of uncertain data have led to the development of database management systems extending the scope of relational databases to include uncertain (probabilistic) data as a native data type. New automatic query optimizations having the ability to estimate the cost of execution of a given query plan, as available in existing databases, need to be developed. For probabilistic data this involves providing selectivity estimations that can handle multiple values for each attribute and also new query types with threshold values. This paper presents novel selectivity estimation functions for uncertain data and shows how these functions can be integrated into PostgreSQL to achieve query optimization for probabilistic queries over uncertain data. The proposed methods are able to handle both attribute- and tuple-uncertainty. Our experimental results show that our algorithms are efficient and give good selectivity estimates with low space-time overhead.

