## Evaluating Probabilistic Queries over Imprecise Data (2003)

Venue: SIGMOD

Citations: 227 (42 self-citations)

### BibTeX

@INPROCEEDINGS{Cheng03evaluatingprobabilistic,

author = {Reynold Cheng},

title = {Evaluating Probabilistic Queries over Imprecise Data},

booktitle = {In SIGMOD},

year = {2003},

pages = {551--562}

}

### Abstract

Sensors are often employed to monitor continuously changing entities like locations of moving objects and temperature. The sensor readings are reported to a database system and are subsequently used to answer queries. Due to continuous changes in these values and limited resources (e.g., network bandwidth and battery power), the database may not be able to keep track of the actual values of the entities. Queries that use these old values may produce incorrect answers. However, if the degree of uncertainty between the actual data value and the database value is limited, one can place more confidence in the answers to the queries. More generally, query answers can be augmented with probabilistic guarantees of the validity of the answers. In this paper, we study probabilistic query evaluation based on uncertain data. A classification of queries is made based upon the nature of the result set. For each class, we develop algorithms for computing probabilistic answers, and provide efficient indexing and numeric solutions. We address the important issue of measuring the quality of the answers to these queries, and provide algorithms for efficiently pulling data from relevant sensors or moving objects in order to improve the quality of the executing queries. Extensive experiments...
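The abstract's premise, that the database value can drift from the actual value but only by a limited amount, can be made concrete with a small sketch. It assumes (our assumption, not stated in this section) that an entity's value changes by at most `max_rate` per time unit, so the actual value at time `now` lies in an interval that widens with the time since the last report. `Reading`, `max_rate`, and `uncertainty_interval` are hypothetical names, not the paper's API.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    """Last value a sensor reported to the database (hypothetical record type)."""
    value: float      # value stored in the database
    timestamp: float  # time at which that value was reported
    max_rate: float   # assumed bound on how fast the actual value can change per time unit

    def uncertainty_interval(self, now: float) -> tuple[float, float]:
        """Bounds that must contain the actual value at time `now`, given the rate bound."""
        elapsed = max(0.0, now - self.timestamp)
        spread = self.max_rate * elapsed
        return (self.value - spread, self.value + spread)


r = Reading(value=20.0, timestamp=0.0, max_rate=0.5)
print(r.uncertainty_interval(4.0))  # (18.0, 22.0)
```

The longer the database goes without an update, the wider the interval, which is why answers computed from the stored value alone can be wrong and are better reported with probabilities.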

### Citations

6568 | The mathematical theory of communication
- Shannon, Weaver
- 1963
Citation Context ...the entropy of a message $X \in \{X_1, \ldots, X_n\}$ is $H(X) = \sum_{i=1}^{n} p(X_i) \log_2 \frac{1}{p(X_i)}$. The entropy, $H(X)$, measures the average number of bits required to encode $X$, or the amount of information carried in $X$ [9]. If $H(X)$ equals 0, there exists some $i$ such that $p(X_i) = 1$, and we are certain that $X_i$ is the message, and there is no uncertainty associated with $X$. On the other hand, $H(X)$ attains the maximum value...
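The entropy formula quoted in this context can be checked numerically. A minimal sketch, assuming the probabilities arrive as a list that sums to 1; zero-probability outcomes are skipped because $p \log_2(1/p) \to 0$ as $p \to 0$. The function name `entropy` is ours.

```python
import math


def entropy(probs):
    """Shannon entropy in bits: H(X) = sum_i p_i * log2(1 / p_i)."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)


print(entropy([1.0, 0.0, 0.0]))  # 0.0, a certain outcome carries no uncertainty
print(entropy([0.25] * 4))       # 2.0, the maximum log2(n) for n = 4 equally likely outcomes
```

Applied to the per-object probabilities of a probabilistic answer, a value near 0 means the answer is nearly certain, while a value near $\log_2 n$ means the $n$ candidates are almost indistinguishable.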

507 | Nearest neighbor queries
- Roussopoulos, Kelley, et al.
- 1995
Citation Context ...to acquire the identity of the sensor that yields the maximum temperature value over a region being monitored by sensors [6]. Nearest neighbor queries have also been widely used in location databases [7]. Notice that for all these queries the condition $\sum_{T_i \in R} p_i = 1$ holds. 4. Value-based Aggregate Class. The final class involves aggregate operators that return a single value. Examples include: Query...
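For the maximum-value query mentioned in this context, each object's probability of holding the maximum can be written as $p_i = \int f_i(x) \prod_{j \ne i} F_j(x)\,dx$, where $f_i$ and $F_i$ are the density and CDF of object $i$'s value; these probabilities sum to 1, matching the condition above. The sketch evaluates the integral by brute-force midpoint integration, assuming independent values uniformly distributed over their uncertainty intervals. It is an illustration of the formula, not the index-assisted method the paper describes, and `max_probabilities` is a hypothetical name.

```python
def uniform_cdf(x, lo, hi):
    """CDF of a value uniformly distributed over [lo, hi]."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def max_probabilities(intervals, steps=20000):
    """Approximate p_i = P(object i has the maximum value) for independent uniform values.

    Midpoint rule on p_i = integral over [lo_i, hi_i] of f_i(x) * prod_{j != i} F_j(x) dx.
    Assumes every interval has hi > lo.
    """
    probs = []
    for i, (lo, hi) in enumerate(intervals):
        width = (hi - lo) / steps
        density = 1.0 / (hi - lo)
        total = 0.0
        for k in range(steps):
            x = lo + (k + 0.5) * width  # midpoint of the k-th slice
            others_below = 1.0          # P(every other value is below x)
            for j, (lo_j, hi_j) in enumerate(intervals):
                if j != i:
                    others_below *= uniform_cdf(x, lo_j, hi_j)
            total += density * others_below * width
        probs.append(total)
    return probs


p = max_probabilities([(0, 10), (5, 15)])
print(p, sum(p))  # approximately [0.125, 0.875] and 1.0
```

Replacing each $F_j(x)$ with $1 - F_j(x)$ gives the minimum query instead. The cost grows as $O(n^2 \cdot \text{steps})$, which is one reason pruning objects that cannot be the maximum matters in practice.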

468 | Nonlinear time series analysis
- Kantz, Schreiber
- 1997

362 | Efficient query evaluation on probabilistic databases
- Dalvi, Suciu
- 2004
Citation Context ... Yazici et al. [21] discussed a comprehensive characterization of uncertainty for different data types, including fuzzy sets and fuzzy intervals. Probabilistic databases, such as those discussed in [22, 23, 24], augment a probability value to each relational tuple to specify its probability of presence. To derive information about data fuzziness, statistical databases [25] can be employed, which keep track ...

353 | Model driven data acquisition in sensor networks
- Deshpande, Guestrin, et al.
Citation Context ...aggregating information in data streams sensor networks. For example, one may want to acquire the identity of the sensor that yields the maximum temperature value over a region being monitored by sensors [6]. Nearest neighbor queries have also been widely used in location databases [7]. Notice that for all these queries the condition $\sum_{T_i \in R} p_i = 1$ holds. 4. Value-based Aggregate Class. The final class inv...

330 | Introduction to time series and forecasting
- Brockwell, Davis
- 2002
Citation Context ...invoked at time t0 to find out how likely each reading is inside [l, u] = [1, 15], and the uncertainty intervals derived from the readings of s1, s2, s3 and s4 at time t0 are [2, 12], [8, 18], [11, 21] and [25, 30] respectively. Since the reading of s1 is always inside [l, u], it has a probability of 1 for satisfying the ERQ. The reading of s4 is always outside [l, u], thus it has a probability of 0 of being ...
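The ERQ example can be worked through directly. Assuming, as the density $1/(u_{s4} - l_{s4})$ quoted in another context suggests, that each reading is uniformly distributed over its uncertainty interval, the probability of satisfying the query is the length of the overlap with $[l, u]$ divided by the interval width. The sketch reproduces the stated probabilities for s1 (1) and s4 (0) and computes the in-between cases; `erq_probability` is a hypothetical helper.

```python
from fractions import Fraction


def erq_probability(interval, query):
    """P(value in [l, u]) for a value uniform over [lo, hi] (assumes hi > lo)."""
    lo, hi = interval
    l, u = query
    overlap = min(hi, u) - max(lo, l)  # length of the intersection; negative if disjoint
    if overlap <= 0:
        return Fraction(0)
    return Fraction(overlap, hi - lo)   # exact ratio for integer endpoints


readings = {"s1": (2, 12), "s2": (8, 18), "s3": (11, 21), "s4": (25, 30)}
for name, interval in readings.items():
    print(name, erq_probability(interval, (1, 15)))
# s1 1
# s2 7/10
# s3 2/5
# s4 0
```

`Fraction` keeps the results exact for integer endpoints; with real-valued readings plain float division works the same way.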

225 | The Management of Probabilistic Data
- Barbara, Garcia-Molina, et al.
- 1992
Citation Context ... Yazici et al. [21] discussed a comprehensive characterization of uncertainty for different data types, including fuzzy sets and fuzzy intervals. Probabilistic databases, such as those discussed in [22, 23, 24], augment a probability value to each relational tuple to specify its probability of presence. To derive information about data fuzziness, statistical databases [25] can be employed, which keep track ...

217 | The Analysis of Time Series: An Introduction
- Chatfield
- 2004

209 | New sampling-based summary statistics for improving approximate query answers
- Gibbons, Matias
- 1998
Citation Context ...An exact answer E can be approximated by two sets: a certain set C, which is a subset of E, and a possible set P such that C ∪ P is a superset of E. Other techniques like precomputation [13], sampling [14] and synopses [15] are used to produce statistical results. While these efforts investigate approximate answers based upon a subset of the (exact) values of the data, our work addresses probabilistic ...
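The certain/possible-set approximation described in this context can be illustrated on the four uncertainty intervals quoted in the ERQ example. A minimal sketch, assuming a range query $[l, u]$ over interval-valued readings: an object goes in C when its whole interval lies inside the query and in P when its interval only touches or partly overlaps it, so that $C \subseteq E \subseteq C \cup P$ for the unknown exact answer E. The function name is hypothetical.

```python
def certain_and_possible(intervals, query):
    """Split objects into a certain set C and a possible set P for a range query [l, u]."""
    l, u = query
    certain, possible = set(), set()
    for name, (lo, hi) in intervals.items():
        if l <= lo and hi <= u:
            certain.add(name)   # whole interval inside [l, u]: the object is surely in E
        elif lo <= u and hi >= l:
            possible.add(name)  # interval meets [l, u]: the object may be in E
        # otherwise the interval is disjoint from [l, u] and the object is surely not in E
    return certain, possible


readings = {"s1": (2, 12), "s2": (8, 18), "s3": (11, 21), "s4": (25, 30)}
C, P = certain_and_possible(readings, (1, 15))
print(sorted(C), sorted(P))  # ['s1'] ['s2', 's3']
```

Compared with a probabilistic answer, this two-set form only says that s2 and s3 might qualify; it cannot say that s2 is more likely to than s3.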

176 | ProbView: a flexible probabilistic database system
- Lakshmanan, Leone, et al.
- 1997
Citation Context ... Yazici et al. [21] discussed a comprehensive characterization of uncertainty for different data types, including fuzzy sets and fuzzy intervals. Probabilistic databases, such as those discussed in [22, 23, 24], augment a probability value to each relational tuple to specify its probability of presence. To derive information about data fuzziness, statistical databases [25] can be employed, which keep track ...

164 | Updating and querying databases that track mobile units. Distributed and Parallel Databases
- Wolfson, Sistla, et al.
- 1999
Citation Context ... queried objects. The idea of probabilistic answers to queries over uncertain data in a constantly-changing database (such as sensors and moving-object database) was briefly studied by Wolfson et al. [2]. They considered range queries in the context of a moving-object database. The objects were assumed to move in straight lines with a known average speed. The answers to the queries consist of objects...

150 | Join Synopses for Approximate Query Answering
- Acharya, Gibbons, et al.
- 1999
Citation Context ...s4 at time t0 will give us the result: ls4, us4, 1/(us4 − ls4). Now suppose an ERQ (represented by the interval [l, u]) is invoked at time t0 to find out how likely each reading is inside [l, u] = [1, 15], and the uncertainty intervals derived from the readings of s1, s2, s3 and s4 at time t0 are [2, 12], [8, 18], [11, 21] and [25, 30] respectively. Since the reading of s1 is always inside [l, u], i...

149 | A model for the prediction of r-tree performance
- Theodoridis, Sellis
- 1996
Citation Context ...interval [l, u]) is invoked at time t0 to find out how likely each reading is inside [l, u] = [1, 15], and the uncertainty intervals derived from the readings of s1, s2, s3 and s4 at time t0 are [2, 12], [8, 18], [11, 21] and [25, 30] respectively. Since the reading of s1 is always inside [l, u], it has a probability of 1 for satisfying the ERQ. The reading of s4 is always outside [l, u], thus it has a pro...

133 | Querying Imprecise Data in Moving Object Environments
- Cheng, Prabhakar, et al.
- 2002
Citation Context ...often limited to the scope of aggregate functions. In contrast, our work adopts the notion of probability and provides a paradigm for answering general queries involving uncertainty. Although [2] and [3] also discuss probabilistic queries, they only consider probabilistic range queries and nearest-neighbor queries in a moving-object database model. We study a more general uncertainty model that ca...

122 | Query Indexing and Velocity Constrained Indexing: Scalable Techniques for Continuous Queries on Moving Objects
- Prabhakar, Xia, et al.
- 2002
Citation Context ...probability that x has the minimum value is affected by the relative value and bounds for y. In this paper we investigate how such queries are evaluated, with the aid of the Velocity-Constrained Index [5] and numerical techniques. A probabilistic answer also reflects a certain level of uncertainty that results from the uncertainty of the queried object values. If the uncertainty of all (or some) of th...

107 | Efficient indexing methods for probabilistic threshold queries over uncertain data
- Cheng, Xia, et al.
- 2004
Citation Context ...study a more general uncertainty model that can be applied to sensor data, and also define the quality of probabilistic query results which, to the best of our knowledge, has not been addressed. In [19], the problem of indexing one-dimensional uncertain data for answering “probabilistic threshold range query” was studied. A probabilistic threshold range query is a probabilistic range query with an a...

106 | Adaptive precision setting for cached approximate values
- Olston, Loo, et al.
- 2001
Citation Context ...probabilistic answers based upon all the (imprecise) values of the data. The problem of balancing the tradeoff between precision and performance for querying replicated data was studied by Olston et al. [16, 17, 10]. In their model, the cache in the server cannot keep track of the exact values of sensor sources due to limited network bandwidth. Instead of storing the actual value in the server’s cache, an interv...
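The interval caching idea summarized in this context can be sketched as follows, assuming a fixed interval width `width` (the cited work adapts the width over time; that part is omitted here). The server caches a bound around the last refreshed value instead of the value itself, and the source sends a refresh only when a new value falls outside that bound. `CachedInterval` and its methods are hypothetical names for illustration.

```python
class CachedInterval:
    """Server-side bound [low, high] kept in place of the exact source value (fixed width assumed)."""

    def __init__(self, value, width):
        self.width = width
        self.refreshes = 0
        self._center_on(value)

    def _center_on(self, value):
        self.low, self.high = value - self.width, value + self.width

    def source_update(self, new_value):
        """Run at the source on every change; returns True if a refresh had to be sent."""
        if self.low <= new_value <= self.high:
            return False  # cached bound still valid: no message needed
        self._center_on(new_value)  # bound violated: send the new value and re-center the interval
        self.refreshes += 1
        return True


cache = CachedInterval(20.0, width=2.0)
for v in [20.5, 21.9, 22.5, 23.0, 19.0]:
    cache.source_update(v)
print(cache.refreshes, (cache.low, cache.high))  # 2 (17.0, 21.0)
```

Five changes cost only two messages here; the price is that the server only knows the value to within the cached bound, which is exactly the kind of interval the probabilistic queries in this paper operate on.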

95 | Offering a precision-performance tradeoff for aggregation queries over replicated data
- Olston, Widom
Citation Context ...probabilistic answers based upon all the (imprecise) values of the data. The problem of balancing the tradeoff between precision and performance for querying replicated data was studied by Olston et al. [16, 17, 10]. In their model, the cache in the server cannot keep track of the exact values of sensor sources due to limited network bandwidth. Instead of storing the actual value in the server’s cache, an interv...

95 | Indexing Multi-dimensional Uncertain Data with Arbitrary Probability Density Functions
- Tao
Citation Context ...probabilistic threshold range query is a probabilistic range query with an additional requirement that only objects with probability higher than a user-defined value are qualified as answers. A recent paper in [20] extends the indexing solution to support uncertain data in high-dimensional space. The problems studied in those papers are different from ours in three aspects: (1) the uncertainty intervals in thos...
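A probabilistic threshold range query, as defined in this context, adds a filter on top of per-object probabilities. A minimal sketch, assuming uniform distributions over uncertainty intervals and reusing the s1 to s4 intervals and $[l, u] = [1, 15]$ quoted in the ERQ example; `threshold_range_query` and the threshold parameter `tau` are hypothetical names.

```python
def overlap_probability(interval, query):
    """P(value in [l, u]) for a value uniform over [lo, hi] (assumes hi > lo)."""
    lo, hi = interval
    l, u = query
    overlap = min(hi, u) - max(lo, l)
    return max(0.0, overlap) / (hi - lo)


def threshold_range_query(intervals, query, tau):
    """Return (name, probability) pairs whose probability of lying in the query exceeds tau."""
    results = []
    for name, interval in intervals.items():
        p = overlap_probability(interval, query)
        if p > tau:
            results.append((name, p))
    return results


readings = {"s1": (2, 12), "s2": (8, 18), "s3": (11, 21), "s4": (25, 30)}
print(threshold_range_query(readings, (1, 15), tau=0.5))  # [('s1', 1.0), ('s2', 0.7)]
```

This sketch scans every object; the indexing work cited here aims to avoid that scan by pruning objects whose probability cannot exceed the threshold.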

65 | Best-effort cache synchronization with source cooperation
- Olston, Widom
- 2002
Citation Context ... assume that the sensors cooperate with the central server, i.e., a sensor can respond to update requests from the server by sending the newest value to the server, as in the system model described in [10]. Suppose after the execution of a probabilistic query, some slack time is available for the query. The server can improve the quality of the answers to that query by requesting updates from sensors, ...

60 | OLAP and statistical databases: similarities and differences
- Shoshani
- 1997
Citation Context ...invoked at time t0 to find out how likely each reading is inside [l, u] = [1, 15], and the uncertainty intervals derived from the readings of s1, s2, s3 and s4 at time t0 are [2, 12], [8, 18], [11, 21] and [25, 30] respectively. Since the reading of s1 is always inside [l, u], it has a probability of 1 for satisfying the ERQ. The reading of s4 is always outside [l, u], thus it has a probability of 0 of being ...

53 | Querying the Uncertain Position of Moving Objects
- Sistla, Wolfson, et al.
- 1998
Citation Context ...rates that the database does not always truly capture the state of the external world, and the value of the sensor readings can change without being recognized by the database. Sistla et al. [14] identify this type of data as a dynamic attribute, whose value changes over time even if it is not explicitly updated in the database. In this example, because the exact values of the data items are ...

45 | The complexity of query reliability
- Grädel, Gurevich, et al.
- 1998
Citation Context ..., defining “fuzzy queries” such as “Which employee has a low salary?”, while our work is about uncertainty in the numerical domain. The quality of a query for a probabilistic database is discussed in [26], where an “observed” database is augmented with a probability function on the truth values of a set of atomic statements about the database. They also assume the actual database is known, which is no...

31 | Efficient Evaluation of Continuous Range Queries on Moving Objects
- Kalashnikov, Prabhakar, et al.
- 2002
Citation Context ... the context of a moving-object environment. We extend their ideas significantly by providing probabilistic guarantees to general queries for a generic model of uncertainty. Other related work includes [12, 3, 6]. 9. CONCLUSIONS In this chapter we studied the problem of augmenting probability information to queries over uncertain data. We propose a flexible model of uncertainty, which is defined by (1) an low...

23 | Querying the Trajectories of On-Line Mobile Objects
- Pfoser, Jensen
Citation Context ...ly to a vast class of applications dealing with constantly-changing environments. Our techniques are also compatible with common models of uncertainty that have been proposed elsewhere, e.g., [2] and [4]. The probabilities in the answer allow the user to place appropriate confidence in the answer as opposed to having an incorrect answer or no answer at all. Depending upon the application, one may cho...

23 | On computing functions with uncertainty
- Khanna, Tan
Citation Context ...interval [l, u]) is invoked at time t0 to find out how likely each reading is inside [l, u] = [1, 15], and the uncertainty intervals derived from the readings of s1, s2, s3 and s4 at time t0 are [2, 12], [8, 18], [11, 21] and [25, 30] respectively. Since the reading of s1 is always inside [l, u], it has a probability of 1 for satisfying the ERQ. The reading of s4 is always outside [l, u], thus it has a pro...

19 | Fast approximate query answering using precomputed statistics
- Poosala, Ganti
Citation Context ...single value). An exact answer E can be approximated by two sets: a certain set C, which is a subset of E, and a possible set P such that C ∪ P is a superset of E. Other techniques like precomputation [13], sampling [14] and synopses [15] are used to produce statistical results. While these efforts investigate approximate answers based upon a subset of the (exact) values of the data, our work addresses...

11 | Querying the Uncertain Position of Moving Objects. Temporal Databases: Research and Practice (LNCS 1399)
- Sistla, Wolfson, et al.
- 1998
Citation Context ...e (x0 and y0), the query returns “x” as the result. In reality, the temperature readings could have changed to values x1 and y1, in which case “y” is the correct answer. Sistla et al. [1] identify this type of data as a dynamic attribute, whose value changes over time even if it is not explicitly updated in the database. In this example, the database incorrectly assumes that the recor...

10 | Time Series Forecasting. Chapman & Hall/CRC
- Chatfield
- 2000

6 | Uncertainty in a nested relational database model
- Yazici, Soysal, et al.
- 1999
Citation Context ...u]) is invoked at time t0 to find out how likely each reading is inside [l, u] = [1, 15], and the uncertainty intervals derived from the readings of s1, s2, s3 and s4 at time t0 are [2, 12], [8, 18], [11, 21] and [25, 30] respectively. Since the reading of s1 is always inside [l, u], it has a probability of 1 for satisfying the ERQ. The reading of s4 is always outside [l, u], thus it has a probability o...

5 | Producing approximate answers to set- and single-valued queries
- Vrbsky, Liu
- 1994
Citation Context ... the interval [l, u]) is invoked at time t0 to find out how likely each reading is inside [l, u] = [1, 15], and the uncertainty intervals derived from the readings of s1, s2, s3 and s4 at time t0 are [2, 12], [8, 18], [11, 21] and [25, 30] respectively. Since the reading of s1 is always inside [l, u], it has a probability of 1 for satisfying the ERQ. The reading of s4 is always outside [l, u], thus it ...
