## Indexing multi-dimensional uncertain data with arbitrary probability density functions (2005)

Venue: In Proc. VLDB

Citations: 90 (14 self)

### BibTeX

@INPROCEEDINGS{Tao05indexingmulti-dimensional,

author = {Yufei Tao and Reynold Cheng and Xiaokui Xiao and Wang Kay Ngai and Ben Kao and Sunil Prabhakar},

title = {Indexing multi-dimensional uncertain data with arbitrary probability density functions},

booktitle = {Proc. VLDB},

year = {2005},

pages = {922--933}

}

### Abstract

In an “uncertain database”, an object o is associated with a multi-dimensional probability density function (pdf), which describes the likelihood that o appears at each position in the data space. A fundamental operation is the “probabilistic range search”, which, given a value pq and a rectangular area rq, retrieves the objects that appear in rq with probabilities at least pq. In this paper, we propose the U-tree, an access method designed to optimize both the I/O and CPU time of range retrieval on multi-dimensional imprecise data. The new structure is fully dynamic (i.e., objects can be incrementally inserted/deleted in any order), and does not place any constraints on the data pdfs. We verify the query and update efficiency of U-trees with extensive experiments.
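To make the query semantics concrete, the following is a minimal Python sketch of a brute-force probabilistic range search. It is not the U-tree and not the authors' implementation: it scans every object and estimates each appearance probability by random sampling, which is the kind of expensive computation an index would try to avoid. The names `UncertainObject`, `appearance_probability`, and `prob_range_search` are hypothetical, and the sketch assumes rectangular uncertainty regions and a pdf that need not be normalised.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

# A rectangle is (lower corner, upper corner); assumption: axis-aligned.
Rect = Tuple[Sequence[float], Sequence[float]]


@dataclass
class UncertainObject:
    name: str
    ur: Rect                                 # uncertainty region (assumed rectangular)
    pdf: Callable[[Sequence[float]], float]  # density over ur, possibly unnormalised


def contains(rect: Rect, x: Sequence[float]) -> bool:
    """True if point x lies inside the axis-aligned rectangle."""
    lo, hi = rect
    return all(l <= v <= h for l, v, h in zip(lo, x, hi))


def appearance_probability(obj: UncertainObject, rq: Rect, n1: int = 20000,
                           rng: Optional[random.Random] = None) -> float:
    """Estimate the probability that obj appears in rq.

    Draws n1 points uniformly from the uncertainty region and returns the
    pdf mass of the points inside rq divided by the pdf mass of all points.
    """
    if rng is None:
        rng = random.Random(0)
    lo, hi = obj.ur
    total = 0.0
    inside = 0.0
    for _ in range(n1):
        x = [rng.uniform(l, h) for l, h in zip(lo, hi)]
        w = obj.pdf(x)
        total += w
        if contains(rq, x):
            inside += w
    return inside / total if total > 0 else 0.0


def prob_range_search(objects: List[UncertainObject], rq: Rect, pq: float,
                      n1: int = 20000, seed: int = 0) -> List[str]:
    """Naive linear scan: names of objects appearing in rq with probability >= pq."""
    rng = random.Random(seed)
    return [o.name for o in objects
            if appearance_probability(o, rq, n1, rng) >= pq]
```

For example, an object with a uniform pdf over the square [0, 2] × [0, 2] and a query rectangle covering its left half would have an estimated appearance probability close to 0.5, so it qualifies for pq = 0.4 but not for pq = 0.8. The estimate is randomised, so results near the threshold can vary with the seed and the sample count.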

### Citations

8512 | Introduction to Algorithms
- Cormen, Leiserson, et al.
- 1991
Citation Context ...the 2m linear constraints shown in inequalities 12 and 13. Linear programming has been very well studied and numerous efficient solutions exist. In our implementation, we adopt the well-known Simplex [7] method. So far we have focused on computing cfbout, while a similar approach can be utilized to obtain cfbin. Since cfbin(pj) is always enclosed by pcr(pj), we aim at maximizing a metric identical t...

977 | The R*-tree: An Efficient and Robust Access Method for Points and Rectangles
- Beckmann, Kriegel, et al.
Citation Context ...ections on the temperature-, humidity-, UV-index dimensions described by the corresponding ranges, respectively. Although conventional range search (on a “precise” dataset) has been very well studied [1, 3], its solutions are not applicable to uncertain data, since they do not consider the probabilistic requirements [6]. As explained later, the key of optimizing a prob-range query is to avoid, as much a...

505 | The X-tree: An index structure for high-dimensional data
- Berchtold, Keim, et al.
- 1996
Citation Context ...ections on the temperature-, humidity-, UV-index dimensions described by the corresponding ranges, respectively. Although conventional range search (on a “precise” dataset) has been very well studied [1, 3], its solutions are not applicable to uncertain data, since they do not consider the probabilistic requirements [6]. As explained later, the key of optimizing a prob-range query is to avoid, as much a...

346 | Efficient Query Evaluation on Probabilistic Databases
- Dalvi, Suciu
- 2004
Citation Context ...s with arbitrary pdfs (e.g., one method targets only uniform pdfs), and (ii) they may incur large actual execution overhead due to the hidden constants in their complexity guarantees. Dalvi and Suciu [8] discuss “probabilistic databases”, where each record is the same as a tuple in a conventional database, except that it is associated with an “existential” probability. For example, a 60% existential ...

317 | Indexing the positions of continuously moving objects
- Saltenis, Jensen, et al.
- 2000
Citation Context ...es e1 and e2 of the resulting nodes have small MBR(p) for all values p = p1, ..., pm in the U-catalog. Therefore, ideally, the best split should be obtained by performing a sorting at each pj (1 ≤ j ≤ m) which, unfortunately, incurs expensive overhead. We avoid so many sorting operations using a simple heuris... [Footnote 5: A similar technique was applied in [11] to convert R*-trees to a spatio-temporal index.]

219 | Evaluating probabilistic queries over imprecise data
- Cheng, Kalashnikov, et al.
Citation Context ...tions of moving objects. In this context, query algorithms aim at minimizing the amount of data transmission (for updating the central server) to ensure the precision of database values. Cheng et al. [4] are the first to formulate uncertain retrieval in general domains. They present an interesting taxonomy of novel query types, together with the corresponding processing strategies. An I/O efficient a...

211 | Trio: A System for Integrated Management of Data, Accuracy, and Lineage
- Widom
- 2005
Citation Context ...place any constraints on the data pdfs. We verify the query and update efficiency of U-trees with extensive experiments. 1 Introduction Uncertain databases are gaining considerable attention recently [13]. In such a system, tuples may not accurately capture the properties of real-world entities, which is an inherent property of numerous applications that manage “dynamic attributes” [14] with continuou...

183 | A cost model for nearest neighbor search in high-dimensional data space
- Berchtold, Bohm, et al.
- 1997
Citation Context ... to the mean. In this case, Equation 2 cannot be derived into a formula without any integrals, and hence, must be evaluated numerically through, for example, the following “monte-carlo” approach [2]. First, a number n1 of points x1, x2, ..., xn1 are randomly generated in the uncertainty region o.ur of an object o. Without loss of generality, assume that n2 of these points fall into the search re...

156 | Updating and Querying Databases that Track Mobile Units
- Wolfson, Sistla, et al.
- 1999
Citation Context ...he properties of real-world entities, which is an inherent property of numerous applications that manage “dynamic attributes” [14] with continuously changing values. To enable location-based services [15], for instance, a moving client informs a server about its coordinates, if its distance from the previously reported location exceeds ...

147 | A model for the prediction of R-tree performance
- Theodoridis, Sellis
Citation Context ...ting issue is to investigate the algorithms that deploy U-trees to solve other types of queries (e.g., those defined in [4]). Another exciting direction for future work is to derive analytical models [12] that can accurately estimate the query costs. Such models can be utilized to facilitate query optimization, which is also an important topic to be studied. Acknowledgements Yufei Tao and Xiaokui Xiao...

127 | Querying Imprecise Data in Moving Object Environments
- Cheng, Kalashnikov, et al.
- 2004
Citation Context ...eneral domains. They present an interesting taxonomy of novel query types, together with the corresponding processing strategies. An I/O efficient algorithm for nearest neighbor search is proposed in [5]. None of the above works considers prob-range retrieval. Cheng et al. [6] develop several solutions for prob-range queries which, however, target 1D space only. They argue that range search in uncert...

118 | Capturing the uncertainty of moving-object representations
- Pfoser, Jensen
- 1999
Citation Context ...R*-tree, which is an effective multidimensional access method for range queries on precise data, and is fundamental to the subsequent discussion. 2.1 Query Processing on Imprecise Data Early research [10, 14, 15] primarily focuses on various data models for accurately capturing the locations of moving objects. In this context, query algorithms aim at minimizing the amount of data transmission (for updating th...

104 | Efficient indexing methods for probabilistic threshold queries over uncertain data
- Cheng, Xia, et al.
- 2004
Citation Context ...ugh conventional range search (on a “precise” dataset) has been very well studied [1, 3], its solutions are not applicable to uncertain data, since they do not consider the probabilistic requirements [6]. As explained later, the key of optimizing a prob-range query is to avoid, as much as possible, computing the appearance probability that an object satisfies a query. Such computation is expensive (e...

84 | Cost and Imprecision in Modeling the Position of Moving Objects
- Wolfson, Chamberlain, et al.
- 1998
Citation Context ...ntion recently [13]. In such a system, tuples may not accurately capture the properties of real-world entities, which is an inherent property of numerous applications that manage “dynamic attributes” [14] with continuously changing values. To enable location-based services [15], for instance, a moving client informs a server about its coordinates, if its distance from the previously reported location ...

77 | Towards an analysis of range query performance in spatial data structures
- Pagel, Six, et al.
- 1993
Citation Context ...the margin (i.e., perimeter) of each MBR, (iii) the overlap between two MBRs in the same node, and (iv) the distance between the centroid of an MBR and that of the node containing it. As discussed in [9], minimization of these metrics decreases the probability that an MBR intersects a query region. 3 Problem Definition Formally, an “uncertain object” o is associated with (i) a probability density fun...