Random Sampling from Databases (1993) [82 citations — 1 self]

Abstract:

Random Sampling from Databases, by Frank Olken. Doctor of Philosophy in Computer Science, University of California at Berkeley. Professor Michael Stonebraker, Chair.

In this thesis I describe efficient methods of answering random sampling queries of relational databases, i.e., retrieving random samples of the results of relational queries. I begin with a discussion of the motivation for including sampling operators in the database management system (DBMS). Uses include auditing, estimation (e.g., approximate answers to aggregate queries), and query optimization. The second chapter contains a review of the basic file sampling methods used in the thesis: acceptance/rejection sampling, reservoir sampling, and partial sum (ranked) tree sampling. I describe their use for sampling from variably blocked files and for sampling from results as they are generated. Related literature on sampling from databases is reviewed. In Chapter Three I show how acceptance/rejection sampling of B+ trees can be...
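Acceptance/rejection sampling from a variably blocked file, as listed in the abstract, can be illustrated with a minimal Python sketch. It assumes the file is modelled as a list of blocks (each a list of records) and that an upper bound `max_block_size` on records per block is known; the function name `ar_sample_record` is hypothetical, and the thesis's own formulation may differ in detail.

```python
import random
from typing import Sequence, TypeVar

T = TypeVar("T")


def ar_sample_record(blocks: Sequence[Sequence[T]], max_block_size: int,
                     rng: random.Random) -> T:
    """Draw one record uniformly at random from a variably blocked file.

    Hypothetical helper: `blocks` models the file as a list of blocks, and
    `max_block_size` is an assumed upper bound on records per block.
    """
    if not any(blocks):
        raise ValueError("file contains no records")
    if max_block_size < max(len(b) for b in blocks):
        raise ValueError("max_block_size is smaller than some block")
    while True:
        # Pick a block uniformly, then a slot uniformly among max_block_size slots.
        block = blocks[rng.randrange(len(blocks))]
        slot = rng.randrange(max_block_size)
        # Accept only if the slot holds a record. This happens with probability
        # len(block) / max_block_size, which cancels the bias toward records in
        # small blocks, so every record has the same chance per iteration.
        if slot < len(block):
            return block[slot]
        # Otherwise reject the draw and try again.


rng = random.Random(1993)
blocks = [[1, 2, 3], [4], [5, 6]]
print(ar_sample_record(blocks, max_block_size=3, rng=rng))
```

Each iteration accepts with probability `total_records / (len(blocks) * max_block_size)`, so the expected number of iterations is the reciprocal of that ratio; the method is cheapest when blocks are nearly full.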
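Reservoir sampling, which the abstract lists among the basic methods for sampling results as they are generated, can be sketched in its simplest one-pass form (often called Algorithm R). This is a minimal sketch assuming query results arrive as a Python iterable of unknown length; `reservoir_sample` is a hypothetical name. Vitter's "Random sampling with a reservoir" (in the citation list) describes faster variants that skip over records instead of drawing one random number per record.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], n: int, rng: random.Random) -> List[T]:
    """Return a simple random sample of up to n items, without replacement,
    from a stream of unknown length, in one pass and O(n) memory."""
    reservoir: List[T] = []
    for t, item in enumerate(stream):   # t items were seen before this one
        if t < n:
            reservoir.append(item)       # the first n items fill the reservoir
        else:
            j = rng.randrange(t + 1)     # uniform over 0..t
            if j < n:                    # keep the new item with probability n / (t + 1)
                reservoir[j] = item      # and evict a uniformly chosen resident
    return reservoir


rng = random.Random(1993)
# For example, sample 5 values from a result that is produced on the fly.
print(reservoir_sample((x * x for x in range(1000)), 5, rng))
```

Because the stream length need not be known in advance, the sample can be taken as a pipelined query operator produces its output, without materializing the full result.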
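Partial sum (ranked) tree sampling can be illustrated with a Fenwick (binary indexed) tree over per-block record counts. This is a substitute structure chosen for brevity, not necessarily the tree used in the thesis. Drawing a uniform global rank and descending the tree to the block that holds it yields a uniformly random record with no rejected draws, and a block's count can be updated in O(log B) time when records are inserted or deleted. The class `PartialSumTree` and the function `sample_record` are hypothetical names.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


class PartialSumTree:
    """Fenwick (binary indexed) tree over per-block record counts.

    Updating one block's count and locating the block that holds a given
    global record rank both take O(log B) time for B blocks.
    """

    def __init__(self, counts: Sequence[int]) -> None:
        self.n = len(counts)
        self.tree = [0] * (self.n + 1)        # 1-based internal array
        for i, c in enumerate(counts):
            self.add(i, c)

    def add(self, i: int, delta: int) -> None:
        """Add delta to the record count of block i (0-based)."""
        i += 1
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i

    def total(self) -> int:
        """Total number of records across all blocks."""
        s, i = 0, self.n
        while i > 0:
            s += self.tree[i]
            i -= i & -i
        return s

    def find(self, rank: int) -> Tuple[int, int]:
        """Map a 0-based global rank to (block index, offset within block).

        Requires 0 <= rank < total(). The descent skips the longest prefix of
        blocks whose combined count is <= rank; the record therefore lies in
        the next block, whose 0-based index equals the number of skipped blocks.
        """
        pos = 0
        step = 1 << self.n.bit_length()
        while step:
            nxt = pos + step
            if nxt <= self.n and self.tree[nxt] <= rank:
                pos = nxt
                rank -= self.tree[nxt]
            step >>= 1
        return pos, rank


def sample_record(blocks: Sequence[Sequence[T]], tree: PartialSumTree,
                  rng: random.Random) -> T:
    """Draw one record uniformly at random, with no rejected draws."""
    b, offset = tree.find(rng.randrange(tree.total()))
    return blocks[b][offset]


blocks: List[List[int]] = [[1, 2, 3], [4], [5, 6]]
tree = PartialSumTree([len(b) for b in blocks])
rng = random.Random(1993)
print(sample_record(blocks, tree, rng))

# After appending records to block 1, update its count in O(log B).
blocks[1].extend([7, 8])
tree.add(1, 2)
```

Compared with acceptance/rejection sampling, this trades a small auxiliary index (one counter per block) for never wasting a draw on an empty slot.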

Citations

2573 Classification and Regression Trees – Breiman, Friedman, et al. - 1984
1014 The Design and Analysis of Spatial Data Structures – Samet - 1989
778 Image Analysis and Mathematical Morphology – Serra - 1988
413 An Introduction to Database Systems – Date - 2000
324 The quadtree and related hierarchical data structures – Samet - 1984
302 The Jackknife, the Bootstrap and Other Resampling Plans (Philadelphia: Society for Industrial and Applied Mathematics) – Efron - 1982
290 The Art of Computer Programming, Vol. 3: Sorting and Searching – Knuth - 1973
257 Simulation and the Monte Carlo method – Rubinstein - 1981
256 Application of Spatial Data Structures – Samet - 1989
231 Sampling Techniques – Cochran - 1977
218 The Art of Computer Programming, Vol. 2: Seminumerical Algorithms – Knuth - 1969
213 Probabilistic counting algorithms for data base applications – Flajolet, Martin - 1985
211 Sequential Analysis – Wald - 1947
187 Deriving production rules for incremental view maintenance – Ceri, Widom - 1991
166 Efficiently Updating Materialized Views – Blakeley, Larson, et al. - 1986
162 Equi-depth histograms for estimating selectivity factors for multi-dimensional queries – Muralikrishna, DeWitt - 1988
157 Random sampling with a reservoir – Vitter - 1985
148 Introduction to Statistical Quality Control – Montgomery - 1991
143 Practical selectivity estimation through adaptive sampling – Lipton, Naughton - 1990
143 Classification and Regression Trees – Breiman, Friedman, et al. - 1984
139 Updating derived relations: Detecting irrelevant and autonomously computable updates – Blakeley, Coburn, et al. - 1989
138 Accurate Estimation of the Number of Tuples Satisfying a Condition – Piatetsky-Shapiro, Connell - 1984
126 Implementation of integrity constraints and views by query modification – Stonebraker - 1975
116 Introduction to the Theory of Coverage Processes – Hall - 1998
115 Object and File Management in the EXODUS Extensible Database System – Carey, DeWitt, et al. - 1986
110 Probability Approximations via the Poisson Clumping Heuristic – Aldous - 1989
97 Extendible hashing: a fast access method for dynamic files – Fagin - 1979
80 A Performance Analysis of View Materialization Strategy – Hanson - 1987
79 Sequential sampling procedures for query size estimation – Haas, Swami - 1992
79 Urn Models and Their Application – Johnson, Kotz - 1977
70 Parallel Sorting on a Shared-Nothing Architecture Using Probabilistic Splitting – DeWitt, Naughton, et al. - 1991
66 Practical Skew Handling in Parallel Joins – DeWitt, Naughton, et al. - 1992
59 A linear-time probabilistic counting algorithm for database applications – Whang, Vander-Zanden, et al. - 1990
58 Differential files: Their application to the maintenance of large databases – Severance, Lehman - 1976
56 Probabilistic counting – Flajolet, Martin - 1983
54 Implications of certain assumptions in database performance evaluation – Christodoulakis - 1984
54 Secure statistical databases with random sample queries – Denning - 1980
52 Statistical estimators for relational algebra expressions – Hou, Ozsoyoglu, et al. - 1988
52 Processing aggregate relational queries with hard time constraints – Hou, Ozsoyoglu, et al. - 1989
52 Updating distributed materialized views – Segev, Park - 1989
52 Efficiently Monitoring Relational Databases – Buneman, Clemons - 1979
48 Database Snapshots – Adiba, Lindsay - 1980
42 A Snapshot Differential Refresh Algorithm – Lindsay, Haas, et al. - 1986
39 Linear hashing: a New Tool for File and Table Addressing – Litwin - 1980
39 Simple random sampling from relational databases – Olken, Rotem - 1986
38 The tracker: A threat to statistical database security – Denning, Denning, et al. - 1979
36 Efficiently Updating Materialized Views – Blakeley, Larson, et al. - 1986
33 Dynamic query optimization in Rdb/VMS – Antoshenkov - 1993
32 Estimating the number of species: A review – Bunge, Fitzpatrick - 1993
31 Approximate counting: a detailed analysis – Flajolet - 1985