## What you can do with coordinated samples (2013)

Venue: | In the 17th International Workshop on Randomization and Computation (RANDOM |

Citations: | 4 - 3 self |

### Citations

1024 | Approximate nearest neighbors: towards removing the curse of dimensionality
- Indyk, Motwani
- 1998
Citation Context ...imates obtained over coordinated samples are much more accurate than possible with independent samples. Used this way, coordinated sampling can be casted as a form of Locality Sensitive Hashing (LSH) =-=[29, 24, 28]-=-. Lastly, coordinated samples can sometimes be obtained much more efficiently than independent samples. One example is computing samples of the d-neighborhoods of all nodes in a graph [7, 10, 31, 11, ... |

641 | Similarity search in high dimensions via hashing
- Gionis, Indyk, et al.
Citation Context ...imates obtained over coordinated samples are much more accurate than possible with independent samples. Used this way, coordinated sampling can be casted as a form of Locality Sensitive Hashing (LSH) =-=[29, 24, 28]-=-. Lastly, coordinated samples can sometimes be obtained much more efficiently than independent samples. One example is computing samples of the d-neighborhoods of all nodes in a graph [7, 10, 31, 11, ... |

499 | On the resemblance and containment of documents
- Broder
- 1998
Citation Context ...rdinated samples of instances are used as synopses which facilitate efficient estimation of multi-instance functions such as distinct counts (cardinality of set unions), sum of maxima, and similarity =-=[5, 4, 7, 18, 31, 22, 23, 6, 19, 12, 2, 25, 13, 17]-=-. Estimates obtained over coordinated samples are much more accurate than possible with independent samples. Used this way, coordinated sampling can be casted as a form of Locality Sensitive Hashing (... |

476 | A generalization of sampling without replacement from a finite universe
- Horvitz, Thompson
- 1952
Citation Context ... the sum aggregate since a variance component that is due to bias “adds up” with aggregation whereas otherwise the relative error “cancels out” with aggregation. 3 The Horvitz-Thompson (HT) estimator =-=[27]-=- is a classic sum estimator which is unbiased and nonnegative. To estimate f(v), the HT estimator outputs 0 when the value is not sampled and the inverse-probability estimate f(v)/p when the value is ... |
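The estimator described in this context can be sketched in a few lines of Python. This is an illustrative sketch rather than code from the paper: the function names `ht_estimate`, `ht_expectation`, and `ht_variance` are hypothetical, and it assumes a single value f(v) that is sampled with a known probability p > 0.

```python
def ht_estimate(f_v, p, sampled):
    """Horvitz-Thompson estimate of f(v): f(v)/p if sampled, otherwise 0."""
    if not 0.0 < p <= 1.0:
        raise ValueError("sampling probability must be in (0, 1]")
    return f_v / p if sampled else 0.0


def ht_expectation(f_v, p):
    """Exact expectation over the two outcomes (sampled / not sampled)."""
    return p * ht_estimate(f_v, p, True) + (1.0 - p) * ht_estimate(f_v, p, False)


def ht_variance(f_v, p):
    """Exact variance over the two outcomes."""
    mean = ht_expectation(f_v, p)
    hit = ht_estimate(f_v, p, True)
    miss = ht_estimate(f_v, p, False)
    return p * (hit - mean) ** 2 + (1.0 - p) * (miss - mean) ** 2


# Hypothetical numbers: f(v) = 3 sampled with probability 0.25.
print(ht_expectation(3.0, 0.25), ht_variance(3.0, 0.25))
```

Averaging over the two outcomes gives p · f(v)/p = f(v), which is the unbiasedness property the context mentions; the estimate is also never negative when f(v) ≥ 0.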

322 | Stable distributions, pseudorandom generators, embeddings and data stream computation.
- Indyk
- 2006
Citation Context ...imates obtained over coordinated samples are much more accurate than possible with independent samples. Used this way, coordinated sampling can be casted as a form of Locality Sensitive Hashing (LSH) =-=[29, 24, 28]-=-. Lastly, coordinated samples can sometimes be obtained much more efficiently than independent samples. One example is computing samples of the d-neighborhoods of all nodes in a graph [7, 10, 31, 11, ... |

277 | Google news personalization: Scalable online collaborative filtering
- Das, Datar, et al.
- 2007
Citation Context ...rdinated samples of instances are used as synopses which facilitate efficient estimation of multi-instance functions such as distinct counts (cardinality of set unions), sum of maxima, and similarity =-=[5, 4, 7, 18, 31, 22, 23, 6, 19, 12, 2, 25, 13, 17]-=-. Estimates obtained over coordinated samples are much more accurate than possible with independent samples. Used this way, coordinated sampling can be casted as a form of Locality Sensitive Hashing (... |

241 | Informed content delivery across adaptive overlay networks
- Byers, Considine, et al.
- 2002
Citation Context ...rdinated samples of instances are used as synopses which facilitate efficient estimation of multi-instance functions such as distinct counts (cardinality of set unions), sum of maxima, and similarity =-=[5, 4, 7, 18, 31, 22, 23, 6, 19, 12, 2, 25, 13, 17]-=-. Estimates obtained over coordinated samples are much more accurate than possible with independent samples. Used this way, coordinated sampling can be casted as a form of Locality Sensitive Hashing (... |

158 | Size-estimation framework with applications to transitive closure and reachability
- Cohen
- 1997

156 | Identifying and Filtering Near-Duplicate Documents, COM’00
- Broder
- 2000

119 | Distinct sampling for highly-accurate answers to distinct values queries and event reports
- Gibbons
- 2001

110 | Estimating simple functions on the union of data streams.
- Gibbons, Tirthapura
- 2001

75 | Computing separable functions via gossip
- Mosk-Aoyama, Shah
- 2006

56 | On synopses for distinct-value estimation under multiset operations
- Beyer, Haas, et al.

37 | Spatially-decaying aggregation over a network: model and algorithms
- Cohen, Kaplan
Citation Context ...SH) [29, 24, 28]. Lastly, coordinated samples can sometimes be obtained much more efficiently than independent samples. One example is computing samples of the d-neighborhoods of all nodes in a graph =-=[7, 10, 31, 11, 12, 8]-=-. Similarity queries between neighborhoods are useful in the analysis of massive graph datasets such as social networks or Web graphs. Our aim here is to study the potential and limitations of estimat... |

36 | Priority sampling for estimation of arbitrary subset sums.
- Duffield, Lund, et al.
- 2007
Citation Context ...ze [26], where each item is included with probability proportion to its value) are obtained using the rank function r(u, v) = v/u and a fixed T (h) across items. Priority (sequential Poisson) samples =-=[33, 20, 38]-=- are bottom-k samples utilizing the PPS ranks ∗Microsoft Research, SVC edith@cohenwang.com †The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel. haimk@cs.tau.ac.il 1This is... |

32 | Sampling from a finite population.
- Hajek
- 1981
Citation Context ...item h is sampled ⇐⇒ r(u(h), v(h)) ≥ T (h), where T (h) are fixed thresholds, whereas a bottom-k sample includes the k items with highest ranks.2 Poisson PPS samples (Probability Proportional to Size =-=[26]-=-, where each item is included with probability proportion to its value) are obtained using the rank function r(u, v) = v/u and a fixed T (h) across items. Priority (sequential Poisson) samples [33, 20... |
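The two sampling rules in this context (a fixed threshold, or keeping the k highest ranks) can be sketched with the PPS rank function r(u, v) = v/u. The sketch below is illustrative only: the function names and the item values are hypothetical, and seeds are drawn from (0, 1] rather than [0, 1] purely to avoid dividing by zero.

```python
import random


def pps_rank(u, v):
    """PPS rank r(u, v) = v / u: non-increasing in u, non-decreasing in v."""
    return v / u


def poisson_sample(values, seeds, threshold):
    """Keep item h iff r(u(h), v(h)) >= T, with the same T for every item."""
    return [h for h, v in values.items() if pps_rank(seeds[h], v) >= threshold]


def bottom_k_sample(values, seeds, k):
    """Keep the k items with the highest ranks (priority sampling for PPS ranks)."""
    ranked = sorted(values, key=lambda h: pps_rank(seeds[h], values[h]), reverse=True)
    return ranked[:k]


# Hypothetical item values, keyed by item id.
values = {1: 1, 2: 0, 3: 4, 4: 1, 5: 0, 6: 2, 7: 3, 8: 1}
rng = random.Random(0)
seeds = {h: 1.0 - rng.random() for h in values}  # one seed per item, in (0, 1]

poisson = poisson_sample(values, seeds, threshold=4)
priority = bottom_k_sample(values, seeds, k=3)
print(poisson, priority)
```

With this rank function, the threshold rule v/u ≥ T is the same as u ≤ v/T, so an item of value v is kept with probability min{1, v/T}. The Poisson sample has a random size, while the bottom-k sample always has exactly k items.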

30 | Summarizing data using bottom-k sketches.
- Cohen, Kaplan
- 2007
Citation Context ... Tel Aviv University, Tel Aviv, Israel. haimk@cs.tau.ac.il 1This is the full version of a RANDOM 2013 paper 2The term bottom-k is due to historic usage of the inverse rank function and lowest k ranks =-=[35, 36, 11, 12, 13]-=- 1 items: 1 2 3 4 5 6 7 8 Instance1: 1 0 4 1 0 2 3 1 Instance2: 3 2 1 0 2 3 1 0 PPS sampling probabilities for T=4 (sample of expected size 3): Instance1: 0.25 0.00 1.00 0.25 0.00 0.50 0.75 0.25 Insta... |

26 | Weighted random sampling with a reservoir
- Efraimidis, Spirakis
- 2006
Citation Context ...le, item 1 will always (for any drawing of seeds) be sampled in instance 2 if it is sampled in instance 1 and vice versa for item 7. r(u, v) = v/u and successive weighted sampling without replacement =-=[35, 21, 11]-=- corresponds to bottom-k samples with the rank function r(u, v) = −v/ ln(u). Samples of different instances are coordinated when the set of random seeds u(h) is shared across instances. Scalable shari... |

25 | Asymptotic theory for successive sampling with varying probabilities without replacement
- Rosén
- 1972
Citation Context ... Tel Aviv University, Tel Aviv, Israel. haimk@cs.tau.ac.il 1This is the full version of a RANDOM 2013 paper 2The term bottom-k is due to historic usage of the inverse rank function and lowest k ranks =-=[35, 36, 11, 12, 13]-=- 1 items: 1 2 3 4 5 6 7 8 Instance1: 1 0 4 1 0 2 3 1 Instance2: 3 2 1 0 2 3 1 0 PPS sampling probabilities for T=4 (sample of expected size 3): Instance1: 0.25 0.00 1.00 0.25 0.00 0.50 0.75 0.25 Insta... |

24 | Selecting Several Samples From a Single Population
- Brewer, Early, et al.
- 1972
Citation Context ...instance does not depend on values assumed in other instances, which is important for scalable deployment. Why coordinate samples? Sample coordination was proposed in 1972 by Brewer, Early, and Joice =-=[3]-=-, as a method to maximize overlap and therefore minimize overhead in repeated surveys [37, 34, 36]: The values of items change, and therefore there is a new set of PPS sampling probabilities. With coo... |

24 | Hashed samples: Selectivity estimators for set similarity selection queries
- Hadjieleftheriou, Yu, et al.
- 2008

21 | Estimating arbitrary subset sums with few probes
- Alon, Duffield, et al.
- 2005
Citation Context ...ples are efficient to compute, also when instances are presented as streams or are distributed across multiple servers. It is convenient to specify these sampling schemes through a rank function, r : =-=[0, 1]-=- × V → R, which maps seed-value pairs to a number r(u, v) that is non-increasing with u and non-decreasing with v. For each item h we draw a seed u(h) ∼ U [0, 1] uniformly at random and compute the ra... |

21 | The DLT priority sampling is essentially optimal
- Szegedy
Citation Context ...ze [26], where each item is included with probability proportion to its value) are obtained using the rank function r(u, v) = v/u and a fixed T (h) across items. Priority (sequential Poisson) samples =-=[33, 20, 38]-=- are bottom-k samples utilizing the PPS ranks ∗Microsoft Research, SVC edith@cohenwang.com †The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel. haimk@cs.tau.ac.il 1This is... |

20 | Asymptotic theory for order sampling.
- Rosén
- 1997
Citation Context ... Tel Aviv University, Tel Aviv, Israel. haimk@cs.tau.ac.il 1This is the full version of a RANDOM 2013 paper 2The term bottom-k is due to historic usage of the inverse rank function and lowest k ranks =-=[35, 36, 11, 12, 13]-=- 1 items: 1 2 3 4 5 6 7 8 Instance1: 1 0 4 1 0 2 3 1 Instance2: 3 2 1 0 2 3 1 0 PPS sampling probabilities for T=4 (sample of expected size 3): Instance1: 0.25 0.00 1.00 0.25 0.00 0.50 0.75 0.25 Insta... |

16 | Sequential Poisson sampling
- Ohlsson
- 1998
Citation Context ...ze [26], where each item is included with probability proportion to its value) are obtained using the rank function r(u, v) = v/u and a fixed T (h) across items. Priority (sequential Poisson) samples =-=[33, 20, 38]-=- are bottom-k samples utilizing the PPS ranks ∗Microsoft Research, SVC edith@cohenwang.com †The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel. haimk@cs.tau.ac.il 1This is... |

13 | Tighter estimation using bottom-k sketches
- Cohen, Kaplan
- 2008

12 | When piecewise determinism is almost true
- Cohen, Wang, et al.
- 1995

9 | Leveraging discarded samples for tighter estimation of multiple-set aggregates
- Cohen, Kaplan
- 2009

7 | Coordinated weighted sampling for estimating aggregates over multiple weight assignments
- Cohen, Kaplan, et al.

7 | Coordination of pps samples over time
- Ohlsson
- 2000
Citation Context ...lable deployment. Why coordinate samples? Sample coordination was proposed in 1972 by Brewer, Early, and Joice [3], as a method to maximize overlap and therefore minimize overhead in repeated surveys =-=[37, 34, 36]-=-: The values of items change, and therefore there is a new set of PPS sampling probabilities. With coordination, the sample of the new instance is as similar as possible to the previous sample, and th... |

7 | Fixed sample size pps approximations with a permanent random number
- Saavedra
- 1995
Citation Context ...lable deployment. Why coordinate samples? Sample coordination was proposed in 1972 by Brewer, Early, and Joice [3], as a method to maximize overlap and therefore minimize overhead in repeated surveys =-=[37, 34, 36]-=-: The values of items change, and therefore there is a new set of PPS sampling probabilities. With coordination, the sample of the new instance is as similar as possible to the previous sample, and th... |

6 | Scalable similarity estimation in social networks: Closeness, node labels, and random edge length
- Cohen, Delling, et al.
- 2013
Citation Context ...ioritizes data on which the estimated function is smaller whereas the U∗ estimator prioritizes large values. We demonstrate the potential for applications in [16], for Lp difference estimation and in =-=[9]-=-, for sketch-based similarity estimation in massive graphs. A natural question is to bound the best possible competitive ratio. That is, the supremum over instances (data domain, shared-seed sampling ... |

4 | All-distances sketches, revisited: Scalable estimation of the distance distribution and centralities in massive graphs
- Cohen
Citation Context ...threshold value 4, so item with value v is sampled with probability min{1, v/4}. To obtain two coordinated PPS samples of the instances, we associate an independent u(i) ∼ U [0, 1] with each item i ∈ =-=[8]-=-. We then sample i ∈ [8] in instance h ∈ [2] if and only if u(i) ≤ vh(i)/4, where vh(i) is the value of i in instance h. When coordinating the samples this way, we make them as similar as possible. In... |
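The coordinated sampling rule in this context (one shared seed per item, and item i kept in instance h if and only if u(i) ≤ v_h(i)/4) can be sketched as follows. This is a hedged illustration, not the authors' code: the function name `coordinated_pps` is hypothetical, the two value vectors are the ones from the example quoted in other contexts on this page, and seeds are drawn from (0, 1] so that a zero-valued item is never sampled.

```python
import random

T = 4  # PPS threshold from the quoted example
instance1 = [1, 0, 4, 1, 0, 2, 3, 1]  # values of items 1..8 in instance 1
instance2 = [3, 2, 1, 0, 2, 3, 1, 0]  # values of items 1..8 in instance 2


def coordinated_pps(instances, threshold, rng):
    """Draw one seed per item and reuse it for every instance."""
    n_items = len(instances[0])
    # Seeds in (0, 1]: an assumption of this sketch, see the note above.
    seeds = [1.0 - rng.random() for _ in range(n_items)]
    samples = [
        {i + 1 for i in range(n_items) if seeds[i] <= values[i] / threshold}
        for values in instances
    ]
    return seeds, samples


seeds, (sample1, sample2) = coordinated_pps(
    [instance1, instance2], T, random.Random(0)
)
print(sorted(sample1), sorted(sample2))
```

Because the seed is shared, an item whose value is no larger in one instance than in the other can only be sampled in the first if it is also sampled in the second. For these values, item 1 in sample 1 implies item 1 in sample 2, and the reverse holds for item 7.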

2 | Get the most out of your sample: Optimal unbiased estimators using partial information
- Cohen, Kaplan
- 2011

2 | A case for customizing estimators: Coordinated samples
- Cohen, Kaplan
- 2012
Citation Context ...ork uses a fresh, CS-inspired, and unified approach to the study of estimators that is particularly suitable for data analysis from samples and sets the ground for continued work. In a follow up work =-=[15]-=-, aiming for good performance in practice, we seek estimators that are variance optimal (admissible [32]), that is, can not be strictly improved. We show that there is a range of variance optimal and ... |

2 | On UMV-estimators in survey sampling
- Lanke
- 1973
Citation Context ...wn in the statistics literature as UMVUE (uniform minimum variance unbiased) estimators [32]. Generally however, an (unbiased, nonnegative, linear) estimator with minimum variance on all data vectors =-=[30]-=- may not exist. Simple examples show that this is also the case for our coordinated sampling model even when restricting our attention to particular natural functions. We are therefore introducing and... |

2 | Theory and Methods of Survey Sampling
- Mukhopadhyay
- 2008
Citation Context ...ich subject to desired properties of the estimator, minimizes the variance for all data. Such estimators are known in the statistics literature as UMVUE (uniform minimum variance unbiased) estimators =-=[32]-=-. Generally however, an (unbiased, nonnegative, linear) estimator with minimum variance on all data vectors [30] may not exist. Simple examples show that this is also the case for our coordinated samp... |

1 | How to estimate change from samples
- Cohen, Kaplan
- 2012
Citation Context ...g at data tuples where the maximum has sampling probability 1.) Interestingly, for RG2 it is possible to get a bounded ratio in terms of variance. More details are in the companion experimental paper =-=[16]-=-. 4 The lower bound function and its lower hull For a function f , we define the respective lower bound function f and the lower hull function Hf . We then characterize, in terms of properties of f an... |