Very sparse stable random projections, estimators and tail bounds for stable random projections (2006)

by P Li

Results 1 - 5 of 5

A sketch algorithm for estimating two-way and multi-way associations

by Ping Li, Kenneth W. Church - Computational Linguistics, 2007
Cited by 27 (13 self)
We should not have to look at the entire corpus (e.g., the Web) to know if two (or more) words are strongly associated or not. One can often obtain estimates of associations from a small sample. We develop a sketch-based algorithm that constructs a contingency table for a sample. One can estimate the contingency table for the entire population using straightforward scaling. However, one can do better by taking advantage of the margins (also known as document frequencies). The proposed method cuts the errors roughly in half over Broder’s sketches.
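The "straightforward scaling" baseline mentioned in the abstract can be sketched as follows. This is only an illustration under stated assumptions: it takes a plain uniform sample of documents (each modeled as a set of words), not the paper's sketch construction, and it does not use the margins. The function names `sample_contingency_table` and `scale_to_population` are hypothetical.

```python
from collections import Counter


def sample_contingency_table(docs, word_a, word_b):
    """Count the four cells of the 2x2 table for word_a and word_b.

    docs is an iterable of sets of words (an assumed representation).
    Keys are (has_a, has_b) pairs of booleans.
    """
    table = Counter()
    for doc in docs:
        table[(word_a in doc, word_b in doc)] += 1
    return table


def scale_to_population(table, sample_size, population_size):
    """Scale every sample cell by population_size / sample_size."""
    factor = population_size / sample_size
    return {cell: count * factor for cell, count in table.items()}


# Usage: a 4-document sample standing in for a 1000-document corpus.
sample = [{"a", "b"}, {"a"}, {"b"}, {"c"}]
table = sample_contingency_table(sample, "a", "b")
estimate = scale_to_population(table, len(sample), 1000)
```

The abstract's point is that an estimator that also uses the known document frequencies can do better than this baseline.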

Citation Context

...ing techniques (Chaudhuri, Motwani, and Narasayya 1998; Indyk and Motwani 1998; Manku, Rajagopalan, and Lindsay 1999; Charikar 2002; Achlioptas 2003; Gilbert et al. 2003; Li, Hastie, and Church 2007; Li 2006), which are useful for numerous applications such as association rules (Brin et al. 1997; Brin, Motwani, and Silverstein 1997), clustering (Guha, Rastogi, and Shim 1998; Broder 1998; Aggarwal et al. ...

Conditional random sampling: A sketch-based sampling technique for sparse data

by Ping Li, Kenneth W. Church, Trevor J. Hastie - In NIPS, 2006
Cited by 23 (14 self)
We develop Conditional Random Sampling (CRS), a technique particularly suitable for sparse data. In large-scale applications, the data are often highly sparse. CRS combines sketching and sampling in that it converts sketches of the data into conditional random samples online in the estimation stage, with the sample size determined retrospectively. This paper focuses on approximating pairwise l2 and l1 distances and comparing CRS with random projections. For boolean (0/1) data, CRS is provably better than random projections. We show using real-world data that CRS often outperforms random projections. This technique can be applied in learning, data mining, information retrieval, and database query optimizations.

Citation Context

... = 0. (15) $\mathrm{Var}\left(\hat{d}^{(1)}_{CRP,MLE,c}\right) = \frac{2[d^{(1)}]^2}{k} + \frac{3[d^{(1)}]^2}{k^2} + O\left(\frac{1}{k^3}\right)$. (16) 4.3 General Stable Random Projections for Dimension Reduction in $l_p$ ($0 < p \le 2$). [10] generalized the bias-corrected geometric mean estimator to general stable random projections for dimension reduction in $l_p$ ($0 < p \le 2$), and provided the theoretical variances and exponential tail bou...
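As a rough illustration of the bias-corrected geometric mean estimator mentioned in this context, the sketch below covers only the p = 1 (Cauchy) case. It relies on the fact that a standard Cauchy variable X satisfies E|X|^λ = 1/cos(πλ/2) for |λ| < 1, so the geometric mean of k projections is made unbiased by the factor cos^k(π/(2k)). The function names, and the choice to project the difference vector directly, are assumptions for illustration and are not taken from the cited paper.

```python
import math
import random


def cauchy_projections(u, v, k, seed=0):
    """Project u - v onto k random vectors with standard Cauchy entries.

    Each output is Cauchy distributed with scale d = sum_i |u_i - v_i|.
    """
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(u, v)]
    projections = []
    for _ in range(k):
        # tan(pi * (U - 1/2)) is a standard Cauchy variate for uniform U.
        projections.append(
            sum(d * math.tan(math.pi * (rng.random() - 0.5)) for d in diffs)
        )
    return projections


def geometric_mean_l1(projections):
    """Bias-corrected geometric mean estimate of the l1 distance (k > 1)."""
    k = len(projections)
    log_gm = sum(math.log(abs(x)) for x in projections) / k
    log_correction = k * math.log(math.cos(math.pi / (2 * k)))
    return math.exp(log_gm + log_correction)


# Usage: the true l1 distance between u and v is 1 + 3 + 0 + 4 = 8.
u = [1, 0, 2, 5]
v = [0, 3, 2, 1]
estimate = geometric_mean_l1(cauchy_projections(u, v, k=2000, seed=1))
```

Working in log space avoids overflow when individual projections are very large, which happens regularly with heavy-tailed Cauchy samples.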

On estimating frequency moments of data streams

by Sumit Ganguly, Graham Cormode - In International Workshop on Randomization and Approximation Techniques in Computer Science, 2007
Cited by 19 (0 self)
Space-economical estimation of the pth frequency moments, defined as $F_p = \sum_{i=1}^{n} |f_i|^p$ for $p > 0$, is of interest in estimating all-pairs distances in a large data matrix [14], in machine learning, and in data stream computation. Random sketches formed by the inner product of the frequency vector $f_1, \ldots, f_n$ with a suitably chosen random vector were pioneered by Alon, Matias and Szegedy [1], and have since played a central role in estimating $F_p$ and in data stream computations in general. The concept of p-stable sketches, formed by the inner product of the frequency vector with a random vector whose components are drawn from a p-stable distribution, was proposed by Indyk [11] for estimating $F_p$, for $0 < p < 2$, and has been further studied in Li [13]. In this paper, we consider the problem of estimating $F_p$, for $0 < p < 2$. A disadvantage of the stable sketches technique and its variants is that they require $O(1/\epsilon^2)$ inner products of the frequency vector with dense vectors of stable (or nearly stable [14, 13]) random variables to be maintained. This means that each stream update can be quite time-consuming. We present algorithms for estimating $F_p$, for $0 < p < 2$, that do not require the use of stable sketches or their approximations. Our technique is elementary in nature, in that it uses simple randomization in conjunction with well-known summary structures for data streams, such as the COUNT-MIN sketch [7] and the COUNTSKETCH structure [5]. Our algorithms require space $\tilde{O}(1/\epsilon^{2+p})$ to estimate $F_p$ to within $1 \pm \epsilon$ factors and require expected time $O(\log F_1 \log(1/\delta))$ to process each update. Thus, our technique trades an $O(1/\epsilon^p)$ factor in space for much more efficient processing of stream updates. We also present a stand-alone iterative estimator for $F_1$.
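To make the stable-sketch baseline that this abstract argues against more concrete, here is a minimal sketch of the p = 1 case: k counters, each an inner product of the frequency vector with a Cauchy random vector, with the median of the absolute counters as the estimate of F_1. It does not implement the paper's COUNT-MIN or COUNTSKETCH based algorithms. The class name `CauchySketch`, the seed-string scheme for regenerating matrix entries, and the helper `frequency_moment` are assumptions made for illustration.

```python
import math
import random
import statistics


def frequency_moment(freq, p):
    """Exact F_p = sum_i |f_i|^p, for comparison with the estimate."""
    return sum(abs(f) ** p for f in freq)


class CauchySketch:
    """k counters, each the inner product of f with a Cauchy random vector."""

    def __init__(self, k, seed=0):
        self.k = k
        self.seed = seed
        self.counters = [0.0] * k

    def _entry(self, i, j):
        # Regenerate entry (i, j) of the implicit random matrix from a seed,
        # so the dense matrix is never stored (an illustrative shortcut).
        u = random.Random(f"{self.seed}:{i}:{j}").random()
        return math.tan(math.pi * (u - 0.5))  # standard Cauchy variate

    def update(self, i, delta):
        # Every update touches all k counters: this is the per-update cost
        # that the abstract describes as time-consuming.
        for j in range(self.k):
            self.counters[j] += delta * self._entry(i, j)

    def estimate_f1(self):
        # Each counter is Cauchy with scale F_1, and the median of the
        # absolute value of such a variable equals its scale.
        return statistics.median([abs(c) for c in self.counters])


# Usage: a short stream of (item, delta) updates; f ends as [5, -1, 4, -6].
sketch = CauchySketch(k=2001, seed=3)
for item, delta in [(0, 3), (1, -1), (2, 4), (0, 2), (3, -6)]:
    sketch.update(item, delta)
approx_f1 = sketch.estimate_f1()
exact_f1 = frequency_moment([5, -1, 4, -6], 1)
```

The loop in `update` runs over all k counters for every stream item, which is the trade-off the abstract targets: its algorithms accept more space in exchange for cheaper updates.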

Kumar A

by G Taj, P Agarwal, M Grant, 2010
Cited by 1 (0 self)
Abstract not found

Citation Context

...ually and removing; this approach has been extended to quantities such as entropy [BG06]. Can it also apply to L1? (2) Recent work [LHC06] has studied sparse random projections for L2. Follow-up work [Li06] has extended this to sparse projections using stable distributions. What time bounds does this imply for (ɛ, δ)-approximation of L1 distance? A more general open question arises. So far, there has be...

Stable Distributions in Streaming Computations

by Graham Cormode, Piotr Indyk
Abstract not found