### Citations

940 | Nonlinear approximation - DeVore - 1998 |

901 | Greed is good: Algorithmic results for sparse approximation - Tropp - 2004
Citation Context ...ere $B < 1/(2\mu)$. Then, in iterations polynomial in B, we can find a B-term representation R such that $\|A - R\| \le \sqrt{1 + \frac{2\mu B^2}{(1 - 2\mu B)^2}}\, \|A - R_{\mathrm{opt}}\|$. The concept of coherence has been generalized recently in [208], and made more widely applicable. Further, in [108, 110], the authors used approximate nearest neighbor algorithms to implement the iterations in Theorem 25 efficiently, and proved that approximate implem... |

833 | The space complexity of approximating the frequency moments - Alon, Matias, et al. - 1996
Citation Context ... present in the ocean. Let $m = \sum_{t \notin S} p_t$. The problem of estimating m is called the missing mass problem. In a classical work by Good (attributed to Turing too) [35], it is shown that m is estimated by s[1]/f, provably with small bias; recall that our rarity $\gamma$ is closely related to s[1]/f. Hence, our result here on estimating rarity in data streams is of independent interest in the context of estimati... |

796 | Protocols for Secure Computations - Yao - 1982
Citation Context ...articular, there are solutions that involve both Paul and Carole permuting the domain, and those that involve small space pseudorandom generators.) Yao’s “two millionaires” problem [217] is related: Paul and Carole each have a secret number, and the problem is to determine whose secret is larger without revealing their secrets. These problems show the challenge in the emerging... |

763 | Models and issues in data stream systems - Babcock, Babu, et al. |

581 | The population frequencies of species and the estimation of population parameters - Good - 1953 |

577 | NiagaraCQ: A scalable continuous query system for internet databases - Chen, DeWitt, et al. - 2000 |

554 | zur Gathen and - von - 1997
Citation Context ...tor this polynomial in $F_q$ to determine the missing numbers. No deterministic algorithms are known for the factoring problem, but there are randomized algorithms that take roughly $O(k^2 \log n)$ bits and time [214]. The elementary symmetric polynomial approach above comes from [169] where the authors solve the set reconciliation problem in the communication complexity model. The subset reconciliation problem is... |

544 | Sparse approximate solutions to linear systems - Natarajan - 1995
Citation Context ...approximate the best representation of a given function using these dictionaries. Studying general dictionaries. A paper that seems to have escaped the attention of approximation theory researchers is [63], which proves the general problem to be NP-hard. This was reproved in [60]. In addition, [63] contained the following very nice result. Say to obtain a representation with error $\epsilon$ one needs $B(\epsilon)$ t... |

508 | Network applications of Bloom filters: A survey - Broder, Mitzenmacher - 2002 |

403 | Approximate Frequency Counts Over Data Streams - Manku, Motwani - 2002 |

382 | Bursty and hierarchical structure in streams - Kleinberg - 2002 |

348 | External memory algorithms and data structures: Dealing with massive data - Vitter - 2001
Citation Context ... uniquely challenge the TCS needs. We in the computer science community have traditionally focused on scaling in size: how to efficiently manipulate large disk-bound data via suitable data structures [213], how to scale to databases of petabytes [114], synthesize massive data sets [115], etc. However, far less attention has been given to benchmarking, studying performance of systems under rapid updates... |

332 | Finding frequent items in data streams - Charikar, Chen, et al. - 2002 |

317 | Stable distributions, pseudorandom generators, embeddings, and data stream computation - Indyk - 2006 |

299 | Chromium: A stream-processing framework for interactive rendering on clusters - Humphreys, Houston, et al. - 2002 |

292 | Deriving Traffic Demands for Operational IP Networks: Methodology and Experience - Feldmann, Greenberg, et al. - 2001 |

291 | Clustering data streams - Guha, Mishra, et al. - 2000
Citation Context ...velet version of the problem studied in [99]. There are other applications, where the tree hierarchy is imposed as an artifact of the problem solving approach. The k-means algorithm on the data stream [100] can be seen as a tree method: building clusters on points, building higher level clusters on their representatives, and so on up the tree. Finally, I will speculate that Yair Bartal's fundamental resu... |

273 | Min-wise independent permutations - Broder, Charikar, et al. - 1998 |

266 | Maintaining stream statistics over sliding windows - Datar, Gionis, et al. - 2002
Citation Context ...en one is restricted to use a polylog space data structure. This technique has been used in one dimensional nearest neighbor problems and facility location [46], maintaining statistics within a window [36], and from a certain perspective, for estimating the number of distinct items [34]. It is a simple and natural strategy which is likely to get used seamlessly in data stream algorithms. 5.3 Lower Bound... |

247 | Trajectory sampling for direct traffic observation - Duffield, Grossglauser - 2000 |

203 | Space-efficient online computation of quantile summaries - Greenwald, Khanna - 2001 |

195 | What’s Hot and What’s Not: Tracking Most Frequent Items Dynamically - Cormode, Muthukrishnan - 2003 |

174 | The Art of Computer Programming, Volume III: Sorting and Searching - Knuth - 1973 |

165 | High-dimensional data analysis: The curses and blessings of dimensionality. aide-memoire of a lecture at - Donoho - 2000 |

155 | On computing correlated aggregates over continual data streams - Gehrke, Korn, et al. - 2001 |

147 | Reductions in streaming algorithms, with an application to counting triangles in graphs - Bar-Yossef, Kumar, et al. - 2002
Citation Context ...role in log space complexity. See [41] for some details. However, hardly any graph problem has been studied in the data stream model where (poly)log space requirement comes with other constraints. In [42], authors studied the problem of counting the number of triangles in the cash register model. Graph G = (V, E) is presented as a series of edges $(u, v) \in E$ in no particular order. The problem is to est... |

138 | Selection and sorting with limited storage - Munro, Paterson - 1980 |

121 | Random sampling for histogram construction: How much is enough - Chaudhuri, Motwani, et al. - 1998 |

118 | Distinct Sampling for Highly-Accurate Answers to Distinct Values Queries and Event Reports - Gibbons - 2001 |

113 | The power of a pebble: Exploring and mapping directed graphs - Bender, Fernandez, et al. - 1998 |

113 | STREAM: The Stanford Stream Data Manager - Arasu, Babcock, et al. - 2003 |

111 | Quickly Generating Billion-Record Synthetic Databases - Gray, Sundaresan, et al. - 1994 |

110 | How to summarize the universe: Dynamic maintenance of quantiles - Gilbert, Kotidis, et al. - 2002 |

107 | An Approximate L1-Difference Algorithm for Massive Data Streams - Feigenbaum, Kannan, et al. - 2002 |

106 | Secure multiparty computation of approximations - Feigenbaum, Ishai, et al. - 2001 |

99 | Dynamic multidimensional histograms - Thaper, Guha, et al. - 2002
Citation Context ...or the multidimensional version. In [75], authors proposed efficient approximation algorithms for a variety of two dimensional histograms for a static signal. Some preliminary results were presented in [76] for the streaming case: specifically, the authors proposed a polylog space, $(1+\epsilon)$-approximation algorithm using $O(B \log N)$ partitions, taking $\Omega(N^2)$ time. Using the ideas in [75] and robustness... |

99 | Approximation of functions over redundant dictionaries using coherence - Gilbert, Muthukrishnan, et al. - 2003 |

81 | Comparing data streams using Hamming norms (how to zero in) - Cormode, Datar, et al. - 2003 |

80 | A Small Approximately Min-Wise Independent Family of Hash Functions - Indyk - 2001 |

77 | Set reconciliation with nearly optimal communication complexity - Minsky, Trachtenberg, et al. - 2001
Citation Context ...m, but randomized algorithms take roughly $O(k^2 \log n)$ bits and time [23]. The power sum method is what colleagues typically propose over dinner. The elementary symmetric polynomial approach comes from [24] where the authors solve the set reconciliation problem in the communication complexity model. The subset reconciliation problem is related to our puzzle. Readers may have guessed that they may be a di... |

76 | Greedy adaptive approximation - Davis, Mallat, et al. - 1997 |

71 | Learn more, sample less: control of volume and variance in network measurement - Duffield, Lund, et al. - 2005 |

70 | The string edit distance matching problem with moves - Cormode, Muthukrishnan - 2002
Citation Context ...stogram on the Time Series data stream [98]. This also has applications to finding certain outliers called the deviants [105]. * Building a parse tree atop the Time Series data stream seen as a string [81]. This has applications to estimating string edit distances as well as estimating size of the smallest grammar to encode the string. Here is a problem of similar ilk, but it needs new ideas. Problem 6 ... |

66 | Mining Database Structure; or, How to Build a Data Quality Browser - Dasu, Johnson, et al. - 2002 |

65 | A language for extracting signatures from data streams - Cortes, Fisher, et al. |

64 | Gigascope: High performance network monitoring with an sql interface - Cranor, Gao, et al. - 2002 |

56 | On the dynamic finger conjecture for splay trees. Part I: Splay sorting log n-block sequences - Cole, Mishra, et al. - 2000 |

56 | Quicksand: Quick summary and analysis of network data - Gilbert, Kotidis, et al. - 2001 |

56 | On rectangular partitionings in two dimensions: Algorithms, complexity and applications - Muthukrishnan, Poosala, et al. - 1999 |

55 | Communities of Interest - Cortes, Pregibon, et al. - 2001 |

51 | Pass efficient algorithms for approximating large matrices - Drineas, Kannan - 2003 |

47 | The best m-term approximation and greedy algorithms - Temlyakov - 1998
Citation Context ...e results at the AMS-MAA joint meetings [69]. Functional approximation theory has in general focused on characterizing the class of functions for which error has a certain decay as $N \to \infty$. See [62] and [61] for many such problems. But from an algorithmicist's point of view, the nature of problems I discussed above are clearly more appealing. This is a wonderful area for new algorithmic research; a st... |

45 | Reverse nearest neighbor aggregates over data streams - Korn, Muthukrishnan, et al. - 2002 |

45 | Sublinear time approximate clustering - Mishra, Oblinger, et al. - 1995 |

35 | Approximate counting of inversions in a data stream - Ajtai, Jayram, et al. - 2002
Citation Context ...have already been studied in this model. In [37], authors studied how to estimate various permutation edit distances. The problem of estimating the number of inversions in a permutation was studied in [33]. Here is an outline of a simple algorithm to estimate the number of inversions [31]. Let $A_t$ be the indicator array of the seen items before seeing the t-th item, and $I_t$ be the number of inversions sof... |

29 | Estimating dominance norms of multiple data streams - Cormode, Muthukrishnan |

28 | Counting inversions in lists - Gupta, Zane - 2003 |

27 | Maintaining statistics counters in router line cards - Shah, Iyer, et al.
Citation Context ...e needed where updates, as they are generated, are fed to hardware units for per-item processing. This has been explored in the networking context for a variety of per-packet processing tasks (see e.g. [5]) previously, but more needs to be done. There is commercial potential in such hardware machines. Consider: Problem 17 Develop hardware implementation of the inner product based algorithms described in... |

24 | Fast Computation of Low Rank Approximations - Achlioptas, McSherry - 2001
Citation Context ...rank representation to $A_t$ at any time t. More precisely, find $D^*$ such that $\|A_t - D^*\| \le f\left(\min_{D,\ \mathrm{rank}(D) \le k} \|A_t - D\|\right)$ using suitable norm $\|\cdot\|$ and function f. A similar result has been proved in [51] using appropriate sampling for a fixed A, and recent progress is in [50] for a similar problem using a few passes, but there are no results in the Turnstile Model. A lot of interesting technical issues ... |

21 | Better Algorithms for high-dimensional proximity problems via asymmetric embeddings - Indyk - 2003 |

20 | Three thresholds for a liar - Spencer, Winkler - 1992 |

19 | Clifford algebras and approximating the permanent - Chien, Rasmussen, et al. |

17 | Aurora: a data stream management system - Abadi, et al. |

16 | On optimal strategies for searching in presence of errors - Muthukrishnan - 1994 |

16 | Signature-based methods for data streams. Data Mining and Knowledge Discovery 2001 - Cortes, Pregibon - 2001 |

15 | Inferring Mixtures of Markov Chains - Batu, Guha, et al. - 2004 |

15 | Rangesum histograms - Muthukrishnan, Strauss - 2003 |

14 | Surfing wavelets on streams: One pass summaries for approximate aggregate queries - Gilbert, Kotidis, et al. |

12 | Permutation editing and matching via embeddings - Cormode, Muthukrishnan, et al. - 2001 |

11 | Petabyte Scale Data Mining: Dream or Reality - Szalay, Gray, et al. |

11 | Application of the two-sided depth test to CSG rendering - Guha, Krishnan, et al. - 2003 |

10 | Space-efficient finger search on degree-balanced search trees - Blelloch, Maggs, et al. - 2003 |

9 | Imagining numbers (particularly the square root of minus fifteen) - Mazur - 2003 |

6 | Detecting Packet Patterns at High Speeds - Varghese - 2002
Citation Context ... or so, we get the error to drop by more than 50%. For capturing major trends in the IP traffic, a few hundred coefficients prove adequate. In IP traffic, a few flows send a large fraction of the traffic [209]. That is, of the $2^{64}$ possible (src,dest) IP flows, if one is interested in heavy hitters, one is usually focused on a small number (few hundreds?) of flows. This means that one is typically interest... |

6 | An $O(\log^{4/3} n)$ space algorithm for (s,t) connectivity in undirected graphs - Armoni, Ta-Shma, et al. |

4 | TelegraphCQ: Continuous dataflow processing for an uncertain world - Chandrasekharan, et al. |

4 | Best approximation with Walsh atoms - Villemoes - 1997 |

3 | Computing diameter in the streaming and sliding window models - Feigenbaum, Kannan, et al. - 2002 |

3 | Fast, small-space algorithms for approximate histogram maintenance - Gilbert, Guha, et al. - 2002 |

2 | Details of transactions testing at http://www.tpc.org/tpcc/detail.asp - org
Citation Context ...atasets [7], etc. However, far less attention has been given to benchmarking, studying performance of systems under rapid updates with near-real time analyses. Even benchmarks of database transactions [115] are inadequate. There are ways to build workable systems around these TCS challenges. TCS systems are sophisticated and have developed high-level principles that still apply. Make things parallel. A l... |

2 | Graph structure of the web: A survey - Raghavan |

2 | Synopsis data structures - Gibbons, Matias - 1999 |

2 | Exploratory data mining and data quality. ISBN: 0-471-26851-8 - Dasu, Johnson - 2003 |

1 | The coming age of calm technology. Chapter 6. Beyond calculation: The next fifty years of computing, by - Weiser, Brown - 1996
Citation Context ...essage boards into light and sound, described by NY Times as a "computer-generated opera". Besides being Art, ambient information displays like the ones above are typically seen as Calming Technology [2]; they are also an attempt to transcode streaming data into a processible multi-sensory flow. 8.2 Short Data Stream History Data stream algorithms as an active research agenda has emerged only over th... |

1 | Estimating rarity and similarity in window streams - Datar, Muthukrishnan - 1995
Citation Context ...r be more generally $|\{ j \mid c_t[j] \le \alpha \}|$ for some $\alpha$, letting Carole go fishing too, or letting Paul and Carole throw fish back into the sea as needed; there are some real data streaming applications [19]. Honestly, the fishing motif is silly: the total number of fish species in the sea is estimated to be roughly 22000 and anyone can afford an array of as many bits. In the reality of data streams whic... |

1 | Computing diameter in the streaming and sliding - Feigenbaum, Kannan, et al.
Citation Context ...from the center. Then diameter can be estimated from the arcs given by these points. One gets an $\epsilon$-approximation to the diameter with $O(1/\epsilon)$ space and $O(\log(1/\epsilon))$ compute time per inserted point [45]. I know of other results in progress, so more computational geometry problems will get solved in the data stream model in the near future. Let me add a couple of notes. First, in small dimensional appl... |

1 | Comparing information without leaking it: Simple solutions - Fagin, Naor, et al. - 1996
Citation Context ...cterize the complexity class given by a deterministic logspace verifier with one-way access to the proof. 7.10.4 Privacy Preserving Data Mining Peter Winkler gives an interesting talk on the result in [53] which is a delightful read. Paul and Carole each have a secret name in mind, and the problem is for them to determine if their secrets are the same. If not, neither should learn the other's secret. Th... |

1 | Secure multiparty computation. Book at http://philby.ucsd.edu/cryptolib/BOOKS/odedsc.html - Goldreich - 1998
Citation Context ...ocess. This may be due to regulatory or proprietary reasons. They need privacy preserving methods for data mining. This is by now a well researched topic with positive results in very general settings [56]. However, these protocols have high complexity. But there is a demand for efficient solutions, perhaps with provable approximations, in practice. In [55] authors formalized the notion of approximate p... |

1 | In search of petabyte databases. http://www.research.microsoft.com/ Gray/talks/ [107] Querying and mining data streams: you only get one look - Gray, Hey
Citation Context ...ter Science community have traditionally focused on scaling with respect to size: how to efficiently manipulate large disk-bound data via suitable data structures [15], how to scale to databases of petabytes [106], synthesize massive datasets [7], etc. However, far less attention has been given to benchmarking, studying performance of systems under rapid updates with near-real time analyses. Even benchmarks of ... |

1 | streams at http://www.archive.arm.gov/docs/catalog/ [109] http://www.sprintlabs.com/Department/IP-Interworking/Monitor/ http://ipmon.sprintlabs.com/ [110] http://stat.rutgers.edu/ madigan/mms [111] http://cat.nyu.edu/natalie/. [112] http://www7.nationalaca - Data
Citation Context ...earth geodetics [118, 113], radar derived meteorological data [119], continuous large scale astronomical surveys in optical, infrared and radio wavelengths [117], atmospheric radiation measurements [108] etc. The Internet is a general purpose network system that has distributed both the data sources as well as the data consumers over millions of users. It has scaled up the rate of transactions treme... |

1 | http://www.ngs.noaa.gov/ [119] http://www.unidata.ucar.edu/data/data.detail.html [120] http://cmsdoc.cern.ch/cms/outreach/html/CMSdocuments/CMSchallenges/CMSchallenges.pdf [121] http://www.ambientdevices.com/cat/index.html. [122] http://www-db.stanford.ed - gov
Citation Context ...inning with networks that spanned banking and credit transactions. Other dedicated network systems now provide massive data streams: satellite based, high resolution measurement of earth geodetics [118, 113], radar derived meteorological data [119], continuous large scale astronomical surveys in optical, infrared and radio wavelengths [117], atmospheric radiation measurements [108] etc. The Internet is... |

1 | Network performance monitoring and measurement: techniques and experience - Bhattacharya, Moon - 2002 |

1 | Computing on data stream. Technical Note 1998-011. Digital systems research - Henzinger, Raghavan, et al. - 1998 |

1 | Open problems in streaming. Ppt slides on request from the source - Kannan |

1 | Extended range search queries on geometric SIMD machine - Krishnan, Mustafa, et al. - 2002 |

1 | Fast computation of low rank approximation. STOC - Achlioptas, McSherry - 2001 |

1 | IPSOFACTO: IP stream-oriented fast correlation tool - Korn, Muthukrishnan, et al. - 2003 |

1 | Sublinear algorithms for sparse approximations with excellent odds - Daubechies |

1 | Checks and balances: Monitoring data quality in network traffic databases - Korn, Muthukrishnan, Zhu - 2003 |