913
MapReduce: simplified data processing on large clusters
– Jeffrey Dean, Sanjay Ghemawat
- 2004
265
Dryad: Distributed data-parallel programs from sequential building blocks
– M Isard, M Budiu, Y Yu, A Birrell, D Fetterly
- 2007
196
Pig Latin: A Not-So-Foreign Language for Data Processing
– Christopher Olston, Benjamin Reed, Utkarsh Srivastava, Ravi Kumar, Andrew Tomkins
89
DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language
– Yuan Yu, Michael Isard, Dennis Fetterly, Mihai Budiu, Úlfar Erlingsson, Pradeep Kumar Gunda, Jon Currey
76
Improving MapReduce Performance in Heterogeneous Environments
– Matei Zaharia, Andy Konwinski, Anthony D. Joseph, Randy H. Katz, Ion Stoica
285
Bigtable: A distributed storage system for structured data
– Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber
- 2006
9
DryadInc: Reusing work in large-scale computations
– Lucian Popa, Mihai Budiu, Yuan Yu, Michael Isard
50
Pregel: a system for large-scale graph processing
– G Malewicz, M H Austern, A J C Bik, J C Dehnert, I Horn, N Leiser, G Czajkowski
- 2010
284
Online Aggregation
– Joseph M. Hellerstein, Peter J. Haas, Helen J. Wang
- 1997
128
Interpreting the Data: Parallel Analysis with Sawzall
– Rob Pike, Sean Dorward, Robert Griesemer, Sean Quinlan
27
Building a High-Level Dataflow System on top of Map-Reduce: The Pig Experience
– Alan F. Gates, Olga Natkovich, Shubham Chopra, Pradeep Kamath, Shravan M. Narayanamurthy, Christopher Olston, Benjamin Reed, Santhosh Srinivasan, Utkarsh Srivastava
- 2009
41
Hive - A Warehousing Solution Over a Map-Reduce Framework
– Ashish Thusoo, Joydeep Sen Sarma, Namit Jain, Zheng Shao, Prasad Chakka, Suresh Anthony, Hao Liu, Pete Wyckoff, Raghotham Murthy
- 2009
255
Maintenance of Materialized Views: Problems, Techniques, and Applications
– Ashish Gupta, Inderpal Singh Mumick
- 1995
66
A comparison of approaches to large-scale data analysis
– Andrew Pavlo, Erik Paulson, Alexander Rasin, Daniel J. Abadi, David J. DeWitt, Samuel Madden, Michael Stonebraker
- 2009
65
SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets
– Ronnie Chaiken, Bob Jenkins, Per-Åke Larson, Bill Ramsey, Darren Shakib, Simon Weaver, Jingren Zhou
8
Comet: batched stream processing for data intensive distributed computing
– B He, M Yang, Z Guo, R Chen, B Su, W Lin, L Zhou
29
Delay Scheduling: A Simple Technique for Achieving Locality and Fairness in Cluster Scheduling
– Matei Zaharia, Khaled Elmeleegy, Dhruba Borthakur, Scott Shenker, Joydeep Sen Sarma, Ion Stoica
- 2010
637
The Google File System
– Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung
- 2003
194
Dynamo: Amazon’s highly available key-value store
– Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall, Werner Vogels
- 2007