Improving MapReduce Performance in Heterogeneous Environments (2008)

by Matei Zaharia, Andy Konwinski, Anthony D. Joseph, Randy Katz, Ion Stoica
Citations: 319 (18 self)

BibTeX

@MISC{Zaharia08improvingmapreduce,
    author = {Matei Zaharia and Andy Konwinski and Anthony D. Joseph and Randy Katz and Ion Stoica},
    title = {Improving MapReduce Performance in Heterogeneous Environments},
    year = {2008}
}


Abstract

MapReduce is emerging as an important programming model for large-scale data-parallel applications such as web indexing, data mining, and scientific simulation. Hadoop is an open-source implementation of MapReduce enjoying wide adoption and is often used for short jobs where low response time is critical. Hadoop’s performance is closely tied to its task scheduler, which implicitly assumes that cluster nodes are homogeneous and tasks make progress linearly, and uses these assumptions to decide when to speculatively re-execute tasks that appear to be stragglers. In practice, the homogeneity assumptions do not always hold. An especially compelling setting where this occurs is a virtualized data center, such as Amazon’s Elastic Compute Cloud (EC2). We show that Hadoop’s scheduler can cause severe performance degradation in heterogeneous environments. We design a new scheduling algorithm, Longest Approximate Time to End (LATE), that is highly robust to heterogeneity. LATE can improve Hadoop response times by a factor of 2 in clusters of 200 virtual machines on EC2.
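The name Longest Approximate Time to End suggests ranking running tasks by how long each is expected to take to finish, and speculatively re-executing the one that will finish farthest in the future. The abstract does not spell out the estimator, so the following is only a minimal sketch under stated assumptions: each task reports a progress fraction in [0, 1], its remaining time is extrapolated from its observed progress rate, and tasks with no measurable progress are skipped. The names `RunningTask`, `estimated_time_left`, and `pick_speculation_candidate` are hypothetical and are not part of Hadoop's API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RunningTask:
    """Hypothetical view of a running task (not a Hadoop class)."""
    task_id: str
    progress: float   # fraction of work completed, in [0, 1]
    elapsed_s: float  # seconds since the task started


def estimated_time_left(task: RunningTask) -> float:
    """Extrapolate remaining seconds from the observed progress rate.

    Assumption: the task keeps advancing at its average rate so far.
    Returns infinity when no rate can be estimated yet.
    """
    if task.progress <= 0.0 or task.elapsed_s <= 0.0:
        return float("inf")
    rate = task.progress / task.elapsed_s  # progress per second
    return (1.0 - task.progress) / rate


def pick_speculation_candidate(tasks: List[RunningTask]) -> Optional[RunningTask]:
    """Return the unfinished task with the longest estimated time to end.

    Tasks without measurable progress are skipped (an assumption of this
    sketch), since their remaining time cannot be estimated.
    """
    candidates = [t for t in tasks if 0.0 < t.progress < 1.0 and t.elapsed_s > 0.0]
    if not candidates:
        return None
    return max(candidates, key=estimated_time_left)


if __name__ == "__main__":
    tasks = [
        RunningTask("map-1", progress=0.9, elapsed_s=90.0),   # fast, nearly done
        RunningTask("map-2", progress=0.2, elapsed_s=100.0),  # slow straggler
        RunningTask("map-3", progress=0.5, elapsed_s=50.0),
    ]
    chosen = pick_speculation_candidate(tasks)
    print(chosen.task_id if chosen else "no candidate")
```

Ranking by estimated time remaining, rather than by raw progress, means a slow task that started recently is not automatically preferred over a slow task that has been running for a long time; what matters in this sketch is which task is projected to finish last.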

Keyphrases

heterogeneous environment, mapreduce performance, virtual machine, data mining, new scheduling algorithm, elastic compute cloud, longest approximate time, hadoop response time, task scheduler, homogeneity assumption, open-source implementation, cluster node, re-execute task, web indexing, large-scale data-parallel application, hadoop performance, low response time, wide adoption, scientific simulation, short job, virtualized data center, severe performance degradation, important programming model
