Results 1 -
2 of
2
Performance, Measurement
"... With the ubiquity of multi-core processors, software must make effective use of multiple cores to obtain good performance on modern hardware. One of the biggest roadblocks to this is load imbalance, or the uneven distribution of work across cores. We propose LIME, a framework for analyzing parallel ..."
Abstract
- Add to MetaCart
With the ubiquity of multi-core processors, software must make effective use of multiple cores to obtain good performance on modern hardware. One of the biggest roadblocks to this is load imbalance, or the uneven distribution of work across cores. We propose LIME, a framework for analyzing parallel programs and reporting the cause of load imbalance in application source code. This framework uses statistical techniques to pinpoint load imbalance problems stemming from both control flow issues (e.g., unequal iteration counts) and interactions between the application and hardware (e.g., unequal cache miss counts). We evaluate LIME on applications from widely used parallel benchmark suites, and show that LIME accurately reports the causes of load imbalance, their nature and origin in the code, and their relative importance.
Chronos: Predictable Low Latency for Data Center Applications
"... In data center applications, predictability in service time and controlled latency, especially tail latency, are essential for building performant applications. This is especially true for applications or services built by accessing data across thousands of servers to generate a user response. Curre ..."
Abstract
- Add to MetaCart
In data center applications, predictability in service time and controlled latency, especially tail latency, are essential for building performant applications. This is especially true for applications or services built by accessing data across thousands of servers to generate a user response. Current practice has been to run such services at low utilization to rein in latency outliers, which decreases efficiency and limits the number of service invocations developers can issue while still meeting tight latency budgets. In this paper, we analyze three data center applications, Memcached, OpenFlow, and Web search, to measure the effect of 1) kernel socket handling, NIC interaction, and the network stack, 2) application locks contested in the kernel, and 3) application-layer queueing due to requests being stalled behind straggler threads on tail latency. We propose Chronos, a framework to deliver predictable, low latency in data center applications. Chronos uses a combination of existing and new techniques to achieve this end, for example by supporting Memcached at 200,000 requests per second per server at mean latency of 10 µs with a 99 th percentile latency of only 30 µs, a factor of 20 lower than baseline Memcached.

