Understanding and detecting real-world performance bugs
In PLDI, 2012
Cited by 45 (7 self)
Developers frequently use inefficient code sequences that could be fixed by simple patches. These inefficient code sequences can cause significant performance degradation and resource waste, referred to as performance bugs. Meager increases in single threaded performance in the multi-core era and increasing emphasis on energy efficiency call for more effort in tackling performance bugs. This paper conducts a comprehensive study of 109 real-world performance bugs that are randomly sampled from five representative software suites (Apache, Chrome, GCC, Mozilla, and MySQL). The findings of this study provide guidance for future work to avoid, expose, detect, and fix performance bugs. Guided by our characteristics study, efficiency rules are extracted from 25 patches and are used to detect performance bugs. 332 previously unknown performance problems are found in the latest versions of MySQL, Apache, and Mozilla applications, including 219 performance problems found by applying rules across applications.
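The "inefficient code sequence fixed by a simple patch" pattern is easy to picture. The sketch below is my own Python illustration, not one of the paper's 109 studied bugs: a linear scan inside a loop, fixed by a one-line change to a hash-based lookup.

```python
def remove_seen_slow(items, seen_list):
    # Inefficient code sequence: `x not in seen_list` scans the whole
    # list on every iteration, so the comprehension is O(n * m).
    return [x for x in items if x not in seen_list]

def remove_seen_fast(items, seen_list):
    # The simple patch: build a set once so each membership test is O(1).
    seen = set(seen_list)
    return [x for x in items if x not in seen]

items = list(range(10)) * 2
filtered = remove_seen_fast(items, seen_list=[1, 3, 5, 7, 9])
# Both versions return the same result; only the asymptotic cost differs.
```

Efficiency rules of the kind the paper extracts would flag the slow form and suggest the patched one.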
Verifying Quantitative Reliability for Programs That Execute on Unreliable Hardware
Cited by 27 (3 self)
Emerging high-performance architectures are anticipated to contain unreliable components that may exhibit soft errors, which silently corrupt the results of computations. Full detection and masking of soft errors is challenging, expensive, and, for some applications, unnecessary. For example, approximate computing applications (such as multimedia processing, machine learning, and big data analytics) can often naturally tolerate soft errors. We present Rely, a programming language that enables developers to reason about the quantitative reliability of an application – namely, the probability that it produces the correct result when executed on unreliable hardware. Rely allows developers to specify the reliability requirements for each value that a function produces. We present a static quantitative reliability analysis that verifies quantitative requirements on the reliability of an application, enabling a developer to perform sound and verified reliability engineering. The analysis takes a Rely program with a reliability specification and a hardware specification that characterizes the reliability of the underlying hardware components and verifies that the program satisfies its reliability specification when executed on the underlying unreliable hardware platform. We demonstrate the application of quantitative reliability analysis on six computations implemented in Rely.
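The quantitative reliability Rely verifies composes multiplicatively when unreliable operations fail independently. A sketch of that arithmetic (my own illustration with hypothetical numbers; Rely itself is a static analysis with its own specification syntax):

```python
def joint_reliability(op_reliabilities):
    # Probability the whole computation is correct, assuming each unreliable
    # operation succeeds independently with its given probability.
    p = 1.0
    for r in op_reliabilities:
        p *= r
    return p

# Hypothetical numbers: 1000 approximate multiplies, each correct with
# probability 0.99999, checked against a 0.99 reliability specification.
p = joint_reliability([0.99999] * 1000)
assert p > 0.99  # the specification is (just) met
```

The point of a static analysis like Rely's is to establish such a bound at compile time, for every execution, rather than by runtime arithmetic as here.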
Proving acceptability properties of relaxed nondeterministic approximate programs.
In PLDI, 2012
Cited by 20 (5 self)
Approximate program transformations such as task skipping produce programs that can execute at a variety of points in an underlying performance versus accuracy tradeoff space. We call such transformed programs relaxed programs: they have been extended with additional nondeterminism to relax their semantics and enable greater flexibility in their execution. In this paper, we present programming language constructs for developing and specifying relaxed programs. We also present proof rules for reasoning about acceptability properties. The rules are designed to support a reasoning approach in which the majority of the reasoning effort uses the original semantics. This effort is then transferred to establish the desired properties of the program under the relaxed semantics by using relational reasoning to bridge the gap between the two semantics. We have formalized the dynamic semantics of our target programming language and the proof rules in Coq, and verified that the proof rules are sound with respect to the dynamic semantics. Our Coq implementation enables developers to obtain fully machine-checked verifications of their relaxed programs.
Approximate storage in solid-state memories
In Proc. Int. Symp. Microarchitecture, 2013
Cited by 20 (4 self)
Memories today expose an all-or-nothing correctness model that incurs significant costs in performance, energy, area, and design complexity. But not all applications need high-precision storage for all of their data structures all of the time. This paper proposes mechanisms that enable applications to store data approximately and shows that doing so can improve the performance, lifetime, or density of solid-state memories. We propose two mechanisms. The first allows errors in multi-level cells by reducing the number of programming pulses used to write them. The second mechanism mitigates wear-out failures and extends memory endurance by mapping approximate data onto blocks that have exhausted their hardware error correction resources. Simulations show that reduced-precision writes in multi-level phase-change memory cells can be 1.7× faster on average and using failed blocks can improve array lifetime by 23% on average with quality loss under 10%.
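The pulse-count tradeoff behind the first mechanism can be caricatured with a toy model (entirely my own sketch; the paper uses a detailed phase-change memory model): fewer program-and-verify pulses finish sooner but leave more analog noise around the target level, so the cell may be read back as a neighboring level.

```python
import random

def mlc_write(target_level, pulses, levels=4, spread=1.0, rng=random):
    # More pulses narrow the distribution around the target level; fewer
    # pulses are faster but may land the cell in an adjacent level.
    noise = rng.gauss(0.0, spread / pulses)
    return max(0, min(levels - 1, round(target_level + noise)))

rng = random.Random(42)
trials = 10_000
precise = sum(mlc_write(2, pulses=8, rng=rng) != 2 for _ in range(trials))
approx = sum(mlc_write(2, pulses=2, rng=rng) != 2 for _ in range(trials))
# Two-pulse writes err far more often than eight-pulse writes,
# but each one takes a quarter of the programming time.
```

In the paper's setting, such errors are steered only to data the application has marked approximate.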
Energy types
In OOPSLA, 2012
Cited by 19 (4 self)
This paper presents a novel type system to promote and facilitate energy-aware programming. Energy Types is built upon a key insight into today’s energy-efficient systems and applications: despite the popular perception that energy and power can only be described in joules and watts, real-world energy management is often based on discrete phases and modes, which in turn can be reasoned about by type systems very effectively. A phase characterizes a distinct pattern of program workload, and a mode represents an energy state the program is expected to execute in. This paper describes a programming model where phases and modes can be intuitively specified by programmers or inferred by the compiler as type information. It demonstrates how a type-based approach to reasoning about phases and modes can help promote energy efficiency. The soundness of our type system and the invariants related to inter-phase and inter-mode interactions are rigorously proved. Energy Types is implemented as the core of a prototyped object-oriented language ET for smartphone programming. Preliminary studies show ET can lead to significant energy savings for Android apps.
PARDIS: A Programmable Memory Controller for the DDRx Interfacing Standards
In Proceedings of ISCA, 2012
Cited by 7 (0 self)
Modern memory controllers employ sophisticated address mapping, command scheduling, and power management optimizations to alleviate the adverse effects of DRAM timing and resource constraints on system performance. A promising way of improving the versatility and efficiency of these controllers is to make them programmable—a proven technique that has seen wide use in other control tasks ranging from DMA scheduling to NAND Flash and directory control. Unfortunately, the stringent latency and throughput requirements of modern DDRx devices have rendered such programmability largely impractical, confining DDRx controllers to fixed-function hardware. This paper presents the instruction set architecture (ISA) and hardware implementation of PARDIS, a programmable memory controller that can meet the performance requirements of a high-speed DDRx interface. The proposed controller is evaluated by mapping previously proposed DRAM scheduling, address mapping, refresh scheduling, and power management algorithms onto PARDIS. Simulation results show that the performance of PARDIS comes within 8% of an ASIC implementation of these techniques in every case; moreover, by enabling application-specific optimizations, PARDIS improves system performance by 6-17% and reduces DRAM energy by 9-22% over four existing memory controllers.
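Why programmability matters is easiest to see with command scheduling: the same request queue reorders differently under different policies. The toy Python model below is my own and has nothing to do with the PARDIS ISA; FR-FCFS is the standard first-ready, first-come-first-served policy from the DRAM scheduling literature.

```python
def schedule(requests, open_row, policy):
    # requests: list of (arrival_order, row) DRAM reads to one bank.
    if policy == "FCFS":
        return sorted(requests)  # strictly by arrival order
    if policy == "FR-FCFS":
        # First-ready: row-buffer hits first (no precharge/activate
        # needed), then the remaining requests in arrival order.
        hits = sorted(r for r in requests if r[1] == open_row)
        misses = sorted(r for r in requests if r[1] != open_row)
        return hits + misses
    raise ValueError(policy)

reqs = [(0, 7), (1, 3), (2, 3), (3, 7)]
fcfs = schedule(reqs, open_row=3, policy="FCFS")
frfcfs = schedule(reqs, open_row=3, policy="FR-FCFS")
# FR-FCFS promotes the two row-3 hits ahead of the two row-7 misses.
```

A programmable controller lets the policy itself be firmware, so an application could swap in a custom ordering rather than being stuck with the one baked into the ASIC.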
RAMZzz: Rank-aware DRAM power management with dynamic migrations and demotions
In SC ’12: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, 2012
Cited by 6 (1 self)
Main memory is a significant energy consumer which may contribute to over 40% of the total system power, and will become more significant for server machines with more main memory. In this paper, we propose a novel memory system design named RAMZzz with rank-aware energy saving optimizations. Specifically, we rely on a memory controller to monitor the memory access locality, and group the pages with similar access locality into the same rank. We further develop dynamic page migrations to adapt to data access patterns, and a prediction model to estimate the demotion time for accurate control on power state transitions. We experimentally compare our algorithm with other energy saving policies with cycle-accurate simulation. Experiments with benchmark workloads show that RAMZzz achieves significant improvement on energy-delay² and energy consumption over other power saving techniques.
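The rank-grouping idea can be sketched loosely (names and the simple sort-based policy below are my own, not RAMZzz's migration algorithm): co-locating pages with similar access locality lets whole ranks go idle together, so the cold ranks can be demoted to low-power states.

```python
def group_pages_into_ranks(page_access_counts, num_ranks):
    # Sort pages hottest-first, then pack them rank by rank so that
    # pages with similar access locality share a rank.
    pages = sorted(page_access_counts, key=page_access_counts.get,
                   reverse=True)
    per_rank = -(-len(pages) // num_ranks)  # ceiling division
    return [pages[i:i + per_rank] for i in range(0, len(pages), per_rank)]

counts = {"a": 900, "b": 850, "c": 10, "d": 5, "e": 870, "f": 2}
ranks = group_pages_into_ranks(counts, num_ranks=2)
# Hot pages (a, e, b) share one rank; the cold rank (c, d, f) can be
# demoted to a low-power state with little performance impact.
```

RAMZzz additionally migrates pages dynamically and predicts demotion times; this sketch only shows the static grouping intuition.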
Reasoning about relaxed programs
2011
Cited by 6 (1 self)
Approximate program transformations such as task skipping [27, 28], loop perforation [20, 21, 32], multiple selectable implementations [3, 4, 15], approximate function memoization [10], and approximate data types [31] produce programs that can execute at a variety of points in an underlying performance versus accuracy tradeoff space. Namely, these transformed programs trade accuracy of their results for increased performance by dynamically and nondeterministically modifying variables that control their execution. We call such transformed programs relaxed programs: they have been extended with additional nondeterminism to relax their semantics and enable greater flexibility in their execution. We present programming language constructs for developing and specifying relaxed programs. We also present proof rules for reasoning about properties of relaxed programs. Our proof rules enable programmers to directly specify and verify acceptability properties that characterize the desired correctness relationships between the values of variables in a program’s original semantics (before the transformation) and its relaxed semantics. Our proof rules also support the verification of safety properties (which characterize desirable properties involving values in only the current execution). The rules are designed to support a reasoning approach in which the majority of the reasoning effort uses the original semantics. This effort is then reused to establish the desired properties of the program under the relaxed semantics. We have formalized the dynamic semantics of our target programming language and the proof rules in Coq, and verified that the proof rules are sound with respect to the dynamic semantics. Our Coq implementation enables developers to obtain fully machine-checked verifications of their relaxed programs.
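Loop perforation, one of the transformations listed above, is simple to sketch (a generic illustration of my own, not the cited papers' implementation): execute only a subset of a loop's iterations. An acceptability property then bounds how far the relaxed result may drift from the original one.

```python
def mean_original(xs):
    # Original semantics: visit every element.
    return sum(xs) / len(xs)

def mean_perforated(xs, skip=2):
    # Relaxed semantics: execute only every `skip`-th iteration,
    # trading accuracy for roughly `skip`-fold less work.
    sampled = xs[::skip]
    return sum(sampled) / len(sampled)

xs = list(range(1000))
exact = mean_original(xs)
approx = mean_perforated(xs, skip=4)
# Acceptability property: the relaxed result stays within 1% of the
# original result for this input.
assert abs(approx - exact) <= 0.01 * abs(exact)
```

The paper's proof rules aim to establish such relationships between the original and relaxed semantics for all executions, not just to test them on one input as done here.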
A case for refresh pausing in DRAM memory systems
In Proc. HPCA, 2013
Cited by 6 (0 self)
DRAM cells rely on periodic refresh operations to maintain data integrity. As the capacity of DRAM memories has increased, so has the amount of time consumed in doing refresh. Refresh operations contend with read operations, which increases read latency and reduces system performance. We show that eliminating the latency penalty due to refresh can improve average performance by 7.2%. However, simply doing intelligent scheduling of refresh operations is ineffective at obtaining significant performance improvement. This paper provides an alternative and scalable option to reduce the latency penalty due to refresh. It exploits the property that each refresh operation in a typical DRAM device internally refreshes multiple DRAM rows in JEDEC-based distributed refresh mode. Therefore, a refresh operation has well-defined points at which it can potentially be Paused to service a pending read request. Leveraging this property, we propose Refresh Pausing, a solution that is highly effective at alleviating the contention from refresh operations. It provides an average performance improvement of 5.1% for 8Gb devices, and becomes even more effective for future high-density technologies. We also show that Refresh Pausing significantly outperforms the recently proposed Elastic Refresh scheme.
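The timeline argument can be sketched numerically (a toy model of my own in arbitrary time units, not the paper's simulator): a refresh internally walks through several rows, and between rows there is a well-defined point at which it can pause to serve a pending read.

```python
def refresh_with_pausing(rows_per_refresh, row_refresh_time,
                         read_arrival, read_time):
    # Read completion time (no pausing, with pausing) when the read
    # arrives in the middle of a multi-row refresh operation.
    refresh_end = rows_per_refresh * row_refresh_time
    no_pause = refresh_end + read_time       # read waits out the refresh
    # With pausing: finish only the row in flight, then serve the read.
    rows_done = read_arrival // row_refresh_time + 1
    pause_point = rows_done * row_refresh_time
    with_pause = pause_point + read_time
    return no_pause, with_pause

no_pause, with_pause = refresh_with_pausing(
    rows_per_refresh=8, row_refresh_time=100, read_arrival=150,
    read_time=50)
# Without pausing the read waits for all 8 rows (completes at 850);
# with pausing it waits only for row 2 to finish (completes at 250).
```

The model ignores the cost of resuming the refresh later, which the real design must account for within the refresh interval.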
Half-Wits: Software Techniques for Low-Voltage Probabilistic Storage on Microcontrollers with NOR Flash Memory
Cited by 4 (2 self)
This work analyzes the stochastic behavior of writing to embedded flash memory at voltages lower than recommended by a microcontroller’s specifications in order to reduce energy consumption. Flash memory integrated within a microcontroller typically requires the entire chip to operate on a common supply voltage almost twice as much as what the CPU portion requires. Our software approach allows the flash memory to tolerate a lower supply voltage so that the CPU may operate in a more energy-efficient manner. Energy-efficient coding algorithms then cope with flash memory writes that behave unpredictably. Our software-only coding algorithms (in-place writes, multiple-place writes, RS-Berger codes, and slow writes) enable reliable storage at low voltages on unmodified hardware by exploiting the electrically cumulative nature of half-written data in write-once bits. For a sensor monitoring application using the MSP430, coding with in-place writes reduces the overall energy consumption by 34%. In-place writes are competitive when the time spent on low-voltage operations such as computation is at least four times greater than the time spent on writes to flash memory. Our evaluation shows that tightly maintaining the digital abstraction for storage in embedded flash memory comes at a significant cost to energy consumption with minimal gain in reliability. We find our techniques most effective for embedded workloads that have significant duty cycling, rare writes, or energy harvesting.
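The "electrically cumulative" property behind in-place writes can be modeled simply (a hypothetical sketch of my own, not the paper's algorithms): NOR flash bits only move from 1 to 0, so at low voltage each programming attempt clears an intended bit only probabilistically, but cleared bits stay cleared, and repeating the write in place accumulates toward the target value.

```python
import random

def in_place_write(target, width=8, flip_prob=0.7, max_attempts=50,
                   rng=random):
    # Erased NOR flash reads as all ones; programming can only clear bits.
    # Each attempt clears each still-set target-zero bit with probability
    # flip_prob, and cleared bits never flip back (write-once, cumulative).
    word = (1 << width) - 1
    attempts = 0
    while word != target and attempts < max_attempts:
        attempts += 1
        for bit in range(width):
            wants_zero = not (target >> bit) & 1
            if wants_zero and (word >> bit) & 1 and rng.random() < flip_prob:
                word &= ~(1 << bit)
    return word, attempts

value, attempts = in_place_write(0b10100101, rng=random.Random(7))
assert value == 0b10100101  # converged to the target after a few retries
```

This is why in-place writes pay off mainly when writes are rare: each retry costs time and energy at the flash, but the rest of the chip gets to run at the lower, cheaper voltage.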