Results 1 - 10 of 2,781
SMARTS: Accelerating Microarchitecture Simulation via Rigorous Statistical Sampling
- In Proceedings of the 30th Annual International Symposium on Computer Architecture, 2003
"... Current software-based microarchitecture simulators are many orders of magnitude slower than the hardware they simulate. Hence, most microarchitecture design studies draw their conclusions from drastically truncated benchmark simulations that are often inaccurate and misleading. This paper presents ..."
Cited by 258 (25 self)
benchmarks, running with average speedups of 35 and 60 over detailed simulation of 8-way and 16-way out-of-order processors, respectively.
Statistical sampling of microarchitecture simulation
- In 20th International Parallel and Distributed Processing Symposium (IPDPS), 2006
"... Current software-based microarchitecture simulators are many orders of magnitude slower than the hardware they simulate. Hence, most microarchitecture design studies draw their conclusions from drastically truncated benchmark simulations that are often inaccurate and misleading. This article present ..."
Cited by 9 (1 self)
benchmarks, running with average speedups of 35 and 60 over detailed simulation of 8-way and 16-way out-of-order processors, respectively. Categories and Subject Descriptors: C.4 [Performance of Systems]—Measurement techniques,
Out-of-order commit processors
- In Proceedings of the 10th International Symposium on High Performance Computer Architecture, 2004
"... Modern out-of-order processors tolerate long latency memory operations by supporting a large number of inflight instructions. This is particularly useful in numerical applications where branch speculation is normally not a problem and where the cache hierarchy is not capable of delivering the data s ..."
Cited by 59 (12 self)
Caches in Out-of-order Processors
"... Non-blocking caches are an effective technique for tolerating cache-miss latency. They can reduce miss-induced processor stalls by buffering the misses and continuing to serve other independent access requests. Previous research on the complexity and performance of non-blocking caches supporting non ..."
-ahead capability, a perfect branch predictor, fixed 16-cycle memory latency, single-cycle latency for floating point operations, and write-through and write-no-allocate caches. These assumptions are very different from today's high performance out-of-order processors such as the Intel Nehalem. Thus
Out-of-Order Superscalar Processor
"... Rapport de recherche, ISSN 0249-6399. A structural model for WCET estimation of Simple ..."
Out-of-Order Vector Architectures
- 1997
"... Register renaming and out-of-order instruction issue are now commonly used in superscalar processors. These techniques can also be used to significant advantage in vector processors, as this paper shows. Performance is improved and available memory bandwidth is used more effectively. Using a trace d ..."
Cited by 59 (21 self)
Runahead execution: An alternative to very large instruction windows for out-of-order processors
- In HPCA-9, 2003
"... Today’s high performance processors tolerate long latency operations by means of out-of-order execution. However, as latencies increase, the size of the instruction window must increase even faster if we are to continue to tolerate these latencies. We have already reached the point where the size of ..."
Cited by 175 (22 self)
of an instruction window that can handle these latencies is prohibitively large, in terms of both design complexity and power consumption. And, the problem is getting worse. This paper proposes runahead execution as an effective way to increase memory latency tolerance in an out-of-order processor, without
Out-of-Order Execution of . . .
- 1993
"... The superscalar execution model extracts independent instructions from a restricted window. When pipeline latencies go beyond some limit, out-of-order execution becomes necessary to fully exploit the independency of instructions. On the other hand, multithreading merges the execution of independent ..."
Cited by 2 (0 self)
of independent instruction flows. The simultaneous use of these two techniques is expected to produce a high degree of concurrent activities in future processors. But, the codes must be interruptible and restartable. Classical program state construction is incompatible with out-of-order of execution
Performance of Database Workloads on Shared-Memory Systems with Out-of-Order Processors
- 1998
"... Database applications such as online transaction processing (OLTP) and decision support systems (DSS) constitute the largest and fastest-growing segment of the market for multiprocessor servers. However, most current system designs have been optimized to perform well on scientific and engineering wo ..."
Cited by 97 (3 self)
with aggressive out-of-order processors, and considers simple optimizations that can provide further performance improvements. Our study is based on detailed simulations of the Oracle commercial database engine. The results show that the combination of out-of-order execution and multiple instruction issue
Data-Flow Prescheduling for Large Instruction Windows in Out-of-Order Processors
- 2001
"... The performance of out-of-order processors increases with the instruction window size. In conventional processors, the effective instruction window cannot be larger than the issue buffer. Determining which instructions from the issue buffer can be launched to the execution units is a time-critical op ..."
Cited by 82 (0 self)