Results 1 - 10 of 356,212
Evaluation of the SUN UltraSparc T2+ Processor for Computational Science
"... The Sun UltraSparc T2+ processor was designed for throughput computing and thread level parallelism. In this paper we evaluate its suitability for computational science. A set of benchmarks representing typical building blocks of scientific applications and a real-world hybrid MPI/OpenMP ..."
Multiscalar Processors
- In Proceedings of the 22nd Annual International Symposium on Computer Architecture
, 1995
"... Multiscalar processors use a new, aggressive implementation paradigm for extracting large quantities of instruction level parallelism from ordinary high level language programs. A single program is divided into a collection of tasks by a combination of software and hardware. The tasks are distribute ..."
Cited by 585 (30 self)
The Design of the Microarchitecture of UltraSPARC-1
- Massachusetts Institute of Technology
, 1995
"... involves hundreds of person-years of conception, logic design, circuit design, layout drawing, etc. In order to leverage effectively the 5-10 millions of transistors available, careful microarchitecture tradeoff analysis must be performed. This paper describes not only the microarchitecture of UltraS ..."
Cited by 20 (0 self)
Efficient Cycle-Accurate Simulation of the UltraSPARC III CPU
- In CRPITS ’07: Proceedings of the Thirtieth Australasian Conference on Computer Science
, 2007
"... This paper presents a novel technique for cycle-accurate simulation of the Central Processing Unit (CPU) of a modern superscalar processor, the UltraSPARC III Cu processor. The technique is based on adding a module to an existing fetch-decode-execute style of CPU simulator, rather than the tradition ..."
Cited by 1 (0 self)
Performance Nonmonotonicities: A Case Study of the UltraSPARC Processor
, 1998
"... Modern microprocessor architectures are very complex designs. Consequently, they exhibit many idiosyncrasies. In fact, situations exist in which the addition or removal of a single instruction changes the performance of a program by a factor of 3 to 4. I call such situations performance anomalies. A ..."
Cited by 5 (0 self)
on the market. Through a case study of the SUN UltraSPARC, I show how these anomalies can be concealed, although only limited information is provided by the vendor. I explain the cause of four performance anomalies observed on the UltraSPARC, and present an algorithm to conceal each of them. I implemented
System design methodology of UltraSPARC
- 32nd Design Automation Conference Proceedings (in press)
"... Increasing complexity of microprocessor-based systems puts pressure on a product’s time-to-market. We describe a methodology used in designing the system interface of UltraSPARC-I. This methodology allowed us to define the system interface architecture, verify the functionality, perform t ..."
Cited by 11 (0 self)
Garp: A MIPS Processor with a Reconfigurable Coprocessor
, 1997
"... Typical reconfigurable machines exhibit shortcomings that make them less than ideal for general-purpose computing. The Garp Architecture combines reconfigurable hardware with a standard MIPS processor on the same die to retain the better features of both. Novel aspects of the architecture are presen ..."
Cited by 402 (6 self)
are presented, as well as a prototype software environment and preliminary performance results. Compared to an UltraSPARC, a Garp of similar technology could achieve speedups ranging from a factor of 2 to as high as a factor of 24 for some useful applications.
Performance Nonmonotonicities: A Case Study of the UltraSPARC Processor
, 1998
"... Modern microprocessor architectures are very complex designs. Consequently, they exhibit many idiosyncrasies. In fact, situations exist in which the addition or removal of a single instruction changes the performance of a program by a factor of 3 to 4. I call such situations performance anomalies. A ..."
on the market. Through a case study of the SUN UltraSPARC, I show how these anomalies can be concealed, although only limited information is provided by the vendor. I explain the cause of four performance anomalies observed on the UltraSPARC, and present an algorithm to conceal each of them. I implemented
Complexity-effective superscalar processors
- In Proceedings of the 24th annual international symposium on Computer architecture
, 1997
"... The performance tradeoff between hardware complexity and clock speed is studied. First, a generic superscalar pipeline is defined. Then the specific areas of register renaming, instruction window wakeup and selection logic, and operand bypassing are analyzed. Each is modeled and Spice simulated f ..."
Cited by 459 (5 self)
The performance tradeoff between hardware complexity and clock speed is studied. First, a generic superscalar pipeline is defined. Then the specific areas of register renaming, instruction window wakeup and selection logic, and operand bypassing are analyzed. Each is modeled and Spice simulated for feature sizes of 0.8µm, 0.35µm, and 0.18µm. Performance results and trends are expressed in terms of issue width and window size. Our analysis indicates that window wakeup and selection logic as well as operand bypass logic are likely to be the most critical in the future. A microarchitecture that simplifies wakeup and selection logic is proposed and discussed. This implementation puts chains of dependent instructions into queues, and issues instructions from multiple queues in parallel. Simulation shows little slowdown as compared with a completely flexible issue window when performance is measured in clock cycles. Furthermore, because only instructions at queue heads need to be awakened and selected, issue logic is simplified and the clock cycle is faster – consequently overall performance is improved. By grouping dependent instructions together, the proposed microarchitecture will help minimize performance degradation due to slow bypasses in future wide-issue machines.
A survey of general-purpose computation on graphics hardware
, 2007
"... The rapid increase in the performance of graphics hardware, coupled with recent improvements in its programmability, have made graphics hardware a compelling platform for computationally demanding tasks in a wide variety of application domains. In this report, we describe, summarize, and analyze the l ..."
Cited by 545 (18 self)
the latest research in mapping general-purpose computation to graphics hardware. We begin with the technical motivations that underlie general-purpose computation on graphics processors (GPGPU) and describe the hardware and software developments that have led to the recent interest in this field. We then aim