Results 1 - 10 of 356,212

Evaluation of the SUN UltraSparc T2+ Processor for Computational Science

by Martin S, Sabri Pllana, Siegfried Benkner
"... The Sun UltraSparc T2+ processor was designed for throughput computing and thread level parallelism. In this paper we evaluate its suitability for computational science. A set of benchmarks representing typical building blocks of scientific applications and a real-world hybrid MPI/OpenMP ..."

Multiscalar Processors

by Gurindar S. Sohi, Scott E. Breach, T. N. Vijaykumar - In Proceedings of the 22nd Annual International Symposium on Computer Architecture, 1995
"... Multiscalar processors use a new, aggressive implementation paradigm for extracting large quantities of instruction level parallelism from ordinary high level language programs. A single program is divided into a collection of tasks by a combination of software and hardware. The tasks are distribute ..."
Cited by 585 (30 self)

The Design of the Microarchitecture of UltraSPARC-1

by Marc Tremblay, Dale Greenley, Kevin Normoyle - Massachusetts Institute of Technology, 1995
"... involves hundreds of person-years of conception, logic design, circuit design, layout drawing, etc. In order to leverage effectively the 5-10 millions of transistors available, careful microarchitecture tradeoff analysis must be performed. This paper describes not only the microarchitecture of UltraSPARC ..."
Cited by 20 (0 self)

Efficient Cycle-Accurate Simulation of the UltraSPARC III CPU

by Peter Strazdins, Bill Clarke, Andrew Over - CRPITS ’07: Proceedings of the Thirtieth Australasian Conference on Computer Science, 2007
"... This paper presents a novel technique for cycle-accurate simulation of the Central Processing Unit (CPU) of a modern superscalar processor, the UltraSPARC III Cu processor. The technique is based on adding a module to an existing fetch-decode-execute style of CPU simulator, rather than the tradition ..."
Cited by 1 (0 self)

Performance Nonmonotonicities: A Case Study of the UltraSPARC Processor

by Nathaniel A. Kushman, Volker Strumpen, Charles E. Leiserson, Arthur C. Smith, 1998
"... Modern microprocessor architectures are very complex designs. Consequently, they exhibit many idiosyncrasies. In fact, situations exist in which the addition or removal of a single instruction changes the performance of a program by a factor of 3 to 4. I call such situations performance anomalies. A ..."
Cited by 5 (0 self)
on the market. Through a case study of the SUN UltraSPARC, I show how these anomalies can be concealed, although only limited information is provided by the vendor. I explain the cause of four performance anomalies observed on the UltraSPARC, and present an algorithm to conceal each of them. I implemented

System design methodology of UltraSPARC

by Lawrence Yang, David Gao, Jamshid Mostoufi, Raju Joshi, Paul Loewenstein - 32nd Design Automation Conference Proceedings (in press)
"... Increasing complexity of microprocessor-based systems puts pressure on a product’s time-to-market. We describe a methodology used in designing the system interface of UltraSPARC-I. This methodology allowed us to define the system interface architecture, verify the functionality, perform t ..."
Cited by 11 (0 self)

Garp: A MIPS Processor with a Reconfigurable Coprocessor

by John R. Hauser, John Wawrzynek, 1997
"... Typical reconfigurable machines exhibit shortcomings that make them less than ideal for general-purpose computing. The Garp Architecture combines reconfigurable hardware with a standard MIPS processor on the same die to retain the better features of both. Novel aspects of the architecture are presen ..."
Cited by 402 (6 self)
are presented, as well as a prototype software environment and preliminary performance results. Compared to an UltraSPARC, a Garp of similar technology could achieve speedups ranging from a factor of 2 to as high as a factor of 24 for some useful applications.

Performance Nonmonotonicities: A Case Study of the UltraSPARC Processor

by Volker Strumpen, Nathaniel A. Kushman, 1998
"... Modern microprocessor architectures are very complex designs. Consequently, they exhibit many idiosyncrasies. In fact, situations exist in which the addition or removal of a single instruction changes the performance of a program by a factor of 3 to 4. I call such situations performance anomalies. A ..."
on the market. Through a case study of the SUN UltraSPARC, I show how these anomalies can be concealed, although only limited information is provided by the vendor. I explain the cause of four performance anomalies observed on the UltraSPARC, and present an algorithm to conceal each of them. I implemented

Complexity-effective superscalar processors

by Subbarao Palacharla, J. E. Smith - In Proceedings of the 24th annual international symposium on Computer architecture, 1997
Cited by 459 (5 self)
The performance tradeoff between hardware complexity and clock speed is studied. First, a generic superscalar pipeline is defined. Then the specific areas of register renaming, instruction window wakeup and selection logic, and operand bypassing are analyzed. Each is modeled and Spice simulated for feature sizes of 0.8µm, 0.35µm, and 0.18µm. Performance results and trends are expressed in terms of issue width and window size. Our analysis indicates that window wakeup and selection logic as well as operand bypass logic are likely to be the most critical in the future. A microarchitecture that simplifies wakeup and selection logic is proposed and discussed. This implementation puts chains of dependent instructions into queues, and issues instructions from multiple queues in parallel. Simulation shows little slowdown as compared with a completely flexible issue window when performance is measured in clock cycles. Furthermore, because only instructions at queue heads need to be awakened and selected, issue logic is simplified and the clock cycle is faster – consequently overall performance is improved. By grouping dependent instructions together, the proposed microarchitecture will help minimize performance degradation due to slow bypasses in future wide-issue machines.
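
As a rough illustration of the queue-based issue scheme this abstract describes, the following Python sketch steers chains of dependent instructions into FIFOs and issues only from queue heads. It is a toy model under stated assumptions, not the paper's design: the names `steer`, `issue_cycles`, and `PROGRAM` are hypothetical, every instruction has unit latency, each register is written once, and an instruction with no producer at a queue tail simply goes to the shortest queue.

```python
from collections import deque

# Hypothetical toy program: (destination register, [source registers]).
# Assumption: each register is written once; "x" and "y" are live-in values.
PROGRAM = [
    ("a", ["x"]),
    ("b", ["a"]),
    ("c", ["y"]),
    ("d", ["c"]),
    ("e", ["b", "d"]),
]


def steer(instrs, num_queues):
    """Place each instruction behind a producer sitting at a queue tail, if any."""
    queues = [deque() for _ in range(num_queues)]
    tail_producer = {}  # register -> index of the queue whose tail writes it
    for dest, srcs in instrs:
        # Prefer the queue whose tail instruction produces one of our sources.
        target = next((tail_producer[s] for s in srcs if s in tail_producer), None)
        if target is None:
            # Simplification (not from the abstract): fall back to the shortest queue.
            target = min(range(num_queues), key=lambda i: len(queues[i]))
        # The old tail of the chosen queue stops being a tail.
        tail_producer = {r: q for r, q in tail_producer.items() if q != target}
        queues[target].append((dest, srcs))
        tail_producer[dest] = target
    return queues


def issue_cycles(queues):
    """Issue per cycle every queue head whose sources are no longer pending."""
    pending = {dest for q in queues for dest, _ in q}
    cycles = []
    while any(queues):
        # Only queue heads are examined, mirroring the simplified wakeup/select.
        ready = [q for q in queues if q and pending.isdisjoint(q[0][1])]
        if not ready:
            raise ValueError("no queue head is ready; check the input program")
        issued = [q.popleft() for q in ready]
        pending.difference_update(dest for dest, _ in issued)
        cycles.append([dest for dest, _ in issued])
    return cycles


if __name__ == "__main__":
    queues = steer(PROGRAM, num_queues=2)
    print([[dest for dest, _ in q] for q in queues])  # [['a', 'b', 'e'], ['c', 'd']]
    print(issue_cycles(queues))  # [['a', 'c'], ['b', 'd'], ['e']]
```

With two queues, the toy program splits into the chains a, b, e and c, d, and issues in three cycles while examining at most two instructions per cycle.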

A survey of general-purpose computation on graphics hardware

by John D. Owens, David Luebke, Naga Govindaraju, Mark Harris, Jens Krüger, Aaron E. Lefohn, Tim Purcell, 2007
"... The rapid increase in the performance of graphics hardware, coupled with recent improvements in its programmability, have made graphics hardware a compelling platform for computationally demanding tasks in a wide variety of application domains. In this report, we describe, summarize, and analyze the l ..."
Cited by 545 (18 self)
the latest research in mapping general-purpose computation to graphics hardware. We begin with the technical motivations that underlie general-purpose computation on graphics processors (GPGPU) and describe the hardware and software developments that have led to the recent interest in this field. We then aim