Results 1 - 10 of 1,970
A RISC-V Vector Processor with Tightly-Integrated Switched-Capacitor DC-DC Converters in 28 nm FDSOI
"... This work demonstrates a RISCV vector microprocessor implemented in 28nm FDSOI with fullyintegrated noninterleaved switchedcapacitor DCDC (SCDCDC) converters and adaptive clocking that generates four onchip voltages between 0.5V and 1V using only 1.0V core and 1.8V IO voltage inputs. The desi ..."
Fast Parallel Algorithms for Short-Range Molecular Dynamics
 JOURNAL OF COMPUTATIONAL PHYSICS
, 1995
"... Three parallel algorithms for classical molecular dynamics are presented. The first assigns each processor a fixed subset of atoms; the second assigns each a fixed subset of interatomic forces to compute; the third assigns each a fixed spatial region. The algorithms are suitable for molecular dyn ..."
Cited by 653 (7 self)
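The third decomposition in the snippet above (each processor owns a fixed spatial region, and short-range forces involve only nearby regions) rests on cell lists: with cells at least one cutoff wide, interacting atoms can only sit in the same or adjacent cells. A minimal sketch on a non-periodic 1-D toy domain, with all names and the brute-force cross-check illustrative rather than from the paper:

```python
def count_pairs_brute(xs, rc):
    """Reference O(N^2) count of atom pairs closer than the cutoff rc."""
    n = len(xs)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if abs(xs[i] - xs[j]) < rc)

def count_pairs_cells(xs, box, rc):
    """Count pairs within rc by binning atoms into cells of width >= rc.

    Only atoms in the same or adjacent cells can be within the cutoff,
    so a processor owning a contiguous block of cells needs data only
    from its immediate neighbours -- the spatial-decomposition idea.
    Non-periodic 1-D domain [0, box).
    """
    ncell = max(1, int(box // rc))   # cell width box/ncell is >= rc
    width = box / ncell
    cells = [[] for _ in range(ncell)]
    for idx, x in enumerate(xs):
        cells[min(int(x // width), ncell - 1)].append(idx)
    count = 0
    for c in range(ncell):
        for d in (c, c + 1):         # same cell and right neighbour only
            if d >= ncell:
                continue
            for i in cells[c]:
                for j in cells[d]:
                    if d == c and j <= i:
                        continue     # count each in-cell pair once
                    if abs(xs[i] - xs[j]) < rc:
                        count += 1
    return count
```

The cell sweep touches only O(N) candidate pairs at fixed density, versus O(N^2) for the brute-force loop, which is what makes the per-region assignment scale.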
A 45nm 1.3GHz 16.7 Double-Precision GFLOPS/W RISC-V Processor with Vector Accelerators
"... accelerators has been fabricated in a 45 nm SOI process. This is the first dualcore processor to implement the opensource RISCV ISA designed at the University of California, Berkeley. In a standard 40 nm process, the RISCV scalar core scores 10% higher in DMIPS/MHz than the CortexA5, ARM’s comp ..."
Improving Energy Efficiency and Reducing Code Size with RISC-V Compressed
"... Delivering the instruction stream can be the largest source of energy consumption in a processor, yet looselyencoded RISC instruction sets are wasteful of instruction bandwidth. Aiming to improve the performance and energy efficiency of the RISCV ISA, this thesis proposes RISCV Compressed (RVC), ..."
Cited by 6 (6 self)
Linear Algebra Operators for GPU Implementation of Numerical Algorithms
 ACM Transactions on Graphics
, 2003
"... In this work, the emphasis is on the development of strategies to realize techniques of numerical computing on the graphics chip. In particular, the focus is on the acceleration of techniques for solving sets of algebraic equations as they occur in numerical simulation. We introduce a framework for ..."
Abstract

Cited by 324 (9 self)
 Add to MetaCart
for the implementation of linear algebra operators on programmable graphics processors (GPUs), thus providing the building blocks for the design of more complex numerical algorithms. In particular, we propose a stream model for arithmetic operations on vectors and matrices that exploits the intrinsic parallelism
Larrabee: a many-core x86 architecture for visual computing
 In SIGGRAPH ’08: ACM SIGGRAPH 2008 papers
, 2008
"... Abstract 123 This paper presents a manycore visual computing architecture code named Larrabee, a new software rendering pipeline, a manycore programming model, and performance analysis for several applications. Larrabee uses multiple inorder x86 CPU cores that are augmented by a wide vector proces ..."
Cited by 279 (12 self)
Implementing sparse matrix-vector multiplication on throughput-oriented processors
 In SC ’09: Proceedings of the 2009 ACM/IEEE conference on Supercomputing
, 2009
"... Sparse matrixvector multiplication (SpMV) is of singular importance in sparse linear algebra. In contrast to the uniform regularity of dense linear algebra, sparse operations encounter a broad spectrum of matrices ranging from the regular to the highly irregular. Harnessing the tremendous potential ..."
Cited by 142 (7 self)
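The SpMV kernel the abstract studies is easiest to pin down in its serial compressed-sparse-row (CSR) form, one of several storage formats the paper evaluates; this sketch deliberately omits any GPU-specific layout, and the helper name is illustrative:

```python
def spmv_csr(indptr, indices, data, x):
    """y = A @ x with A stored in compressed sparse row (CSR) form.

    Row i's nonzeros occupy positions indptr[i]:indptr[i+1]; indices
    holds their column numbers and data their values.
    """
    n_rows = len(indptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(indptr[i], indptr[i + 1]):
            acc += data[k] * x[indices[k]]
        y[i] = acc
    return y
```

The irregularity the abstract mentions shows up here as variable-length inner loops: rows with very different nonzero counts make naive one-thread-per-row parallelization load-imbalanced, which is what format and kernel design on throughput processors must work around.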
More iteration space tiling
 In Proceedings of Supercomputing '89
, 1989
"... Subdividing the iteration space of a loop into blocks or tiles with a fixed maximum size has several advantages. Tiles become a natural candidate as the unit of work for parallel task scheduling. Synchronization between processors can be done between tiles, reducing synchronization frequency (at so ..."
Cited by 207 (0 self)
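The fixed-maximum-size tiles the abstract describes can be sketched on the classic example, matrix multiplication: each (ii, jj) tile of the result is a self-contained unit of work for a parallel scheduler, and the bounded tile footprint improves data reuse. A minimal serial sketch (names and the default tile size are illustrative, not from the paper):

```python
def matmul_tiled(A, B, tile=2):
    """Square matrix product computed tile-by-tile.

    The iteration space is cut into tile x tile blocks; the three outer
    loops walk tiles, the three inner loops stay within one tile, so
    synchronization in a parallel version is needed only between tiles.
    """
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for jj in range(0, n, tile):
            for kk in range(0, n, tile):
                for i in range(ii, min(ii + tile, n)):
                    for j in range(jj, min(jj + tile, n)):
                        acc = C[i][j]
                        for k in range(kk, min(kk + tile, n)):
                            acc += A[i][k] * B[k][j]
                        C[i][j] = acc
    return C
```

Any tile size gives the same result; the `min(..., n)` bounds handle the ragged edge when the matrix dimension is not a multiple of the tile size.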
3D-stacked memory architectures for multi-core processors
 In International Symposium on Computer Architecture
"... Threedimensional integration enables stacking memory directly on top of a microprocessor, thereby significantly reducing wire delay between the two. Previous studies have examined the performance benefits of such an approach, but all of these works only consider commodity 2D DRAM organizations. In ..."
Cited by 132 (7 self)
"... over previously proposed 3D DRAM approaches on our memory-intensive multiprogrammed workloads on a quad-core processor. The significant increase in memory system performance makes the L2 miss handling architecture (MHA) a new bottleneck, which we address by combining a novel data structure called ..."