Results 1 - 10 of 57
A Scalable Front-End Architecture for Fast Instruction Delivery
, 1999
"... In the pursuit of instruction-level parallelism, significant demands are placed on a processor's instruction delivery mechanism. Delivering the performance necessary to meet future processor execution targets requires that the performance of the instruction delivery mechanism scale with the exe ..."
Cited by 74 (12 self)
20.3 A Double-Precision Multiplier with Fine-Grained Clock-Gating Support for a First-Generation CELL Processor
"... A double-precision multiplier for floating-point and media streaming instructions in the first-generation CELL processor [1] on 90nm PD/SOI is reported. Multiplication by recoding and successive partial-product (PP) compression is completed in three 11FO4 cycles including merging with the aligner. Fi ..."
Performance/Watt: The New Server Focus
- In Workshop on Design, Architecture and Simulation of Chip Multi-Processors
, 2005
"... Transaction processing has emerged as the killer application for commercial servers. Most servers are engaged in transactional workloads such as processing search requests, serving middleware, evaluating decisions, managing databases, and powering online commerce. Currently, commercial servers are b ..."
Cited by 22 (1 self)
, these ILP-focused processors have been primarily optimized to deliver maximum performance by employing high clock rates and large amounts of speculation. As a result, we are now at the point where the performance/Watt of subsequent generations of traditional ILP-focused processors on server workloads has
Scalable networking for next-generation computing platforms
- In Third Annual Workshop on System Area Networks (SAN-3)
, 2004
"... Abstract — We propose a technology strategy for enabling applications to scale to next-generation levels of I/O scalability and communication performance on industry standard platforms. The strategy combines efficient packet processing and scalable I/O concurrency, potentially enabling Ethernet and ..."
Cited by 5 (1 self)
Performance of Multi-Process and Multi-Thread Processing on Multi-core SMT Processors
"... Abstract—Many modern high-performance processors support multiple hardware threads in the form of multiple cores and SMT (Simultaneous Multi-Threading). Hence achieving good performance scalability of programs with respect to the numbers of cores (core scalability) and SMT threads in one core (SMT s ..."
Cited by 1 (0 self)
scalability) is critical. To identify a way to achieve higher performance on the multi-core SMT processors, this paper compares the performance scalability with two parallelization models (using multiple processes and using multiple threads in one process) on two types of hardware parallelism (core
A Micropower Biomedical Signal Processor
"... Abstract—This work presents a biomedical signal processor (BSP) with hybrid functional cores to optimize the power dissipation and system flexibility for mobile healthcare applications. Embedded with the biomedical core and a 32-bit RISC core, multi-features are extracted for classification and the ..."
and the abnormal data are compressed. In addition, the crypto core secures both the data and wireless link protocols to protect the user privacy. This BSP chip is fabricated in a 90nm standard CMOS technology with core area of 1.17 mm². To overcome the leakage in advanced technology, a duty-cycled clock generator
SIP server performance on multicore systems
"... This paper evaluates the performance of a popular open-source Session Initiation Protocol (SIP) server on three different multicore architectures. We examine the baseline performance and introduce three analysis-driven optimizations that involve increasing the number of slots in hash tables, an in- ..."
result is an improvement in absolute performance on eight cores by a factor of 16 and a doubling of multicore scalability. Results somewhat vary across architectures but follow similar trends, indicating the generality of these optimizations. Introduction Multicore processors have emerged as the norm
Exception-Less System Calls for Event-Driven Servers
- PROCEEDINGS OF THE USENIX ANNUAL TECHNICAL CONFERENCE
, 2011
"... Event-driven architectures are currently a popular design choice for scalable, high-performance server applications. For this reason, operating systems have invested in efficiently supporting non-blocking and asynchronous I/O, as well as scalable event-based notification systems. We propose the use ..."
Cited by 7 (0 self)
) enabling multi-core execution for event-driven programs is easier, given that a single user-mode execution context can generate enough requests to keep multiple processors/cores busy with kernel execution. We present libflexsc, an asynchronous system call and notification library suitable for building
REDAC: Distributed, Asynchronous Redundancy in Shared Memory Servers
"... The emergence of multi-core architectures—driven by continued technology scaling—has led to concerns about increasing soft- and hard-error rates in commodity designs. Because modern chip designs consist of multiple high-speed clock domains, conventional lockstepped redundant execution is no longer p ..."
Cited by 1 (1 self)
scalable buffering for unchecked state updates, permitting the distribution of redundant execution across multiple nodes of a scalable shared-memory server. The REDAC mechanisms achieve high performance by enabling speculation across common serializing instructions and mitigating the effects of input
A Scalable Front-End Architecture for Fast Instruction Delivery
"... In the pursuit of instruction-level parallelism, significant demands are placed on a processor’s instruction delivery mechanism. Delivering the performance necessary to meet future processor execution targets requires that the performance of the instruction delivery mechanism scale with the executio ..."