Results 1 - 10 of 115
Practical Off-chip Meta-data for Temporal Memory Streaming
"... Prior research demonstrates that temporal memory streaming and related address-correlating prefetchers improve performance of commercial server workloads through increased memory-level parallelism. Unfortunately, these prefetchers require large on-chip meta-data storage, making previously-proposed de ..."
Cited by 4 (1 self)
... meta-data off chip: minimal off-chip lookup latency, bandwidth-efficient meta-data updates, and off-chip lookup amortized over many prefetches. In this work, we show: (1) minimal off-chip meta-data lookup latency can be achieved through a hardware-managed main memory hash table, (2) bandwidth-efficient updates ...
Layout-conscious Random Topologies for HPC Off-chip Interconnects
"... As the scales of parallel applications and platforms increase, the negative impact of communication latencies on performance becomes large. Random network topologies can be used to achieve low hop counts between nodes and thus low latency. However, random topologies lead to increased aggrega ..."
Cited by 6 (6 self)
Memory bandwidth limitations of future microprocessors
- In Proceedings of the 23rd Annual International Symposium on Computer Architecture, 1996
"... This paper makes the case that pin bandwidth will be a critical consideration for future microprocessors. We show that many of the techniques used to tolerate growing memory latencies do so at the expense of increased bandwidth requirements. Using a decomposition of execution time, we show that for ..."
Cited by 226 (12 self)
... limitations will make more complex on-chip caches cost-effective. For example, flexible caches may allow individual applications to choose from a range of caching policies. In the long term, we predict that off-chip accesses will be so expensive that all system memory will reside on one or more processor ...
Dynamic multiway segment tree for IP lookups and the fast pipelined search engine
- IEEE Transactions on Computers, 2010
"... A dynamic multiway segment tree (DMST) is proposed for IP lookups in this paper. DMST is designed for dynamic routing tables that can dynamically insert and delete prefixes. DMST is implemented as a B-tree that has all distinct end points of ranges as its keys. The complexities of search, i ..."
Cited by 2 (0 self)
.7 million packets per second (Mpps) with 144-bit and 288-bit wide SRAM blocks, respectively. Furthermore, a straightforward extension of the pipelined search engine with multiple independent off-chip SRAMs can achieve the throughput of 200 Mpps which is equivalent to 102 Gbps for minimal Ethernet packets
miNI: Minimizing Network Interface Memory Requirements with Dynamic Handle Lookup
- 2002
"... Recent work in low-latency, high-bandwidth communication systems has resulted in building Network Interface Controllers (NIC) and communication abstractions that support direct access from the NIC to application virtual memory to avoid both data copies and operating system intervention. Such mechani ..."
A System-C based Microarchitectural Exploration Framework for Latency, Power and Performance Trade-offs of On-Chip Interconnection Networks
"... We describe a System-C based framework we are developing to explore the impact of various architectural and microarchitectural level parameters of the on-chip interconnection network elements on its power and performance. The framework enables one to choose from a variety of architectura ..."
... -Response messages (mimicking cache accesses) and One-Way messages. We find that the average latency can be reduced by increasing the pipeline depth, as it enables higher link frequencies. We also find that there exists an optimum degree of pipelining which minimizes energy-delay product.
Optimal Implementation of Combinational Logic on Look-up Tables
"... We present a methodology for optimally implementing combinational logic equations on networks of look-up tables. Our work effectively extends optimality to span logic minimization and technology mapping. We restrict ourselves to 4-input look-up tables (LUTs) and enumerate all possible cir ..."
Dynamic Multiway Segment Tree for IP Lookups and the Fast Pipelined Search Engine
"... A dynamic multiway segment tree (DMST) is proposed for IP lookups in this paper. DMST is designed for dynamic routing tables that can dynamically insert and delete prefixes. DMST is implemented as a B-tree that has all distinct endpoints of ranges as its keys. The complexities of search, i ..."
.7 million packets per second (Mpps) with 144-bit and 288-bit wide SRAM blocks, respectively. Furthermore, a straightforward extension of the pipelined search engine with multiple independent off-chip SRAMs can achieve the throughput of 200 Mpps which is equivalent to 102 Gbps for minimal Ethernet packets
Architecture and Performance Models for Scalable IP Lookup Engines on FPGA
"... We propose a unified methodology for optimizing IPv4 and IPv6 lookup engines based on the balanced range tree (BRTree) architecture on FPGA. A general BRTree-based IP lookup solution features one or more linear pipelines with a large and complex design space. To allow fast exploration of th ..."
... of the design space, we develop a concise set of performance models to characterize the tradeoffs among throughput, table size, lookup latency, and resource requirement of the IP lookup engine. In particular, a simple but realistic model of DDR3 memory is used to accurately estimate the off-chip memory ...
Photonic Networks-on-Chip for Future Generations of Chip Multiprocessors
- IEEE Trans. Computing, 2008
"... The design and performance of next-generation chip multiprocessors (CMPs) will be bound by the limited amount of power that can be dissipated on a single die. We present photonic networks-on-chip (NoC) as a solution to reduce the impact of intrachip and off-chip communication on the overall ..."
Cited by 91 (21 self)