Results 1 - 10 of 379
An adaptive, nonuniform cache structure for wire-delay dominated on-chip caches
- In International Conference on Architectural Support for Programming Languages and Operating Systems, 2002
"... Growing wire delays will force substantive changes in the designs of large caches. Traditional cache architectures assume that each level in the cache hierarchy has a single, uniform access time. Increases in on-chip communication delays will make the hit time of large on-chip caches a function of a ..."
Abstract - Cited by 314 (39 self)
silicon area by 13%, and comes within 13% of an ideal minimal hit latency solution.
Approaching Ideal NoC Latency with Pre-Configured Routes
- In NOCS, 2007
"... In multi-core ASICs, processors and other compute engines need to communicate with memory blocks and other cores with latency as close as possible to the ideal of a direct buffered wire. However, current state of the art networks-on-chip (NoCs) suffer, at best, latency of one clock cycle per hop. We ..."
Abstract - Cited by 10 (4 self)
Load Latency Tolerance In Dynamically Scheduled Processors
- Journal of Instruction Level Parallelism, 1998
"... This paper provides a quantitative evaluation of load latency tolerance in a dynamically scheduled processor. To determine the latency tolerance of each memory load operation, our simulations use flexible load completion policies instead of a fixed memory hierarchy that dictates the latency. Alth ..."
Abstract - Cited by 76 (2 self)
with an ideal memory system, between 1% and 71% of loads need to be satisfied within a single cycle and that up to 74% can be satisfied in as many as 32 cycles, depending on the benchmark and processor configuration. Load latency
Reducing Web Latency: the Virtue of Gentle Aggression
"... To serve users quickly, Web service providers build infrastructure closer to clients and use multi-stage transport connections. Although these changes reduce client-perceived round-trip times, TCP’s current mechanisms fundamentally limit latency improvements. We performed a measurement study of a la ..."
Abstract - Cited by 32 (7 self)
large Web service provider and found that, while connections with no loss complete close to the ideal latency of one round-trip time, TCP’s timeout-driven recovery causes transfers with loss to take five times longer on average. In this paper, we present the design of novel loss recovery mechanisms for TCP
Express Virtual Channels: Towards the Ideal Interconnection Fabric
- In Proceedings of ISCA-34, 2007
"... Due to wire delay scalability and bandwidth limitations inherent in shared buses and dedicated links, packet-switched on-chip interconnection networks are fast emerging as the pervasive communication fabric to connect different processing elements in many-core chips. However, current state-of-the-art ..."
Abstract - Cited by 80 (13 self)
by up to 38% over an existing state-of-the-art packet-switched design. When compared to the ideal interconnect, EVCs add just two cycles to the no-load latency, and are within 14% of the ideal throughput. Moreover, we show that the proposed design incurs a minimal hardware overhead while exhibiting
Load Value Approximation: Approaching the Ideal Memory Access Latency
"... Approximate computing recognizes that many applications can tolerate inexactness. These applications, which range from multimedia processing to machine learning, operate on inherently noisy and imprecise data. As a result, we can trade off some loss in output value integrity for improved processor pe ..."
Abstract - Cited by 1 (0 self)
achieve high coverage while maintaining very low error in the application’s output. By exploiting the approximate nature of applications, we can draw closer to the ideal memory access latency.
Sparrow: Distributed, Low Latency Scheduling
"... Large-scale data analytics frameworks are shifting towards shorter task durations and larger degrees of parallelism to provide low latency. Scheduling highly parallel jobs that complete in hundreds of milliseconds poses a major challenge for task schedulers, which will need to schedule millions of t ..."
Abstract - Cited by 24 (1 self)
A critical role for the right fronto-insular cortex in switching between central-executive and default-mode networks.
- Proc Natl Acad Sci USA, 2008
"... Cognitively demanding tasks that evoke activation in the brain's central-executive network (CEN) have been consistently shown to evoke decreased activation (deactivation) in the default-mode network (DMN). The neural mechanisms underlying this switch between activation and deactivation of larg ..."
Abstract - Cited by 178 (1 self)
-specific responses in the CEN, DMN, and SN. Activations in the CEN and SN were found to be accompanied by robust deactivation in the DMN at the movement transition. Latency Analysis Reveals Early Activation of the rFIC Relative to the CEN and DMN. First, we identified differences in the latency of the event
Latency criticality aware on-chip communication
- In Design, Automation &amp; Test in Europe Conference
"... Packet-switched interconnect fabric is a promising on-chip communication solution for many-core architectures. It offers high throughput and excellent scalability for on-chip data and protocol transactions. The main problem posed by this communication fabric is the potentially-high and non ..."
Abstract - Cited by 1 (0 self)
and hardware characterization demonstrate that, for latency-critical traffic, the proposed solution closely approximates the ideal interconnect even under heavy load while preserving throughput for both latency-critical and non-critical traffic.