Performance optimization of latency insensitive systems through buffer queue sizing of communication channels
- in Proc. Int. Conf. Computer Aided Design
, 2003
Cited by 11 (1 self)
This paper proposes for latency insensitive systems a performance optimization technique called channel buffer queue sizing, which is performed after relay station insertion in the physical design stage. It can be shown that proper queue sizing can reduce or even completely avoid the performance loss due to imbalanced relay station insertion in reconvergent paths. Moreover, the problem of queue sizing and placement of the additional buffers for maximum performance is formulated and studied to properly allocate available chip areas in the layout to communication channels. An algorithm based on mixed integer linear programming is proposed. Experimental results show that queue sizing is effective in improving the performance of latency insensitive systems even under tight area constraints. Moreover, the proposed algorithm is sufficiently efficient in obtaining the optimal solution for systems of practical sizes.
The Role of Back-Pressure in Implementing Latency-Insensitive Systems
- Electronic Notes in Theoretical Computer Science
, 2006
Cited by 8 (1 self)
Back-pressure is a logical mechanism to control the flow of information on a communication channel of a latency-insensitive system (LIS) while guaranteeing that no packet is lost. Back-pressure is necessary for building open LISs, and it is also an interesting design alternative for closed LISs because it makes it possible to realize highly modular implementations with more predictable features in terms of design overhead (area, power). In discussing the role of back-pressure, we revisit the logic of the necessary building blocks and explain the impact of the system topology on the system performance.
Issues in implementing latency insensitive protocols
- In Proc. of the Conf. on Design, Automation and Test in Europe
, 2004
Cited by 7 (1 self)
The performance of future Systems-on-Chip will be limited by the latency of long interconnects that require more than one clock cycle for signals to propagate. To deal with this problem, L. Carloni et al. proposed Latency Insensitive Protocols (LIP). A design that works under the assumption of zero-delay connections between functional modules is turned into a Latency Insensitive Design (LID) by encapsulating the modules within wrappers (“shells”) and connecting them through internally pipelined blocks (“relay stations”) that comply with a protocol guaranteeing identity of behavior [1]. The wrappers perform:
- Data Validation: each output channel signals whether the datum present on it has still to be consumed;
- Back Pressure: when the pearl is stopped, the shell generates a stop signal sent in the opposite direction of the inputs;
- Clock Gating: a module that is waiting for new data and/or stopped keeps its present state.
Such a protocol was implemented [2] through the introduction
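The three wrapper duties listed in this abstract can be illustrated with a small cycle-level model. This is only a sketch under simplifying assumptions and is not the circuit from the cited papers: the `Shell` class, its single output slot, the integer-valued `pearl` step function, and the un-registered stop signals are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass
class Shell:
    """Hypothetical wrapper around a 'pearl', modelled as a step function
    (state, inputs) -> (new_state, output). Assumed structure, for illustration."""
    pearl: Callable[[int, Sequence[int]], Tuple[int, int]]
    state: int = 0
    out_data: Optional[int] = None
    out_valid: bool = False  # data validation: held output not yet consumed

    def step(self, inputs: Sequence[int], in_valid: Sequence[bool],
             stop_in: bool) -> List[bool]:
        """Simulate one clock cycle; return the stop signal for each input."""
        # The consumer takes the held output unless it asserts stop.
        if self.out_valid and not stop_in:
            self.out_valid = False

        # The pearl may fire only if every input is valid and the output
        # slot is free to receive a new datum.
        fire = all(in_valid) and not self.out_valid
        if fire:
            self.state, self.out_data = self.pearl(self.state, inputs)
            self.out_valid = True
        # Otherwise: clock gating, the pearl keeps its present state.

        # Back pressure: any valid input that was not consumed is stopped,
        # so its producer must hold the datum for the next cycle.
        return [v and not fire for v in in_valid]
```

With an accumulating pearl such as `lambda s, xs: (s + sum(xs), s + sum(xs))`, a cycle in which one input is invalid leaves `state` untouched and stops the valid input; the pearl fires once all inputs are valid and the output slot is free.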
Performance analysis of latency-insensitive systems
- IEEE Trans. Comput.-Aided Design Integr. Circuits Syst
, 2006
Cited by 6 (0 self)
This paper formally models and studies latency-insensitive systems (LISs) through max-plus algebra. We introduce state traces to model behaviors of LISs and obtain a formally proved performance upper bound achievable by latency-insensitive design. An implementation of the latency-insensitive protocol that can provide robust communication through back-pressure is also proposed. The intrinsic performance of the proposed implementation is acquired based on state traces. It is also proved that the proposed implementation can always reach the best performance achievable by latency-insensitive design. Index Terms: back-pressure, latency-insensitive system, max-plus algebra, performance analysis, state trace.
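For readers unfamiliar with max-plus algebra, the sketch below shows its basic operation: a matrix-vector product in which addition is replaced by `max` and multiplication by `+`. It does not reproduce the state-trace model of the paper. The function names and the two-node example matrix are assumptions chosen for illustration.

```python
NEG_INF = float("-inf")  # max-plus "zero": the neutral element for max


def maxplus_matvec(A, x):
    """Max-plus product: result[i] = max_j (A[i][j] + x[j])."""
    return [max(a + xj for a, xj in zip(row, x)) for row in A]


def iterate(A, x0, steps):
    """Apply x(k+1) = A (x) x(k) for the given number of steps."""
    x = list(x0)
    for _ in range(steps):
        x = maxplus_matvec(A, x)
    return x


# Hypothetical two-node loop: node 0 waits 3 time units after node 1,
# node 1 waits 1 time unit after node 0 (no other dependencies).
A = [[NEG_INF, 3],
     [1, NEG_INF]]
x4 = iterate(A, [0, 0], 4)
```

In this example the event times after four steps are `[8, 8]`, so they grow by 2 per step on average, which is the mean weight of the single cycle, (3 + 1) / 2. In max-plus models, the long-run growth rate is set by the largest cycle mean.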
Generalized Latency-Insensitive Systems for GALS Architectures
- To appear in Proceedings of FMGALS’03
, 2003
Cited by 3 (0 self)
Latency-insensitive systems were recently proposed by Carloni et al. for the design of single-clock systems-on-a-chip (SoCs) using predesigned IP blocks. The goal of this paper is to extend and generalize latency-insensitive systems in such a way that they can be applied to GALS architectures with multiple clocks. In particular, we propose two extensions. The first extension allows each synchronous module to treat its input and output channels in a much more flexible manner (i.e., greater decoupling). As a result, significant improvement in throughput as well as power consumption may be obtained. The second extension generalizes inter-module communication from point-to-point channels to more complex networks of arbitrary topologies.
Design, implementation, and validation of a new class of interface circuits for latency-insensitive design
- In International Conference on Formal Methods and Models for Codesign (MEMOCODE)
, 2007
Cited by 3 (1 self)
With the arrival of nanometer technologies, wire delays are no longer negligible with respect to gate delays, and timing closure becomes a major challenge to System-on-Chip designers. Latency-insensitive design (LID) has been proposed as a “correct-by-construction” design methodology to cope with this problem. In this paper we present the design and implementation of a new class of interface circuits to support LID that offers substantial performance improvements with limited area overhead with respect to previous designs proposed in the literature. This claim is supported by the experimental results that we obtained completing semi-custom implementations of the three designs with a 90nm industrial standard-cell library. We also report on the formal verification of our design: using the NuSMV model checker we verified that the RTL synthesizable implementations of our LID interface circuits (relay stations specifications according to the theory of LID.
Performance Analysis with Confidence Intervals for Embedded Software Processes
- Proceedings of the International Symposium on System Synthesis (ISSS)
, 2001
Cited by 3 (0 self)
The choice of algorithms has a large impact on the performance of embedded real-time systems. Therefore, performance estimation of embedded software is vital in an early design phase. Consequently, high-level estimation techniques have been devised, but the accuracy of the estimations varies a lot depending on the algorithm and its context. We address this problem by proposing an estimation technique that both estimates the performance and computes the expected accuracy. The accuracy is used to provide a confidence interval for the estimated performance. The estimation framework presented in this paper has been crafted to fit with the MASCOT environment, but the underlying techniques can also be applied to other high-level design exploration frameworks.
Combining Retiming and Recycling to Optimize the Performance of Synchronous Circuits
- In 16th Symp. on Integrated Circuits and System Design (SBCCI)
, 2003
Cited by 3 (0 self)
Recycling was recently proposed as a system-level design technique to facilitate the building of complex Systems-on-Chip (SoCs) by assembling pre-designed components. Recycling allows us to model the communication patterns among the components, analyze the impact of interconnect latency on the overall data processing throughput, and manage computation/communication tradeoffs to optimize the performance of the system. In this paper, we present recycling as a circuit-level design technique for optimizing the performance of sequential circuits beyond what can be achieved by retiming. We also provide a theoretical framework to guide the simultaneous application of the two techniques. Our model identifies the conditions under which an optimally retimed synchronous circuit can be further sped up and determines the amount of the resulting performance gain.
Topology-based optimization of maximal sustainable throughput in a latency-insensitive system
- In Proceedings of the Design Automation Conference
, 2007
Cited by 2 (2 self)
We consider the problem of optimizing the performance of a latency-insensitive system (LIS) where the addition of backpressure has caused throughput degradation. Previous works have addressed the problem of LIS performance in different ways. In particular, the insertion of relay stations and the sizing of the input queues in the shells are the two main optimization techniques that have been proposed. We provide a unifying framework for this problem by outlining which approaches work for different system topologies, and highlighting counterexamples where some solutions do not work. We also observe that in the most difficult class of topologies, instances with the greatest throughput degradation are typically very amenable to simplifications. The contributions of this paper include a characterization of topologies that maintain optimal throughput with fixed-size queues and a heuristic for sizing queues that produces solutions close to optimal in a fraction of the time.
A Formal Modeling Framework for Deploying Synchronous Designs on Distributed Architectures
- In FMGALS 2003: Formal Methods for Globally Asynchronous Locally Synchronous Architecture
, 2003
Cited by 2 (1 self)
Synchronous specifications are appealing in the design of large scale hardware and software systems because of their properties that facilitate verification and synthesis. When the target architecture is a distributed system, implementing a synchronous specification as a synchronous design may be inefficient in terms of both size (memory for software implementations or area for hardware implementations) and performance. A more elaborate implementation style, where the basic synchronous paradigm is adapted to distributed architectures by introducing elements of asynchrony, is hence highly desirable. This approach has to reconcile the desire to maintain the theoretical properties of synchronous designs with the efficiency of implementations where the constraints imposed by synchrony are relaxed. Two interesting avenues have been recently pursued to achieve this goal:
- Latency-insensitive protocols [9,10], motivated by hardware implementations, where long paths between the design components may introduce delays that force the overall clock of the system to run too slowly in order to maintain synchronous behavior. This approach introduces additional elements in the design to allow the implementation to maintain the throughput that could have been achieved with communication delays of the same order as the clock of the subsystems, at the price of additional latency.

