Results 1 - 10
of
30
Fast and Extensive System-Level Memory Exploration for ATM
- 10th International Symposium on System Synthesis (Isss97
, 1997
"... In this paper, our memory architecture exploration methodology and CAD techniques for network protocol applications are presented. Prototype tools have been implemented, and applied on part of an industrial ATM application to show how our novel approach can be used to easily and thoroughly explore t ..."
Abstract
-
Cited by 27 (7 self)
- Add to MetaCart
In this paper, our memory architecture exploration methodology and CAD techniques for network protocol applications are presented. Prototype tools have been implemented, and applied on part of an industrial ATM application to show how our novel approach can be used to easily and thoroughly explore the memory organization search space at the system-level. An extended, novel method for signal to memory assignment is proposed which takes into account memory access conflict constraints. The number of conflicts is first optimized by our flow-graph balancing technique. Significant power and area savings were obtained by performing the exploration thoroughly at each of the degrees of freedom in the global search space.
Flow Graph Balancing for Minimizing the Required Memory Bandwidth
- In ISSS, La Jolla, CA
, 1996
"... In this paper we present the problem of flow graph balancing for minimizing the required memory bandwidth. Our goal is to minimize the required memory bandwidth within the given cycle budget by adding ordering constraints to the flow graph. This allows the subsequent memory allocation and assignment ..."
Abstract
-
Cited by 23 (4 self)
- Add to MetaCart
In this paper we present the problem of flow graph balancing for minimizing the required memory bandwidth. Our goal is to minimize the required memory bandwidth within the given cycle budget by adding ordering constraints to the flow graph. This allows the subsequent memory allocation and assignment tasks to come up with a cheaper memory architecture with less memories and memory ports. The effect of flow graph balancing is shown on an example. We show that it is important to take into account which data is being accessed in parallel, instead of only considering the number of simultaneous memory accesses. This leads to the optimization of a conflict graph. 1.
Data dependency size estimation for use in memory optimization
- IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
, 2003
"... Abstract — A novel storage requirement estimation methodology is presented for use in the early system design phases when the data transfer ordering is only partly fixed. At that stage, none of the existing estimation tools are adequate, as they either assume a fully specified execution order or ign ..."
Abstract
-
Cited by 19 (5 self)
- Add to MetaCart
Abstract — A novel storage requirement estimation methodology is presented for use in the early system design phases when the data transfer ordering is only partly fixed. At that stage, none of the existing estimation tools are adequate, as they either assume a fully specified execution order or ignore it completely. This paper presents an algorithm for automated estimation of strict upper and lower bounds on the individual data dependency sizes in high level application code given a partially fixed execution ordering. In the overall estimation technique, this is followed by a detection of the maximally combined size of simultaneously alive dependencies, resulting in the overall storage requirement of the application. Using representative application demonstrators, we show how our techniques can effectively guide the designer to achieve a transformed specification with low storage requirement.
Reducing Memory Requirements of Nested Loops for Embedded Systems
- DAC 2001
, 2001
"... Most embedded systems have limited amount of memory. In contrast, the memory requirements of code (in particular loops) running on embedded systems is significant. This paper addresses the problem of estimating the amount of memory needed for transfers of data in embedded systems. The problem of est ..."
Abstract
-
Cited by 18 (0 self)
- Add to MetaCart
Most embedded systems have limited amount of memory. In contrast, the memory requirements of code (in particular loops) running on embedded systems is significant. This paper addresses the problem of estimating the amount of memory needed for transfers of data in embedded systems. The problem of estimating the region associated with a statement or the set of elements referenced by a statement during the execution of the entire set of nested loops is analyzed. A quantitative analysis of the number of elements referenced is presented; exact expressions for uniformly generated references and a close upper and lower bound for non-uniformly generated references are derived. In addition to presenting an algorithm that computes the total memory required, we discuss the effect of transformations on the lifetimes of array variables, i.e., the time between the first and last accesses to a given array location. A detailed analysis on the effect of unimodular transformations on data locality including the calculation of the maximum window size is discussed. The term maximum window size is introduced and quantitative expressions are derived to compute the window size. The smaller the value of the maximum window size, the higher the amount of data locality in the loop.
Synthesis of Application-Specific Memory Structures
, 1995
"... The benefits of a behavioral synthesis design methodology, including higher designer productivity and shorter time-to-market, are the results of allowing the designer to use more abstract and familiar specifications. One of the most common and familiar abstractions used by hardware and software desi ..."
Abstract
-
Cited by 16 (1 self)
- Add to MetaCart
The benefits of a behavioral synthesis design methodology, including higher designer productivity and shorter time-to-market, are the results of allowing the designer to use more abstract and familiar specifications. One of the most common and familiar abstractions used by hardware and software designers is the array, which allows the specification of a set of values that have a unique index associated with each element. A behavioral synthesis tool that accepts specifications containing arrays must map the storage implied by the arrays to memory in an implementation of that specification. In many data-intensive applications, these memory design decisions have a larger impact on the cost, performance, and power of the implementation than any other design decision made by a behavioral synthesis tool. Most behavioral synthesis tools, however, fail to separate the concepts of array specification and memory implementation, which severely restricts the span of designs that can be explored gi...
Experiences with Enumeration of Integer Projections of Parametric Polytopes
- in Compiler Construction: 14th Int. Conf
, 2005
"... Many compiler optimization techniques depend on the ability to calculate the number of integer values that satisfy a given set of linear constraints. This count (the enumerator of a parametric polytope) is a function of the symbolic parameters that may appear in the constraints. In an extended probl ..."
Abstract
-
Cited by 15 (4 self)
- Add to MetaCart
Many compiler optimization techniques depend on the ability to calculate the number of integer values that satisfy a given set of linear constraints. This count (the enumerator of a parametric polytope) is a function of the symbolic parameters that may appear in the constraints. In an extended problem (the "integer projection" of a parametric polytope), some of the variables that appear in the constraints may be existentially quantified and then the enumerated set corresponds to the projection of the integer points in a parametric polytope. This paper shows how to...
Counting integer points in parametric polytopes using Barvinok’s rational functions
- Algorithmica
, 2007
"... Abstract Many compiler optimization techniques depend on the ability to calculate the number of elements that satisfy certain conditions. If these conditions can be represented by linear constraints, then such problems are equivalent to counting the number of integer points in (possibly) parametric ..."
Abstract
-
Cited by 15 (6 self)
- Add to MetaCart
Abstract Many compiler optimization techniques depend on the ability to calculate the number of elements that satisfy certain conditions. If these conditions can be represented by linear constraints, then such problems are equivalent to counting the number of integer points in (possibly) parametric polytopes. It is well known that the enumerator of such a set can be represented by an explicit function consisting of a set of quasi-polynomials each associated with a chamber in the parameter space. Previously, interpolation was used to obtain these quasi-polynomials, but this technique has several disadvantages. Its worstcase computation time for a single quasi-polynomial is exponential in the input size, even for fixed dimensions. The worst-case size of such a quasi-polynomial (measured in bits needed to represent the quasi-polynomial) is also exponential in the input size. Under certain conditions this technique even fails to produce a solution. Our main contribution is a novel method for calculating the required quasipolynomials analytically. It extends an existing method, based on Barvinok’s decomposition,
Global Multimedia System Design Exploration using Accurate Memory Organization Feedback
, 1999
"... Successful exploration of system-level design decisions is impossible without fast and accurate estimation of the impact on the system cost. In most multimedia applications, the dominant cost factor is related to the organization of the memory architecture. This paper presents a systematic approach ..."
Abstract
-
Cited by 10 (2 self)
- Add to MetaCart
Successful exploration of system-level design decisions is impossible without fast and accurate estimation of the impact on the system cost. In most multimedia applications, the dominant cost factor is related to the organization of the memory architecture. This paper presents a systematic approach which allows effective system-level exploration of memory organization design alternatives, based on accurate feedback by using our earlier developed tools. The effectiveness of this approach is illustrated on an industrial application. Applying our approach, a substantial part of the design search space has been explored in a very short time, resulting in a cost-efficient solution which meets all design constraints.
Exploiting Off-Chip Memory Access Modes in High-Level Synthesis
, 1997
"... Memory-intensive behaviors often contain large arrays that are synthesized into off-chip memories. With the increasing gap between on-chip and off-chip memory access delays, it is imperative to exploit the efficient access mode features of modern-day memories (e.g., page-mode DRAMs) in order to alle ..."
Abstract
-
Cited by 9 (2 self)
- Add to MetaCart
Memory-intensive behaviors often contain large arrays that are synthesized into off-chip memories. With the increasing gap between on-chip and off-chip memory access delays, it is imperative to exploit the efficient access mode features of modern-day memories (e.g., page-mode DRAMs) in order to alleviate the memory bandwidth bottleneck. Our work addresses this issue by: (a) modeling realistic off-chip memory access modes for High-Level Synthesis (HLS), (b) presenting algorithms to infer applicability of HLS with these memory access modes, and (c) transforming input behavior to provide further memory access optimizations during HLS. We demonstrate the utility of our approach using a suite of memory-intensive benchmarks with a realistic DRAM library module. Experimental results show a significant performance improvement (more than 40%) as a result of our optimization techniques. 1 Introduction As library modules used in ASIC designs increase in complexity, existing high-level synthesis ...

