Synthesis of application specific instructions for embedded DSP software
IEEE Transactions on Computers, 1999
Cited by 26 (0 self)
Application-specific instructions play an important role in reducing code size and increasing performance. This paper describes a new approach to generating application-specific instructions for DSP applications. The proposed approach is based on a modified subset-sum problem and supports multi-cycle complex instructions as well as single-cycle instructions, whereas previous state-of-the-art approaches can generate only single-cycle instructions or can only select instructions from a fixed superset of possible instructions. The proposed approach is also applicable when the instructions are predefined. Experimental results on real applications show that the approach is effective in making the instructions meet the given constraints without attaching special hardware accelerators.
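The abstract above names a modified subset-sum problem but does not say what the modification is. For orientation only, here is a minimal sketch of the classic (unmodified) subset-sum problem: pick a subset of non-negative integers that adds up exactly to a target. The function name `subset_sum` and the sample values are illustrative assumptions and are not taken from the paper.

```python
def subset_sum(values, target):
    """Return one subset of `values` whose sum equals `target`, or None.

    Classic dynamic-programming formulation for non-negative integers;
    this is NOT the paper's modified variant, which the abstract does
    not specify.
    """
    reachable = {0: []}  # reachable sum -> one list of values producing it
    for v in values:
        # Snapshot the items so each value is used at most once.
        for s, subset in list(reachable.items()):
            t = s + v
            if t <= target and t not in reachable:
                reachable[t] = subset + [v]
    return reachable.get(target)


result = subset_sum([3, 5, 7], 12)
missing = subset_sum([3, 5, 7], 11)
```

Here `result` is a list whose elements sum to 12, and `missing` is `None` because no subset of the sample values adds up to 11.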
Hardware/software co-design of digital telecommunication systems
Proceedings of the IEEE, 1997
Cited by 25 (2 self)
In this paper we reflect on the nature of digital telecommunication systems. We argue that these systems require, by nature, a heterogeneous specification and an implementation with heterogeneous architectural styles. CoWare is a hardware/software co-design environment based on a data model that allows heterogeneous hardware/software architectures to be specified, simulated, and synthesized from a heterogeneous specification. CoWare is based on the principle of encapsulating existing hardware and software compilers, and special attention is paid to the interactive synthesis of hardware/software and hardware/hardware interfaces. The principles of CoWare are illustrated by the design process of a spread-spectrum receiver for a pager system.
Hardware/software partitioning for multi-function systems
In Proceedings of the 1997 IEEE/ACM International Conference on Computer-Aided Design, 1997
Cited by 23 (0 self)
We are interested in optimizing the design of multifunction embedded systems such as multistandard audio/video codecs and multisystem phones. Such systems run a prespecified set of applications, and any one of the applications is selected at run time, depending on system parameters. Our goal is to develop a methodology for the efficient design of such systems. A key observation underlying our method is that it may not be efficient to design for each application separately. This is attributed to two factors. First, considering each application in isolation can lead to application-specific decisions that do not necessarily lead to the best overall system solution. Second, these applications typically have several commonalities, and considering applications independently may lead to inconsistent mappings of common tasks in different applications. Our approach is to optimize jointly across the set
Procedure Exlining: A Transformation for Improved System and Behavioral Synthesis
In International Symposium on System Synthesis, 1995
Cited by 20 (12 self)
We present techniques for solving the inverse problem of procedure inlining, namely the problem of replacing sequences of statements with procedure calls. Two techniques are provided, one for finding redundant sequences of statements that can be replaced by calls to one procedure, and another for dividing a large set of statements into several procedures, where each procedure performs a distinct computation. Such procedure exlining can transform a behavioral specification, originally written for readability, into a specification that can be implemented efficiently, because procedures can greatly improve the results of synthesis tools. We demonstrate the usefulness of the techniques on several examples.
1 Introduction
A functional specification serves the purpose of precisely defining a system's intended behavior. Such a specification usually will be read by humans as well as input to synthesis tools. Unfortunately, a specification written for readability may not directly lead to the ...
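As a rough illustration of the first technique mentioned in the abstract above (finding redundant statement sequences that a single procedure call could replace), the sketch below scans a statement list for contiguous runs that occur more than once. It treats statements as opaque strings and requires exact textual matches; the function name `find_repeated_sequences`, the fixed window length, and the sample program are assumptions made for illustration and do not reproduce the paper's algorithm.

```python
from collections import defaultdict


def find_repeated_sequences(statements, min_len=2):
    """Map each contiguous run of `min_len` statements that occurs more
    than once to the list of its start indices.

    Matching is exact and textual; occurrences may overlap, and a real
    exlining pass would also have to compare variables and data flow.
    """
    seen = defaultdict(list)
    for i in range(len(statements) - min_len + 1):
        window = tuple(statements[i:i + min_len])
        seen[window].append(i)
    return {seq: starts for seq, starts in seen.items() if len(starts) > 1}


program = [
    "t = a + b",
    "u = t * c",
    "x = u - 1",
    "t = a + b",
    "u = t * c",
    "y = u + 2",
]
repeats = find_repeated_sequences(program, min_len=2)
```

In this sample, the pair `t = a + b; u = t * c` appears at indices 0 and 3, so it is the only candidate for extraction into a shared procedure.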
Hierarchical design space exploration for a class of digital systems
IEEE Transactions on VLSI, v, 1993
Cited by 12 (0 self)
This paper presents an architectural synthesis approach for a widely used class of digital systems characterized by inherent regularity in their description. This approach relies on a novel modeling or abstraction of the problem domain to facilitate a hierarchical solution method. The modeling is based on exploiting the inherent regularity in the system description to cluster its behavioral operations. The method emphasizes prudent postponement of design decisions until enough physical design information is available to estimate layout effects like wiring; we use well-known area-delay estimators for this purpose. The approach has the advantage that it keeps track of a set of potentially good candidate solutions, rather than narrowing down to a single solution very early in the design process. Through an extensive set of experiments on well-known DSP design examples, we demonstrate the advantages that such distinctive features have to offer, and we present the impact of hierarchy on several important issues such as interconnection area and the extent of the design space explored.
Design of heterogeneous IC’s for mobile and personal communication systems
In Proc. IEEE Int. Conf. on Computer-Aided Design, ICCAD’94, 1994
Cited by 8 (6 self)
Mobile and personal communication systems form key market areas for the electronics industry of the nineties. Stringent requirements in terms of flexibility, performance and power dissipation are driving the development of integrated circuits in the direction of heterogeneous single-chip solutions. New IC architectures are emerging which contain the core of a powerful programmable processor, complemented with dedicated hardware, memory and interface structures. In this tutorial we will discuss the real-life design of a heterogeneous IC for an industrial telecom application: a reconfigurable mobile terminal for satellite communication. Based on this practical design experience, we will subsequently discuss a methodology for the design of heterogeneous ICs. Design steps that will be addressed include: system specification and refinement, data path and communication synthesis, and code generation for embedded processor cores.
Local watermarks: methodology and application to behavioral synthesis
IEEE International Conference on Computer-Aided Design, 1999
Cited by 8 (3 self)
Recently, a number of techniques for IP protection have been introduced that rely on selecting a global solution to an optimization problem according to a unique user-specific digital signature. Although such techniques may provide convincing proof of authorship with low hardware overhead, they fail to protect parts of a design, do not provide an easy procedure for watermark detection, and cannot detect the watermark when the design or a part of it is augmented in another, larger design. Since these demands are of the highest interest for the IP business, we introduce localized watermarking as an IP protection technique that enables these features while satisfying the demand for low cost and transparency. We propose a set of protocols that implement the new watermarking methodology at the operation scheduling design level. We demonstrate that erasing a signature or finding another signature in the synthesized design can be made arbitrarily computationally difficult. The watermarking method has been tested on a set of real-life benchmarks, where a high likelihood of authorship was achieved with negligible overhead in solution quality.
Flexible Synthesis of Hierarchy with PMOSS
1995
Cited by 3 (0 self)
This paper introduces the Paderborn Modular Synthesis System, PMOSS [GeHo94]. The proposed flexible methodology for handling function hierarchy can be used for High-Level Synthesis and HW/SW-Codesign. PMOSS is built on top of the Paderborn Synthesis Format, PSF [HoCa92]. Its graph-based representation consists of a behavioral and a structural layer. These layers serve as a database for different synthesis applications. Synthesis and Codesign in PMOSS are performed in a modular fashion (i.e., all stages of the synthesis process are highly parameterized) to provide higher flexibility during synthesis. To support the synthesis of specifications from the software domain (e.g., obtained from a HW/SW-Partitioning process [Hardt95]), PMOSS supports a new methodology to handle large applications with medium throughput requirements. Besides the `classical' handling of function hierarchy, inlining (flattening the design) and synthesis as separate entities, PMOSS also allows functions to be synthesized as coroutines: based on the assumption that there is only one thread (i.e., at any time there is only one active function), a global set of resources can be used for allocation while preserving the hierarchy of the specification. The most important feature for hierarchy handling in PMOSS is the automatic restructuring mechanism: based on a new metric for dataflow and controlflow similarity, parts of the specification can be merged into a new level of hierarchy. Thus specifications can be compressed and handled as a set of individual entities or coroutines.
Application-specific clustered VLIW datapaths: Early exploration on a parameterized design space
IEEE Transactions on Computers, 2002
Cited by 3 (1 self)
Specialized clustered very large instruction word (VLIW) processors combined with effective compilation techniques enable aggressive exploitation of the high instruction-level parallelism inherent in many embedded media applications, while unlocking a variety of possible performance/cost tradeoffs. In this work, the authors propose a methodology to support early design space exploration of clustered VLIW datapaths, in the context of a specific target application. They argue that, due to the large size and complexity of the design space, the early design space exploration phase should consider only design space parameters that have a first-order impact on two key physical figures of merit: clock rate and power dissipation. These parameters were found to be: maximum cluster capacity, number of clusters, and bus (interconnect) capacity. Experimental validation of their design space exploration algorithm shows that a thorough exploration of the complex design space can be performed very efficiently in this abstract parameterized design space. Moreover, an empirical study carried out on a representative set of computation-intensive benchmarks suggests that “penalties” of clustered versus centralized datapaths are often minimal and that clustering indeed unlocks a variety of valuable design tradeoffs. Index Terms: Application-specific processors, compilers, custom VLIW datapaths, design space exploration, embedded systems, power optimization.
Algorithms for Compiler-Assisted Design Space Exploration of Clustered VLIW ASIP Datapaths
2001
Cited by 3 (1 self)
Clustered Very Large Instruction Word Application-Specific Instruction Set Processors (VLIW ASIPs) combined with effective compilation techniques enable aggressive exploitation of the instruction level parallelism inherent in many embedded media applications, while unlocking a variety of possible performance/cost tradeoffs. In this dissertation we propose and validate an algorithm to support early design space exploration (DSE) over classes of datapaths, in the context of a specific target application, and carry out an empirical study for a set of representative benchmarks. We argue that at an early DSE phase one should use design space parameters that have a first-order impact on two key physical figures of merit: clock rate f and power dissipation P. We found these parameters to be: maximum cluster capacity (number of functional units in a cluster) NF, number of clusters NC, and the interconnect capacity NB. The experimental validation of our DSE algorithm shows that a thorough exploration of the complex design space can be performed very efficiently in this parameterized design space. Moreover, our case studies suggest that penalties of clustered versus nonclustered datapaths are often minimal and that clustering indeed unlocks a variety of valuable design alternatives. Our exploration methodology is enabled by an efficient algorithm for binding operations in a dataflow graph to the clusters of a datapath, so as to minimize latency and the number of data transfers. The algorithm utilizes effective cost and ranking functions that enable the exploration of complex tradeoffs between: (1) operation serialization, due to cluster overload; and (2) penalties incurred by data transfers, due to scattering operations with data dependencies over different clusters.
The core binding algorithm has shown robustness over a large set of datapaths and application kernels, and demonstrated up to 29% improvement in schedule latency compared to a state-of-the-art advanced binding algorithm.
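The tradeoff described in the abstract above, serialization from cluster overload versus data transfers between clusters, can be made concrete with a toy greedy binder. This is only a sketch under simplifying assumptions: `capacity` caps the total number of operations per cluster (a crude stand-in for per-cycle functional-unit limits), the cost is a plain sum of overload and transfer counts, and the names `bind_operations`, `deps`, and `transfer_cost` are hypothetical. It is not the dissertation's binding algorithm or its cost and ranking functions.

```python
def bind_operations(ops, deps, num_clusters, capacity, transfer_cost=1):
    """Greedily bind each operation to a cluster.

    `ops` must be in topological order; `deps` maps an operation to the
    operations it consumes. The cost of placing an operation on a cluster
    is the overload beyond `capacity` plus `transfer_cost` for every
    predecessor bound to a different cluster. Ties go to the lowest index.
    """
    binding = {}
    load = [0] * num_clusters
    for op in ops:
        best_cluster, best_cost = None, None
        for c in range(num_clusters):
            overload = max(0, load[c] + 1 - capacity)
            transfers = sum(1 for p in deps.get(op, []) if binding[p] != c)
            cost = overload + transfer_cost * transfers
            if best_cost is None or cost < best_cost:
                best_cluster, best_cost = c, cost
        binding[op] = best_cluster
        load[best_cluster] += 1
    return binding


# Small dataflow graph: c consumes a and b, d consumes c.
ops = ["a", "b", "c", "d"]
deps = {"c": ["a", "b"], "d": ["c"]}
binding = bind_operations(ops, deps, num_clusters=2, capacity=2)
```

With these inputs, `a`, `b`, and `c` stay together on cluster 0 because moving `c` would cost two transfers, while `d` moves to cluster 1 once the overload on cluster 0 outweighs a single transfer.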

