Results 1 - 8 of 8
Data remapping for design space optimization of embedded memory systems
- ACM Transactions on Embedded Computing Systems, 2003
Cited by 25 (8 self)
In this article, we present a novel linear-time algorithm for data remapping that is (i) lightweight; (ii) fully automated; and (iii) applicable in the context of pointer-centric programming languages with dynamic memory allocation support. All previous work in this area lacks one or more of these features. We proceed to demonstrate a novel application of this algorithm as a key step in optimizing the design of an embedded memory system. Specifically, we show that by virtue of locality enhancements via data remapping, we may reduce the memory subsystem needs of an application by 50%, and hence concomitantly reduce the associated costs in terms of size, power, and dollar-investment (61%). Such a reduction overcomes key hurdles in designing high-performance embedded computing solutions. Namely, memory subsystems are very desirable from a performance standpoint, but their costs have often limited their use in embedded systems. Thus, our innovative approach offers the intriguing possibility of compilers playing a significant role in exploring and optimizing the design space of a memory subsystem for an embedded design. To this end and in order to properly leverage the improvements afforded by a compiler optimization, we identify a range of measures for quantifying the cost-impact of popular notions of locality, prefetching, regularity of memory access and others. The proposed methodology will
Automated Design of Finite State Machine Predictors for Customized Processors
- In Annual International Symposium on Computer Architecture, 2001
Cited by 5 (2 self)
Customized processors use compiler analysis and design automation techniques to take a generalized architectural model and create a specific instance of it which is optimized to a given application or set of applications. These processors offer the promise of satisfying the high performance needs of the embedded community while simultaneously shrinking design times. Finite State Machines (FSM) are a fundamental building block in computer architecture, and are used to control and optimize all types of prediction and speculation, now even in the embedded space. They are used for branch prediction, cache replacement policies, and confidence estimation and accuracy counters for a variety of optimizations. In this paper, we present a framework for automated design of small FSM predictors for customized processors. Our approach can be used to automatically generate small FSM predictors to perform well over a suite of applications, tailored to a specific application, or even a specific instruction. We evaluate the use of these customized FSM predictors for branch prediction over a set of benchmarks.
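For orientation, a small FSM predictor of the kind this abstract refers to can be illustrated with the classic two-bit saturating counter used in branch prediction. The sketch below is a textbook baseline, not one of the automatically generated predictors the paper describes; the class name, the `accuracy` helper, and the sample branch trace are illustrative assumptions.

```python
class TwoBitCounter:
    """Textbook 2-bit saturating counter.

    States 0 and 1 predict "not taken"; states 2 and 3 predict "taken".
    """

    def __init__(self, state=2):
        # Starting in the weakly-taken state is an arbitrary choice.
        self.state = state

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def accuracy(predictor, outcomes):
    """Fraction of outcomes predicted correctly, updating after each branch."""
    correct = 0
    for taken in outcomes:
        if predictor.predict() == taken:
            correct += 1
        predictor.update(taken)
    return correct / len(outcomes)


# Hypothetical trace: a loop branch taken 7 times, then not taken once, repeated.
trace = ([True] * 7 + [False]) * 4
print(accuracy(TwoBitCounter(), trace))
```

On this trace the counter mispredicts only the loop exit in each period, which is the behavior a customized FSM would try to improve on for a specific branch.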
Application-specific clustered VLIW datapaths: Early exploration on a parameterized design space
- IEEE Transactions on Computer, 2002
Cited by 3 (1 self)
Abstract—Specialized clustered very large instruction word (VLIW) processors combined with effective compilation techniques enable aggressive exploitation of the high instruction-level parallelism inherent in many embedded media applications, while unlocking a variety of possible performance/cost tradeoffs. In this work, the authors propose a methodology to support early design space exploration of clustered VLIW datapaths, in the context of a specific target application. They argue that, due to the large size and complexity of the design space, the early design space exploration phase should consider only design space parameters that have a first-order impact on two key physical figures of merit: clock rate and power dissipation. These parameters were found to be: maximum cluster capacity, number of clusters, and bus (interconnect) capacity. Experimental validation of their design space exploration algorithm shows that a thorough exploration of the complex design space can be performed very efficiently in this abstract parameterized design space. Moreover, an empirical study carried out on a representative set of computation-intensive benchmarks suggests that “penalties” of clustered versus centralized datapaths are often minimal and that clustering indeed unlocks a variety of valuable design tradeoffs. Index Terms—Application-specific processors, compilers, custom VLIW datapaths, design space exploration, embedded systems, power optimization.
Algorithms for Compiler-Assisted Design Space Exploration of Clustered VLIW ASIP Datapaths
- 2001
Cited by 3 (1 self)
Clustered Very Large Instruction Word Application-Specific Instruction Set Processors (VLIW ASIPs) combined with effective compilation techniques enable aggressive exploitation of the instruction level parallelism inherent in many embedded media applications, while unlocking a variety of possible performance/cost tradeoffs. In this dissertation we propose and validate an algorithm to support early design space exploration (DSE) over classes of datapaths, in the context of a specific target application, and carry out an empirical study for a set of representative benchmarks. We argue that at an early DSE phase one should use design space parameters that have a first-order impact on two key physical figures of merit: clock rate f and power dissipation P. We found these parameters to be: maximum cluster capacity (number of functional units in a cluster) NF, number of clusters NC, and the interconnect capacity NB. The experimental validation of our DSE algorithm shows that a thorough exploration of the complex design space can be performed very efficiently in this parameterized design space. Moreover, our case studies suggest that penalties of clustered versus nonclustered datapaths are often minimal and that clustering indeed unlocks a variety of valuable design alternatives. Our exploration methodology is enabled by an efficient algorithm for binding operations in a dataflow graph to the clusters of a datapath, so as to minimize latency and the number of data transfers. The algorithm utilizes effective cost and ranking functions that enable the exploration of complex tradeoffs between: (1) operation serialization, due to cluster overload; and (2) penalties incurred by data transfers, due to scattering operations with data dependencies over different clusters.
The core binding algorithm has shown robustness over a large set of datapaths and application kernels, and demonstrated up to 29% improvement in schedule latency, as compared to a state of the art advanced binding algorithm.
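The tradeoff named in this abstract, serialization from cluster overload versus penalties from inter-cluster data transfers, can be made concrete with a toy greedy heuristic. The sketch below is not the dissertation's binding algorithm: the function name `greedy_bind`, the cost weights, and the use of total per-cluster load as a crude stand-in for serialization are all assumptions made for illustration.

```python
def greedy_bind(ops, deps, num_clusters, capacity,
                transfer_cost=1.0, overload_cost=1.0):
    """Bind each operation to the cluster with the lowest estimated cost.

    ops: operation names, listed in topological order.
    deps: maps an operation to the operations whose results it consumes.
    capacity: functional units per cluster; load beyond this is treated as
        serialization (a crude proxy, since real overload is per cycle).
    """
    binding = {}
    load = [0] * num_clusters
    for op in ops:
        best_cluster, best_cost = None, None
        for c in range(num_clusters):
            # Each producer bound to another cluster needs one data transfer.
            transfers = sum(1 for p in deps.get(op, []) if binding[p] != c)
            # Penalize placing more operations on a cluster than it has units.
            overload = max(0, load[c] + 1 - capacity)
            cost = transfer_cost * transfers + overload_cost * overload
            if best_cost is None or cost < best_cost:
                best_cluster, best_cost = c, cost
        binding[op] = best_cluster
        load[best_cluster] += 1
    return binding


# Hypothetical four-operation dataflow graph: c consumes a and b, d consumes c.
example_ops = ["a", "b", "c", "d"]
example_deps = {"c": ["a", "b"], "d": ["c"]}
print(greedy_bind(example_ops, example_deps, num_clusters=2, capacity=2))
```

In this example the first three operations stay together to avoid transfers, and the fourth moves to the second cluster once the overload penalty exceeds the cost of a single transfer.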
A framework for compiler driven design space exploration for embedded system customization
- In Proceedings of the 9th Asian Computing Science Conference, 2004
Cited by 1 (1 self)
Abstract. Designing custom solutions has been central to meeting a range of stringent and specialized needs of embedded computing, along such dimensions as physical size, power consumption, and performance that includes real-time behavior. For this trend to continue, we must find ways to overcome the twin hurdles of rising non-recurring engineering (NRE) costs and decreasing time-to-market windows by providing major improvements in designer productivity. This paper presents compiler directed design space exploration as a framework for articulating, formulating, and implementing global optimizations for embedded systems customization, where the design space is spanned by parametric representations of both candidate compiler optimizations and architecture parameters, and the navigation of the design space is driven by quantifiable, machine independent metrics. This paper describes the elements of such a framework and an example of its application.
Using Mobile Robotics to Teach Reconfigurable Computing
Abstract—Reconfigurable computing is becoming part of a large number of embedded systems since it efficiently adds both spatial and temporal computing. Its use intends to satisfy requirements (e.g., energy savings, performance, etc.) that are not fulfilled by traditional computing systems. As new FPGAs are becoming real computing platforms, integrating one or more microprocessors and complex components, the future developers of those systems may need to decide about the system organization, integrate reconfigurable hardware components, and develop the software for those microprocessors. Although designing reconfigurable hardware is softening, there is still the need to master digital systems design in order to accomplish most system requirements. This paper presents a proposal and some experiences already conducted on teaching reconfigurable computing to undergraduate and graduate computer science students. Applications of mobile robotics are used to motivate students, and also because they include most embedded system requirements.
In Proceedings of the 28th International Symposium on Computer Architecture (ISCA), June 2001.
Customized processors use compiler analysis and design automation techniques to take a generalized architectural model and create a specific instance of it which is optimized to a given application or set of applications. These processors offer the promise of satisfying the high performance needs of the embedded community while simultaneously shrinking design times.

