Results 1 - 10 of 62

Processor Affinity and MPI Performance on SMP-CMP Clusters

by Chi Zhang, Xin Yuan, Ashok Srinivasan
"... with multi-core Chip-Multiprocessors (CMP), also known as SMP-CMP clusters, are becoming ubiquitous today. For Message Passing Interface (MPI) programs, such clusters have a multilayer hierarchical communication structure: the performance of intra-node communication is usually higher than that of in ..."

Performance Analysis and Modeling of a Computational Biology Code on CMP Clusters (revised July 2008)

by Timothy D. Campbell
"... Abstract: The current trend in parallel computing systems is shifting towards cluster systems with CMPs (chip multiprocessors). Further, the CMPs are usually configured hierarchically (e.g., multiple CMPs compose a multi-chip module and multiple multi-chip modules compose a node) to compose a node ..."

Predictive Analysis of a Hydrodynamics Application on Large-Scale CMP Clusters

by J. A. Davis, G. R. Mudalige, S. D. Hammond, J. A. Herdman, I. Miller, S. A. Jarvis - manuscript for Computer Science - Research and Development
"... Abstract We present the development of a predictive performance model for the high-performance computing code Hydra, a hydrodynamics benchmark developed and maintained by the United Kingdom Atomic Weapons Establishment (AWE). The developed model elucidates the parallel computation of Hydra, with which it is possible to predict its run-time and scaling performance on varying large-scale chip multiprocessor (CMP) clusters. A key feature of the model is its granularity; with the model we are able to separate the contributing costs, including computation, point-to-point communications, collectives ..."

Performance analysis and optimization of parallel scientific applications on CMP cluster systems. Scalable Computing: Practice and Experience, 10(1):188–195, 2009.

by Xingfu Wu, Valerie Taylor, Charles Lively
"... Abstract. Chip multiprocessors (CMP) are widely used for high performance computing. Further, these CMPs are being configured in a hierarchical manner to compose a node in a cluster system. A major challenge to be addressed is efficient use of such cluster systems for large-scale scientific applicat ..."
Cited by 3 (0 self)

Thread clustering: Sharing-aware scheduling on SMP-CMP-SMT multiprocessors

by David Tam, Reza Azimi, Michael Stumm - in EuroSys, 2007
"... The major chip manufacturers have all introduced chip multiprocessing (CMP) and simultaneous multithreading (SMT) technology into their processing units. As a result, even low-end computing systems and game consoles have become shared memory multiprocessors with L1 and L2 cache sharing within a chip ..."
Cited by 86 (4 self)

Adaptive Loop Tiling for a Multi-Cluster CMP

by Jisheng Zhao, Matthew Horsnell, Ian Rogers, Chris Kirkham, Ian Watson
"... Abstract. Loop tiling is a fundamental optimization for improving data locality. Selecting the right tile size combined with the parallelization of loops can provide additional performance increases in modern Chip MultiProcessor (CMP) architectures. This paper presents a runtime optimization system which automatically parallelizes loops and searches empirically for the best tile sizes on a scalable multi-cluster CMP. The system is built on top of a virtual machine and targets the runtime parallelization and optimization of Java programs. Experimental results show that runtime parallelization ..."

Understanding the Energy Efficiency of SMT and CMP with Multiclustering

by Jason Cong, Ashok Jagannathan, Glenn Reinman, Yuval Tamir - in Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005
"... In this paper we study the energy efficiency of SMT and CMP with multiclustering. Through a detailed design space exploration, we show that clustering closes the energy efficiency gap between SMT and CMP at equal performance points. Specifically, we show that the energy efficiency of CMP compared t ..."
Cited by 1 (0 self)

Dynamic Power Aware Packet Processing with CMP

by Zhen Ma, Weifeng Zhang
"... Network processors implemented as systems-on-chip with multiple processors and peripherals offer a reliable means of scaling networks with high link capacities. As more and more co-processors and peripherals are integrated, the power requirement also increases dramatically. Therefore, it is essential to efficiently parallelize the subsystems to maximize the packet processing capacities while maintaining low power consumption. In this project, we propose a power aware packet processing architecture with chip-multiprocessor (CMP), which consists of a number of processor clusters (or arrays). Each array ..."

Performance Analysis and Comparison of MPI, OpenMP and Hybrid NPB-MZ

by Héctor J. Machín Machín
"... Abstract: Chip multiprocessors (CMP) are widely used for high performance computing and are being configured in a hierarchical manner to compose a node in a parallel system. CMP clusters provide a natural programming paradigm for hybrid programs. Can current hybrid parallel programming paradigms suc ..."

Exploring instruction caching strategies for tightly-coupled shared-memory clusters

by Daniele Bortolotti, Francesco Paterna, Christian Pinto, Andrea Marongiu, Martino Ruggiero, Luca Benini - in System on Chip (SoC), 2011 International Symposium on
"... Abstract: Several Chip-Multiprocessor designs today leverage tightly-coupled computing clusters as a building block. These clusters consist of a fairly large number N of simple cores, featuring fast communication through a shared multibanked L1 data memory and ≈ 1 Instruction-Per-Cycle (IPC) per core. Thus, aggregated I-fetch bandwidth approaches f ∗ N, where f is the cluster clock frequency. An effective instruction cache architecture is key to support this I-fetch bandwidth. In this paper we compare two main architectures for instruction caching targeting tightly coupled CMP clusters: (i ..."
Cited by 2 (1 self)