I/O-optimal distribution sweeping on private-cache chip multiprocessors (2011)

by D Ajwani, N Sitchinava, N Zeh
Venue: IPDPS

Results 1 - 3 of 3

On the Sublinear Processor Gap for Parallel Architectures

by Ro López-Ortiz, Ro Salinger

Citation Context

... n/(B log B) is an upper bound for optimal processor utilization for any sorting algorithm in the PEM model [3]. This algorithm is used in further results in the model for graph and geometry problems [4,1,2]. Thus the assumption that p ≤ n/B² is carried on to these results as well, some of which actually require p ≤ n/(B log n) and even p ≤ n/(B² log B log^(t) n), where log^(t) n denotes the compositio...

Empirical Evaluation of the Parallel Distribution Sweeping Framework on Multicore Architectures

by Deepak Ajwani, Nodari Sitchinava

On the Sublinear Processor Gap for Multi-Core Architectures

by Ro López-Ortiz, Ro Salinger
Abstract. In the past, parallel algorithms were developed, for the most part, under the assumption that the number of processors is Θ(n) and that if in practice the actual number was smaller, this could be resolved using Brent’s Lemma to simulate the highly parallel solution on a lower-degree parallel architecture. In this paper, however, we argue that design and implementation issues of algorithms and architectures are significantly different, both in theory and in practice, between computational models with high and low degrees of parallelism. We report an observed gap in the behavior of a CMP/parallel architecture depending on the number of processors. This gap appears repeatedly both in empirical cases, when studying practical aspects of architecture design and program implementation, and in theoretical instances, when studying the behaviour of various parallel algorithms. It separates the performance, design and analysis of systems with a sublinear number of processors and systems with linearly many processors. More specifically, we observe that systems with either logarithmically many cores or with O(n^α) cores (with α < 1) exhibit a qualitatively different behavior than a system with a number of cores linear in the size of the input, i.e. Θ(n). The evidence we present suggests the existence of a sharp theoretical gap between the classes of problems that can be efficiently parallelized with o(n) processors and with Θ(n) processors unless NC = P.
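
The abstract refers to Brent's Lemma as the classical way to run a highly parallel algorithm on fewer processors. A minimal sketch of the standard bound, assuming the usual statement that an algorithm with total work W and depth D runs on p processors in time at most W/p + D. The function names and the tree-summation example are illustrative assumptions, not taken from the paper:

```python
import math


def brent_time_bound(work: int, depth: int, p: int) -> float:
    """Upper bound on running time with p processors: work / p + depth."""
    if p < 1:
        raise ValueError("p must be at least 1")
    return work / p + depth


def tree_sum_costs(n: int) -> tuple[int, int]:
    """Work and depth of summing n >= 1 values with a balanced binary tree."""
    work = n - 1  # one addition per internal node of the tree
    depth = math.ceil(math.log2(n)) if n > 1 else 0  # number of tree levels
    return work, depth


if __name__ == "__main__":
    n = 1 << 20
    work, depth = tree_sum_costs(n)
    # Compare a few processor counts, from a handful of cores up to p = n.
    for p in (4, 1024, n):
        print(f"p={p}: time <= {brent_time_bound(work, depth, p)}")
```

For small p the W/p term dominates the bound, while for p close to n the depth term dominates. This only illustrates the simulation bound itself and does not model the cache or architectural effects the paper discusses.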

Citation Context

...2 processors, and it is actually proven that p ≥ n/(B log B) is a lower bound for optimal processor utilization. This algorithm is used in further results in the model for graph and geometry problems [4, 1, 2]. Thus the assumption that p ≥ n/B² is carried on to these results as well, some of which actually require p ≤ n/(B log n). Shared cache performance is studied in [7], which compares the number of ca...
