## Scalable Computing (1996)

Venue: Computer Science Today: Recent Trends and Developments

Citations: 84 (3 self)

### BibTeX

@INPROCEEDINGS{McColl96scalablecomputing,

author = {W F McColl},

title = {Scalable Computing},

booktitle = {Computer Science Today: Recent Trends and Developments},

year = {1996},

pages = {46--61},

publisher = {Springer-Verlag}

}

### Abstract

Scalable computing will, over the next few years, become the normal form of computing. In this paper we present a unified framework, based on the BSP model, which aims to serve as a foundation for this evolutionary development. A number of important techniques, tools and methodologies for the design of sequential algorithms and programs have been developed over the past few decades. In the transition from sequential to scalable computing we will find that new requirements such as universality and predictable performance will necessitate significant changes of emphasis in these areas. Programs for scalable computing, in addition to being fully portable, will have to be efficiently universal, offering high performance, in a predictable way, on any general purpose parallel architecture. The BSP model provides a discipline for the design of scalable programs of this kind. We outline the approach and discuss some of the issues involved.

1 Introduction For fifty years, sequential computin...

### Citations

1950 | Random graphs
- Bollobás
- 1985
Citation Context: ...in a way which minimises the number of arcs between different subsets. Lower bounds on the efficiency of such partitions can be derived from known isoperimetric inequalities in graph theory, see e.g. [4]. For the EXPANDER matrix, the best upper bound which we have is the trivial one, $n/p + (n/p) \cdot g + l$, which can be obtained by randomly distributing the matrix elements. Techniques similar to th...

1316 | On computable numbers, with an application to the Entscheidungsproblem
- Turing
- 1936
Citation Context: ...sequential computing has been the widespread, almost universal, adoption of the basic model proposed by von Neumann. To see why this happened it is necessary to go back further, to the work of Turing [20]. In his theoretical studies, Turing demonstrated that a single general purpose sequential machine could be designed which would be capable of efficiently performing any computation which could be per...

1192 | A bridging model for parallel computation
- Valiant
- 1990
Citation Context: ... and get requests, and of the replies to get requests, through the network of processors. A number of efficient routing and memory management techniques have been developed for this problem, see e.g. [13, 21, 22]. Consider the problem of packet routing on a p-processor network. Let an h-relation denote a routing problem where each processor has at most h packets to send to various processors in the network, a...

670 | PVM: Parallel Virtual Machine: A Users’ Guide and Tutorial for Networked Parallel Computing
- Geist, Beguelin, et al.
- 1994
Citation Context: ... describe a number of the programming languages and libraries which are currently available to support BSP programming, and some which are currently under development. The PVM message passing library [8] is widely implemented and widely used. The MPI message passing interface [16] is more elaborate. It supports blocking and non-blocking point-to-point communication and a number of collective communic...

208 | General Purpose Parallel Architectures
- Valiant
- 1990
Citation Context: ... and get requests, and of the replies to get requests, through the network of processors. A number of efficient routing and memory management techniques have been developed for this problem, see e.g. [13, 21, 22]. Consider the problem of packet routing on a p-processor network. Let an h-relation denote a routing problem where each processor has at most h packets to send to various processors in the network, a...

178 | I/O complexity: The red-blue pebble game
- Hong, Kung
- 1981
Citation Context: ...e of each of its vector elements $u_i$ by adding the $p^{1/2}$ values $u_i^k$ which it receives. The time required for this final step is $n/p^{1/2} + l$. An input-output complexity argument, similar to those in [1, 11], can be used to show that for any BSP algorithm which computes $u = M \cdot v$, if $W(n, p) = n^2/p$ then $H(n, p) \geq n/p^{1/2}$. Noting that the $n^2$ sequential computation cost is itself optimal we see tha...

98 | Communication complexity of PRAMs
- Aggarwal, Chandra, et al.
- 1990
Citation Context: ...e of each of its vector elements $u_i$ by adding the $p^{1/2}$ values $u_i^k$ which it receives. The time required for this final step is $n/p^{1/2} + l$. An input-output complexity argument, similar to those in [1, 11], can be used to show that for any BSP algorithm which computes $u = M \cdot v$, if $W(n, p) = n^2/p$ then $H(n, p) \geq n/p^{1/2}$. Noting that the $n^2$ sequential computation cost is itself optimal we see tha...

78 | General purpose parallel computing
- McColl
- 1993
Citation Context: ... and get requests, and of the replies to get requests, through the network of processors. A number of efficient routing and memory management techniques have been developed for this problem, see e.g. [13, 21, 22]. Consider the problem of packet routing on a p-processor network. Let an h-relation denote a routing problem where each processor has at most h packets to send to various processors in the network, a...

72 | Scientific computing on bulk synchronous parallel architectures
- Bisseling, McColl
- 1994
Citation Context: ... matrices. 4.1 Dense Matrix-Vector Multiplication In this section we consider the problem of multiplying an $n \times n$ matrix $M$ by an $n$-element vector $v$ on $p$ processors, where $M, v$ are both dense. In [3], the BSP cost of this problem, for both sparse and dense matrices, was theoretically and experimentally analysed. The BSP algorithm which we now describe was shown there. Its complexity is $n^2/p$ + (...

58 | A library for bulk synchronous parallel programming
- Miller
- 1993
Citation Context: ...unications (broadcast, scatter, gather, reduction etc.). Although neither of these libraries is directly aimed at supporting BSP programming, they can be used for that purpose. The Oxford BSP Library [17] consists of a set of subroutines which can be called from standard sequential languages such as Fortran and C. The core of the Library consists of just six routines: bsp start, bsp finish, bsp sstep,...

53 | Empirical Evaluation of the CRAY-T3D: A Compiler Perspective
- Arpaci, Culler, et al.
- 1995
Citation Context: ...ffering extremely high bandwidth global communications, it has several specialised hardware mechanisms which enable it to efficiently support parallel programs which execute in a global address space [2]. The mechanisms include hardware barrier synchronisation and direct remote memory access. The latter permits each processor to get a value directly from any remote memory location in the machine, and...

47 | Efficient optical communication in parallel computers
- Gereb-Graus, Tsantilas
- 1992
Citation Context: ...n, such as e.g. a real number or an integer. Using two-phase randomised routing one can, for example, show that every (log p)-relation can be realised on a p processor hypercube in O(log p) steps. In [9] a simple and practical randomised method of routing h-relations on an optical communications system is described. The optical system is physically realistic and the method requires only O(h + log p lo...

22 | A combining mechanism for parallel computers
- Valiant
- 1992
Citation Context: ...anism for the efficient barrier synchronisation of the processors. There are no specialised broadcasting or combining facilities, although these can be efficiently realised in software where required [23]. The model also does not deal directly with issues such as input-output or the use of vector units, although it can be easily extended to do so. If we define a time step to be the time required for a...

10 | Efficient communication using total-exchange
- Rao, Suel, et al.
- 1995
Citation Context: ...ical randomised method of routing h-relations on an optical communications system is described. The optical system is physically realistic and the method requires only O(h + log p log log p) steps. In [18] a simple and very efficient protocol for routing h-relations using only the total-exchange primitive is described. The process of architectural convergence which was described above brings with it th...

3 | Bulk synchronous parallel computing - a paradigm for transportable software
- Cheatham, Fahmy, Stefanescu, Valiant
- 1995
Citation Context: ...ge extension is based on a small set of global access primitives and simple memory management declarations which support both bulk synchronous and message driven styles of parallel programming. BSP-L [6] is an experimental BSP programming language under development at Harvard. The language is being used to explore the effectiveness of various constructs which might be added to conventional languages ...

3 | Sparse matrix vector multiplication on distributed architectures: Lower bound and average complexity results
- Manzini
- 1994
Citation Context: ...he EXPANDER matrix, the best upper bound which we have is the trivial one, $n/p + (n/p) \cdot g + l$, which can be obtained by randomly distributing the matrix elements. Techniques similar to those in [12] can be used to show that for the EXPANDER matrix there is no partition of the nodes into p equal sized subsets which gives a value for $H(n, p)$ which is less than the trivial $n/p$. Therefore, for the E...

3 | Foundations of Parallel Programming, volume 6
- Skillicorn
- 1994
Citation Context: ...ant niche within the field of scalable computing. A number of interesting programming languages and elegant theories have been developed in support of the data parallel style of programming, see e.g. [19]. The BSP approach, as outlined in this paper, aims to offer a more flexible and general style of programming than is provided by data parallelism. The two approaches are not, however, incompatible in...

1 |
A Dusseau
- Culler
- 1993
Citation Context: ...e to any implementation of the language. With a reliable cost model of this kind, the programmer will be able to make appropriate design decisions to achieve the highest possible performance. Split-C [7] is a parallel extension of C which supports efficient access to a global address space on current distributed memory architectures. Like GPL, it aims to support careful engineering and optimisation o...
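
### Illustrative cost sketch

Several of the citation contexts above quote BSP cost expressions made of a computation term, a communication term scaled by g, and a synchronisation term l (for example, the trivial EXPANDER bound of n/p plus (n/p) times g plus l). The Python sketch below shows how such an expression can be evaluated for given machine parameters. It is an illustration only, not code from the paper: the names `BSPMachine`, `superstep_cost` and `trivial_expander_bound` are hypothetical, and the single-superstep form w + h*g + l is assumed as the standard BSP cost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BSPMachine:
    """Hypothetical container for BSP machine parameters (names are assumptions)."""
    p: int      # number of processors
    g: float    # communication cost per unit of an h-relation, in time steps
    l: float    # barrier synchronisation cost, in time steps


def superstep_cost(w: float, h: float, machine: BSPMachine) -> float:
    """Cost of one superstep, assuming the standard form w + h*g + l.

    w is the maximum local computation on any processor and h is the
    maximum number of values any processor sends or receives.
    """
    return w + h * machine.g + machine.l


def trivial_expander_bound(n: int, machine: BSPMachine) -> float:
    """Evaluate the trivial bound n/p + (n/p)*g + l quoted in the contexts."""
    per_processor = n / machine.p
    return superstep_cost(per_processor, per_processor, machine)


if __name__ == "__main__":
    # Made-up parameter values, chosen only to exercise the formula.
    machine = BSPMachine(p=4, g=2.0, l=10.0)
    print(trivial_expander_bound(100, machine))
```

Keeping g and l as explicit parameters reflects the predictable-performance aim stated in the abstract: the same expression can be re-evaluated for any target machine once its parameters are known.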