Results 1-10 of 155
The Landscape of Parallel Computing Research: A View from Berkeley
Technical Report, UC Berkeley, 2006
Tempest and Typhoon: User-Level Shared Memory
In Proceedings of the 21st Annual International Symposium on Computer Architecture, 1994
Cited by 300 (26 self)
Abstract:
Future parallel computers must efficiently execute not only hand-coded applications but also programs written in high-level, parallel programming languages. Today’s machines limit these programs to a single communication paradigm, either message-passing or shared-memory, which results in uneven performance. This paper addresses this problem by defining an interface, Tempest, that exposes low-level communication and memory-system mechanisms so programmers and compilers can customize policies for a given application. Typhoon is a proposed hardware platform that implements these mechanisms with a fully-programmable, user-level processor in the network interface. We demonstrate the utility of Tempest with two examples. First, the Stache protocol uses Tempest’s fine-grain access control mechanisms to manage part of a processor’s local memory as a large, fully-associative cache for remote data. We simulated Typhoon on the Wisconsin Wind Tunnel and found that Stache running on Typhoon performs comparably (±30%) to an all-hardware Dir_N NB cache-coherence protocol for five shared-memory programs. Second, we illustrate how programmers or compilers can use Tempest’s flexibility to exploit an application’s sharing patterns with a custom protocol. For the EM3D application, the custom protocol improves performance up to 35% over the all-hardware protocol.
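The mechanism the abstract describes can be sketched in miniature: each local memory block carries an access tag, and a user-level handler (here, a Stache-like miss handler) runs when an access violates the tag, fetching the block from a remote node. This is a hypothetical Python sketch, not Tempest's actual API; all class and method names are illustrative.

```python
# Hypothetical sketch of Tempest-style fine-grain access control. A block's
# tag decides whether an access proceeds or traps into a user-level handler;
# the handler here implements a Stache-like fetch-on-miss policy.

from enum import Enum

class Tag(Enum):
    INVALID = 0     # any access faults
    READ_ONLY = 1   # writes fault
    WRITABLE = 2    # no faults

class StacheCache:
    """Part of local memory managed as a fully-associative cache for remote data."""
    def __init__(self, fetch_remote):
        self.tags = {}                    # block address -> Tag
        self.data = {}                    # block address -> block contents
        self.fetch_remote = fetch_remote  # user-supplied communication routine

    def load(self, block):
        if self.tags.get(block, Tag.INVALID) is Tag.INVALID:
            # access fault: the user-level protocol handler fetches the block
            self.data[block] = self.fetch_remote(block)
            self.tags[block] = Tag.READ_ONLY
        return self.data[block]           # tagged blocks are served locally
```

The point of the design is that `fetch_remote` (and the tag transitions) are ordinary user code, so a compiler or programmer can swap in a custom protocol without hardware changes.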
Prefuse: A toolkit for interactive information visualization
In ACM Human Factors in Computing Systems (CHI), 2005
Cited by 264 (4 self)
Abstract:
In this demonstration we present prefuse, an extensible user interface toolkit for building interactive information visualization applications, including node-link diagrams, containment diagrams, and visualizations of unstructured (edge-free) data such as scatter plots and timelines. prefuse centers on mapping data into visual forms and then manipulating visual data in aggregate, including layout, animation, and distortion routines. The result is a platform for creating scalable, highly interactive visualizations of large data sets in a modular and principled fashion. We have used prefuse to implement both novel and existing visualizations, validating the toolkit’s power and expressiveness.
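The two-stage pipeline the abstract describes, mapping data into visual forms and then manipulating the visual items in aggregate, can be illustrated with a toy sketch. This is Python (prefuse itself is a Java toolkit) and every name here is illustrative, not part of the prefuse API.

```python
# Toy sketch of a prefuse-style pipeline: raw data records become visual
# items carrying position and color, and aggregate operations (a layout
# routine here) manipulate all visual items together.

class VisualItem:
    """A visual form wrapping one datum with renderable state."""
    def __init__(self, datum):
        self.datum = datum
        self.x = 0.0
        self.y = 0.0
        self.color = "gray"

def to_visual(data):
    """Stage 1: map raw data records into visual items."""
    return [VisualItem(d) for d in data]

def row_layout(items, spacing=10.0):
    """Stage 2: an aggregate manipulation that lays items out on one row."""
    for i, item in enumerate(items):
        item.x = i * spacing
        item.y = 0.0
    return items
```

In the real toolkit, layout, animation, and distortion are all implemented as operations of this second kind, which is what makes the pipeline modular.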
Point Set Surfaces, 2001
Cited by 254 (36 self)
Abstract:
We advocate the use of point sets to represent shapes. We provide a definition of a smooth manifold surface from a set of points close to the original surface. The definition is based on local maps from differential geometry, which are approximated by the method of moving least squares (MLS). We present tools to increase or decrease the density of the points, thus allowing an adjustment of the spacing among the points to control the fidelity of the representation. To display the point set surface, we introduce a novel point rendering technique. The idea is to evaluate the local maps according to the image resolution. This results in high-quality shading effects and smooth silhouettes at interactive frame rates.
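The first stage of the MLS construction the abstract mentions, fitting a local reference plane to nearby points by weighted least squares, can be sketched as follows. This is a simplified sketch with an assumed Gaussian weight centered at the query point; the full MLS definition goes on to fit a bivariate polynomial over this plane, which is omitted here.

```python
# Minimal sketch of the MLS reference-plane step: weight neighbors by a
# Gaussian of their distance to the query point, take the weighted centroid,
# and use the smallest-eigenvalue direction of the weighted covariance as
# the plane normal.

import numpy as np

def local_plane(points, q, h):
    """Return (centroid, unit normal) of the weighted best-fit plane near q."""
    d = np.linalg.norm(points - q, axis=1)
    w = np.exp(-(d / h) ** 2)                   # Gaussian weights, scale h
    c = (w[:, None] * points).sum(0) / w.sum()  # weighted centroid
    cov = (w[:, None] * (points - c)).T @ (points - c)
    evals, evecs = np.linalg.eigh(cov)
    n = evecs[:, 0]                             # smallest-eigenvalue direction
    return c, n

def project_to_plane(points, q, h):
    """Orthogonally project q onto its local reference plane."""
    c, n = local_plane(points, q, h)
    return q - np.dot(q - c, n) * n
```

For points sampled from a plane, this projection already reproduces the surface exactly; the polynomial stage matters only where the surface curves.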
Computing and Rendering Point Set Surfaces
IEEE Transactions on Visualization and Computer Graphics, 9
Spectral Partitioning Works: Planar graphs and finite element meshes
In IEEE Symposium on Foundations of Computer Science, 1996
Cited by 153 (8 self)
Abstract:
Spectral partitioning methods use the Fiedler vector, the eigenvector of the second-smallest eigenvalue of the Laplacian matrix, to find a small separator of a graph. These methods are important components of many scientific numerical algorithms and have been demonstrated by experiment to work extremely well. In this paper, we show that spectral partitioning methods work well on bounded-degree planar graphs and finite element meshes, the classes of graphs to which they are usually applied. While naive spectral bisection does not necessarily work, we prove that spectral partitioning techniques can be used to produce separators whose ratio of vertices removed to edges cut is O(√n) for bounded-degree planar graphs and two-dimensional meshes and O(n^(1/d)) for well-shaped d-dimensional meshes. The heart of our analysis is an upper bound on the second-smallest eigenvalues of the Laplacian matrices of these graphs. 1. Introduction Spectral partitioning has become one of the mos...
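The basic spectral bisection procedure the abstract analyzes can be sketched directly: form the Laplacian, compute the Fiedler vector, and split the vertices at its median entry. This is an illustrative sketch of the standard method, not code from the paper.

```python
# Spectral bisection sketch: the Fiedler vector is the eigenvector of the
# second-smallest eigenvalue of the Laplacian L = D - A; splitting vertices
# at its median value yields a balanced two-way partition.

import numpy as np

def spectral_bisect(adj):
    """adj: symmetric 0/1 adjacency matrix. Returns two vertex lists."""
    deg = adj.sum(axis=1)
    lap = np.diag(deg) - adj                  # graph Laplacian L = D - A
    evals, evecs = np.linalg.eigh(lap)        # eigenvalues in ascending order
    fiedler = evecs[:, 1]                     # second-smallest eigenvalue's vector
    median = np.median(fiedler)
    part_a = [v for v in range(len(adj)) if fiedler[v] <= median]
    part_b = [v for v in range(len(adj)) if fiedler[v] > median]
    return part_a, part_b
```

On a path graph this recovers the obvious middle cut; the paper's contribution is proving the cut is provably small for bounded-degree planar graphs and well-shaped meshes.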
Efficient Support for Irregular Applications on Distributed-Memory Machines, 1995
Cited by 91 (13 self)
Abstract:
Irregular computation problems underlie many important scientific applications. Although these problems are computationally expensive, and so would seem appropriate for parallel machines, their irregular and unpredictable run-time behavior makes this type of parallel program difficult to write and adversely affects run-time performance. This paper explores three issues crucial to the efficient execution of irregular problems on distributed-memory machines: partitioning, mutual exclusion, and data transfer. Unlike previous work, we studied the same programs running in three alternative systems on the same hardware base (a Thinking Machines CM-5): the CHAOS irregular application library, Transparent Shared Memory (TSM), and eXtensible Shared Memory (XSM). CHAOS and XSM performed equivalently for all three applications. Both systems were somewhat (13%) to significantly (991%) faster than TSM.
Special Purpose Parallel Computing
Lectures on Parallel Computation, 1993
Cited by 77 (5 self)
Abstract:
A vast amount of work has been done in recent years on the design, analysis, implementation and verification of special purpose parallel computing systems. This paper presents a survey of various aspects of this work. A long, but by no means complete, bibliography is given. 1. Introduction Turing [365] demonstrated that, in principle, a single general purpose sequential machine could be designed which would be capable of efficiently performing any computation which could be performed by a special purpose sequential machine. The importance of this universality result for subsequent practical developments in computing cannot be overstated. It showed that, for a given computational problem, the additional efficiency advantages which could be gained by designing a special purpose sequential machine for that problem would not be great. Around 1944, von Neumann produced a proposal [66, 389] for a general purpose stored-program sequential computer which captured the fundamental principles of...
Load Balancing and Data Locality in Adaptive Hierarchical N-body Methods: Barnes-Hut, Fast Multipole, and Radiosity
Journal of Parallel and Distributed Computing, 1995
Cited by 67 (2 self)
Abstract:
... processes, are increasingly being used to solve large-scale problems in a variety of scientific/engineering domains. Applications that use these methods are challenging to parallelize effectively, however, owing to their non-uniform, dynamically changing characteristics and their need for long-range communication.
Scalable Parallel Formulations of the Barnes-Hut Method for n-Body Simulations
In Proceedings of Supercomputing '94, 1994
Cited by 44 (7 self)
Abstract:
In this paper, we present two new parallel formulations of the Barnes-Hut method. These parallel formulations are especially suited for simulations with irregular particle densities. We first present a parallel formulation that uses a static partitioning of the domain and assignment of subdomains to processors. We demonstrate that this scheme delivers acceptable load balance and, coupled with two collective communication operations, yields good performance. We present a second parallel formulation which combines static decomposition of the domain with an assignment of subdomains to processors based on Morton ordering. This alleviates the load imbalance inherent in the first scheme. The second parallel formulation is inspired by the two currently best-known parallel algorithms for the Barnes-Hut method. We present an experimental evaluation of these schemes on a 256-processor nCUBE2 parallel computer for an astrophysical simulation.
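The Morton-ordering idea behind the second scheme can be sketched concretely: cells are ordered along a Z-order space-filling curve by bit-interleaving their integer coordinates, and contiguous runs of that order go to successive processors, so each processor gets spatially clustered work. This is an illustrative sketch, not the paper's implementation; the 2D case and the even-chunking policy are assumptions.

```python
# Morton (Z-order) assignment sketch: interleaving coordinate bits gives a
# 1D key whose ordering keeps spatially nearby cells close together, so
# contiguous chunks of the sorted order form compact per-processor regions.

def morton_key(x, y, bits=16):
    """Interleave the bits of integer cell coordinates (x, y) into one key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)      # x bits land in even positions
        key |= ((y >> i) & 1) << (2 * i + 1)  # y bits land in odd positions
    return key

def assign_to_processors(cells, nprocs):
    """cells: list of (x, y) integer coords. Returns a cell -> processor map."""
    ordered = sorted(cells, key=lambda c: morton_key(*c))
    chunk = -(-len(ordered) // nprocs)        # ceil division: cells per processor
    return {c: i // chunk for i, c in enumerate(ordered)}
```

Weighting each cell by its particle count instead of cutting into equal-sized chunks would address the load imbalance the abstract mentions for irregular densities.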