## Implementation and Evaluation of an Efficient Parallel Delaunay Triangulation Algorithm (1997)

Venue: Proceedings of the 9th Annual ACM Symposium on Parallel Algorithms and Architectures

Citations: 12 (2 self)

### BibTeX

@INPROCEEDINGS{Hardwick97implementationand,

author = {Jonathan C. Hardwick},

title = {Implementation and Evaluation of an Efficient Parallel Delaunay Triangulation Algorithm},

booktitle = {Proceedings of the 9th Annual ACM Symposium on Parallel Algorithms and Architectures},

year = {1997},

pages = {23--25}

}

### Abstract

This paper describes the derivation of an empirically efficient parallel two-dimensional Delaunay triangulation program from a theoretically efficient CREW PRAM algorithm. Compared to previous work, the resulting implementation is not limited to datasets with a uniform distribution of points, achieves significantly better speedups over good serial code, and is widely portable due to its use of MPI as a communication mechanism. Results are presented for a loosely-coupled cluster of workstations, a distributed-memory multicomputer, and a shared-memory multiprocessor. The Machiavelli toolkit used to transform the nested data parallelism inherent in the divide-and-conquer algorithm into achievable task and data parallelism is also described and compared to previous techniques.

### Citations

496 | The quickhull algorithm for convex hulls
- Barber, Dobkin, et al.
- 1996
Citation Context ...he circumcircle of any triangle. There are many well-known serial algorithms for Delaunay triangulation. The best have been extensively analyzed [17, 36], and implemented as general-purpose libraries [4, 33]. Since these algorithms are time and memory intensive, parallel implementations are important both for improved performance and to allow the solution of problems that are too large for serial machine...

438 | Triangle: Engineering a 2D quality mesh generator and Delaunay triangulator
- Shewchuk
- 1996
Citation Context ...he circumcircle of any triangle. There are many well-known serial algorithms for Delaunay triangulation. The best have been extensively analyzed [17, 36], and implemented as general-purpose libraries [4, 33]. Since these algorithms are time and memory intensive, parallel implementations are important both for improved performance and to allow the solution of problems that are too large for serial machine...

389 | Gaussian Elimination is Not Optimal - Strassen - 1969

267 | Vector Models for Data-Parallel Computing
- Blelloch
- 1990
Citation Context ...nested data-parallel languages such as Nesl [7] and Proteus [31] are well-suited for expressing irregular divide-and-conquer algorithms, their current implementation layer assumes a vector PRAM model [6]. This can be efficiently implemented on vector processors with high memory bandwidth, but it is harder to do so on current RISC-based NUMA multiprocessor architectures, due to the higher relative cos...

210 | Voronoi diagrams and Delaunay triangulations
- Fortune
- 1992
Citation Context ...of points such that there are no elements of S within the circumcircle of any triangle. There are many well-known serial algorithms for Delaunay triangulation. The best have been extensively analyzed [17, 36], and implemented as general-purpose libraries [4, 33]. Since these algorithms are time and memory intensive, parallel implementations are important both for improved performance and to allow the solu...
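
The empty-circumcircle property quoted in this context can be checked directly. The following is a minimal sketch, not code from the paper: the function names `orient2d`, `in_circumcircle`, and `is_delaunay` are illustrative, and the test uses plain floating-point arithmetic rather than the robust predicates a production library would need.

```python
def orient2d(a, b, c):
    """Twice the signed area of triangle abc; positive if abc is counter-clockwise."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_circumcircle(a, b, c, d):
    """Return True if d lies strictly inside the circumcircle of triangle abc."""
    # Translate so that d is the origin, then evaluate the 3x3 in-circle determinant.
    ax, ay = a[0] - d[0], a[1] - d[1]
    bx, by = b[0] - d[0], b[1] - d[1]
    cx, cy = c[0] - d[0], c[1] - d[1]
    det = (
        (ax * ax + ay * ay) * (bx * cy - cx * by)
        - (bx * bx + by * by) * (ax * cy - cx * ay)
        + (cx * cx + cy * cy) * (ax * by - bx * ay)
    )
    # The sign of det flips with the orientation of abc, so normalize by it.
    return det * orient2d(a, b, c) > 0


def is_delaunay(points, triangles):
    """Brute-force check that no point lies strictly inside any triangle's circumcircle.

    `triangles` is a list of index triples into `points` (an assumed representation).
    """
    for i, j, k in triangles:
        for m, p in enumerate(points):
            if m in (i, j, k):
                continue
            if in_circumcircle(points[i], points[j], points[k], p):
                return False
    return True
```

The brute-force check is quadratic and is only meant for validating small outputs; it says nothing about how the triangulation itself is computed.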

182 | Implementation of a Portable Nested Data-Parallel Language
- Blelloch, Chatterjee, et al.
- 1994
Citation Context ...ses a "marriage before conquest" approach to eliminate the expensive merge step that has hindered previous parallel algorithms. Additionally, when prototyped in the nested data-parallel language Nesl [7], the algorithm was found to perform only twice as many floating-point operations as a good serial algorithm. This paper describes a practical parallel Delaunay triangulation program which uses the alg...

181 | Voronoi diagrams-a survey of a fundamental geometric data structure
- Aurenhammer
- 1991
Citation Context ...riangulations and their duals, Voronoi diagrams, are among the most widely-studied structures in computational geometry. Voronoi diagrams have also appeared in many other fields under different names [2]: domains of action in crystallography, Wigner-Seitz zones in metallurgy, Thiessen polygons in geography, and Blum's transforms in biology. This paper assumes that the reader is familiar with the basi...

161 | Maintenance of configurations in the plane
- Overmars, van Leeuwen
- 1981
Citation Context ...rallelization purposes. Note that these complexities assume that the lower convex hull substep is solved using a linear-work algorithm, which is possible since we can store the points in sorted order [29]. However, Blelloch et al. found experimentally that a simple quickhull [30] was faster than a more complicated convex hull algorithm that was guaranteed to take linear time. Furthermore, using a point...

121 | Computational Geometry
- Preparata, Shamos
- 1985
Citation Context ...convex hull substep is solved using a linear-work algorithm, which is possible since we can store the points in sorted order [29]. However, Blelloch et al. found experimentally that a simple quickhull [30] was faster than a more complicated convex hull algorithm that was guaranteed to take linear time. Furthermore, using a point-pruning version of quickhull that limits possible imbalances between recur...
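
For readers unfamiliar with the "simple quickhull" mentioned in this context, the sketch below shows the basic serial recursion in two dimensions. It is an illustration under stated assumptions: the names `cross`, `_hull_side`, and `quickhull` are hypothetical, the full hull is returned rather than only the lower hull, and neither the point-pruning variant nor any parallelism from the paper is attempted.

```python
def cross(o, a, b):
    """Cross product of vectors o->a and o->b; positive if b is left of the line o->a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def _hull_side(points, a, b):
    """Hull vertices strictly left of the directed line a->b, ordered from a to b."""
    left = [p for p in points if cross(a, b, p) > 0]
    if not left:
        return []
    # The point farthest from the line a->b is guaranteed to be on the hull.
    far = max(left, key=lambda p: cross(a, b, p))
    return _hull_side(left, a, far) + [far] + _hull_side(left, far, b)


def quickhull(points):
    """Return the convex hull vertices of 2D points in clockwise order."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return pts
    lo, hi = pts[0], pts[-1]  # leftmost and rightmost points are always on the hull
    upper = _hull_side(pts, lo, hi)  # vertices above the line lo->hi
    lower = _hull_side(pts, hi, lo)  # vertices below the line lo->hi
    return [lo] + upper + [hi] + lower
```

Each recursive call discards every point inside the current triangle, which is why the simple version tends to do well on typical inputs even though its worst case is quadratic.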

63 | Parallel computational geometry
- Aggarwal, Chazelle, et al.
- 1988
Citation Context ...oth for improved performance and to allow the solution of problems that are too large for serial machines. However, although several parallel algorithms for Delaunay triangulation have been described [1, 32, 13, 27, 20], practical implementations have been slower to appear. One reason is that the dynamic nature of the problem can result in significant inter-processor communication. Performing key phases of...

58 | A Faster Divide-and-Conquer Algorithm for Constructing Delaunay
- Dwyer
- 1987
Citation Context ...ation. Pseudocode for the algorithm is shown in Figure 3. It has three important substeps: Serial Delaunay: Although any serial Delaunay triangulation algorithm can be used for the base case, Dwyer's [16] is recommended since it has been shown experimentally to be the fastest [36, 33]. Lower convex hull: The lower half of the convex hull of the projected points is used to find a new path H that divide...

58 | A Comparison of Sequential Delaunay Triangulation Algorithms
- Su, Drysdale
- 1995
Citation Context ...of points such that there are no elements of S within the circumcircle of any triangle. There are many well-known serial algorithms for Delaunay triangulation. The best have been extensively analyzed [17, 36], and implemented as general-purpose libraries [4, 33]. Since these algorithms are time and memory intensive, parallel implementations are important both for improved performance and to allow the solu...

53 | Transforming High-Level Data-Parallel Programs into Vector Operations
- Prins, Palmer
- 1993
Citation Context ...e irregular algorithms runs the risk of hiding divide-and-conquer parallelism that would otherwise be easy to exploit. For example, although nested data-parallel languages such as Nesl [7] and Proteus [31] are well-suited for expressing irregular divide-and-conquer algorithms, their current implementation layer assumes a vector PRAM model [6]. This can be efficiently implemented on vector processors wi...

36 | PMRSB: Parallel multilevel recursive spectral bisection
- Barnard
- 1995
Citation Context ...g SPMD code to directly implement the behavior of a divide-and-conquer algorithm (this can be seen as a generalized version of the technique used by Barnard's spectral bisection algorithm on the Cray T3D [5]). To achieve the second goal, Machiavelli is implemented using C and MPI (the standard Message Passing Interface [18]). To achieve the third goal, Machiavelli obtains parallelism from both data-paral...

33 | Output-sensitive construction of polytopes in four dimensions and clipped Voronoi diagrams in three
- Chan, Snoeyink, et al.
- 1995
Citation Context ...ormance of the convex hull variants described in Section 4. The final p n-pair pruning quickhull was benchmarked against both a basic quickhull and the original n-pair pruning quickhull by Chan et al. [9]. Results for an extreme case are shown in Figure 8. As can be seen, the n-pair algorithm is more than twice as fast as the basic quickhull on the non-uniform Kuzmin dataset (over ...

31 | Parallel Constrained Delaunay Meshing
- Chew, Chrisochoides, et al.
- 1997
Citation Context ... [28] by Merriam achieved speedup factors of 6-20 on a 128-processor Intel Gamma, for a parallel efficiency of 5-16%. Both of these results were for uniform datasets. The 2D algorithm by Chew et al. [10] (which solves the more general problem of constrained Delaunay triangulation in a meshing algorithm) achieves speedup factors of 3 on an 8-processor SP2, but currently requires that the boundaries be...
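
The efficiency figures in this context follow from the usual definition, parallel efficiency = speedup / number of processors. The short sketch below reproduces the arithmetic; the function name `parallel_efficiency` is an illustrative choice, and the 8-processor figure is derived here from the quoted speedup rather than stated in the source.

```python
def parallel_efficiency(speedup, processors):
    """Parallel efficiency as a fraction: speedup divided by processor count."""
    return speedup / processors


# Speedups quoted in the citation context above.
cases = [
    ("Merriam, low end (128 processors)", 6, 128),
    ("Merriam, high end (128 processors)", 20, 128),
    ("Chew et al. (8-processor SP2)", 3, 8),
]
for label, speedup, processors in cases:
    efficiency = parallel_efficiency(speedup, processors)
    print(f"{label}: {efficiency:.1%}")
```

A speedup of 6 on 128 processors is about 4.7% efficiency and a speedup of 20 is about 15.6%, consistent with the quoted 5-16% range after rounding.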

24 | Algorithm 63 (PARTITION) and algorithm 65 (FIND)
- Hoare
- 1961
Citation Context ...serial code, and the indexed format with replication used by the parallel code. No changes are necessary to the source code of Triangle. Finding the median Initially a parallel version of quickmedian [25] was used to find the median internal point along the x or y axis. Quickmedian redistributes data amongst the processors on each recursive step, resulting in high communication overhead. It was theref...
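
The quickmedian referred to here is based on Hoare's FIND, commonly known as quickselect. The sketch below is a serial, list-based illustration only: the names `quickselect` and `median` are hypothetical, and it does not model the data redistribution between processors that the context identifies as the source of communication overhead.

```python
import random


def quickselect(values, k):
    """Return the k-th smallest element (0-based) of values in expected linear time."""
    items = list(values)
    if not 0 <= k < len(items):
        raise IndexError("k is out of range")
    while True:
        pivot = random.choice(items)
        less = [v for v in items if v < pivot]
        equal = [v for v in items if v == pivot]
        if k < len(less):
            # The target lies among the elements smaller than the pivot.
            items = less
        elif k < len(less) + len(equal):
            # The target is the pivot itself (or a duplicate of it).
            return pivot
        else:
            # Skip past the smaller and equal elements and search the rest.
            k -= len(less) + len(equal)
            items = [v for v in items if v > pivot]


def median(values):
    """Lower median of a non-empty sequence, for example of x or y coordinates."""
    return quickselect(values, (len(values) - 1) // 2)
```

Each iteration shrinks the working set, which in a distributed setting corresponds to the repeated redistribution step described in the context.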

17 | A data-parallel algorithm for three-dimensional Delaunay triangulation and its implementation
- Teng, Sullivan, et al.
- 1993
Citation Context ...communication, but introduces a serial bottleneck that severely limits scalability in terms of both parallel speedup and achievable problem size. The use of decomposition techniques such as bucketing [28, 11, 37, 35], or striping [14] can also reduce communication, but relies on the input dataset having a uniform spatial distribution of points in order to avoid load imbalances between processors. Unfortunately, w...

15 | Evaluation of Parallelization Strategies for an Incremental Delaunay Triangulator
- Cignoni, Laforenza, et al.
- 1995
Citation Context ...communication, but introduces a serial bottleneck that severely limits scalability in terms of both parallel speedup and achievable problem size. The use of decomposition techniques such as bucketing [28, 11, 37, 35], or striping [14] can also reduce communication, but relies on the input dataset having a uniform spatial distribution of points in order to avoid load imbalances between processors. Unfortunately, w...

15 | Parallel 3D Delaunay triangulation
- Cignoni, Montani, et al.
- 1993
Citation Context ... convex hull of points on a sphere or paraboloid. The resulting algorithm is divide-and-conquer in nature but uses a "marriage before conquest" approach, similar to the DeWall triangulation algorithm [12], which enables it to avoid an expensive merge step. See [8] for more details, and http://web.scandal.cs.cmu.edu/cgi-bin/demo for an interactive demonstration. Pseudocode for the algorithm is shown in...

14 | Polling: A New Randomized Sampling Technique for Computational Geometry
- Reif, Sen
Citation Context ...oth for improved performance and to allow the solution of problems that are too large for serial machines. However, although several parallel algorithms for Delaunay triangulation have been described [1, 32, 13, 27, 20], practical implementations have been slower to appear. One reason is that the dynamic nature of the problem can result in significant inter-processor communication. Performing key phases of...

13 | Maintenance of configurations in the plane - Overmars, van Leeuwen - 1981

10 | Efficient Parallel Algorithms for Closest Point Problems
- Su
- 1994
Citation Context ...communication, but introduces a serial bottleneck that severely limits scalability in terms of both parallel speedup and achievable problem size. The use of decomposition techniques such as bucketing [28, 11, 37, 35], or striping [14] can also reduce communication, but relies on the input dataset having a uniform spatial distribution of points in order to avoid load imbalances between processors. Unfortunately, w...

9 | The divide-and-conquer paradigm as a basis for parallel language design
- Axford
- 1992
Citation Context ... a potential source of parallelism. This has resulted in many architectures and parallel programming languages being designed specifically for the implementation of divide-and-conquer algorithms (see [3] for a survey). However, previous parallel divide-and-conquer models have typically been limited to regular algorithms, in which the subproblems are of equal size. This excludes a very useful class o...

9 | An efficient implementation of nested data parallelism for irregular divide-and-conquer algorithms
- Hardwick
- 1996
Citation Context ...ystem adds the ability to perform dynamic load-balancing for irregular algorithms. Specifically, it can ship a recursive serial function call to an idle processor in order to redistribute computation [22]. A Machiavellian divide-and-conquer program consists of both serial and SPMD parallel code. The parallel code op2 vecpoint parallelDT (team T, vecpoint P, vecborder B); -- team Tnew; vecpoint PL, PR,...

9 | Efficient parallel implementation of an algorithm for Delaunay triangulation
- Merriam
- 1993

8 | Porting a Vector Library: a Comparison of
- Hardwick
- 1994
Citation Context ...ctor processors with high memory bandwidth, but it is harder to do so on current RISC-based NUMA multiprocessor architectures, due to the higher relative costs of communication and poor data locality [21]. Machiavelli [24] is a new parallel toolkit for divide-and-conquer algorithms that is intended to alleviate some of these problems. It is designed to be usable both as an implementation layer for lang...

7 | A note on improving the performance of Delaunay triangulation
- Davy, Dew
- 1989
Citation Context ...a serial bottleneck that severely limits scalability in terms of both parallel speedup and achievable problem size. The use of decomposition techniques such as bucketing [28, 11, 37, 35], or striping [14] can also reduce communication, but relies on the input dataset having a uniform spatial distribution of points in order to avoid load imbalances between processors. Unfortunately, while most real-wor...

6 | Practical Parallel Divide-and-Conquer Algorithms
- Hardwick
- 1997
Citation Context ...tching to an efficient implementation of Dwyer's serial algorithm provided by the Triangle package [33] at the leaves of the recursion tree. The program was parallelized using the Machiavelli toolkit [24], Inner Convex Hull Outer Delaunay Triangulation Figure 1: Nested recursion in Delaunay triangulation algorithm by Blelloch et al. [8]. Each recursive level of the outer divide-and-conquer triangulatio...

5 | Developing a practical projection-based parallel Delaunay algorithm
- Blelloch, Miller, Talmor
- 1996
Citation Context ...aries between processors be created by hand. Blelloch, Miller and Talmor recently developed a CREW PRAM algorithm that does not rely on bucketing and hence can efficiently handle non-uniform datasets [8]. It is divide-and-conquer in style but uses a "marriage before conquest" approach to eliminate the expensive merge step that has hindered previous parallel algorithms. Additionally, when prototyped i...

5 | An Optimal Mesh Computer Algorithm for Constrained Delaunay Triangulation
- Guha
- 1994
Citation Context ...oth for improved performance and to allow the solution of problems that are too large for serial machines. However, although several parallel algorithms for Delaunay triangulation have been described [1, 32, 13, 27, 20], practical implementations have been slower to appear. One reason is that the dynamic nature of the problem can result in significant inter-processor communication. Performing key phases of...

5 | Dynamic load balancing in a 2D parallel Delaunay mesh generator
- Verhoeven, Weatherill, et al.
- 1995
Citation Context ...llel Algorithms and Architectures (SPAA), 22-25 June 1997, Newport, Rhode Island. the algorithm on a single processor (for example, serializing the merge step of a divide-and-conquer algorithm, as in [38]) reduces this communication, but introduces a serial bottleneck that severely limits scalability in terms of both parallel speedup and achievable problem size. The use of decomposition techniques suc...

4 | Dynamic and recursive parallel algorithms for constructing Delaunay triangulations
- Ding, Densham
- 1994
Citation Context ...2-processor CM-5), while the 3D algorithm by Cignoni et al. [11] was up to 10 times slower on non-uniform datasets than on uniform ones (on a 128-processor nCUBE). The 2D algorithm by Ding and Densham [15] is designed to be able to handle non-uniform datasets, but has only been demonstrated to scale to 2 processors. A second problem is that the parallel algorithms are typically much more complex than t...

4 | Algorithm 63 (partition) and algorithm 65 (find) - Hoare - 1961

3 | Merging free trees in parallel for efficient Voronoi diagram construction
- Cole, Goodrich, Ó Dúnlaing
- 1989

3 | Concatenated parallelism: A technique for efficient parallel divide and conquer
- Goil, Aluru, Ranka
- 1996
Citation Context ...d-conquer algorithms, but can only outperform a task-parallel approach when the communication cost of redistributing the data is significant compared to the computational cost of subdividing the task [19]. The alternative approach of using a more general language model to handle irregular algorithms runs the risk of hiding divide-and-conquer parallelism that would otherwise be easy to exploit. For exam...

2 | Triangulation, Voronoi diagram, and convex hull in k-space on mesh-connected arrays and hypercubes
- Holey, Ibarra
- 1991

1 | Implementation and evaluation of an efficient 2D parallel Delaunay triangulation algorithm
- Hardwick
- 1997
Citation Context ...phaCluster) with 8 processors, a shared-memory SGI Power Challenge with 16 processors, and a distributed-memory IBM SP2 with 16 processors (additional results for larger machine sizes can be found in [23]). To test parallel efficiency, we compared timings to those on one processor, when the program immediately switches to the serial Triangle package [33]. To test the ability to handle non-uniform data...

1 | Merging free trees in parallel for efficient Voronoi diagram construction - Cole, Goodrich, Ó Dúnlaing - 1990

1 | Implementation and evaluation of an efficient 2D parallel Delaunay triangulation algorithm - Hardwick - 1997

1 | Polling: A new randomized sampling technique for computational geometry - Reif, Sen - 1989