I/O-efficient batched union-find and its applications to terrain analysis
- In Proc. 22nd Annual Symposium on Computational Geometry, 2006
Abstract - Cited by 14 (8 self)
Despite extensive study over the last four decades and numerous applications, no I/O-efficient algorithm is known for the union-find problem. In this paper we present an I/O-efficient algorithm for the batched (off-line) version of the union-find problem. Given any sequence of N union and find operations, where each union operation joins two distinct sets, our algorithm uses O(SORT(N)) = O((N/B) log_{M/B} (N/B)) I/Os, where M is the memory size and B is the disk block size. This bound is asymptotically optimal in the worst case. If there are union operations that join a set with itself, our algorithm uses O(SORT(N) + MST(N)) I/Os, where MST(N) is the number of I/Os needed to compute the minimum spanning tree of a graph with N edges. We also describe a simple and practical O(SORT(N) log(N/M))-I/O algorithm for this problem, which we have implemented. We are interested in the union-find problem because of its applications in terrain analysis. A terrain can be abstracted as a height function defined over R^2, and many problems that deal with such functions require a union-find data structure. With the emergence of modern mapping technologies, huge amounts of elevation data are being generated that are too large to fit in memory; thus I/O-efficient algorithms are needed to process these data efficiently. In this paper, we study two terrain-analysis problems that benefit from a union-find data structure: (i) computing topological persistence and (ii) constructing the contour tree. We give the first O(SORT(N))-I/O algorithms for these two problems, assuming that the input terrain is represented as a triangular mesh with N vertices. Finally, we report some preliminary experimental results, showing that our algorithms give order-of-magnitude improvement over previous methods on large data sets that do not fit in memory.
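For readers unfamiliar with the problem, a batched sequence of union and find operations can be illustrated with the classic in-memory disjoint-set forest. The sketch below is only that textbook baseline (union by rank with path compression), not the I/O-efficient algorithm of the paper; the names UnionFind and run_batch and the tuple encoding of operations are hypothetical choices made for illustration.

```python
class UnionFind:
    """Textbook in-memory disjoint-set forest (illustrative baseline only)."""

    def __init__(self, n):
        # Each element starts as the root of its own singleton set.
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        # First pass: walk up to the root of x's tree.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass: path compression, point every visited node at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        # Returns True if two distinct sets were joined, False if a and b
        # were already in the same set (a union of a set with itself).
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        # Union by rank: attach the shallower tree below the deeper one.
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
        return True


def run_batch(n, ops):
    """Process a whole (off-line) sequence of operations over elements 0..n-1.

    ops is a list of ("union", a, b) or ("find", a) tuples (hypothetical
    encoding). Returns the representative reported by each find, in order.
    """
    uf = UnionFind(n)
    answers = []
    for op in ops:
        if op[0] == "union":
            uf.union(op[1], op[2])
        elif op[0] == "find":
            answers.append(uf.find(op[1]))
        else:
            raise ValueError(f"unknown operation: {op[0]!r}")
    return answers


if __name__ == "__main__":
    ops = [
        ("union", 0, 1), ("find", 1),
        ("union", 2, 3), ("find", 3),
        ("union", 1, 3), ("find", 0), ("find", 2),
        ("find", 4),
    ]
    print(run_batch(5, ops))
```

Which element serves as a set's representative depends on the implementation, so callers should only compare find results for equality. This structure makes random memory accesses, which is the behavior that becomes expensive when the data do not fit in memory.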
TerraStream: From elevation data to watershed hierarchies
- Proc. ACM Sympos. on Advances in Geographic Information Systems
Abstract - Cited by 6 (4 self)
We consider the problem of extracting a river network and a watershed hierarchy from a terrain given as a set of irregularly spaced points. We describe TerraStream, a “pipelined” solution that consists of four main stages: construction of a digital elevation model (DEM), hydrological conditioning, extraction of river networks, and construction of a watershed hierarchy. Our approach has several advantages over existing methods. First, we design and implement the pipeline so each stage is scalable to massive data sets; a single non-scalable stage would create a bottleneck and limit overall scalability. Second, we develop the algorithms in a general framework so that they work for both TIN and grid DEMs. TerraStream is flexible and allows users to choose from various models and parameters, yet our pipeline is designed to reduce (or eliminate) the need for manual intervention between stages. We have implemented TerraStream and present experimental results on real elevation point sets that show that our approach handles massive multi-gigabyte terrain data sets. For example, we can process a data set containing over 300 million points (over 20 GB of raw data) in under 26 hours, where most of the time (76%) is spent in the initial CPU-intensive DEM construction stage.
Surface Compression using Over-determined Laplacian Approximation
Abstract - Cited by 5 (1 self)
We describe a surface compression technique to lossily compress elevation datasets. Our approach first approximates the uncompressed terrain using an over-determined system of linear equations based on the Laplacian partial differential equation. Then the approximation is refined with respect to the uncompressed terrain using an error metric. These two steps work alternately until we find an approximation that is good enough. We then further compress the result to achieve a better overall compression ratio. We present experiments and measurements using different metrics and our method gives convincing results.
A Survey of Distributed Workflow Characteristics and Resource Requirements
Abstract - Cited by 4 (1 self)
Workflows have been used to model repeatable tasks or operations in a number of different industries, including manufacturing and software. In recent years, workflows have increasingly been used in distributed-resource and web-services environments through resource models such as grid and cloud computing. These workflows often have disparate requirements and constraints that need to be accounted for during workflow orchestration. In this paper, we present workflow examples from different domains, including bioinformatics and biomedicine, weather and ocean modeling, and astronomy, detailing their data and computational requirements.

