• Documents
  • Authors
  • Tables
  • Other Seers ▼
    RefSeer AckSeer CollabSeer SeerSeer
  • Log in
  • Sign up
  • MetaCart

CiteSeerX logo

Advanced Search Include Citations
Advanced Search Include Citations | Disambiguate

A transparent parallel i/o environment (1994)

by D E Vengroff
Add To MetaCart

Tools

Sorted by:
Results 1 - 10 of 29
Next 10 →

External-Memory Graph Algorithms

by Yi-Jen Chiang , Michael T. Goodrich, Edward F. Grove, Roberto Tamassia, Darren Erik Vengroff, Jeffrey Scott Vitter , 1995
"... We present a collection of new techniques for designing and analyzing efficient external-memory algorithms for graph problems and illustrate how these techniques can be applied to a wide variety of specific problems. Our results include: ffl Proximate-neighboring. We present a simple method for der ..."
Abstract - Cited by 159 (22 self) - Add to MetaCart
We present a collection of new techniques for designing and analyzing efficient external-memory algorithms for graph problems and illustrate how these techniques can be applied to a wide variety of specific problems. Our results include: ffl Proximate-neighboring. We present a simple method for deriving external-memory lower bounds via reductions from a problem we call the "proximate neighbors" problem. We use this technique to derive non-trivial lower bounds for such problems as list ranking, expression tree evaluation, and connected components. ffl PRAM simulation. We give methods for efficiently simulating PRAM computations in external memory, even for some cases in which the PRAM algorithm is not work-optimal. We apply this to derive a number of optimal (and simple) external-memory graph algorithms. ffl Time-forward processing. We present a general technique for evaluating circuits (or "circuit-like" computations) in external memory. We also use this in a deterministic list rank...

Approximate Computation of Multidimensional Aggregates of Sparse Data Using Wavelets

by Jeffrey Scott Vitter, Min Wang
"... Computing multidimensional aggregates in high dimensions is a performance bottleneck for many OLAP applications. Obtaining the exact answer to an aggregation query can be prohibitively expensive in terms of time and/or storage space in a data warehouse environment. It is advantageous to have fast, a ..."
Abstract - Cited by 154 (2 self) - Add to MetaCart
Computing multidimensional aggregates in high dimensions is a performance bottleneck for many OLAP applications. Obtaining the exact answer to an aggregation query can be prohibitively expensive in terms of time and/or storage space in a data warehouse environment. It is advantageous to have fast, approximate answers to OLAP aggregation queries. In this paper, we present anovel method that provides approximate answers to high-dimensional OLAP aggregation queries in massive sparse data sets in a time-efficient and space-efficient manner. We construct a compact data cube, which is an approximate and space-efficient representation of the underlying multidimensional array, based upon a multiresolution wavelet decomposition. In the on-line phase, each aggregation query can generally be answered using the compact data cube in one I/O or a small number of I/Os, depending upon the desired accuracy. We present two I/O-efficient algorithms to construct the compact data cube for the important case of sparse high-dimensional arrays, which often arise in practice. The traditional histogram methods are infeasible for the massive high-dimensional data sets in OLAP applications. Previously developed wavelet techniques are efficient only for dense data. Our on-line query processing algorithm is very fast and capable of refining answers as the user demands more accuracy. Experiments on real data show that our method provides significantly more accurate results for typical OLAP aggregation queries than other efficient approximation techniques such as random sampling.

The buffer tree: A new technique for optimal I/O-algorithms

by Lars Arge - University of Aarhus , 1995
"... ..."
Abstract - Cited by 144 (27 self) - Add to MetaCart
Abstract not found

External Memory Data Structures

by Lars Arge , 2001
"... In many massive dataset applications the data must be stored in space and query efficient data structures on external storage devices. Often the data needs to be changed dynamically. In this chapter we discuss recent advances in the development of provably worst-case efficient external memory dynami ..."
Abstract - Cited by 78 (34 self) - Add to MetaCart
In many massive dataset applications the data must be stored in space and query efficient data structures on external storage devices. Often the data needs to be changed dynamically. In this chapter we discuss recent advances in the development of provably worst-case efficient external memory dynamic data structures. We also briefly discuss some of the most popular external data structures used in practice.

Scalable sweeping-based spatial join

by Lars Arge, Octavian Procopiuc, Sridhar Ramaswamy, Torsten Suel, Jeffrey Scott Vitter - IN PROC. 24TH INT. CONF. VERY LARGE DATA BASES, VLDB , 1998
"... In this paper, we consider the filter step of the spatial join problem, for the case where neither of the inputs are indexed. We present a new algorithm, Scalable Sweeping-Based Spatial Join (SSSJ), that achieves both efficiency on real-life data and robustness against highly skewed and worst-case d ..."
Abstract - Cited by 56 (7 self) - Add to MetaCart
In this paper, we consider the filter step of the spatial join problem, for the case where neither of the inputs are indexed. We present a new algorithm, Scalable Sweeping-Based Spatial Join (SSSJ), that achieves both efficiency on real-life data and robustness against highly skewed and worst-case data sets. The algorithm combines a method with theoretically optimal bounds on I/O transfers based on the recently proposed distribution-sweeping technique with a highly optimized implementation of internal-memory plane-sweeping. We present experimental results based on an efficient implementation of the SSSJ algorithm, and compare it to the state-ofthe-art Partition-Based Spatial-Merge (PBSM) algorithm of Pate1 and DeWitt.

Integrating Theory and Practice in Parallel File Systems

by Thomas H. Cormen, David Kotz - PROCEEDINGS OF THE 1993 DAGS/PC SYMPOSIUM (THE DARTMOUTH INSTITUTE FOR ADVANCED GRADUATE STUDIES , 1993
"... Several algorithms for parallel disk systems have appeared in the literature recently, and they are asymptotically optimal in terms of the number of disk accesses. Scalable systems with parallel disks must be able to run these algorithms. We present for the first time a list of capabilities that mus ..."
Abstract - Cited by 48 (11 self) - Add to MetaCart
Several algorithms for parallel disk systems have appeared in the literature recently, and they are asymptotically optimal in terms of the number of disk accesses. Scalable systems with parallel disks must be able to run these algorithms. We present for the first time a list of capabilities that must be provided by the system to support these optimal algorithms: control over declustering, querying about the configuration, independent I/O, and turning off parity, file caching, and prefetching. We summarize recent theoretical and empirical work that justifies the need for these capabilities. In addition, we sketch an organization for a parallel file interface with low-level primitives and higher-level operations.

Efficient External-Memory Data Structures and Applications

by Lars Arge , 1996
"... In this thesis we study the Input/Output (I/O) complexity of large-scale problems arising e.g. in the areas of database systems, geographic information systems, VLSI design systems and computer graphics, and design I/O-efficient algorithms for them. A general theme in our work is to design I/O-effic ..."
Abstract - Cited by 38 (12 self) - Add to MetaCart
In this thesis we study the Input/Output (I/O) complexity of large-scale problems arising e.g. in the areas of database systems, geographic information systems, VLSI design systems and computer graphics, and design I/O-efficient algorithms for them. A general theme in our work is to design I/O-efficient algorithms through the design of I/O-efficient data structures. One of our philosophies is to try to isolate all the I/O specific parts of an algorithm in the data structures, that is, to try to design I/O algorithms from internal memory algorithms by exchanging the data structures used in internal memory with their external memory counterparts. The results in the thesis include a technique for transforming an internal memory tree data structure into an external data structure which can be used in a batched dynamic setting, that is, a setting where we for example do not require that the result of a search operation is returned immediately. Using this technique we develop batched dynamic external versions of the (one-dimensional) range-tree and the segment-tree and we develop an external priority queue. Following our general philosophy we show how these structures can be used in standard internal memory sorting algorithms

Efficient Bulk Operations on Dynamic R-Trees

by L. Arge, K. H. Hinrichs, J. Vahrenhold, J. S. Vitter - ALGORITHMICA , 2002
"... In recent years there has been an upsurge of interest in spatial databases. A major issue is how to manipulate efficiently massive amounts of spatial data stored on disk in multidimensional spatial indexes (data structures). Construction of spatial indexes (bulk loading) has been studied intensivel ..."
Abstract - Cited by 37 (10 self) - Add to MetaCart
In recent years there has been an upsurge of interest in spatial databases. A major issue is how to manipulate efficiently massive amounts of spatial data stored on disk in multidimensional spatial indexes (data structures). Construction of spatial indexes (bulk loading) has been studied intensively in the database community. The continuous arrival of massive amounts of new data makes it important to update existing indexes (bulk updating) efficiently. In this paper we present a simple, yet efficient, technique for performing bulk update and query operations on multidimensional indexes. We present our technique in terms of the so-called R-tree and its variants, as they have emerged as practically efficient indexing methods for spatial data. Our method uses ideas from the buffer tree lazy buffering technique and fully utilizes the available internal memory and the page size of the operating system. We give a theoretical analysis of our technique, showing that it is efficient both in terms of I/O communication, disk storage, and internal computation time. We also present the results of an extensive set of experiments showing that in practice our approach performs better than the previously best known bulk update methods with respect to update time, and that it produces a better quality index in terms of query performance. One important novel feature of our technique is that in most cases it allows us to perform a batch of updates and queries simultaneously. To be able to do so is essential in environments where queries have to be answered even while the index is being updated and reorganized.

I/O-Efficient Scientific Computation Using TPIE

by Darren Erik Vengroff, Jeffrey Scott Vitter - In Proceedings of the Goddard Conference on Mass Storage Systems and Technologies, NASA Conference Publication 3340, Volume II , 1995
"... In recent years, I/O-efficient algorithms for a wide variety of problems have appeared in the literature. Thus far, however, systems specifically designed to assist programmers in implementing such algorithms have remained scarce. TPIE is a system designed to fill this void. It supports I/O-eff ..."
Abstract - Cited by 33 (10 self) - Add to MetaCart
In recent years, I/O-efficient algorithms for a wide variety of problems have appeared in the literature. Thus far, however, systems specifically designed to assist programmers in implementing such algorithms have remained scarce. TPIE is a system designed to fill this void. It supports I/O-efficient paradigms for problems from a variety of domains, including computational geometry, graph algorithms, and scientific computation. The TPIE interface frees programmers from having to deal not only of explicit read and write calls, but also the complex memory management that must be performed for I/O-efficient computation.

STXXL: Standard template library for XXL data sets

by R. Dementiev, L. Kettner - In: Proc. of ESA 2005. Volume 3669 of LNCS , 2005
"... for processing huge data sets that can fit only on hard disks. It supports parallel disks, overlapping between disk I/O and computation and it is the first I/O-efficient algorithm library that supports the pipelining technique that can save more than half of the I/Os. STXXL has been applied both in ..."
Abstract - Cited by 30 (4 self) - Add to MetaCart
for processing huge data sets that can fit only on hard disks. It supports parallel disks, overlapping between disk I/O and computation and it is the first I/O-efficient algorithm library that supports the pipelining technique that can save more than half of the I/Os. STXXL has been applied both in academic and industrial environments for a range of problems including text processing, graph algorithms, computational geometry, gaussian elimination, visualization, and analysis of microscopic images, differential cryptographic analysis, etc. The performance of STXXL and its applications is evaluated on synthetic and real-world inputs. We present the design of the library, how its performance features are supported, and demonstrate how the library integrates with STL. KEY WORDS: very large data sets; software library; C++ standard template library; algorithm engineering 1.
The National Science Foundation
  • About CiteSeerX
  • Submit Documents
  • Privacy Policy
  • Help
  • Data
  • Source
  • Contact Us

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2010 The Pennsylvania State University