Infrastructure for Building Parallel Database Systems for Multi-dimensional Data
, 1999
Cited by 39 (26 self)
As computational power and storage capacity increase, processing and analyzing large volumes of multi-dimensional datasets play an increasingly important part in many domains of scientific research. Our study of a large set of scientific applications over the past three years indicates that the processing for such datasets is often highly stylized and shares several important characteristics. Usually, both the input dataset as well as the result being computed have underlying multi-dimensional grids. The basic processing step usually consists of transforming individual input items, mapping the transformed items to the output grid and computing output items by aggregating, in some way, all the transformed input items mapped to the corresponding grid point. In this paper, we present the design of T2, a customizable parallel database that integrates storage, retrieval and processing of multi-dimensional datasets. T2 provides support for common operations including index generation, data r...
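The basic processing step described in this abstract (transform each input item, map it to the output grid, aggregate everything that lands on the same grid point) can be sketched in a few lines of Python. This is only an illustration of the pattern, not T2's interface: the function `process`, its parameters, and the sample readings are all hypothetical.

```python
from collections import defaultdict


def process(items, transform, map_to_grid, aggregate, initial):
    """Transform each item, map it to an output grid point, and aggregate.

    All four callables are hypothetical, user-supplied hooks.
    """
    output = defaultdict(lambda: initial)
    for item in items:
        transformed = transform(item)      # per-item transformation
        cell = map_to_grid(transformed)    # output grid point for this item
        output[cell] = aggregate(output[cell], transformed[2])
    return dict(output)


# Hypothetical input: (x, y, reading) samples on a fine grid, aggregated
# by maximum onto an output grid that is twice as coarse in each dimension.
readings = [(0, 0, 1.0), (1, 1, 3.0), (2, 0, 2.0), (3, 1, 0.5)]
result = process(
    readings,
    transform=lambda r: (r[0], r[1], r[2] * 10.0),
    map_to_grid=lambda r: (r[0] // 2, r[1] // 2),
    aggregate=max,
    initial=float("-inf"),
)
print(result)
```

Here the two samples that fall into each coarse cell are reduced to their maximum, so the aggregation function must be one for which the order of the inputs does not matter.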
T2: A Customizable Parallel Database for Multi-dimensional Data
Cited by 25 (14 self)
As computational power and storage capacity increase, processing and analyzing large volumes of multi-dimensional datasets play an increasingly important part in many domains of scientific research. Several database research groups and vendors have developed object-relational database systems to provide some support for managing and/or visualizing multi-dimensional datasets. These systems, however, provide little or no support for analyzing or processing these datasets -- the assumption is that this is too application-specific to warrant common support. As a result, applications that process these datasets are usually decoupled from data storage and management, resulting in inefficiency due to copying and loss of locality. Furthermore, every application developer has to implement complex support for managing and scheduling the processing. Our study of a large set of scientific applications over the past three years indicates that the processing for such datasets is often highly stylize...
Study of Scalable Declustering Algorithms for Parallel Grid Files
- In Proceedings of the Tenth International Parallel Processing Symposium
, 1996
Cited by 24 (9 self)
Efficient storage and retrieval of large multidimensional datasets is an important concern for large-scale scientific computations such as long-running time-dependent simulations which periodically generate snapshots of the state. The main challenge for efficiently handling such datasets is to minimize response time for multidimensional range queries. The grid file is one of the well known access methods for multidimensional and spatial data. We investigate effective and scalable declustering techniques for grid files with the primary goal of minimizing response time and the secondary goal of maximizing the fairness of data distribution. The main contributions of this paper are (1) analytic and experimental evaluation of existing index-based declustering techniques and their extensions for grid files, and (2) development of a proximity-based declustering algorithm called minimax which is experimentally shown to scale and to consistently achieve better response time compared to availabl...
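To make the response-time goal concrete, the sketch below assigns grid cells to disks with the classic disk-modulo rule (the sum of the cell coordinates modulo the number of disks), a simple index-based scheme, and measures the cost of a range query as the largest number of cells any single disk must read. This is not the minimax algorithm from the paper; the function names and the query layout are assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def disk_modulo(cell, num_disks):
    """Index-based declustering: disk = (sum of cell coordinates) mod M."""
    return sum(cell) % num_disks


def response_time(query, num_disks, assign=disk_modulo):
    """Cost of a range query: the maximum number of cells read from one disk.

    `query` is a list of inclusive (lo, hi) cell ranges, one per dimension.
    """
    cells = product(*(range(lo, hi + 1) for lo, hi in query))
    load = Counter(assign(cell, num_disks) for cell in cells)
    return max(load.values())


# A 4 x 2 query over 4 disks spreads evenly: 8 cells, 2 per disk.
wide_cost = response_time([(0, 3), (0, 1)], num_disks=4)
# A 2 x 2 query does not: two of its 4 cells land on the same disk.
square_cost = response_time([(0, 1), (0, 1)], num_disks=4)
print(wide_cost, square_cost)
```

The second query shows the kind of imbalance that motivates better schemes: four cells over four disks could ideally be read in one parallel step, but the disk-modulo assignment needs two.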
Object-relational Queries into Multidimensional Databases with the Active Data Repository
, 1999
Cited by 22 (7 self)
As computational power and storage capacity increase, processing and analyzing large volumes of multi-dimensional datasets play an increasingly important role in many domains of scientific research. Scientific applications that make use of very large scientific datasets have several important characteristics: datasets consist of complex data and are usually multi-dimensional; applications usually retrieve a subset of all the data available in the dataset; various application-specific operations are performed on the data items retrieved. Such applications can be supported by object-relational database management systems (OR-DBMSs). In addition to providing functionality to define new complex datatypes and user-defined functions, an OR-DBMS for scientific datasets should contain runtime support that will provide optimized storage for very large datasets and an execution environment for user-defined functions involving expensive operations. In this paper we describe an infrastructure, the ...
A General Interprocedural Framework for Placement of Split-phase Large Latency Operations
Cited by 11 (3 self)
Overlapping split-phase large latency operations with computations is a standard technique for improving performance on modern architectures. In this paper, we present a general interprocedural technique for overlapping such accesses with computation. We have developed an Interprocedural Balanced Code Placement (IBCP) framework, which performs analysis on arbitrary recursive procedures and arbitrary control flow and replaces synchronous operations with a balanced pair of asynchronous operations. We have evaluated this scheme in the context of overlapping I/O operations with computation. We demonstrate how this analysis is useful for applications which perform frequent and large accesses to disks, including applications which snapshot or checkpoint their computations or out-of-core applications.
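The effect of the transformation described here can be illustrated by hand: a blocking read is replaced by a start/finish pair, and independent computation is placed between the two. The Python sketch below does this manually with a thread pool; it shows only the shape of the result, not the IBCP analysis that decides where the pair may be placed. The helper names and the temporary file are hypothetical.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def read_block(path):
    """A large-latency operation: read a whole file from disk."""
    with open(path, "rb") as f:
        return f.read()


def compute(n):
    """Work that does not depend on the data being read."""
    return sum(i * i for i in range(n))


def synchronous(path, n):
    data = read_block(path)        # blocks until the read completes
    partial = compute(n)
    return len(data) + partial


def split_phase(path, n, pool):
    pending = pool.submit(read_block, path)  # start: issue the read
    partial = compute(n)                     # overlap computation with I/O
    data = pending.result()                  # finish: wait for the read
    return len(data) + partial


# Hypothetical input file, created only for this demonstration.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 4096)
    path = tmp.name

try:
    with ThreadPoolExecutor(max_workers=1) as pool:
        sync_result = synchronous(path, 1000)
        async_result = split_phase(path, 1000, pool)
finally:
    os.remove(path)

print(sync_result == async_result)
```

The pair is "balanced" in the sense that every started operation is finished exactly once before its result is used; both versions compute the same value, and only the ordering of the read relative to the computation differs.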
Compiler support for exploiting coarse-grained pipelined parallelism
- In Supercomputing
, 2003
Cited by 11 (0 self)
The emergence of grid and a new class of data-driven applications is making a new form of parallelism desirable, which we refer to as coarse-grained pipelined parallelism. This paper reports on a compilation system developed to exploit this form of parallelism. We use a dialect of Java that exposes both pipelined and data parallelism to the compiler. Our compiler is responsible for selecting a set of candidate filter boundaries, determining the volume of communication required if a particular boundary is chosen, performing the decomposition, and generating code. We have developed a one-pass algorithm for determining the required communication between consecutive filters. We have developed a cost model for estimating the execution time for a given decomposition, and a dynamic programming algorithm for performing the decomposition. Detailed evaluation of our current compiler using four data-driven applications demonstrates the feasibility of our approach.
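As a rough illustration of choosing filter boundaries by dynamic programming, the sketch below splits a chain of atomic stages into a fixed number of contiguous filters so that the slowest filter is as fast as possible. The cost model is an assumption made for this example, not the one from the paper: a filter costs the sum of its stages' compute costs plus the communication cost of the boundary at its output. All names and numbers are hypothetical.

```python
from functools import lru_cache


def best_decomposition(compute, comm, num_filters):
    """Return (bottleneck_cost, cut_positions) for a chain of stages.

    compute[i] is the cost of stage i; comm[i] is the cost of sending data
    across the candidate boundary after stage i (assumed cost model).
    Requires len(compute) >= num_filters and len(comm) == len(compute) - 1.
    """
    n = len(compute)
    prefix = [0]
    for cost in compute:
        prefix.append(prefix[-1] + cost)

    @lru_cache(maxsize=None)
    def solve(start, filters_left):
        # Best way to place stages[start:] into `filters_left` filters.
        if filters_left == 1:
            return prefix[n] - prefix[start], ()
        best = (float("inf"), ())
        # Leave at least one stage for each of the remaining filters.
        for end in range(start + 1, n - filters_left + 2):
            here = prefix[end] - prefix[start] + comm[end - 1]
            rest, cuts = solve(end, filters_left - 1)
            candidate = (max(here, rest), (end,) + cuts)
            if candidate[0] < best[0]:
                best = candidate
        return best

    return solve(0, num_filters)


stage_costs = [4, 2, 6, 3]
boundary_costs = [5, 1, 4]
two_filters = best_decomposition(stage_costs, boundary_costs, 2)
three_filters = best_decomposition(stage_costs, boundary_costs, 3)
print(two_filters, three_filters)
```

With two filters, cutting after the second stage is best here because that boundary is cheap to cross, even though it leaves the second filter with more computation than the first.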
Visualizing time-varying phenomena in numerical simulations of unsteady flows
- In Proceedings of 34th Aerospace Science Meeting and Exhibit
, 1996
Cited by 10 (0 self)
Also appeared as paper AIAA-96-0048 in the 34th Aerospace Sciences Meeting & Exhibit. Streamlines, contour lines, vector plots, and volume slices (cutting planes) are commonly used for flow visualization. These techniques are sometimes referred to as instantaneous flow visualization techniques because calculations are based on an instant of the flow field in time. Although instantaneous flow visualization techniques are effective for depicting phenomena in steady flows, they sometimes do not adequately depict time-varying phenomena in unsteady flows. Streaklines and timelines are effective visualization techniques for depicting vortex shedding, vortex breakdown, and shock waves in unsteady flows. These techniques are examples of time-dependent flow visualization techniques, which are based on many instants of the flow fields in time. This paper describes the algorithms for computing streaklines and timelines. Using numerically simulated unsteady flows, streaklines and timelines are compared with streamlines, contour lines, and vector plots. It is shown that streaklines and timelines reveal vortex shedding and vortex breakdown more clearly than instantaneous flow visualization techniques.
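A streakline is the set of current positions of all particles released from one seed point over time, which is why it depends on many instants of the flow. The Python sketch below releases one particle per time step and advects every live particle with a forward Euler step. It is a minimal illustration under assumed names and an assumed integrator, not the algorithm from the paper, and the uniform velocity field is chosen only so the result is easy to check.

```python
def streakline(seed, velocity, t0, t1, dt):
    """Positions at time t1 of particles released from `seed` every `dt`.

    `velocity(x, y, t)` returns (u, v); forward Euler is an assumption
    made for brevity, and a real code would use a higher-order scheme.
    """
    particles = []
    steps = round((t1 - t0) / dt)
    for k in range(steps):
        t = t0 + k * dt
        particles.append(seed)  # release a new particle at the seed point
        advected = []
        for x, y in particles:
            u, v = velocity(x, y, t)
            advected.append((x + dt * u, y + dt * v))
        particles = advected
    return particles


def uniform_flow(x, y, t):
    """Hypothetical steady flow moving at unit speed along x."""
    return (1.0, 0.0)


line = streakline((0.0, 0.0), uniform_flow, t0=0.0, t1=2.0, dt=0.5)
print(line)
```

In this steady flow the streakline coincides with a streamline; the two differ only when the velocity field changes with time, which is the case the paper is concerned with.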
On Cache Replacement Policies for Servicing Mixed Data Intensive Query Workloads
, 2002
Cited by 3 (3 self)
When data analysis applications are employed in a multiclient environment, a data server must service multiple simultaneous queries, each of which may employ complex user-defined data structures and operations on the data. It is then necessary to harness inter- and intra-query commonalities and system resources to improve the performance of the data server. We have developed a framework and customizable middleware to enable reuse of intermediate and final results among queries, through an in-memory active semantic cache and user-defined transformation functions. Since resources such as processing power and memory space are limited on the machine hosting the server, effective scheduling of incoming queries and efficient cache replacement policies are challenging issues that must be addressed. We have addressed the scheduling problem in earlier work, and in this paper we describe and evaluate several cache replacement policies. We present experimental evaluation of the policies on a shared-memory parallel system using two applications from different application domains.
Servicing Mixed Data Intensive Query Workloads
, 2002
Cited by 1 (1 self)
When data analysis applications are employed in a multiclient environment, a data server must service multiple simultaneous queries, each of which may employ complex user-defined data structures and operations on the data. It is then necessary to harness inter- and intra-query commonalities and system resources to improve the performance of the data server. We have developed a framework and customizable middleware to enable reuse of intermediate and final results among queries, through an in-memory semantic cache and user-defined transformation functions. Since resources such as processing power and memory space are limited on the machine hosting the server, effective scheduling of incoming queries and efficient cache replacement policies are challenging issues that must be addressed. We have addressed the scheduling problem in earlier work, and in this paper we describe and evaluate several cache replacement policies. We present experimental evaluation of the policies on a shared-memory parallel system using two applications from different domains.
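For readers unfamiliar with cache replacement, the sketch below shows least-recently-used (LRU) eviction for a cache of query results with a memory budget. LRU is used here only as a well-known baseline; the abstract does not say which policies the paper evaluates, and the class name, keys, and sizes are hypothetical. The semantic matching and transformation of cached results described above are not modeled.

```python
from collections import OrderedDict


class LRUResultCache:
    """Size-bounded cache of query results with LRU eviction."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.entries = OrderedDict()  # key -> (result, size), oldest first

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key][0]

    def put(self, key, result, size):
        if key in self.entries:
            self.used -= self.entries.pop(key)[1]
        if size > self.capacity:
            return  # result is too large to cache at all
        while self.used + size > self.capacity:
            # Evict the least recently used result to make room.
            _, (_, old_size) = self.entries.popitem(last=False)
            self.used -= old_size
        self.entries[key] = (result, size)
        self.used += size


cache = LRUResultCache(capacity_bytes=100)
cache.put("q1", "result-1", 60)
cache.put("q2", "result-2", 30)
cache.get("q1")                  # q1 becomes more recent than q2
cache.put("q3", "result-3", 40)  # does not fit, so q2 is evicted
print(list(cache.entries))
```

A policy tuned for reuse among queries would likely also weigh how expensive a result is to recompute and how often other queries can reuse it, which plain recency ignores.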
Advanced Compiler and Runtime Support for Data Intensive Applications
Cited by 1 (0 self)
Processing and analyzing large volumes of data play an increasingly important role in many domains of scientific research. However, high-level language and compiler support for developing such applications has so far been lacking.

