Results 1 - 10 of 23
Executing Multiple Pipelined Data Analysis Operations in the Grid
, 2002
Cited by 32 (7 self)
Processing of data in many data analysis applications can be represented as an acyclic, coarse grain data flow, from data sources to the client. This paper is concerned with scheduling of multiple data analysis operations, each of which is represented as a pipelined chain of processing on data. We define the scheduling problem for effectively placing components onto Grid resources, and propose two scheduling algorithms. Experimental results are presented using a visualization application.
Impact of High Performance Sockets on Data Intensive Applications
- In the Proceedings of the IEEE International Conference on High Performance Distributed Computing (HPDC 2003)
, 2003
The Virtual Microscope
- IEEE Transactions on Information Technology in Biomedicine
, 2002
Cited by 16 (4 self)
We present the design and implementation of the Virtual Microscope, a software system employing a client/server architecture to provide a realistic emulation of a high power light microscope. The system provides a form of completely digital telepathology, allowing simultaneous access to archived digital slide images by multiple clients. The main problem the system targets is storing and processing the extremely large quantities of data required to represent a collection of slides. The Virtual Microscope client software runs on the end user's PC or workstation, while database software for storing, retrieving and processing the microscope image data runs on a parallel computer or on a set of workstations at one or more potentially remote sites. We have designed and implemented two versions of the data server software. One implementation is a customization of a database system framework that is optimized for a tightly coupled parallel machine with attached local disks. The second implementation is component-based, and has been designed to accommodate access to and processing of data in a distributed, heterogeneous environment. We also have developed caching client software, implemented in Java, to achieve good response time and portability across different computer platforms. The performance results presented show that the Virtual Microscope system scales well, so that many clients can be adequately serviced by an appropriately configured data server.
Processing Large-Scale Multidimensional Data in Parallel and Distributed Environments
, 2002
Cited by 13 (9 self)
Analysis of data is an important step in understanding and solving a scientific problem. Analysis involves extracting the data of interest from all the available raw data in a dataset and processing it into a data product. However, in many areas of science and engineering, a scientist's ability to analyze information is increasingly becoming hindered by dataset sizes. The vast amount of data in scientific datasets makes it a difficult task to efficiently access the data of interest, and manage potentially heterogeneous system resources to process the data. Subsetting and aggregation are common operations executed in a wide range of data-intensive applications. We argue that common runtime and programming support can be developed for applications that query and manipulate large datasets. This paper presents a compendium of frameworks and methods we have developed to support efficient execution of subsetting and aggregation operations in applications that query and manipulate large, multi-dimensional datasets in parallel and distributed computing environments.
Compiler support for exploiting coarse-grained pipelined parallelism
- In Supercomputing
, 2003
Cited by 11 (0 self)
The emergence of grid and a new class of data-driven applications is making a new form of parallelism desirable, which we refer to as coarse-grained pipelined parallelism. This paper reports on a compilation system developed to exploit this form of parallelism. We use a dialect of Java that exposes both pipelined and data parallelism to the compiler. Our compiler is responsible for selecting a set of candidate filter boundaries, determining the volume of communication required if a particular boundary is chosen, performing the decomposition, and generating code. We have developed a one-pass algorithm for determining the required communication between consecutive filters. We have developed a cost model for estimating the execution time for a given decomposition, and a dynamic programming algorithm for performing the decomposition. Detailed evaluation of our current compiler using four data-driven applications demonstrates the feasibility of our approach.
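As a rough illustration of the kind of dynamic programming decomposition this abstract mentions, the sketch below splits a chain of stages into contiguous filters. The abstract does not give the paper's cost model, so everything here is an assumption: the function name `best_bottleneck`, the per-stage `work` and per-boundary `comm` times, and the objective of minimizing the slowest filter.

```python
from functools import lru_cache
from typing import Sequence


def best_bottleneck(work: Sequence[float], comm: Sequence[float], units: int) -> float:
    """Smallest achievable bottleneck when a chain of stages is split into
    exactly `units` contiguous filters (hypothetical cost model).

    work[i]: compute time of stage i.
    comm[i]: time to ship stage i's output onward if a filter ends at stage i
             (comm[-1] is the cost of delivering the final result).
    """
    n = len(work)
    if len(comm) != n or not 1 <= units <= n:
        raise ValueError("need one comm entry per stage and 1 <= units <= stages")

    # prefix[i] holds the total compute time of stages 0..i-1.
    prefix = [0.0]
    for w in work:
        prefix.append(prefix[-1] + w)

    @lru_cache(maxsize=None)
    def solve(i: int, k: int) -> float:
        # Best bottleneck for the first i stages split into k filters,
        # with every filter charged for sending its own output.
        if k == 1:
            return prefix[i] + comm[i - 1]
        best = float("inf")
        for p in range(k - 1, i):  # the last filter covers stages p..i-1
            last = prefix[i] - prefix[p] + comm[i - 1]
            best = min(best, max(solve(p, k - 1), last))
        return best

    return solve(n, units)
```

The recurrence tries every position for the last filter boundary and reuses the best solution for the stages to its left, which is the usual shape of a chain-partitioning dynamic program.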
Exploiting Functional Decomposition for Efficient Parallel Processing of Multiple Data Analysis Queries
Cited by 10 (4 self)
Reuse is a powerful method for improving system performance. In this paper, we examine functional decomposition for improving data reuse and, therefore, overall query execution performance in the context of data analysis applications. Additionally, we look at the performance effects of using various projection primitives that make it possible to transform intermediate results generated during the execution of a previous query so that they can be reused by a new query. A satellite data analysis application is used to experimentally show the performance benefits achieved using these strategies.
Leveraging Run Time Knowledge about Event Rates to Improve Memory Utilization in Wide Area Data Stream Filtering
- HPDC-11
, 2002
Cited by 9 (5 self)
The dQUOB system's conceptualization of data streams as a database, together with its SQL interface to those streams, gives users an intuitive way to think about their data needs in a large-scale application containing hundreds if not thousands of data streams. Experience with dQUOB has shown the need for more aggressive memory management to achieve the scalability we desire. This paper addresses the problem with a two-fold solution. The first part is the replacement of the existing First Come First Served (FCFS) scheduling algorithm with an Earliest Job First (EJF) algorithm, which we demonstrate to yield better average service time. The second is an introspection algorithm that sets and adapts the sizes of join windows in response to knowledge acquired at runtime about event rates. In addition to the potential for significant improvements in memory utilization, the algorithm presented here also provides a means by which the user can reason about join window sizes. Wide area measurements demonstrate the adaptive capability required by the introspection technique.
Adaptive Query Processing: A Survey
- In 19th BNCOD
, 2002
Cited by 9 (4 self)
In wide-area database systems, which may be running on unpredictable and volatile environments (such as computational grids), it is difficult to produce efficient database query plans based on information available solely at compile time. A solution to this problem is to exploit information that becomes available at query runtime and adapt the query plan to changing conditions during execution. This paper presents a survey on adaptive query processing techniques, examining the opportunities they offer to modify a plan dynamically and classifying them into categories according to the problem they focus on, their objectives, the nature of feedback they collect from the environment, the frequency at which they can adapt, their implementation environment and which component is responsible for taking the adaptation decisions.
Active Proxy-G: Optimizing the Query Execution Process in the Grid
, 2002
Cited by 7 (5 self)
The Grid environment facilitates collaborative work and allows many users to query and process data over geographically dispersed data repositories. Over the past several years, there has been a growing interest in developing applications that interactively analyze datasets, potentially in a collaborative setting. We describe an Active Proxy-G service that is able to cache query results, use those results for answering new incoming queries, generate subqueries for the parts of a query that cannot be produced from the cache, and submit the subqueries for final processing at application servers that store the raw datasets. We present an experimental evaluation to illustrate the effects of various design tradeoffs. We also show the benefits that two real applications gain from using the middleware.
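The idea of answering part of a query from cache and generating subqueries for the remainder can be illustrated with a minimal sketch. This is not the Active Proxy-G implementation: it assumes queries are one-dimensional half-open ranges, and the name `split_query` and the `Interval` type are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open range [start, end)


def split_query(query: Interval, cached: List[Interval]) -> Tuple[List[Interval], List[Interval]]:
    """Split a range query into parts covered by cached results (hits)
    and subqueries that still have to go to the data server (misses)."""
    start, end = query
    hits: List[Interval] = []
    misses: List[Interval] = []
    cursor = start  # everything before cursor is already accounted for
    for c_start, c_end in sorted(cached):
        if c_end <= cursor or c_start >= end:
            continue  # no overlap with the still-uncovered part of the query
        if c_start > cursor:
            misses.append((cursor, c_start))  # gap before this cached range
        hit_end = min(c_end, end)
        hits.append((max(c_start, cursor), hit_end))
        cursor = hit_end
        if cursor >= end:
            break
    if cursor < end:
        misses.append((cursor, end))  # tail not covered by any cached range
    return hits, misses
```

For example, with cached ranges `(2, 4)` and `(6, 12)`, the query `(0, 10)` is served from cache for `(2, 4)` and `(6, 10)`, while `(0, 2)` and `(4, 6)` become subqueries.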

