T2: a customizable parallel database for multi-dimensional data (1998)

by Chang
Venue: SIGMOD Record

Results 1 - 10 of 22

Digital Dynamic Telepathology -- the Virtual Microscope

by Asmara Afework, Michael D. Beynon, Fabian Bustamante, Angelo Demarzo, Renato Ferreira, Robert Miller, Mark Silberman, 1998
Abstract - Cited by 58 (31 self)
... this paper, we concentrate on how the system manipulates and displays high power, high resolution histopathology datasets. The Virtual Microscope employs a client/server architecture. The client software runs on an end user's PC or workstation, while the database software for storing, retrieving and processing the microscope image data runs on a variety of platforms. The database software can run on a PC at the end user's site, or on a potentially remote high performance parallel or distributed computer. The database software is further decomposed into two parts -- a frontend process that accepts queries from clients and one or more backend processes that store and retrieve the ...

Querying Very Large Multi-dimensional Datasets in ADR

by Tahsin Kurc, Chialin Chang, Renato Ferreira, Alan Sussman, Joel Saltz, 1999
Abstract - Cited by 25 (9 self)
Applications that make use of very large scientific datasets have become an increasingly important subset of scientific applications. In these applications, datasets are often multi-dimensional, i.e., data items are associated with points in a multi-dimensional attribute space, and access to data items is described by range queries. The basic processing involves mapping input data items to output data items, and some form of aggregation of all the input data items that project to each output data item. We have developed an infrastructure, called the Active Data Repository (ADR), that integrates storage, retrieval and processing of multi-dimensional datasets on distributed-memory parallel architectures with multiple disks attached to each node. In this paper we address efficient execution of range queries on distributed memory parallel machines within the ADR framework. We present three potential strategies, and evaluate them under different application scenarios and machine co...

Object-relational Queries into Multidimensional Databases with the Active Data Repository

by Renato Ferreira, Tahsin Kurc, Michael Beynon, Chialin Chang, Alan Sussman, Joel Saltz, 1999
Abstract - Cited by 22 (7 self)
As computational power and storage capacity increase, processing and analyzing large volumes of multi-dimensional datasets play an increasingly important role in many domains of scientific research. Scientific applications that make use of very large scientific datasets have several important characteristics: datasets consist of complex data and are usually multi-dimensional; applications usually retrieve a subset of all the data available in the dataset; various application-specific operations are performed on the data items retrieved. Such applications can be supported by object-relational database management systems (OR-DBMSs). In addition to providing functionality to define new complex datatypes and user-defined functions, an OR-DBMS for scientific datasets should contain runtime support that will provide optimized storage for very large datasets and an execution environment for user-defined functions involving expensive operations. In this paper we describe an infrastructure, the ...

A middleware for developing parallel data mining implementations

by Ruoming Jin, Gagan Agrawal - In Proceedings of the First SIAM Conference on Data Mining, 2001
Abstract - Cited by 17 (10 self)
Data mining is an interdisciplinary field, having applications in diverse areas like bioinformatics, medical informatics, scientific data analysis, financial analysis, consumer profiling, etc. In each of these application domains, the amount of data available for analysis has exploded in recent years, making the scalability of data

Query Planning for Range Queries with User-defined Aggregation on Multi-dimensional Scientific Datasets

by Chialin Chang, Tahsin Kurc, Alan Sussman, Joel Saltz, 1999
Abstract - Cited by 8 (6 self)
Applications that make use of very large scientific datasets have become an increasingly important subset of scientific applications. In these applications, the datasets are often multi-dimensional, i.e., data items are associated with points in a multi-dimensional attribute space. The processing is usually highly stylized, with the basic processing steps consisting of (1) retrieval of a subset of all available data in the input dataset via a range query, (2) projection of each input data item to one or more output data items, and (3) some form of aggregation of all the input data items that project to each output data item. We have developed an infrastructure, called the Active Data Repository (ADR), that integrates storage, retrieval and processing of multi-dimensional datasets on shared-nothing architectures. In this paper we address query planning and execution strategies for range queries with user-defined processing. We evaluate three potential query planning strategies withi...
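
The three processing steps named in the abstract above (range query, projection, aggregation) can be illustrated with a small single-process sketch. This is not ADR's API or its parallel implementation; the function names (`range_query_aggregate`, `to_grid_cell`), the item layout, and the grid projection are all hypothetical, chosen only to make the steps concrete.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Sequence, Tuple

Point = Tuple[float, ...]
Item = Tuple[Point, float]  # (coordinates in attribute space, value); assumed layout


def in_range(point: Point, lo: Sequence[float], hi: Sequence[float]) -> bool:
    """True if the point lies inside the axis-aligned box [lo, hi]."""
    return all(l <= x <= h for x, l, h in zip(point, lo, hi))


def range_query_aggregate(
    items: Iterable[Item],
    lo: Sequence[float],
    hi: Sequence[float],
    project: Callable[[Point], List[Hashable]],
    aggregate: Callable[[float, float], float],
    initial: float,
) -> Dict[Hashable, float]:
    """Sequential sketch of the three steps; all names are hypothetical."""
    output: Dict[Hashable, float] = {}
    for point, value in items:
        # Step 1: range query, keep only items inside the query box.
        if not in_range(point, lo, hi):
            continue
        # Step 2: projection, one input item maps to one or more output items.
        for cell in project(point):
            # Step 3: user-defined aggregation per output item.
            output[cell] = aggregate(output.get(cell, initial), value)
    return output


def to_grid_cell(point: Point) -> List[Hashable]:
    """Hypothetical projection: map a point to the unit grid cell containing it."""
    return [tuple(int(x) for x in point)]


items = [((0.2, 0.7), 1.0), ((0.9, 0.1), 2.0), ((1.5, 0.5), 4.0), ((5.0, 5.0), 8.0)]
sums = range_query_aggregate(
    items, (0.0, 0.0), (2.0, 2.0), to_grid_cell, lambda a, b: a + b, 0.0
)
print(sums)  # {(0, 0): 3.0, (1, 0): 4.0}
```

Passing the projection and aggregation as functions mirrors the "user-defined processing" the abstract mentions; swapping `lambda a, b: a + b` for `max` changes the aggregation without touching the query loop.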

Compiler Supported High-level Abstractions for Sparse Disk-Resident Datasets

by Renato Ferreira, Gagan Agrawal, Joel Saltz, 2001
Abstract - Cited by 5 (3 self)
Processing and analyzing large volumes of data plays an increasingly important role in many domains of scientific research. The complexity and irregularity of datasets in many domains make the task of developing such processing applications tedious and error-prone.

Compiling Data Intensive Applications with Spatial Coordinates

by Renato Ferreira, Gagan Agrawal, Ruoming Jin - In Proceedings of Languages and Compilers for Parallel Computing, 2000
Abstract - Cited by 4 (2 self)
Processing and analyzing large volumes of data plays an increasingly important role in many domains of scientific research. We are developing a compiler which processes data intensive applications written in a dialect of Java and compiles them for efficient execution on a cluster of workstations or distributed memory machines.

Compiler and Runtime Analysis for Efficient Communication in Data Intensive Applications

by Renato Ferreira, Gagan Agrawal, Joel Saltz
Abstract - Cited by 4 (1 self)
Processing and analyzing large volumes of data plays an increasingly important role in many domains of scientific research. We are developing a compiler that processes data intensive applications written in a dialect of Java and compiles them for efficient execution on distributed memory parallel machines.

ArrayStore: A storage manager for complex parallel array processing

by Emad Soroush, Magdalena Balazinska, 2011
Abstract - Cited by 4 (1 self)
We present the design, implementation, and evaluation of ArrayStore, a new storage manager for complex, parallel array processing. ArrayStore builds on prior work in the area of multidimensional data storage, but considers the new problem of supporting a parallel and more varied workload comprising not only range-queries, but also binary operations such as joins and complex user-defined functions. This paper makes two key contributions. First, it examines several existing single-site storage management strategies and array partitioning strategies to identify which combination is best suited for the array-processing workload above. Second, it develops a new and efficient storage management mechanism that enables parallel processing of operations that must access data from adjacent partitions. We evaluate ArrayStore on over 80GB of real data from two scientific domains and real operators used in these domains. We show that ArrayStore outperforms previously proposed storage management strategies in the context of its diverse target workload.

Language Extensions and Compilation Techniques for Data Intensive Computations

by Gagan Agrawal, Renato Ferreira, Joel Saltz - In Proceedings of Workshop on Compilers for Parallel Computing, 2000
Abstract - Cited by 3 (2 self)
Processing and analyzing large volumes of data plays an increasingly important role in many