Clustered Workflow Execution of Retargeted Data Analysis Scripts
- EIGHTH IEEE INTERNATIONAL SYMPOSIUM ON CLUSTER COMPUTING AND THE GRID
Cited by 3 (1 self)
Supercomputing advances have enabled computational science data volumes to grow at ever-increasing rates, commonly resulting in more data produced than can be practically analyzed. Whole-dataset download costs have grown to impractical heights, even with multi-Gbps networks, forcing scientists to rely on server-side subsetting and limiting the scope of data they can analyze on a workstation. Our system supplements existing scientific data services with lightweight computational capability, providing a means of safely relocating analysis from the desktop to the server where clustered execution can be coordinated, exploiting data locality, reducing unnecessary data transfer, and providing end-users with results several times faster. We show how dataflow and other compiler-inspired analyses of shell scripts of scientists’ most common analysis tools enable parallelization and optimizations in disk and network I/O bandwidth. We benchmark using an actual geoscience analysis script, illustrating the crucial performance gains of extracting workflows defined in scripts and optimizing their execution. Current results quantify significant improvements in performance, showing the promise of bringing transparent high-performance analysis to the scientist’s desktop.
Large Scale Data Mining: The Challenges and The Solutions
- In KDD97 International Conference on Knowledge Discovery and Data Mining, 1997
Cited by 2 (1 self)
Data mining over large data sets is considered to be a very important research subject due to its obvious commercial potential. However, it is also a major challenge due to its complexity and computational intensity. Exploiting the inherent parallelism of data mining algorithms provides a direct solution by utilising the large data retrieval and processing power of parallel architectures. In this paper, we classify various data mining algorithms with respect to their most effective parallel structure. We study induction based classification algorithms, neural networks, clustering algorithms and genetic algorithms. This classification is based on our intensive research on the parallelisation of data mining algorithms. We also present a methodology for determining the proper parallelisation strategy based on the idea of algorithmic skeletons and performance modelling. This research aims to provide a systematic way to develop parallel data mining algorithms and applications. ...
Functional Programming of Massively Parallel Systems
, 1993
Cited by 1 (1 self)
Parallel programming is intrinsically more difficult than sequential programming. To this day there is no universal programming methodology which provides a simple programming model applicable to a wide range of architectures. In this report a methodology is presented which aims at solving this dilemma using a transformational approach based on skeletons. First steps towards a general theory of skeletons are set up using results from sheaf theory. These results provide the basis for the formal definition of data distribution algebras underlying our definition of skeletons. The paper presents several examples in some detail which illustrate the use of the presented formalism in application domains important to parallel programming. Keywords: functional programming, parallel programming, skeleton, data distribution algebra.

