Performance Models for Co-ordinating Parallel Data Classification
- In Proceedings of the Seventh International Parallel Computing Workshop (PCW-97), 1997
Cited by 7 (0 self)
In this paper we investigate the use of performance models for structuring parallel programs through a case study in data mining. Performance models have been shown to be an integral part of providing a more structured approach to the problems of performance portability and resource allocation in parallel programming. This is particularly true in the context of skeletons, where parallel programs are expressed as combinations of predefined, often higher-order, functions. The use of performance models has, to some extent, been limited by the difficulty of applying the approach to irregular and dynamic parallel algorithms. We explore this problem in the context of a well-known data mining algorithm, C4.5, which exhibits both irregular and dynamic characteristics. C4.5 is rich in inherent parallelism, making the choice of a suitable parallel implementation for a given architecture non-trivial. We demonstrate how a structured approach to developing the performance models enables a c...
Large Scale Data Mining: The Challenges and The Solutions
- In KDD97 International Conference on Knowledge Discovery and Data Mining, 1997
Cited by 2 (1 self)
Data mining over large data sets is considered to be a very important research subject due to its obvious commercial potential. However, it is also a major challenge due to its complexity and computational intensity. Exploiting the inherent parallelism of data mining algorithms provides a direct solution by utilising the large data retrieval and processing power of parallel architectures. In this paper, we classify various data mining algorithms with respect to their most effective parallel structure. We study induction-based classification algorithms, neural networks, clustering algorithms and genetic algorithms. This classification is based on our intensive research on the parallelisation of data mining algorithms. We also present a methodology for determining the proper parallelisation strategy based on the idea of algorithmic skeletons and performance modelling. This research aims to provide a systematic way to develop parallel data mining algorithms and applications. ...
Exploiting Vector and Heterogeneous Systems
Programming parallel systems is difficult, especially when such systems incorporate heterogeneous components. This paper describes some approaches we are developing to cope effectively with various forms of heterogeneous parallel systems. The first part of the paper describes Fortran90V, an extension of Fortran 90, which can support nested parallelism for expressing irregular problems more efficiently and naturally. This is especially suited to high-performance vector architectures, such as the Fujitsu VPP300. By compiling Fortran-90V to Fortran 90, the language is available for all parallel machines, provided the machines have Fortran 90 compilers. The second part of the paper describes the latest development in SPP(X), which was introduced in PCW'95. SPP(X) is a language which can co-ordinate the activities of a heterogeneous system. Further, by associating performance models with the operators of the language, it is possible to aid the resource allocation in a program writt...

