Results 1 - 3 of 3
Parallel computation of high dimensional robust correlation and covariance matrices
- In KDD ’04: Proceedings of the 2004 ACM SIGKDD international conference on Knowledge discovery and data mining, 2004
Abstract - Cited by 8 (1 self)
The computation of covariance and correlation matrices is critical to many data mining applications and processes. Unfortunately, the classical covariance and correlation matrices are very sensitive to outliers. Robust methods, such as QC and the Maronna method, have been proposed. However, existing algorithms for QC only give acceptable performance when the dimensionality of the matrix is in the hundreds, and the Maronna method is rarely used in practice because of its high computational cost. In this paper, we develop parallel algorithms for both QC and the Maronna method. We evaluate these parallel algorithms using a real data set of the gene expression of over 6,000 genes, giving rise to a matrix of over 18 million entries. In our experimental evaluation, we explore scalability in dimensionality and in the number of processors, and the trade-offs between accuracy and computational efficiency. We also compare the parallel behaviours of the two methods. From a statistical standpoint, the Maronna method is more robust than QC. From a computational standpoint, while QC requires less computation, interestingly the Maronna method is much more parallelizable than QC. After thorough experimentation, we conclude that for many data mining applications, both QC and Maronna are viable options. QC, which is less robust but faster, is the recommended choice for small parallel platforms. On the other hand, the Maronna method is the recommended choice when a high degree of robustness is required, or when the parallel platform features a large number of processors (e.g., 32).
Parallelism in Combinatorial Optimisation, 1995
Abstract
This report addresses the issues arising from the use of parallel machines and considers the various techniques used by members of the consortium in this context. Before considering the algorithms in detail, we describe, in section 2, the main types of parallel architecture and survey various attempts at providing a taxonomy. Then, in section 3, we address the difficult issue of the measurement of processor performance in order to quantify any enhancement obtained by implementing an algorithm in parallel. Section 4 presents the main features of in general, and PVM (Parallel Virtual Machine) in particular. The latter is an application that is used to generate distributed versions of sequential algorithms for use on networks of workstations. The parallel implementations of the GA toolkit, , and the associated simulated annealing toolkit, , both developed at UEA, have been produced using PVM. Exact algorithms will always find the optimal solution to a problem given enough time and space. Subject to these constraints, they must always be the preferred method of solution. In practice, the time and space constraints can prevent the use of an exact algorithm, and thus the potential of parallelism to reduce these requirements becomes an important factor. Total enumeration is embarrassingly parallel. With n processors it is reasonable to expect an n-fold reduction in the time needed to undertake such a thorough search. Such a saving is seldom sufficient to make the method viable, so we will concentrate on other exact methods here. In section 5, we review parallel branch-and-bound, reprinting a survey paper written by the UEA partners in the consortium and previously published in [1]. Because of the interest in interior point methods for the CALMA project and its widely cited potential for parallelisation, this provides the ...
Alternative Analysis for Computational Holon Architectures, 1994
Abstract
Simulator
Appendix E. Examples of Human Performance Process Hierarchical Decomposition
Appendix F. Scalable Coherent Interfaces
Appendix G. Synopses of Selected High Performance Parallel Machines
Appendix H. Glossary of Acronyms
References
List of Figures
1.1 A Holarchy
2.1 Possible Paths for Human Performance Process Model Creation
6.1 Numerical Aerodynamics Simulation Results for Embarrassingly Parallel Benchmarks
6.2 CM2: Numerical Aerodynamics Simulation Benchmark Results
6.3 Human Performance Process and Architectures
8.1 Heterogeneous Computing Environment
9.1 High Performance Systems Metrics ...

