Performance Measurement, Visualization and Modeling of Parallel and Distributed Programs using the AIMS Toolkit
, 1995
Cited by 60 (2 self)
In this paper, we first address fundamental issues in building useful performance-tuning tools and then describe our experience with the AIMS toolkit for tuning parallel and distributed programs on a variety of platforms. AIMS supports source-code instrumentation, run-time monitoring, graphical execution profiles, performance indices and automated modeling techniques as ways to expose performance problems of programs. Using several examples representing a broad range of scientific applications, we illustrate AIMS' effectiveness in exposing performance problems in parallel and distributed programs.
A Comparative Evaluation of Techniques for Studying Parallel System Performance
, 1994
Cited by 9 (3 self)
This paper presents a comparative and qualitative survey of techniques for evaluating parallel systems. We also survey metrics that have been proposed for capturing and quantifying the details of complex parallel system interactions. Experimentation, theoretical/analytical modeling and simulation are three frequently used techniques in performance evaluation. Experimentation uses real or synthetic workloads, usually called benchmarks, to measure and analyze their performance on actual hardware. Theoretical and analytical models are used to abstract details of a parallel system, providing the view of a simplified system parameterized by a limited number of degrees of freedom that are kept tractable. Simulation and related performance monitoring/visualization tools have become extremely popular because of their ability to capture the dynamic nature of the interaction between applications and architectures. We first present the figures of merit that are important for any performance evaluation technique. With respect to these figures of merit, we survey the three techniques and make a qualitative comparison of their pros and cons. In particular, for each of the above techniques we discuss: representative case studies; the underlying models that are used for the workload and the architecture; the feasibility and ease of quantifying standard performance metrics from the available statistics; the accuracy/validity of the output statistics; and the cost/effort that is expended in each evaluation strategy.
Towards a Scalable Parallel Object Database - The Bulk Synchronous Parallel Approach
, 1996
Cited by 8 (2 self)
Parallel computers have been successfully deployed in many scientific and numerical application areas, although their use in non-numerical and database applications has been scarce. In this report, we first survey the architectural advancements beginning to make general-purpose parallel computing cost-effective, the requirements for non-numerical (or symbolic) applications, and the previous attempts to develop parallel databases. The central theme of the Bulk Synchronous Parallel model is to provide a high level abstraction of parallel computing hardware whilst providing a realisation of a parallel programming model that enables architecture independent programs to deliver scalable performance on diverse hardware platforms. Therefore, the primary objective of this report is to investigate the feasibility of developing a portable, scalable, parallel object database, based on the Bulk Synchronous Parallel model of computation. In particular, we devise a way of providing high-level abstra...
Partial Translation
, 1993
Cited by 8 (3 self)
Traditional simulation of a target architecture by interpreting object code can be improved by translating the object code to an intermediate format. This approach is called interpretive translation. Despite a substantial performance improvement over traditional interpretation, a large part of the overhead is unnecessary. An alternative approach is block translation, where one or more simulated instructions are translated to directly executable code. This approach has several drawbacks. We discuss the problems with block translation, analyse the overhead of interpretive translation, and describe a hybrid approach---partial translation---that combines the benefits of both approaches. Partial translation implements an intermediate format that supports the addition of run-time generated code whenever appropriate. The performance limit (slowdown) of interpretive translation is around 15, and real implementations have achieved 20-30. Partial translation will perform considerably better. Fi...
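The difference between decoding every instruction on each execution and decoding once into an intermediate format can be sketched with a toy example. Everything below is a hypothetical illustration: the textual stack-machine "object code", the (handler, operand) intermediate format, and all function names are assumptions, not the simulator or format described in the paper.

```python
# Toy illustration only: a hypothetical stack machine with textual "object
# code". It is not the target architecture or intermediate format of the paper.

def interpret(program, x):
    """Traditional interpretation: decode every instruction each time it executes."""
    stack = [x]
    for text in program:
        fields = text.split()              # decoding cost is paid on every run
        op = fields[0]
        if op == "push":
            stack.append(int(fields[1]))
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown instruction: {text!r}")
    return stack.pop()


def _push(stack, operand):
    stack.append(operand)


def _add(stack, _operand):
    b, a = stack.pop(), stack.pop()
    stack.append(a + b)


def _mul(stack, _operand):
    b, a = stack.pop(), stack.pop()
    stack.append(a * b)


_HANDLERS = {"push": _push, "add": _add, "mul": _mul}


def translate(program):
    """Decode once into an intermediate format: a list of (handler, operand) pairs."""
    intermediate = []
    for text in program:
        fields = text.split()
        handler = _HANDLERS[fields[0]]     # raises KeyError on an unknown opcode
        operand = int(fields[1]) if len(fields) > 1 else None
        intermediate.append((handler, operand))
    return intermediate


def run_translated(intermediate, x):
    """Execute pre-decoded instructions; no text parsing inside the loop."""
    stack = [x]
    for handler, operand in intermediate:
        handler(stack, operand)
    return stack.pop()


program = ["push 3", "add", "push 2", "mul"]   # computes (x + 3) * 2
ir = translate(program)
print(interpret(program, 4), run_translated(ir, 4))
```

Both paths compute the same result; the translated form pays the decoding cost once per program rather than once per executed instruction, which is the saving the abstract attributes to interpretive translation.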
Performance Evaluation for Parallel Systems: A Survey
, 1997
Cited by 7 (0 self)
Performance is often a key factor in determining the success of a parallel software system. Performance evaluation...
Evaluating Scalability of the 2-D FFT on Parallel Computers
- In Computer Architectures for Machine Perception '93
, 1993
Cited by 6 (6 self)
Parallel computers have demonstrated a remarkable potential for achieving high performance at a reasonable cost for many computer vision and image processing (CVIP) applications. A major obstacle to the use of parallel computers is the lack of a universally accepted metric to study the scalability of parallel algorithms and architectures. In this paper, we apply different scalability measures to various 2-D FFT algorithms and target architectures and compare the expected performance to the measured results. A number of algorithms in computer vision and image processing exhibit regular communication patterns similar to the 2-D FFT. We can therefore extrapolate our observations to determine which aspects of these measures are relevant to the scalability analysis of other similar image processing algorithms.
1 Introduction
Scalability of algorithms and architectures is a moot question in parallel computing. The factors that influence scalability include - but are not limited to - machine...
Scalable, Parallel Computers: Alternatives, Issues, and Challenges
- International Journal of Parallel Programming
, 1994
Cited by 6 (0 self)
The 1990s will be the era of scalable computers. By giving up uniform memory access, computers can be built that scale over a range of several thousand. These provide high peak announced performance (PAP), by using powerful, distributed CMOS microprocessor-primary memory pairs interconnected by a high performance switch (network). The parameters that determine these structures and their utility include: whether hardware (a multiprocessor) or software (a multi-computer) is used to maintain a distributed, or shared virtual memory (DSM) environment; the power of computing nodes (these improve at 60% per year); the size and scalability of the switch; distributability (the ability to connect to geographically dispersed computers including workstations); and all forms of software to exploit their inherent parallelism. To a great extent, viability is determined by a computer's generality: the ability to efficiently handle a range of work that requires varying processing (from serial to fully parallel), memory, and I/O resources. A taxonomy and evolutionary time line outlines the next decade of computer evolution, including distributed workstations, based on scalability and parallelism. Workstations can be the best scalables.
KEY WORDS: Scalable multiprocessors and multicomputers; massive parallelism; distributed or shared virtual memory; high performance computers; computer architecture.
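As a quick arithmetic check on the abstract's figure of node power improving at 60% per year, the following sketch compounds that rate over time. The function names and the assumption of a constant annual rate are illustrative only; the abstract gives just the rate itself.

```python
import math


def relative_node_power(years, annual_growth=0.60):
    """Node power after `years`, relative to year 0, at a constant compound rate."""
    return (1.0 + annual_growth) ** years


def doubling_time_years(annual_growth=0.60):
    """Years needed for node power to double at a constant compound rate."""
    return math.log(2.0) / math.log(1.0 + annual_growth)


print(round(relative_node_power(10), 1))   # growth factor over one decade
print(round(doubling_time_years(), 2))     # doubling time in years
```

Under this constant-rate assumption, a decade of 60% annual improvement compounds to roughly a 110-fold increase, with node power doubling about every year and a half.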
Toward The Design Of Large-Scale, Shared-Memory Multiprocessors
- Dept. of Comput. Sci., Univ. of Wisconsin-Madison
, 1992
Cited by 5 (0 self)
The state-of-the-art in multiprocessing today employs thousands of high-performance microprocessors. As system sizes continue to grow, increasing care must be taken to design cost-efficient, balanced (i.e. scalable) systems. This thesis addresses the scalability of shared-memory multiprocessors, presenting a practical treatment of scalability, and proceeding to focus on aspects of two critical areas of large-scale system design: interconnection networks and cache coherence mechanisms. In these areas, pipelined-channel interconnection networks and pruning-cache directories are investigated, respectively. Pipelined-channel interconnection networks allow multiple bits to be simultaneously in flight on a single wire, decoupling channel throughput from channel latency. The first published performance analysis of the SCI ring, a new IEEE standard employing pipelined channels, is presented. This study serves as a proof-of-concept for pipelined-channel networks, demonstrating their very high p...
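The claim that pipelining decouples channel throughput from channel latency can be illustrated with a deliberately simplified model. The assumptions here are not from the thesis: a wire is described only by its end-to-end latency and the time needed to launch one bit, and an unpipelined channel is assumed to wait for each bit to arrive before launching the next.

```python
def bits_in_flight(latency_ns, bit_time_ns):
    """Bits simultaneously on the wire when a new bit is launched every bit_time_ns."""
    return latency_ns / bit_time_ns


def throughput_bits_per_ns(latency_ns, bit_time_ns, pipelined):
    """Sustained throughput of one wire under the simplified model above."""
    if pipelined:
        # A new bit is launched every bit time, regardless of wire latency.
        return 1.0 / bit_time_ns
    # Unpipelined (assumed): the next bit waits until the previous one arrives.
    return 1.0 / max(latency_ns, bit_time_ns)


# Hypothetical numbers: a short and a long wire, both driven at 2 ns per bit.
for latency in (10.0, 40.0):
    print(
        latency,
        throughput_bits_per_ns(latency, 2.0, pipelined=True),
        throughput_bits_per_ns(latency, 2.0, pipelined=False),
        bits_in_flight(latency, 2.0),
    )
```

In this model the pipelined throughput stays the same as the wire gets longer while the number of bits in flight grows, whereas the unpipelined throughput falls in proportion to the latency.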
Massively Parallel Distributed Feature Extraction in Textual Data Mining Using HDDI(tm)
- In the Proceedings of The Tenth IEEE International Symposium on High Performance Distributed Computing (HPDC-10)
, 2001
Cited by 4 (3 self)
data is feature extraction. The widespread digitization of information has created a wealth of data that requires novel approaches to feature extraction in a distributed environment. We propose a massively parallel model for feature extraction that employs unused cycles on networks of PCs/workstations in a highly distributed environment. We have developed an analytical model of the time and communication complexity of the feature extraction process in this environment based on feature extraction algorithms developed in our textual data mining research with [1] [18] [20]. We show that speedups linear in the number of processors are achievable for applications involving reduction operations based on a novel, parallel pipelined model of execution. We are in the process of validating our analytical model with empirical observations based on the extraction of features from a large number of pages on the World Wide Web.
An Investigation of the Use of High Performance Computing for Multiscale Color Image Smoothing Using Mathematical Morphology
Cited by 3 (2 self)
The use of mathematical morphology in low- and mid-level image processing and computer vision applications has allowed the development of a class of techniques for analyzing shape information in color images. These techniques have been shown to be useful in image enhancement, segmentation, and analysis. In this paper, we develop and test scalable parallel algorithms necessary to implement a class of morphological filters on a parallel computer, specifically, the MasPar MP-1. We examine the issues relative to the parallel implementation of the algorithms and show that real-time enhancement of high resolution color images is possible.
1. INTRODUCTION
The quantum jump in computer performance in the past two decades has allowed many scientific applications access to computational speed not previously known. This may be especially true in image processing and computer vision where the need for computational speed is compounded by both the number of pixels in a large image (typically greater than...

