Results 1–10 of 204
The Watershed Transform: Definitions, Algorithms and Parallelization Strategies
, 2001
Abstract
Cited by 135 (3 self)
The watershed transform is the method of choice for image segmentation in the field of mathematical morphology. We present a critical review of several definitions of the watershed transform and the associated sequential algorithms, and discuss various issues which often cause confusion in the literature. The need to distinguish between definition, algorithm specification and algorithm implementation is pointed out. Various examples are given which illustrate differences between watershed transforms based on different definitions and/or implementations. The second part of the paper surveys approaches for parallel implementation of sequential watershed algorithms.
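The flooding idea common to the watershed definitions the paper reviews can be illustrated with a minimal marker-based Python sketch. This is a simplification, not any of the specific algorithms surveyed: it assumes a 2-D grid, 4-connectivity, and a simple FIFO tie-break per grey level, and it does not mark watershed lines explicitly.

```python
import heapq

def watershed(image, markers):
    """Marker-based watershed by priority flooding (a simplified sketch).

    image   : 2-D list of grey levels
    markers : 2-D list, >0 = seed label, 0 = unlabelled
    Returns a 2-D list of region labels.
    """
    rows, cols = len(image), len(image[0])
    labels = [row[:] for row in markers]
    heap, order = [], 0  # 'order' keeps the queue FIFO within a grey level
    # Seed the priority queue with every marker pixel.
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] > 0:
                heapq.heappush(heap, (image[r][c], order, r, c))
                order += 1
    # Flood from the lowest grey levels upward.
    while heap:
        _, _, r, c = heapq.heappop(heap)
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and labels[nr][nc] == 0:
                labels[nr][nc] = labels[r][c]  # flood into the neighbour
                heapq.heappush(heap, (image[nr][nc], order, nr, nc))
                order += 1
    return labels
```

On a 1-D profile with a ridge in the middle and one seed on each side, the two catchment basins meet at the ridge.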
Input/Output Characteristics of Scalable Parallel Applications
 In Proceedings of Supercomputing ’95
, 1995
Abstract
Cited by 107 (2 self)
Rapid increases in computing and communication performance are exacerbating the longstanding problem of performance-limited input/output. Indeed, for many otherwise scalable parallel applications, input/output is emerging as a major performance bottleneck. The design of scalable input/output systems depends critically on the input/output requirements and access patterns of this emerging class of large-scale parallel applications. However, hard data on the behavior of such applications is only now becoming available. In this paper, we describe the input/output requirements of three scalable parallel applications (electron scattering, terrain rendering, and quantum chemistry) on the Intel Paragon XP/S. As part of an ongoing parallel input/output characterization effort, we used instrumented versions of the application codes to capture
Mapping a manifold of perceptual observations
 Advances in Neural Information Processing Systems 10
, 1998
Abstract
Cited by 73 (2 self)
Nonlinear dimensionality reduction is formulated here as the problem of trying to find a Euclidean feature-space embedding of a set of observations that preserves as closely as possible their intrinsic metric structure – the distances between points on the observation manifold as measured along geodesic paths. Our isometric feature mapping procedure, or isomap, is able to reliably recover low-dimensional nonlinear structure in realistic perceptual data sets, such as a manifold of face images, where conventional global mapping methods find only local minima. The recovered map provides a canonical set of globally meaningful features, which allows perceptual transformations such as interpolation, extrapolation, and analogy – highly nonlinear transformations in the original observation space – to be computed with simple linear operations in feature space.
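The procedure the abstract outlines (neighbourhood graph, geodesic distances, then a metric embedding) can be sketched in a few lines of Python. This is a rough stand-in, not the authors' implementation: it uses Floyd-Warshall for shortest paths and classical MDS for the embedding, and the parameter defaults are illustrative.

```python
import numpy as np

def isomap(X, n_neighbors=2, n_components=1):
    """Rough Isomap sketch: kNN graph -> geodesic distances -> classical MDS."""
    n = len(X)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))  # Euclidean
    # Keep only each point's k nearest neighbours as graph edges.
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    for i in range(n):
        for j in np.argsort(D[i])[1:n_neighbors + 1]:
            G[i, j] = G[j, i] = D[i, j]
    # Geodesic distances: Floyd-Warshall shortest paths over the graph.
    for k in range(n):
        G = np.minimum(G, G[:, k:k + 1] + G[k:k + 1, :])
    # Classical MDS on the geodesic distance matrix.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (G ** 2) @ J            # double-centred squared distances
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:n_components]
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))
```

For points already lying on a line, the geodesic and Euclidean distances coincide and the 1-D embedding reproduces the spacing exactly (up to sign).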
Mapping Irregular Applications to DIVA, a PIM-based Data-Intensive Architecture
 In Supercomputing
, 1999
Abstract
Cited by 55 (12 self)
Processing-in-memory (PIM) chips that integrate processor logic into memory devices offer a new opportunity for bridging the growing gap between processor and memory speeds, especially for applications with high memory-bandwidth requirements. The Data IntensiVe Architecture (DIVA) system combines PIM memories with one or more external host processors and a PIM-to-PIM interconnect. DIVA increases memory bandwidth through two mechanisms: (1) performing selected computation in memory, reducing the quantity of data transferred across the processor-memory interface; and (2) providing communication mechanisms called parcels for moving both data and computation throughout memory, further bypassing the processor-memory bus. DIVA uniquely supports acceleration of important irregular applications, including sparse-matrix and pointer-based computations. In this paper, we focus on several aspects of DIVA designed to effectively support such computations at very high performance levels: (1) the mem...
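The general "move the computation to the data" idea behind parcels can be caricatured in a toy Python sketch. The class, the operation, and the byte accounting below are illustrative assumptions, not the DIVA design: the point is only that each memory node executes the operation locally, so a small result rather than the raw data crosses the processor-memory interface.

```python
class PIMNode:
    """Toy stand-in for a PIM memory node that owns a slice of data."""

    def __init__(self, data):
        self.data = data            # the slice of memory this node owns
        self.bytes_sent = 0         # traffic this node put on the bus

    def receive_parcel(self, op):
        result = op(self.data)      # run the computation in memory
        self.bytes_sent += 8        # only one scalar result crosses the bus
        return result

def global_sum(nodes):
    """Host combines per-node partial sums instead of pulling raw data."""
    return sum(node.receive_parcel(lambda d: sum(d)) for node in nodes)
```

Each node ships 8 bytes regardless of how much data it holds, which is the bandwidth saving the abstract describes.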
From patterns to frameworks to parallel programs
 UNIVERSITY OF ALBERTA
, 2002
Abstract
Cited by 41 (9 self)
This dissertation presents a new approach to writing object-oriented parallel programs based on design patterns, frameworks, and multiple layers of abstraction.
A Decoupled Scheduling Approach for Grid Application Development Environments
 Journal of Parallel and Distributed Computing
, 2003
Abstract
Cited by 39 (2 self)
In this paper we propose an adaptive scheduling approach designed to improve the performance of parallel applications in Computational Grid environments. A primary contribution of our work is that our design is modular and provides a separation of the scheduler itself from the application-specific components needed for the scheduling process. As part of the scheduler, we have also developed a search procedure which effectively and efficiently identifies desirable schedules. As test cases for our approach, we selected two applications from the class of iterative, mesh-based applications. For each of the test applications, we developed data mappers and performance models. We used a prototype of our approach in conjunction with these application-specific components to perform validation experiments in production Grid environments. Our results show that our scheduler provides significantly better application performance than conventional scheduling strategies. We also show that our scheduler gracefully handles degraded levels of availability of application and Grid resource information. Finally, we demonstrate that the overheads introduced by our methodology
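The separation the abstract describes can be sketched in Python: the scheduler owns only the search over candidate resource sets, while runtime prediction comes from an application-supplied performance model. The exhaustive search and the example model below are illustrative assumptions, not the paper's search procedure.

```python
from itertools import combinations

def best_schedule(resources, perf_model, max_width=None):
    """Pick the resource subset with the lowest predicted runtime.

    The scheduler is generic; perf_model is the application-specific
    component that maps a candidate subset to a predicted runtime.
    """
    max_width = max_width or len(resources)
    best, best_time = None, float("inf")
    for k in range(1, max_width + 1):
        for subset in combinations(resources, k):
            t = perf_model(subset)          # application-specific prediction
            if t < best_time:
                best, best_time = subset, t
    return best, best_time
```

A hypothetical model trading aggregate speed against per-host overhead shows why adding a slow host can hurt: the scheduler picks the two fast hosts and leaves the slow one out.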
Using AspectJ to Separate Concerns In Parallel Scientific Java Code
 In AOSD ’04: Proceedings of the 3rd international conference on Aspect-oriented software development
, 2004
Abstract
Cited by 28 (2 self)
Scientific software frequently demands high performance in order to execute complex models in acceptable time. A major means of obtaining high performance is via parallel execution on multiprocessor systems. However, traditional methods of programming for parallel execution can lead to substantial code-tangling, where the needs of the mathematical model crosscut the concern of parallel execution. Aspect-Oriented Programming is an attractive technology for solving the problem of code-tangling in high-performance parallel scientific software. The underlying mathematical model and the parallelism can be treated as separate concerns and programmed accordingly. Their elements of code can then be woven together to produce the final application. This paper investigates the extent to which AspectJ technology can be used to achieve the desired separation of concerns in programs from the Java Grande Forum benchmark suite, a set of test applications for evaluating the performance of Java in the context of numerical computation. The paper analyses three different benchmark programs and classifies the degrees of difficulty in separating concerns within them in a form suitable for AspectJ. This leads to an assessment of the influence of the design of a numerical application on the ability of AspectJ to solve this kind of code-tangling problem. It is concluded that: (1) scientific software is rarely produced in true object-oriented style; and (2) the inherent loop structure of many scientific algorithms is incompatible with the join point philosophy of AspectJ. Since AspectJ cannot intercept the iterations of for-loops (which are at the heart of high-performance computing), various object-oriented models are proposed for describing (embarrassingly parallel) rectangular double-nested for-loops that make it possible to use AspectJ for encapsulating parallelisation in an aspect. Finally, a test case using these models is presented, together with performance results obtained on various Java Virtual Machines.
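The loop-object idea can be sketched in Python standing in for Java/AspectJ: a bare nested for-loop offers no join point, but wrapping it in an object with an overridable method gives an "aspect" something to intercept. The class and function names below are hypothetical illustrations, not the paper's models.

```python
from concurrent.futures import ThreadPoolExecutor

class Loop2D:
    """Embarrassingly parallel rectangular double-nested loop as an object."""

    def __init__(self, rows, cols, body):
        self.rows, self.cols, self.body = rows, cols, body

    def run(self):                      # the "join point" an aspect can advise
        for i in range(self.rows):
            self.run_row(i)

    def run_row(self, i):               # one independent unit of work
        for j in range(self.cols):
            self.body(i, j)

def parallelise(loop):
    """The 'aspect': swap the sequential driver for a thread-pool driver,
    leaving the loop body (the mathematical model) untouched."""
    def parallel_run():
        with ThreadPoolExecutor() as pool:
            list(pool.map(loop.run_row, range(loop.rows)))
    loop.run = parallel_run
    return loop
```

Because rows are independent, the parallelised loop produces the same result as the sequential one; the parallelism concern lives entirely in `parallelise`.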
Parallelization of the Vehicle Routing Problem with Time Windows
, 2001
Abstract
Cited by 24 (1 self)
Routing with time windows (VRPTW) has been an area of research that has attracted many researchers within the last 10–15 years. In this period a number of papers and technical reports have been published on the exact solution of the VRPTW.
The VRPTW is a generalization of the well-known capacitated routing problem (VRP or CVRP). In the VRP a fleet of vehicles must visit (service) a number of customers. All vehicles start and end at the depot. For each pair of customers, or customer and depot, there is a cost. The cost denotes how much it costs a vehicle to drive from one customer to another. Every customer must be visited exactly once. Additionally, each customer demands a certain quantity of goods delivered (known as the customer demand). For the vehicles we have an upper limit on the amount of goods that can be carried (known as the capacity). In the most basic case all vehicles are of the same type and hence have the same capacity. The problem is now, for a given scenario, to plan routes for the vehicles in accordance with the mentioned constraints such that the cost accumulated on the routes, the fixed costs (how much it costs to maintain a vehicle), or a combination thereof is minimized.
In the more general VRPTW each customer has a time window, and between all pairs of customers, or a customer and the depot, we have a travel time. The vehicles now have to comply with the additional constraint that servicing of the customers can only be started within the time windows of the customers. It is legal to arrive before a time window "opens", but the vehicle must wait, and service will not start until the time window of the customer actually opens.
For solving the problem exactly, four general types of solution methods have evolved in the literature: dynamic programming, Dantzig-Wolfe decomposition (column generation), Lagrange decomposition, and solving the classical model formulation directly. Presently the algorithms that use Dantzig-Wolfe give the best results (Desrochers, Desrosiers and Solomon, and Kohl), but the Ph.D. thesis of Kontoravdis shows promising results for using the classical model formulation directly.
In this Ph.D. project we have used the Dantzig-Wolfe method. In the Dantzig-Wolfe method the problem is split into two problems: a "master problem" and a "subproblem". The master problem is a relaxed set partitioning problem that guarantees that each customer is visited exactly once, while the subproblem is a shortest path problem with additional constraints (capacity and time windows). Using the master problem the reduced costs are computed for each arc, and these costs are then used in the subproblem in order to generate routes from the depot and back to the depot again. The best (improving) routes are then returned to the master problem and entered into the relaxed set partitioning problem. As the set partitioning problem is relaxed by removing the integer constraints, the solution is seldom integral; therefore the Dantzig-Wolfe method is embedded in a separation-based solution technique.
In this Ph.D. project we have been trying to exploit structural properties in order to speed up execution times, and we have been using parallel computers to be able to solve problems faster or solve larger problems.
The thesis starts with a review of previous work within the field of VRPTW, both with respect to heuristic solution methods and exact (optimal) methods. Through a series of experimental tests we seek to define and examine a number of structural characteristics.
The first series of tests examines the use of dividing time windows as the branching principle in the separation-based solution technique. Instead of using the methods previously described in the literature for dividing a problem into smaller problems, we use a method developed for a variant of the VRPTW. The results are unfortunately not positive.
Instead of dividing a problem into two smaller problems and trying to solve these, we can try to get an integer solution without having to branch. A cut is an inequality that separates the (non-integral) optimal solution from all the integer solutions. By finding and inserting cuts we can try to avoid branching. For the VRPTW, Kohl has developed the 2-path cuts. In the separation algorithm for detecting 2-path cuts a number of tests are made. By structuring the order in which we try to generate cuts we achieved very positive results.
In the Dantzig-Wolfe process a large number of columns may be generated, but a significant fraction of the columns introduced will not be interesting with respect to the master problem. It is a priori not possible to determine which columns are attractive and which are not, but if a column does not become part of the basis of the relaxed set partitioning problem we consider it to be of no benefit for the solution process. These columns are subsequently removed from the master problem. Experiments demonstrate a significant cut of the running time.
Positive results were also achieved by stopping the route-generation process prematurely in the case of time-consuming shortest path computations. Often this leads to stopping the shortest path subroutine in cases where the information (from the dual variables) leads to "bad" routes. The premature exit from the shortest path subroutine restricts the generation of "bad" routes significantly. This produces very good results and has made it possible to solve problem instances not solved to optimality before.
The parallel algorithm is based upon the sequential Dantzig-Wolfe based algorithm developed earlier in the project. In an initial (sequential) phase unsolved problems are generated, and when there are enough unsolved problems to start work on every processor the parallel solution phase is initiated. In the parallel phase each processor runs the sequential algorithm. To get a good workload, a strategy based on balancing the load between neighbouring processors is implemented. The resulting algorithm is efficient and capable of attaining good speedup values. The load-balancing strategy shows an even distribution of work among the processors. Due to the large demand for using the IBM SP2 parallel computer at UNIC it has unfortunately not been possible to run as many tests as we would have liked. We have, however, managed to solve one problem not solved before using our parallel algorithm.
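The Dantzig-Wolfe loop described above (restricted master, dual prices, subproblem generating improving columns) can be sketched in Python on the classic cutting-stock problem, where the pricing subproblem is an unbounded knapsack rather than a constrained shortest path. This is an illustration of column generation in general, not the thesis's VRPTW algorithm; the instance and tolerances are arbitrary.

```python
import numpy as np
from scipy.optimize import linprog

def solve_master(patterns, demand):
    """Relaxed restricted master: minimise rolls s.t. coverage >= demand."""
    A = np.array(patterns, dtype=float).T          # item x column matrix
    res = linprog(c=np.ones(A.shape[1]), A_ub=-A,
                  b_ub=-np.array(demand, dtype=float), method="highs")
    duals = -res.ineqlin.marginals                 # dual price per item
    return res.fun, duals

def best_pattern(sizes, duals, roll_len):
    """Pricing subproblem: unbounded knapsack maximising total dual value."""
    best = [0.0] * (roll_len + 1)
    pick = [None] * (roll_len + 1)
    for cap in range(1, roll_len + 1):
        for i, s in enumerate(sizes):
            if s <= cap and best[cap - s] + duals[i] > best[cap]:
                best[cap], pick[cap] = best[cap - s] + duals[i], i
    pattern, cap = [0] * len(sizes), roll_len      # walk back the choices
    while pick[cap] is not None:
        pattern[pick[cap]] += 1
        cap -= sizes[pick[cap]]
    return best[roll_len], pattern

def column_generation(sizes, demand, roll_len):
    patterns = [[1 if i == j else 0 for i in range(len(sizes))]
                for j in range(len(sizes))]        # trivial one-item columns
    while True:
        obj, duals = solve_master(patterns, demand)
        value, pattern = best_pattern(sizes, duals, roll_len)
        if value <= 1.0 + 1e-9:                    # no improving column left
            return obj, patterns
        patterns.append(pattern)                   # enter the improving column
```

On a toy instance (roll length 10, two of size 6 and two of size 4), pricing discovers the combined pattern and the relaxed master drops from four rolls to two.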
Parallel randomized state-space search
 in International Conference on Software Engineering
Abstract
Cited by 24 (0 self)
Model checkers search the space of possible program behaviors to detect errors and to demonstrate their absence. Despite major advances in reduction and optimization techniques, state-space search can still become cost-prohibitive as program size and complexity increase. In this paper, we present a technique for dramatically improving the cost-effectiveness of state-space search techniques for error detection using parallelism. Our approach can be composed with all of the reduction and optimization techniques we are aware of to amplify their benefits. It was developed based on insights gained from performing a large empirical study of the cost-effectiveness of randomization techniques in state-space analysis. We explain those insights and our technique, and then show through a focused empirical study that our technique speeds up analysis by factors ranging from 2 to over 1000 as compared to traditional modes of state-space search, and does so with relatively small numbers of parallel processors.
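The strategy of running many independent randomized searches and stopping when the first one finds an error can be sketched in Python. The state space, the sequential simulation of the "parallel" workers, and the cost metric (states popped) are toy assumptions for illustration, not the paper's infrastructure.

```python
import random

def randomized_dfs(successors, start, is_error, seed):
    """One worker: DFS with randomly shuffled successor order,
    counting states explored until an error state is found."""
    rng = random.Random(seed)
    stack, seen, steps = [start], {start}, 0
    while stack:
        state = stack.pop()
        steps += 1
        if is_error(state):
            return steps
        succ = list(successors(state))
        rng.shuffle(succ)                    # randomise the search order
        for s in succ:
            if s not in seen:
                seen.add(s)
                stack.append(s)
    return None                              # no error reachable

def parallel_search(successors, start, is_error, workers=8):
    """Simulated parallel run: the first worker to hit the error wins,
    so the cost is the minimum over the independent randomized searches."""
    costs = [randomized_dfs(successors, start, is_error, seed)
             for seed in range(workers)]
    return min(c for c in costs if c is not None)
```

On a 127-state binary tree with the error at one deep leaf, the best of eight randomized orders never explores more states than a single fixed-seed search, which is the amplification effect the abstract reports.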