Results 1–10 of 11
CSDP, a C library for semidefinite programming.
, 1997
"... this paper is organized as follows. First, we discuss the formulation of the semidefinite programming problem used by CSDP. We then describe the predictor corrector algorithm used by CSDP to solve the SDP. We discuss the storage requirements of the algorithm as well as its computational complexity. ..."
Abstract

Cited by 212 (1 self)
 Add to MetaCart
this paper is organized as follows. First, we discuss the formulation of the semidefinite programming problem used by CSDP. We then describe the predictor-corrector algorithm used by CSDP to solve the SDP. We discuss the storage requirements of the algorithm as well as its computational complexity. Finally, we present results from the solution of a number of test problems. Section 2, "The SDP Problem", then considers semidefinite programming problems of the form max tr(CX).
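The snippet breaks off at the objective. For context, the standard primal form targeted by CSDP-style solvers (a generic statement of the problem class, not quoted from the paper) can be written as:

```latex
\begin{aligned}
\max_{X} \quad & \operatorname{tr}(CX) \\
\text{s.t.} \quad & \operatorname{tr}(A_i X) = a_i, \qquad i = 1, \dots, m, \\
& X \succeq 0,
\end{aligned}
```

where $X \succeq 0$ means $X$ is symmetric positive semidefinite, and $C, A_1, \dots, A_m$ are given symmetric matrices.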
Parallel ScaLAPACK-style Algorithms for Solving Continuous-Time Sylvester Equations
 In Euro-Par 2003 Parallel Processing, H. Kosch et al., Eds., Lecture Notes in Computer Science
, 2003
"... Abstract. An implementation of a parallel ScaLAPACKstyle solver for the general Sylvester equation, op(A)X − Xop(B) = C, where op(A) denotes A or its transpose A T, is presented. The parallel algorithm is based on explicit blocking of the BartelsStewart method. An initial transformation of the co ..."
Abstract

Cited by 8 (7 self)
 Add to MetaCart
(Show Context)
Abstract. An implementation of a parallel ScaLAPACK-style solver for the general Sylvester equation, op(A)X − X op(B) = C, where op(A) denotes A or its transpose A^T, is presented. The parallel algorithm is based on explicit blocking of the Bartels–Stewart method. An initial transformation of the coefficient matrices A and B to Schur form leads to a reduced triangular matrix equation. We use different matrix traversing strategies to handle the transposes in the problem to solve, leading to different new parallel wavefront algorithms. We also present a strategy to handle the problem when 2×2 diagonal blocks of the matrices in Schur form, corresponding to complex conjugate pairs of eigenvalues, are split between several blocks in the block-partitioned matrices. Finally, the solution of the reduced matrix equation is transformed back to the original coordinate system. The implementation acts in a ScaLAPACK environment using two-dimensional block-cyclic mapping of the matrices onto a rectangular grid of processes. Real performance results are presented which verify that our parallel algorithms are reliable and scalable. Keywords: Sylvester matrix equation, continuous-time, Bartels–Stewart
A Web Computing Environment for the SLICOT Library
, 2001
"... A prototype web computing environment for computations related to the design and analysis of control systems using the SLICOT software library is presented. The web interface can be accessed from a standard world wide web browser with no need for additional software installations on the local machin ..."
Abstract

Cited by 6 (4 self)
 Add to MetaCart
A prototype web computing environment for computations related to the design and analysis of control systems using the SLICOT software library is presented. The web interface can be accessed from a standard World Wide Web browser with no need for additional software installations on the local machine. The environment provides user-friendly access to SLICOT routines where runtime options are specified by mouse clicks on appropriate buttons. Input data can be entered directly into the web interface by the user or uploaded from a local computer in a standard text format or in Matlab binary format. Output data is presented in the web browser window and can be downloaded in a number of different formats, including Matlab binary. The environment is ideal for testing the SLICOT software before performing a software installation or for performing a limited number of computations. It is also highly recommended for education as it is easy to use, and basically self-explanatory, with the users' guide integrated in the user interface.
Combining Explicit and Recursive Blocking for Solving Triangular Sylvester-Type Matrix Equations on Distributed Memory Platforms
 In M. Danelutto, D. Laforenza, M. Vanneschi (Eds.): Euro-Par 2004, Lecture Notes in Computer Science
, 2004
"... Abstract. Parallel ScaLAPACKstyle hybrid algorithms for solving the triangular continuoustime Sylvester (SYCT) equation AX − XB = C using recursive blocked node solvers from the novel highperformance library RECSY are presented. We compare our new hybrid algorithms with parallel implementations b ..."
Abstract

Cited by 4 (2 self)
 Add to MetaCart
(Show Context)
Abstract. Parallel ScaLAPACK-style hybrid algorithms for solving the triangular continuous-time Sylvester (SYCT) equation AX − XB = C using recursive blocked node solvers from the novel high-performance library RECSY are presented. We compare our new hybrid algorithms with parallel implementations based on the SYCT solver DTRSYL from LAPACK. Experiments show that the RECSY solvers can significantly improve on the serial as well as on the parallel performance if the problem data is partitioned and distributed in an appropriate way. Examples include cutting down the execution time by 47% and 34% when solving large-scale problems using two different communication schemes in the parallel algorithm and distributing the matrices with blocking factors four times larger than normally. The recursive blocking is automatic for solving subsystems of the global explicit blocked algorithm on the nodes. Keywords: Sylvester matrix equation, continuous-time, Bartels–Stewart
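The triangular (reduced) SYCT equation that these node solvers handle can be solved by a column-wise substitution, which is the serial kernel that explicit and recursive blocking reorganize for locality and parallelism. A minimal sketch for real upper-triangular A and B (ignoring the 2×2 diagonal bumps of real Schur form, which DTRSYL and RECSY do handle; the function name is illustrative):

```python
import numpy as np

def solve_triangular_sylvester(A, B, C):
    """Solve A X - X B = C for upper-triangular A (m x m) and B (n x n)
    by forward substitution over the columns of X.

    Reading column k of the equation with B upper triangular gives
        (A - B[k, k] I) X[:, k] = C[:, k] + sum_{j < k} X[:, j] B[j, k],
    so columns can be computed left to right. Requires
    A[i, i] != B[k, k] for all i, k (disjoint spectra).
    """
    m, n = C.shape
    X = np.zeros_like(C, dtype=float)
    for k in range(n):
        # move the already-computed columns to the right-hand side
        rhs = C[:, k] + X[:, :k] @ B[:k, k]
        # (A - B[k, k] I) is upper triangular; a tuned solver would
        # exploit that instead of a general solve
        X[:, k] = np.linalg.solve(A - B[k, k] * np.eye(m), rhs)
    return X
```

Blocked algorithms apply the same recurrence to matrix blocks, so each update becomes a GEMM and each diagonal step a small Sylvester solve, which is what RECSY's recursion exploits.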
Towards an Accurate Performance Modeling of Parallel Sparse LU Factorization
 in Applicable Algebra in Engineering, Communication, and Computing
, 2006
"... We present a simulationbased performance model to analyze a parallel sparse LU factorization algorithm on modern cachedbased, highend parallel architectures. We consider supernodal rightlooking parallel factorization on a bidimensional grid of processors, that uses static pivoting. Our model ch ..."
Abstract

Cited by 4 (1 self)
 Add to MetaCart
(Show Context)
We present a simulation-based performance model to analyze a parallel sparse LU factorization algorithm on modern cache-based, high-end parallel architectures. We consider supernodal right-looking parallel factorization on a two-dimensional grid of processors that uses static pivoting. Our model characterizes the algorithmic behavior by taking into account the underlying processor speed, memory system performance, as well as the interconnect speed. The model is validated using the implementation in the SuperLU_DIST linear system solver, sparse matrices from real applications, and an IBM POWER3 parallel machine. Our modeling methodology can be adapted to study performance of other types of sparse factorizations, such as Cholesky or QR, and on different parallel machines.
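Performance models of this kind typically decompose the time of each factorization step into additive terms for computation, memory traffic, and communication. A generic sketch of such a term (illustrative only; the parameter names are hypothetical and this is not the paper's actual model):

```python
def step_time(flops, mem_words, msgs, net_words,
              flop_rate, mem_bw, latency, net_bw):
    """Generic additive cost model for one step of a parallel algorithm:
    compute + memory traffic + message startups + network volume.
    All rates in units consistent with the work terms (e.g. flop/s,
    words/s, seconds per message)."""
    return (flops / flop_rate        # arithmetic on one processor
            + mem_words / mem_bw     # words moved through the memory system
            + msgs * latency         # per-message startup cost
            + net_words / net_bw)    # words sent over the interconnect
```

A simulation-based model like the paper's evaluates such terms per panel/update step, using measured machine parameters, and sums them along the critical path.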
Design and Evaluation of a Top100 Linux Super Cluster System
 EXPER
, 2003
"... The HPC2N Super Cluster is a truly selfmade highperformance Linux cluster with 240 AMD processors in 120 dual nodes, interconnected with a highbandwidth, lowlatency SCI network. This contribution describes the hardware selected for the system, the work needed to build it, important software ..."
Abstract

Cited by 3 (2 self)
 Add to MetaCart
The HPC2N Super Cluster is a truly self-made high-performance Linux cluster with 240 AMD processors in 120 dual nodes, interconnected with a high-bandwidth, low-latency SCI network. This contribution describes the hardware selected for the system, the work needed to build it, important software issues, and an extensive performance analysis. The performance is evaluated using a number of state-of-the-art benchmarks and software, including STREAM, Pallas MPI, the Atlas DGEMM, High Performance Linpack, and NAS Parallel benchmarks. Using these benchmarks we first determine the raw memory bandwidth and network characteristics; the practical peak performance of a single CPU, a single dual node, and the complete 240-processor system; and investigate the parallel performance for non-optimized dusty-deck Fortran applications. In summary, this $500K system is extremely cost-effective and shows the performance one would expect of a large-scale supercomputing system with distributed memory architecture. According to the TOP500 list of June 2002, this cluster was the 94th fastest computer in the world. It is now fully operational and stable as the main computing facility at HPC2N.
Parallel triangular Sylvester-type matrix equation solvers for SMP systems using recursive blocking
 in Applied Parallel Computing. New Paradigms for HPC in Industry and Academia
, 2001
"... ..."
(Show Context)
SUMMARY
"... The High Performance Computing Center North (HPC2N) Super Cluster is a truly selfmade highperformance Linux cluster with 240 AMD processors in 120 dual nodes, interconnected with a highbandwidth, lowlatency SCI network. This contribution describes the hardware selected for the system, the work n ..."
Abstract
 Add to MetaCart
(Show Context)
The High Performance Computing Center North (HPC2N) Super Cluster is a truly self-made high-performance Linux cluster with 240 AMD processors in 120 dual nodes, interconnected with a high-bandwidth, low-latency SCI network. This contribution describes the hardware selected for the system, the work needed to build it, important software issues and an extensive performance analysis. The performance is evaluated using a number of state-of-the-art benchmarks and software, including STREAM, Pallas MPI, the Atlas DGEMM, High-Performance Linpack and NAS Parallel benchmarks. Using these benchmarks we first determine the raw memory bandwidth and network characteristics; the practical peak performance of a single CPU, a single dual node and the complete 240-processor system; and investigate the parallel performance for non-optimized dusty-deck Fortran applications. In summary, this $500 000 system is extremely cost-effective and shows the performance one would expect of a large-scale supercomputing system with distributed memory architecture. According to the TOP500 list of June 2002, this cluster was the 94th fastest computer in the world. It is now fully operational and stable as the main computing facility at HPC2N. The system's utilization figures exceed 90%, i.e. all 240 processors are on average utilized over 90% of the time, 24 hours a day, seven days a week.