How to test software on the parts of Dardel that have Slingshot 11
NOTE: This information is not valid for Scania. UPDATE: Downtime for upgrades starting 27 March!
Research groups will need to use recompiled application software when running jobs on Dardel nodes with the faster Slingshot 11 interconnect and the Strawberry software stack. This is how to test and run software on those nodes.
Significant updates to Dardel are planned for the second half of March. The interconnect for the nodes in the old part of the Dardel system will be updated from Slingshot 10 to Slingshot 11, a newer and faster version of the high-speed network. New system software called Strawberry is required to use the Slingshot 11 network. Please note that the recently installed second part of the Dardel system already runs Slingshot 11 and Strawberry and hence will not be upgraded.
These changes mean that all application software on Dardel will need to be recompiled on a partition that is running Slingshot 11 and Strawberry. The most widely used general software that PDC has installed on Dardel will be recompiled and tested with Slingshot 11 and Strawberry by PDC staff. Many application packages have already been installed and tested at the time of writing. Any research groups that have installed their own application software on Dardel will need to recompile it with Slingshot 11 and Strawberry to be able to use that software after these upgrades. All research groups are advised to log in to the new Slingshot 11 partition right away to test the new environment. Groups can also start compiling their software on the new Slingshot 11 and Strawberry partition now.
This short guide describes how to log in, compile and test application software on the partition of Dardel that is using Slingshot 11 and Strawberry. This will need to be done by all research groups that usually compile and execute their own software on the system. All research groups are also advised to test any general software on Dardel that they use to ensure that both the re-installation and compilation have been done and that the software works properly when the group members use it.
- Log in to the dardel-login-2.pdc.kth.se login node in the usual way.
All the usual Dardel directories and your files should be available.
Also, all the regular Dardel allocations should exist.
If that is not the case, please contact support@pdc.kth.se.
- Select a job that your group has successfully run on the first phase of Dardel.
- Load the module PDCTEST/22.06 (ml PDCTEST/22.06) and any other Dardel modules that are required to execute the job.
If you need a particular module that is now missing on Dardel, please contact support@pdc.kth.se.
It should be possible to compile and run software in the same way as on the first phase of Dardel.
- The name of the partition for these tests is s11-tst. At present, this partition contains nodes with 256 GB and 512 GB of memory, which can be selected with the --mem flag. Submit the job to the s11-tst partition by adding -p s11-tst to the job script. The allocation (-A …) should not be changed. A sample job script is shown below.
Check if the software executes correctly.
Note the speed at which the software executes as compared to when it was run previously on the Slingshot 10 partition.
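The submission step could look like the following sketch. It writes a minimal job script, checks its syntax, and submits it only where sbatch is available. The file name s11_test.sh, the job name, the time limit, the node and task counts, the commented-out memory request and the executable ./my_app are all hypothetical placeholders rather than PDC recommendations; replace <your-allocation> with the allocation your group normally uses.

```shell
#!/bin/bash
# Sketch only: write a minimal test job script for the s11-tst partition.
# Intended to be run on dardel-login-2.pdc.kth.se. Everything marked
# "hypothetical" is an assumption and must be adapted to your own job.

cat > s11_test.sh <<'EOF'
#!/bin/bash -l
# Keep the allocation that your group normally uses.
#SBATCH -A <your-allocation>
# Test partition running Slingshot 11 and Strawberry.
#SBATCH -p s11-tst
# Hypothetical job name, time limit, node count and task count.
#SBATCH -J s11-test
#SBATCH -t 00:10:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
# Hypothetical memory request per node; remove one '#' to enable it.
##SBATCH --mem=200G

# Load the test module, plus any other modules the job needs.
ml PDCTEST/22.06

# Hypothetical executable; replace with your own recompiled program.
srun ./my_app
EOF

# Check the generated script for shell syntax errors without running it.
bash -n s11_test.sh

# Submit only where Slurm is available (for example on the login node).
if command -v sbatch >/dev/null 2>&1; then
    sbatch s11_test.sh
else
    echo "sbatch not found; run this on the Dardel login node"
fi
```

Writing the script through a quoted here-document keeps the #SBATCH lines exactly as typed, so nothing is expanded by the shell before the job is submitted.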
Please contact support@pdc.kth.se
- if any problems are encountered when compiling or running the test job,
- if the compiled software does not execute correctly, or
- if the job executes slowly compared to when it was run on the first phase of Dardel before these upgrades.
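To judge whether a job runs slowly, it can help to put the two run times side by side. The sketch below is one possible way to do that, assuming the elapsed times are taken from Slurm accounting (for example with sacct -j <jobid> --format=JobID,Partition,Elapsed,State, if accounting data is available to you). The helper name elapsed_to_seconds and the two sample times are hypothetical.

```shell
#!/bin/bash
# Sketch only: compare the elapsed time of the same job on the old
# Slingshot 10 partition and on the s11-tst partition.

# Convert an elapsed time of the form [D-]HH:MM:SS (or MM:SS) to seconds.
elapsed_to_seconds() {
    echo "$1" | awk -F'[-:]' '{
        # Weights for seconds, minutes, hours and days, counted from the right.
        split("1 60 3600 86400", w, " ")
        s = 0
        for (i = 0; i < NF; i++) s += $(NF - i) * w[i + 1]
        print s
    }'
}

# Hypothetical elapsed times; replace with the values from your own jobs.
old=$(elapsed_to_seconds "00:12:30")   # run on the Slingshot 10 partition
new=$(elapsed_to_seconds "00:10:00")   # run on the s11-tst partition

# A ratio above 1 means the Slingshot 11 run was faster.
awk -v o="$old" -v n="$new" 'BEGIN { printf "old/new time ratio: %.2f\n", o / n }'
# prints "old/new time ratio: 1.25"
```

A ratio clearly below 1 for the same input and the same number of nodes would be worth including when reporting the problem to support@pdc.kth.se.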