# Known Issues HPC3
Below is a list of issues that we are actively working on. We hope to have these resolved soon. This is intended to be a temporary page.
For differences between the new platforms and Mahuika, see the more permanent page on differences from Mahuika.
## Access

### OnDemand Apps
- The Firefox browser fails to render the HPC Shell Access app correctly. Please switch to Chrome or Safari until the vendor provides a fix.
- The resources dedicated to interactive work via a web browser are smaller, so computations requiring large amounts of memory or many CPU cores are not yet supported.
- Slurm `sbatch` jobs can be submitted directly from your apps, such as the terminal in JupyterLab, RStudio, or code-server. However, interactive jobs (`srun` or `salloc`) can only be run from the *Clusters > NeSI HPC Shell Access* dropdown menu, which opens a standard terminal window in the browser. Watch a demo here.
- Missing user namespaces in Kubernetes pods will interfere with most Apptainer operations. The `apptainer pull` command can be run, but the `apptainer exec`, `run`, and `shell` commands cannot be executed.
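Batch work can still be submitted from any app's terminal. A minimal sketch of such a submission script (the job name, resource values, and filename below are placeholders, not recommendations):

```bash
#!/bin/bash -e
#SBATCH --job-name=example    # placeholder name
#SBATCH --time=00:10:00       # adjust to your workload
#SBATCH --mem=1G
#SBATCH --cpus-per-task=1

# Replace with your actual commands
echo "Running on $(hostname)"
```

Saved as, e.g., `example.sl`, this can be submitted with `sbatch example.sl` from the JupyterLab terminal; only `srun` and `salloc` require the HPC Shell Access app.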
## Software
- FileSender - If you modify the `default_transfer_days_valid` parameter in your `~/.filesender/filesender.py.ini` to a value greater than 20, transfers will fail with a 500 error code. Please do not modify this parameter.
- Legacy Code - As was already the case on Mahuika's Milan nodes (which ran a Rocky 8 OS), some of our environment modules cause system software to stop working; e.g. after `module load Perl`, `svn` stops working. This is usually the case when the module loads `LegacySystem/7` as a dependency. The solutions are to ask us to rebuild the problem environment module, or simply not to have it loaded while doing other things.
- MPI software using 2020 or earlier toolchains, e.g. intel-2020a, may not work correctly across nodes. Trying a more recent toolchain, e.g. intel-2022a, is recommended.
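For reference, the problematic FileSender setting would look like this in `~/.filesender/filesender.py.ini` (a sketch; any value above 20 triggers the 500 error, so the parameter should simply be left alone):

```ini
; ~/.filesender/filesender.py.ini
; values above 20 currently cause transfers to fail with a 500 error
default_transfer_days_valid = 30
```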
Please let us know if you find any additional problems.
## Slurm

### GPUs
If you request a GPU without specifying which type of GPU, you will get an arbitrary one, so please always specify a GPU type.
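For example, an explicit request for one GPU of a named type, as a sketch (`A100` is an illustrative type; substitute the type you actually need):

```bash
#SBATCH --gpus-per-node=A100:1   # name the type, rather than plain --gpus-per-node=1
```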
### BadConstraints
This uninformative message can appear as the reason for a pending job in `squeue` output when the job was submitted to both the milan and genoa partitions (which is the default behaviour). It does not appear to reflect a real problem, though, just a side-effect of the mechanism we use to target jobs to the right-sized node(s).
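Since submitting to both partitions is the default, the message can be avoided by targeting one partition explicitly, as a sketch (the message itself is harmless, so whether this is worthwhile depends on your job):

```bash
#SBATCH --partition=genoa   # or milan; omitting this submits to both
```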