Page Deprecated
The information on this page may be out of date and no longer accurate.
Slurm Native Profiling
Job resource usage can be determined after the job completes by checking the following sacct columns:
- MaxRSS - peak memory usage.
- TotalCPU - total CPU time used; compare it against Elapsed x AllocCPUS to estimate CPU efficiency.
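As an illustration of the Elapsed x AllocCPUS ≈ TotalCPU check, the sketch below converts Slurm time strings to seconds and reports the ratio as a percentage. Both helper names are invented for this example; in practice you would paste the values from sacct output.

```shell
# Hypothetical helpers for checking CPU efficiency from sacct columns.
to_seconds() {
    # Convert a Slurm time string ([DD-]HH:MM:SS or MM:SS) to seconds.
    echo "$1" | awk -F'[-:]' '{
        if (NF == 4)      print $1*86400 + $2*3600 + $3*60 + $4
        else if (NF == 3) print $1*3600 + $2*60 + $3
        else              print $1*60 + $2
    }'
}

cpu_efficiency() {
    # usage: cpu_efficiency <Elapsed> <AllocCPUS> <TotalCPU>
    local elapsed total
    elapsed=$(to_seconds "$1")
    total=$(to_seconds "$3")
    awk -v e="$elapsed" -v n="$2" -v t="$total" \
        'BEGIN { printf "%.0f%%\n", 100 * t / (e * n) }'
}

# 2 CPU-hours of work on 4 allocated CPUs over 1 elapsed hour:
cpu_efficiency 01:00:00 4 02:00:00   # prints 50%
```

An efficiency well below 100% suggests the job could run with fewer CPUs, or that it spends much of its time waiting rather than computing.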
However, if you want to examine resource usage over the run-time of your job, add the line #SBATCH --profile=task to your script. That will cause profile data to be recorded every 30 seconds throughout the job. For jobs that take much less or much more than a day to run, we recommend adjusting the sampling interval accordingly: for example, when profiling a job of less than an hour it is fine to sample every second by adding #SBATCH --acctg-freq=1, while for a week-long job the rate should be reduced to once every 5 minutes with #SBATCH --acctg-freq=300.
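A minimal batch script using these directives might look like the sketch below; the job name, resource requests, and executable are placeholders, not a recommended configuration.

```shell
#!/bin/bash -e
#SBATCH --job-name=profile_demo   # hypothetical job name
#SBATCH --time=00:30:00           # well under an hour, so frequent sampling is fine
#SBATCH --cpus-per-task=2
#SBATCH --mem=2G
#SBATCH --profile=task            # record task-level profile data
#SBATCH --acctg-freq=10           # sample every 10 seconds instead of the 30 s default

# Launch the work via srun so the profiling plugin can observe it
# (and, on GPU nodes, record GPU usage as well).
srun ./my_program                 # hypothetical executable
```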
On completion of your job, collate the data into an HDF5 file using sh5util -j <jobid>. This collects the results from the nodes where your job ran and writes them into an HDF5 file named job_<jobid>.h5. You can plot the contents of this file with the command nn_profile_plot job_<jobid>.h5, which will generate a file named job_<jobid>_profile.png.
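The two post-processing steps together look like the following, using the made-up job ID 1234567 as a placeholder:

```shell
# After the job finishes, collate per-node profile data into one HDF5 file.
sh5util -j 1234567                # writes job_1234567.h5 in the current directory

# Plot the recorded traces from the profile.
nn_profile_plot job_1234567.h5    # writes job_1234567_profile.png
```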
Alternatively you could use one of the following scripts.
Any GPU usage will also be recorded in the profile, so long as the process was executed via srun.