
Useful Slurm Commands

About Slurm

The ML Cloud can be accessed through a dedicated set of login nodes used to write and compile applications as well as to perform pre- and post-processing of simulation data. Access to the compute nodes in the system is controlled by the workload manager.

On the ML Cloud the Slurm (Simple Linux Utility for Resource Management) Workload Manager, a free open-source resource manager and batch system, is employed. Slurm is a modern, extensible batch system that is widely deployed around the world on clusters of various sizes.

This page describes how to run jobs and what to consider when choosing Slurm parameters. You submit a job together with its resource request, Slurm allocates resources and runs the job, and you receive the results back. Interactive modes are also available. Slurm:

  • allocates exclusive or non-exclusive access to resources (compute nodes) to users for a limited amount of time so that they can perform their work
  • provides a framework for starting, executing, and monitoring work
  • arbitrates contention for resources by managing a queue of pending work
  • schedules jobs for users on the cluster resources

Slurm Partitions

In Slurm, multiple nodes can be grouped into partitions: sets of nodes aggregated by shared characteristics or objectives, with associated limits for wall-clock time, job size, etc. These limits are hard limits for the jobs and cannot be overruled. In practice, partitions let the user signal a need for resources with certain hardware characteristics (CPU-only, large memory, GPU type) or resources dedicated to specific workloads (large production jobs, small debugging jobs, interactive use, etc.). The ML Cloud has implemented several production queues.

See Galvani's Partitions

Preemptable partitions

These are cheaper partitions: jobs cost only 25% of what they would in their non-preemptable counterparts. In exchange, your job may be canceled or requeued if a job in a non-preemptable partition requires the resources. Preempted jobs are requeued by default; when a job is requeued, its batch script starts again from the beginning. If you do not want this behavior, set the --no-requeue sbatch option in your job submission scripts; your job will then simply be cancelled instead of restarted.
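As a minimal sketch, a submission script for a preemptable partition might look like the following. The partition name and the workload command are placeholders; check the Galvani partition list (or `sinfo`) for the real partition names on the system.

```shell
#!/bin/bash
#SBATCH --job-name=preempt-test
#SBATCH --partition=<preemptable-partition>  # placeholder: pick a real preemptable partition
#SBATCH --no-requeue                         # if preempted, cancel instead of restarting the script
#SBATCH --time=0-01:00
#SBATCH --gres=gpu:1

srun ./my_workload.sh                        # placeholder for your actual workload
```

Without `--no-requeue`, a preempted job would be resubmitted and its batch script would run again from the top, so make sure your workload either tolerates a restart (e.g. via checkpointing) or sets this option.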

Job submission options

Slurm sets several useful environment variables within an allocated job. The table below summarizes the main job submission options offered by {sbatch | srun | salloc}.

You can pass options using either the command line or job script; most users find that the job script is the easier approach. Slurm directives begin with #SBATCH; most have a short form (e.g. -N) and a long form (e.g. --nodes).

Option Argument Comments
--partition queue_name Submit to the queue (partition) designated by queue_name
--nodes nodes Number of nodes you need
--ntasks tasks Number of tasks
--gres gpu:N Request GPU type and number of resources
--job-name job_name Give your job a name so you can recognize it in the queue overview
--output output_file Direct job standard output to output_file (without the -e option, error output also goes to this file). Make sure this is not on $HOME
--error error_file Direct job error output to error_file. Make sure this is not on $HOME
--time D-HH:MM Wall-clock time limit for the job
--mem #G Memory pool for all cores (see also --mem-per-cpu)
--mail-type BEGIN,END,FAIL,ALL Specify which events trigger email notifications (comma-separated list)
--mail-user user@uni-tuebingen.de Email address to which notifications are sent

By default, Slurm writes all console output to a file named slurm-%j.out, where %j is the numerical job ID. To specify a different filename use the -o option. To save stdout (standard out) and stderr (standard error) to separate files, specify both -o and -e.
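Putting the options above together, a minimal batch script might look like the sketch below. The output paths and the `hostname` workload are illustrative; replace them with a directory on a suitable filesystem (not $HOME) and your actual command.

```shell
#!/bin/bash
#SBATCH --job-name=myjob
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=0-00:10
#SBATCH --mem=4G
#SBATCH --output=/path/to/workdir/myjob-%j.out  # %j expands to the job ID; keep this off $HOME
#SBATCH --error=/path/to/workdir/myjob-%j.err   # separate file for stderr

srun hostname  # placeholder workload: prints the node(s) the job ran on
```

Submit it with `sbatch myjob.sh`; sbatch returns immediately with the assigned job ID.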

Common SLURM commands

Command Function
sbatch Submit a batch job script. The command exits immediately when the script is transferred to the Slurm controller daemon and assigned a Slurm job ID.
sacct Used to query past jobs.
squeue Print table of submitted jobs and their state.
sinfo Provide overview of cluster status.
scancel Cancel a job prior to its completion.
seff jobid Reports the computational efficiency of your calculations.
sacctmgr show associations user=username Find out which account(s) your username is associated with.
scontrol show partitions Detailed information about all available partitions and their definitions/limits.
sprio -w Show the weights used for job prioritization.
sshare Show how many shares your group has as well as your fairshare value.
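For example, the accounting-related commands above can be invoked as follows (the job ID is illustrative):

```shell
# Report the computational efficiency of a finished job
seff 15370

# List the account(s) your username is associated with
sacctmgr show associations user=$USER

# Show your group's shares and your fairshare value
sshare
```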

Monitor Your Job

Once submitted, the job will be queued for some time, depending on how many jobs are currently in the queue. Eventually, once enough previously submitted jobs have completed, it will be started on one or more nodes determined by its resource requirements. The status of the job can be queried with the squeue command.

Option Description
-a Display information for all jobs.
-j jobid Display information for the specified job ID.
-j jobid -o %all Display all information fields (with a vertical bar separating each field) for the specified job ID.
-l Display information in long format.
-n job_name Display information for the specified job name.
-t state_list Display jobs that have the specified state(s). Valid jobs states include PENDING, RUNNING, SUSPENDED, COMPLETED, CANCELLED, FAILED, TIMEOUT, NODE_FAIL, PREEMPTED, BOOT_FAIL, DEADLINE, OUT_OF_MEMORY, COMPLETING, CONFIGURING, RESIZING, REVOKED, and SPECIAL_EXIT.
-u username Display jobs owned by the specified user.


For example, to see pending jobs for a particular user:

squeue -u mfa608 -t PENDING

You can use sacct to get details of a previously run job:

sacct -j 15370

or

sacct -j 15370 --format JobID,JobName,Partition,Account,AllocCPUS,State,ExitCode,NodeList

Cluster status

The command sinfo provides status information for the nodes in the various partitions (including the associated time limit for each partition). The default partition is marked with an "*". This information can be useful when deciding where to submit your job. Status codes (abbreviated form) are explained below:
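A few common sinfo invocations, as a sketch (the partition name is a placeholder):

```shell
# Overview of all partitions; the default partition is marked with "*"
sinfo

# One line per node, long format, for a closer look at node states
sinfo -N -l

# Restrict the output to a single partition
sinfo -p <partition_name>
```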

Status Code Description
alloc The node has been allocated to one or more jobs.
mix The node has some of its CPUs ALLOCATED while others are IDLE.
idle The node is not allocated to any jobs and is available for use.
down The node is down and unavailable for use.
drain The node is unavailable for use by request of a system administrator (e.g. for maintenance).
drng The node is being drained but is still running a user job. The node will be marked as drained right after the user job is finished. Do not worry if you have a job running on a node with this state.

Last update: September 9, 2024
Created: September 9, 2024