Useful Slurm Commands

About Slurm

The ML Cloud can be accessed through a dedicated set of login nodes used to write and compile applications as well as to perform pre- and post-processing of simulation data. Access to the compute nodes in the system is controlled by the workload manager.

The ML Cloud uses the Slurm (Simple Linux Utility for Resource Management) Workload Manager, a free, open-source resource manager and batch system. Slurm is modern, extensible, and widely deployed around the world on clusters of various sizes.

This page describes how you can run jobs and what to consider when choosing Slurm parameters. You submit a job together with its resource request, Slurm allocates the resources and runs the job, and you receive the results back. Interactive modes are also available. Slurm has the following roles:

  • It allocates exclusive or non-exclusive access to resources (compute nodes) to users for a limited amount of time so that they can perform their work.
  • It provides a framework for starting, executing and monitoring work.
  • It arbitrates contention for resources by managing a queue of pending work.
  • It lets users schedule jobs on the cluster's resources.

Slurm Partitions

In Slurm, multiple nodes can be grouped into partitions, which are sets of nodes aggregated by shared characteristics or objectives, with associated limits for wall-clock time, job size, etc. These are hard limits for jobs and cannot be overridden. In practice, you choose a partition to signal a need for resources that have certain hardware characteristics (CPU-only, large memory, GPU type) or that are dedicated to specific workloads (large production jobs, small debugging jobs, interactive use, etc.). The ML Cloud has implemented several production queues.

See Galvani's Partitions

Preemptable partitions

These are low-cost partitions: a job here costs only 25% of what it would cost in the non-preemptable counterpart. In return, your job may be cancelled or requeued if a job in a non-preemptable partition requires the resources. Preempted jobs are requeued by default. When a job is requeued, its batch script is started again from the beginning. If you do not want this, set the --no-requeue sbatch option in your job submission script; a preempted job is then simply cancelled.
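
Because a requeued job restarts its batch script from the beginning, it can help to make the script resumable. The following is a minimal sketch, assuming your application records its progress in a checkpoint file; the file name checkpoint.txt, the CHECKPOINT_FILE variable and the helper resume_step are hypothetical and not part of Slurm. SLURM_RESTART_COUNT is set by Slurm for jobs that have been requeued or restarted.

```shell
#!/bin/bash
#SBATCH --job-name=preemptable-example
# To have the job cancelled instead of requeued, add the --no-requeue option.

# Prints the step to resume from: the saved step if the checkpoint file
# exists, otherwise 0. (Hypothetical helper, for illustration only.)
resume_step() {
    local checkpoint_file="$1"
    if [ -f "$checkpoint_file" ]; then
        cat "$checkpoint_file"
    else
        echo 0
    fi
}

# Falls back to 0 when the job has not been requeued or runs outside Slurm.
restart_count="${SLURM_RESTART_COUNT:-0}"
# Assumed location of the checkpoint written by your own application.
checkpoint_file="${CHECKPOINT_FILE:-checkpoint.txt}"

start_step="$(resume_step "$checkpoint_file")"
echo "Restart count: ${restart_count}; starting at step ${start_step}"
```

Your workload would then skip the steps before start_step and update the checkpoint file as it progresses.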

Job submission options

Slurm sets several useful environment variables within an allocated job. The table below summarizes the main job submission options accepted by sbatch, srun and salloc.

You can pass options using either the command line or job script; most users find that the job script is the easier approach. Slurm directives begin with #SBATCH; most have a short form (e.g. -N) and a long form (e.g. --nodes).

| Option | Argument | Comments |
|---|---|---|
| --partition | queue_name | Submits to the queue (partition) designated by queue_name. |
| --nodes | nodes | Number of nodes you need. |
| --ntasks | tasks | Number of tasks. |
| --gres | gpu:N | Requests generic resources such as GPUs, by type and number. |
| --job-name | job_name | Gives your job a name so you can recognize it in the queue overview. |
| --output | output_file | Directs job standard output to output_file (without the -e option, error output also goes to this file). Make sure this is not on $HOME. |
| --error | error_file | Directs job error output to error_file. Make sure this is not on $HOME. |
| --time | D-HH:MM | Wall-clock time limit for the job. |
| --mem | #G | Memory pool for all cores on a node (see also --mem-per-cpu). |
| --mail-type | BEGIN,END,FAIL,ALL | Specifies when user notifications are sent; multiple values can be given as a comma-separated list. |
| --mail-user | user@uni-tuebingen.de | Email address to which notifications are sent. |
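
As a minimal sketch, the options above can be combined into a job script like the following. The partition name queue_name, the log directory /path/to/logs, the resource amounts and the email address are placeholders chosen for illustration, not ML Cloud defaults; replace them with values valid for your account.

```shell
#!/bin/bash
# All values below are placeholders for illustration; adjust them to your needs.
#SBATCH --job-name=example-job
#SBATCH --partition=queue_name
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --gres=gpu:1
#SBATCH --time=0-01:00
#SBATCH --mem=16G
#SBATCH --output=/path/to/logs/example-%j.out
#SBATCH --error=/path/to/logs/example-%j.err
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=user@uni-tuebingen.de

# SLURM_JOB_ID is set by Slurm inside the job; the fallback only applies
# when the script is run outside of Slurm (for example, for a quick check).
job_id="${SLURM_JOB_ID:-none}"
echo "Job ${job_id} running on $(hostname)"

# Replace this line with your actual workload.
echo "Hello from Slurm"
```

Saved under a name such as job.sh (a hypothetical filename), the script would be submitted with sbatch job.sh. The log directory must already exist and, as noted in the table, should not be on $HOME.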

By default, Slurm writes all console output to a file named slurm-%j.out, where %j is the numerical job ID. To specify a different filename use the -o option. To save stdout (standard out) and stderr (standard error) to separate files, specify both -o and -e.
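
To illustrate the environment variables that Slurm sets within an allocated job, the sketch below prints a small selection of them. The helper name print_slurm_env is hypothetical; the variable names are standard Slurm ones, but which of them are set depends on how the job was submitted (SLURM_NTASKS, for instance, is only set when a task count was requested).

```shell
#!/bin/bash
# Prints a selection of Slurm's job environment variables, or <unset> for any
# that are not defined (for example, when run outside of a Slurm job).
print_slurm_env() {
    for var in SLURM_JOB_ID SLURM_JOB_NAME SLURM_JOB_PARTITION \
               SLURM_JOB_NODELIST SLURM_NTASKS SLURM_SUBMIT_DIR; do
        # printenv prints nothing and returns non-zero for unset variables.
        value="$(printenv "$var")"
        printf '%-20s %s\n' "$var" "${value:-<unset>}"
    done
}

print_slurm_env
```

Placing a call like this near the top of a job script records the job's context in its output file, which can help when debugging later.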

Common Slurm commands

| Command | Function |
|---|---|
| sbatch | Submits a batch job script. The command exits immediately once the script has been transferred to the Slurm controller daemon and assigned a Slurm job ID. |
| sacct | Queries past jobs. |
| squeue | Prints a table of submitted jobs and their states. |
| sinfo | Provides an overview of cluster status. |
| scancel | Cancels a job prior to its completion. |
| seff jobid | Reports the computational efficiency of your calculations. |
| sacctmgr show associations user=username | Shows which account(s) your username is associated with. |
| scontrol show partitions | Shows detailed information about all available partitions and their definitions/limits. |
| sprio -w | Shows the weights used for job prioritization. |
| sshare | Shows how many shares your group has as well as your fairshare value. |
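
Several of these commands take a job ID, so it is often convenient to capture the ID at submission time. The sketch below assumes sbatch prints its usual "Submitted batch job <id>" message; the helper names parse_job_id and submit_job and the script name job.sh are hypothetical.

```shell
#!/bin/bash
# Extracts the numeric job ID from sbatch's "Submitted batch job <id>" message.
parse_job_id() {
    echo "$1" | awk '/^Submitted batch job/ {print $4}'
}

# Submits the given job script and prints the resulting job ID.
submit_job() {
    local script="$1" output
    output="$(sbatch "$script")" || return 1
    parse_job_id "$output"
}

# Example usage (job.sh is a hypothetical script name):
#   job_id="$(submit_job job.sh)"
#   squeue -j "$job_id"    # check its state while it is queued or running
#   scancel "$job_id"      # cancel it if it is no longer needed
#   seff "$job_id"         # efficiency report once the job has finished
```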

Monitor Your Job

Once submitted, the job is queued for some time, depending on how many jobs are currently in the queue. Eventually, typically after previously submitted jobs have completed, the job is started on one or more nodes, as determined by its resource requirements. The status of the job can be queried with the squeue command, which accepts the following options:

| Option | Description |
|---|---|
| -a | Display information for all jobs. |
| -j jobid | Display information for the specified job ID. |
| -j jobid -o %all | Display all information fields (with a vertical bar separating each field) for the specified job ID. |
| -l | Display information in long format. |
| -n job_name | Display information for the specified job name. |
| -t state_list | Display jobs that have the specified state(s). Valid job states include PENDING, RUNNING, SUSPENDED, COMPLETED, CANCELLED, FAILED, TIMEOUT, NODE_FAIL, PREEMPTED, BOOT_FAIL, DEADLINE, OUT_OF_MEMORY, COMPLETING, CONFIGURING, RESIZING, REVOKED, and SPECIAL_EXIT. |
| -u username | Display jobs owned by the specified user. |


For example, to see pending jobs for a particular user:

squeue -u mfa608 -t PENDING
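
The same options can be used in scripts. As a rough sketch, the helpers below (job_in_queue and wait_for_job, both hypothetical names) poll squeue until a job no longer appears in the queue. The -h option suppresses the header line, so empty output means the job is gone. The 30-second default interval is an assumption; polling more often puts unnecessary load on the scheduler.

```shell
#!/bin/bash
# Succeeds while the job with the given ID is still listed by squeue.
job_in_queue() {
    [ -n "$(squeue -h -j "$1" 2>/dev/null)" ]
}

# Blocks until the job has left the queue, checking every <interval> seconds.
wait_for_job() {
    local job_id="$1" interval="${2:-30}"
    while job_in_queue "$job_id"; do
        sleep "$interval"
    done
}

# Example usage (15370 is an illustrative job ID):
#   squeue -u "$USER" -t PENDING,RUNNING -l   # long listing of your active jobs
#   wait_for_job 15370 && echo "Job 15370 has left the queue"
```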

You can use sacct to get details of a previously run job:

sacct -j 15370

or

sacct -j 15370 --format JobID,JobName,Partition,Account,AllocCPUS,State,ExitCode,NodeList
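
For use in scripts, sacct can emit machine-readable output. The sketch below assumes the fields are requested in the order JobID, State, ExitCode with --parsable2 (fields separated by a vertical bar) and --noheader; the helper name job_result is hypothetical.

```shell
#!/bin/bash
# Reads "JobID|State|ExitCode" lines on stdin and prints the state and exit
# code of the record whose JobID matches the given ID exactly, which skips
# job steps such as "<id>.batch".
job_result() {
    awk -F'|' -v id="$1" '$1 == id {print $2, $3}'
}

# Example usage (15370 is the illustrative job ID used above):
#   sacct -j 15370 --format=JobID,State,ExitCode --parsable2 --noheader \
#       | job_result 15370
```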

Cluster status

The sinfo command provides status information for the nodes in each partition, including the partition's time limit. The default partition is marked with an "*". This information can be useful when deciding where to submit your job. The abbreviated status codes are explained below:

| Status Code | Description |
|---|---|
| alloc | The node has been allocated to one or more jobs. |
| mix | The node has some of its CPUs ALLOCATED while others are IDLE. |
| idle | The node is not allocated to any jobs and is available for use. |
| down | The node is down and unavailable for use. |
| drain | The node is unavailable for use per system administrator request (for maintenance etc.). |
| drng | The node is being drained but is still running a user job. The node will be marked as drained as soon as the user job finishes. Do not worry if you have a job running on a node in this state. |
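
For a quick overview of how many nodes are in each state, sinfo output can be summarized with a few lines of shell. This is a sketch only: the helper name count_nodes_by_state is hypothetical, and it assumes sinfo is called with -h (no header) and the format "%t %D" (abbreviated state and node count). Nodes that belong to several partitions are counted once per partition.

```shell
#!/bin/bash
# Reads "<state> <node_count>" lines on stdin and prints the total number of
# nodes per state, sorted by state name.
count_nodes_by_state() {
    awk '{count[$1] += $2} END {for (s in count) print s, count[s]}' | sort
}

# Example usage:
#   sinfo -h -o "%t %D" | count_nodes_by_state
```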

Last update: June 28, 2024
Created: June 21, 2024