Available Storage in Galvani

This page describes how Galvani handles data storage.

Note

We are decommissioning QB in 2024, moving all storage to NVMe Lustre. BeeGFS will also be decommissioned.

On Storage Availability and Backups

Please note that storage space and services on Galvani are provided as is. We cannot guarantee high availability or backups for any of our storage space; this applies to all storage on the ML Cloud.

Storage Quotas

Storage is a shared and limited resource, and in a number of places we need to enforce quotas so that a runaway script cannot accidentally fill up a disk and make the system unusable for everybody.

Storage quota is specified in:

  • Space limit: affects the aggregated size of all your files, or of all files of a group. When this limit is reached, you (or the group) cannot store more data on the system, whether by creating new files or by growing existing ones.

Quota applies to specific folders

Often a storage quota is intended to apply to a specific folder on the file system. For example, the so-called HOME quota applies to your home folder $HOME.

Quota applies to snapshots

It is important to remember that the file system snapshots that are created count against your storage quota. This means that if you truncate and then delete a file, the reduction in quota usage will not be visible immediately; it takes effect with a delay of about 14 days.

QB Storage

The main storage backend for your data in Galvani is a Quobyte storage cluster. The only things that are not stored on this backend are the root disks of VMs and bare-metal machines, as well as the ephemeral disks of VMs.

The following table summarizes the different storage options:

Table 2.1 ML Cloud File Systems For Galvani

| File System | Quota | Disc Type | Filelocks | Key Features |
|---|---|---|---|---|
| $HOME | 20GB, ### files | SSD | No implicit filelocks | Not intended for parallel or high-intensity file operations. NOT intended to store input data, results, or models. NOT intended for writing jobs on HOME, since such a process might max out your quota and lead to an inability to log in. |
| $WORK | Initial quota: 8TB | SSD | No implicit filelocks | The WORK volume can hold ~50 mln files. The environment variable is set to /mnt/qb/work/YOUR_GROUP/YOUR_UID for each user. Can be used to store input data and results, with the quota limits in mind. |
| $WORK2 | Part of WORK quota | SSD | No implicit filelocks | WORK2 is a different volume, but still part of your WORK quota. |
| $WORK3 | Part of WORK quota | SSD | No implicit filelocks | WORK3 is a different volume, but still part of your WORK quota. |
| $SCRATCH | - | SSD | No implicit filelocks | Should be used to access local, temporary storage on compute nodes. Set to the folder /scratch_local/$SLURM_JOB_USER-$SLURM_JOBID. Please note that this folder and everything in it will be deleted after the job finishes. |
| /mnt/qb/YOUR_GROUP | 25TB | HDD | No implicit filelocks | Shared HDD volume for each group. |
| /mnt/qb/datasets | - | SSD | No implicit filelocks | Accessible to all users. Read-only on GPU nodes; writable for staging on login and CPU-only nodes. |
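The $SCRATCH workflow from the table above can be sketched as a minimal job script: stage input in, compute, and stage results out before the job ends. The SBATCH directives and the fallback paths are illustrative assumptions, not site-specific defaults; the placeholder computation stands in for your actual workload.

```shell
#!/bin/bash
#SBATCH --job-name=scratch-demo        # illustrative SLURM directives;
#SBATCH --time=00:10:00                # adjust to your partition and limits

# Fallbacks so the sketch also runs outside a SLURM allocation (assumption):
SCRATCH=${SCRATCH:-/tmp/demo_scratch.$$}
WORK=${WORK:-/tmp/demo_work.$$}
mkdir -p "$SCRATCH" "$WORK"

echo "input" > "$WORK/input.txt"       # stand-in for your staged input data
cp "$WORK/input.txt" "$SCRATCH/"       # stage in: node-local SSD takes the heavy I/O
tr a-z A-Z < "$SCRATCH/input.txt" > "$SCRATCH/output.txt"   # placeholder computation
cp "$SCRATCH/output.txt" "$WORK/"      # stage out BEFORE the job ends:
                                       # $SCRATCH is wiped when the job finishes
```

The key point is the final copy: anything left only in $SCRATCH is lost once the job terminates.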


Note

The separation of WORK, WORK2, and WORK3 into different volumes (while all three volumes comprise your group quota) was done to spread metadata across volumes for rebalancing purposes.

QB Quotas and Freeing Space

The shared network volumes which provide $HOME, $WORK and /mnt/qb/datasets have user or group quotas in Galvani. You can query them with qinfo quota if your current working directory is part of those volumes.
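If you want a per-directory breakdown rather than the volume-level quota figure, standard coreutils give a rough picture. This is a generic sketch on a throwaway directory tree; note that du measures current file sizes and knows nothing about the snapshot-aware quota accounting described above.

```shell
# Build a small throwaway tree, then rank its subdirectories by usage.
dir=$(mktemp -d)
mkdir -p "$dir/big" "$dir/small"
head -c 8192 /dev/zero > "$dir/big/data.bin"
echo hi > "$dir/small/note.txt"

# Largest subdirectories first; on a real volume, run this from $WORK or $HOME.
du -s "$dir"/* | sort -rn
```

Running the same `du -s * | sort -rn` inside your own volume quickly shows where most of your quota is going.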

There are two ways to delete/remove files and free quota:

  • rm file: although this command will remove the file, the decrease in quota usage will only be reflected with a delay of approx. 2 weeks if the file was present in the latest QB metadata snapshot.
  • To free quota instantaneously use:
: > file
rm file

for a single file. For deleting a directory recursively use:

find DIRECTORY -type f ! -size 0c | parallel -X --progress truncate -s0
rm -rf DIRECTORY
  • using mv file /outside/of/volume/ might not free quota instantaneously. Instead use:
cp file /outside/of/volume/
: > file
rm file
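The effect of the truncate-before-delete pattern above can be checked with a self-contained sketch on a throwaway file; nothing here is site-specific.

```shell
# Create a 1 MiB file, truncate it, and confirm the size drops to zero immediately.
tmp=$(mktemp)
head -c 1048576 /dev/zero > "$tmp"
before=$(stat -c %s "$tmp")   # size before truncation
: > "$tmp"                    # truncate to zero length; the space is released right away
after=$(stat -c %s "$tmp")    # size after truncation
rm "$tmp"                     # then remove the now-empty file
echo "before=$before after=$after"
```

On the QB volumes, it is this truncation step that frees the quota instantly, while the subsequent rm only cleans up the empty file entry.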

Note

Truncating any files within your conda environments, directories, and packages will most likely corrupt your environments and might require you to rebuild them from scratch.
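To reduce that risk when using the recursive truncation above, you can exclude environment directories from the find invocation. The -path pattern below is an assumption for illustration; adapt it to wherever your environments actually live.

```shell
# Demo tree: one result file plus a file inside a conda-style environment directory.
dir=$(mktemp -d)
mkdir -p "$dir/results" "$dir/miniconda3/envs/myenv"
echo payload > "$dir/results/run1.bin"
echo lib > "$dir/miniconda3/envs/myenv/lib.so"

# Truncate non-empty files, but skip anything under a conda-style path.
find "$dir" -type f ! -size 0c ! -path '*conda*' -exec truncate -s 0 {} +
```

After this runs, the result file is emptied while the environment file keeps its contents.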