Available Storage in Galvani
This page describes how Galvani handles data storage.
Note
We are decommissioning QB in 2024; afterwards all storage will be NVMe Lustre. BeeGFS will also be decommissioned.
On Storage Availability and Backups
Please note that storage space and services on Galvani are provided as-is. We cannot guarantee high availability or backups for any of our storage space; this applies to all storage on the ML Cloud.
Storage Quotas
Storage is a shared and limited resource, and in a number of places we enforce quotas so that a runaway script cannot accidentally fill up the disk and render the system unusable for everybody.
Storage quota is specified as:
- Space limit: applies to the aggregated size of all your files, or of all files of a group. When this limit is reached, you (or the group) cannot store more data on the system, neither new files nor growth of existing ones.
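To see how close you are to a space limit, it can help to inspect actual disk usage of a directory tree. A minimal sketch using standard `du` (the paths are examples, not prescribed locations):

```shell
# Total disk usage of your home folder -- compare against your quota limit.
du -sh "$HOME"

# Per-subdirectory breakdown, largest first: useful for finding
# what to clean up once the space limit is reached.
du -h --max-depth=1 "$HOME" | sort -rh | head
```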
Quota applies to specific folders
Often, a storage quota is intended to apply to a specific folder on the file system. For example, the so-called HOME quota applies to your home folder `$HOME`.
Quota applies to snapshots
It is important to remember that file system snapshots count toward your storage quota. This means that if you truncate and then delete a file, the reduction in used quota is not visible immediately; it takes effect with a delay of about 14 days.
QB Storage
The main storage backend for your data in Galvani is a Quobyte storage cluster. The only data not stored on this backend are the root disks of VMs and bare-metal machines, as well as the ephemeral disks of VMs.
The following table summarizes the different storage options:
Table 2.1 ML Cloud File Systems For Galvani
| File System | Quota | Disk Type | Filelocks | Key Features |
|---|---|---|---|---|
| `$HOME` | 20GB, ### files | SSD | No implicit filelocks | Not intended for parallel or high-intensity file operations. NOT intended to store input data, results or models. NOT intended for writing job output, since such a process might max out your quota and prevent you from logging in. |
| `$WORK` | Initial quota: 8TB | SSD | No implicit filelocks | The WORK volume can hold ~50 million files. The environment variable is set to `/mnt/qb/work/YOUR_GROUP/YOUR_UID` for each user. Can be used to store input data and results, with the quota limits in mind. |
| `$WORK2` | part of WORK quota | SSD | No implicit filelocks | WORK2 is a different volume, but still part of your WORK quota. |
| `$WORK3` | part of WORK quota | SSD | No implicit filelocks | WORK3 is a different volume, but still part of your WORK quota. |
| `$SCRATCH` | - | SSD | No implicit filelocks | Should be used to access local, temporary storage on compute nodes. It is set to the folder `/scratch_local/$SLURM_JOB_USER-$SLURM_JOBID`. Please note that this folder and everything in it will be deleted after the job finishes. |
| `/mnt/qb/YOUR_GROUP` | 25TB | HDD | No implicit filelocks | Shared HDD volume for each group. |
| `/mnt/qb/datasets` | - | SSD | No implicit filelocks | Accessible to all users. Read-only on GPU nodes; writable for staging on login and CPU-only nodes. |
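The typical `$SCRATCH` workflow is to stage data onto the node-local disk, compute against it, and copy results back before the job ends. A sketch of a Slurm batch script, assuming hypothetical dataset, script, and job names (`my_dataset`, `train.py`, `scratch-example` are all placeholders):

```shell
#!/bin/bash
#SBATCH --job-name=scratch-example   # hypothetical job name
#SBATCH --time=01:00:00

# $SCRATCH points at node-local storage:
# /scratch_local/$SLURM_JOB_USER-$SLURM_JOBID (deleted when the job finishes).

# 1. Stage input data from the network volume to fast local scratch.
cp -r "$WORK/my_dataset" "$SCRATCH/"

# 2. Run the workload against the local copy.
python train.py --data "$SCRATCH/my_dataset" --out "$SCRATCH/results"

# 3. Copy results back before the job ends -- $SCRATCH is wiped afterwards.
cp -r "$SCRATCH/results" "$WORK/"
```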
Note
The separation of `WORK`, `WORK2`, and `WORK3` into different volumes (while all three count toward your group quota) was done to spread the metadata across volumes for rebalancing purposes.
QB Quotas and Freeing Space
The shared network volumes which provide `$HOME`, `$WORK` and `/mnt/qb/datasets` have user or group quotas in Galvani. You can query them with `qinfo quota` if your current working directory is part of those volumes.
There are two ways to delete files and free quota:

- `rm file`: although this command removes the file, the decrease in used quota is reflected only after a delay of approximately 2 weeks if the file was present at the latest QB metadata snapshot.
- To free quota instantaneously, truncate the file before removing it:

```
: > file
rm file
```

For deleting a directory recursively, use:

```
find DIRECTORY -type f ! -size 0c | parallel -X --progress truncate -s0
rm -rf DIRECTORY
```

- Using `mv file /outside/of/volume/` might not free quota instantaneously. Instead use:

```
cp file /outside/of/volume/
: > file
rm file
```
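The truncate-then-remove recipe can be wrapped in a small helper. A sketch (the function name `free_space` is made up, and it uses `find -exec` instead of GNU `parallel` so it works without extra tools; the parallel variant above will be faster on large trees):

```shell
# free_space: truncate every non-empty regular file under a directory,
# then remove the directory tree. Truncating first releases the blocks
# immediately instead of waiting for the next QB metadata snapshot.
free_space() {
    dir="$1"
    # Truncate all regular files that still hold data (size > 0 bytes).
    find "$dir" -type f ! -size 0c -exec truncate -s0 {} +
    # Now the (empty) directory entries themselves can be removed.
    rm -rf "$dir"
}
```

Usage: `free_space "$WORK/old_experiment"` (path is an example). Do not run this on conda environments; see the note below.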
Note
Truncating any files within your conda environments, their directories and packages will most likely corrupt those environments and might require you to rebuild them from scratch.