Storage

Legacy File Systems

We are completely decommissioning the Quobyte file system in 2024 and replacing it with all-NVMe Lustre.

Backups

Warning

Perform regular backups of your data to a safe location. The ML Cloud does not perform data backups.

File System Access Points

File System | Quota | Key Features
$HOME | 21GB | Quobyte (QB) file system. Low-intensity file operations only. Not backed up.
$WORK | 8.8TB | Quobyte (QB) file system. Standard computational operations. Not backed up.
/mnt/lustre/work | 3TB | Lustre File System, default stripe 11; 1MB stripe size. Not backed up.
$SCRATCH on nodes | none | Local to the compute node. Data in $SCRATCH is not shared across nodes. Purged regularly.

The $SCRATCH File System

The $SCRATCH file system, as its name indicates, is a temporary storage space. Files that have not been accessed in ten days are subject to purge without warning. Assume that $SCRATCH is purged when your compute job finishes, so only use it as a temporary workspace.
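The temporary-workspace pattern described above can be sketched as a small shell helper: copy inputs to node-local scratch, compute there, and copy results back before the job ends. This is only an illustration. The function name stage_and_run, its two arguments, and the file-counting "computation" are hypothetical, and the fallback to /tmp when $SCRATCH is unset is an assumption made so the sketch runs anywhere.

```shell
set -eu

# Hypothetical helper: stage_and_run INPUT_DIR RESULT_DIR
stage_and_run() {
    local input_dir="$1"
    local result_dir="$2"
    # Assumption: fall back to a temp location when $SCRATCH is not set
    local scratch="${SCRATCH:-${TMPDIR:-/tmp}}"
    local job_dir
    job_dir=$(mktemp -d "$scratch/job.XXXXXX")

    # Stage the inputs into a private directory on node-local storage
    cp -r "$input_dir" "$job_dir/in"
    mkdir -p "$job_dir/out"

    # Placeholder for the real computation: count the staged files
    find "$job_dir/in" -type f | wc -l > "$job_dir/out/file_count.txt"

    # Copy results to persistent storage, then remove the workspace
    mkdir -p "$result_dir"
    cp -r "$job_dir/out/." "$result_dir/"
    rm -rf "$job_dir"
}
```

A job script would call it as stage_and_run /path/to/inputs /path/to/results, with both paths on a shared file system. Results exist only in $SCRATCH until the final copy, so that copy must run before the job finishes.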

Best Practices on Lustre

Lustre is a high-performance distributed cluster file system. Lustre is designed to appear as a single file system to the end user. However, this means that a few command-line operations are inefficient on Lustre compared to single-drive file systems.

Avoid accessing attributes of files and directories

Try to use the lfs-style Lustre-based commands where possible to access attributes of files and directories, such as lfs find or lfs df instead of find or df.

In Lustre, accessing metadata such as file attributes (for example type, ownership, protection, size, dates) is resource-intensive. Commands that read these attributes can degrade file system performance, especially when run frequently or over large directories.

  • Avoid commands such as ls -R, find, locate, du, df and similar. They walk the file system recursively and/or perform heavy metadata operations, which can badly degrade overall file system performance. If walking the file system recursively is absolutely required, use the Lustre-optimized lfs find instead of find and similar tools. To minimize the number of Lustre RPC calls, use the lfs commands instead of the system-provided commands whenever possible:

lfs df instead of df

lfs find instead of find
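As a rough illustration of the substitution above, the helper below lists files that have not been modified for a given number of days, preferring lfs find when the lfs tool is installed. The function name list_stale_files is hypothetical, and the fallback to plain find is an assumption added so the sketch also runs on machines without Lustre client tools.

```shell
# Hypothetical helper: list_stale_files DIR DAYS
# Prints regular files under DIR last modified more than DAYS days ago.
list_stale_files() {
    local dir="$1"
    local days="$2"
    if command -v lfs >/dev/null 2>&1; then
        # Lustre-aware traversal
        lfs find "$dir" -type f -mtime +"$days"
    else
        # Fallback for systems without the Lustre client tools
        find "$dir" -type f -mtime +"$days"
    fi
}
```

For example, list_stale_files /mnt/lustre/work/my_project 30 would list files untouched for more than 30 days (my_project is a placeholder). Similarly, lfs df -h /mnt/lustre/work reports usage in human-readable units.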

Avoid having a large number of files in a single directory

When a file is accessed, Lustre places a lock on the parent directory. Opening many files in the same directory therefore creates contention. Writing thousands of files to a single directory produces massive load on the Lustre metadata servers, which often results in file systems being taken offline. Accessing a single directory containing thousands of files can likewise cause heavy resource contention and degrade file system performance.

One alternative is to organize the data into multiple sub-directories and split the files across them.
A common approach is to use the square root of the number of files as the number of directories. For instance, the square root of 90000 is 300, so 90000 files would be split into 300 directories containing 300 files each.
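The square-root rule can be sketched in shell as follows. The function name split_into_subdirs and the part_NNNN directory naming are hypothetical choices for illustration. The sketch uses the smallest integer k with k*k >= N as both the number of directories and the files per directory, so the last directory may hold fewer files when N is not a perfect square.

```shell
# Hypothetical helper: split_into_subdirs DIR
# Moves the regular files directly inside DIR into about sqrt(N) subdirectories.
split_into_subdirs() {
    local dir="$1"
    local n=0 k=1 i=0 f sub

    # Count the regular files directly inside DIR
    for f in "$dir"/*; do
        if [ -f "$f" ]; then n=$((n + 1)); fi
    done

    # Smallest integer k such that k*k >= n
    while [ $((k * k)) -lt "$n" ]; do k=$((k + 1)); done

    # Move k files into each subdirectory: part_0000, part_0001, ...
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        sub=$(printf '%s/part_%04d' "$dir" $((i / k)))
        mkdir -p "$sub"
        mv "$f" "$sub/"
        i=$((i + 1))
    done
}
```

Listing a very large directory is itself metadata-heavy, so treat this as a one-time reorganization. Where possible, write files into subdirectories from the start.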

The best option is to repackage your data into container files such as webdata or tar files. For experts, when the data is read-only, another alternative is to create a disk image and mount it read-only through loopback on each cluster node, as described in the Handling Data Tutorial. Container tools such as Singularity can also use loopback-mounted disk images.

File Size: Use Large Files

Lustre performance is dependent on the size of the files being accessed. A read operation that accesses many small files will be slower than one accessing a few large files, because the metadata for each file needs to be read individually. Therefore, for optimal performance, try using file formats that containerize/package data within a large single file that includes the ability to perform direct subfile access.
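As a minimal sketch of packaging many small files into one large file, the helpers below use plain tar. The function names pack_dir and read_member are hypothetical. Note the limitation: tar has no index, so reading one member still scans the archive sequentially. It does, however, touch only a single file on Lustre. Formats with true direct subfile access avoid the scan as well.

```shell
# Hypothetical helper: pack_dir SRC_DIR ARCHIVE
# Bundles the contents of SRC_DIR into one uncompressed tar file.
pack_dir() {
    local src="$1"
    local archive="$2"
    tar -cf "$archive" -C "$src" .
}

# Hypothetical helper: read_member ARCHIVE MEMBER
# Writes a single archive member to stdout without unpacking the rest.
read_member() {
    local archive="$1"
    local member="$2"
    tar -xOf "$archive" "$member"
}
```

Because pack_dir archives relative to SRC_DIR, member names start with ./ and a file a.txt is read back with read_member data.tar ./a.txt.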

Lustre Architecture & Striping, In Brief

The Lustre file system looks and acts like a single logical hard disk, but is actually a sophisticated integrated system involving hundreds of physical drives. Lustre stripes large files by breaking them into chunks and storing the chunks on multiple physical drives, making it possible to deliver the high performance needed to service input/output (I/O) requests from hundreds of users across thousands of nodes. Lustre is managed from an MGS (Management Server) that interfaces to where the data and metadata are stored. There are two types of Lustre storage targets in our configuration: MDTs and OSTs. Object Storage Targets (OSTs) manage the file system's data space: a file with 11 stripes, for example, is distributed across 11 OSTs for storage. Metadata Targets (MDTs) track the OSTs assigned to a file and store the file system's descriptive metadata. The Lustre MGS, MDTs, and OSTs work together to provide optimal file system services for client nodes.
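To make the striping arithmetic concrete, the sketch below computes which stripe of a file a given byte offset falls into, assuming plain round-robin striping. The function name is hypothetical. The defaults (11 stripes, 1 MB taken as 1048576 bytes) follow the default layout listed for /mnt/lustre/work. The result is the stripe's position within the file's layout, not the number of a physical OST.

```shell
# Hypothetical helper: stripe_index_for_offset OFFSET_BYTES [STRIPE_COUNT] [STRIPE_SIZE_BYTES]
# Assumes round-robin striping: chunk number modulo stripe count.
stripe_index_for_offset() {
    local offset="$1"
    local count="${2:-11}"
    local size="${3:-1048576}"
    echo $(( (offset / size) % count ))
}
```

With the defaults, the first 1 MB of a file lands on stripe 0, the second on stripe 1, and the twelfth wraps around to stripe 0 again. On the cluster, lfs getstripe FILE shows the actual layout of a file, and lfs setstripe -c COUNT -S SIZE DIRECTORY sets the layout for new files in a directory.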

Available Datasets

Some commonly used datasets have been provided for the users:

Dataset Name Location
ImageNet-ffcv /mnt/lustre/datasets/ImageNet-ffcv
CLEVR_v1.0 /mnt/lustre/datasets/CLEVR_v1.0
coco /mnt/lustre/datasets/coco
Falcor3D_down128 /mnt/lustre/datasets/Falcor3D_down128
ffcv_imagenet_data /mnt/lustre/datasets/ffcv_imagenet_data
imagenet-styletransfer /mnt/qb/datasets/imagenet-styletransfer
kitti /mnt/lustre/datasets/kitti
laion400m /mnt/lustre/datasets/laion400m
laion_aesthetics /mnt/lustre/datasets/laion_aesthetics
ModelNet40 /mnt/lustre/datasets/ModelNet40
NMR_Dataset /mnt/lustre/datasets/NMR_Dataset
stl10_binary /mnt/lustre/datasets/stl10_binary
WeatherBench /mnt/qb/datasets/WeatherBench
PUG Dataset /mnt/lustre/datasets/PUG
C4 (en, noclean) /mnt/lustre/datasets/c4
synthclip /mnt/lustre/datasets/SynthCLIP
gobjaverse (tar version) /mnt/lustre/datasets/gobjaverse
mlcommons /mnt/lustre/datasets/mlcommons
objaverse /mnt/lustre/datasets/objaverse
ImageNet-C /mnt/lustre/datasets/ImageNet-C
Imagenet2012 /mnt/lustre/datasets/ImageNet2012
Imagenet-r /mnt/lustre/datasets/imagenet-r

Do You Want A New Dataset?

If you would like an additional dataset installed for general use, please use the following form and/or contact us through the ticketing system.

Datasets on Compute Nodes

We have also deployed some commonly used datasets locally on compute nodes for faster job I/O. The current list is:

Dataset Location
Imagenet-c $SCRATCH/datasets/ImageNet-C
Imagenet2012 $SCRATCH/datasets/ImageNet2012
Imagenet-r $SCRATCH/datasets/imagenet-r
Imagenet-ffcv $SCRATCH/datasets/Imagenet-ffcv

QB Quotas and Freeing Space

The shared network volumes that provide $HOME and $WORK have user or group quotas on Galvani. You can query them with qinfo quota if your current working directory is part of those volumes.

There are two ways to delete/remove files and free quota:

  • rm file: although this command removes the file, the quota decrease is reflected with a delay of approximately 2 weeks if the file was present at the latest QB metadata snapshot.
  • To free quota immediately, truncate the file before removing it. For a single file:
: > file
rm file

For deleting a directory recursively, use:

find DIRECTORY -type f ! -size 0c -print0 | parallel -0 -X --progress truncate -s0
rm -rf DIRECTORY
  • mv file /outside/of/volume/ might not free quota immediately. Instead use:
cp file /outside/of/volume/
: > file
rm file

Note

Truncating any files within your conda environments, dirs, and packages will probably corrupt your Python environments, requiring you to rebuild them from scratch.

CEPH S3

Ceph provides large amounts of slow storage for archival purposes, also known as cold storage. Ceph is accessible through the S3 protocol, which means you must have both S3 credentials and an S3 client.

How to get S3 Credentials (through web interface)

You can generate your credentials through OpenStack by signing in with your ML Cloud credentials.

OpenStack

Upon signing in, go to Project / API Access, as shown, and click the "View Credentials" button on the right hand side:

[Screenshot: Ceph1]

[Screenshot: Ceph2]

Note down the colored fields: in this screenshot, the Project Name is in blue, the EC2 Access Key is in green, and the EC2 Secret Key is in yellow. These together are your S3 credentials for your personal Ceph access. Do not share these credentials with anyone!

How to get S3 Credentials (through command line)

You may also create EC2 credentials with the openstack command-line interface as follows:

source ./openrc
openstack ec2 credentials create
+----------------------------------+----------------------------------+----------------------------------+----------------------------------+
| Access                           | Secret                           | Project ID                       | User ID                          |
+----------------------------------+----------------------------------+----------------------------------+----------------------------------+
| c3v463ddvg53jbi6zw7ltf0nt70o04ck | hk47szblvm6ods4741gx1wcryng0r02d | vllt6e3mhc15d6bqy7j99j23iumeeda6 | 363wgaw6wtzbel628i7wlt81m7qorxlt |
+----------------------------------+----------------------------------+----------------------------------+----------------------------------+

Or, if you are a member of multiple projects:

openstack ec2 credentials create --project Project_Name

S3 client: Inside Galvani

Inside the Galvani cluster, you can use the aws or s5cmd clients. s5cmd is potentially faster but has fewer features.

To use the aws client, you must first create a credential file. On the login nodes, run the following set of commands, substituting your EC2 access key and EC2 secret key from the OpenStack dashboard you accessed before:

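# Create the AWS configuration directory if it does not exist yet
mkdir -p "$HOME/.aws/"
# Append an [mlcloud] profile; replace the two placeholders with your own keys
echo "[mlcloud]
aws_access_key_id=EC2_ACCESS_KEY
aws_secret_access_key=EC2_SECRET_KEY" >> "$HOME/.aws/credentials"
# Make the credential file readable by you only
chmod 600 "$HOME/.aws/credentials"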

Within the cluster, the Ceph S3 address (also known as the endpoint) is: http://192.168.213.10:7480

S3 client: On your own computer

On your own computer, if you are running Linux, you can access Ceph using the command-line AWS-S3 client aws or awscli, depending on your operating system.

The credential setup is the same, but the Ceph S3 address is instead: https://mlcloud.uni-tuebingen.de:7443.

For Windows users, if you've already installed WinSCP, its S3 connection functionality is described here, and, for both Windows and Mac users, a variety of other S3 clients are available (for example, Cyberduck).

Storing data on Ceph

Data on Ceph is stored in buckets - the S3 term for remote folders. The buckets for ML Cloud users correspond to projects - remember the Project Name mentioned above?

The project name has two parts separated by a dash, e.g. mladm0-mfa555. The first part - e.g. mladm0 - is your bucket name, corresponding to the research project in question. Everyone with access to the same bucket can access the data in that bucket. The second part is your user name for that bucket.

You can read data from the bucket at s3://BUCKET_NAME and you can write data to the bucket at s3://BUCKET_NAME/USER_NAME - you can make subdirectories, as well - for example, s3://BUCKET_NAME/USER_NAME/directory_name/
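The naming scheme above can be sketched with shell parameter expansion. The variable names are hypothetical, the project name is the example from the text, and the sketch assumes the first dash separates the bucket name from the user name.

```shell
# Example project name from the text; substitute your own
project_name="mladm0-mfa555"

# Everything before the first dash is the bucket name
bucket_name="${project_name%%-*}"
# Everything after the first dash is the user name
user_name="${project_name#*-}"

echo "read from: s3://$bucket_name"
echo "write to:  s3://$bucket_name/$user_name/"
```

The two resulting values stand in for BUCKET_NAME and USER_NAME in the S3 paths used throughout this section.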

Warning

Remember, all users in the same project can access files in the same bucket. Never ever ever store credentials (including SSH private keys) in an S3 bucket!

The first time you access your group's bucket, you must initialize it with the following command from inside the cluster (remember to set up your credentials on the login node first). This command also shows the contents of your upload folder:

aws --endpoint-url http://192.168.213.10:7480 --profile mlcloud s3 ls s3://BUCKET_NAME/USER_NAME/

To move local data (e.g. ~/localdata.file) to your S3 Ceph bucket in Galvani, you'd run the following within the cluster:

aws --endpoint-url http://192.168.213.10:7480 --profile mlcloud s3 cp ~/localdata.file s3://BUCKET_NAME/USER_NAME/my_directory/

And this from your own computer (using aws):

aws --endpoint-url https://mlcloud.uni-tuebingen.de:7443 --profile mlcloud s3 cp ~/localdata.file s3://BUCKET_NAME/USER_NAME/my_directory/

To move data (e.g. ceph_data.file) from your Ceph S3 bucket to a compute node you're logged into:

aws --endpoint-url http://192.168.213.10:7480 --profile mlcloud s3 cp s3://BUCKET_NAME/USER_NAME/my_directory/ceph_data.file ./

And this from your own computer (using aws):

aws --endpoint-url https://mlcloud.uni-tuebingen.de:7443 --profile mlcloud s3 cp s3://BUCKET_NAME/USER_NAME/my_directory/ceph_data.file ./
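Since every command above repeats the endpoint and profile, a small wrapper function can reduce typing errors. The following is only a sketch: the function name mlcloud_s3 and the variables MLCLOUD_ENDPOINT and AWS_BIN are hypothetical, not part of the aws client. The default endpoint is the in-cluster address; set MLCLOUD_ENDPOINT to https://mlcloud.uni-tuebingen.de:7443 on your own computer.

```shell
# Hypothetical settings: endpoint defaults to the in-cluster address
MLCLOUD_ENDPOINT="${MLCLOUD_ENDPOINT:-http://192.168.213.10:7480}"
# Hypothetical override of the client binary, useful for dry runs
AWS_BIN="${AWS_BIN:-aws}"

# mlcloud_s3 ARGS...: run "aws s3 ARGS..." against the ML Cloud endpoint
mlcloud_s3() {
    "$AWS_BIN" --endpoint-url "$MLCLOUD_ENDPOINT" --profile mlcloud s3 "$@"
}
```

With this in place, the upload becomes mlcloud_s3 cp ~/localdata.file s3://BUCKET_NAME/USER_NAME/my_directory/. Setting AWS_BIN=echo before a call prints the full command line instead of running it, which is a quick way to check the arguments.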

It is technically possible to use s3fs to mount an S3 bucket directly, but we do not recommend this, as it's 5x slower on typical data and has problems with large files and POSIX operations generally.

Publishing data on Ceph for the world to see

It is possible to publish an S3 bucket for the whole world to access and see. To do this, please contact ML Cloud support, and we will create a new bucket for the purpose.


Last update: September 9, 2024
Created: September 9, 2024