Storage
Legacy File Systems
We are completely decommissioning the Quobyte file system in 2024 and replacing it with all-NVMe Lustre.
Backups
Warning
Perform regular backups of your data to a safe location. The ML Cloud does not perform data backups.
File System Access Points
File System | Quota | Key Features |
---|---|---|
$HOME | 21GB | Quobyte (QB) file system. Low-intensity file operations only. Not backed up. |
$WORK | 8.8TB | Quobyte (QB) file system. Standard computational operations. Not backed up. |
/mnt/lustre/work | 3TB | Lustre file system; default stripe count 11, 1MB stripe size. Not backed up. |
$SCRATCH on nodes | none | Local to the compute node. Data in $SCRATCH is not shared across nodes. Purged regularly. |
The $SCRATCH File System
The $SCRATCH file system, as its name indicates, is a temporary storage space.
Files that have not been accessed in ten days are subject to purge without warning.
Assume that $SCRATCH is purged when your compute job finishes, so only use it as a temporary workspace.
Best Practices on Lustre
Lustre is a high-performance distributed cluster file system. Lustre is designed to appear as a single file system to the end user. However, this means that a few command-line operations are inefficient on Lustre compared to single-drive file systems.
Avoid accessing attributes of files and directories
Where possible, use the `lfs`-style Lustre commands to access attributes of files and directories, such as `lfs find` or `lfs df` instead of `find` or `df`.
In Lustre, accessing metadata, including file attributes (e.g. type, ownership, permissions, size, dates), is resource-intensive. Such commands can degrade file system performance, especially when run frequently or over large directories.
- Avoid commands such as `ls -R`, `find`, `locate`, `du`, `df` and similar. These commands walk the file system recursively and/or perform heavy metadata operations, and because they access file system metadata so intensively, they can badly degrade overall file system performance. If walking the file system recursively is absolutely required, use the Lustre-optimized `lfs find` instead of `find` and similar tools. To minimize the number of Lustre RPC calls, use the `lfs` commands instead of the system-provided ones whenever possible:
  - `lfs df` instead of `df`
  - `lfs find` instead of `find`
Avoid having a large number of files in a single directory
When a file is accessed, Lustre places a lock on the parent directory. When many files in the same directory are opened at once, this creates contention. Writing thousands of files to a single directory produces massive load on the Lustre metadata servers, often resulting in file systems being taken offline. Likewise, accessing a single directory containing thousands of files causes heavy resource contention and degrades file system performance.
One alternative is to organize the data into multiple sub-directories and split the files across them. A common approach is to use the square root of the number of files: for 90,000 files, the square root is 300, so create 300 directories containing 300 files each.
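The split can be scripted. Below is a minimal sketch of the idea; the `flat/file_*` names and the `dir_N` layout are placeholders invented for the demo (which uses 90 files rather than 90,000):

```shell
# Hypothetical sketch: distribute N files into roughly sqrt(N) subdirectories.
mkdir -p flat
for i in $(seq -w 0 89); do touch "flat/file_$i"; done   # 90 demo files

n=$(ls flat | wc -l)
buckets=$(awk -v n="$n" 'BEGIN { printf "%d", sqrt(n) + 0.5 }')  # ~sqrt(N)

i=0
for f in flat/file_*; do
  d="flat/dir_$(( i % buckets ))"   # round-robin files into the buckets
  mkdir -p "$d"
  mv "$f" "$d/"
  i=$(( i + 1 ))
done
```

Round-robin keeps the directories evenly sized; any stable scheme (hashing the file name, grouping by prefix) works equally well as long as no single directory ends up with thousands of entries.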
Best is to reorganize your data into packed files such as WebDataset shards or tar archives. For experts: when data is read-only, another alternative is to create a disk image and mount it read-only through loopback on each cluster node, as described in the Handling Data Tutorial. Container tools such as Singularity can also make use of loopback-mounted disk images.
File Size: Use Large Files
Lustre performance is dependent on the size of the files being accessed. A read operation that accesses many small files will be slower than one accessing a few large files, because the metadata for each file needs to be read individually. Therefore, for optimal performance, try using file formats that containerize/package data within a large single file that includes the ability to perform direct subfile access.
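As a toy illustration of the idea (the `small/part_*` file names are invented for the demo), packing many small files into a single tar archive still allows direct access to an individual member:

```shell
# Pack several small files into one archive, then read one member directly.
mkdir -p small
for i in 0 1 2; do echo "sample $i" > "small/part_$i.txt"; done

tar -cf packed.tar small              # one large file instead of many small ones
tar -xOf packed.tar small/part_1.txt  # stream a single member to stdout
```

In practice, formats such as HDF5, FFCV, or WebDataset-style tar shards provide this kind of direct subfile access with far better performance than thousands of loose files.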
Lustre Architecture & Striping, In Brief
The Lustre file system looks and acts like a single logical hard disk, but it is actually a sophisticated integrated system involving hundreds of physical drives. Lustre stripes large files over several physical disks, breaking each file into chunks and storing them on multiple drives; this makes it possible to deliver the high performance needed to service input/output (I/O) requests from hundreds of users across thousands of nodes. Lustre is managed from a Management Server (MGS) that interfaces with the nodes where data and metadata are stored. There are two types of Lustre storage targets in our configuration: MDTs and OSTs. Object Storage Targets (OSTs) manage the file system's data space: a file with 11 stripes, for example, is distributed across 11 OSTs. Metadata Targets (MDTs) track the OSTs assigned to each file and store the file system's descriptive metadata. The MGS, MDTs, and OSTs work together to provide file system services to client nodes.
Available Datasets
Some commonly used datasets have been provided for the users:
Dataset Name | Location |
---|---|
ImageNet-ffcv | /mnt/lustre/datasets/ImageNet-ffcv |
CLEVR_v1.0 | /mnt/lustre/datasets/CLEVR_v1.0 |
coco | /mnt/lustre/datasets/coco |
Falcor3D_down128 | /mnt/lustre/datasets/Falcor3D_down128 |
ffcv_imagenet_data | /mnt/lustre/datasets/ffcv_imagenet_data |
imagenet-styletransfer | /mnt/qb/datasets/imagenet-styletransfer |
kitti | /mnt/lustre/datasets/kitti |
laion400m | /mnt/lustre/datasets/laion400m |
laion_aesthetics | /mnt/lustre/datasets/laion_aesthetics |
ModelNet40 | /mnt/lustre/datasets/ModelNet40 |
NMR_Dataset | /mnt/lustre/datasets/NMR_Dataset |
stl10_binary | /mnt/lustre/datasets/stl10_binary |
WeatherBench | /mnt/qb/datasets/WeatherBench |
PUG Dataset | /mnt/lustre/datasets/PUG |
C4 (en, noclean) | /mnt/lustre/datasets/c4 |
synthclip | /mnt/lustre/datasets/SynthCLIP |
gobjaverse (tar version) | /mnt/lustre/datasets/gobjaverse |
mlcommons | /mnt/lustre/datasets/mlcommons |
objaverse | /mnt/lustre/datasets/objaverse |
ImageNet-C | /mnt/lustre/datasets/ImageNet-C |
Imagenet2012 | /mnt/lustre/datasets/ImageNet2012 |
Imagenet-r | /mnt/lustre/datasets/imagenet-r |
Do You Want A New Dataset?
If you would like an additional dataset installed for general use, please use the following form and/or contact us through the ticketing system.
Datasets on Compute Nodes
We have also deployed some commonly used datasets locally on compute nodes for faster job I/O. The current list is:
Dataset | Location |
---|---|
Imagenet-c | $SCRATCH/datasets/ImageNet-C |
Imagenet2012 | $SCRATCH/datasets/ImageNet2012 |
Imagenet-r | $SCRATCH/datasets/imagenet-r |
Imagenet-ffcv | $SCRATCH/datasets/Imagenet-ffcv |
QB Quotas and Freeing Space
The shared network volumes which provide $HOME and $WORK have user or group quotas in Galvani. You can query them with `qinfo quota` if your current working directory is part of those volumes.
There are two ways to delete/remove files and free quota:

- rm file: although this command will remove the file, the quota decrease will be reflected with a delay of approximately 2 weeks if the files were present in the latest QB metadata snapshot.
- To free quota instantaneously for a single file, use:

: > file
rm file

For deleting a directory recursively, use:

find DIRECTORY -type f ! -size 0c | parallel -X --progress truncate -s0
rm -rf DIRECTORY

- Using mv file /outside/of/volume/ might not free quota instantaneously. Instead use:

cp file /outside/of/volume/
: > file
rm file
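The truncate-then-remove trick can be tried safely on a throwaway file first (the name demo_quota.bin is just a placeholder for this demo):

```shell
head -c 1048576 /dev/zero > demo_quota.bin   # a 1 MiB throwaway file
: > demo_quota.bin                           # truncate in place: size drops to 0 immediately
ls -l demo_quota.bin                         # size column now shows 0
rm demo_quota.bin                            # then remove the now-empty file
```

Truncation zeroes the file's data blocks right away, which is why the quota drop is immediate, whereas a plain rm only reflects in the quota after the next QB metadata snapshot.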
Note
Truncating files inside your conda environments, directories, and packages will likely corrupt your Python environments, requiring you to rebuild them from scratch.
CEPH S3
Ceph provides large amounts of slow storage for archival purposes, also known as cold storage. Ceph is accessible through the S3 protocol, which means you must have both S3 credentials and an S3 client.
How to get S3 Credentials (through web interface)
You can generate your credentials through OpenStack by signing in with your ML Cloud credentials.
Upon signing in, go to Project / API Access, as shown, and click the "View Credentials" button on the right hand side:
Note down the colored fields: in this screenshot, the Project Name is in blue, the EC2 Access Key is in green, and the EC2 Secret Key is in yellow. These together are your S3 credentials for your personal Ceph access. Do not share these credentials with anyone!
How to get S3 Credentials (through command line)
You may also create EC2 credentials with the `openstack` command-line interface as follows:
source ./openrc
openstack ec2 credentials create
+----------------------------------+----------------------------------+----------------------------------+----------------------------------+
| Access | Secret | Project ID | User ID |
+----------------------------------+----------------------------------+----------------------------------+----------------------------------+
| c3v463ddvg53jbi6zw7ltf0nt70o04ck | hk47szblvm6ods4741gx1wcryng0r02d | vllt6e3mhc15d6bqy7j99j23iumeeda6 | 363wgaw6wtzbel628i7wlt81m7qorxlt |
+----------------------------------+----------------------------------+----------------------------------+----------------------------------+
To create credentials for a specific project, specify the project name:
openstack ec2 credentials create --project Project_Name
S3 client: Inside Galvani
Inside the Galvani cluster, you can use the `aws` or `s5cmd` clients. `s5cmd` is potentially faster but has fewer features.
To use the `aws` client, you must first create a credentials file.
On the login nodes, run the following set of commands, substituting your EC2 access key and EC2 secret key from the OpenStack dashboard you accessed before:
mkdir -p $HOME/.aws/
echo "[mlcloud]
aws_access_key_id=EC2_ACCESS_KEY
aws_secret_access_key=EC2_SECRET_KEY" >> $HOME/.aws/credentials
chmod 600 $HOME/.aws/credentials
Within the cluster, the Ceph S3 address (endpoint) is: http://192.168.213.10:7480
S3 client: On your own computer
On your own computer, if you are running Linux, you can access Ceph using the command-line AWS S3 client (`aws`, packaged as `awscli` depending on your operating system). The credential setup is the same, but the Ceph S3 endpoint is instead: https://mlcloud.uni-tuebingen.de:7443
For Windows users, if you've already installed WinSCP, its S3 connection functionality is described here, and, for both Windows and Mac users, a variety of other S3 clients are available (for example, Cyberduck).
Storing data on Ceph
Data on Ceph is stored in buckets, the S3 term for remote folders. The buckets for ML Cloud users correspond to projects; remember the Project Name mentioned above?
The project name has two parts separated by a dash, e.g. mladm0-mfa555. The first part - e.g. mladm0 - is your bucket name, corresponding to the research project in question. Everyone with access to the same bucket can access the data in that bucket. The second part is your user name for that bucket.
You can read data from the bucket at s3://BUCKET_NAME and write data to the bucket at s3://BUCKET_NAME/USER_NAME. You can create subdirectories as well, for example s3://BUCKET_NAME/USER_NAME/directory_name/
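A quick shell sketch of how the two parts split out, using the example project name from the text above:

```shell
project="mladm0-mfa555"      # example project name from the text
bucket="${project%%-*}"      # everything before the first dash -> bucket name
username="${project#*-}"     # everything after the first dash  -> user name
echo "read:  s3://$bucket"
echo "write: s3://$bucket/$username/"
```

This is just parameter expansion for illustration; in practice you read the two parts straight off the OpenStack dashboard.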
Warning
Remember, all users in the same project can access files in the same bucket. Never ever ever store credentials (including SSH private keys) in an S3 bucket!
The first time you access your group's bucket, you must initialize it with the following command from inside the cluster (remember to set up your credentials on the login node first); this command also lists the contents of your upload folder:
aws --endpoint-url http://192.168.213.10:7480 --profile mlcloud s3 ls s3://BUCKET_NAME/USER_NAME/
To move local data (e.g. ~/localdata.file) to your S3 Ceph bucket in Galvani, run the following within the cluster:
aws --endpoint-url http://192.168.213.10:7480 --profile mlcloud s3 cp ~/localdata.file s3://BUCKET_NAME/USER_NAME/my_directory/
And this from your own computer (using `aws`):
aws --endpoint-url https://mlcloud.uni-tuebingen.de:7443 --profile mlcloud s3 cp ~/localdata.file s3://BUCKET_NAME/USER_NAME/my_directory/
To move data (e.g. ceph_data.file) from your Ceph S3 bucket to a compute node you are logged into:
aws --endpoint-url http://192.168.213.10:7480 --profile mlcloud s3 cp s3://BUCKET_NAME/USER_NAME/my_directory/ceph_data.file ./
And this from your own computer (using `aws`):
aws --endpoint-url https://mlcloud.uni-tuebingen.de:7443 --profile mlcloud s3 cp s3://BUCKET_NAME/USER_NAME/my_directory/ceph_data.file ./
It is technically possible to use `s3fs` to mount an S3 bucket directly, but we do not recommend this: it is about 5x slower on typical data and has problems with large files and with POSIX operations generally.
Publishing data on Ceph for the world to see
It is possible to publish an S3 bucket for the whole world to access. To do this, please contact ML Cloud support; we will create a new bucket for this purpose.
Created: June 21, 2024