Ferranti Storage
File System Access Points
File System | Quota | Key Features |
---|---|---|
/weka/group/user | 0.5 TB per group | NVMe based. Not backed up. |
/scratch_local | none | Local to the compute node. Data in $SCRATCH is not shared across nodes. Purged regularly. |
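Because /scratch_local is local to each compute node and purged regularly, one common pattern is to copy input data from Weka to node-local scratch at the start of a job and copy results back before the job ends. Below is a minimal sketch of that pattern. It assumes $SCRATCH points to a node-local directory under /scratch_local. The fallback path and the functions stage_in and stage_out are hypothetical helpers, not tools provided on Ferranti.

```shell
# Assumption: $SCRATCH is set to a node-local directory on /scratch_local.
# The fallback path below is hypothetical; adjust it to your setup.
: "${SCRATCH:=/scratch_local/${USER:-$(id -un)}}"

# Copy a directory from shared Weka storage to node-local scratch and print
# the path of the local copy. Usage: stage_in <source-dir>
stage_in() {
    local src="${1%/}"
    mkdir -p "$SCRATCH" || return
    cp -a "$src" "$SCRATCH/" || return
    printf '%s\n' "$SCRATCH/$(basename "$src")"
}

# Copy results back to Weka, since scratch is purged regularly and not shared
# across nodes. Usage: stage_out <local-dir> <weka-dest-dir>
stage_out() {
    mkdir -p "$2" || return
    cp -a "${1%/}" "$2/"
}
```

Each node gets its own scratch, so in a multi-node job every node that needs the data has to stage its own copy.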
Available Datasets
Several commonly used datasets are provided for all users:
Dataset Name | Location |
---|---|
ImageNet-ffcv | /weka/datasets/ImageNet-ffcv |
CLEVR_v1.0 | /weka/datasets/CLEVR_v1.0 |
coco | /weka/datasets/coco |
Falcor3D_down128 | /weka/datasets/Falcor3D_down128 |
ffcv_imagenet_data | /weka/datasets/ffcv_imagenet_data |
imagenet-styletransfer | /mnt/qb/datasets/imagenet-styletransfer |
kitti | /weka/datasets/kitti |
laion400m | /weka/datasets/laion400m |
laion_aesthetics | /weka/datasets/laion_aesthetics |
ModelNet40 | /weka/datasets/ModelNet40 |
NMR_Dataset | /weka/datasets/NMR_Dataset |
stl10_binary | /weka/datasets/stl10_binary |
WeatherBench | /mnt/qb/datasets/WeatherBench |
PUG Dataset | /weka/datasets/PUG |
C4 (en, noclean) | /weka/datasets/c4 |
synthclip | /weka/datasets/SynthCLIP |
gobjaverse (tar version) | /weka/datasets/gobjaverse |
mlcommons | /weka/datasets/mlcommons |
objaverse | /weka/datasets/objaverse |
ImageNet-C | /weka/datasets/ImageNet-C |
Imagenet2012 | /weka/datasets/ImageNet2012 |
Imagenet-r | /weka/datasets/imagenet-r |
Do You Want A New Dataset?
If you would like an additional dataset installed for general use, please use the following form and/or contact us through the ticketing system.
Ferranti Ceph
For each group in MLCloud we have created a Ceph S3 bucket named {{group}}0. For example, the bucket for the group mladm is named mladm0.
Each user in a group may upload data only to their own location within the group's bucket. For example, user mfa624, who belongs to group mladm, can upload data only to mladm0/mfa624/.
Each bucket has a quota of 10 TB.
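The naming scheme can be written as a small shell helper that prints the prefix you are allowed to upload to. This is a hypothetical sketch, and ferranti_user_prefix is not a provided tool. It assumes you pass your MLCloud group name explicitly rather than guessing it from your Unix groups, and that your login name matches your LDAP username.

```shell
# Print the S3 prefix a user may upload to: s3://<group>0/<user>/
# Usage: ferranti_user_prefix <group> [user]
# The user defaults to the current login name (assumed to equal your LDAP username).
ferranti_user_prefix() {
    local group="$1"
    local user="${2:-$(id -un)}"
    printf 's3://%s0/%s/\n' "$group" "$user"
}

ferranti_user_prefix mladm mfa624   # prints s3://mladm0/mfa624/
```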
How to access Ferranti Ceph S3 Buckets:
1. Generating access credentials for Ferranti Ceph S3 buckets
This section describes how to generate Ceph S3 access credentials for the Ferranti cluster.
On ferranti-login001 or ferranti-login002, first generate a token by running the following script. It will ask for your LDAP username and LDAP password (the credentials you use to access Nextcloud). If you have forgotten your password, you can reset it on Nextcloud.
/usr/share/custom-scripts/generate_token.sh
NOTE: No characters appear on screen while you enter your password.
The script outputs your Access Key, which you will need in the next step. It is a long string like the following (a random example, not a valid key):
s9uk2E00L8Y6O741jDV9d61h73092LPk498Miq8AA1n2nX7142z24u1D376tk4346734d63Qvf36U23n50891w60P818ze98Du5116b38E94Z00M3rM1u5mpO0z0j64PC4aD5EOb87vgQTGb1v801181G3IeY2GM286r34s09349125Sjn3x85a=
NOTE: DO NOT SHARE THIS TOKEN WITH ANYONE, BECAUSE IT CONTAINS CLUSTER LOGIN INFORMATION.
How to Interact with Ceph buckets:
There are multiple CLI tools for interacting with S3 buckets; two of the most popular are s3cmd and awscli.
How to use s3cmd
- Generate the ~/.s3cfg configuration file by running the following command and answering its prompts:
s3cmd --configure
Enter new values or accept defaults in brackets with Enter.
Refer to user manual for detailed description of all options.
Access key and Secret key are your identifiers for Amazon S3. Leave them empty for using the env variables.
Access Key: the token you generated in the previous step
Secret Key: enter the word secret
Default Region [US]: press Enter
S3 Endpoint [s3.amazonaws.com]: ferranti-s3.mlcloud.uni-tuebingen.de
DNS-style bucket+hostname:port template for accessing a bucket [%(bucket)s.s3.amazonaws.com]: ferranti-s3.mlcloud.uni-tuebingen.de
Encryption password is used to protect your files from reading by unauthorized persons while in transfer to S3
Encryption password: press Enter
Path to GPG program [/usr/bin/gpg]: press Enter
Use HTTPS protocol [Yes]: press Enter
HTTP Proxy server name: press Enter
Test access with supplied credentials?: y
Save settings?: y
- Verify that your access works by running:
s3cmd ls
If this command does not print an error, your configuration is valid.
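If you prefer to skip the interactive prompts, the same answers can be written to ~/.s3cfg directly. The following is a minimal sketch. It assumes s3cmd falls back to its defaults for any option not listed, and write_s3cfg is a hypothetical helper rather than a provided tool. Because the file contains your token, it is created readable only by you.

```shell
# Write a minimal s3cmd configuration that mirrors the interactive answers.
# Usage: write_s3cfg <token> [config-path]   (path defaults to ~/.s3cfg)
write_s3cfg() {
    local token="$1"
    local cfg="${2:-$HOME/.s3cfg}"
    # umask 077 in a subshell: a newly created file gets owner-only permissions.
    ( umask 077; cat > "$cfg" ) <<EOF
[default]
access_key = $token
secret_key = secret
host_base = ferranti-s3.mlcloud.uni-tuebingen.de
host_bucket = ferranti-s3.mlcloud.uni-tuebingen.de
use_https = True
EOF
    # Also tighten permissions if the file already existed.
    chmod 600 "$cfg"
}

# Example (FERRANTI_TOKEN is a hypothetical variable holding your token):
# write_s3cfg "$FERRANTI_TOKEN" && s3cmd ls
```

If you write the file somewhere other than ~/.s3cfg, point s3cmd at it with its -c option.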
The following is a list of useful commands for interacting with S3 buckets.
s3cmd examples:
- List the contents of a bucket:
s3cmd ls s3://<bucket>
- Upload a file to a bucket:
s3cmd put /weka/group/user/local/file/path/filename.txt s3://<bucket>/<user>/
- Download a file from a bucket to Ferranti Weka storage:
s3cmd get s3://<bucket>/<user>/<file-name> /weka/group/user/local/file/path/
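To upload a whole directory rather than a single file, s3cmd's sync command transfers only files that are missing or changed at the destination. A minimal sketch follows. The function name ferranti_sync_up is hypothetical, and the sketch assumes you pass your group's bucket and your username explicitly, following the <bucket>/<user>/ layout described above.

```shell
# Upload a local directory into your prefix of the group bucket, skipping
# files that are already up to date at the destination.
# Usage: ferranti_sync_up <local-dir> <bucket> <user>
ferranti_sync_up() {
    local dir="${1%/}" bucket="$2" user="$3"
    # Trailing slash on the source: sync the directory's contents into
    # s3://<bucket>/<user>/<dir-name>/ instead of nesting it one level deeper.
    s3cmd sync "$dir/" "s3://$bucket/$user/$(basename "$dir")/"
}

# Example: ferranti_sync_up /weka/group/user/results mladm0 mfa624
```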
How to use awscli
- Create the aws config and credentials files by running:
aws --profile <profile-name> configure
You can give your profile any name; for example, name it after your LDAP username. For an LDAP user named mfa624, the profile would be named mfa624.
Run the command and answer the following prompts:
aws --profile mfa624 configure
AWS Access Key ID [None]: the token you generated in the previous step
AWS Secret Access Key [None]: "" (yes, enter only two double quotation marks)
Default region name [None]: us-east-1
Default output format [None]: json
- Verify that your access works (replace mfa624 with your profile name):
aws --profile mfa624 --endpoint-url https://ferranti-s3.mlcloud.uni-tuebingen.de s3 ls
If this command does not print an error, your configuration is valid.
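The same profile can also be set up without the interactive prompts using aws configure set, which writes individual values into the config and credentials files and leaves other profiles untouched. This is a minimal sketch, and ferranti_aws_profile is a hypothetical helper. It assumes that setting the secret to the literal two-character value "" behaves the same as typing "" at the interactive prompt.

```shell
# Non-interactive equivalent of the `aws --profile <name> configure` answers.
# Usage: ferranti_aws_profile <profile-name> <token>
ferranti_aws_profile() {
    local profile="$1" token="$2"
    aws configure set aws_access_key_id "$token" --profile "$profile"
    # Mirrors typing "" (two double quotation marks) at the interactive prompt.
    aws configure set aws_secret_access_key '""' --profile "$profile"
    aws configure set region us-east-1 --profile "$profile"
    aws configure set output json --profile "$profile"
}

# Example (FERRANTI_TOKEN is a hypothetical variable holding your token):
# ferranti_aws_profile mfa624 "$FERRANTI_TOKEN"
```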
aws examples:
- List the contents of a bucket:
aws --profile <profile-name> --endpoint-url https://ferranti-s3.mlcloud.uni-tuebingen.de s3 ls s3://<bucket>
- Upload a file to a bucket:
aws --profile <profile-name> --endpoint-url https://ferranti-s3.mlcloud.uni-tuebingen.de s3 cp /weka/group/user/local/file/path/filename.txt s3://<bucket>/<user>/
- Download a file from a bucket to Ferranti Weka storage:
aws --profile <profile-name> --endpoint-url https://ferranti-s3.mlcloud.uni-tuebingen.de s3 cp s3://<bucket>/<user>/<file-name> /weka/group/user/local/file/path/
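Every aws command above repeats the same --profile and --endpoint-url options, so a small shell function can supply them. This is only a sketch: ferranti_s3 and the FERRANTI_PROFILE variable are hypothetical names, not part of awscli.

```shell
# Run an `aws s3` subcommand against the Ferranti Ceph endpoint.
# Usage: ferranti_s3 <aws s3 subcommand and arguments...>
# FERRANTI_PROFILE (hypothetical) must hold your aws profile name.
ferranti_s3() {
    aws --profile "${FERRANTI_PROFILE:?set FERRANTI_PROFILE to your aws profile name}" \
        --endpoint-url https://ferranti-s3.mlcloud.uni-tuebingen.de \
        s3 "$@"
}

# Examples (replace the placeholders as in the commands above):
# FERRANTI_PROFILE=mfa624
# ferranti_s3 ls s3://<bucket>
# ferranti_s3 cp /weka/group/user/local/file/path/filename.txt s3://<bucket>/<user>/
```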
Created: June 21, 2024