Archival Storage in the ML Cloud

Ceph S3 is accessible now!

ML Cloud now has data backup and cold storage available in both Region 1 and Region 2. This data volume uses the Ceph SDS and is accessed via S3 protocols.
The following describes what is available and how to use it.

There are two separate S3 zones, Region 1 and Region 2, that do not replicate data (by default, only bucket names are replicated).
If you need replication of your data, please open a ticket.

Access to the Ceph backup drives is through the S3 protocol. S3 access is via OpenStack EC2 credentials only.
You have to create EC2 credentials with the OpenStack CLI (but first check whether you already have some - this is explained below):

source ./openrc
openstack ec2 credentials create
| Access                           | Secret                           | Project ID                       | User ID                          |
|----------------------------------|----------------------------------|----------------------------------|----------------------------------|
| c3v463ddvg53jbi6zw7ltf0nt70o04ck | hk47szblvm6ods4741gx1wcryng0r02d | vllt6e3mhc15d6bqy7j99j23iumeeda6 | 363wgaw6wtzbel628i7wlt81m7qorxlt |

# or if you are member of multiple projects:
openstack ec2 credentials create --project the_project_name

To run OpenStack CLI commands, you can install OpenStackClient on your laptop or use the login nodes.
You can create as many EC2 credentials as you need, but the dashboard only shows the last one. To list them all, run the following:

openstack ec2 credentials list
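If you want to pick credentials out of the listing programmatically, the OpenStack CLI can emit JSON with `-f json`. A small sketch, assuming output shaped like the sample table above (the keys `Access`/`Secret` and the values here are illustrative, taken from this page's example, not real credentials):

```python
import json

# Example output of `openstack ec2 credentials list -f json`,
# using the sample (non-real) credentials shown on this page.
sample = '''
[
  {
    "Access": "c3v463ddvg53jbi6zw7ltf0nt70o04ck",
    "Secret": "hk47szblvm6ods4741gx1wcryng0r02d",
    "Project ID": "vllt6e3mhc15d6bqy7j99j23iumeeda6",
    "User ID": "363wgaw6wtzbel628i7wlt81m7qorxlt"
  }
]
'''

creds = json.loads(sample)
# Take the first credential pair; with several credentials you would
# pick the one you want by Project ID.
access, secret = creds[0]["Access"], creds[0]["Secret"]
print(access, secret)
```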

As mentioned before, if you already have a credential, you can retrieve it (and even renew it) from the dashboard:
go to the OpenStack dashboard and sign in with your main ML Cloud account and the ldap domain:


Upon signing in, you will find the three pieces of information you need: your OpenStack S3 credentials - your access key and your secret key - as well as your project name. Note these down for later:



Access S3

Examples here use the s5cmd client. You can also use the aws client.

First, you need to create your credentials file.
To do this, run the following commands on the login nodes, substituting your access key and secret key from the OpenStack dashboard you accessed before:

 mkdir -p $HOME/.aws/
 echo "[mlcloud]
 aws_access_key_id=c3v463ddvg53jbi6zw7ltf0nt70o04ck
 aws_secret_access_key=hk47szblvm6ods4741gx1wcryng0r02d" >> $HOME/.aws/credentials
 chmod 600 $HOME/.aws/credentials

This will create your credentials file in your home directory.
If you are using Linux on your personal computer, you can use the same procedure to access S3 from outside the cluster.
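If you prefer not to hand-edit the file, the same credentials file can be written with Python's `configparser` (the AWS credentials file is plain INI). This is a sketch, not an official tool; it uses the example keys from above and, for the demo, writes to a temporary directory instead of `$HOME/.aws`:

```python
import configparser
import os
import stat
import tempfile

def write_credentials(path, profile, access_key, secret_key):
    """Write (or update) an AWS-style credentials file with one profile."""
    config = configparser.ConfigParser()
    if os.path.exists(path):
        config.read(path)  # keep any existing profiles
    config[profile] = {
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
    }
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        config.write(f)
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)  # 600, as in the chmod above

# Demo against a temporary directory; point this at $HOME/.aws/credentials
# for real use.
demo_path = os.path.join(tempfile.mkdtemp(), "credentials")
write_credentials(demo_path, "mlcloud",
                  "c3v463ddvg53jbi6zw7ltf0nt70o04ck",
                  "hk47szblvm6ods4741gx1wcryng0r02d")
print(open(demo_path).read())
```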

Depending on where you are accessing S3 from, choose the corresponding endpoint:

Inside the cluster:

Outside the cluster:

In S3, you work with buckets, analogous to a drive or other logical volume. There is only one bucket available to you:

  • Your group's shared bucket, which you access at s3://$GROUP_NAME, where $GROUP_NAME is the string before the - in the project name above; for example, aiproj0 in aiproj0-aaa000.
    Every member of the group can access this bucket (list its contents or download an object), but each member can only upload to s3://$GROUP_NAME/$THEIR_USER_NAME/, which in our example is aaa000 - the second part of the project name.
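The bucket name and your upload prefix can be derived from the project name with plain shell parameter expansion. A minimal sketch using the example project from above:

```shell
# Derive the group bucket and the per-user upload prefix
# from a project name of the form <group>-<user>.
PROJECT_NAME="aiproj0-aaa000"      # example project from above
GROUP_NAME="${PROJECT_NAME%%-*}"   # part before the first "-": aiproj0
USER_NAME="${PROJECT_NAME#*-}"     # part after the first "-":  aaa000
echo "bucket:        s3://${GROUP_NAME}"
echo "upload prefix: s3://${GROUP_NAME}/${USER_NAME}/"
```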

Now, assuming you are using a node in Region 1 (note the endpoint), after you've set up your credentials, you can run the following example commands (using our example project aiproj0-aaa000):


If it's your first time accessing S3, run an initial request against the Region 1 endpoint so that the user-creation process starts automatically for you.

You can move files to and from the group's shared bucket like this - if you have a file called myfile.tar and you want to copy it into my_directory in the group's shared bucket:

  • s5cmd --endpoint-url --profile mlcloud cp myfile.tar s3://aiproj0/aaa000/my_directory/

  • s5cmd --endpoint-url --profile mlcloud cp s3://aiproj0/aaa000/my_directory/myfile.tar ./

You can also see which files are in the group's shared bucket. Note the trailing / when querying a directory; if there are no files in a directory or bucket when you ls it, it will return ERROR: [...]: no object found:

  • s5cmd --endpoint-url --profile mlcloud ls s3://aiproj0

  • s5cmd --endpoint-url --profile mlcloud ls s3://aiproj0/aaa000/my_directory/

Please remember that the group's bucket is shared, and every member is allowed to read its contents.

External access to your S3 buckets works the same way as internal access; just substitute the "outside the cluster" URL above and ensure you have both the s5cmd command and the $HOME/.aws/credentials file set up properly. (We assume you can handle that on your own, should you choose this path.)

Details of your S3 buckets: Quota, etc.

  • The quota of the shared group bucket in each region is 5 TB of data and/or 500K objects. To avoid hitting the number-of-objects quota, please upload objects in a tar archive or equivalent where possible.

  • Versioning is disabled by default on the shared group bucket; to enable versioning, please file a ticket.
    Note that each change to an object creates a new version, which counts against the 5 TB data and 500K object limits. Furthermore, the admins of each group have full control over the shared group bucket's policies and can adjust them as they wish.

  • Publishing objects from the shared group bucket publicly is restricted; to do so, please file a ticket.

  • With the default policies/permissions, all members of your group have access to all objects in the shared group bucket. If the policies/permissions need to be customized, please file a ticket.
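To stay well under the object-count quota, many small files can be bundled into a single tar archive before uploading. A minimal sketch with Python's standard tarfile module (the file and directory names are illustrative):

```python
import os
import tarfile
import tempfile

# Create a few small example files to bundle (illustrative only).
src_dir = tempfile.mkdtemp()
for i in range(3):
    with open(os.path.join(src_dir, f"sample_{i}.txt"), "w") as f:
        f.write(f"file {i}\n")

# Bundle the whole directory into one gzipped tar archive:
# one object to upload instead of many, which is friendlier
# to the 500K-object quota.
archive = os.path.join(tempfile.mkdtemp(), "myfiles.tar.gz")
with tarfile.open(archive, "w:gz") as tar:
    tar.add(src_dir, arcname="myfiles")

# Inspect what ended up in the archive.
with tarfile.open(archive, "r:gz") as tar:
    names = tar.getnames()
print(names)
```

You would then upload the single `myfiles.tar.gz` with s5cmd as shown above.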

Other useful tips

Should you need other commands on the cluster (e.g. s3cmd, mc), you can install them in your personal python-venv or conda environment.

Sometimes commands like aws need --region tuebingen, e.g. "mb - make bucket":

  • aws --endpoint-url --profile mlcloud s3 mb s3://my_personal_bucket --region tuebingen

You can also mount a bucket as a volume, replacing $BUCKET_NAME with the bucket you want to access:

s3fs -o url= -o profile=mlcloud -o use_path_request_style $BUCKET_NAME ./my-mount-point

# and unmount it via
fusermount -u ./my-mount-point