Archival Storage in the ML Cloud
Ceph S3 is accessible now!
ML Cloud now has data backup and cold storage available in both Region 1 and Region 2. This data volume uses Ceph software-defined storage (SDS) and is accessed via the S3 protocol.
The following describes what is available and how to use it.
There are two separate S3 zones, Region 1 and Region 2, that do not replicate data (by default, only bucket names are replicated).
If you need replication of your data, please open a ticket.
Access to the Ceph backup drives is through the S3 protocol. S3 access is via OpenStack EC2 credentials only.
You have to create EC2 credentials with the OpenStack CLI (but first check whether you already have some; this is explained below):
source ./openrc
openstack ec2 credentials create
+----------------------------------+----------------------------------+----------------------------------+----------------------------------+
| Access | Secret | Project ID | User ID |
+----------------------------------+----------------------------------+----------------------------------+----------------------------------+
| c3v463ddvg53jbi6zw7ltf0nt70o04ck | hk47szblvm6ods4741gx1wcryng0r02d | vllt6e3mhc15d6bqy7j99j23iumeeda6 | 363wgaw6wtzbel628i7wlt81m7qorxlt |
+----------------------------------+----------------------------------+----------------------------------+----------------------------------+
# or if you are a member of multiple projects:
openstack ec2 credentials create --project the_project_name
To run OpenStack CLI commands, you can install OpenStackClient on your laptop or use the login nodes.
You can create as many EC2 credentials as you need, but the dashboard shows only the most recent one. To list all of them, run the following:
openstack ec2 credentials list
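If you want to use the keys in a script, the OpenStack CLI can print just the values instead of the full table; a minimal sketch, assuming the standard output-formatter options and the column names shown in the table above:
# print only the access and secret key values
openstack ec2 credentials list -f value -c Access -c Secret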
As mentioned before, if you already have a credential, you can simply retrieve it (and even renew it) from the dashboard:
Go to the OpenStack dashboard and sign in with your main ML Cloud account and the LDAP domain:
Upon signing in, you will find the three pieces of information you will need: your OpenStack S3 credentials - your access key and your secret key - as well as your project name. Note these down for later:
Access S3
The examples here use the s5cmd client. You can also use the aws client.
First, you need to create your credentials file.
To do this, run the following commands on the login nodes, substituting your access key and secret key from the OpenStack dashboard you accessed before:
mkdir -p $HOME/.aws/
echo "[mlcloud]
aws_access_key_id=c3v463ddvg53jbi6zw7ltf0nt70o04ck
aws_secret_access_key=hk47szblvm6ods4741gx1wcryng0r02d" >> $HOME/.aws/credentials
chmod 600 $HOME/.aws/credentials
This will create your credentials file in your home directory.
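If you also have the aws CLI installed, you can optionally sanity-check that the new profile is picked up (not required for s5cmd):
# optional check that the mlcloud profile resolves (requires the aws CLI)
aws configure list --profile mlcloud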
If you are using Linux on your personal computer, you can also use the same procedure to access the S3 from outside the cluster.
Depending on where you are accessing S3 from, choose the corresponding endpoint:
Inside the cluster:
- Region 1 http://192.168.213.10:7480
- Region 2 http://192.168.211.10:7480
Outside the cluster:
- Region 1 https://mlcloud.uni-tuebingen.de:7443
- Region 2 https://mlcloud2.uni-tuebingen.de:7443
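To avoid repeating the endpoint in every command, you can keep it in a shell variable; this is just a convenience pattern, and the variable name here is our own:
# pick the endpoint once and reuse it in later commands
export S3_ENDPOINT=http://192.168.213.10:7480   # Region 1, inside the cluster
s5cmd --endpoint-url $S3_ENDPOINT --profile mlcloud ls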
In S3, you work with buckets, analogous to a drive or other logical volume. There is only one bucket available to you:
- Your group's shared bucket, which you access at s3://$GROUP_NAME, where $GROUP_NAME is the string before the - in the project name above; for example, aiproj0 in aiproj0-aaa000.
Every member of the group can access this bucket (list the contents or download an object). But each member can only upload to s3://$GROUP_NAME/$THEIR_USER_NAME/, which in our example is aaa000 - the second part of the project name.
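If you want to derive these two names in a script, standard shell parameter expansion does the split; a small sketch, substituting your real project name:
# example: split a project name like aiproj0-aaa000 into bucket and upload prefix
PROJECT=aiproj0-aaa000          # substitute your own project name
GROUP_NAME=${PROJECT%%-*}       # part before the '-'  -> aiproj0 (the bucket)
USER_NAME=${PROJECT#*-}         # part after the '-'   -> aaa000 (your upload prefix)
echo "upload to: s3://$GROUP_NAME/$USER_NAME/"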
Now, assuming you are using a node in Region 1 (note the endpoint), after you've set up your credentials, you can run the following example commands (using our example project aiproj0-aaa000):
Note
If it's your first time accessing S3, run an initial request against the Region 1 endpoint so that the user-creation process is started automatically for you.
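Such an initial request can be as simple as listing your buckets:
# first-time request against the Region 1 endpoint (triggers user creation)
s5cmd --endpoint-url http://192.168.213.10:7480 --profile mlcloud ls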
You can move files to and from the group's shared bucket like this - if you have a file called myfile.tar and you want to copy it into my_directory in the group's shared bucket:
- s5cmd --endpoint-url http://192.168.213.10:7480 --profile mlcloud cp myfile.tar s3://aiproj0/aaa000/my_directory/
- s5cmd --endpoint-url http://192.168.213.10:7480 --profile mlcloud cp s3://aiproj0/aaa000/my_directory/myfile.tar ./
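s5cmd also has a sync command for transferring whole directory trees; a minimal sketch, reusing the example project and a hypothetical local directory (check s5cmd --help for the exact semantics of your version):
# upload a whole local directory (sync only transfers missing/changed files)
s5cmd --endpoint-url http://192.168.213.10:7480 --profile mlcloud sync ./my_directory/ s3://aiproj0/aaa000/my_directory/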
You can also see what files are in the group's shared bucket - note the trailing / when querying a directory, and, if there are no files in a directory or bucket when you ls it, it will return ERROR: [...]: no object found:
- s5cmd --endpoint-url http://192.168.213.10:7480 --profile mlcloud ls s3://aiproj0
- s5cmd --endpoint-url http://192.168.213.10:7480 --profile mlcloud ls s3://aiproj0/aaa000/my_directory/
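s5cmd also accepts wildcards when listing, which can be handy for searching below a prefix; quote the pattern so your shell does not expand it:
# list everything under your prefix, recursively, via a wildcard
s5cmd --endpoint-url http://192.168.213.10:7480 --profile mlcloud ls "s3://aiproj0/aaa000/*"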
Please remember that the group's shared bucket is shared: every member is allowed to read its contents.
External access to your S3 buckets works the same way as internal access; just substitute the "outside the cluster" URL above and ensure you have both the s5cmd command and the $HOME/.aws/credentials file set up properly. (We assume you can handle that on your own, should you choose this path.)
Details of your S3 buckets: Quota, etc.
- The quota of the shared group bucket in each region is 5 TB of data and/or 500K objects. To avoid hitting the number-of-objects quota, please upload objects in a tar archive or equivalent where possible (see the sketch after this list).
- Versioning is disabled by default on the shared group bucket; to enable versioning, please file a ticket. Note that each change to an object creates a new version, which counts against the 5 TB data and 500K object limits. Furthermore, the admins of each group have full control over the shared group bucket's policies and can adjust them as they wish.
- Publishing the shared group bucket's objects publicly is restricted; to do so, please file a ticket.
- With the default policies/permissions, all members of your group have access to all objects in the shared group bucket. If the policies/permissions need to be customized, please file a ticket.
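For the tar-archive advice above, a minimal sketch (the file and directory names are examples): bundle small files locally, upload the single archive, and use s5cmd's du command to keep an eye on your usage:
# bundle many small files into a single object to stay under the 500K-object quota
tar -czf my_dataset.tar.gz ./my_local_dir/
s5cmd --endpoint-url http://192.168.213.10:7480 --profile mlcloud cp my_dataset.tar.gz s3://aiproj0/aaa000/
# check current usage against the quota (-H prints human-readable sizes)
s5cmd --endpoint-url http://192.168.213.10:7480 --profile mlcloud du -H "s3://aiproj0/*"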
Other useful tips
Should you need other commands on the cluster (e.g. s3cmd, mc), you can install them in your personal python-venv or conda environment.
Sometimes commands like aws need --region tuebingen, e.g. "mb - make bucket":
aws --endpoint-url https://mlcloud.uni-tuebingen.de:7443 --profile mlcloud s3 mb s3://my_personal_bucket --region tuebingen
You can also mount a bucket as a volume, replacing $BUCKET_NAME with the bucket you want to access:
mkdir -p ./my-mount-point   # the mount point must exist before mounting
s3fs -o url=http://192.168.213.10:7480 -o profile=mlcloud -o use_path_request_style $BUCKET_NAME ./my-mount-point
# and unmount it via
fusermount -u ./my-mount-point
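Once mounted, the bucket behaves like a regular directory, so standard file tools work on it; a short usage sketch (remember you can only write under your own user prefix):
# standard file operations work on the mounted bucket
ls ./my-mount-point
cp myfile.tar ./my-mount-point/aaa000/my_directory/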