User Guides
Access the Cluster

Get started by creating an account, logging in and using the system.

System Architecture

Learn about the ML Cloud infrastructure.

Containers and the ML Cloud
Learn how to use containers within the ML Cloud.
Maintenance Schedule
Issue: OpenStack Maintenance - Volumes, Snapshots, Backups Affected

Date: 25 August 2023 to 30 August 2023
Description: We are switching the OpenStack volume backend to Ceph in Region R1. During this period there will be downtime for volumes, snapshots, and backups, as well as for VMs whose attached volumes have not been detached. The switch is part of the preparation to retire QB within the coming months.

Issue: R1 External Connectivity down (ZDV)

Date: 05 September 2023
Duration: 7AM to 8AM

Description: Due to a ZDV networking maintenance window, the following services will be inaccessible: Slurm R1, OpenStack, GitLab, Nextcloud, Portal, and the ticketing system.

Training and Events
ML Cloud Town Hall

Date: TBD
Description: Learn about infrastructure expansion, experimental hardware...

Profiling and Energy
Date: TBD
Description: Learn about profiling your code for optimization.
Date: TBD
Description: Learn about multi-node training and optimization on H100 GPUs.