User Guides
Access the Cluster

Get started by creating an account, logging in and using the system.

System Architecture

Learn about the ML Cloud infrastructure.

Containers and the ML Cloud

Learn how to use containers within the ML Cloud

Maintenance Schedule
Issue: ZDV switch replacement temporarily affects external connectivity

Date: 16 May 2024 13:30
Description: ZDV needs to change switches in TTR2, which requires a temporary loss of external connectivity for the ML Cloud while they move the cables. The outage should be relatively brief.

Change: QB in R1 will be retired and replaced with >1PB of NVMe storage

Date: Easter holiday 2024
Duration: Easter holiday (details to come)

Description: QB was an adventure. Now it is time to retire it and bring in our new NVMe-based storage with over 1PB of available capacity. Delivery is expected in February. We will install and test it throughout March, and during the Easter holidays we will retire QB, move all data to the new storage, and make the system available to the entire user base.

Training and Events
ML Cloud Town Hall

Date: October 30th, 2024, 13:00-14:00
Description: Learn about the ML Cloud infrastructure, expansion plans, and software services. Bring any questions you have about the ML Cloud.

Profiling and Energy
Date: TBD
Description: Learn about profiling your code for optimization.
TBD
Date: TBD
Description: Multi-node training and optimization with H100 GPUs.