Overview
Welcome to the ML Cloud User Guide.
This User Guide covers account management, our clusters, and software services, as well as a variety of tutorials for working with the ML Cloud systems.
ML Cloud Quick Info
Getting an Account
Ask your PI or Group Manager to submit an account request for you through our system.
Community Rules
The Community Rules ensure the stability and fairness of the system. Please read them carefully!
Maintenance Calendar
The ML Cloud schedules maintenance windows during periods of low system utilization.
HPC Resource Guides
Each of our clusters has an entry in this User Guide with detailed instructions and information. Learn below how to access and use our compute resources.
Maintenance and Downtime
The ML Cloud Team schedules maintenance in one of the following three ways:
- Rolling reboots: Whenever possible, the ML Cloud Team applies updates and performs other maintenance in a rolling fashion, so that ML Cloud services see little or no impact.
- Partial outages: The ML Cloud Team performs these as needed, affecting only some ML Cloud services at a time.
- Full outages: These affect ML Cloud services, depending on the system involved, for example outages or maintenance of core networking services, data storage services, or data center power or cooling systems.
In the case of a planned downtime, a reservation is created in the queuing system so that new jobs that would not finish before the downtime begins will not start. A notification will be posted on the Portal System Status Page for the affected cluster, and a notification will also be sent to the mailing list in advance. We apologize for any inconvenience this may cause.
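The reservation rule above can be illustrated with a minimal sketch. It assumes the queuing system compares a job's requested wall-clock limit (not its actual runtime) against the start of the maintenance window; the function `can_start_before_downtime` and the dates used are hypothetical and are not part of the ML Cloud scheduler.

```python
from datetime import datetime, timedelta


def can_start_before_downtime(now: datetime, walltime: timedelta, downtime_start: datetime) -> bool:
    """Return True if a job started at `now` would finish before `downtime_start`.

    Hypothetical helper: assumes the scheduler uses the job's requested
    wall-clock limit, not how long the job actually runs.
    """
    return now + walltime <= downtime_start


if __name__ == "__main__":
    downtime = datetime(2024, 10, 1, 8, 0)  # hypothetical maintenance start
    now = datetime(2024, 9, 30, 20, 0)      # 12 hours before the downtime
    # A 6-hour job ends at 02:00 and may start; a 24-hour job must wait.
    print(can_start_before_downtime(now, timedelta(hours=6), downtime))
    print(can_start_before_downtime(now, timedelta(hours=24), downtime))
```

Under this assumption, a job that requests a shorter time limit may still be able to start shortly before a maintenance window, while longer jobs stay pending until the downtime is over.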
AI Conference Deadlines
When planning maintenance and downtime, we take the deadlines of the most common AI conferences into consideration and try to schedule maintenance around them. You can view the current list of conferences we track, and submit new ones, through this form.
Created: September 9, 2024