Overview

Welcome to the ML Cloud's User Guide.

Our User Guide covers account management, our clusters, and software services, and includes a variety of tutorials for working with the ML Cloud systems.

ML Cloud Quick Info

Getting an Account

Ask your PI or Group Manager to submit a request for an account for you through our system.

Good Conduct

Community Rules

The community rules ensure stability and fairness on the system. Please read them carefully!

Maintenance Windows

Maintenance Calendar

The ML Cloud schedules maintenance windows during periods of low system utilization.

HPC Resource Guides

Each of our clusters has an entry in this user guide with extensive instructions and information. Learn below how to access and use our compute resources.

The Galvani Cluster

Galvani Cluster

Access and use the Galvani cluster with 2080 Ti and A100 GPU compute nodes.

The Ferranti Cluster

Ferranti Cluster

Access and use the Ferranti cluster with H100 GPU compute nodes.

Maintenance and Downtime

The ML Cloud Team schedules maintenance in one of the following three ways:

  1. Rolling reboots: Whenever possible, the ML Cloud Team will apply updates and do other maintenance in a rolling fashion, so that ML Cloud services see little or no impact.
  2. Partial outages: The ML Cloud Team will do these as needed, in a manner that impacts only some ML Cloud services at a time.
  3. Full outages: These affect ML Cloud services, depending on the system involved. Examples are outages of core networking services or data storage services, and data center power or cooling system maintenance or outages.

For a planned downtime, a reservation will be made in the queuing system so that new jobs that would not finish before the downtime begins will not start. A notification will be posted on the Portal System Status Page for the respective cluster, and a mailing list notification will be sent in advance. We apologize for any inconvenience this may cause.
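The reservation rule can be illustrated with a small sketch. It is not the queuing system's actual logic, only the basic idea that a job may start if its requested time limit ends before the downtime begins. The function name, dates, and time limits below are hypothetical.

```python
from datetime import datetime, timedelta


def can_start_before_downtime(now, time_limit, downtime_start):
    """Return True if a job started at `now` with the requested
    `time_limit` would end no later than `downtime_start`."""
    return now + time_limit <= downtime_start


# Hypothetical values, chosen only for illustration.
downtime_start = datetime(2024, 11, 4, 8, 0)
now = datetime(2024, 11, 3, 20, 0)

short_job = timedelta(hours=6)   # would end at 02:00, before the downtime
long_job = timedelta(hours=24)   # would still be running at 08:00

print(can_start_before_downtime(now, short_job, downtime_start))  # True
print(can_start_before_downtime(now, long_job, downtime_start))   # False
```

Under this assumption, a job that requests a time limit short enough to end before the maintenance window can still start, while a job that requests more time waits until the downtime is over.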

AI Conference Deadlines

When planning maintenance and downtime, we take the most common AI conference deadlines into account and try to schedule maintenance around them. You can see the current list of conferences we track, and submit new ones, in this form.


Last update: October 19, 2024
Created: October 19, 2024