Accounting

To ensure that all research groups receive equitable access to the cluster, and to account for variations in the hardware being used, we leverage Slurm's integrated job accounting and fairshare system. On the ML Cloud, each user is associated with a primary group, which in turn belongs to a primary organization. In Slurm terms, the group is called an Account. Users belong to Accounts, and Accounts are granted Shares. These Shares determine how much of the cluster that group has been granted.

Shares granted to an Account come in three types, which are summed together:

  • the Common Pool (5% of the investment from the institutions);
  • the Institutional Investment (via the primary group association) made towards the ML Cloud;
  • investments made by individual groups that financially contribute to the ML Cloud.

Thus the total Share an Account has is simply the sum of these three types. This Share is global across all clusters of the ML Cloud.

What is charged/accounted for?

Currently, we charge for:

  • the fraction of each compute node used, based on CPU, GPU, and memory usage via Slurm's Trackable RESources (TRES); these TRES chargebacks vary from partition to partition;
  • any VMs (through OpenStack) run by a user;
  • every TB of storage used by the group.
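
Under the hood, Slurm folds CPU, memory, and GPU usage into a single "billing" TRES via per-partition weights. The fragment below is only an illustrative sketch of how such a slurm.conf might look; the partition names and weight values are hypothetical, not the actual ML Cloud configuration.

```
# Illustrative slurm.conf fragment -- actual ML Cloud weights may differ.
# Each partition combines CPU, memory, and GPU usage into one billing value.
AccountingStorageTRES=gres/gpu
PartitionName=gpu-a100 TRESBillingWeights="CPU=1.0,Mem=0.25G,GRES/gpu=8.0"
PartitionName=cpu-only TRESBillingWeights="CPU=1.0,Mem=0.25G"
```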

Resource Prices

Your group is charged per TB of storage used, on a monthly basis. The first 1 TB of NVMe storage is free of charge for each group.

Storage    €/TB/month
CEPH       €2.00
Weka       €80.00
Lustre     €44.00
QB         €12.00
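
For example, a group's monthly storage bill can be estimated as below. This is a minimal sketch: it assumes Weka is the NVMe tier whose first TB is waived, which is an assumption, not something stated in the table.

```python
# Monthly storage prices in EUR per TB (from the table above).
PRICES = {"CEPH": 2.00, "Weka": 80.00, "Lustre": 44.00, "QB": 12.00}

# Assumption: "Weka" is the NVMe tier whose first TB is free of charge.
FREE_NVME_TB = 1.0

def monthly_storage_cost(usage_tb: dict) -> float:
    """Estimate a group's monthly storage charge in EUR."""
    total = 0.0
    for tier, tb in usage_tb.items():
        if tier == "Weka":
            tb = max(0.0, tb - FREE_NVME_TB)  # first NVMe TB is free
        total += PRICES[tier] * tb
    return total

# A group using 3 TB Weka and 10 TB Lustre:
cost = monthly_storage_cost({"Weka": 3.0, "Lustre": 10.0})
print(f"{cost:.2f} EUR/month")  # (3-1)*80 + 10*44 = 600.00
```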

For computing resources:

Type            Node Cost/h   GPU Cost/h
CPU             €1.57         -
2080ti Node     €2.84         €0.35
V100 (4 GPUs)   €3.90         €0.98
V100 (8 GPUs)   €7.57         €0.95
A100 (8 GPUs)   €8.51         €1.06
A100 (9 GPUs)   €11.26        €1.25
H100 (Intel)    €27.00        €3.37
H100 (Genoa)    €29.57        €3.70
CPU (Genoa)     €3.52         -
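
A rough back-of-the-envelope sketch of what a job charges against the group's share, using the prices above. Actual Slurm accounting works through TRES billing weights, not this simplified per-hour arithmetic.

```python
# Per-hour prices in EUR, taken from the table above (subset).
NODE_COST_H = {"A100 (8 GPUs)": 8.51, "H100 (Genoa)": 29.57}
GPU_COST_H = {"A100 (8 GPUs)": 1.06, "H100 (Genoa)": 3.70}

def job_cost(node_type: str, gpus: int, hours: float,
             whole_node: bool = False) -> float:
    """Rough cost: full node price, or per-GPU price times GPUs used."""
    if whole_node:
        return NODE_COST_H[node_type] * hours
    return GPU_COST_H[node_type] * gpus * hours

# 2 A100 GPUs for 10 hours:
print(f"{job_cost('A100 (8 GPUs)', gpus=2, hours=10):.2f} EUR")  # 21.20
```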

Fairshare Score

$ sshare --account=mladm -a 
Account                    User  RawShares  NormShares    RawUsage  EffectvUsage  FairShare 
-------------------- ---------- ---------- ----------- ----------- ------------- ---------- 
mladm                                70980    0.033815        3988      0.000002            
 mladm               jjmdermann          5    0.125000           0      0.000000   0.897638 
 mladm                   mfa118          5    0.125000           0      0.000000   0.889764 
 mladm                   mfa608          5    0.125000          27      0.006997   0.885827 
 mladm                   mfa624          5    0.125000           0      0.000000   0.897638 
 mladm                   mfa718          5    0.125000           0      0.000000   0.897638 
 mladm                   mfa887          5    0.125000           0      0.000000   0.887795 
 mladm                   mfa912          5    0.125000        3960      0.993003   0.883858 
 mladm                   mfa954          5    0.125000           0      0.000000   0.897638 

The first line of the sshare output shows the summary for the whole group, while the subsequent lines show the information for each user. The mladm group has been granted 70980 shares. From those raw shares, we subtract any storage and VM usage.

Important

VM usage is charged according to the above resource prices on an hourly basis, so keeping a VM with 2 GPUs running for over a month can reduce the group's fairshare significantly.
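
As a rough illustration of why long-running GPU VMs matter, assuming the A100 per-GPU price from the table above applies to VM GPUs as well:

```python
# A VM with 2 A100 GPUs, charged hourly, left running for 31 days.
GPU_PRICE_H = 1.06  # EUR per A100 GPU-hour, from the price table
gpus, days = 2, 31
cost = GPU_PRICE_H * gpus * days * 24
print(f"{cost:.2f} EUR")  # 2.12 EUR/h * 744 h ≈ 1577.28 EUR
```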

Each user of that group has a RawShares value of parent, which means that all users pull from the total Share of the Account rather than having their own individual sub-Shares of the Account's Share.

RawUsage is the amount of TRES-seconds the Account/User has used. This RawUsage is attenuated by the half-life set for the clusters, which is currently 3 days. Thus work done in the last 3 days counts at full cost, work done 6 days ago counts half, work done 9 days ago one fourth, and so on. RawUsage is therefore the aggregate of the Account's past usage with this half-life weighting factor.
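
The weighting schedule described above can be sketched as follows. Note that Slurm itself decays usage continuously; this sketch simply reproduces the discrete schedule given in the text (full cost within the half-life window, then halving every 3 days).

```python
HALFLIFE_DAYS = 3.0

def usage_weight(age_days: float) -> float:
    """Weight applied to past usage: full within the half-life window,
    then halving every 3 days (matches the schedule described above)."""
    return min(1.0, 0.5 ** ((age_days - HALFLIFE_DAYS) / HALFLIFE_DAYS))

# Work done now, 3, 6, and 9 days ago:
for age in (0, 3, 6, 9):
    print(age, usage_weight(age))  # weights 1.0, 1.0, 0.5, 0.25
```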

The next column after RawShares is NormShares. NormShares is simply the Account’s RawShares divided by the total number of RawShares given out to all Accounts on the cluster.

The next column is EffectvUsage: the Account's RawUsage divided by the total RawUsage across the clusters. Thus EffectvUsage is the fraction of the clusters the Account has actually used. In this case, the mladm Account has used 0.000002 of the cluster.

Finally, we have the Fairshare score. There are five basic regimes for this score:

  • 1.0: Unused. The Account has not run any jobs recently.

  • 1.0 > f > 0.5: Underutilization. The Account is underutilizing their granted Share. For example, when f=0.75 a group has recently underutilized their Share of the resources 1:2.

  • 0.5: Average utilization. The Account on average is using exactly as much as their granted Share.

  • 0.5 > f > 0: Over-utilization. The Account has overused their granted Share. For example, when f=0.25 a group has recently overutilized their Share of the resources 2:1.

  • 0: No share left. The Account has vastly overused their granted Share. If there is no contention for resources, the jobs will still start.
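
These regimes fall out of Slurm's classic fairshare formula, F = 2^(-EffectvUsage/NormShares). Treating this as the formula in use here is an assumption (the clusters may run a different priority algorithm such as Fair Tree), but it reproduces the anchor points above:

```python
def fairshare(effectv_usage: float, norm_shares: float) -> float:
    """Classic Slurm fairshare factor: F = 2^(-U/S)."""
    return 2.0 ** (-effectv_usage / norm_shares)

# Anchor points of the regimes described above:
print(fairshare(0.0, 0.1))  # 1.0  -- unused
print(fairshare(0.1, 0.1))  # 0.5  -- using exactly the granted share
print(fairshare(0.2, 0.1))  # 0.25 -- 2:1 over-utilization
```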

Since usage of the clusters varies, the scheduler does not stop Accounts from using more than their granted Share. Instead, it aims to fill idle cycles, so it will run whatever jobs it has available.

Restrictions

In order to provide equitable usage of the ML Cloud resources, we have instituted limits on the number of jobs each user can submit, accrue priority for, and run simultaneously. The restrictions are:

  • MaxJobs=40 (jobs running at once)
  • MaxJobsAccrue=40 (jobs accruing age priority)
  • MaxSubmit=100 (total jobs submitted, i.e. pending plus running)

Note

We will be tweaking these restrictions throughout the year to improve utilization of the cluster during high- and low-demand periods.


Last update: January 17, 2025
Created: June 21, 2024