Accounting

This page describes what fairshare is, how it is used within the ML Cloud to calculate job priority, and what the related terms mean.

Fairsharing and Job Accounting

Fairshare allows past resource utilization to be taken into account in job feasibility and priority decisions, to ensure a fair allocation of the computational resources among all ML Cloud users. Users who have not fully used their resource grant get higher priority for their jobs on the cluster, while groups that have used more than their resource grant are kept from overusing the cluster.

There are several existing fairshare algorithms, but at the ML Cloud we use Fair Tree, which is implemented with a rooted plane tree. Some observations include:
1. All users from a higher priority account receive a higher fairshare factor than all users from a lower priority account.
2. New jobs are immediately assigned a priority.
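
The first observation can be illustrated with a simplified sketch of the ranking idea behind Fair Tree. This is not Slurm's implementation: the `Node` class, the function names, and the example accounts and usage numbers are all hypothetical, tied siblings are not given equal ranks, and usage decay over time is not modeled. At each level of the tree, siblings are sorted by their share fraction divided by their usage fraction, the tree is walked depth-first in that order, and users are ranked in the order they are reached.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """An account (has children) or a user (no children). Hypothetical type."""
    name: str
    shares: int
    usage: float = 0.0  # set directly on users only
    children: list[Node] = field(default_factory=list)

    def total_usage(self) -> float:
        if not self.children:
            return self.usage
        return sum(child.total_usage() for child in self.children)


def level_fairshare(node: Node, siblings: list[Node]) -> float:
    """Share fraction divided by usage fraction, both relative to siblings."""
    share_frac = node.shares / sum(s.shares for s in siblings)
    total = sum(s.total_usage() for s in siblings)
    usage_frac = node.total_usage() / total if total else 0.0
    return float("inf") if usage_frac == 0 else share_frac / usage_frac


def rank_users(siblings: list[Node], ranked: list[str]) -> None:
    """Depth-first walk, visiting siblings from highest to lowest level fairshare."""
    ordered = sorted(siblings, key=lambda n: level_fairshare(n, siblings), reverse=True)
    for node in ordered:
        if node.children:
            rank_users(node.children, ranked)
        else:
            ranked.append(node.name)


def fair_tree_factors(accounts: list[Node]) -> dict[str, float]:
    """Best-ranked user gets 1.0; the factor drops by 1/N per rank."""
    ranked: list[str] = []
    rank_users(accounts, ranked)
    n = len(ranked)
    return {name: (n - i) / n for i, name in enumerate(ranked)}


# Two accounts with equal shares; lab_b has consumed far more than lab_a.
lab_a = Node("lab_a", shares=1, children=[
    Node("alice", shares=1, usage=100), Node("bob", shares=1, usage=0)])
lab_b = Node("lab_b", shares=1, children=[
    Node("carol", shares=1, usage=500), Node("dave", shares=1, usage=300)])

factors = fair_tree_factors([lab_a, lab_b])
print(factors)  # {'bob': 1.0, 'alice': 0.75, 'dave': 0.5, 'carol': 0.25}
```

In this example both users of lab_a rank above both users of lab_b, including alice, who accounts for all of lab_a's usage.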

Definitions

Shares: Integer value that represents the portion of the computing resources that has been promised to the account or user.
Usage: Integer value that represents how much of the computing resources the account or user has consumed.
Usage Unit: Integer value measured in TRES-seconds. TRES is the resources that the job requested, and seconds is the time, measured in seconds, that the job ran from start to finish, not counting time spent waiting for execution. We can call it real usage.
Priority: Integer value that ranges from 0 to 4294967295. The larger the value, the sooner the job will be scheduled for execution.
Fairshare: Floating point number between 0.0 and 1.0 that reflects the shares of a computing resource that a user has been allocated and the amount of computing resources the user’s jobs have consumed. For Fairshare calculation, the terms RawShares and Shares are equivalent, and RawUsage and Usage are also equivalent.
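
As a small illustration of the Usage Unit definition, the sketch below multiplies the requested TRES by the run time in seconds. The function name and the example numbers are hypothetical, and the way different resource types are combined into a single TRES value is not covered here.

```python
def usage_units(tres_requested: int, start: int, end: int) -> int:
    """TRES-seconds for one job.

    `start` and `end` are timestamps in seconds. Time spent waiting in the
    queue before `start` is not part of the result.
    """
    if end < start:
        raise ValueError("end must not be earlier than start")
    return tres_requested * (end - start)


# A job that requested 4 TRES, waited 600 s in the queue, then ran for 3600 s.
print(usage_units(tres_requested=4, start=600, end=4200))  # 14400
```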

To determine how much a job will cost your fairshare account, please see Job cost.

Your Fairshare Score Meaning

1.0: Unused. The account has not run any jobs recently.
1.0 > f > 0.5: Underutilization. The Group/User is underutilizing their granted Share. For instance, f=0.75 shows that the group has underutilized their Share of the resources 1:2.
0.5: Average utilization. On average a user/group is using exactly as much as their granted Share.
0.5 > f > 0: Over-utilization. The Group has overused their granted Share. For instance, f=0.25 shows that the group has overutilized their Share of the resources 2:1.
0: No share left. The Group has vastly overused their granted Share. If there is no contention for resources, the jobs will still start.
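
The ranges above can be written down as a small helper, for example when post-processing scores in a script. This is only a sketch: the function name and the returned labels are hypothetical and simply mirror the list above.

```python
def describe_fairshare(f: float) -> str:
    """Map a fairshare score in [0.0, 1.0] to the categories listed above."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("fairshare score must be between 0.0 and 1.0")
    if f == 1.0:
        return "unused"
    if f > 0.5:
        return "underutilization"
    if f == 0.5:
        return "average utilization"
    if f > 0.0:
        return "over-utilization"
    return "no share left"


for score in (1.0, 0.75, 0.5, 0.25, 0.0):
    print(score, describe_fairshare(score))
```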

Of course, the usage of the ML Cloud varies and the scheduler does not prevent groups/users from using more than their granted Share. The scheduler will want to fill idle cycles, so it will run whatever jobs have been submitted. In this case, the group/user essentially "borrows" compute resource time from the future to use now. This naturally drives down the user/group fairshare score, but still allows jobs to start. At some point another group with a higher fairshare score will start submitting jobs, and that group's jobs will have a higher priority because they have not used their granted Share.

Note

If there are two members of a given group, and one of those users has run many jobs under that group, the priority of a job submitted by the other user, who has not run any jobs, will also be negatively affected. This ensures that the combined usage charged to a group matches the portion of the cluster that is allocated to that group.

Shares

On the ML Cloud each user is associated by default with a group reflecting their direct supervisor within the institution. You may have other account associations, but a user is granted shares only through the main one. These Shares determine how much of the cluster that group has been granted. When users run a job, they are "charged" for their runs against the group they belong to.

Priority computation

The way the priority is computed for a job depends on a parameter called PriorityType. In the ML Cluster the priority type is multifactor. The priority then depends on five elements:
* Job age: how long the job has been waiting in the queue;
* User fairshare: a measure of past usage of the cluster by the user;
* Job size: the number of CPUs a job requests;
* Partition: the partition to which a job is submitted, specified with the --partition submission parameter;
* QOS: a quality of service associated with the job, specified with the --qos submission parameter.

Note that the job age parameter is bounded so that priority stops increasing when the bound is attained. The job size parameter can be configured to favor small or large jobs.
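
With the multifactor priority type, the job priority is a weighted sum of these factors, each expressed as a number between 0.0 and 1.0. The sketch below shows that sum for the five elements listed above. The weights and job values are made up for illustration and are not the ML Cloud configuration, the class and function names are hypothetical, and the scheduler's full formula can include further terms that are not discussed on this page.

```python
from dataclasses import dataclass


@dataclass
class Weights:
    """Hypothetical weights, one per priority element."""
    age: int
    fairshare: int
    job_size: int
    partition: int
    qos: int


def age_factor(wait_seconds: float, max_age_seconds: float) -> float:
    """Grows with time in the queue and stops increasing once the bound is reached."""
    return min(wait_seconds / max_age_seconds, 1.0)


def job_priority(w: Weights, age: float, fairshare: float,
                 job_size: float, partition: float, qos: float) -> int:
    """Weighted sum of the five factors, each expected in [0.0, 1.0]."""
    total = (w.age * age
             + w.fairshare * fairshare
             + w.job_size * job_size
             + w.partition * partition
             + w.qos * qos)
    return int(total)


weights = Weights(age=1000, fairshare=10000, job_size=500, partition=1000, qos=2000)

# A job that has waited 3.5 days, with an assumed age bound of 7 days.
age = age_factor(wait_seconds=302400, max_age_seconds=604800)  # 0.5
print(job_priority(weights, age=age, fairshare=0.75,
                   job_size=0.25, partition=1.0, qos=0.0))  # 9125
```

With weights like these, the fairshare term dominates: the same job with a fairshare factor of 0.25 instead of 0.75 would lose 5000 priority points, far more than additional waiting time could recover.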