Problem: 2080Ti card suddenly disappeared / nvidia-smi errors

Occasionally, a 2080Ti GPU develops a hardware fault and crashes. Symptoms include a failing job, nvidia-smi returning errors or no longer showing your GPU, and an Nvidia Xid=79 error.

Unfortunately, this happens frequently with this hardware. The only remedy on your side is to cancel the job and rerun it on another node. Meanwhile, the ML Cloud Team will deal with the faulty card and return the node to service.
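A quick check from inside a job can confirm whether the GPU is still visible before you decide to cancel. The following is a minimal sketch, not an official diagnostic: the function name gpu_visible and the NVIDIA_SMI override variable are hypothetical, and it relies only on `nvidia-smi -L`, which lists the GPUs the driver can see.

```shell
# Return 0 if nvidia-smi lists at least one GPU, non-zero otherwise.
# NVIDIA_SMI is a hypothetical override so the check can point at another binary.
gpu_visible() {
    smi="${NVIDIA_SMI:-nvidia-smi}"
    # Fail early if the tool itself cannot be found.
    command -v "$smi" >/dev/null 2>&1 || return 1
    # "nvidia-smi -L" prints one line per GPU, each starting with "GPU <index>:".
    "$smi" -L 2>/dev/null | grep -q '^GPU [0-9]'
}

if gpu_visible; then
    echo "GPU is visible to the driver"
else
    echo "No GPU visible on $(hostname): cancel the job and rerun it on another node"
fi
```

If the cluster scheduler is Slurm (an assumption; this page does not name the scheduler), `scancel <jobid>` followed by `sbatch --exclude=<node> ...` is one way to rerun the job away from the affected node.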

Problem: I failed to deploy my SSH public key in time / my old SSH key is no longer accessible

Sometimes, users fail to copy their SSH public key within the initial window provided when setting up their account, or they no longer have access to their previous SSH private key. Thus, they cannot access the server.

To resolve this, please file a support ticket requesting an additional day of password access. Use this new window to upload your SSH public key.
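One possible way to prepare and upload a key during the password window is sketched below. It assumes OpenSSH's ssh-keygen and ssh-copy-id are available on your local machine and that the cluster accepts ed25519 keys; the function name ensure_keypair, the default key path, and the user and host names in the usage comment are hypothetical placeholders.

```shell
# Print the path of the public key, creating an ed25519 key pair first
# if the private key does not exist yet.
ensure_keypair() {
    key="${1:-$HOME/.ssh/id_ed25519}"
    if [ ! -f "$key" ]; then
        mkdir -p "$(dirname "$key")" || return 1
        # ssh-keygen asks for a passphrase; its messages are sent to stderr
        # so that only the public key path ends up on stdout.
        ssh-keygen -t ed25519 -f "$key" >&2 || return 1
    fi
    [ -f "$key.pub" ] || { echo "missing public key: $key.pub" >&2; return 1; }
    echo "$key.pub"
}

# Usage while password access is enabled (user and host are placeholders):
#   ssh-copy-id -i "$(ensure_keypair)" your_user@login.example.org
```

Reusing an existing key pair instead of generating a new one each time avoids leaving several unused keys behind.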

If your SSH private key has been compromised, please notify the ML Cloud Team immediately via a support ticket.

Problem: My disk quota is full.

Disk space is a shared, finite resource among all cluster users. Granting additional quota to one user reduces the amount available to everyone else, so decisions on quota allocation must balance fairness for all users. If you need more quota, please submit the request through the support ticketing system, but understand that it is very likely to be denied.
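Before requesting more quota, it can help to see which directories actually consume the space. The helper below is a small sketch using standard find, du, and sort; the function name largest_entries is hypothetical, and the sizes it prints are in KiB.

```shell
# List the largest top-level entries (including hidden ones) of a directory.
# $1: directory to inspect (defaults to $HOME)
# $2: number of entries to show (defaults to 10)
largest_entries() {
    dir="${1:-$HOME}"
    count="${2:-10}"
    # du -sk prints one "size-in-KiB <tab> path" line per entry;
    # sort -rn puts the largest entries first.
    find "$dir" -mindepth 1 -maxdepth 1 -exec du -sk {} + 2>/dev/null \
        | sort -rn | head -n "$count"
}

# Example: largest_entries "$HOME" 5
```

Hidden directories such as caches are included on purpose, since they are easy to overlook when cleaning up.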

Problem: I am experiencing login issues

One common reason for a user experiencing login issues is maxed-out quota on $HOME. All $HOME directories have a very low quota by design.

By default, conda places its pkgs storage in $HOME/.conda. To move this to $WORK, run the following command twice to route pkg downloads to $WORK:

conda config --add pkgs_dirs $WORK/.conda/pkgs/

If you need the space, you may then mv $HOME/.conda/pkgs $WORK/.conda.
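The move described above can be wrapped in a few safety checks. This is a cautious sketch under stated assumptions: the function name move_conda_pkgs is hypothetical, and refusing to proceed when $WORK/.conda/pkgs already exists is a design choice made here (a plain mv would otherwise nest the old directory inside the new one).

```shell
# Move an existing conda package cache from $HOME/.conda/pkgs to $WORK/.conda.
move_conda_pkgs() {
    [ -n "${WORK:-}" ] || { echo "WORK is not set" >&2; return 1; }
    src="$HOME/.conda/pkgs"
    dest_parent="$WORK/.conda"
    # Nothing to do if there is no package cache in $HOME.
    [ -d "$src" ] || { echo "nothing to move: $src not found"; return 0; }
    # Do not merge into or overwrite a cache that already exists on $WORK.
    if [ -e "$dest_parent/pkgs" ]; then
        echo "refusing to overwrite existing $dest_parent/pkgs" >&2
        return 1
    fi
    mkdir -p "$dest_parent" && mv "$src" "$dest_parent/"
}

# Example: move_conda_pkgs
```

If the function refuses because a cache already exists on $WORK, the old cache in $HOME only holds re-downloadable packages and can be inspected by hand before deciding what to do with it.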

Please do NOT store input data, results or models on $HOME. Please never let your jobs write to $HOME. If you run out of quota on $HOME, please file a support ticket with the ML Cloud Team so we can temporarily resolve the issue.

Changing user shell

If you wish to change your shell from the default (bash) to zsh, please open a support ticket and request this. We do not support all shells currently; if you wish to use a different shell, please enquire for more details.

I don't see a specific package, why?

The current strategy of the ML Cloud Team is that we only install required packages. Thus, if you have a particular need, please file a support ticket requesting the package and we will contact you to discuss further.

Problem with your conda

If you see an error such as the one below:

Collecting package metadata (repodata.json): failed

 >>>>>>>>>>>>>>>>>>>>>> ERROR REPORT <<<<<<<<<<<<<<<<<<<<<<

    Traceback (most recent call last):
      File "/usr/lib/python3.9/site-packages/conda/core/subdir_data.py", line 387, in _load
        raw_repodata_str = fetch_repodata_remote_request(
      File "/usr/lib/python3.9/site-packages/conda/core/subdir_data.py", line 858, in fetch_repodata_remote_request
        raise Response304ContentUnchanged()
    conda.core.subdir_data.Response304ContentUnchanged

    During handling of the above exception, another exception occurred:

.....

try removing ~/.condarc from your home directory and then run the conda command again.
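If you would rather not lose your existing settings, renaming the file has the same effect as removing it, since conda no longer finds it under its usual name. This is a variation on the advice above, not the documented procedure; the function name set_aside_condarc and the backup naming scheme are hypothetical.

```shell
# Rename ~/.condarc to a timestamped backup instead of deleting it.
set_aside_condarc() {
    rc="$HOME/.condarc"
    # Nothing to do if the file does not exist.
    [ -f "$rc" ] || { echo "no $rc found"; return 0; }
    backup="$rc.bak.$(date +%Y%m%d%H%M%S)"
    mv "$rc" "$backup" && echo "moved $rc to $backup"
}

# Example: set_aside_condarc, then rerun the failing conda command.
```

Once conda works again, the backup can be deleted, or individual settings can be copied back from it one at a time.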

Strange wget error

If you encounter strange wget errors, delete the ~/.wget-hsts file in your home directory; this should resolve the issue.


Last update: February 27, 2026
Created: June 21, 2024