Hands-On: Your First ML Job on the Cluster
Goal: Run a semantic segmentation model (DeepLabV3) that labels every pixel in a photograph — using both Conda and Singularity. Same script, two environments.
0 — Directory Convention
| Path | Purpose | Backed up? |
|---|---|---|
| $HOME | Scripts, configs, small files | No |
| $WORK | Datasets, model caches, results | No |
Rule of thumb: Code lives in $HOME, data lives in $WORK.
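Before starting, it can help to confirm that both variables point at real directories in your shell. The following is a minimal sketch; `check_dir` is a hypothetical helper name, and whether `$WORK` is exported automatically depends on how the cluster's login environment is configured.

```shell
# Hypothetical helper (not part of the tutorial files).
# Usage: check_dir LABEL PATH
check_dir() {
  if [ -z "$2" ]; then
    echo "$1 is not set"
    return 1
  fi
  if [ ! -d "$2" ]; then
    echo "$1=$2 is not a directory"
    return 1
  fi
  echo "$1=$2"
}
```

Typical use would be `check_dir HOME "${HOME:-}"` followed by `check_dir WORK "${WORK:-}"`; a non-zero exit status means the later steps that write to that location are likely to fail.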
1 — Get an Interactive GPU Session
# Adjust partition/account to your allocation
srun --partition=2080-galvani --gres=gpu:1 --mem=16G --time=02:00:00 --reservation=hands-on --pty bash
Verify you landed on a GPU node:
nvidia-smi
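If the full `nvidia-smi` table is more than you need, a one-line-per-GPU summary is often easier to read. This is a sketch under the assumption that Slurm sets `CUDA_VISIBLE_DEVICES` for jobs that request `--gres=gpu:N` (the usual behaviour, but site configuration can differ); `check_gpu` is a hypothetical helper name.

```shell
# Hypothetical helper: summarise the GPUs visible to this session,
# or explain why none is visible.
check_gpu() {
  if ! command -v nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi not found; this is probably not a GPU node"
    return 1
  fi
  # Typically set by Slurm when the job requested a GPU (assumption, see above)
  echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-<unset>}"
  # One CSV line per GPU: index, model name, total memory
  nvidia-smi --query-gpu=index,name,memory.total --format=csv,noheader
}
```

Calling `check_gpu` on a login node should report that `nvidia-smi` is missing, while inside the `srun` session it should list exactly one GPU.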
2 — Prepare the container and files
mkdir -p $WORK/ml-tutorial
cd $WORK/ml-tutorial
rsync -a --progress /mnt/lustre/datasets/hands-on/ $WORK/ml-tutorial
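The later steps rely on `segment.py` and `input.jpg` being present in `$WORK/ml-tutorial`. A quick check after the copy could look like the sketch below; `check_tutorial_files` is a hypothetical helper, and the assumption that both files arrive via the `rsync` above is inferred from the run commands rather than stated by the dataset directory itself.

```shell
# Hypothetical helper: confirm the files used by the run steps were copied.
# Usage: check_tutorial_files DIR
check_tutorial_files() {
  local dir="$1" missing=0 f
  for f in segment.py input.jpg; do
    if [ -f "$dir/$f" ]; then
      echo "ok       $f"
    else
      echo "MISSING  $f"
      missing=1
    fi
  done
  # Non-zero exit status if any expected file is absent
  return "$missing"
}
```

For this tutorial the call would be `check_tutorial_files "$WORK/ml-tutorial"`.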
4A — Run with Conda
One-time setup (ALREADY DONE!)
# Install Miniconda (if you haven't already)
wget -q https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p $WORK/miniconda3
eval "$($WORK/miniconda3/bin/conda shell.bash hook)"
# Create environment — use pip for PyTorch (ships self-contained CUDA/MKL libs)
conda create -y -n segdemo python=3.11 numpy pillow
conda activate segdemo
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
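After the install, it is worth confirming that the environment's PyTorch build can see the GPU before running the full script. A minimal sketch, assuming the `segdemo` environment is active so that `python` resolves to it; `verify_torch` and the `PYTHON` override are hypothetical names introduced here.

```shell
# Hypothetical helper: print the PyTorch version and whether CUDA is usable.
# Set PYTHON to test a specific interpreter; defaults to the python on PATH.
verify_torch() {
  "${PYTHON:-python}" -c \
    'import torch; print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())'
}
```

On a GPU node with a working install, the line should end in `CUDA available: True`. `False` usually means the session has no GPU or a CPU-only wheel was installed, and an import error means the environment is not active.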
Run
# Use the prefix Miniconda was installed under ($WORK/miniconda3 in the setup above)
eval "$($WORK/miniconda3/bin/conda shell.bash hook)"
conda activate segdemo
# Cache model weights on $WORK to avoid filling $HOME
export TORCH_HOME=$WORK/.cache/torch
python $WORK/ml-tutorial/segment.py $WORK/ml-tutorial/input.jpg $WORK/ml-tutorial/results
Expected output (first run downloads ~40 MB, then):
Device: cuda
Loading DeepLabV3-MobileNetV3-Large …
Model ready in 2.3s
Running inference …
Inference done in 0.034s
Detected classes:
background 42.1%
bicycle 1.8%
car 11.3%
person 18.7%
...
Saved: /path/to/results/input_segmentation.png
Saved: /path/to/results/input_overlay.png
Clean up
conda deactivate
4B — Run with Singularity (NGC Container)
One-time setup
# Pull an NGC PyTorch container (includes torchvision)
# Store the .sif on $WORK — it's ~6 GB
singularity pull $WORK/ml-tutorial/pytorch_24.01.sif \
docker://nvcr.io/nvidia/pytorch:24.01-py3
This only needs to happen once. The .sif is reusable.
Run
singularity exec --nv \
--bind $WORK:$WORK \
--bind $HOME:$HOME \
--env TORCH_HOME=$WORK/.cache/torch \
$WORK/ml-tutorial/pytorch_24.01.sif \
python $WORK/ml-tutorial/segment.py $WORK/ml-tutorial/input.jpg $WORK/ml-tutorial/results
| Flag | Purpose |
|---|---|
| --nv | Exposes host NVIDIA drivers/GPUs inside the container |
| --bind | Makes host directories visible inside the container |
| --env | Sets environment variables inside the container |
You should see identical output to the Conda run above.
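One way to check this is to save both runs to log files (for example by appending `| tee conda.log` and `| tee singularity.log` to the two run commands) and diff them. The sketch below assumes the only lines expected to differ are the timing lines ending in something like `in 2.3s`; `compare_runs` is a hypothetical helper, and small differences in the class percentages are still possible if the two environments ship different PyTorch versions.

```shell
# Hypothetical helper: diff two saved run logs, ignoring lines that end in a
# timing such as "in 2.3s", which naturally vary from run to run.
# Usage: compare_runs FIRST.log SECOND.log
compare_runs() {
  local a b status
  a=$(mktemp)
  b=$(mktemp)
  grep -v -E ' in [0-9.]+s$' "$1" > "$a"
  grep -v -E ' in [0-9.]+s$' "$2" > "$b"
  diff "$a" "$b"
  status=$?
  rm -f "$a" "$b"
  # 0 if the logs match apart from timings, non-zero otherwise
  return "$status"
}
```

No output and an exit status of 0 from `compare_runs conda.log singularity.log` means the two environments produced the same report.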
5 — Inspect Results
Open the result images (e.g. copy them to your local machine with scp or rsync):
- input_segmentation.png: every pixel coloured by class (person = dark green, car = dark blue, bicycle = green, …)
- input_overlay.png: the original photo blended with the segmentation map
6 — Quick Comparison
| | Conda | Singularity |
|---|---|---|
| Setup effort | Create env, install packages | Pull one container image |
| Reproducibility | Depends on conda solver | Exact image hash; fully reproducible |
| Disk usage | ~3 GB (env) | ~9 GB (.sif) |
| Startup time | Fast (native) | Slightly slower (container init) |
| Best for | Rapid prototyping, custom stacks | Production runs, shared pipelines |
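The disk-usage figures above are approximate, and you can measure your own copies directly. A minimal sketch: `report_size` is a hypothetical helper, and the environment path in the usage note assumes conda's default layout of `envs/<name>` under the install prefix.

```shell
# Hypothetical helper: print the on-disk size of each path that exists.
# Usage: report_size PATH...
report_size() {
  local p
  for p in "$@"; do
    if [ -e "$p" ]; then
      du -sh "$p"
    else
      echo "not found: $p"
    fi
  done
}
```

For example, `report_size "$WORK/miniconda3/envs/segdemo" "$WORK/ml-tutorial/pytorch_24.01.sif"` prints one size per line for the Conda environment and the container image.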
7 — What to Try Next
- Use your own image: python $WORK/ml-tutorial/segment.py $WORK/my_photo.jpg
- Batch job: Wrap the python … call in an sbatch script for non-interactive runs.
- Different model: Swap deeplabv3_mobilenet_v3_large for deeplabv3_resnet101 (larger model, better accuracy, same interface).
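For the batch-job suggestion, one possible shape is sketched below. It writes a job script named `segdemo.sbatch` that wraps the Singularity run from section 4B. The file name, job name, 30-minute time limit, and log file pattern are assumptions chosen for illustration, and the partition is copied from the interactive example, so adjust it to your allocation.

```shell
# Write a minimal Slurm batch script. The heredoc delimiter is quoted ('EOF'),
# so $WORK is expanded when the job runs, not when this file is written.
cat > segdemo.sbatch <<'EOF'
#!/bin/bash
# Resource requests mirror the interactive session; values are illustrative.
#SBATCH --job-name=segdemo
#SBATCH --partition=2080-galvani
#SBATCH --gres=gpu:1
#SBATCH --mem=16G
#SBATCH --time=00:30:00
# %j in the log file name is replaced by the Slurm job ID
#SBATCH --output=segdemo_%j.log

# Stop on the first error and on unset variables (e.g. a missing $WORK)
set -euo pipefail

singularity exec --nv \
  --bind "$WORK:$WORK" \
  --env TORCH_HOME="$WORK/.cache/torch" \
  "$WORK/ml-tutorial/pytorch_24.01.sif" \
  python "$WORK/ml-tutorial/segment.py" \
    "$WORK/ml-tutorial/input.jpg" "$WORK/ml-tutorial/results"
EOF
```

Submit it with `sbatch segdemo.sbatch` and watch the queue with `squeue -u $USER`; the script's output ends up in `segdemo_<jobid>.log` in the directory you submitted from.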
Cheat Sheet
# Interactive GPU session
srun --partition=<PARTITION> --gres=gpu:1 --mem=16G --time=01:00:00 --pty bash
# Conda
conda activate segdemo
export TORCH_HOME=$WORK/.cache/torch
python $WORK/ml-tutorial/segment.py $WORK/ml-tutorial/input.jpg $WORK/ml-tutorial/results
# Singularity
singularity exec --nv --bind $WORK:$WORK --bind $HOME:$HOME \
--env TORCH_HOME=$WORK/.cache/torch \
$WORK/ml-tutorial/pytorch_24.01.sif \
python $WORK/ml-tutorial/segment.py $WORK/ml-tutorial/input.jpg $WORK/ml-tutorial/results
Last update: March 6, 2026
Created: March 5, 2026