System Architecture

All Galvani nodes run Rocky Linux 8.8, and resource allocation is managed by the Slurm workload manager. Global storage ($HOME and $WORK) is presently provided by the Quobyte file system; however, Quobyte is being decommissioned, and all users will be moved to our new NVMe-based Lustre parallel file system. Inter-node communication runs over an Ethernet network. Galvani also offers Ceph archival storage, accessible via the S3 protocol.
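
Because the archive speaks standard S3, ordinary S3 tooling works against it. Below is a minimal sketch using Python's boto3 client; the endpoint URL, bucket name, and credentials are hypothetical placeholders, not Galvani-specific values.

    # Minimal sketch of S3 access to the Ceph archive with boto3.
    # The endpoint, bucket, and keys below are hypothetical placeholders.
    import boto3

    s3 = boto3.client(
        "s3",
        endpoint_url="https://s3.archive.example.org",  # placeholder endpoint
        aws_access_key_id="YOUR_ACCESS_KEY",
        aws_secret_access_key="YOUR_SECRET_KEY",
    )

    # Upload an archive tarball, then list what the bucket holds.
    s3.upload_file("results.tar.gz", "my-archive-bucket", "runs/results.tar.gz")
    resp = s3.list_objects_v2(Bucket="my-archive-bucket")
    for obj in resp.get("Contents", []):
        print(obj["Key"], obj["Size"])

The same endpoint and credentials work with any S3-compatible client (e.g., rclone, s3cmd, or the AWS CLI).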

The Galvani cluster is composed of 2 login nodes, 3 CPU-only compute nodes, 28 nodes with Nvidia 2080ti GPUs, and 26 nodes with Nvidia A100 GPUs, housed in 16 air-cooled racks.

Login Nodes

Galvani's two login nodes, 134.2.168.43 and 134.2.168.114, have slightly different configurations:

Feature           Login Node 134.2.168.43                  Login Node 134.2.168.114
CPUs:             2 x Intel Xeon Gold, 16 cores, 2.9GHz    8 cores
RAM:              1536GB (2933 MT/s) DDR4
Local Storage:    960GB SSD                                250GB SSD

CPU Nodes

The 3 CPU-only compute nodes have the following hardware:

Feature                          Specifications
CPUs:                            2 x Intel Xeon Gold, 16 cores, 2.9GHz
RAM:                             1536GB (2933 MT/s) DDR4
Local Storage:                   960GB SSD
Theoretical Peak Performance:    TBD
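
Since Slurm manages resource allocation, work on these nodes is submitted as a batch job. The sketch below is a minimal Python batch script; the partition name is a placeholder, check sinfo on a login node for the partitions actually configured.

    #!/usr/bin/env python3
    # Minimal sketch of a CPU-only batch job. Slurm reads the #SBATCH
    # directives from the comment lines below; the partition name is a
    # placeholder -- check sinfo on a login node for the real names.
    #SBATCH --job-name=cpu-demo
    #SBATCH --partition=cpu        # hypothetical partition name
    #SBATCH --nodes=1
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=16
    #SBATCH --mem=64G
    #SBATCH --time=01:00:00

    import os

    # Report what the allocation actually granted.
    print("Node:", os.uname().nodename)
    print("CPUs available to this job:", len(os.sched_getaffinity(0)))
    print("SLURM_CPUS_PER_TASK:", os.environ.get("SLURM_CPUS_PER_TASK"))

The script is submitted with sbatch job.py from a login node; Slurm stops reading #SBATCH directives at the first non-comment line, so they work in a Python script just as in a shell script.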

GPU Compute Nodes

In addition to the nodes listed below, a limited number of nodes with nine A100 GPUs each will be added in the near future.

Feature                           2080ti Nodes                                 A100 Nodes                             A100 Nodes (upcoming)
Total Nodes:                      28                                           21                                     TBD
Accelerators:                     8 x Nvidia 2080ti / node                     8 x Nvidia A100 / node                 9 x Nvidia A100 / node
Accelerator connect:              PCIe                                         PCIe                                   PCIe
CUDA Parallel Processing Cores:   4352 / card                                  3,456 (FP64), 6,912 (FP32) / card      3,456 (FP64), 6,912 (FP32) / card
NVIDIA Tensor Cores:              544 / card                                   432 / card                             432 / card
GPU Memory:                       11GB GDDR6 (352-bit bus) / card              40GB HBM2 / card                       40GB HBM2 / card
CPUs:                             36 cores: 2 x Intel Xeon Gold 6240, 2.6GHz   32 cores: 2 x AMD EPYC 7302, 3.0GHz    128 cores: 2 x AMD EPYC 7742, 2.25GHz
RAM:                              384GB (3200 MT/s) DDR4                       1TB (3200 MT/s) DDR4                   2TB (3200 MT/s) DDR4
Local Storage:                    1.92TB                                       3.84TB                                 30.72TB
Theoretical Peak Performance:     266.79 TFLOPS/node                           196.54 TFLOPS/node                     221.11 TFLOPS/node
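
A GPU allocation on these nodes is requested through Slurm's generic-resource (gres) syntax. Below is a minimal sketch for a 2080ti node; the partition and gres names are illustrative assumptions, the real ones are defined by the site's Slurm configuration (see sinfo and scontrol show partition).

    #!/usr/bin/env python3
    # Minimal sketch of a GPU batch job on the 2080ti nodes. The partition
    # and gres strings are placeholders -- the real names come from the
    # site's Slurm configuration.
    #SBATCH --job-name=gpu-demo
    #SBATCH --partition=gpu-2080ti   # hypothetical partition name
    #SBATCH --gres=gpu:4             # request 4 of a node's 8 cards
    #SBATCH --cpus-per-task=8
    #SBATCH --mem=96G
    #SBATCH --time=02:00:00

    import os

    # Slurm typically exports the indices of the granted cards to the job.
    print("CUDA_VISIBLE_DEVICES:", os.environ.get("CUDA_VISIBLE_DEVICES"))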

Network

The Galvani cluster uses a non-blocking Ethernet fat-tree topology. Most compute nodes are attached with a 40Gb/s host channel adapter (HCA); storage is attached with 100Gb/s HCAs.


Last update: September 9, 2024
Created: September 9, 2024