System Architecture
All Galvani nodes run Rocky Linux 8.8, and resource allocation is managed by the Slurm workload manager.
Global storage is presently provided by the Quobyte file system ($HOME and $WORK); however, Quobyte is undergoing decommissioning, and all users will be moved to our new NVMe-based Lustre parallel file system.
Inter-node communication is provided by an Ethernet network.
Galvani also provides Ceph-based archival storage, accessible to users via the S3 protocol.
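Since the archive speaks S3, any S3-compatible client can reach it. A minimal sketch using rclone follows; the remote name, endpoint URL, and credentials below are placeholders, not actual Galvani values — consult the cluster administrators for the real endpoint and your access keys:

```ini
# ~/.config/rclone/rclone.conf -- hypothetical remote for Galvani's Ceph S3 gateway
[galvani-archive]
type = s3
provider = Ceph
access_key_id = YOUR_ACCESS_KEY
secret_access_key = YOUR_SECRET_KEY
endpoint = https://s3.example.invalid   # placeholder; use the endpoint given by the admins
```

With such a remote configured, `rclone ls galvani-archive:mybucket` would list the contents of a bucket named `mybucket`.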
The Galvani cluster is composed of 2 login nodes, 3 CPU-only nodes, 28 2080ti GPU nodes, and 26 A100 nodes, housed in 16 air-cooled racks.
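Because all allocation goes through Slurm, work on the compute nodes is typically submitted as a batch script. A minimal sketch of a single-GPU job is shown below; the partition name is a placeholder, not a confirmed Galvani partition — run `sinfo` on a login node to see the real partitions:

```bash
#!/bin/bash
#SBATCH --job-name=gpu-test
#SBATCH --partition=a100        # placeholder; check `sinfo` for actual partition names
#SBATCH --gres=gpu:1            # request one GPU on one node
#SBATCH --cpus-per-task=4
#SBATCH --mem=32G
#SBATCH --time=01:00:00

nvidia-smi                      # print which GPU was allocated to the job
```

Submitting with `sbatch script.sh` queues the job; `squeue -u $USER` shows its state.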
Login Nodes
Galvani's two login nodes, 134.2.168.43 and 134.2.168.114, have slightly different configurations:
| Feature | Login Node 134.2.168.43 | Login Node 134.2.168.114 |
|---|---|---|
| CPUs | 2 x Intel Xeon Gold, 16 cores, 2.9 GHz | 8 cores |
| RAM | 1536 GB (2933 MT/s) DDR4 | |
| Local Storage | 960 GB SSD | 250 GB SSD |
CPU Nodes
The 3 CPU-only compute nodes have the following hardware:
| Feature | Specifications |
|---|---|
| CPUs | 2 x Intel Xeon Gold, 16 cores, 2.9 GHz |
| RAM | 1536 GB (2933 MT/s) DDR4 |
| Local Storage | 960 GB SSD |
| Theoretical Peak Performance | TBD |
GPU Compute Nodes
In addition to the nodes listed below, a limited number of nodes with 9 A100 GPUs each will be added in the near future.
| Feature | 2080ti nodes | A100 nodes | Upcoming A100 nodes |
|---|---|---|---|
| Total Nodes | 28 | 21 | TBD |
| Accelerators | 8 Nvidia 2080ti / node | 8 Nvidia A100 / node | 9 Nvidia A100 / node |
| Accelerator connect | PCIe | PCIe | PCIe |
| CUDA Parallel Processing Cores | 4352 / card | 3456 (FP64), 6912 (FP32) / card | 3456 (FP64), 6912 (FP32) / card |
| NVIDIA Tensor Cores | 544 / card | 432 / card | 432 / card |
| GPU Memory | 11 GB GDDR6 (352-bit memory bus) / card | 40 GB HBM2 / card | 40 GB HBM2 / card |
| CPUs | 36 cores: 2 x Intel Xeon Gold 6240, 18 cores/die, 2.6 GHz | 32 cores: 2 x AMD EPYC 7302, 16 cores/die, 3.0 GHz | 128 cores: 2 x AMD EPYC 7742, 64 cores/die, 2.25 GHz |
| RAM | 384 GB (3200 MT/s) DDR4 | 1 TB (3200 MT/s) DDR4 | 2 TB (3200 MT/s) DDR4 |
| Local Storage | 1.92 TB | 3.84 TB | 30.72 TB |
| Theoretical Peak Performance | 266.79 TFLOPS / node | 196.54 TFLOPS / node | 221.11 TFLOPS / node |
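As background on how peak-performance figures like these are typically derived, the theoretical peak is the product of execution units, clock rate, and floating-point operations issued per cycle. The sketch below illustrates the formula for the CPU and GPU sides of an upcoming A100 node; the per-cycle and per-card rates are taken from public vendor spec sheets and are assumptions here, and the table's figures may be computed at a different precision:

```python
# Illustration of the standard peak-performance formula.
# Per-cycle and per-card rates below are vendor spec-sheet values (assumptions),
# not numbers taken from the cluster documentation.

def peak_tflops(units: int, clock_ghz: float, flops_per_cycle: float) -> float:
    """Theoretical peak in TFLOPS: units x clock x FLOPs issued per cycle."""
    return units * clock_ghz * flops_per_cycle / 1000.0

# CPU side of an upcoming A100 node: 2 x EPYC 7742 = 128 cores at 2.25 GHz.
# A Zen 2 core can issue 16 double-precision FLOPs per cycle (2 x 256-bit FMA units).
cpu_peak = peak_tflops(128, 2.25, 16)   # FP64 peak of the CPUs alone

# GPU side: 9 x A100, each rated ~9.7 TFLOPS FP64 on the vendor spec sheet.
gpu_peak = 9 * 9.7

print(f"CPU peak: {cpu_peak:.3f} TFLOPS, GPU peak: {gpu_peak:.1f} TFLOPS (FP64)")
```

This yields roughly 4.6 TFLOPS for the CPUs and 87 TFLOPS for the GPUs at double precision, which suggests the table's per-node figures are quoted at a different (lower) precision.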
Network
The Galvani cluster uses a non-blocking fat-tree Ethernet topology. Most compute nodes have a 40 Gb/s network adapter; storage is connected via 100 Gb/s adapters.
Created: June 21, 2024