PSFC GPU Cluster

Overview

The cluster consists of six nodes, each with four NVIDIA GPUs.

 

Three nodes have four V100S cards and three nodes have four RTX6000 cards. The former are oriented toward double-precision work; the latter are better suited to machine-learning workloads.

Requesting Access

To request access to the PSFC GPU cluster, send an email to: engaging@psfc.mit.edu with your name and PSFC username.

Software Stack

CentOS 8.2
Python 3
CUDA 10.2 (nvcc)
cuDNN
Tensorflow 2 with GPU support
PyTorch
Caffe
OpenMPI 4.0.4
PGI CUDA Fortran compiler (nvfortran)
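Once you have an interactive session on a compute node (the gateway has no CUDA toolchain), you can verify the stack above is visible. These are standard version-check commands, shown as a sketch; the exact module setup on the cluster may differ:

```shell
# Quick sanity checks for the installed toolchain (run on a compute node, not the gateway)
nvcc --version       # CUDA compiler release
mpirun --version     # OpenMPI version
nvfortran --version  # PGI/NVIDIA CUDA Fortran compiler

# Confirm the Python frameworks see the GPUs
python3 -c 'import tensorflow as tf; print(tf.__version__, tf.config.list_physical_devices("GPU"))'
python3 -c 'import torch; print(torch.__version__, torch.cuda.is_available())'
```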

Accessing the Cluster

Log in to gpu.psfc.mit.edu. This is the gateway node, and you launch your jobs from here. Note that the gateway does not have NVIDIA cards or compilers: you have to request an interactive job to access a node for compilation. You can currently also log in to the compute nodes directly, but please use that only for quick compilations, not for running jobs.
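An interactive job can be requested from the gateway with srun. This is a minimal sketch using the regular partition named in the batch example; adjust the time limit and GPU count to your needs:

```shell
# Request an interactive shell on one GPU node for 30 minutes,
# with one GPU card allocated (--gres=gpu:1)
srun -p regular --gres=gpu:1 --time=30:00 --pty bash
```

When the allocation is granted you are dropped into a shell on the compute node, where nvcc and the other compilers are available.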

 

The V100S nodes are: gpu-v100s-01.psfc.mit.edu, gpu-v100s-03.psfc.mit.edu and gpu-v100s-05.psfc.mit.edu.

 

The RTX6000 nodes are: gpu-rtx6000-02.psfc.mit.edu, gpu-rtx6000-04.psfc.mit.edu and gpu-rtx6000-06.psfc.mit.edu.

Job Management

SLURM is used for job management, just as on engaging. An example job script requesting one GPU card per node on two nodes:

 

   #!/bin/bash
   #
   #SBATCH --job-name=hello_world
   #SBATCH --output=hello.txt
   #SBATCH --ntasks-per-node=1      # one MPI task per node
   #SBATCH --nodes=2                # run on two nodes
   #SBATCH --gres=gpu:1             # one GPU card per node
   #SBATCH --time=05:00             # five-minute time limit
   #SBATCH -p regular               # submit to the regular partition

   mpirun hostname                  # show which nodes the job landed on
   mpirun nvidia-smi                # show the GPUs visible on each node
   #-----------------------------
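The script above is submitted and monitored with the standard SLURM commands. A brief sketch (the script filename here is hypothetical):

```shell
sbatch hello_world.slurm   # submit the batch script; prints the assigned job ID
squeue -u $USER            # list your pending and running jobs
scancel <jobid>            # cancel a job by its ID if needed
```

Output from the job lands in hello.txt (the file named by --output) in the directory you submitted from.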

 

Note: detailed documentation does not yet exist, so please address any questions about usage to: engaging@psfc.mit.edu

 

77 Massachusetts Avenue, NW16, Cambridge, MA 02139, psfc-info@mit.edu
