NCCL Test

NCCL (NVIDIA Collective Communications Library, pronounced "Nickel") is a stand-alone library of optimized primitives for inter-GPU communication. It implements all-reduce, all-gather, reduce, broadcast, and reduce-scatter, as well as any send/receive-based communication pattern. It is optimized to achieve high bandwidth on platforms using PCIe, NVLink, and NVSwitch, as well as over networks using InfiniBand Verbs or TCP/IP sockets. NCCL supports an arbitrary number of GPUs installed in a single node or across multiple nodes, and can be used in either single- or multi-process (e.g., MPI) applications.

These tests check both the performance and the correctness of NCCL operations.

Usage

Example script: test-nccl.sh

#!/bin/bash

#SBATCH -J nccl-tst # job name
#SBATCH -o nccl-test.o%j # output and error file name (%j expands to jobID)
#SBATCH -p ada # queue L40S (partition)
#SBATCH -N 2 # total number of nodes

module load nccl-test/1.0

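# Run every *_perf test with 2 MPI processes (-np 2) on the 2 allocated nodes.
# -g 2 gives each process 2 GPUs; message sizes go from 8 bytes (-b 8)
# to 8 GiB (-e 8G), multiplied by 2 at each step (-f 2).
for i in "$NCCLBUILD"/*_perf; do
        FILENAME=$(basename "$i")
        echo ""
        echo "Running test $FILENAME..."
        echo ""
        mpirun -np 2 "$NCCLBUILD/$FILENAME" -b 8 -e 8G -f 2 -g 2
done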

echo "done!!"
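
The options -b 8 -e 8G -f 2 make each test sweep message sizes from 8 bytes up to 8 GiB, doubling at every step. The sketch below only reproduces that arithmetic so you can see how many sizes one run covers. The helper name sweep_sizes is hypothetical and not part of nccl-tests, and the sketch assumes 8G means 8 x 1024^3 bytes and that the end size is included.

```shell
#!/bin/bash
# Print the message sizes (in bytes) visited by a sweep that starts at MIN,
# multiplies by FACTOR at each step, and includes MAX when it is reached.
# sweep_sizes is a hypothetical helper used for illustration only.
# FACTOR must be greater than 1, otherwise the loop never ends.
sweep_sizes() {
    local size=$1 max=$2 factor=$3
    while [ "$size" -le "$max" ]; do
        echo "$size"
        size=$((size * factor))
    done
}

# Count the sizes for 8 bytes up to 8 GiB, doubling each step
# (mirrors -b 8 -e 8G -f 2 from the script above).
sweep_sizes 8 $((8 * 1024 * 1024 * 1024)) 2 | wc -l
```

With these values the sweep covers 31 sizes (2^3 through 2^33 bytes), so each test reports roughly that many result rows.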

The report is written to the job output file.
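
Since the output file contains the full table for every test, a short filter can help to get an overview. The sketch below keeps the "Running test ..." lines echoed by the script and, as an assumption about the nccl-tests output format, the summary lines containing "Avg bus bandwidth" and "Out of bounds values". The function name summarize_report is hypothetical; adjust the patterns if your version prints different labels.

```shell
#!/bin/bash
# Print a condensed view of a job output file: one header line per test
# (echoed by test-nccl.sh) followed by its summary lines.
# Assumption: the tests print "Avg bus bandwidth" and "Out of bounds values"
# lines; if they do not, only the "Running test" headers are shown.
summarize_report() {
    local report=$1
    grep -E 'Running test|Avg bus bandwidth|Out of bounds values' "$report"
}

# Usage (123456 is a placeholder job ID):
#   summarize_report nccl-test.o123456
```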

Submit with:

sbatch --account=your_project_ID test-nccl.sh
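
Because the script sets "-o nccl-test.o%j", the output file name depends on the job ID. As a minimal sketch, the job ID can be captured with sbatch --parsable, which prints the job ID, optionally followed by ";" and a cluster name. The helper output_file_for is hypothetical and only does the string handling.

```shell
#!/bin/bash
# Build the output file name from the text printed by `sbatch --parsable`,
# which is either "<jobid>" or "<jobid>;<cluster>".
# output_file_for is a hypothetical helper used for illustration only.
output_file_for() {
    local parsable=$1
    local jobid=${parsable%%;*}   # drop an optional ";cluster" suffix
    echo "nccl-test.o${jobid}"
}

# Usage on the cluster (your_project_ID is the same placeholder as above):
#   out=$(sbatch --parsable --account=your_project_ID test-nccl.sh)
#   echo "Report will be written to $(output_file_for "$out")"
```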

More info: