NCCL Test
NCCL provides optimized primitives for inter-GPU communication. NCCL (pronounced "Nickel") is a stand-alone library of standard communication routines for GPUs, implementing all-reduce, all-gather, reduce, broadcast, and reduce-scatter, as well as any send/receive-based communication pattern. It has been optimized to achieve high bandwidth on platforms using PCIe, NVLink, and NVSwitch, as well as networking using InfiniBand Verbs or TCP/IP sockets. NCCL supports an arbitrary number of GPUs installed in a single node or across multiple nodes, and can be used in either single- or multi-process (e.g., MPI) applications.
These tests check both the performance and the correctness of NCCL operations.
Usage
Example script: test-nccl.sh
#!/bin/bash
#SBATCH -J nccl-tst # job name
#SBATCH -o nccl-test.o%j # output and error file name (%j expands to jobID)
#SBATCH -p ada # queue L40S (partition)
#SBATCH -N 2 # total number of nodes
module load nccl-test/1.0
# Run 2 MPI processes on the 2 allocated nodes, each driving 2 GPUs (-g 2)
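for i in "$NCCLBUILD"/*_perf; do
    # Strip the directory so the log shows only the test name
    FILENAME=$(basename "$i")
    echo ""
    echo "Running test $FILENAME..."
    echo ""
    # Sweep message sizes from 8 bytes to 8G, doubling at each step
    mpirun -np 2 "$NCCLBUILD/$FILENAME" -b 8 -e 8G -f 2 -g 2
done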
echo "done!!"
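The options passed to each test control the size sweep. In the upstream nccl-tests binaries, -b is the minimum message size, -e the maximum size, -f the multiplication factor between consecutive sizes, and -g the number of GPUs per thread. This description assumes the nccl-test/1.0 module wraps those upstream binaries unchanged. The sketch below is not part of the module. It only reproduces the arithmetic of "-b 8 -e 8G -f 2" to show how many sizes one test will step through, assuming the G suffix means 1024^3 bytes.

```shell
#!/bin/bash
# List the message sizes swept by "-b 8 -e 8G -f 2".
# Assumption: the G suffix is binary (8G = 8 * 1024^3 bytes).
min_bytes=8                              # -b 8
max_bytes=$((8 * 1024 * 1024 * 1024))    # -e 8G
factor=2                                 # -f 2

sizes=()
size=$min_bytes
while [ "$size" -le "$max_bytes" ]; do
    sizes+=("$size")
    size=$((size * factor))
done

last_index=$(( ${#sizes[@]} - 1 ))
echo "number of sizes: ${#sizes[@]}"
echo "first: ${sizes[0]} bytes, last: ${sizes[$last_index]} bytes"
```

With these values the loop yields 31 sizes, from 8 bytes up to 8589934592 bytes. Raising -b or lowering -e shortens the sweep accordingly.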
The test report is written to the job output file (nccl-test.o<jobID>).
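Because the script runs every *_perf binary in turn, the output file can get long. A small helper like the one below may help pull out the per-test summary. It is a sketch, not part of the module: the function name nccl_summary is hypothetical, and the "Out of bounds values" and "Avg bus bandwidth" labels are assumed to match what the upstream nccl-tests binaries print, so adjust the pattern if your build words them differently.

```shell
#!/bin/bash
# Print only the test banners and summary lines from one or more job output files.
# Assumption: the summary labels below match the output of the *_perf binaries.
nccl_summary() {
    grep -E 'Running test|Out of bounds values|Avg bus bandwidth' "$@"
}
```

Once the job has finished, call it on the output file, for example "nccl_summary nccl-test.o123456", where 123456 stands in for your own job ID.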
Submit with:
sbatch --account=your_project_ID test-nccl.sh
More info: