llama.cpp

llama.cpp is an open-source software library that performs inference on various large language models, such as Meta's LLaMA, in pure C/C++. It is co-developed alongside the GGML project, a general-purpose tensor library. Command-line tools are included with the library.

Usage

llama.cpp has been compiled with GPU support and is available on the ADA and Hopper GPU queues.

Available llama.cpp versions:

  • llama.cpp v1 (b4234)
  • llama.cpp v1 (b4706)
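
To switch between builds, load the matching environment module before calling any llama.cpp tool. A minimal sketch, assuming the usual Environment Modules/Lmod commands are available on the cluster:

module avail llama.cpp       # list the installed llama.cpp builds (assumed command)
module load llama.cpp/b4706  # makes llama-cli and friends available via $LLAMACPP_BIN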

Example script : test-llamacpp-hpc.sh

#!/bin/bash

#SBATCH -J llamacpp-gpu-test
#SBATCH -e llamacpp-test%j.err
#SBATCH -o llamacpp-test%j.msg
#SBATCH -p ada # queue (partition)
#SBATCH --nodelist=agpul08
#SBATCH --gres=gpu:1

#module load llama.cpp/b4234
module load llama.cpp/b4706

BASE_PATH=/fs/agustina/$(whoami)/llamacpp
MODELS_PATH=$BASE_PATH/models

mkdir -p $MODELS_PATH

if [ -f "$MODELS_PATH/tiny-vicuna-1b.q5_k_m.gguf" ]; then
  echo "Model tiny-vicuna-1b.q5_k_m.gguf exists"
else
  echo "Model tiny-vicuna-1b.q5_k_m.gguf does not exist, downloading now..."

  # model download
  wget https://huggingface.co/afrideva/Tiny-Vicuna-1B-GGUF/resolve/main/tiny-vicuna-1b.q5_k_m.gguf -P $MODELS_PATH
fi

mkdir -p $BASE_PATH/answers

PROMPT='I think the meaning of life is'

# use llama.cpp client
$LLAMACPP_BIN/llama-cli --help
$LLAMACPP_BIN/llama-cli --version
$LLAMACPP_BIN/llama-cli --list-devices
$LLAMACPP_BIN/llama-cli -m $MODELS_PATH/tiny-vicuna-1b.q5_k_m.gguf \
                          -p "$PROMPT" \
                          --n-predict 128 \
                          --n-gpu-layers -1 \
                          -dev CUDA0 > $BASE_PATH/answers/llama-cli-answer.txt

cat $BASE_PATH/answers/llama-cli-answer.txt

# list llama.cpp directory to find out more tools
ls -al $LLAMACPP_BIN
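
# (assumption) the build may also ship llama-bench; a quick throughput check could look like:
# $LLAMACPP_BIN/llama-bench -m $MODELS_PATH/tiny-vicuna-1b.q5_k_m.gguf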

echo "DONE!"

Submit with :

sbatch --account=your_project_ID test-llamacpp-hpc.sh
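
After submission, the job can be followed with standard SLURM commands; the file names below match the #SBATCH directives in the example script, with <jobid> standing in for the %j placeholder:

squeue -u $(whoami)              # check whether the job is pending or running
cat llamacpp-test<jobid>.msg     # standard output of the run
cat llamacpp-test<jobid>.err     # error stream, if anything went wrong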

More info :