llama.cpp
llama.cpp is an open-source software library for running inference on large language models such as Meta's LLaMA (and many others), implemented in plain C/C++. It is co-developed alongside the GGML project, a general-purpose tensor library, and ships with a set of command-line tools.
Usage
llama.cpp has been compiled with GPU support and is available on the ADA and Hopper GPU queues.
Available llama.cpp versions:
- llama.cpp v1 (b4234)
- llama.cpp v1 (b4706)
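To pick one of these builds, load the corresponding environment module. A minimal sketch, assuming the modules follow the llama.cpp/<build> naming used in the example script below and that loading one sets $LLAMACPP_BIN (the variable the script relies on):
module avail llama.cpp      # list the llama.cpp builds installed on the cluster
module load llama.cpp/b4706 # load a specific build; expected to set $LLAMACPP_BIN
module list                 # confirm what is currently loaded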
Example script : test-llamacpp-hpc.sh
#!/bin/bash
#SBATCH -J llamacpp-gpu-test
#SBATCH -e llamacpp-test%j.err
#SBATCH -o llamacpp-test%j.msg
#SBATCH -p ada # queue (partition)
#SBATCH --nodelist=agpul08
#SBATCH --gres=gpu:1
#module load llama.cpp/b4234
module load llama.cpp/b4706
BASE_PATH=/fs/agustina/$(whoami)/llamacpp
MODELS_PATH=$BASE_PATH/models
mkdir -p $MODELS_PATH
if [ -f "$MODELS_PATH/tiny-vicuna-1b.q5_k_m.gguf" ]; then
    echo "Model tiny-vicuna-1b.q5_k_m.gguf exists"
else
    echo "Model tiny-vicuna-1b.q5_k_m.gguf does not exist, downloading now..."
    # model download
    wget https://huggingface.co/afrideva/Tiny-Vicuna-1B-GGUF/resolve/main/tiny-vicuna-1b.q5_k_m.gguf -P $MODELS_PATH
fi
mkdir -p $BASE_PATH/answers
PROMPT='I think the meaning of life is'
# use the llama.cpp command-line client
$LLAMACPP_BIN/llama-cli --help
$LLAMACPP_BIN/llama-cli --version
$LLAMACPP_BIN/llama-cli --list-devices
# run inference: -p sets the prompt, --n-predict limits the number of generated
# tokens, --n-gpu-layers controls GPU offloading and -dev selects the device
$LLAMACPP_BIN/llama-cli -m $MODELS_PATH/tiny-vicuna-1b.q5_k_m.gguf \
    -p "$PROMPT" \
    --n-predict 128 \
    --n-gpu-layers -1 \
    -dev CUDA0 > $BASE_PATH/answers/llama-cli-answer.txt
cat $BASE_PATH/answers/llama-cli-answer.txt
# list the llama.cpp directory to discover the other available tools (see the sketch after this script)
ls -al $LLAMACPP_BIN
echo "DONE!"
Submit with :
sbatch --account=your_project_ID test-llamacpp-hpc.sh
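After submission, the job can be monitored with the usual SLURM commands; the file names below follow the #SBATCH -o / -e patterns in the script, where %j is replaced by the job ID.
squeue -u $(whoami)            # check whether the job is pending or running
cat llamacpp-test<jobid>.msg   # job stdout (replace <jobid> with the actual job ID)
cat llamacpp-test<jobid>.err   # job stderr, useful if something went wrong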
More info :