Ollama

Get up and running with large language models.

For a list of available models, see AI Models.

Interactive Run

Warning

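We don't recommend running ollama like this except for short test sessions. The GPU stays allocated for the whole job but sits idle most of the time while you type prompts, which is a very inefficient use of GPUs.

Submit the following script with sbatch. It starts an ollama server on a GPU node and opens a reverse SSH tunnel to that server from the login node you submitted from.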

#!/bin/bash -e

#SBATCH --account nesi99991
#SBATCH --job-name ollama-test
#SBATCH --time 01:00:00
#SBATCH --mem 10G
#SBATCH --gpus-per-node l4:1

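PORT=16000 # please choose your own port number between 1024 and 49151

module purge
module load ollama/0.21.1

# Make the server listen on this node's host name at the chosen port
export OLLAMA_HOST=${HOSTNAME}:${PORT}

# Reverse tunnel: forward the same port on the login node you submitted from to this node
ssh -NfR ${PORT}:${HOSTNAME}:${PORT} ${SLURM_SUBMIT_HOST}

unset CUDA_VISIBLE_DEVICES # Slurm sets this to 0; ollama manages the GPU itself

ollama serve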
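Rather than choosing a port by hand, the job could draw one at random from the allowed range and print it, so you can read it from the job's Slurm output file. A minimal sketch, assuming GNU shuf and bash's /dev/tcp redirection are available on the compute node; pick_port is a hypothetical helper name. It only detects listeners reachable on 127.0.0.1, and it does not check that the port is also free on the login node for the SSH tunnel.

```shell
#!/bin/bash
# Hypothetical helper: print a random port in 1024-49151 that nothing is
# listening on at 127.0.0.1. Needs GNU coreutils `shuf` and bash's /dev/tcp.
pick_port() {
    local port
    for _ in {1..20}; do
        port=$(shuf -i 1024-49151 -n 1)
        # A successful connect means the port is taken; otherwise use it.
        if ! (exec 3<>"/dev/tcp/127.0.0.1/${port}") 2>/dev/null; then
            echo "${port}"
            return 0
        fi
    done
    echo "pick_port: no free port found after 20 tries" >&2
    return 1
}

# In the job script, in place of PORT=16000:
PORT=$(pick_port)
echo "Ollama port: ${PORT}"  # ends up in the job's Slurm output file
```

The batch job below uses a different approach (asking the OS for an unused port via Python), which avoids the retry loop but gives no control over the range.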

Once the job is running, run the following on the login node:

module load ollama
export OLLAMA_HOST=<nodename>:<port>
ollama run llama3.1:8b  # or any other ollama command, e.g. ollama list

Here <nodename> is the host name of the node running your job (shown in the NODELIST column of squeue --me, or by sacct --format JobID,NodeList), and <port> is the port you set in the job script.
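Looking up the node can be scripted with squeue. A minimal sketch, assuming the job name ollama-test from the script above and a single-node job; ollama_host_for_job is a hypothetical helper name. The %N format field prints the job's node list and --noheader drops the column title.

```shell
#!/bin/bash
# Hypothetical helper: print "<nodename>:<port>" for your running job
# with the given name, using squeue to find the node.
ollama_host_for_job() {
    local job_name=$1 port=$2 node
    node=$(squeue --me --name "${job_name}" --states RUNNING --noheader --format '%N' | head -n 1)
    if [[ -z "${node}" ]]; then
        echo "no running job named ${job_name}" >&2
        return 1
    fi
    echo "${node}:${port}"
}

# Usage on the login node, with the port from your job script:
# export OLLAMA_HOST=$(ollama_host_for_job ollama-test 16000)
```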

Batch Job

This script starts ollama serve in the background, waits until it is ready, runs your prompt, then exits. The server is killed automatically when the job ends.

#!/bin/bash -e

#SBATCH --account        nesi99991
#SBATCH --job-name       ollama-batch
#SBATCH --time           00:30:00
#SBATCH --mem            10G
#SBATCH --gpus-per-node  l4:1

module purge
module load ollama/0.21.1
unset CUDA_VISIBLE_DEVICES  # Slurm sets this to 0; ollama manages the GPU itself

# Ask the OS for an unused port: binding to port 0 makes it pick one, which we print and release
PORT=$(python3 -c "import socket; s=socket.socket(); s.bind(('', 0)); print(s.getsockname()[1]); s.close()")

export OLLAMA_HOST=${HOSTNAME}:${PORT}

# Start the server in the background; redirect its output to /dev/null to avoid noise
ollama serve &>/dev/null &

# Poll once per second until the server responds
until ollama list &>/dev/null; do sleep 1; done

echo "What is the capital of France?" | ollama run llama3.1:8b
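The until loop in the batch script polls forever, so if the server never starts (for example because the port was taken in the meantime), the job sits idle until it hits its time limit. A minimal sketch of a bounded wait; wait_for is a hypothetical helper name and the 120-second limit in the usage line is an arbitrary choice.

```shell
#!/bin/bash
# Hypothetical helper: run a command once per second until it succeeds,
# giving up (return 1) after the given number of seconds.
wait_for() {
    local timeout=$1 waited=0
    shift
    until "$@" &>/dev/null; do
        if (( waited >= timeout )); then
            echo "wait_for: '$*' still failing after ${timeout}s" >&2
            return 1
        fi
        sleep 1
        waited=$(( waited + 1 ))  # plain assignment, safe under bash -e
    done
}

# In the batch script, in place of the until loop:
# wait_for 120 ollama list || exit 1
```

Failing fast this way ends the job early and releases the GPU instead of holding it for the full --time.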

Debugging

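For verbose server logs, set OLLAMA_DEBUG=1 in the environment (for example with export OLLAMA_DEBUG=1) before ollama serve starts. In the interactive run, the server output goes to the job's Slurm output file.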
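In the batch job the server output is sent to /dev/null, so debug logs would be lost. A sketch of keeping them in a file instead; the log file name is an arbitrary choice, and SLURM_JOB_ID is the job ID variable Slurm sets inside the job.

```shell
# In the batch script, in place of `ollama serve &>/dev/null &`:
export OLLAMA_DEBUG=1
ollama serve &> "ollama-serve-${SLURM_JOB_ID}.log" &
```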