Ollama
Get up and running with large language models.
For a list of available models, see AI Models.
Interactive Run
Warning
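We don't recommend running ollama this way except for small test jobs. The GPU stays allocated for the whole job but sits idle whenever the server is waiting for input, which is a very inefficient use of GPUs.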
#!/bin/bash -e
#SBATCH --account nesi99991
#SBATCH --job-name ollama-test
#SBATCH --time 01:00:00
#SBATCH --mem 10G
#SBATCH --gpus-per-node l4:1
PORT=16000 # please choose your own port number between 1024 and 49151
module purge
module load ollama/0.21.1
export OLLAMA_HOST=${HOSTNAME}:${PORT}
# Open a reverse tunnel so that ${PORT} on the submit (login) node forwards to this node.
ssh -NfR ${PORT}:${HOSTNAME}:${PORT} ${SLURM_SUBMIT_HOST}
unset CUDA_VISIBLE_DEVICES # Slurm sets this to 0
ollama serve
Then, on the login node, run:
module load ollama
export OLLAMA_HOST=<nodename>:<port>
ollama
Here <nodename> is the hostname of the node running your job (you can find it with sacct or squeue --me),
and <port> is the port number you selected in the job script.
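As a rough sketch, the node name can also be looked up from the command line instead of read off the squeue table. The helper name `job_node` is hypothetical, and the job name and port in the usage comment are the ones from the example script above. It assumes the job is already running on a single node.

```shell
# job_node is a hypothetical helper, not part of Slurm or ollama.
# Prints the node list of your running job(s) with the given job name.
job_node() {
    squeue --me --name "$1" --states RUNNING --noheader --format '%N'
}

# Usage on the login node (assumes the job name and port from the script above):
#   export OLLAMA_HOST="$(job_node ollama-test):16000"
```

If the output is empty, the job is probably still pending. Check its state with squeue --me.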
Batch Job
The script below starts ollama serve in the background, waits for it to be ready, runs your prompt, then exits.
When the job ends, Slurm kills the server automatically.
#!/bin/bash -e
#SBATCH --account nesi99991
#SBATCH --job-name ollama-batch
#SBATCH --time 00:30:00
#SBATCH --mem 10G
#SBATCH --gpus-per-node l4:1
module purge
module load ollama/0.21.1
unset CUDA_VISIBLE_DEVICES # Slurm sets this to 0; ollama manages the GPU itself
# Will assign a random free port number to `PORT`
PORT=$(python3 -c "import socket; s=socket.socket(); s.bind(('', 0)); print(s.getsockname()[1]); s.close()")
export OLLAMA_HOST=${HOSTNAME}:${PORT}
# Redirect server output to `/dev/null` to avoid noise.
ollama serve &>/dev/null &
until ollama list &>/dev/null; do sleep 1; done
echo "What is the capital of France?" | ollama run llama3.1:8b
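The readiness loop in the script above retries forever, so if the server fails to start the job idles until it hits its time limit. One possible guard is a retry helper with a timeout. This is a sketch only: `wait_for` is a hypothetical function, not part of ollama or Slurm, and the 120-second limit in the usage comment is an arbitrary choice.

```shell
# wait_for is a hypothetical helper, not part of ollama or Slurm.
# Usage: wait_for <max_retries> <command> [args...]
# Runs the command, retrying once per second. Returns 0 as soon as the
# command succeeds, or 1 once it has failed after <max_retries> retries.
wait_for() {
    local max_retries=$1
    shift
    local retries=0
    until "$@" >/dev/null 2>&1; do
        if [ "$retries" -ge "$max_retries" ]; then
            return 1
        fi
        sleep 1
        retries=$((retries + 1))
    done
    return 0
}

# In the batch script this could replace the `until` loop, for example:
#   wait_for 120 ollama list || { echo "ollama serve did not start" >&2; exit 1; }
```

Failing fast this way releases the GPU instead of holding it for the rest of the requested time.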
Debugging
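For verbose server logs, set OLLAMA_DEBUG=1 before ollama serve. In the batch job above, also redirect the server's output to a file instead of /dev/null, otherwise the logs are discarded.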
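A minimal sketch of a debug start-up for the batch job, assuming you want the server log kept in the job's working directory. The function name `start_debug_server` and the log file name are hypothetical; only OLLAMA_DEBUG and ollama serve come from the text above.

```shell
# start_debug_server is a hypothetical helper; the log file name is an assumption.
# Starts `ollama serve` in the background with verbose logging, writing
# stdout and stderr to a per-job log file.
start_debug_server() {
    export OLLAMA_DEBUG=1
    # SLURM_JOB_ID is set by Slurm inside a job; fall back to "nojob" elsewhere.
    local log_file="ollama-serve-${SLURM_JOB_ID:-nojob}.log"
    ollama serve >"${log_file}" 2>&1 &
}

# In the batch script this could replace the `ollama serve &>/dev/null &` line:
#   start_debug_server
```

Debug logs can be large, so switch back to discarding the output once the problem is found.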