llama.cpp in Docker
build llama.cpp inside a Docker container with AMD ROCm support
Github Gist - Docker files
Let's build llama.cpp inside a Docker container with the AMD ROCm + HIP toolchain installed. The result is a container image that can compile optimized HIP binaries for our AMD GPU and run GGUF models with full GPU acceleration.
Ubuntu container image with ROCm/HIP
ROCm is very sensitive to ABI mismatches between the kernel driver, HIP runtime, and system libraries, which can cause subtle runtime failures even when builds succeed. AMD recommends Ubuntu because ROCm is built and validated against specific Ubuntu releases, keeping these ABIs aligned and predictable.
I used the Docker image `rocm/dev-ubuntu-24.04` because ROCm HIP support and the developer tooling (`hipcc`, `hipconfig`, runtime libraries) come preinstalled. Using this image avoids hunting for missing system packages and subtle ABI mismatches inside the container.
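A quick sanity check that the HIP toolchain is actually present in the image before committing to it (a minimal sketch; `hipconfig` is standard ROCm tooling, but verify the output against your ROCm version):

# pull the image and print the HIP version it ships with
docker pull rocm/dev-ubuntu-24.04:latest
docker run --rm rocm/dev-ubuntu-24.04:latest hipconfig --version
# if hipconfig is not on PATH in the image, use /opt/rocm/bin/hipconfig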
Preparing the Container for GPU Access
We want to run the build from a Dockerfile eventually, but let's reproduce the steps manually first.
Set these shell variables:
- `LLAMACPP_ROCM_ARCH` — the AMD GPU architecture to target for optimized binaries, e.g. `gfx1101` or `gfx1102` for some RDNA3 cards
- `HIP_VISIBLE_DEVICES` — which GPU(s) the ROCm runtime should expose to the process inside the container. It takes a comma-separated list of GPU indices: `HIP_VISIBLE_DEVICES=0` exposes only GPU 0, `HIP_VISIBLE_DEVICES=0,1` exposes GPUs 0 and 1.
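For example (the architecture value is whatever `rocminfo` reports for your card on the host; `gfx1101` is just the value used throughout this article):

# target a single RDNA3 architecture and expose only the first GPU
export LLAMACPP_ROCM_ARCH=gfx1101
export HIP_VISIBLE_DEVICES=0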
docker run -it \
--name=llamacpp_build_01 \
--privileged \
--network=host \
--device=/dev/kfd \
--device=/dev/dri \
--group-add video \
--cap-add=SYS_PTRACE \
--security-opt seccomp=unconfined \
--ipc=host \
--shm-size 16G \
-v /home/bj/LLM_MODELS:/data \
rocm/dev-ubuntu-24.04:latest
Let's walk through it line by line, focusing on why each flag is needed:
- `docker run -it` runs a new container and attaches our terminal to it, giving us an interactive prompt inside the container. `-i` keeps STDIN open, `-t` allocates a pseudo-TTY so you get a normal shell.
- `--name=llamacpp_build_01` gives the container a usable name, instead of having to use a container ID like 36569e4cb3cd. Later, `docker start -ai llamacpp_build_01` attaches and runs an interactive shell inside the container, and `docker exec -it llamacpp_build_01 bash` starts a shell inside an already running container.
- `--privileged` gives the container access to the host GPU and other devices
- `--network=host` shares the host's network stack, for port forwarding etc.
- `--device=/dev/kfd` passes the Kernel Fusion Driver device into the container, which lets HIP/OpenCL programs communicate with the GPU
- `--device=/dev/dri` passes the Direct Rendering Infrastructure devices, used for GPU enumeration, memory mapping etc.
- `--group-add video` adds the container user to the `video` group, since the `/dev/dri/*` devices are owned by the `video` group
- `--cap-add=SYS_PTRACE` allows the container to trace processes, used by llama.cpp and ROCm runtime tools (helpful for debugging)
- `--security-opt seccomp=unconfined` disables Docker's default seccomp profile, which blocks some syscalls needed by GPU tools
- `--ipc=host` shares the host's IPC namespace, improving performance for GPU runtimes that rely on shared memory segments
- `--shm-size 16G` increases the `/dev/shm` size inside the container, for large model inference, tensor buffers etc.
- `-v $HOME/LLM_MODELS:/data` mounts our model directory into the container
In short: launch a privileged, GPU-enabled, interactive ROCm development container with access to our AMD GPU and local model files, ready for building llama.cpp and running large LLM workloads.
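Once inside the container, it's worth confirming the GPU is actually visible to ROCm before building anything (both tools ship with ROCm; use the full `/opt/rocm/bin/...` path if they are not on `PATH`):

# should list your GPU's architecture, e.g. gfx1101
rocminfo | grep -i gfx
# shows VRAM, temperature and utilization per GPU
rocm-smi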
Update and build llama.cpp
- Install build dependencies
apt update && apt install -y nano libcurl4-openssl-dev cmake git
- Get the llama.cpp source
mkdir -p /workspace && cd /workspace
git clone https://github.com/ggml-org/llama.cpp.git
cd llama.cpp
- Set the target GPU architecture and build using CMake with HIP enabled
export LLAMACPP_ROCM_ARCH="gfx1101;gfx1102"
HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -R)" \
cmake -S . -B build -DGGML_HIP=ON \
    -DAMDGPU_TARGETS="$LLAMACPP_ROCM_ARCH" \
    -DCMAKE_BUILD_TYPE=Release -DLLAMA_CURL=ON
cmake --build build --config Release -j$(nproc)
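Once the build finishes, the binaries land in `build/bin`. A quick smoke test (the model path is only an example, use any GGUF file from the mounted `/data` directory; `-ngl 99` offloads all layers to the GPU):

# confirm the build and version info
./build/bin/llama-cli --version
# run a short prompt with all layers offloaded to the GPU
./build/bin/llama-cli -m /data/model.gguf -ngl 99 -p "Hello"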
Dockerfile: Building llama.cpp with ROCm
FROM rocm/dev-ubuntu-24.04:latest

# GPU architecture to compile for and default GPU visibility
# (override with --build-arg at build time or -e at run time)
ARG LLAMACPP_ROCM_ARCH=gfx1101
ARG HIP_VISIBLE_DEVICES=0
ENV LLAMACPP_ROCM_ARCH=${LLAMACPP_ROCM_ARCH}
ENV HIP_VISIBLE_DEVICES=${HIP_VISIBLE_DEVICES}

# Build dependencies
RUN apt update && apt install -y \
    nano \
    libcurl4-openssl-dev \
    cmake \
    git \
    ca-certificates \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /workspace
RUN git clone https://github.com/ggml-org/llama.cpp.git
WORKDIR /workspace/llama.cpp

# Configure and build with the HIP backend
RUN HIPCXX="$(/opt/rocm/bin/hipconfig -l)/clang" \
    HIP_PATH="$(/opt/rocm/bin/hipconfig -R)" \
    cmake -S . -B build \
    -DGGML_HIP=ON \
    -DAMDGPU_TARGETS="${LLAMACPP_ROCM_ARCH}" \
    -DCMAKE_BUILD_TYPE=Release \
    -DLLAMA_CURL=ON \
    && cmake --build build --config Release -j$(nproc)

COPY entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]
CMD ["--help"]
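A sketch of building and using the image (the tag `llamacpp-rocm` and the build-arg value are examples, adjust for your GPU):

docker build -t llamacpp-rocm \
  --build-arg LLAMACPP_ROCM_ARCH=gfx1101 .

docker run --rm \
  --device=/dev/kfd --device=/dev/dri --group-add video \
  --ipc=host --shm-size 16G \
  -v /home/bj/LLM_MODELS:/data \
  llamacpp-rocm --run -m /data/model.gguf -ngl 99 -p "Hello"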
Making the Container Usable
entrypoint.sh is a wrapper that turns the container into a friendly CLI. It interprets --run, --serve and --help and dispatches to llama-cli or llama-server. We can use the container like a command-line tool instead of manually invoking binaries.
#!/usr/bin/env bash
set -euo pipefail
LLAMA_BIN="/workspace/llama.cpp/build/bin/llama-cli"
SERVER_BIN="/workspace/llama.cpp/build/bin/llama-server"
usage() {
cat <<EOF
llama.cpp ROCm container
Usage:
--run Run llama-cli (default)
--serve Run llama-server
--help Show this help
Examples:
docker run IMAGE --run -m /data/model.gguf -p "Hello"
docker run IMAGE --serve -m /data/model.gguf --port 8080
docker run IMAGE --help
EOF
}
die() {
echo "error: $*" >&2
exit 1
}
check_bin() {
[ -x "$1" ] || die "binary not found or not executable: $1"
}
run_llama() {
check_bin "$LLAMA_BIN"
exec "$LLAMA_BIN" "$@"
}
run_server() {
check_bin "$SERVER_BIN"
exec "$SERVER_BIN" "$@"
}
if [ $# -eq 0 ]; then
usage
exit 0
fi
case "$1" in
--run)
shift
run_llama "$@"
;;
--serve)
shift
run_server "$@"
;;
--help|-h)
usage
exit 0
;;
*)
# Default mode: treat args as llama-cli flags
run_llama "$@"
;;
esac
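Because the image sets a custom ENTRYPOINT, getting a plain shell for debugging means overriding it (using the `llamacpp-rocm` tag assumed above):

docker run --rm -it \
  --device=/dev/kfd --device=/dev/dri --group-add video \
  --entrypoint bash \
  llamacpp-rocm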
Docker Compose: Running the ROCm Container
version: "3.9"
services:
  llamacpp-rocm:
    container_name: llamacpp_build_01
    # build the image from the Dockerfile above; the tag name is arbitrary
    build: .
    image: llamacpp-rocm:latest
    tty: true
    stdin_open: true
    privileged: true
    network_mode: host
    devices:
      - /dev/kfd
      - /dev/dri
    group_add:
      - video
    cap_add:
      - SYS_PTRACE
    security_opt:
      - seccomp=unconfined
    ipc: host
    shm_size: 16g
    volumes:
      - /home/bj/LLM_MODELS:/data
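Build the image through compose first (this assumes the Dockerfile and entrypoint.sh sit next to the compose file, matching the `build: .` entry above):

docker compose build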
Run the CLI
docker compose run --rm llamacpp-rocm \
--run -m /data/model.gguf -p "Hello"
Run the server
docker compose run --rm llamacpp-rocm \
--serve -m /data/model.gguf --port 8080
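With `network_mode: host`, the server is reachable directly on the host. llama-server exposes an HTTP API, including an OpenAI-compatible chat endpoint; a quick check (endpoint paths per current llama.cpp, verify against your build):

curl http://localhost:8080/health
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello"}], "max_tokens": 64}'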
Caveats
- ROCm is sensitive to ABI mismatches; the host kernel driver, ROCm version and Ubuntu image must remain aligned.
- `LLAMACPP_ROCM_ARCH` must match the GPU architecture, or HIP kernels may compile but fail or underperform.
- `HIP_VISIBLE_DEVICES` only controls GPU visibility inside the process; it does not replace proper device mounts.
- Large models place heavy pressure on shared memory. If `/dev/shm` is too small, inference may fail silently or run with severe and confusing performance degradation.
- `--privileged` simplifies ROCm usage but broadens the security surface; avoid it on untrusted hosts.
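A quick way to confirm how much shared memory is actually available inside a running container:

# run inside the container; should report the size passed via --shm-size
df -h /dev/shm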