How you reach it
Teal is your machine, ink is the DNS + reverse-proxy edge, coral is the LLM stack. Both DNS names — the chat UI and the raw API — terminate at the same GPU Ollama. Every name resolves through Technitium and is fronted by Traefik with a wildcard certificate, so it is HTTPS end to end.What’s in the stack
| Container | Does | Reached at |
|---|---|---|
hermes-infer | Ollama on the RX 6800 (ROCm), serves hermes4 | https://ollama.<domain> |
hermes-chat | Open WebUI chat front-end | https://llm.<domain> |
llamaindex | CPU Ollama (nomic-embed-text) for embeddings | internal (RAG) |
qdrant | Vector store for retrieval | https://qdrant.<domain> |
ai VLAN. hermes-infer is a privileged LXC with the GPU
passed through (/dev/kfd + /dev/dri); the model lives on a 120 GB volume.
The LXCs and the Traefik ingress are provisioned by
tofu-proxmox; Ollama, ROCm, the
model pull, and Open WebUI are configured by
ansible-proxmox-apps.
This is not the same “local AI” as the
Apple Silicon stack. That one is the MLX
server on this MacBook (also port 11434), tuned to hold one resident model.
This page is the homelab GPU stack — a different machine, a bigger
model, always on, shared across the LAN.
Use it from your Mac
Everything below is reachable by DNS name over HTTPS. Replaceexample.net with
your homelab’s internal domain.
1 · Browser
Openhttps://llm.example.net, sign in, pick hermes4, and chat. This is
the full Open WebUI — conversation history, system prompts, file uploads.
2 · ollama CLI
Point the CLI at the remote GPU instead of running a model locally:
3 · OpenAI-compatible API
Ollama speaks the OpenAI API, so any OpenAI client works — just change the base URL. Drop this into editors, scripts, and SDKs:ollama. endpoint is unauthenticated — it is LAN-internal, the same
posture as the other homelab dashboards. If you want an authenticated path,
Open WebUI also exposes https://llm.example.net/v1: generate an API key under
Settings → Account and send it as a bearer token.
Related
tofu-proxmox
Provisions the LXCs, GPU passthrough, and the Traefik ingress entries.
ansible-proxmox-apps
Installs Ollama + ROCm, pulls Hermes-4, configures Open WebUI.
LXC vs Docker
Why the inference stack runs as native LXC, not Docker.
AI development pipeline
The other “local AI” — MLX on the workstation, via Bifrost.