The goal: fault-tolerant infrastructure I can rebuild from a single nix build.
The homelab is a real production environment, just for one person. Proxmox cluster on bare metal, UniFi networking, Splunk indexers, Cribl Edge collectors, Home Assistant, a docker-host VM for the necessary evil of vendor-locked containers.
Hardware footprint
| Layer | What's there | Notes |
|---|---|---|
| Compute | 3-node Proxmox cluster: pve1, pve2, pve3 | Heterogeneous mix: single-engineer homelab, parts opportunistically combined; 3 nodes give natural majority quorum |
| Local LLM | Dedicated bare-metal NixOS box with discrete GPU and a large local model library | Outside the Proxmox cluster: GPU-bound workload kept off hypervisor to avoid passthrough overhead |
| Storage | ZFS on Proxmox hosts; SAS backplane on one node for cluster bulk storage; NVMe for hot tiers | Mixed-tier by accident, kept by design: bulk on SAS, working sets on NVMe |
| Networking | UniFi end-to-end: gateway, switches (with 10G SFP+ uplinks), APs | Single-pane management; 10G fiber backbone where it matters |
| Power | Rack UPS for servers; separate UPS for the Home Assistant Pi | Active NUT monitoring planned once the LLM box is built |
| Rack management | Raspberry Pi running Home Assistant; iDRAC vKVM jump VM in cluster for Java Web Start console access | Old BMC firmware needs a Java Web Start client; the jump VM keeps Java off the laptop |
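
The quorum note in the Compute row comes down to simple arithmetic. A minimal sketch, assuming one vote per node and a strict-majority rule; the function names are illustrative and not part of any Proxmox tooling:

```python
def quorum_votes(total_nodes: int) -> int:
    """Smallest strict majority of a cluster with one vote per node (assumed)."""
    if total_nodes < 1:
        raise ValueError("cluster needs at least one node")
    return total_nodes // 2 + 1


def tolerated_failures(total_nodes: int) -> int:
    """Nodes that can go offline while the survivors still hold a majority."""
    return total_nodes - quorum_votes(total_nodes)


if __name__ == "__main__":
    # Compare a few cluster sizes: nodes, votes needed, failures tolerated.
    for n in (2, 3, 4, 5):
        print(n, quorum_votes(n), tolerated_failures(n))
```

Under these assumptions a 3-node cluster needs 2 votes and survives the loss of 1 node, while a 2-node cluster cannot lose any node without losing its majority.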
Network topology
The UniFi gateway sits at the centre of the LAN; the Proxmox cluster, personal devices, and the bare-metal LLM box all hang off it. WireGuard tunnels traverse the Internet to the UniFi edge. Per-service VLANs encode workload tier. Diagram and the network-as-code that defines it: tofu-unifi.
Data flow
UniFi gear and host telemetry feed HAProxy → Cribl Edge → Splunk → AWS DR. Full log and NetFlow pipelines: Data pipelines.
Container philosophy
LXC is the default for production homelab services; Docker is the exception, fenced off to a dedicated docker-host VM whenever a vendor ships Docker-only images. The four-question decision tree: LXC vs Docker.
What runs where
Most workloads run as LXC on the Proxmox cluster: HAProxy, Cribl Edge, Home Assistant, Qdrant. Splunk Enterprise gets a bare-metal-ish VM for network volume. Docker is fenced off to a single docker-host VM. Local LLM inference runs bare-metal on NixOS to dodge passthrough overhead. Full per-workload inventory: Infrastructure overview.
Provisioning + configuration
tofu-proxmox builds VMs and LXCs. ansible-proxmox configures the host. ansible-proxmox-apps layers the apps on top. The macOS counterpart that runs the monitoring stack as Kubernetes is orbstack-kubernetes.
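The three-stage ordering above can be sketched as a dry-run plan. This is only an illustration: it assumes each repo is checked out into a directory of the same name, and the inventory and playbook file names (`inventory.yml`, `site.yml`) are hypothetical placeholders. Nothing is executed; the function only renders the commands in order.

```python
import shlex

# Order taken from the text: build guests, configure hosts, then layer apps.
# Directory, inventory, and playbook names are hypothetical placeholders.
STAGES = [
    ("tofu-proxmox", ["tofu", "-chdir=tofu-proxmox", "apply"]),
    ("ansible-proxmox",
     ["ansible-playbook", "-i", "inventory.yml", "ansible-proxmox/site.yml"]),
    ("ansible-proxmox-apps",
     ["ansible-playbook", "-i", "inventory.yml", "ansible-proxmox-apps/site.yml"]),
]


def plan(stages=STAGES):
    """Return one printable 'stage: command' line per stage, in run order."""
    return [f"{name}: {shlex.join(cmd)}" for name, cmd in stages]


if __name__ == "__main__":
    for line in plan():
        print(line)
```

Keeping the stages in a single ordered list makes the dependency explicit: apps are never layered onto hosts that have not been built and configured first.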
DR plan
tofu-aws defines a cold AWS footprint sized to take a Splunk failover. Cribl Edge routes can be flipped to the AWS HEC endpoint via config change. Details: tf-splunk-aws.
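To make "flipped via config change" concrete at the HTTP level, here is a minimal sketch of building a Splunk HTTP Event Collector request against either site. It is not the Cribl Edge configuration itself; the hostnames, the token, and the `site` switch are all hypothetical stand-ins, and the request is only constructed, never sent.

```python
import json
import urllib.request

# Hypothetical endpoints; substitute the real home and AWS HEC URLs.
HEC_ENDPOINTS = {
    "home": "https://splunk.home.example:8088/services/collector/event",
    "aws": "https://splunk-dr.aws.example:8088/services/collector/event",
}


def build_hec_request(event: dict, token: str, site: str = "home") -> urllib.request.Request:
    """Build (but do not send) an HEC POST for the chosen site."""
    body = json.dumps({"event": event}).encode("utf-8")
    return urllib.request.Request(
        HEC_ENDPOINTS[site],
        data=body,
        headers={
            "Authorization": f"Splunk {token}",  # HEC token auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Failover in this sketch is a one-word change: site="home" -> site="aws".
    req = build_hec_request({"msg": "failover test"}, token="PLACEHOLDER", site="aws")
    print(req.get_method(), req.full_url)
```

The point of the sketch is that the event payload and auth scheme stay the same across sites; only the destination URL (and, in practice, the token) changes.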