The goal: fault-tolerant infrastructure I can rebuild from a single nix build.
The homelab is a real production environment, just for one person. Proxmox cluster on bare metal, UniFi networking, Splunk indexers, Cribl Edge collectors, Home Assistant, a docker-host VM for the necessary evil of vendor-locked containers.

Hardware footprint

| Layer | What’s there | Notes |
| --- | --- | --- |
| Compute | 3-node Proxmox cluster: pve1, pve2, pve3 | Heterogeneous mix: single-engineer homelab, parts opportunistically combined; 3 nodes give natural majority quorum |
| Local LLM | Dedicated bare-metal NixOS box with discrete GPU and a large local model library | Outside the Proxmox cluster; GPU-bound workload kept off the hypervisor to avoid passthrough overhead |
| Storage | ZFS on Proxmox hosts; SAS backplane on one node for cluster bulk storage; NVMe for hot tiers | Mixed-tier by accident, kept by design: bulk on SAS, working sets on NVMe |
| Networking | UniFi end-to-end: gateway, switches (with 10G SFP+ uplinks), APs | Single-pane management; 10G fiber backbone where it matters |
| Power | Rack UPS for servers; separate UPS for the Home Assistant Pi | Active NUT monitoring planned once the LLM box is built |
| Rack management | Raspberry Pi running Home Assistant; iDRAC vKVM jump VM in cluster for Java Web Start console access | Old BMC firmware needs a Java Web Start client; the jump VM keeps Java off the laptop |
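
The "natural majority quorum" note for the 3-node cluster comes down to simple arithmetic. A minimal sketch, assuming one vote per node and a strict-majority rule (the function names are illustrative, not part of any Proxmox tooling):

```python
def quorum(total_votes: int) -> int:
    """Smallest vote count that forms a strict majority."""
    if total_votes < 1:
        raise ValueError("a cluster needs at least one vote")
    return total_votes // 2 + 1


def tolerated_failures(total_votes: int) -> int:
    """How many voters can be lost while a majority still remains."""
    return total_votes - quorum(total_votes)


if __name__ == "__main__":
    # Compare a few cluster sizes, assuming one vote per node.
    for nodes in (2, 3, 4, 5):
        print(nodes, quorum(nodes), tolerated_failures(nodes))
```

Under those assumptions, three nodes need two votes and survive the loss of one node, while a two-node cluster cannot lose any node without losing quorum.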

Network topology

The UniFi gateway sits at the centre of the LAN; the Proxmox cluster, personal devices, and the bare-metal LLM box all hang off it. WireGuard tunnels traverse the Internet → UniFi edge. Per-service VLANs encode workload tier. Diagram and the network-as-code that defines it: tofu-unifi.

Data flow

UniFi gear and host telemetry feed HAProxy → Cribl Edge → Splunk → AWS DR. Full log and NetFlow pipelines: Data pipelines.

Container philosophy

LXC is the default for production homelab services; Docker is the exception, fenced off to a dedicated docker-host VM whenever a vendor ships Docker-only images. The four-question decision tree: LXC vs Docker.
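
The default-plus-exception rule stated above can be written as a single function. This is only a sketch of that one rule, not the four-question decision tree (which lives on the linked page); `Runtime` and `choose_runtime` are hypothetical names introduced for illustration:

```python
from enum import Enum


class Runtime(Enum):
    LXC = "LXC on the Proxmox cluster"
    DOCKER_HOST_VM = "dedicated docker-host VM"


def choose_runtime(vendor_ships_docker_only: bool) -> Runtime:
    """LXC by default; Docker only when the vendor ships Docker-only images."""
    if vendor_ships_docker_only:
        return Runtime.DOCKER_HOST_VM
    return Runtime.LXC


if __name__ == "__main__":
    print(choose_runtime(False).value)
    print(choose_runtime(True).value)
```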

What runs where

Most workloads run as LXC on the Proxmox cluster: HAProxy, Cribl Edge, Home Assistant, Qdrant. Splunk Enterprise gets a bare-metal-ish VM for network volume. Docker is fenced off to a single docker-host VM. Local LLM inference runs bare-metal on NixOS to dodge passthrough overhead. Full per-workload inventory: Infrastructure overview.
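
For a quick mental model, the placement described in this paragraph can be held as a small lookup table. The sketch below only restates that paragraph; the dictionary layout, the placement labels, and the `group_by_placement` helper are assumptions for illustration, and the linked inventory remains the authoritative list:

```python
from collections import defaultdict

# Workload -> placement, as described in the paragraph above.
PLACEMENT = {
    "HAProxy": "LXC on Proxmox",
    "Cribl Edge": "LXC on Proxmox",
    "Home Assistant": "LXC on Proxmox",
    "Qdrant": "LXC on Proxmox",
    "Splunk Enterprise": "VM on Proxmox",
    "Docker-only vendor images": "docker-host VM",
    "Local LLM inference": "bare-metal NixOS",
}


def group_by_placement(placement: dict[str, str]) -> dict[str, list[str]]:
    """Invert the mapping: placement -> workloads, preserving insertion order."""
    groups: defaultdict[str, list[str]] = defaultdict(list)
    for workload, target in placement.items():
        groups[target].append(workload)
    return dict(groups)


if __name__ == "__main__":
    for target, workloads in group_by_placement(PLACEMENT).items():
        print(f"{target}: {', '.join(workloads)}")
```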

Provisioning + configuration

tofu-proxmox builds VMs and LXCs. ansible-proxmox configures the host. ansible-proxmox-apps layers the apps on top. The macOS counterpart, which runs the monitoring stack on Kubernetes, is orbstack-kubernetes.

DR plan

tofu-aws defines a cold AWS footprint sized to take a Splunk failover. Cribl Edge routes can be flipped to the AWS HEC endpoint via config change. Details: tf-splunk-aws.
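
The "flip via config change" idea amounts to switching one destination value. A minimal sketch, assuming the standard Splunk HEC event path (`/services/collector/event`) and default HEC port 8088; the hostnames and the `hec_url` helper are hypothetical, and this is not Cribl Edge's actual configuration format:

```python
from urllib.parse import urlunsplit

# Hypothetical hostnames; the real endpoints are not given in this document.
HEC_HOSTS = {
    "primary": "splunk.home.example",
    "dr": "splunk-dr.aws.example",
}

HEC_EVENT_PATH = "/services/collector/event"  # standard Splunk HEC event endpoint


def hec_url(target: str, port: int = 8088) -> str:
    """Build the HEC URL for the chosen target ('primary' or 'dr')."""
    host = HEC_HOSTS[target]  # raises KeyError for an unknown target
    return urlunsplit(("https", f"{host}:{port}", HEC_EVENT_PATH, "", ""))


if __name__ == "__main__":
    # Failover is a one-value change: point the destination at "dr".
    print(hec_url("primary"))
    print(hec_url("dr"))
```

Keeping the target as a single key means a failover touches one setting rather than every route definition.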