deployment.json is the single desired-state input for tofu-proxmox. It lives as one versioned object in a private S3 store, is fetched fail-loud at every plan/apply, and is mutated by exactly one session at a time. This page is the one place that contract is defined — everything else links here.
deployment.json declares every Proxmox guest, pool, node, and storage target the homelab should have. It is the input to tofu-proxmox, not its output.
Getting its handling wrong is uniquely dangerous. An earlier model kept it as a gitignored, per-worktree local file, so copies drifted between machines, and a missing file let try(jsondecode(file()), {}) silently decode to {} and plan a full destroy. That failure mode is why this contract exists and why it is written down exactly once.
The rules below are ACID by design. Read them as the binding contract for any tofu or Ansible session that touches deployment state. Do not restate them elsewhere; link to this page.

Two object stores, never confused

Two S3-compatible stores are in play. They share no credentials and serve different jobs. Conflating them is the most common mistake.
| Store | Holds | Reached with |
| --- | --- | --- |
| Private on-prem S3 (iac-inventory bucket, deployment.json key) | the desired-state input object, versioned | Doppler S3_* creds + S3_ENDPOINT, via aws s3 cp in Terragrunt |
| AWS S3 (tfstate-proxmox-<account>, us-east-2) | the Terraform state output | aws-vault AWS_* STS creds, the backend remote_state block |
The input fetch uses S3_* creds and the on-prem endpoint; the state backend uses AWS_* STS creds and AWS. They never overlap.

The ACID guarantees

| Property | What it means for deployment.json | How it is enforced |
| --- | --- | --- |
| Atomicity | A plan/apply sees the whole input or none of it, never a partial or empty desired-state. | Terragrunt fetches the single object with aws s3 cp … -; a missing or blank object makes the fetch exit non-zero and run_cmd raises. There is no try() fallback, so {} can never reach a plan. Writes replace the whole object (S3 PutObject), never edit in place. |
| Consistency | Only a structurally valid desired-state is ever applied or published. | deployment.schema.json requires containers (≥1), nodes, pools, and proxmox_node; an empty map fails before any plan. The rendered Ansible inventory is re-validated against the schema before it is distributed, so a partial -target apply cannot publish a truncated inventory. Container keys must equal Terraform state keys. |
| Isolation | Exactly one session mutates infrastructure at a time; concurrent applies serialize instead of colliding. | The OpenTofu state lock (use_lockfile, S3 conditional write) admits one writer. -lock-timeout=10m makes a second apply wait for the holder rather than fail, so agents, hooks, and parallel sessions queue cleanly. |
| Durability | A committed desired-state survives process, worktree, and machine loss. | The input is a versioned object in iac-inventory (history retained); the state is durable in AWS S3. Nothing depends on a local copy: there is no authoritative file on any laptop. |
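The atomicity row can be illustrated with a minimal Python sketch. This is not the Terragrunt implementation (which relies on run_cmd and aws s3 cp); the function name load_desired_state is hypothetical. It only shows the contrast with the old try(..., {}) fallback: blank or undecodable input raises instead of quietly becoming an empty map.

```python
import json


def load_desired_state(raw: str) -> dict:
    """Decode fetched deployment.json text, raising instead of defaulting to {}."""
    if not raw.strip():
        # A missing or blank object must abort the run, never yield an empty map.
        raise ValueError("deployment.json is empty; refusing to plan")
    data = json.loads(raw)  # invalid JSON raises json.JSONDecodeError (a ValueError)
    if not isinstance(data, dict) or not data:
        raise ValueError("deployment.json must be a non-empty JSON object")
    return data
```

The design point is that there is no default value anywhere in the function, so every failure path ends in an exception rather than in a plan.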

Reading the input

The canonical source is the iac-inventory object, fetched fresh on every plan/apply. There is no blessed local copy.
  • Never trust or hand-edit a local deployment.json. Any file on disk is a transient fetch artifact, not the source of truth. Delete stale local copies.
  • DEPLOYMENT_JSON_PATH exists only for offline or bootstrap work; it points Terragrunt at a local file instead of S3. Do not use it as a normal workflow.
  • Ansible consumers never read deployment.json directly. They read the published inventory that tofu-proxmox renders, validates, and distributes after each apply.

Writing the input

All infrastructure changes — containers, VMs, pools, sizing — are edits to the S3 object, applied through one writer:
  1. Fetch the current object from iac-inventory.
  2. Edit the desired-state.
  3. Validate against deployment.schema.json before upload.
  4. Upload the whole object back (replaces the previous version; the bucket keeps history).
Two hard prohibitions:
  • Never git add deployment.json. The repo keeps only deployment.json.example as a shape reference; the live file is gitignored by design.
  • Never create terraform.tfvars. It silently overrides deployment.json through Terraform variable precedence and is gitignored, so it does not travel between worktrees — the exact drift this contract removes. If one appears in a worktree, delete it.
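The four write steps above can be sketched as a single function. This is an illustrative outline, not the project's tooling: update_desired_state and its four callables are hypothetical names, and the fetch and upload callables are assumed to wrap whatever transport reaches the iac-inventory object.

```python
import json
from typing import Callable


def update_desired_state(
    fetch: Callable[[], str],
    edit: Callable[[dict], dict],
    validate: Callable[[dict], None],
    upload: Callable[[str], None],
) -> dict:
    """Fetch, edit, validate, then replace the whole object; never upload invalid state."""
    current = json.loads(fetch())          # 1. fetch the current object
    updated = edit(current)                # 2. edit the desired-state
    validate(updated)                      # 3. raises before anything is uploaded
    upload(json.dumps(updated, indent=2))  # 4. whole-object replace, no in-place patch
    return updated
```

Because validate runs before upload and is expected to raise on failure, an invalid edit leaves the stored object untouched.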

The schema

deployment.schema.json is the consistency gate. Its job is to reject an empty or structurally broken input before any plan runs.
| Key | Requirement |
| --- | --- |
| containers | object, ≥1 entry (an empty map is the destroy footgun the schema exists to catch) |
| nodes | object, ≥1 entry; cluster node identity |
| pools | object, ≥1 entry |
| proxmox_node | non-empty string |
additionalProperties stays true so per-environment extras and _-prefixed inline-comment keys never false-fail; only the load-bearing shape is enforced. Each container requires vm_id (≥100), hostname, and vlan.
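As a rough illustration of what the gate checks, the requirements above can be hand-written in Python. This is a sketch under assumptions, not a substitute for deployment.schema.json: the function name validate_deployment is hypothetical, and treating every entry under containers as an object is an assumption about the real schema.

```python
def validate_deployment(doc: dict) -> list[str]:
    """Return a list of violations of the required shape; an empty list means valid."""
    errors = []
    for key in ("containers", "nodes", "pools"):
        value = doc.get(key)
        if not isinstance(value, dict) or len(value) < 1:
            errors.append(f"{key}: must be an object with at least one entry")
    node = doc.get("proxmox_node")
    if not isinstance(node, str) or not node:
        errors.append("proxmox_node: must be a non-empty string")
    containers = doc.get("containers")
    if isinstance(containers, dict):
        for name, spec in containers.items():
            if not isinstance(spec, dict):
                errors.append(f"containers.{name}: must be an object")
                continue
            for field in ("vm_id", "hostname", "vlan"):
                if field not in spec:
                    errors.append(f"containers.{name}: missing {field}")
            vm_id = spec.get("vm_id")
            if isinstance(vm_id, int) and vm_id < 100:
                errors.append(f"containers.{name}: vm_id must be >= 100")
    # Unknown top-level keys are deliberately ignored, mirroring additionalProperties: true.
    return errors
```

Calling validate_deployment({}) reports all four top-level violations, which is the empty-input case the schema exists to reject.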

Authoring containers

Keep container entries compact — the module supplies the defaults:
  • Omit root_disk.datastore_id; it defaults to local-zfs.
  • Omit network_interfaces when you want the Proxmox firewall on (the default).
  • Include network_interfaces only to set firewall: false (e.g. DNS servers, management tools).
"my-container": {
  "vm_id": 123,
  "hostname": "my-container",
  "description": "What it runs",
  "cpu_cores": 2,
  "memory_dedicated": 2048,
  "vlan": "compute",
  "tags": ["terraform", "container"],
  "pool_id": "infrastructure",
  "root_disk": { "size": 16 }
}
Key-name alignment is load-bearing. A container’s key must match its Terraform state key exactly — a mismatch triggers destroy + recreate. Verify against terragrunt state list before adding an entry for a guest that already exists.
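The key check can be partly automated. The sketch below assumes that container resources appear in terragrunt state list output with the container key as a quoted for_each index (for example, an address ending in ["my-container"]); that address shape and the helper names state_keys and key_drift are assumptions, not something this page specifies.

```python
import re

# Matches a for_each instance key such as ["my-container"] in a state address.
_INSTANCE_KEY = re.compile(r'\["([^"]+)"\]')


def state_keys(state_list_output: str) -> set[str]:
    """Collect every quoted instance key from `state list` output lines."""
    keys = set()
    for line in state_list_output.splitlines():
        keys.update(_INSTANCE_KEY.findall(line))
    return keys


def key_drift(deployment_keys: set[str], state: set[str]) -> dict[str, set[str]]:
    """Report keys present on only one side; either set being non-empty needs review."""
    return {
        "only_in_deployment": deployment_keys - state,
        "only_in_state": state - deployment_keys,
    }
```

In practice the state list output would need to be filtered to the container resource first, since other resources (pools, for instance) may also use quoted keys. A key in only_in_state alongside a near-identical key in only_in_deployment is the typical signature of a rename that would destroy and recreate the guest.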

Single-writer locking — the direction

The lock that gives isolation is the OpenTofu state lock, and the canonical mechanism is S3-native use_lockfile (conditional writes via If-None-Match, OpenTofu ≥ 1.10). It needs no separate lock service. This is the modern default: Terraform has deprecated the dynamodb_table argument, and OpenTofu recommends native S3 locking for new backends.
tofu-proxmox currently holds both use_lockfile and a legacy DynamoDB lock table. This is the “run both, then drop DynamoDB” baking period that OpenTofu documents for migrating off DynamoDB. Dropping the DynamoDB leg forces a backend re-init, so it is sequenced as its own change (tracked in tofu-proxmox#353). The target end-state is use_lockfile only. tofu-unifi has already completed this migration and runs lockfile-only.
Lock waiting is separate from the lock mechanism. -lock-timeout=10m (set on every locking command) is what turns “Error acquiring the state lock” into a clean queue when two sessions apply at once — keep it regardless of which lock backend is active.
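The split between the lock mechanism and lock waiting can be shown with a toy in-memory model. This is not OpenTofu's implementation and it is not thread-safe; ConditionalStore, put_if_absent, and acquire are hypothetical names. put_if_absent stands in for a create-only-if-absent write, and acquire stands in for the retry loop that a lock timeout enables.

```python
import time


class ConditionalStore:
    """In-memory stand-in for a store that supports create-only-if-absent writes."""

    def __init__(self) -> None:
        self._objects: dict[str, str] = {}

    def put_if_absent(self, key: str, value: str) -> bool:
        # Mirrors the If-None-Match idea: the write succeeds only when no object exists.
        if key in self._objects:
            return False
        self._objects[key] = value
        return True

    def delete(self, key: str) -> None:
        self._objects.pop(key, None)


def acquire(store: ConditionalStore, key: str, holder: str,
            timeout: float, poll: float = 0.05) -> bool:
    """Retry the conditional write until it succeeds or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if store.put_if_absent(key, holder):
            return True
        if time.monotonic() >= deadline:
            return False  # comparable to giving up with a lock error
        time.sleep(poll)
```

With timeout=0 a second caller fails immediately while the lock is held; with a longer timeout the same caller keeps polling and succeeds once the holder deletes the lock object, which is the queueing behavior the timeout is there to provide.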

Variant: tofu-unifi

tofu-unifi solves the same single-writer problem with a different input model, and that divergence is intentional. Instead of one S3-fetched object, it keeps committed per-domain files under deployment/*.json (one top-level domain key per file), shallow-merged at plan time, with use_lockfile-only state locking. Its surface is small and public-safe, so committed config is the right call there. Converging it onto the S3-input model is a roadmap item, not a current requirement.
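To make "shallow-merged" concrete, here is a minimal Python sketch. It assumes later files win on a key collision, which is how a plain top-level merge usually behaves but is not confirmed by this page; the function name shallow_merge is hypothetical.

```python
def shallow_merge(documents: list[dict]) -> dict:
    """Merge top-level keys only; a later document replaces an earlier key wholesale."""
    merged: dict = {}
    for doc in documents:
        merged.update(doc)
    return merged
```

Because nothing below the top level is merged, two files that define the same domain key would not combine their contents; the later one would replace the earlier one. Keeping one top-level domain key per file avoids that collision.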

See also

  • IaC tooling: why Terragrunt stays in tofu-proxmox/tofu-unifi, and the three-layer input merge this object feeds.
  • tofu-proxmox: the repo that fetches, applies, and publishes from this object.
  • Terraform on AWS: the state backend, S3 conditional-write locking, and IAM isolation model.
  • SOPS for IaC: the encrypted-at-rest layer merged alongside this object.