deployment.json is the single desired-state input for tofu-proxmox. It lives as one versioned object in a private S3 store, is fetched fail-loud at every plan/apply, and is mutated by exactly one session at a time. This page is the one place that contract is defined; everything else links here.
deployment.json declares every Proxmox guest, pool, node, and storage target the homelab should have. It is the input to tofu-proxmox, not its output. Getting its handling wrong is uniquely dangerous: an earlier model kept it as a gitignored, per-worktree local file, so copies drifted between machines and a missing file let try(jsondecode(file()), {}) silently decode to {} and plan a full destroy. That failure mode is why this contract exists and why it is written down exactly once.
The rules below are ACID by design. Read them as the binding contract for any tofu or ansible session that touches deployment state — do not restate them elsewhere; link to this page.
Two object stores, never confused
Two S3-compatible stores are in play. They share no credentials and serve different jobs. Conflating them is the most common mistake.

| Store | Holds | Reached with |
|---|---|---|
| Private on-prem S3 (iac-inventory bucket, deployment.json key) | the desired-state input object, versioned | Doppler S3_* creds + S3_ENDPOINT, via aws s3 cp in Terragrunt |
| AWS S3 (tfstate-proxmox-<account>, us-east-2) | the Terraform state output | aws-vault AWS_* STS creds, the backend remote_state block |

The input fetch uses S3_* creds and the on-prem endpoint; the state backend uses AWS_* STS creds and AWS. They never overlap.
The ACID guarantees
| Property | What it means for deployment.json | How it is enforced |
|---|---|---|
| Atomicity | A plan/apply sees the whole input or none of it — never a partial or empty desired-state. | Terragrunt fetches the single object with aws s3 cp … -; a missing or blank object makes the fetch exit non-zero and run_cmd raises. There is no try() fallback, so {} can never reach a plan. Writes replace the whole object (S3 PutObject), never edit in place. |
| Consistency | Only a structurally valid desired-state is ever applied or published. | deployment.schema.json requires containers (≥1), nodes, pools, and proxmox_node; an empty map fails before any plan. The rendered Ansible inventory is re-validated against the schema before it is distributed, so a partial -target apply cannot publish a truncated inventory. Container keys must equal Terraform state keys. |
| Isolation | Exactly one session mutates infrastructure at a time; concurrent applies serialize instead of colliding. | The OpenTofu state lock (use_lockfile, S3 conditional write) admits one writer. -lock-timeout=10m makes a second apply wait for the holder rather than fail, so agents, hooks, and parallel sessions queue cleanly. |
| Durability | A committed desired-state survives process, worktree, and machine loss. | The input is a versioned object in iac-inventory (history retained); the state is durable in AWS S3. Nothing depends on a local copy — there is no authoritative file on any laptop. |
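The atomicity row is easiest to see by contrasting a fail-loud decode with the silent fallback it replaced. The following is a minimal Python sketch of that idea only, not the Terragrunt/HCL code the repo runs; the function and exception names are hypothetical.

```python
import json


class EmptyDesiredState(ValueError):
    """Raised when the fetched object is missing, blank, or not a JSON object."""


def decode_fail_loud(raw: bytes) -> dict:
    """Decode the fetched object, raising instead of falling back to {}."""
    text = raw.decode("utf-8").strip()
    if not text:
        raise EmptyDesiredState("fetched object is blank")
    # json.loads raises json.JSONDecodeError (a ValueError) on a truncated object.
    data = json.loads(text)
    if not isinstance(data, dict) or not data:
        raise EmptyDesiredState("desired-state must be a non-empty JSON object")
    return data


def decode_with_fallback(raw: bytes) -> dict:
    """The unsafe pattern: any decode failure silently becomes {}."""
    try:
        return json.loads(raw.decode("utf-8"))
    except ValueError:
        return {}
```

With the fallback version, a blank fetch yields {} and a planner would read that as "nothing should exist". With the fail-loud version the same input stops the run before any plan is computed.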
Reading the input
The canonical source is the iac-inventory object, fetched fresh on every plan/apply. There is no blessed local copy.
- Never trust or hand-edit a local deployment.json. Any file on disk is a transient fetch artifact, not the source of truth. Delete stale local copies.
- DEPLOYMENT_JSON_PATH exists only for offline or bootstrap work; it points Terragrunt at a local file instead of S3. Do not use it as a normal workflow.
- Ansible consumers never read deployment.json directly. They read the published inventory that tofu-proxmox renders, validates, and distributes after each apply.
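The read path above can be sketched as a small source-selection function. This is an illustration in Python, not the repo's Terragrunt logic: the function name is hypothetical, the exact aws s3 cp arguments the repo passes are not shown on this page, and the argv below is an assumption assembled from the bucket, key, and S3_ENDPOINT named earlier.

```python
from typing import List, Mapping

BUCKET = "iac-inventory"   # bucket named on this page
KEY = "deployment.json"    # object key named on this page


def input_source(env: Mapping[str, str]) -> List[str]:
    """Return an argv that would stream the desired-state to stdout."""
    local = env.get("DEPLOYMENT_JSON_PATH")
    if local:
        # Offline/bootstrap escape hatch only; never the normal workflow.
        return ["cat", local]
    endpoint = env.get("S3_ENDPOINT")
    if not endpoint:
        # Fail loud rather than guess an endpoint or fall back to a local file.
        raise RuntimeError("S3_ENDPOINT is not set")
    # "-" as the destination makes aws s3 cp write the object to stdout.
    return ["aws", "s3", "cp", f"s3://{BUCKET}/{KEY}", "-", "--endpoint-url", endpoint]
```

Calling input_source(os.environ) in a normal session should always return the S3 command; seeing the cat branch outside bootstrap work is a sign that DEPLOYMENT_JSON_PATH leaked into the environment.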
Writing the input
All infrastructure changes (containers, VMs, pools, sizing) are edits to the S3 object, applied through one writer:
- Fetch the current object from iac-inventory.
- Edit the desired-state.
- Validate against deployment.schema.json before upload.
- Upload the whole object back (replaces the previous version; the bucket keeps history).
- Never git add deployment.json. The repo keeps only deployment.json.example as a shape reference; the live file is gitignored by design.
- Never create terraform.tfvars. It silently overrides deployment.json through Terraform variable precedence and is gitignored, so it does not travel between worktrees: the exact drift this contract removes. If one appears in a worktree, delete it.
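The fetch, edit, validate, upload sequence can be sketched as one function whose ordering guarantees that an invalid document never reaches the store. This is a hedged Python outline: the function name and the four injected callables are hypothetical stand-ins for whatever actually talks to iac-inventory and runs the schema check.

```python
import json
from typing import Callable


def rewrite_desired_state(
    fetch: Callable[[], bytes],
    mutate: Callable[[dict], None],
    validate: Callable[[dict], None],
    upload: Callable[[bytes], None],
) -> dict:
    """Whole-object read-modify-write; validate must raise on a bad document."""
    current = json.loads(fetch())   # 1. fetch the whole current object
    mutate(current)                 # 2. edit the desired-state in memory
    validate(current)               # 3. raises before anything is written
    body = json.dumps(current, indent=2, sort_keys=True).encode("utf-8")
    upload(body)                    # 4. replace the whole object in one write
    return current
```

Because upload is the last step and receives the complete serialized document, there is no point at which a partial edit is visible to a concurrent reader.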
The schema
deployment.schema.json is the consistency gate. Its job is to reject an empty or structurally broken input before any plan runs.
| Key | Requirement |
|---|---|
| containers | object, ≥1 entry (an empty map is the destroy footgun the schema exists to catch) |
| nodes | object, ≥1 entry (cluster node identity) |
| pools | object, ≥1 entry |
| proxmox_node | non-empty string |
additionalProperties stays true so per-environment extras and _-prefixed inline-comment keys never false-fail; only the load-bearing shape is enforced. Each container requires vm_id (≥100), hostname, and vlan.
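To make the enforced shape concrete, here is a hand-rolled Python check that mirrors the requirements listed above. It is a sketch of the rules as this page states them, not the contents of deployment.schema.json and not a JSON Schema validator; the function name is hypothetical, and only the presence of hostname and vlan is checked because their types are not specified here.

```python
from typing import List


def check_shape(doc: dict) -> List[str]:
    """Return a list of violations; an empty list means the shape is acceptable."""
    errors = []
    for key in ("containers", "nodes", "pools"):
        value = doc.get(key)
        if not isinstance(value, dict) or len(value) < 1:
            errors.append(f"{key}: must be an object with at least one entry")
    node = doc.get("proxmox_node")
    if not isinstance(node, str) or not node:
        errors.append("proxmox_node: must be a non-empty string")
    containers = doc.get("containers")
    if isinstance(containers, dict):
        for name, entry in containers.items():
            if not isinstance(entry, dict):
                errors.append(f"containers.{name}: must be an object")
                continue
            for field in ("vm_id", "hostname", "vlan"):
                if field not in entry:
                    errors.append(f"containers.{name}: missing {field}")
            if "vm_id" in entry:
                vm_id = entry["vm_id"]
                # bool is a subclass of int in Python, so exclude it explicitly.
                if not isinstance(vm_id, int) or isinstance(vm_id, bool) or vm_id < 100:
                    errors.append(f"containers.{name}: vm_id must be an integer >= 100")
    # Unknown top-level keys are ignored, matching additionalProperties: true.
    return errors
```

An empty document fails on all four required keys, which is exactly the case the schema exists to stop before a plan runs.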
Authoring containers
Keep container entries compact; the module supplies the defaults:
- Omit root_disk.datastore_id; it defaults to local-zfs.
- Omit network_interfaces when you want the Proxmox firewall on (the default).
- Include network_interfaces only to set firewall: false (e.g. DNS servers, management tools).

Check terragrunt state list before adding an entry for a guest that already exists; container keys must equal Terraform state keys.
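The effect of the defaults can be illustrated with a small Python sketch. This is not the module's implementation: the helper names are hypothetical, and the shape of network_interfaces (a list of objects, each with an optional firewall flag) is an assumption made only for this example.

```python
import copy

DEFAULT_DATASTORE = "local-zfs"  # default named on this page


def expand(entry: dict) -> dict:
    """Return a copy of a compact entry with the datastore default filled in."""
    full = copy.deepcopy(entry)
    full.setdefault("root_disk", {}).setdefault("datastore_id", DEFAULT_DATASTORE)
    return full


def firewall_enabled(entry: dict) -> bool:
    """Firewall stays on unless an interface explicitly sets firewall to False."""
    # ASSUMPTION: network_interfaces is a list of dicts; the real shape may differ.
    interfaces = entry.get("network_interfaces") or []
    return not any(nic.get("firewall") is False for nic in interfaces)
```

A compact entry with only vm_id, hostname, and vlan therefore expands to one that uses local-zfs and keeps the firewall on, which is why neither key needs to be written out in the common case.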
Single-writer locking — the direction
The lock that gives isolation is the OpenTofu state lock, and the canonical mechanism is S3-native use_lockfile (conditional writes via If-None-Match, OpenTofu ≥ 1.10). It needs no separate lock service. This is the modern default: Terraform has deprecated the dynamodb_table argument, and OpenTofu recommends native S3 locking for new backends.
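The create-if-absent semantics behind a conditional-write lock, and the way a timeout turns contention into waiting, can be modelled in a few lines. This is a toy in-memory Python model, not OpenTofu's implementation: the class, function, and lock key names are hypothetical, and a dictionary stands in for the object store.

```python
import time


class ConditionalStore:
    """In-memory stand-in for an object store that supports create-if-absent."""

    def __init__(self) -> None:
        self._objects = {}

    def put_if_absent(self, key: str, body: str) -> bool:
        """Mimic a PUT with If-None-Match: * (succeeds only if the key is absent)."""
        if key in self._objects:
            return False
        self._objects[key] = body
        return True

    def delete(self, key: str) -> None:
        self._objects.pop(key, None)


def acquire(store: ConditionalStore, key: str, holder: str,
            timeout_s: float, poll_s: float = 0.05) -> bool:
    """Retry the conditional write until it succeeds or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if store.put_if_absent(key, holder):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

The store decides who wins (one writer per key), while the caller's timeout decides whether a loser fails immediately or queues; the two concerns are independent, which is why a timeout remains useful whichever lock backend is active.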
tofu-proxmox currently holds both use_lockfile and a legacy DynamoDB lock table — the “run both, then drop DynamoDB” baking period OpenTofu documents for migrating off DynamoDB. Dropping the DynamoDB leg forces a backend re-init, so it is sequenced as its own change (tracked in tofu-proxmox#353). The target end-state is use_lockfile only. tofu-unifi has already completed this migration and runs lockfile-only.
Lock waiting is separate from the lock mechanism. -lock-timeout=10m (set on every locking command) is what turns “Error acquiring the state lock” into a clean queue when two sessions apply at once; keep it regardless of which lock backend is active.
Variant: tofu-unifi
tofu-unifi solves the same single-writer problem with a different input model, and that divergence is intentional. Instead of one S3-fetched object, it keeps committed per-domain files under deployment/*.json (one top-level domain key per file), shallow-merged at plan time, with use_lockfile-only state locking. Its surface is small and public-safe, so committed config is the right call there. Converging it onto the S3-input model is a roadmap item, not a current requirement.
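The per-domain merge can be sketched as follows. This is an illustrative Python version of a shallow merge over deployment/*.json, not the repo's plan-time logic; the function name is hypothetical, and rejecting files with more than one top-level key or a repeated domain is a strictness choice made for this sketch rather than documented behaviour.

```python
import json
from pathlib import Path


def merge_domain_files(directory: Path) -> dict:
    """Shallow-merge every *.json file in directory, one domain key per file."""
    merged = {}
    for path in sorted(directory.glob("*.json")):
        doc = json.loads(path.read_text(encoding="utf-8"))
        if not isinstance(doc, dict) or len(doc) != 1:
            raise ValueError(f"{path.name}: expected exactly one top-level key")
        (domain,) = doc  # unpack the single top-level key
        if domain in merged:
            raise ValueError(f"{path.name}: domain {domain!r} already defined")
        merged.update(doc)  # shallow: the domain's value is taken as-is
    return merged
```

Because each file owns exactly one top-level key, a shallow merge is sufficient: no two files ever need their nested contents combined.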
See also
- IaC tooling: why Terragrunt stays in tofu-proxmox/tofu-unifi (the three-layer input merge this object feeds).
- tofu-proxmox: the repo that fetches, applies, and publishes from this object.
- Terraform on AWS: the state backend, S3 conditional-write locking, and IAM isolation model.
- SOPS for IaC: the encrypted-at-rest layer merged alongside this object.