Why Docker Swarm (Not Kubernetes)
Kubernetes is the standard, but for 5-10 containers, it’s overkill. Docker Swarm gives you:
- Service discovery with zero configuration
- Built-in load balancing
- Rolling updates without custom tooling
- Single-node control plane if needed
The trade-off: advanced scheduling or custom CNIs need Kubernetes. For Home Assistant, Jellyfin, and WireGuard? Swarm is simpler.
Hardware constraint: my GPU passthrough works on LXC via device nodes, which is simpler than PCI passthrough on VMs. Proxmox LXC containers expose host device nodes directly, bypassing the need for PCI passthrough and simplifying GPU access significantly. This determined the choice.
Module Capabilities
The tf-module-proxmox-docker module provisions Docker containers or VMs with Docker Engine installed, optionally forming a Docker Swarm cluster:
- Multi-node provisioning — creates LXC or VM nodes across host pool
- Docker installation — installs Docker Engine via cloud-init
- Keepalived integration — optional VIP for high availability
- GPU device passthrough — passes through /dev/apex_0, /dev/dri/* for hardware acceleration
- Host pool scheduling — round-robin distribution across Proxmox nodes
Quick Start
module "docker_cluster" {
  source  = "registry.example.com/namespace/tf-module-proxmox-docker/docker"
  version = "1.2.3"

  configuration = {
    cluster = {
      name      = "prod-docker"
      type      = "lxc" # or "vm"
      datastore = { id = "nas", node = "alpha" }
    }

    host_pool = [
      { name = "alpha", datastore_id = "local-lvm" },
      { name = "charlie", datastore_id = "local-lvm" },
      { name = "foxtrot", datastore_id = "local-lvm" }
    ]

    worker_nodes = [
      {
        size     = "medium"
        networks = { dmz = { address = "192.168.61.21/24", gateway = "192.168.61.1" } }
        vip      = { state = "MASTER", priority = 100, interface = "dmz" }
      },
      {
        size     = "medium"
        networks = { dmz = { address = "192.168.61.22/24", gateway = "192.168.61.1" } }
        vip      = { state = "BACKUP", priority = 90, interface = "dmz" }
      }
    ]

    node_size_configuration = {
      medium = { cpu = 8, memory = 32768, os_disk = 256 }
    }

    vip = { enabled = true, address = "192.168.61.20" }
  }
}
LXC vs VM
The module supports both container and VM backends:
| Aspect | LXC | VM |
|---|---|---|
| Resource overhead | Minimal | Full hypervisor |
| GPU passthrough | Device nodes | Full PCI |
| Nesting support | No | Yes |
| Use case | Simple containers | Full VMs |
# LXC-based (type = "lxc")
configuration = {
  cluster = {
    type = "lxc"
  }
}

# VM-based (type = "vm")
configuration = {
  cluster = {
    type = "vm"
  }
}
The VM provisioner downloads a cloud image and imports it:
resource "proxmox_download_file" "vm_image" {
  content_type = "iso"
  datastore_id = var.configuration.cluster.datastore.id
  file_name    = "docker-base.iso"
  url          = var.configuration.node_os_configuration[var.configuration.cluster.type].template_image_url
}
Host Pool Scheduling
VMs are distributed across Proxmox nodes via modulo arithmetic:
# In nodes.tf
node_name = var.configuration.host_pool[
  each.key % length(var.configuration.host_pool)
].name
With 3 nodes and 3 node indices:
- Node 0 → alpha (0 % 3)
- Node 1 → charlie (1 % 3)
- Node 2 → foxtrot (2 % 3)
This ensures even distribution across the cluster for resilience.
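The arithmetic above can be sketched in a few lines (a Python illustration of the placement rule, not the module's actual code):

```python
# Round-robin placement: worker index modulo host-pool length picks the
# Proxmox host, wrapping around when workers outnumber hosts.
host_pool = ["alpha", "charlie", "foxtrot"]

def place(worker_index: int, pool: list[str]) -> str:
    """Return the Proxmox host that receives this worker node."""
    return pool[worker_index % len(pool)]

assignments = [place(i, host_pool) for i in range(4)]
print(assignments)  # ['alpha', 'charlie', 'foxtrot', 'alpha']
```

A fourth worker would wrap around to alpha, so placement only becomes uneven when the worker count is not a multiple of the pool size.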
Keepalived HA
For high availability, Keepalived provides a floating VIP:
configuration = {
  vip = {
    enabled   = true
    address   = "192.168.61.20"
    router_id = 20
  }
}
Each node is configured with its role:
worker_nodes = [
  {
    size     = "medium"
    networks = { dmz = { address = "192.168.61.21/24", gateway = "192.168.61.1" } }
    vip      = { state = "MASTER", priority = 100, interface = "dmz" }
  },
  {
    size     = "medium"
    networks = { dmz = { address = "192.168.61.22/24", gateway = "192.168.61.1" } }
    vip      = { state = "BACKUP", priority = 90, interface = "dmz" }
  },
  {
    size     = "medium"
    networks = { dmz = { address = "192.168.61.23/24", gateway = "192.168.61.1" } }
    vip      = { state = "BACKUP", priority = 80, interface = "dmz" }
  }
]
The module generates Keepalived configuration:
resource "proxmox_virtual_environment_file" "keepalived_config" {
  content = <<-EOF
    vrrp_instance VI_1 {
      state ${node.vip.state}
      interface ${node.vip.interface}
      virtual_router_id ${var.configuration.vip.router_id}
      priority ${node.vip.priority}
      virtual_ipaddress {
        ${var.configuration.vip.address}
      }
    }
  EOF
}
GPU Passthrough
For hardware acceleration (e.g., transcoding, ML workloads), device passthrough is configured in the host pool:
host_pool = [
  {
    name = "alpha"
    device_map = [
      { device = "/dev/apex_0", mode = "0666" },         # GPU
      { device = "/dev/dri/renderD128", mode = "0666" }, # iGPU
      { device = "/dev/dri/card1", mode = "0666" }
    ]
    datastore_id = "local-lvm"
  }
]
The devices are passed through to containers:
resource "proxmox_virtual_environment_container" "this" {
  # ...
  devices_passthrough = [
    for device in var.configuration.host_pool[each.key % length(var.configuration.host_pool)].device_map : {
      path = device.device
      mode = device.mode
    }
  ]
}
Docker Installation
Docker is installed via cloud-init:
resource "proxmox_virtual_environment_file" "cloud_config" {
  content = <<-EOF
    #cloud-config
    package_update: true
    packages:
      - docker.io
      - docker-compose
    runcmd:
      - systemctl enable docker
      - systemctl start docker
      - usermod -aG docker root
  EOF
}
Or for more complex setups, custom post-install commands:
node_os_configuration = {
  debian = {
    family             = "debian"
    template_image_url = "https://..."
    packages           = ["docker.io", "docker-compose"]
    package_manager = {
      install_command = "apt-get install -y"
    }
    post_install_commands = [
      "systemctl enable docker",
      "usermod -aG docker root"
    ]
  }
}
Multi-Network Support
The module supports multiple network interfaces per node:
networks = {
  dmz = {
    address = "192.168.61.21/24"
    gateway = "192.168.61.1"
  }
  vmbr1 = {
    address = "192.168.192.121/25"
  }
}
This maps to:
- dmz — frontend network with gateway (for public access)
- vmbr1 — backend network (for inter-node communication)
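The routing implication of that split can be made concrete with a tiny check (a Python sketch mirroring the network map above, not module code): only the gateway-bearing interface provides a default route.

```python
# One network carries a gateway and becomes the node's default route;
# the gateway-less backend network is reachable only within its subnet.
networks = {
    "dmz":   {"address": "192.168.61.21/24", "gateway": "192.168.61.1"},
    "vmbr1": {"address": "192.168.192.121/25"},
}

default_route = [name for name, cfg in networks.items() if "gateway" in cfg]
print(default_route)  # ['dmz']
```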
Node Sizing
The node_size_configuration block keeps definitions DRY:
node_size_configuration = {
  small = {
    cpu     = 2
    memory  = 512
    os_disk = 20
  }
  medium = {
    cpu     = 8
    memory  = 32768
    os_disk = 256
  }
  large = {
    cpu     = 16
    memory  = 65536
    os_disk = 512
  }
}
My production cluster uses medium nodes (8 vCPU, 32GB RAM, 256GB disk).
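As a quick capacity check (plain arithmetic, not module output), three medium workers add up to:

```python
# Aggregate capacity of the production cluster: 3 x medium nodes.
medium = {"cpu": 8, "memory": 32768, "os_disk": 256}  # RAM in MiB, disk in GB
workers = 3

totals = {key: value * workers for key, value in medium.items()}
print(totals)  # {'cpu': 24, 'memory': 98304, 'os_disk': 768}
```

That is 24 vCPUs, 96 GiB of RAM, and 768 GB of OS disk across the cluster.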
Optional Tools
The module can provision additional tools:
configuration = {
  cluster = {
    options = {
      # Hawser - container management
      hawser = {
        enabled = true
        image   = "harbor.example.com/gh/finsys/hawser:latest"
      }
      # Newt - container log viewer
      newt = {
        enabled  = true
        image    = "harbor.example.com/dh/fosrl/newt:latest"
        endpoint = "https://newt.example.com"
      }
      # APT cache for faster downloads
      apt_cache = {
        enabled = true
        url     = "https://apt.example.com/"
      }
    }
  }
}
My Production Configuration
Here’s the actual production YAML configuration:
# configurations/docker/prod-docker-lxc.yaml
name: prod-docker-lxc
enabled: true
cluster:
  name: prod-docker-lxc
  type: lxc
  datastore:
    id: nas
    node: alpha
host_pool:
  - name: alpha
    device_map:
      - device: /dev/apex_0
        mode: "0666"
      - device: /dev/dri/renderD128
        mode: "0666"
      - device: /dev/dri/card1
        mode: "0666"
    datastore_id: local-lvm
  - name: charlie
    device_map:
      - device: /dev/dri/renderD128
        mode: "0666"
      - device: /dev/dri/card0
        mode: "0666"
    datastore_id: local-lvm
  - name: foxtrot
    device_map:
      - device: /dev/apex_0
        mode: "0666"
      - device: /dev/dri/renderD128
        mode: "0666"
      - device: /dev/dri/card0
        mode: "0666"
    datastore_id: local-lvm
worker_nodes:
  - size: medium
    networks:
      dmz:
        address: 192.168.61.21/24
        gateway: 192.168.61.1
      vmbr1:
        address: 192.168.192.121/24
    vip:
      state: MASTER
      priority: 100
      interface: dmz
  - size: medium
    networks:
      dmz:
        address: 192.168.61.22/24
        gateway: 192.168.61.1
      vmbr1:
        address: 192.168.192.122/24
    vip:
      state: BACKUP
      priority: 90
      interface: dmz
  - size: medium
    networks:
      dmz:
        address: 192.168.61.23/24
        gateway: 192.168.61.1
      vmbr1:
        address: 192.168.192.123/24
    vip:
      state: BACKUP
      priority: 80
      interface: dmz
node_size_configuration:
  medium:
    cpu: 8
    memory: 32768
    os_disk: 256
vip:
  enabled: true
  address: 192.168.61.20
  router_id: 20
Outputs
The module returns node credentials for access:
output "nodes_credentials" {
  sensitive = true # required: the value derives from sensitive resources
  value = {
    password     = random_password.node_root_password.result
    ssh_key      = tls_private_key.node_root_ssh_key.private_key_pem
    hawser_token = random_uuid.hawser_token.id
  }
}

output "nodes_configurations" {
  value = {
    for idx, node in proxmox_virtual_environment_container.this : idx => {
      id   = node.id
      name = node.name
      node = node.node
      ip   = node.ip_addresses[0]
    }
  }
}
Credentials are automatically stored in Bitwarden:
resource "bitwarden-secrets_secret" "docker_nodes_password" {
  key   = "${local.cluster_name}-nodes_password"
  value = module.docker[0].nodes_credentials.password
}

resource "bitwarden-secrets_secret" "docker_nodes_ssh_key" {
  key   = "${local.cluster_name}-nodes_ssh_key"
  value = module.docker[0].nodes_credentials.ssh_key
}
Use Cases
This cluster handles workloads like:
- Home Assistant — Docker Compose-based home automation
- Media services — Plex, Jellyfin with GPU transcoding
- VPN services — WireGuard, OpenVPN
- CI runners — GitHub Actions self-hosted runners
The hardware acceleration via GPU passthrough is critical for media workloads.
What Most People Get Wrong
- “Docker Swarm is dead” — It’s not Kubernetes, but for 10-container workloads, it’s simpler. No RBAC complexity, no CNI headaches.
- GPU passthrough works on LXC — Most guides assume PCI passthrough (VMs). With device nodes (/dev/apex_0, /dev/dri/*), LXC containers access GPUs directly.
- Keepalived needs 3 nodes for quorum — Two nodes work fine with nopreempt on the master. The backup only takes over if the master fails.
When to Use / When NOT to Use
| Use Docker Swarm | Use Kubernetes |
|---|---|
| 3-15 containers | 50+ containers |
| Simple networking | Custom CNI required |
| Single admin | Team with RBAC needs |
| GPU passthrough via LXC | GPU operators |
What’s Next
Current areas of exploration:
- GPU scheduling — Kubernetes-style GPU scheduling for Docker
- Portainer integration — management UI for Docker
- Observability — centralized logging with Loki