<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <title>Docker on zharif.my</title>
        <link>https://zharif.my/tags/docker/</link>
        <description>Recent content in Docker on zharif.my</description>
        <generator>Hugo -- gohugo.io</generator>
        <language>en-us</language>
        <lastBuildDate>Fri, 10 Apr 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://zharif.my/tags/docker/index.xml" rel="self" type="application/rss+xml" /><item>
        <title>High-Availability Docker Swarm on Proxmox</title>
        <link>https://zharif.my/posts/docker-swarm-proxmox/</link>
        <pubDate>Fri, 10 Apr 2026 00:00:00 +0000</pubDate>
        
        <guid>https://zharif.my/posts/docker-swarm-proxmox/</guid>
        <description>&lt;img src="https://images.unsplash.com/photo-1605745341112-85968b19335b?w=800&amp;h=400&amp;fit=crop" alt="Featured image of post High-Availability Docker Swarm on Proxmox" /&gt;&lt;h2 id=&#34;why-docker-swarm-not-kubernetes&#34;&gt;Why Docker Swarm (Not Kubernetes)
&lt;/h2&gt;&lt;p&gt;Kubernetes is the standard, but for 5-10 containers, it&amp;rsquo;s overkill. Docker Swarm gives you:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Service discovery with zero configuration&lt;/li&gt;
&lt;li&gt;Built-in load balancing&lt;/li&gt;
&lt;li&gt;Rolling updates without custom tooling&lt;/li&gt;
&lt;li&gt;Single-node control plane if needed&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The trade-off: advanced scheduling or custom CNIs need Kubernetes. For Home Assistant, Jellyfin, and WireGuard? Swarm is simpler.&lt;/p&gt;
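&lt;p&gt;As a concrete sketch of those points (the image, port, and service name are illustrative), a Swarm stack file gets replication, ingress load balancing, and rolling updates from a few lines of &lt;code&gt;deploy&lt;/code&gt; configuration:&lt;/p&gt;
&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;# stack.yaml, deployed with: docker stack deploy -c stack.yaml web
services:
  web:
    image: nginx:alpine
    ports:
      - &amp;#34;8080:80&amp;#34;   # published on every node via the routing mesh
    deploy:
      replicas: 3       # Swarm load-balances across the replicas
      update_config:
        parallelism: 1  # rolling update, one task at a time
        delay: 10s
&lt;/code&gt;&lt;/pre&gt;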
&lt;p&gt;&lt;strong&gt;Hardware constraint&lt;/strong&gt;: my GPUs are exposed to LXC containers as plain device nodes, which is far simpler than PCI passthrough to VMs. That constraint drives the choice of backend.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Proxmox LXC containers with device nodes bypass the need for PCI passthrough. This simplifies GPU access significantly.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id=&#34;module-capabilities&#34;&gt;Module Capabilities
&lt;/h2&gt;&lt;p&gt;The &lt;code&gt;tf-module-proxmox-docker&lt;/code&gt; module provisions LXC containers or VMs with Docker Engine installed, optionally joining them into a Docker Swarm cluster:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Multi-node provisioning&lt;/strong&gt; — creates LXC or VM nodes across host pool&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Docker installation&lt;/strong&gt; — installs Docker Engine via cloud-init&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Keepalived integration&lt;/strong&gt; — optional VIP for high availability&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Device passthrough&lt;/strong&gt; — passes through /dev/apex_0 (Coral TPU) and /dev/dri/* for hardware acceleration&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Host pool scheduling&lt;/strong&gt; — round-robin distribution across Proxmox nodes&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id=&#34;quick-start&#34;&gt;Quick Start
&lt;/h2&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-hcl&#34;&gt;module &amp;#34;docker_cluster&amp;#34; {
  source  = &amp;#34;registry.example.com/namespace/tf-module-proxmox-docker/docker&amp;#34;
  version = &amp;#34;1.2.3&amp;#34;

  configuration = {
    cluster = {
      name = &amp;#34;prod-docker&amp;#34;
      type = &amp;#34;lxc&amp;#34;  # or &amp;#34;vm&amp;#34;
      datastore = { id = &amp;#34;nas&amp;#34;, node = &amp;#34;alpha&amp;#34; }
    }

    host_pool = [
      { name = &amp;#34;alpha&amp;#34;, datastore_id = &amp;#34;local-lvm&amp;#34; },
      { name = &amp;#34;charlie&amp;#34;, datastore_id = &amp;#34;local-lvm&amp;#34; },
      { name = &amp;#34;foxtrot&amp;#34;, datastore_id = &amp;#34;local-lvm&amp;#34; }
    ]

    worker_nodes = [
      {
        size = &amp;#34;medium&amp;#34;
        networks = { dmz = { address = &amp;#34;192.168.61.21/24&amp;#34;, gateway = &amp;#34;192.168.61.1&amp;#34; } }
        vip = { state = &amp;#34;MASTER&amp;#34;, priority = 100, interface = &amp;#34;dmz&amp;#34; }
      },
      {
        size = &amp;#34;medium&amp;#34;
        networks = { dmz = { address = &amp;#34;192.168.61.22/24&amp;#34;, gateway = &amp;#34;192.168.61.1&amp;#34; } }
        vip = { state = &amp;#34;BACKUP&amp;#34;, priority = 90, interface = &amp;#34;dmz&amp;#34; }
      }
    ]

    node_size_configuration = {
      medium = { cpu = 8, memory = 32768, os_disk = 256 }
    }

    vip = { enabled = true, address = &amp;#34;192.168.61.20&amp;#34; }
  }
}&lt;/code&gt;&lt;/pre&gt;&lt;h2 id=&#34;lxc-vs-vm&#34;&gt;LXC vs VM
&lt;/h2&gt;&lt;p&gt;The module supports both container and VM backends:&lt;/p&gt;
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Aspect&lt;/th&gt;
          &lt;th&gt;LXC&lt;/th&gt;
          &lt;th&gt;VM&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;Resource overhead&lt;/td&gt;
          &lt;td&gt;Minimal&lt;/td&gt;
          &lt;td&gt;Full hypervisor&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;GPU passthrough&lt;/td&gt;
          &lt;td&gt;Device nodes&lt;/td&gt;
          &lt;td&gt;Full PCI&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Nested virtualization&lt;/td&gt;
          &lt;td&gt;No&lt;/td&gt;
          &lt;td&gt;Yes&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Use case&lt;/td&gt;
          &lt;td&gt;Simple containers&lt;/td&gt;
          &lt;td&gt;Full VMs&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-hcl&#34;&gt;# LXC-based (type = &amp;#34;lxc&amp;#34;)
configuration = {
  cluster = {
    type = &amp;#34;lxc&amp;#34;
  }
}

# VM-based (type = &amp;#34;vm&amp;#34;)
configuration = {
  cluster = {
    type = &amp;#34;vm&amp;#34;
  }
}&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The VM provisioner downloads a cloud image and imports it:&lt;/p&gt;
&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-hcl&#34;&gt;resource &amp;#34;proxmox_virtual_environment_download_file&amp;#34; &amp;#34;vm_image&amp;#34; {
  content_type = &amp;#34;iso&amp;#34;
  datastore_id = var.configuration.cluster.datastore.id
  file_name    = &amp;#34;docker-base.iso&amp;#34;
  url          = var.configuration.node_os_configuration[var.configuration.cluster.type].template_image_url
}&lt;/code&gt;&lt;/pre&gt;&lt;h2 id=&#34;host-pool-scheduling&#34;&gt;Host Pool Scheduling
&lt;/h2&gt;&lt;p&gt;Worker nodes are distributed across Proxmox hosts via modulo arithmetic:&lt;/p&gt;
&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-hcl&#34;&gt;# In nodes.tf
node_name = var.configuration.host_pool[
  each.key % length(var.configuration.host_pool)
].name&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;With a pool of 3 hosts and 3 worker nodes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Worker 0 → alpha (0 % 3 = 0)&lt;/li&gt;
&lt;li&gt;Worker 1 → charlie (1 % 3 = 1)&lt;/li&gt;
&lt;li&gt;Worker 2 → foxtrot (2 % 3 = 2)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This ensures even distribution across the cluster for resilience.&lt;/p&gt;
&lt;h2 id=&#34;keepalived-ha&#34;&gt;Keepalived HA
&lt;/h2&gt;&lt;p&gt;For high availability, Keepalived provides a floating VIP:&lt;/p&gt;
&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-hcl&#34;&gt;configuration = {
  vip = {
    enabled  = true
    address  = &amp;#34;192.168.61.20&amp;#34;
    router_id = 20
  }
}&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Each node is configured with its role:&lt;/p&gt;
&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-hcl&#34;&gt;worker_nodes = [
  {
    size = &amp;#34;medium&amp;#34;
    networks = { dmz = { address = &amp;#34;192.168.61.21/24&amp;#34;, gateway = &amp;#34;192.168.61.1&amp;#34; } }
    vip = { state = &amp;#34;MASTER&amp;#34;, priority = 100, interface = &amp;#34;dmz&amp;#34; }
  },
  {
    size = &amp;#34;medium&amp;#34;
    networks = { dmz = { address = &amp;#34;192.168.61.22/24&amp;#34;, gateway = &amp;#34;192.168.61.1&amp;#34; } }
    vip = { state = &amp;#34;BACKUP&amp;#34;, priority = 90, interface = &amp;#34;dmz&amp;#34; }
  },
  {
    size = &amp;#34;medium&amp;#34;
    networks = { dmz = { address = &amp;#34;192.168.61.23/24&amp;#34;, gateway = &amp;#34;192.168.61.1&amp;#34; } }
    vip = { state = &amp;#34;BACKUP&amp;#34;, priority = 80, interface = &amp;#34;dmz&amp;#34; }
  }
]&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The module generates Keepalived configuration:&lt;/p&gt;
&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-hcl&#34;&gt;resource &amp;#34;proxmox_virtual_environment_file&amp;#34; &amp;#34;keepalived_config&amp;#34; {
  content = &amp;lt;&amp;lt;-EOF
    vrrp_instance VI_1 {
        state ${node.vip.state}
        interface ${node.vip.interface}
        virtual_router_id ${var.configuration.vip.router_id}
        priority ${node.vip.priority}
        virtual_ipaddress {
            ${var.configuration.vip.address}
        }
    }
  EOF
}&lt;/code&gt;&lt;/pre&gt;&lt;h2 id=&#34;gpu-passthrough&#34;&gt;GPU Passthrough
&lt;/h2&gt;&lt;p&gt;For hardware acceleration (e.g., transcoding, ML workloads), device passthrough is configured in the host pool:&lt;/p&gt;
&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-hcl&#34;&gt;host_pool = [
  {
    name = &amp;#34;alpha&amp;#34;
    device_map = [
      { device = &amp;#34;/dev/apex_0&amp;#34;, mode = &amp;#34;0666&amp;#34; },         # Coral TPU
      { device = &amp;#34;/dev/dri/renderD128&amp;#34;, mode = &amp;#34;0666&amp;#34; },  # iGPU render node
      { device = &amp;#34;/dev/dri/card1&amp;#34;, mode = &amp;#34;0666&amp;#34; }
    ]
    datastore_id = &amp;#34;local-lvm&amp;#34;
  }
]&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The devices are passed through to containers:&lt;/p&gt;
&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-hcl&#34;&gt;resource &amp;#34;proxmox_virtual_environment_container&amp;#34; &amp;#34;this&amp;#34; {
  # ...
  
  devices_passthrough = [
    for device in var.configuration.host_pool[each.key % length(var.configuration.host_pool)].device_map : {
      path = device.device
      mode = device.mode
    }
  ]
}&lt;/code&gt;&lt;/pre&gt;&lt;h2 id=&#34;docker-installation&#34;&gt;Docker Installation
&lt;/h2&gt;&lt;p&gt;Docker is installed via cloud-init:&lt;/p&gt;
&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-hcl&#34;&gt;resource &amp;#34;proxmox_virtual_environment_file&amp;#34; &amp;#34;cloud_config&amp;#34; {
  content = &amp;lt;&amp;lt;-EOF
#cloud-config
package_update: true
packages:
  - docker.io
  - docker-compose
runcmd:
  - systemctl enable docker
  - systemctl start docker
  - usermod -aG docker root
EOF
}&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Or for more complex setups, custom post-install commands:&lt;/p&gt;
&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-hcl&#34;&gt;node_os_configuration = {
  debian = {
    family = &amp;#34;debian&amp;#34;
    template_image_url = &amp;#34;https://...&amp;#34;
    packages = [&amp;#34;docker.io&amp;#34;, &amp;#34;docker-compose&amp;#34;]
    package_manager = {
      install_command = &amp;#34;apt-get install -y&amp;#34;
    }
    post_install_commands = [
      &amp;#34;systemctl enable docker&amp;#34;,
      &amp;#34;usermod -aG docker root&amp;#34;
    ]
  }
}&lt;/code&gt;&lt;/pre&gt;&lt;h2 id=&#34;multi-network-support&#34;&gt;Multi-Network Support
&lt;/h2&gt;&lt;p&gt;The module supports multiple network interfaces per node:&lt;/p&gt;
&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-hcl&#34;&gt;networks = {
  dmz = {
    address = &amp;#34;192.168.61.21/24&amp;#34;
    gateway = &amp;#34;192.168.61.1&amp;#34;
  }
  vmbr1 = {
    address = &amp;#34;192.168.192.121/25&amp;#34;
  }
}&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;This maps to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;dmz&lt;/strong&gt; — frontend network with gateway (for public access)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;vmbr1&lt;/strong&gt; — backend network (for inter-node communication)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;node-sizing&#34;&gt;Node Sizing
&lt;/h2&gt;&lt;p&gt;The &lt;code&gt;node_size_configuration&lt;/code&gt; block keeps definitions DRY:&lt;/p&gt;
&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-hcl&#34;&gt;node_size_configuration = {
  small = {
    cpu     = 2
    memory  = 512
    os_disk = 20
  }
  medium = {
    cpu     = 8
    memory  = 32768
    os_disk = 256
  }
  large = {
    cpu     = 16
    memory  = 65536
    os_disk = 512
  }
}&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;My production cluster uses medium nodes (8 vCPU, 32GB RAM, 256GB disk).&lt;/p&gt;
&lt;h2 id=&#34;optional-tools&#34;&gt;Optional Tools
&lt;/h2&gt;&lt;p&gt;The module can provision additional tools:&lt;/p&gt;
&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-hcl&#34;&gt;configuration = {
  cluster = {
    options = {
      # Hawser - container management
      hawser = {
        enabled = true
        image   = &amp;#34;harbor.example.com/gh/finsys/hawser:latest&amp;#34;
      }
      
      # Newt - Pangolin tunnel client
      newt = {
        enabled = true
        image   = &amp;#34;harbor.example.com/dh/fosrl/newt:latest&amp;#34;
        endpoint = &amp;#34;https://newt.example.com&amp;#34;
      }
      
      # APT cache for faster downloads
      apt_cache = {
        enabled = true
        url     = &amp;#34;https://apt.example.com/&amp;#34;
      }
    }
  }
}&lt;/code&gt;&lt;/pre&gt;&lt;h2 id=&#34;my-production-configuration&#34;&gt;My Production Configuration
&lt;/h2&gt;&lt;p&gt;Here&amp;rsquo;s the actual production YAML configuration:&lt;/p&gt;
&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;# configurations/docker/prod-docker-lxc.yaml
name: prod-docker-lxc
enabled: true

cluster:
  name: prod-docker-lxc
  type: lxc
  datastore:
    id: nas
    node: alpha

host_pool:
  - name: alpha
    device_map:
      - device: /dev/apex_0
        mode: &amp;#34;0666&amp;#34;
      - device: /dev/dri/renderD128
        mode: &amp;#34;0666&amp;#34;
      - device: /dev/dri/card1
        mode: &amp;#34;0666&amp;#34;
    datastore_id: local-lvm
  - name: charlie
    device_map:
      - device: /dev/dri/renderD128
        mode: &amp;#34;0666&amp;#34;
      - device: /dev/dri/card0
        mode: &amp;#34;0666&amp;#34;
    datastore_id: local-lvm
  - name: foxtrot
    device_map:
      - device: /dev/apex_0
        mode: &amp;#34;0666&amp;#34;
      - device: /dev/dri/renderD128
        mode: &amp;#34;0666&amp;#34;
      - device: /dev/dri/card0
        mode: &amp;#34;0666&amp;#34;
    datastore_id: local-lvm

worker_nodes:
  - size: medium
    networks:
      dmz:
        address: 192.168.61.21/24
        gateway: 192.168.61.1
      vmbr1:
        address: 192.168.192.121/24
    vip:
      state: MASTER
      priority: 100
      interface: dmz
  - size: medium
    networks:
      dmz:
        address: 192.168.61.22/24
        gateway: 192.168.61.1
      vmbr1:
        address: 192.168.192.122/24
    vip:
      state: BACKUP
      priority: 90
      interface: dmz
  - size: medium
    networks:
      dmz:
        address: 192.168.61.23/24
        gateway: 192.168.61.1
      vmbr1:
        address: 192.168.192.123/24
    vip:
      state: BACKUP
      priority: 80
      interface: dmz

node_size_configuration:
  medium:
    cpu: 8
    memory: 32768
    os_disk: 256

vip:
  enabled: true
  address: 192.168.61.20
  router_id: 20&lt;/code&gt;&lt;/pre&gt;&lt;h2 id=&#34;outputs&#34;&gt;Outputs
&lt;/h2&gt;&lt;p&gt;The module returns node credentials for access:&lt;/p&gt;
&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-hcl&#34;&gt;output &amp;#34;nodes_credentials&amp;#34; {
  value = {
    password = random_password.node_root_password.result
    ssh_key = tls_private_key.node_root_ssh_key.private_key_pem
    hawser_token = random_uuid.hawser_token.id
  }
}

output &amp;#34;nodes_configurations&amp;#34; {
  value = {
    for idx, node in proxmox_virtual_environment_container.this : idx =&amp;gt; {
      id      = node.id
      name    = node.name
      node    = node.node
      ip      = node.ip_addresses[0]
    }
  }
}&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Credentials are automatically stored in Bitwarden:&lt;/p&gt;
&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-hcl&#34;&gt;resource &amp;#34;bitwarden-secrets_secret&amp;#34; &amp;#34;docker_nodes_password&amp;#34; {
  key   = &amp;#34;${local.cluster_name}-nodes_password&amp;#34;
  value = module.docker[0].nodes_credentials.password
}

resource &amp;#34;bitwarden-secrets_secret&amp;#34; &amp;#34;docker_nodes_ssh_key&amp;#34; {
  key   = &amp;#34;${local.cluster_name}-nodes_ssh_key&amp;#34;
  value = module.docker[0].nodes_credentials.ssh_key
}&lt;/code&gt;&lt;/pre&gt;&lt;h2 id=&#34;use-cases&#34;&gt;Use Cases
&lt;/h2&gt;&lt;p&gt;This cluster handles workloads like:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Home Assistant&lt;/strong&gt; — Docker Compose-based home automation&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Media services&lt;/strong&gt; — Plex, Jellyfin with GPU transcoding&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;VPN services&lt;/strong&gt; — WireGuard, OpenVPN&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;CI runners&lt;/strong&gt; — GitHub Actions self-hosted runners&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The hardware acceleration via GPU passthrough is critical for media workloads.&lt;/p&gt;
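&lt;p&gt;Worth noting: Swarm&amp;rsquo;s &lt;code&gt;docker stack deploy&lt;/code&gt; ignores &lt;code&gt;devices:&lt;/code&gt;, so GPU-dependent services are best run as plain Compose projects on the node that holds the hardware. A minimal, illustrative fragment (image and paths are placeholders):&lt;/p&gt;
&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;services:
  jellyfin:
    image: jellyfin/jellyfin:latest
    devices:
      # the render node is already passed through to the LXC via device_map
      - /dev/dri/renderD128:/dev/dri/renderD128
    volumes:
      - /srv/media:/media   # placeholder media path
&lt;/code&gt;&lt;/pre&gt;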
&lt;h2 id=&#34;what-most-people-get-wrong&#34;&gt;What Most People Get Wrong
&lt;/h2&gt;&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&amp;ldquo;Docker Swarm is dead&amp;rdquo;&lt;/strong&gt; — It&amp;rsquo;s not Kubernetes, but for 10-container workloads, it&amp;rsquo;s simpler. No RBAC complexity, no CNI headaches.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;GPU passthrough works on LXC&lt;/strong&gt; — Most guides assume PCI passthrough (VMs). With device nodes (&lt;code&gt;/dev/apex_0&lt;/code&gt;, &lt;code&gt;/dev/dri/*&lt;/code&gt;), LXC containers access GPUs directly.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Keepalived needs 3 nodes for quorum&lt;/strong&gt; — VRRP is priority-based, not quorum-based; two nodes work fine with &lt;code&gt;nopreempt&lt;/code&gt;. The backup only takes over if the master fails.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
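&lt;p&gt;On the third point, a minimal two-node Keepalived sketch. Note that &lt;code&gt;nopreempt&lt;/code&gt; is only honored when the instance starts in &lt;code&gt;BACKUP&lt;/code&gt; state, so the preferred node keeps its higher priority but does not reclaim the VIP after recovering (the interface name is an assumption):&lt;/p&gt;
&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-txt&#34;&gt;# /etc/keepalived/keepalived.conf on the preferred node
vrrp_instance VI_1 {
    state BACKUP            # required for nopreempt to apply
    nopreempt               # do not yank the VIP back after recovery
    interface eth0
    virtual_router_id 20
    priority 100            # peer runs the same config with priority 90
    virtual_ipaddress {
        192.168.61.20
    }
}
&lt;/code&gt;&lt;/pre&gt;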
&lt;h2 id=&#34;when-to-use--when-not-to-use&#34;&gt;When to Use / When NOT to Use
&lt;/h2&gt;&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Use Docker Swarm&lt;/th&gt;
          &lt;th&gt;Use Kubernetes&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;3-15 containers&lt;/td&gt;
          &lt;td&gt;50+ containers&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Simple networking&lt;/td&gt;
          &lt;td&gt;Custom CNI required&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Single admin&lt;/td&gt;
          &lt;td&gt;Team with RBAC needs&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;GPU passthrough via LXC&lt;/td&gt;
          &lt;td&gt;GPU operators&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id=&#34;whats-next&#34;&gt;What&amp;rsquo;s Next
&lt;/h2&gt;&lt;p&gt;Current areas of exploration:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;GPU scheduling&lt;/strong&gt; — Kubernetes-style GPU scheduling for Docker&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Portainer integration&lt;/strong&gt; — management UI for Docker&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Observability&lt;/strong&gt; — centralized logging with Loki&lt;/li&gt;
&lt;/ol&gt;
</description>
        </item>
        <item>
        <title>Terraform-Driven Homelab Architecture</title>
        <link>https://zharif.my/posts/homelab-terraform-architecture/</link>
        <pubDate>Sun, 15 Mar 2026 00:00:00 +0000</pubDate>
        
        <guid>https://zharif.my/posts/homelab-terraform-architecture/</guid>
        <description>&lt;img src="https://images.unsplash.com/photo-1558494949-ef010cbdcc31?w=800&amp;h=400&amp;fit=crop" alt="Featured image of post Terraform-Driven Homelab Architecture" /&gt;&lt;h2 id=&#34;the-problem-space&#34;&gt;The Problem Space
&lt;/h2&gt;&lt;p&gt;Homelabs evolve. You start with one Docker container, add some LXCs, then Kubernetes, and suddenly your infrastructure is a house of cards held together by scripts you wrote two years ago and don&amp;rsquo;t remember.&lt;/p&gt;
&lt;p&gt;This architecture solves that through:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Everything in code&lt;/strong&gt; — from VM provisioning to Kubernetes bootstrap&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Versioned modules&lt;/strong&gt; — each update is a code review opportunity&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Self-service via Backstage&lt;/strong&gt; — templated provisioning, no Slack threads&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;strong&gt;Numbers&lt;/strong&gt;: 3 Proxmox nodes, 2 production clusters (Docker Swarm + Talos K8s), ~50 resources defined across 24+ YAML configurations.&lt;/p&gt;
&lt;p&gt;Running in production:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;3 Proxmox nodes (alpha, charlie, foxtrot)&lt;/li&gt;
&lt;li&gt;Docker Swarm clusters with Keepalived HA&lt;/li&gt;
&lt;li&gt;Talos Kubernetes clusters with Flux GitOps&lt;/li&gt;
&lt;li&gt;GPU passthrough for hardware acceleration&lt;/li&gt;
&lt;li&gt;Multi-network topology (dmz + vmbr1)&lt;/li&gt;
&lt;li&gt;Private container registry (Harbor)&lt;/li&gt;
&lt;li&gt;Private Terraform registry (Cloudflare Workers)&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote&gt;
&lt;p&gt;Start with the basic template. All custom modules derive from it — maintaining consistency across the infrastructure.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id=&#34;module-hierarchy&#34;&gt;Module Hierarchy
&lt;/h2&gt;&lt;pre class=&#34;mermaid&#34;&gt;
  graph TB
    subgraph &amp;#34;Root Module&amp;#34;
        A[tf-infra-homelab]
    end
    
    subgraph &amp;#34;Compute Modules&amp;#34;
        B[tf-module-proxmox-lxc]
        C[tf-module-proxmox-vm]
        D[tf-module-proxmox-talos]
        E[tf-module-proxmox-docker]
    end
    
    subgraph &amp;#34;Application Layer&amp;#34;
        F[applications-homelab]
    end
    
    subgraph &amp;#34;Platform&amp;#34;
        G[Proxmox VE]
    end
    
    A --&amp;gt; B
    A --&amp;gt; C
    A --&amp;gt; D
    A --&amp;gt; E
    D --&amp;gt; F
    E --&amp;gt; F
    B --&amp;gt; G
    C --&amp;gt; G
    D --&amp;gt; G
    E --&amp;gt; G
&lt;/pre&gt;

&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Module&lt;/th&gt;
          &lt;th&gt;Purpose&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;terraform-basic-template&lt;/td&gt;
          &lt;td&gt;Foundation for all modules&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;tf-module-proxmox-lxc&lt;/td&gt;
          &lt;td&gt;LXC container provisioning&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;tf-module-proxmox-vm&lt;/td&gt;
          &lt;td&gt;Full VM provisioning&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;tf-module-proxmox-docker&lt;/td&gt;
          &lt;td&gt;Docker Swarm clusters&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;tf-module-proxmox-talos&lt;/td&gt;
          &lt;td&gt;Talos Kubernetes clusters&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;tf-infra-homelab&lt;/td&gt;
          &lt;td&gt;Root orchestration&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;applications-homelab&lt;/td&gt;
          &lt;td&gt;Kustomize deployments&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;github-management-plane&lt;/td&gt;
          &lt;td&gt;GitHub org management&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id=&#34;the-dependency-graph&#34;&gt;The Dependency Graph
&lt;/h2&gt;&lt;pre class=&#34;mermaid&#34;&gt;
  graph TB
    T[terraform-basic-template]
    L[tf-module-proxmox-lxc]
    V[tf-module-proxmox-vm]
    DT[tf-module-proxmox-docker]
    TT[tf-module-proxmox-talos]
    RH[tf-infra-homelab]
    AH[applications-homelab]
    P[ProxmoxVE]
    
    T --&amp;gt; L
    T --&amp;gt; V
    L --&amp;gt; DT
    V --&amp;gt; DT
    L --&amp;gt; TT
    V --&amp;gt; TT
    DT --&amp;gt; RH
    TT --&amp;gt; RH
    RH --&amp;gt; P
    DT --&amp;gt; AH
    TT --&amp;gt; AH
&lt;/pre&gt;

&lt;p&gt;Key observations:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Template is foundational&lt;/strong&gt; — all modules derive from the same template&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;LXC and VM are leaf modules&lt;/strong&gt; — no dependencies on other custom modules&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Docker and Talos are composite&lt;/strong&gt; — build on LXC/VM modules&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Root module orchestrates&lt;/strong&gt; — composes the other modules based on configurations&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Applications deploy post-provisioning&lt;/strong&gt; — GitOps ties into Docker/Talos clusters&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id=&#34;configuration-driven&#34;&gt;Configuration-Driven
&lt;/h2&gt;&lt;p&gt;All infrastructure is defined in YAML configurations, not ad-hoc Terraform runs:&lt;/p&gt;
&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-txt&#34;&gt;configurations/
├── docker/
│   ├── dev-docker-lxc.yaml
│   └── prod-docker-lxc.yaml
├── kubernetes/
│   ├── dev-k8s.yaml
│   └── prod-k8s.yaml
└── virtual_machine/
    └── ...&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Each config has an &lt;code&gt;enabled&lt;/code&gt; flag for gradual rollout:&lt;/p&gt;
&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;name: prod-k8s
enabled: true  # Set to false to disable without deletion

cluster:
  name: prod-k8s
  datastore:
    id: nas
    node: alpha&lt;/code&gt;&lt;/pre&gt;&lt;h2 id=&#34;whats-running&#34;&gt;What&amp;rsquo;s Running
&lt;/h2&gt;&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Cluster&lt;/th&gt;
          &lt;th&gt;Type&lt;/th&gt;
          &lt;th&gt;Nodes&lt;/th&gt;
          &lt;th&gt;VIP&lt;/th&gt;
          &lt;th&gt;Purpose&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;prod-docker-lxc&lt;/td&gt;
          &lt;td&gt;Docker Swarm&lt;/td&gt;
          &lt;td&gt;3x medium (8vCPU/32GB)&lt;/td&gt;
          &lt;td&gt;192.168.61.20&lt;/td&gt;
          &lt;td&gt;Container workloads&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;prod-k8s&lt;/td&gt;
          &lt;td&gt;Talos K8s&lt;/td&gt;
          &lt;td&gt;3x CP (4vCPU/8GB) + 3x worker (10vCPU/48GB)&lt;/td&gt;
          &lt;td&gt;192.168.62.20&lt;/td&gt;
          &lt;td&gt;Kubernetes workloads&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Both clusters span all 3 Proxmox nodes for high availability.&lt;/p&gt;
&lt;h2 id=&#34;design-principles&#34;&gt;Design Principles
&lt;/h2&gt;&lt;p&gt;This architecture follows specific principles:&lt;/p&gt;
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Principle&lt;/th&gt;
          &lt;th&gt;Implementation&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;Single configuration object&lt;/td&gt;
          &lt;td&gt;All modules use unified &lt;code&gt;configuration&lt;/code&gt; input&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Host pools&lt;/td&gt;
          &lt;td&gt;Resilience through multi-node distribution&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Versioned modules&lt;/td&gt;
          &lt;td&gt;Each module has explicit versions&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;YAML configurations&lt;/td&gt;
          &lt;td&gt;Infrastructure as data, not ad-hoc apply&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Private registry&lt;/td&gt;
          &lt;td&gt;Distribution without Terraform Cloud cost&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Secrets integration&lt;/td&gt;
          &lt;td&gt;Bitwarden for credential storage&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;GitOps&lt;/td&gt;
          &lt;td&gt;Flux bootstrapped during cluster creation&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Multi-network&lt;/td&gt;
          &lt;td&gt;Separate DMZ and backend networks&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;GPU passthrough&lt;/td&gt;
          &lt;td&gt;Device mapping in host pool&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id=&#34;what-most-people-get-wrong&#34;&gt;What Most People Get Wrong
&lt;/h2&gt;&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&amp;ldquo;More modules = better architecture&amp;rdquo;&lt;/strong&gt; — I started with 10+ modules. Consolidated to 5. Over-modularization creates maintenance overhead.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&amp;ldquo;YAML = Terraform&amp;rdquo;&lt;/strong&gt; — Terraform is the engine, YAML is the fuel. Don&amp;rsquo;t embed YAML in &lt;code&gt;.tf&lt;/code&gt; files; load from external files.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&amp;ldquo;GitOps replaces Terraform&amp;rdquo;&lt;/strong&gt; — They work together: Terraform provisions, Flux manages apps. Both are declarative.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
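&lt;p&gt;On the second point, one way to load an external YAML file and gate provisioning on its &lt;code&gt;enabled&lt;/code&gt; flag (a sketch; the local name and file path are illustrative, and the module source mirrors the earlier examples):&lt;/p&gt;
&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-hcl&#34;&gt;locals {
  # YAML lives outside the .tf files; Terraform only decodes it
  docker_config = yamldecode(
    file(&amp;#34;${path.module}/configurations/docker/prod-docker-lxc.yaml&amp;#34;)
  )
}

module &amp;#34;docker&amp;#34; {
  source  = &amp;#34;registry.example.com/namespace/tf-module-proxmox-docker/docker&amp;#34;
  version = &amp;#34;1.2.3&amp;#34;
  count   = local.docker_config.enabled ? 1 : 0  # skip when disabled

  configuration = local.docker_config
}&lt;/code&gt;&lt;/pre&gt;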
&lt;h2 id=&#34;related-posts&#34;&gt;Related Posts
&lt;/h2&gt;&lt;p&gt;Each component has its own detailed post:&lt;/p&gt;
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Post&lt;/th&gt;
          &lt;th&gt;Focus&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://zharif.my/posts/talos-kubernetes-proxmox&#34; &gt;Talos Kubernetes on Proxmox&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;tf-module-proxmox-talos deep dive — image factory, machine config, bootstrap&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://zharif.my/posts/docker-swarm-proxmox&#34; &gt;Docker Swarm on Proxmox&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;tf-module-proxmox-docker deep dive — Keepalived HA, provisioning&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://zharif.my/posts/lxc-vm-modules&#34; &gt;LXC &amp;amp; VM Modules&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;tf-module-proxmox-lxc + tf-module-proxmox-vm basics&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://zharif.my/posts/backstage-homelab&#34; &gt;Backstage Integration&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;Catalog generation, software templates&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://zharif.my/posts/terraform-registry-cloudflare-workers&#34; &gt;Private Terraform Registry&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;Module distribution via Cloudflare Workers&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://zharif.my/posts/github-management-plane&#34; &gt;GitHub Management Plane&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;Managing GitHub org via Terraform&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
</description>
        </item>
        
    </channel>
</rss>
