<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <title>Docker on zharif.my</title>
        <link>https://zharif.my/tags/docker/</link>
        <description>Recent content in Docker on zharif.my</description>
        <generator>Hugo -- gohugo.io</generator>
        <language>en-us</language>
        <lastBuildDate>Fri, 10 Apr 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://zharif.my/tags/docker/index.xml" rel="self" type="application/rss+xml" /><item>
        <title>High-Availability Docker Swarm on Proxmox</title>
        <link>https://zharif.my/posts/docker-swarm-proxmox/</link>
        <pubDate>Fri, 10 Apr 2026 00:00:00 +0000</pubDate>
        
        <guid>https://zharif.my/posts/docker-swarm-proxmox/</guid>
        <description>&lt;img src="https://images.unsplash.com/photo-1605745341112-85968b19335b?w=800&amp;h=400&amp;fit=crop" alt="Featured image of post High-Availability Docker Swarm on Proxmox" /&gt;&lt;h2 id=&#34;why-docker-swarm-not-kubernetes&#34;&gt;Why Docker Swarm (Not Kubernetes)
&lt;/h2&gt;&lt;p&gt;Kubernetes is the standard, but for 5-10 containers, it&amp;rsquo;s overkill. Docker Swarm gives you:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Service discovery with zero configuration&lt;/li&gt;
&lt;li&gt;Built-in load balancing&lt;/li&gt;
&lt;li&gt;Rolling updates without custom tooling&lt;/li&gt;
&lt;li&gt;Single-node control plane if needed&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The trade-off: advanced scheduling or custom CNIs need Kubernetes. For Home Assistant, Jellyfin, and WireGuard? Swarm is simpler.&lt;/p&gt;
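&lt;p&gt;As a concrete sketch of those points (the image, port, and service name are illustrative), a Swarm stack file gets replication, ingress load balancing, and rolling updates from a few lines of &lt;code&gt;deploy&lt;/code&gt; configuration:&lt;/p&gt;
&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;# stack.yaml, deployed with: docker stack deploy -c stack.yaml web
services:
  web:
    image: nginx:alpine
    ports:
      - &amp;#34;8080:80&amp;#34;   # published on every node via the routing mesh
    deploy:
      replicas: 3       # Swarm load-balances across the replicas
      update_config:
        parallelism: 1  # rolling update, one task at a time
        delay: 10s
&lt;/code&gt;&lt;/pre&gt;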
&lt;p&gt;&lt;strong&gt;Hardware constraint&lt;/strong&gt;: my GPUs are exposed to LXC containers as plain device nodes, which is far simpler than PCI passthrough to VMs. That constraint drives the choice of backend.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Proxmox LXC containers with device nodes bypass the need for PCI passthrough. This simplifies GPU access significantly.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id=&#34;module-capabilities&#34;&gt;Module Capabilities
&lt;/h2&gt;&lt;p&gt;The &lt;code&gt;tf-module-proxmox-docker&lt;/code&gt; module provisions LXC containers or VMs with Docker Engine installed, optionally joining them into a Docker Swarm cluster:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Multi-node provisioning&lt;/strong&gt; — creates LXC or VM nodes across host pool&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Docker installation&lt;/strong&gt; — installs Docker Engine via cloud-init&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Keepalived integration&lt;/strong&gt; — optional VIP for high availability&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Device passthrough&lt;/strong&gt; — passes through /dev/apex_0 (Coral TPU) and /dev/dri/* for hardware acceleration&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Host pool scheduling&lt;/strong&gt; — round-robin distribution across Proxmox nodes&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id=&#34;quick-start&#34;&gt;Quick Start
&lt;/h2&gt;&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-hcl&#34;&gt;module &amp;#34;docker_cluster&amp;#34; {
  source  = &amp;#34;registry.example.com/namespace/tf-module-proxmox-docker/docker&amp;#34;
  version = &amp;#34;1.2.3&amp;#34;

  configuration = {
    cluster = {
      name = &amp;#34;prod-docker&amp;#34;
      type = &amp;#34;lxc&amp;#34;  # or &amp;#34;vm&amp;#34;
      datastore = { id = &amp;#34;nas&amp;#34;, node = &amp;#34;alpha&amp;#34; }
    }

    host_pool = [
      { name = &amp;#34;alpha&amp;#34;, datastore_id = &amp;#34;local-lvm&amp;#34; },
      { name = &amp;#34;charlie&amp;#34;, datastore_id = &amp;#34;local-lvm&amp;#34; },
      { name = &amp;#34;foxtrot&amp;#34;, datastore_id = &amp;#34;local-lvm&amp;#34; }
    ]

    worker_nodes = [
      {
        size = &amp;#34;medium&amp;#34;
        networks = { dmz = { address = &amp;#34;192.168.61.21/24&amp;#34;, gateway = &amp;#34;192.168.61.1&amp;#34; } }
        vip = { state = &amp;#34;MASTER&amp;#34;, priority = 100, interface = &amp;#34;dmz&amp;#34; }
      },
      {
        size = &amp;#34;medium&amp;#34;
        networks = { dmz = { address = &amp;#34;192.168.61.22/24&amp;#34;, gateway = &amp;#34;192.168.61.1&amp;#34; } }
        vip = { state = &amp;#34;BACKUP&amp;#34;, priority = 90, interface = &amp;#34;dmz&amp;#34; }
      }
    ]

    node_size_configuration = {
      medium = { cpu = 8, memory = 32768, os_disk = 256 }
    }

    vip = { enabled = true, address = &amp;#34;192.168.61.20&amp;#34; }
  }
}&lt;/code&gt;&lt;/pre&gt;&lt;h2 id=&#34;lxc-vs-vm&#34;&gt;LXC vs VM
&lt;/h2&gt;&lt;p&gt;The module supports both container and VM backends:&lt;/p&gt;
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Aspect&lt;/th&gt;
          &lt;th&gt;LXC&lt;/th&gt;
          &lt;th&gt;VM&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;Resource overhead&lt;/td&gt;
          &lt;td&gt;Minimal&lt;/td&gt;
          &lt;td&gt;Full hypervisor&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;GPU passthrough&lt;/td&gt;
          &lt;td&gt;Device nodes&lt;/td&gt;
          &lt;td&gt;Full PCI&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Nested virtualization&lt;/td&gt;
          &lt;td&gt;No&lt;/td&gt;
          &lt;td&gt;Yes&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Use case&lt;/td&gt;
          &lt;td&gt;Simple containers&lt;/td&gt;
          &lt;td&gt;Full VMs&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-hcl&#34;&gt;# LXC-based (type = &amp;#34;lxc&amp;#34;)
configuration = {
  cluster = {
    type = &amp;#34;lxc&amp;#34;
  }
}

# VM-based (type = &amp;#34;vm&amp;#34;)
configuration = {
  cluster = {
    type = &amp;#34;vm&amp;#34;
  }
}&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The VM provisioner downloads a cloud image and imports it:&lt;/p&gt;
&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-hcl&#34;&gt;resource &amp;#34;proxmox_virtual_environment_download_file&amp;#34; &amp;#34;vm_image&amp;#34; {
  content_type = &amp;#34;iso&amp;#34;
  datastore_id = var.configuration.cluster.datastore.id
  file_name    = &amp;#34;docker-base.iso&amp;#34;
  url          = var.configuration.node_os_configuration[var.configuration.cluster.type].template_image_url
}&lt;/code&gt;&lt;/pre&gt;&lt;h2 id=&#34;host-pool-scheduling&#34;&gt;Host Pool Scheduling
&lt;/h2&gt;&lt;p&gt;Worker nodes are distributed across Proxmox hosts via modulo arithmetic:&lt;/p&gt;
&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-hcl&#34;&gt;# In nodes.tf
node_name = var.configuration.host_pool[
  each.key % length(var.configuration.host_pool)
].name&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;With a pool of 3 hosts and 3 worker nodes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Worker 0 → alpha (0 % 3 = 0)&lt;/li&gt;
&lt;li&gt;Worker 1 → charlie (1 % 3 = 1)&lt;/li&gt;
&lt;li&gt;Worker 2 → foxtrot (2 % 3 = 2)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This ensures even distribution across the cluster for resilience.&lt;/p&gt;
&lt;h2 id=&#34;keepalived-ha&#34;&gt;Keepalived HA
&lt;/h2&gt;&lt;p&gt;For high availability, Keepalived provides a floating VIP:&lt;/p&gt;
&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-hcl&#34;&gt;configuration = {
  vip = {
    enabled  = true
    address  = &amp;#34;192.168.61.20&amp;#34;
    router_id = 20
  }
}&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Each node is configured with its role:&lt;/p&gt;
&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-hcl&#34;&gt;worker_nodes = [
  {
    size = &amp;#34;medium&amp;#34;
    networks = { dmz = { address = &amp;#34;192.168.61.21/24&amp;#34;, gateway = &amp;#34;192.168.61.1&amp;#34; } }
    vip = { state = &amp;#34;MASTER&amp;#34;, priority = 100, interface = &amp;#34;dmz&amp;#34; }
  },
  {
    size = &amp;#34;medium&amp;#34;
    networks = { dmz = { address = &amp;#34;192.168.61.22/24&amp;#34;, gateway = &amp;#34;192.168.61.1&amp;#34; } }
    vip = { state = &amp;#34;BACKUP&amp;#34;, priority = 90, interface = &amp;#34;dmz&amp;#34; }
  },
  {
    size = &amp;#34;medium&amp;#34;
    networks = { dmz = { address = &amp;#34;192.168.61.23/24&amp;#34;, gateway = &amp;#34;192.168.61.1&amp;#34; } }
    vip = { state = &amp;#34;BACKUP&amp;#34;, priority = 80, interface = &amp;#34;dmz&amp;#34; }
  }
]&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The module generates Keepalived configuration:&lt;/p&gt;
&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-hcl&#34;&gt;resource &amp;#34;proxmox_virtual_environment_file&amp;#34; &amp;#34;keepalived_config&amp;#34; {
  content = &amp;lt;&amp;lt;-EOF
    vrrp_instance VI_1 {
        state ${node.vip.state}
        interface ${node.vip.interface}
        virtual_router_id ${var.configuration.vip.router_id}
        priority ${node.vip.priority}
        virtual_ipaddress {
            ${var.configuration.vip.address}
        }
    }
  EOF
}&lt;/code&gt;&lt;/pre&gt;&lt;h2 id=&#34;gpu-passthrough&#34;&gt;GPU Passthrough
&lt;/h2&gt;&lt;p&gt;For hardware acceleration (e.g., transcoding, ML workloads), device passthrough is configured in the host pool:&lt;/p&gt;
&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-hcl&#34;&gt;host_pool = [
  {
    name = &amp;#34;alpha&amp;#34;
    device_map = [
      { device = &amp;#34;/dev/apex_0&amp;#34;, mode = &amp;#34;0666&amp;#34; },         # Coral TPU
      { device = &amp;#34;/dev/dri/renderD128&amp;#34;, mode = &amp;#34;0666&amp;#34; },  # iGPU render node
      { device = &amp;#34;/dev/dri/card1&amp;#34;, mode = &amp;#34;0666&amp;#34; }
    ]
    datastore_id = &amp;#34;local-lvm&amp;#34;
  }
]&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The devices are passed through to containers:&lt;/p&gt;
&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-hcl&#34;&gt;resource &amp;#34;proxmox_virtual_environment_container&amp;#34; &amp;#34;this&amp;#34; {
  # ...
  
  devices_passthrough = [
    for device in var.configuration.host_pool[each.key % length(var.configuration.host_pool)].device_map : {
      path = device.device
      mode = device.mode
    }
  ]
}&lt;/code&gt;&lt;/pre&gt;&lt;h2 id=&#34;docker-installation&#34;&gt;Docker Installation
&lt;/h2&gt;&lt;p&gt;Docker is installed via cloud-init:&lt;/p&gt;
&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-hcl&#34;&gt;resource &amp;#34;proxmox_virtual_environment_file&amp;#34; &amp;#34;cloud_config&amp;#34; {
  content = &amp;lt;&amp;lt;-EOF
#cloud-config
package_update: true
packages:
  - docker.io
  - docker-compose
runcmd:
  - systemctl enable docker
  - systemctl start docker
  - usermod -aG docker root
EOF
}&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Or for more complex setups, custom post-install commands:&lt;/p&gt;
&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-hcl&#34;&gt;node_os_configuration = {
  debian = {
    family = &amp;#34;debian&amp;#34;
    template_image_url = &amp;#34;https://...&amp;#34;
    packages = [&amp;#34;docker.io&amp;#34;, &amp;#34;docker-compose&amp;#34;]
    package_manager = {
      install_command = &amp;#34;apt-get install -y&amp;#34;
    }
    post_install_commands = [
      &amp;#34;systemctl enable docker&amp;#34;,
      &amp;#34;usermod -aG docker root&amp;#34;
    ]
  }
}&lt;/code&gt;&lt;/pre&gt;&lt;h2 id=&#34;multi-network-support&#34;&gt;Multi-Network Support
&lt;/h2&gt;&lt;p&gt;The module supports multiple network interfaces per node:&lt;/p&gt;
&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-hcl&#34;&gt;networks = {
  dmz = {
    address = &amp;#34;192.168.61.21/24&amp;#34;
    gateway = &amp;#34;192.168.61.1&amp;#34;
  }
  vmbr1 = {
    address = &amp;#34;192.168.192.121/25&amp;#34;
  }
}&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;This maps to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;dmz&lt;/strong&gt; — frontend network with gateway (for public access)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;vmbr1&lt;/strong&gt; — backend network (for inter-node communication)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;node-sizing&#34;&gt;Node Sizing
&lt;/h2&gt;&lt;p&gt;The &lt;code&gt;node_size_configuration&lt;/code&gt; block keeps definitions DRY:&lt;/p&gt;
&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-hcl&#34;&gt;node_size_configuration = {
  small = {
    cpu     = 2
    memory  = 512
    os_disk = 20
  }
  medium = {
    cpu     = 8
    memory  = 32768
    os_disk = 256
  }
  large = {
    cpu     = 16
    memory  = 65536
    os_disk = 512
  }
}&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;My production cluster uses medium nodes (8 vCPU, 32GB RAM, 256GB disk).&lt;/p&gt;
&lt;h2 id=&#34;optional-tools&#34;&gt;Optional Tools
&lt;/h2&gt;&lt;p&gt;The module can provision additional tools:&lt;/p&gt;
&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-hcl&#34;&gt;configuration = {
  cluster = {
    options = {
      # Hawser - container management
      hawser = {
        enabled = true
        image   = &amp;#34;harbor.example.com/gh/finsys/hawser:latest&amp;#34;
      }
      
      # Newt - Pangolin tunnel client
      newt = {
        enabled = true
        image   = &amp;#34;harbor.example.com/dh/fosrl/newt:latest&amp;#34;
        endpoint = &amp;#34;https://newt.example.com&amp;#34;
      }
      
      # APT cache for faster downloads
      apt_cache = {
        enabled = true
        url     = &amp;#34;https://apt.example.com/&amp;#34;
      }
    }
  }
}&lt;/code&gt;&lt;/pre&gt;&lt;h2 id=&#34;my-production-configuration&#34;&gt;My Production Configuration
&lt;/h2&gt;&lt;p&gt;Here&amp;rsquo;s the actual production YAML configuration:&lt;/p&gt;
&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;# configurations/docker/prod-docker-lxc.yaml
name: prod-docker-lxc
enabled: true

cluster:
  name: prod-docker-lxc
  type: lxc
  datastore:
    id: nas
    node: alpha

host_pool:
  - name: alpha
    device_map:
      - device: /dev/apex_0
        mode: &amp;#34;0666&amp;#34;
      - device: /dev/dri/renderD128
        mode: &amp;#34;0666&amp;#34;
      - device: /dev/dri/card1
        mode: &amp;#34;0666&amp;#34;
    datastore_id: local-lvm
  - name: charlie
    device_map:
      - device: /dev/dri/renderD128
        mode: &amp;#34;0666&amp;#34;
      - device: /dev/dri/card0
        mode: &amp;#34;0666&amp;#34;
    datastore_id: local-lvm
  - name: foxtrot
    device_map:
      - device: /dev/apex_0
        mode: &amp;#34;0666&amp;#34;
      - device: /dev/dri/renderD128
        mode: &amp;#34;0666&amp;#34;
      - device: /dev/dri/card0
        mode: &amp;#34;0666&amp;#34;
    datastore_id: local-lvm

worker_nodes:
  - size: medium
    networks:
      dmz:
        address: 192.168.61.21/24
        gateway: 192.168.61.1
      vmbr1:
        address: 192.168.192.121/24
    vip:
      state: MASTER
      priority: 100
      interface: dmz
  - size: medium
    networks:
      dmz:
        address: 192.168.61.22/24
        gateway: 192.168.61.1
      vmbr1:
        address: 192.168.192.122/24
    vip:
      state: BACKUP
      priority: 90
      interface: dmz
  - size: medium
    networks:
      dmz:
        address: 192.168.61.23/24
        gateway: 192.168.61.1
      vmbr1:
        address: 192.168.192.123/24
    vip:
      state: BACKUP
      priority: 80
      interface: dmz

node_size_configuration:
  medium:
    cpu: 8
    memory: 32768
    os_disk: 256

vip:
  enabled: true
  address: 192.168.61.20
  router_id: 20&lt;/code&gt;&lt;/pre&gt;&lt;h2 id=&#34;outputs&#34;&gt;Outputs
&lt;/h2&gt;&lt;p&gt;The module returns node credentials for access:&lt;/p&gt;
&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-hcl&#34;&gt;output &amp;#34;nodes_credentials&amp;#34; {
  value = {
    password = random_password.node_root_password.result
    ssh_key = tls_private_key.node_root_ssh_key.private_key_pem
    hawser_token = random_uuid.hawser_token.id
  }
}

output &amp;#34;nodes_configurations&amp;#34; {
  value = {
    for idx, node in proxmox_virtual_environment_container.this : idx =&amp;gt; {
      id      = node.id
      name    = node.name
      node    = node.node
      ip      = node.ip_addresses[0]
    }
  }
}&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Credentials are automatically stored in Bitwarden:&lt;/p&gt;
&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-hcl&#34;&gt;resource &amp;#34;bitwarden-secrets_secret&amp;#34; &amp;#34;docker_nodes_password&amp;#34; {
  key   = &amp;#34;${local.cluster_name}-nodes_password&amp;#34;
  value = module.docker[0].nodes_credentials.password
}

resource &amp;#34;bitwarden-secrets_secret&amp;#34; &amp;#34;docker_nodes_ssh_key&amp;#34; {
  key   = &amp;#34;${local.cluster_name}-nodes_ssh_key&amp;#34;
  value = module.docker[0].nodes_credentials.ssh_key
}&lt;/code&gt;&lt;/pre&gt;&lt;h2 id=&#34;use-cases&#34;&gt;Use Cases
&lt;/h2&gt;&lt;p&gt;This cluster handles workloads like:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Home Assistant&lt;/strong&gt; — Docker Compose-based home automation&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Media services&lt;/strong&gt; — Plex, Jellyfin with GPU transcoding&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;VPN services&lt;/strong&gt; — WireGuard, OpenVPN&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;CI runners&lt;/strong&gt; — GitHub Actions self-hosted runners&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The hardware acceleration via GPU passthrough is critical for media workloads.&lt;/p&gt;
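&lt;p&gt;Worth noting: Swarm&amp;rsquo;s &lt;code&gt;docker stack deploy&lt;/code&gt; ignores &lt;code&gt;devices:&lt;/code&gt;, so GPU-dependent services are best run as plain Compose projects on the node that holds the hardware. A minimal, illustrative fragment (image and paths are placeholders):&lt;/p&gt;
&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;services:
  jellyfin:
    image: jellyfin/jellyfin:latest
    devices:
      # the render node is already passed through to the LXC via device_map
      - /dev/dri/renderD128:/dev/dri/renderD128
    volumes:
      - /srv/media:/media   # placeholder media path
&lt;/code&gt;&lt;/pre&gt;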
&lt;h2 id=&#34;what-most-people-get-wrong&#34;&gt;What Most People Get Wrong
&lt;/h2&gt;&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&amp;ldquo;Docker Swarm is dead&amp;rdquo;&lt;/strong&gt; — It&amp;rsquo;s not Kubernetes, but for 10-container workloads, it&amp;rsquo;s simpler. No RBAC complexity, no CNI headaches.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;GPU passthrough works on LXC&lt;/strong&gt; — Most guides assume PCI passthrough (VMs). With device nodes (&lt;code&gt;/dev/apex_0&lt;/code&gt;, &lt;code&gt;/dev/dri/*&lt;/code&gt;), LXC containers access GPUs directly.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Keepalived needs 3 nodes for quorum&lt;/strong&gt; — VRRP is priority-based, not quorum-based; two nodes work fine with &lt;code&gt;nopreempt&lt;/code&gt;. The backup only takes over if the master fails.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
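&lt;p&gt;On the third point, a minimal two-node Keepalived sketch. Note that &lt;code&gt;nopreempt&lt;/code&gt; is only honored when the instance starts in &lt;code&gt;BACKUP&lt;/code&gt; state, so the preferred node keeps its higher priority but does not reclaim the VIP after recovering (the interface name is an assumption):&lt;/p&gt;
&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-txt&#34;&gt;# /etc/keepalived/keepalived.conf on the preferred node
vrrp_instance VI_1 {
    state BACKUP            # required for nopreempt to apply
    nopreempt               # do not yank the VIP back after recovery
    interface eth0
    virtual_router_id 20
    priority 100            # peer runs the same config with priority 90
    virtual_ipaddress {
        192.168.61.20
    }
}
&lt;/code&gt;&lt;/pre&gt;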
&lt;h2 id=&#34;when-to-use--when-not-to-use&#34;&gt;When to Use / When NOT to Use
&lt;/h2&gt;&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Use Docker Swarm&lt;/th&gt;
          &lt;th&gt;Use Kubernetes&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;3-15 containers&lt;/td&gt;
          &lt;td&gt;50+ containers&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Simple networking&lt;/td&gt;
          &lt;td&gt;Custom CNI required&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Single admin&lt;/td&gt;
          &lt;td&gt;Team with RBAC needs&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;GPU passthrough via LXC&lt;/td&gt;
          &lt;td&gt;GPU operators&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id=&#34;whats-next&#34;&gt;What&amp;rsquo;s Next
&lt;/h2&gt;&lt;p&gt;Current areas of exploration:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;GPU scheduling&lt;/strong&gt; — Kubernetes-style GPU scheduling for Docker&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Portainer integration&lt;/strong&gt; — management UI for Docker&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Observability&lt;/strong&gt; — centralized logging with Loki&lt;/li&gt;
&lt;/ol&gt;
</description>
        </item>
        <item>
        <title>Terraform-Driven Homelab Architecture</title>
        <link>https://zharif.my/posts/homelab-terraform-architecture/</link>
        <pubDate>Sun, 15 Mar 2026 00:00:00 +0000</pubDate>
        
        <guid>https://zharif.my/posts/homelab-terraform-architecture/</guid>
        <description>&lt;img src="https://images.unsplash.com/photo-1558494949-ef010cbdcc31?w=800&amp;h=400&amp;fit=crop" alt="Featured image of post Terraform-Driven Homelab Architecture" /&gt;&lt;h2 id=&#34;the-problem-space&#34;&gt;The Problem Space
&lt;/h2&gt;&lt;p&gt;Homelabs evolve. You start with one Docker container, add some LXCs, then Kubernetes, and suddenly your infrastructure is a house of cards held together by scripts you wrote two years ago and don&amp;rsquo;t remember.&lt;/p&gt;
&lt;p&gt;This architecture solves that through:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Everything in code&lt;/strong&gt; — from VM provisioning to Kubernetes bootstrap&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Versioned modules&lt;/strong&gt; — each update is a code review opportunity&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Self-service via Backstage&lt;/strong&gt; — templated provisioning, no Slack threads&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;strong&gt;Numbers&lt;/strong&gt;: 3 Proxmox nodes, 2 production clusters (Docker Swarm + Talos K8s), ~50 resources defined across 24+ YAML configurations.&lt;/p&gt;
&lt;p&gt;Running in production:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;3 Proxmox nodes (alpha, charlie, foxtrot)&lt;/li&gt;
&lt;li&gt;Docker Swarm clusters with Keepalived HA&lt;/li&gt;
&lt;li&gt;Talos Kubernetes clusters with Flux GitOps&lt;/li&gt;
&lt;li&gt;GPU passthrough for hardware acceleration&lt;/li&gt;
&lt;li&gt;Multi-network topology (dmz + vmbr1)&lt;/li&gt;
&lt;li&gt;Private container registry (Harbor)&lt;/li&gt;
&lt;li&gt;Private Terraform registry (Cloudflare Workers)&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote&gt;
&lt;p&gt;Start with the basic template. All custom modules derive from it — maintaining consistency across the infrastructure.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id=&#34;module-hierarchy&#34;&gt;Module Hierarchy
&lt;/h2&gt;&lt;pre class=&#34;mermaid&#34;&gt;
  graph TB
    subgraph &amp;#34;Root Module&amp;#34;
        A[tf-infra-homelab]
    end
    
    subgraph &amp;#34;Compute Modules&amp;#34;
        B[tf-module-proxmox-lxc]
        C[tf-module-proxmox-vm]
        D[tf-module-proxmox-talos]
        E[tf-module-proxmox-docker]
    end
    
    subgraph &amp;#34;Application Layer&amp;#34;
        F[applications-homelab]
    end
    
    subgraph &amp;#34;Platform&amp;#34;
        G[Proxmox VE]
    end
    
    A --&amp;gt; B
    A --&amp;gt; C
    A --&amp;gt; D
    A --&amp;gt; E
    D --&amp;gt; F
    E --&amp;gt; F
    B --&amp;gt; G
    C --&amp;gt; G
    D --&amp;gt; G
    E --&amp;gt; G
&lt;/pre&gt;

&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Module&lt;/th&gt;
          &lt;th&gt;Purpose&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;terraform-basic-template&lt;/td&gt;
          &lt;td&gt;Foundation for all modules&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;tf-module-proxmox-lxc&lt;/td&gt;
          &lt;td&gt;LXC container provisioning&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;tf-module-proxmox-vm&lt;/td&gt;
          &lt;td&gt;Full VM provisioning&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;tf-module-proxmox-docker&lt;/td&gt;
          &lt;td&gt;Docker Swarm clusters&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;tf-module-proxmox-talos&lt;/td&gt;
          &lt;td&gt;Talos Kubernetes clusters&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;tf-infra-homelab&lt;/td&gt;
          &lt;td&gt;Root orchestration&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;applications-homelab&lt;/td&gt;
          &lt;td&gt;Kustomize deployments&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;github-management-plane&lt;/td&gt;
          &lt;td&gt;GitHub org management&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id=&#34;the-dependency-graph&#34;&gt;The Dependency Graph
&lt;/h2&gt;&lt;pre class=&#34;mermaid&#34;&gt;
  graph TB
    T[terraform-basic-template]
    L[tf-module-proxmox-lxc]
    V[tf-module-proxmox-vm]
    DT[tf-module-proxmox-docker]
    TT[tf-module-proxmox-talos]
    RH[tf-infra-homelab]
    AH[applications-homelab]
    P[ProxmoxVE]
    
    T --&amp;gt; L
    T --&amp;gt; V
    L --&amp;gt; DT
    V --&amp;gt; DT
    L --&amp;gt; TT
    V --&amp;gt; TT
    DT --&amp;gt; RH
    TT --&amp;gt; RH
    RH --&amp;gt; P
    DT --&amp;gt; AH
    TT --&amp;gt; AH
&lt;/pre&gt;

&lt;p&gt;Key observations:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Template is foundational&lt;/strong&gt; — all modules derive from the same template&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;LXC and VM are leaf modules&lt;/strong&gt; — no dependencies on other custom modules&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Docker and Talos are composite&lt;/strong&gt; — build on LXC/VM modules&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Root module orchestrates&lt;/strong&gt; — composes the other modules based on configurations&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Applications deploy post-provisioning&lt;/strong&gt; — GitOps ties into Docker/Talos clusters&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id=&#34;configuration-driven&#34;&gt;Configuration-Driven
&lt;/h2&gt;&lt;p&gt;All infrastructure is defined in YAML configurations, not ad-hoc Terraform runs:&lt;/p&gt;
&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-txt&#34;&gt;configurations/
├── docker/
│   ├── dev-docker-lxc.yaml
│   └── prod-docker-lxc.yaml
├── kubernetes/
│   ├── dev-k8s.yaml
│   └── prod-k8s.yaml
└── virtual_machine/
    └── ...&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Each config has an &lt;code&gt;enabled&lt;/code&gt; flag for gradual rollout:&lt;/p&gt;
&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;name: prod-k8s
enabled: true  # Set to false to disable without deletion

cluster:
  name: prod-k8s
  datastore:
    id: nas
    node: alpha&lt;/code&gt;&lt;/pre&gt;&lt;h2 id=&#34;whats-running&#34;&gt;What&amp;rsquo;s Running
&lt;/h2&gt;&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Cluster&lt;/th&gt;
          &lt;th&gt;Type&lt;/th&gt;
          &lt;th&gt;Nodes&lt;/th&gt;
          &lt;th&gt;VIP&lt;/th&gt;
          &lt;th&gt;Purpose&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;prod-docker-lxc&lt;/td&gt;
          &lt;td&gt;Docker Swarm&lt;/td&gt;
          &lt;td&gt;3x medium (8vCPU/32GB)&lt;/td&gt;
          &lt;td&gt;192.168.61.20&lt;/td&gt;
          &lt;td&gt;Container workloads&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;prod-k8s&lt;/td&gt;
          &lt;td&gt;Talos K8s&lt;/td&gt;
          &lt;td&gt;3x CP (4vCPU/8GB) + 3x worker (10vCPU/48GB)&lt;/td&gt;
          &lt;td&gt;192.168.62.20&lt;/td&gt;
          &lt;td&gt;Kubernetes workloads&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Both clusters span all 3 Proxmox nodes for high availability.&lt;/p&gt;
&lt;h2 id=&#34;design-principles&#34;&gt;Design Principles
&lt;/h2&gt;&lt;p&gt;This architecture follows specific principles:&lt;/p&gt;
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Principle&lt;/th&gt;
          &lt;th&gt;Implementation&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;Single configuration object&lt;/td&gt;
          &lt;td&gt;All modules use unified &lt;code&gt;configuration&lt;/code&gt; input&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Host pools&lt;/td&gt;
          &lt;td&gt;Resilience through multi-node distribution&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Versioned modules&lt;/td&gt;
          &lt;td&gt;Each module has explicit versions&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;YAML configurations&lt;/td&gt;
          &lt;td&gt;Infrastructure as data, not ad-hoc apply&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Private registry&lt;/td&gt;
          &lt;td&gt;Distribution without Terraform Cloud cost&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Secrets integration&lt;/td&gt;
          &lt;td&gt;Bitwarden for credential storage&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;GitOps&lt;/td&gt;
          &lt;td&gt;Flux bootstrapped during cluster creation&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Multi-network&lt;/td&gt;
          &lt;td&gt;Separate DMZ and backend networks&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;GPU passthrough&lt;/td&gt;
          &lt;td&gt;Device mapping in host pool&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id=&#34;what-most-people-get-wrong&#34;&gt;What Most People Get Wrong
&lt;/h2&gt;&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&amp;ldquo;More modules = better architecture&amp;rdquo;&lt;/strong&gt; — I started with 10+ modules. Consolidated to 5. Over-modularization creates maintenance overhead.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&amp;ldquo;YAML = Terraform&amp;rdquo;&lt;/strong&gt; — Terraform is the engine, YAML is the fuel. Don&amp;rsquo;t embed YAML in &lt;code&gt;.tf&lt;/code&gt; files; load from external files.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&amp;ldquo;GitOps replaces Terraform&amp;rdquo;&lt;/strong&gt; — They work together: Terraform provisions, Flux manages apps. Both are declarative.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
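&lt;p&gt;On the second point, one way to load an external YAML file and gate provisioning on its &lt;code&gt;enabled&lt;/code&gt; flag (a sketch; the local name and file path are illustrative, and the module source mirrors the earlier examples):&lt;/p&gt;
&lt;pre class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-hcl&#34;&gt;locals {
  # YAML lives outside the .tf files; Terraform only decodes it
  docker_config = yamldecode(
    file(&amp;#34;${path.module}/configurations/docker/prod-docker-lxc.yaml&amp;#34;)
  )
}

module &amp;#34;docker&amp;#34; {
  source  = &amp;#34;registry.example.com/namespace/tf-module-proxmox-docker/docker&amp;#34;
  version = &amp;#34;1.2.3&amp;#34;
  count   = local.docker_config.enabled ? 1 : 0  # skip when disabled

  configuration = local.docker_config
}&lt;/code&gt;&lt;/pre&gt;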
&lt;h2 id=&#34;related-posts&#34;&gt;Related Posts
&lt;/h2&gt;&lt;p&gt;Each component has its own detailed post:&lt;/p&gt;
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Post&lt;/th&gt;
          &lt;th&gt;Focus&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://zharif.my/posts/talos-kubernetes-proxmox&#34; &gt;Talos Kubernetes on Proxmox&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;tf-module-proxmox-talos deep dive — image factory, machine config, bootstrap&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://zharif.my/posts/docker-swarm-proxmox&#34; &gt;Docker Swarm on Proxmox&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;tf-module-proxmox-docker deep dive — Keepalived HA, provisioning&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://zharif.my/posts/lxc-vm-modules&#34; &gt;LXC &amp;amp; VM Modules&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;tf-module-proxmox-lxc + tf-module-proxmox-vm basics&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://zharif.my/posts/backstage-homelab&#34; &gt;Backstage Integration&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;Catalog generation, software templates&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://zharif.my/posts/terraform-registry-cloudflare-workers&#34; &gt;Private Terraform Registry&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;Module distribution via Cloudflare Workers&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;a class=&#34;link&#34; href=&#34;https://zharif.my/posts/github-management-plane&#34; &gt;GitHub Management Plane&lt;/a&gt;&lt;/td&gt;
          &lt;td&gt;Managing GitHub org via Terraform&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
</description>
        </item>
        
    </channel>
</rss>
