Talos Kubernetes on Proxmox

From custom Talos ISO to bootstrapped cluster with Flux. Registry mirrors and GitOps from day 0.

Why Talos

Bootstrapping Kubernetes by hand with kubeadm takes 15+ steps and manual certificate management. Talos gives you a declarative cluster that manages its own certificate rotation, API server lifecycle, and upgrades — all through a single machine config.

The trade-off: Talos is opinionated. There is no SSH, no shell, and no package manager — every change goes through the API. But for infrastructure that should just work, this is a feature.

Air-gapped requirement: My homelab can’t reach public registries. Every container pull is redirected through my Harbor mirror. Talos’s registry mirror config makes this seamless.

[!tip] Talos automatically rotates certificates before they expire. No manual intervention needed for cluster certificate management.

Module Capabilities

The tf-module-proxmox-talos module provisions a complete Talos-based Kubernetes cluster on Proxmox VE in a single Terraform apply:

  1. Talos Image Factory — generates custom ISOs with specific extensions
  2. Machine Configuration — generates Talos machine configs with networking
  3. ISO Upload — downloads and uploads to Proxmox datastore
  4. Node Provisioning — provisions control plane and worker VMs across host pool
  5. Cluster Bootstrap — applies machine configs and bootstraps Kubernetes
  6. Day-0 GitOps — optionally installs Flux or Argo CD during bootstrap
  7. Registry Mirrors — configures container registry redirects

Quick Start

module "talos_cluster" {
  source  = "registry.example.com/namespace/tf-module-proxmox-talos/talos"
  version = "1.2.1"

  configuration = {
    cluster = {
      name = "prod-k8s"
      datastore = { id = "nas", node = "alpha" }
      talos = { version = "v1.12.4" }
      kubernetes_version = "v1.35.0"
    }

    host_pool = {
      alpha = { datastore_id = "local-lvm" }
      charlie = { datastore_id = "local-lvm" }
      foxtrot = { datastore_id = "local-lvm" }
    }

    control_plane_nodes = {
      nodes = [
        { size = "control_plane", networks = { dmz = { address = "192.168.62.21/24", gateway = "192.168.62.1" } } }
      ]
      host_pool = ["alpha", "charlie", "foxtrot"]
      vip = { enabled = true, address = "192.168.62.20" }
    }

    worker_nodes = {
      nodes = [
        { size = "worker", networks = { dmz = { address = "192.168.62.24/24", gateway = "192.168.62.1" } } }
      ]
      host_pool = ["alpha", "charlie", "foxtrot"]
    }

    node_size_configuration = {
      control_plane = { cpu = 4, memory = 8192, os_disk = 128 }
      worker = { cpu = 10, memory = 49152, os_disk = 128, data_disk = 512 }
    }
  }
}

Talos Image Factory

The module uses Talos’s image factory to generate custom ISOs with specific extensions:

# image.tf
resource "talos_image_factory_schematic" "this" {
  schematic = yamlencode({
    customization = {
      systemExtensions = {
        officialExtensions = data.talos_image_factory_extensions_versions.this.extensions_info[*].name
      }
    }
  })
}

The extensions are defined in locals:

locals {
  image = {
    platform = "nocloud"
    customizations = {
      base = [
        "lldp",             # Network topology discovery
        "qemu-guest-agent", # Proxmox agent integration
        "util-linux-tools", # Core utilities
        "iscsi-tools",      # iSCSI storage support
        "nfs-utils"         # NFS mounting
      ]
    }
  }
}

The generated schematic ID is used to construct the ISO URL:

resource "proxmox_download_file" "talos_iso" {
  file_name = "talos-${var.configuration.cluster.name}-${var.configuration.cluster.talos.version}-${data.talos_image_factory_urls.this.schematic_id}.iso"
  url = (
    var.configuration.cluster.talos.iso_mirror != null
    ? replace(data.talos_image_factory_urls.this.urls.iso, "https://", var.configuration.cluster.talos.iso_mirror)
    : data.talos_image_factory_urls.this.urls.iso
  )
}

This allows using mirror registries for air-gapped environments.
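
To illustrate what that replace() call produces, here is a standalone sketch (the URL, schematic ID, and mirror host are made up):

```hcl
locals {
  # Hypothetical factory URL and mirror, for illustration only
  iso_url    = "https://factory.talos.dev/image/abc123/v1.12.4/metal-amd64.iso"
  iso_mirror = "https://proxy.example.com/"

  # Swapping the scheme prefix pushes the original host into the path,
  # which is the shape a pull-through proxy expects:
  # "https://proxy.example.com/factory.talos.dev/image/abc123/v1.12.4/metal-amd64.iso"
  mirrored_url = replace(local.iso_url, "https://", local.iso_mirror)
}
```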

Machine Configuration

Talos machine configuration is generated through the Talos provider:

data "talos_machine_configuration" "configurations" {
  cluster_name    = var.configuration.cluster.name
  cluster_version = var.configuration.cluster.talos.version

  # Control plane specific config
  machine_type = "controlplane"

  # Network configuration
  network = {
    interfaces = [
      for name, network in var.configuration.control_plane_nodes.nodes[0].networks : {
        interface = name
        dhcp      = false
        addresses = [network.address]
      }
    ]
  }

  # Kubernetes configuration
  kubernetes = {
    version = var.configuration.cluster.kubernetes_version
  }
}

The configuration supports:

  • Multiple network interfaces per node
  • Registry mirrors for all major registries
  • Custom CNI (Cilium) configuration
  • kube-proxy disablement (for Cilium’s kube-proxy replacement)
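
In raw Talos machine-config terms, the CNI and kube-proxy options boil down to a patch along these lines (shown as a Terraform config_patches sketch; the field names follow the Talos machine config schema):

```hcl
config_patches = [
  yamlencode({
    cluster = {
      network = {
        cni = { name = "none" } # hand CNI duties to Cilium
      }
      proxy = { disabled = true } # Cilium's kube-proxy replacement takes over
    }
  })
]
```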

Registry Mirrors

A key feature is container registry mirror configuration:

configuration = {
  cluster = {
    registry_mirrors = {
      "ghcr.io" = {
        endpoints = ["https://harbor.example.com/v2/gh"]
        override_path = true
      }
      "registry.k8s.io" = {
        endpoints = ["https://harbor.example.com/v2/k8s"]
        override_path = true
      }
      "docker.io" = {
        endpoints = ["https://harbor.example.com/v2/dh"]
        override_path = true
      }
      "quay.io" = {
        endpoints = ["https://harbor.example.com/v2/qi"]
        override_path = true
      }
      "factory.talos.dev" = {
        endpoints = ["https://harbor.example.com/v2/talos"]
        override_path = true
      }
    }
  }
}

All container pulls route through my Harbor registry — essential for air-gapped homelabs.
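
Under the hood, these module options map onto Talos’s machine.registries.mirrors config. A minimal sketch of the rendered patch for a single registry:

```hcl
config_patches = [
  yamlencode({
    machine = {
      registries = {
        mirrors = {
          "docker.io" = {
            endpoints    = ["https://harbor.example.com/v2/dh"]
            overridePath = true # use the endpoint path as-is, no /v2/ rewriting
          }
        }
      }
    }
  })
]
```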

Multi-Network Support

The module provisions VMs with multiple network interfaces:

network_devices = [
  for network_name, network in each.value.networks : {
    name    = network_name
    enabled = true
    bridge  = network_name
    ipv4 = {
      address = network.address
      gateway = network.gateway
    }
  }
]

My production setup uses:

  • dmz — frontend network with gateway (192.168.62.0/24)
  • vmbr1 — backend network for inter-node communication (192.168.192.0/24)
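
A node entry using both networks might look like this (the backend address is illustrative; only the frontend interface gets a gateway, so each node has a single default route):

```hcl
worker_nodes = {
  nodes = [
    {
      size = "worker"
      networks = {
        dmz   = { address = "192.168.62.24/24", gateway = "192.168.62.1" }
        vmbr1 = { address = "192.168.192.24/24" } # backend: no gateway
      }
    }
  ]
  host_pool = ["alpha", "charlie", "foxtrot"]
}
```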

Cluster Bootstrap

The bootstrap sequence is orchestrated by Terraform:

# 1. Generate machine secrets
resource "talos_machine_secrets" "this" {}

# 2. Apply control plane configuration
resource "talos_machine_configuration_apply" "controlplane" {
  for_each = { for idx, node in var.configuration.control_plane_nodes.nodes : idx => node }
  
  node    = module.control_plane_virtual_machine[each.key].virtual_machine.id
  config  = data.talos_machine_configuration.configurations[each.key].machine_config
  secrets = talos_machine_secrets.this.secrets
}

# 3. Bootstrap the cluster
resource "talos_machine_bootstrap" "this" {
  node    = var.configuration.control_plane_nodes.nodes[0].name
  config  = data.talos_machine_configuration.configurations[0].machine_config
  secrets = talos_machine_secrets.this.secrets
}
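
Once bootstrap completes, credentials are pulled from the first control plane node. Roughly (a sketch based on the siderolabs/talos provider resource):

```hcl
# 4. Retrieve cluster credentials once the control plane is up
resource "talos_cluster_kubeconfig" "this" {
  depends_on           = [talos_machine_bootstrap.this]
  client_configuration = talos_machine_secrets.this.client_configuration
  node                 = var.configuration.control_plane_nodes.nodes[0].name
}
```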

GitOps Bootstrap

One of the most powerful features — Flux or Argo CD can be bootstrapped during cluster creation:

configuration = {
  cluster = {
    gitops = {
      provider      = "flux"  # or "argocd"
      namespace     = "flux-system"
      chart_version = "2.18.2"
      
      bootstrap = {
        repo_url              = "https://github.com/your-org/applications.git"
        revision              = "main"
        path                  = "src/k8s/prod"
        destination_namespace = "homelab"
      }
    }
  }
}

This does:

  1. Installs Flux during Talos bootstrap (via inline manifest)
  2. Configures it to sync from the applications-homelab repository
  3. The cluster starts deploying apps immediately after boot

sequenceDiagram
    participant TF as Terraform
    participant Talos as Talos
    participant Flux as Flux
    participant GH as GitHub
    participant K8s as Kubernetes
    
    TF->>Talos: Apply machine config
    Talos->>Talos: Bootstrap control plane
    Talos->>Flux: Install Flux CRDs
    Flux->>GH: Clone applications-homelab
    GH-->>Flux: Return repo contents
    Flux->>K8s: Deploy applications

Cilium Integration

For advanced networking, the default CNI can be replaced with bundled Cilium:

configuration = {
  cluster = {
    # Disable Talos-managed CNI
    options = {
      disable_default_cni = true
      disable_kube_proxy = true
    }
    
    # Configure Cilium via helm values
    helm_values_override = {
      cilium = {
        operator = { replicas = 1 }
      }
    }
  }
}

The module uses the Helm provider to template the Cilium manifest:

data "helm_template" "cilium" {
  name       = "cilium"
  repository = "https://helm.cilium.io"
  chart      = "cilium"
  version    = var.configuration.cluster.cilium_chart_version # Cilium chart version (its own field, not the Talos version)
  namespace  = "cilium"
  values     = [yamlencode(var.configuration.cluster.helm_values_override.cilium)]
}
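
The rendered manifest then has to reach the cluster before any CNI exists. Talos handles this with inline manifests, which the control plane applies during bootstrap. A sketch of the wiring:

```hcl
config_patches = [
  yamlencode({
    cluster = {
      inlineManifests = [
        {
          name     = "cilium"
          contents = data.helm_template.cilium.manifest # rendered chart, applied at bootstrap
        }
      ]
    }
  })
]
```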

Node Sizing

The node_size_configuration block keeps definitions DRY:

node_size_configuration = {
  control_plane = {
    cpu     = 4
    memory  = 8192 # MB
    os_disk = 128  # GB
  }
  worker = {
    cpu       = 10
    memory    = 49152 # MB
    os_disk   = 128
    data_disk = 512 # Extra data disk for PVs
  }
}

My prod-k8s cluster:

  • 3 control plane nodes: 4 vCPU, 8GB RAM, 128GB disk
  • 3 worker nodes: 10 vCPU, 48GB RAM, 128GB OS + 512GB data

Host Pool Scheduling

VMs are distributed across Proxmox nodes via modulo arithmetic:

# In nodes.tf
node_name = var.configuration.control_plane_nodes.host_pool[
  tonumber(each.key) % length(var.configuration.control_plane_nodes.host_pool)
]

With 3 nodes and 6 worker indices:

  • Worker 0 → alpha (0 % 3)
  • Worker 1 → charlie (1 % 3)
  • Worker 2 → foxtrot (2 % 3)
  • Worker 3 → alpha (3 % 3)
  • Worker 4 → charlie (4 % 3)
  • Worker 5 → foxtrot (5 % 3)

This ensures even distribution across the cluster.
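
The same distribution can be seen in isolation with a locals block (pool names taken from the example above):

```hcl
locals {
  host_pool = ["alpha", "charlie", "foxtrot"]

  # idx % length wraps around the pool, giving round-robin placement:
  # { "0" = "alpha", "1" = "charlie", "2" = "foxtrot",
  #   "3" = "alpha", "4" = "charlie", "5" = "foxtrot" }
  placement = {
    for idx in range(6) : idx => local.host_pool[idx % length(local.host_pool)]
  }
}
```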

Outputs

The module returns cluster credentials for external use:

output "cluster_credentials" {
  value = {
    kubeconfig  = talos_cluster_kubeconfig.this.kubeconfig_raw
    talosconfig = data.talos_client_configuration.this.talos_config

    # Config files are also written locally when debug = true
    talosconfig_path = local.talosconfig_path
    kubeconfig_path  = local.kubeconfig_path
  }
}
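
Those outputs can be consumed by the root module however you like; for example, a sketch writing the kubeconfig to disk with the hashicorp/local provider (the filename is illustrative):

```hcl
resource "local_file" "kubeconfig" {
  content         = module.talos_cluster.cluster_credentials.kubeconfig
  filename        = "${path.module}/prod-k8s.kubeconfig"
  file_permission = "0600" # keep cluster credentials private
}
```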

Credentials are automatically stored in Bitwarden:

resource "bitwarden-secrets_secret" "kubernetes_kubeconfig" {
  key   = "${local.cluster_name}-kubeconfig"
  value = module.kubernetes[0].cluster_credentials.kubeconfig
}

resource "bitwarden-secrets_secret" "kubernetes_talosconfig" {
  key   = "${local.cluster_name}-talosconfig"
  value = module.kubernetes[0].cluster_credentials.talosconfig
}

My Production Configuration

Here’s the actual production YAML configuration:

# configurations/kubernetes/prod-k8s.yaml
cluster:
  name: prod-k8s
  datastore:
    id: nas
    node: alpha
  talos:
    version: v1.12.4
    installer_mirror: harbor.example.com/talos
    iso_mirror: https://proxy.example.com/
  kubernetes_version: v1.35.0
  registry_mirrors:
    ghcr.io: { endpoints: [https://harbor.example.com/v2/gh], override_path: true }
    registry.k8s.io: { endpoints: [https://harbor.example.com/v2/k8s], override_path: true }
    docker.io: { endpoints: [https://harbor.example.com/v2/dh], override_path: true }
    quay.io: { endpoints: [https://harbor.example.com/v2/qi], override_path: true }
    factory.talos.dev: { endpoints: [https://harbor.example.com/v2/talos], override_path: true }
  options:
    disable_default_cni: true
    disable_kube_proxy: true
    disable_scheduling_on_control_plane: true
  gitops:
    provider: flux
    bootstrap:
      repo_url: https://github.com/your-org/applications.git
      path: src/k8s/prod
      destination_namespace: homelab

host_pool:
  alpha: { datastore_id: local-lvm }
  charlie: { datastore_id: local-lvm }
  foxtrot: { datastore_id: local-lvm }

control_plane_nodes:
  nodes: [...]  # 3 control planes
  host_pool: [alpha, charlie, foxtrot]
  vip: { enabled: true, address: 192.168.62.20 }

worker_nodes:
  nodes: [...]  # 3 workers
  host_pool: [alpha, charlie, foxtrot]

node_size_configuration:
  control_plane: { cpu: 4, memory: 8192, os_disk: 128 }
  worker: { cpu: 10, memory: 49152, os_disk: 128, data_disk: 512 }

What’s Next

Current areas of exploration:

  1. Multi-cluster federation — connecting Talos clusters for workload distribution
  2. Nested Talos — running Talos inside Proxmox for testing
  3. Observability — centralized logging with Loki and Grafana

What Most People Get Wrong

  1. “Talos upgrades break clusters” — With proper machine configs and registry mirrors, upgrades are rolling. The immutability is a feature, not a bug.

  2. “Air-gapped is impossible” — Talos’s registry mirror config and image factory handle this. Your nodes don’t need public internet access.

  3. “No SSH means no visibility” — Talos has built-in talosctl logs and talosctl dashboard. It’s a different workflow from SSH-based debugging, not a less capable one.

When to Use / When NOT to Use

Use Talos when you:

  • Want declarative infrastructure
  • Run air-gapped environments
  • Want a single apply to a working cluster

Stick with kubeadm when you:

  • Need full kubelet control
  • Require a custom init system
  • Need manual certificate management


The foundation is solid — every cluster can be versioned, reviewed, and rolled back.