Setting up K3s nodes in Proxmox using Terraform
One of the main benefits of using Proxmox in the homelab is that I can create and destroy VMs on demand. Proxmox also has a Terraform provider that I can use to define VMs in HCL, which means creating, changing, and destroying resources works much like it does with any other cloud provider. Getting to this stage required a non-trivial amount of effort, which is what this post is about.
Setting up cloud-init Virtual Machine Templates
It’s possible to create VMs using the Terraform provider without using cloud-init VM templates. However, I want a baseline template that I can use for all types of workloads, one that also supports specifying:
- Network interface with a static IP address
- Predefined SSH keys added to authorized keys
The end goal is that once a VM is provisioned by Terraform, I can run an Ansible playbook to configure it directly. The only manual step required should be accepting the host key of the VM on the Ansible control node.
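As a sketch of that one manual step, the host key can be added to known_hosts with ssh-keyscan once the VM's static IP is known (verify the fingerprint out-of-band if that matters in your environment):

# Accept the host key of a freshly provisioned VM on the Ansible control node
ssh-keyscan -H <IP address> >> ~/.ssh/known_hosts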
When setting all of this up, I relied on the following sources of information:
- The Terraform provider documentation
- This blog post from yetiops.net
- This blog post from austinsnerdythings.com
Big shoutout to all of them for good writeups!
To create the VM templates I added a new role to the existing Ansible playbook for the Proxmox hosts. This role basically does the following:
- Downloads the cloud-init image for multiple distributions
- Customizes the images by installing the QEMU guest agent
- Creates a VM template from each image
Files involved:
roles/tasks/download_iso_images.yaml
roles/tasks/install_qemu_guest_agent.yaml
roles/tasks/create_cloud_init_template.yaml
roles/tasks/main.yaml
roles/vars/main.yaml
Definitions:
roles/tasks/download_iso_images.yaml:
- name: Download ISO images to directory "{{ pvesm_local_storage_path }}/{{ pvesm_local_storage_iso_subpath }}"
  loop: "{{ iso_images_to_download }}"
  get_url:
    url: "{{ item.url }}"
    dest: "{{ pvesm_local_storage_path }}/{{ pvesm_local_storage_iso_subpath }}/{{ item.filename }}"
roles/tasks/install_qemu_guest_agent.yaml:
- name: Install libguestfs-tools on Proxmox nodes
  apt:
    name: libguestfs-tools
    update_cache: true
- name: Install qemu-guest-agent in all images in directory "{{ pvesm_local_storage_path }}/{{ pvesm_local_storage_iso_subpath }}"
  loop: "{{ iso_images_to_download }}"
  command: virt-customize -a "{{ pvesm_local_storage_path }}/{{ pvesm_local_storage_iso_subpath }}/{{ item.filename }}" --install qemu-guest-agent
roles/tasks/create_cloud_init_template.yaml:
- name: Check if template already exists
  shell: qm list | grep "{{ vm_id }}"
  ignore_errors: true
  register: template_exists_cmd
- name: Delete template {{ vm_id }} if it already exists
  command: qm destroy {{ vm_id }}
  when: template_exists_cmd.rc == 0
- name: Create the VM to be used as a cloud-init base for template {{ vm_instance_template_name }}
  command: qm create {{ vm_id }} -name {{ vm_instance_template_name }} -memory 1024 -net0 virtio,bridge=vmbr0 -cores 1 -sockets 1
- name: Import the cloud-init image into the storage as a disk
  command: qm importdisk {{ vm_id }} {{ iso_image_path }} {{ storage }}
- name: Attach the disk to the virtual machine
  command: qm set {{ vm_id }} -scsihw virtio-scsi-pci -virtio0 "{{ storage }}:vm-{{ vm_id }}-disk-0"
- name: Set a second drive of the cloud-init type
  command: qm set {{ vm_id }} -ide2 "{{ storage }}:cloudinit"
- name: Add serial output
  command: qm set {{ vm_id }} -serial0 socket -vga serial0
- name: Set the boot disk to the imported disk
  command: qm set {{ vm_id }} -boot c -bootdisk virtio0
- name: Enable the QEMU guest agent
  command: qm set {{ vm_id }} --agent enabled=1
  when: enable_qemu_guest_agent is defined
- name: Create a template from the instance
  command: qm template {{ vm_id }}
roles/tasks/main.yaml:
- name: Download ISO images
  import_tasks: download_iso_images.yaml
- name: Install qemu-guest-agent in ISO images
  import_tasks: install_qemu_guest_agent.yaml
- name: Set up cloud-init for all downloaded ISO images that have cloud-init
  loop: "{{ iso_images_to_download }}"
  include_tasks: create_cloud_init_template.yaml
  when: item.is_cloud_init_image is defined
  vars:
    vm_id: "{{ item.cloud_init_vm_id }}"
    vm_instance_template_name: "{{ item.cloud_init_vm_instance_template_name }}"
    iso_image_path: "{{ pvesm_local_storage_path }}/{{ pvesm_local_storage_iso_subpath }}/{{ item.filename }}"
    storage: local-lvm
    enable_qemu_guest_agent: "{{ item.enable_qemu_guest_agent }}"
roles/vars/main.yaml:
pvesm_local_storage_path: /var/lib/vz
pvesm_local_storage_iso_subpath: template/iso
iso_images_to_download:
  - filename: ubuntu-22.04-server-cloudimg-amd64.img
    url: https://cloud-images.ubuntu.com/releases/22.04/release/ubuntu-22.04-server-cloudimg-amd64.img
    is_cloud_init_image: true
    cloud_init_vm_id: 9001
    cloud_init_vm_instance_template_name: ubuntu-22.04-server-cloudimg-amd64
    enable_qemu_guest_agent: true
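With the role wired into the playbook for the Proxmox hosts, (re)creating the templates is just a normal playbook run. A minimal sketch, using a placeholder inventory and playbook name rather than the actual files from my repository:

# Recreate the cloud-init VM templates on the Proxmox hosts
ansible-playbook -i inventory/hosts.yaml proxmox.yaml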
I opted for recreating the VM templates on each run to ensure they always reflect the latest images. One thing to note is that I use full_clone for all VMs to ensure there is no reference that breaks if the template is deleted.
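After a run, the result can be verified on the Proxmox host, for example by checking that the template with the VM ID from the variables above (9001) exists and has the expected disk, cloud-init drive, and agent settings:

# List VMs and templates on the node
qm list

# Show the configuration of the Ubuntu template
qm config 9001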
Once the playbook has been run, the VM templates are ready and it’s time to define the VMs using HCL.
Writing the Terraform configuration for the VMs
I won’t spend time talking about initializing the Terraform provider or creating the user account in Proxmox since this is already covered in the documentation:
- Creating the Proxmox user and role for Terraform
- Creating connection with username and API token
- Installing the Terraform provider
Once everything has been set up, it’s time to define the VMs.
Most of the configuration is similar enough that defining multiple resources was not necessary in my case:
k8s.tf:
resource "proxmox_vm_qemu" "k8s" {
  for_each = {
    for vm in var.vms : vm.name => vm
  }

  name        = each.value.name
  desc        = each.value.desc
  target_node = each.value.pve_node

  # Setting the OS type to cloud-init
  os_type = "cloud-init"
  # Set to the name of the cloud-init VM template created earlier
  clone = var.cloud_init_template
  # Ensure each VM is cloned in full to avoid
  # a dependency on the original VM template
  full_clone = true

  cores  = each.value.cores
  memory = each.value.memory

  # Define a static IP on the primary network interface
  ipconfig0 = "ip=${each.value.ip},gw=${var.gateway_ip}"

  ciuser  = var.username
  sshkeys = var.ssh_public_key

  # Enable the QEMU guest agent
  agent = 1

  # This is matched to the default mounted OS drive.
  disk {
    type    = "virtio"
    storage = "local-lvm"
    size    = "20G"
  }

  # Additional storage drive for Longhorn.
  disk {
    type    = "virtio"
    storage = "local-lvm"
    size    = each.value.disk_size
    format  = "raw"
  }

  lifecycle {
    ignore_changes = [
      target_node,
      network,
      clone,
      full_clone,
      qemu_os
    ]
  }
}
vars.tf:
variable "cloud_init_template" {
  type        = string
  description = "Name of the cloud-init template to use"
  default     = "ubuntu-22.04-server-cloudimg-amd64"
}

variable "username" {
  type        = string
  description = "Username of the cloud-init user"
  sensitive   = true
}

variable "ssh_public_key" {
  type        = string
  description = "Public SSH Key to add to authorized keys"
  sensitive   = true
}

variable "gateway_ip" {
  type        = string
  description = "IP of gateway"
  default     = "<IP address>"
}

variable "vms" {
  type = list(object({
    name      = string
    pve_node  = string
    desc      = string
    ip        = string
    memory    = number
    cores     = number
    disk_size = string
  }))
  default = []
}
Below is my current configuration. The 120GB drive on all three worker nodes is dedicated to Longhorn. The 30GB drive on the control plane node was previously used for Longhorn storage and will be removed in the future.
terraform.tfvars:
vms = [
  {
    name      = "k8s-control-1"
    pve_node  = "<Proxmox host containing VM templates in the Proxmox cluster>"
    desc      = "Kubernetes control plane node 1"
    ip        = "<IP address in CIDR notation>"
    memory    = 4096
    cores     = 4
    disk_size = "30G"
  },
  {
    name      = "k8s-worker-1"
    pve_node  = "<Proxmox host containing VM templates in the Proxmox cluster>"
    desc      = "Kubernetes worker node 1"
    ip        = "<IP address in CIDR notation>"
    memory    = 4096
    cores     = 2
    disk_size = "120G"
  },
  {
    name      = "k8s-worker-2"
    pve_node  = "<Proxmox host containing VM templates in the Proxmox cluster>"
    desc      = "Kubernetes worker node 2"
    ip        = "<IP address in CIDR notation>"
    memory    = 4096
    cores     = 2
    disk_size = "120G"
  },
  {
    name      = "k8s-worker-3"
    pve_node  = "<Proxmox host containing VM templates in the Proxmox cluster>"
    desc      = "Kubernetes worker node 3"
    ip        = "<IP address in CIDR notation>"
    memory    = 4096
    cores     = 2
    disk_size = "120G"
  }
]
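With the variables in place, provisioning is the usual Terraform workflow. A sketch; reviewing the plan is especially worthwhile after a template has been recreated:

# Review the planned changes against the Proxmox cluster
terraform plan

# Create or update the VMs
terraform apply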
Since all the VMs have predefined IP addresses and usernames, I can specify all of this in ~/.ssh/config.
Host k8s-control-1
    HostName <IP address>
    User <Username>

Host k8s-worker-1
    HostName <IP address>
    User <Username>

Host k8s-worker-2
    HostName <IP address>
    User <Username>

Host k8s-worker-3
    HostName <IP address>
    User <Username>
Once each VM is up, I can SSH into all of them without any passwords involved.
The Ansible playbook for the K3s cluster adds any new hosts placed in the inventory to the cluster. Removing nodes from the cluster is now reduced to the following steps:
- Drain the node
- Remove it from the cluster
- Destroy and recreate the VM of the node using Terraform
- Replace the old SSH host key of the VM in known_hosts on the Ansible control node (see the example commands after this list)
- Rerun the Ansible playbook to configure the new node
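A sketch of what this can look like for a single worker node. The exact kubectl flags and the Terraform resource address depend on your versions and naming; k8s-worker-2 and the IP are placeholders:

# Drain the node and remove it from the cluster
kubectl drain k8s-worker-2 --ignore-daemonsets --delete-emptydir-data
kubectl delete node k8s-worker-2

# Destroy and recreate the VM (requires a Terraform version that supports -replace)
terraform apply -replace='proxmox_vm_qemu.k8s["k8s-worker-2"]'

# Replace the old host key on the Ansible control node, then accept the new one
ssh-keygen -R <IP address>
ssh-keyscan -H <IP address> >> ~/.ssh/known_hosts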
Unplugging the power from Raspberry Pis, reflashing SD cards, and so on is now a thing of the past.
Given my current setup, it’s about as close as I’m going to get to a fully automated solution for managing the underlying infrastructure of the K3s nodes.
Conclusion
At this point I have migrated the entire K3s cluster away from Raspberry Pis and into VMs on Proxmox. Some of the larger pain points I had previously are no longer a concern:
- No more nodes failing and becoming unresponsive due to overload, which required physically reconnecting the power of the Raspberry Pi to force a reboot
- No more silently failing network interfaces that could only be fixed by rebooting the Raspberry Pi
I now have IaC for the entire K3s cluster. Combined with external backups of stateful data, the setup is robust enough that I can destroy and recreate the entire cluster when necessary.
As an added bonus, anything I cannot deploy as a container in the K3s cluster can be deployed as a VM using the same baseline template as the K3s nodes.
Some future improvements:
- Replication of VM templates is currently not possible in the Proxmox cluster. This forces me to use a single Proxmox host for first-time provisioning, which is why the pve_node property is ignored, and after the initial provisioning I have to migrate VMs manually to other nodes. To fix this, I would have to look into Proxmox storage that supports replication and use it for storing the templates.
- Maybe look into using existing Terraform modules such as fvumbaca/k3s/proxmox to remove the need for my own Ansible playbook for the initial configuration of the cluster. The Longhorn configuration would still be necessary.