Introducing Longhorn to the homelab
One thing missing from the homelab was the option to provide underlying persistent storage for workloads. I looked at a few options, but ultimately landed on Longhorn, which is cloud-native distributed block storage for Kubernetes.
Hardware upgrades
Before starting to make use of Longhorn I obviously need some disks formatted and mounted on a few of the Pis that will act as storage nodes. I went with 2x Kingston A400 2.5” 240 GB SSDs. There was no room left in my current case for the drives, so I purchased a new case: the Upgraded Complete Enclosure for Raspberry Pi Clusters, which has 4x 2.5” slots, leaving 2 slots for future expansion. For cables I got the SSD to USB 3.0 cable based on the reviews and the fact that it has been tested specifically with the Raspberry Pi. The “Notes” section mentions:
- Use an SSD when connecting
- Preferably use with Raspberry Pi 4B, not 3B+ due to speed constraints with USB 2.0
This leaves 2 nodes in the cluster that qualify, which is why I only went with 2x disks for storage. This caused some headaches during the installation that I’ll get into later.
While the new case is an improvement, it has one drawback. Unless you’re using a PoE switch with the PoE HAT, connecting the power source over USB-C requires unfastening the mounted Pi. I wish there were an easier way of connecting a standard power source to an already mounted Pi. Other than that, the case is great, and mounting the Pis, disks, and the remaining components was a breeze. Having space for a switch inside the bottom of the case is a bonus for keeping things tidy.
Baseline setup of storage nodes
Longhorn has a few installation requirements, which can be automated using Ansible. This is an extracted version of the inventory and playbook I use for the storage nodes in the cluster (hostx represents fictitious hosts for demonstration).
This setup presumes a few things:
- There is only a single disk device on each storage node
- The disk device is represented identically on all storage nodes
I could have used lvm to pool disks, but each storage node will only ever have a single disk attached to it given the constraints of the case. I’d rather have more nodes with disks than nodes connected to multiple disks. Keeping it simple for now.
Inventory:
[k3s_servers]
host1 node_name=rpi4b1
[k3s_agents]
host2 node_name=rpi4b2
host3 node_name=rpi3b1
host4 node_name=rpi3b2
[storage_servers]
host1
host2
[storage_servers:vars]
device=/dev/sda
partition=/dev/sda1
mount_point=/mnt/data
Playbook:
- name: Format disks for storage before installing Longhorn
  hosts: storage_servers
  tags: longhorn
  tasks:
    - name: Partition the disk into a single large partition filling the disk
      become: yes
      parted:
        device: "{{ device }}"
        number: 1
        state: present
        part_start: "0%"
        part_end: "100%"
        unit: GB
    - name: Create filesystem on new partition
      become: yes
      filesystem:
        fstype: ext4
        dev: "{{ partition }}"
    - name: Mount drive on {{ mount_point }}
      become: yes
      mount:
        src: "{{ partition }}"
        path: "{{ mount_point }}"
        fstype: ext4
        state: mounted
        boot: yes
        backup: yes

- name: Prepare agents for Longhorn
  hosts: k3s_servers, k3s_agents
  tags: longhorn
  tasks:
    - name: Install necessary packages required by Longhorn
      become: yes
      apt:
        pkg:
          - jq
          - open-iscsi
        state: present
    - name: Start the systemd service for iscsid if it is not started
      become: yes
      systemd:
        name: iscsid
        enabled: yes
        state: started

- name: Verify prerequisites for Longhorn and label storage nodes
  hosts: k3s_servers
  tags: longhorn
  tasks:
    - name: Run the preconfig script
      become: yes
      shell: curl -sSfL https://raw.githubusercontent.com/longhorn/longhorn/v1.2.4/scripts/environment_check.sh | bash
      register: check
    - name: Print output from check
      debug: msg="{{ check.stdout }}"
    # Label the nodes to ensure Longhorn volumes are only placed on disks of the storage nodes:
    # https://longhorn.io/kb/tip-only-use-storage-on-a-set-of-nodes/#tell-longhorn-to-create-a-default-disk-on-a-specific-set-of-nodes
    - name: Apply label "node.longhorn.io/create-default-disk=true" on storage nodes in Kubernetes
      become: yes
      command: kubectl label nodes --overwrite {{ hostvars[item].node_name }} node.longhorn.io/create-default-disk=true
      loop: "{{ hostvars[inventory_hostname]['groups']['storage_servers'] }}"
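Running the playbook against the inventory is a single command. A minimal sketch, assuming the files are saved as inventory and storage.yaml (both file names are my own placeholders, not from the repo):

ansible-playbook -i inventory storage.yaml --tags longhorn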
The tasks running the preconfig script and dumping the output take care of verifying the prerequisites. Piping curl into bash as root is not recommended on anything that’s important, so use at your own risk.
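If piping into bash feels too risky, a slightly more cautious variant is to download the script first and run it as a separate step. A sketch of the two tasks, with /tmp/environment_check.sh as an arbitrary path of my choosing:

    - name: Download the Longhorn environment check script
      become: yes
      get_url:
        url: https://raw.githubusercontent.com/longhorn/longhorn/v1.2.4/scripts/environment_check.sh
        dest: /tmp/environment_check.sh
        mode: "0755"
    - name: Run the downloaded check script
      become: yes
      command: /tmp/environment_check.sh
      register: check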
Creating backup target
I didn’t add this initially, but having a backup target in Longhorn makes sense not only in terms of data protection, but also for cluster upgrades.
There are two choices when considering backup targets for Longhorn: NFS or S3. I don’t have a NAS or a MinIO instance running standalone from my cluster, so I opted for using Object Storage from Linode as a remote backup target. Any solution compatible with S3 or NFS, whether it is self-hosted or cloud-based, will do.
Once a bucket and access keys have been created, the following must be configured:
- The Kubernetes secret holding the URL, access key and access secret of the bucket
- The S3-formatted URL of the bucket
In my case I’m deploying Longhorn into the namespace longhorn-system. Longhorn expects a secret in the following format (all items are base64-encoded as with any secret, but shown in cleartext for clarity):
secret.yaml:
apiVersion: v1
kind: Secret
metadata:
  name: longhorn-linode-s3-backup-credentials
  namespace: longhorn-system
type: Opaque
data:
  AWS_ENDPOINTS: "<s3-endpoint-excluding-bucket-name>" # https://<linode-region>.linodeobjects.com
  AWS_ACCESS_KEY_ID: "<access-key>" # Access Key connected to given bucket in Linode
  AWS_SECRET_ACCESS_KEY: "<access-secret>" # Access Secret connected to given bucket in Linode
AWS_ENDPOINTS does not include the name of the bucket. That is defined in the S3-formatted URL.
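As an alternative to writing the manifest and base64-encoding the values by hand, the same secret can be created imperatively. A sketch with placeholder values (the endpoint follows the Linode format from the comment above):

# Assumes the namespace already exists: kubectl create namespace longhorn-system
kubectl -n longhorn-system create secret generic longhorn-linode-s3-backup-credentials \
  --from-literal=AWS_ENDPOINTS=https://<linode-region>.linodeobjects.com \
  --from-literal=AWS_ACCESS_KEY_ID=<access-key> \
  --from-literal=AWS_SECRET_ACCESS_KEY=<access-secret>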
Once the secret is created, the last thing required for backups is formatting the URL of the S3-compatible bucket. This is somewhat confusing and not explained well enough in the docs, but the gist of it is to only include the name of the bucket and a placeholder for a region if you are not using S3 from AWS. This means that for any other alternative, the URL must have a region as a placeholder. The format of the URL is s3://<bucket-name>@<placeholder-region>/. The / at the end is required to have all backups placed at the path /backupstore. Append more after the / to change the root path.
Thanks to Citrullin for pointing out the formatting of the URL.
To give an example of formatting the S3 URL: say I have a bucket named a hosted on us-east-1 in Linode. It does not matter what is added as the region, but to keep things consistent I’ll use us-east-1 in the URL of the Backup Target. The final URL becomes s3://a@us-east-1/, where us-east-1 is just a placeholder.
These are the two pieces of information to keep for the installation process later:
- The name of the secret created
- The S3-formatted URL (s3://<bucket-name>@<placeholder-region>/)
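Before handing these to Longhorn, it can be worth sanity-checking that the keys actually work against the Linode endpoint. One way to do that, assuming the aws CLI happens to be installed and using the example bucket a from above (the endpoint here is the real Linode endpoint, not the placeholder region):

AWS_ACCESS_KEY_ID=<access-key> AWS_SECRET_ACCESS_KEY=<access-secret> \
  aws s3 ls s3://a --endpoint-url https://us-east-1.linodeobjects.com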
Installing Longhorn
The installation process is fairly straightforward with Helm. The entire process is captured in a script named install.sh displayed below. This is where the S3-formatted URL and the name of the Kubernetes secret from earlier are used. The values are added as defaultSettings.backupTarget and defaultSettings.backupTargetCredentialSecret respectively.
Note that in 1.2.4, keys defined under defaultSettings in YAML cannot be changed retroactively after the initial installation; the settings won’t take effect and must be modified in the UI or via a full reinstall of Longhorn. This has been fixed in version 1.3.0.
install.sh:
#!/bin/bash
# Installs Longhorn into the cluster
# for storage.
helm repo add longhorn https://charts.longhorn.io
helm repo update
helm upgrade --install \
--version 1.2.4 \
--namespace longhorn-system \
longhorn longhorn/longhorn \
-f values.yaml \
--set-string defaultSettings.backupTarget="<s3-formatted-url-for-backups>" \
--set-string defaultSettings.backupTargetCredentialSecret="<name-of-the-secret-with-credentials>"
# Remove the default flag from the local-provisioner
# storage class which K3s adds as part of a standard
# installation since longhorn will be the default storage
# class after the installation.
kubectl patch storageclass local-path -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'
values.yaml:
# Descriptions of the values can be found here:
# https://github.com/longhorn/charts/blob/longhorn-1.2.4/charts/longhorn/questions.yml

# This is because I want to expose the
# UI of Longhorn outside the cluster
# with an IP address. Optional.
service:
  ui:
    type: LoadBalancer

persistence:
  # Defaults to 3. Since there are only 2 nodes
  # in the cluster dedicated to storage,
  # this must be set to 2.
  defaultClassReplicaCount: 2

# Only important if you run Prometheus
# and want to scrape metrics from Longhorn
# and display them with something like Grafana
# in a dashboard.
annotations:
  prometheus.io/scrape: "true"

defaultSettings:
  # Mountpoint of the filesystem on the two Raspberry Pis
  # dedicated to storage.
  defaultDataPath: /mnt/data
  taintToleration: node-role.kubernetes.io/master:NoSchedule
  # Ensure that only specifically labeled nodes are used for storing
  # Longhorn volumes: https://longhorn.io/kb/tip-only-use-storage-on-a-set-of-nodes/#tell-longhorn-to-create-a-default-disk-on-a-specific-set-of-nodes
  createDefaultDiskLabeledNodes: true
  storageMinimalAvailablePercentage: 10
  # Ensure that volumes created from the UI
  # of Longhorn have 2 replicas by default,
  # for the same reason persistence.defaultClassReplicaCount
  # is set to 2 instead of 3 replicas.
  defaultReplicaCount: 2

longhornManager:
  # Let the control plane nodes with
  # disks mounted be available for
  # scheduling.
  tolerations:
    - key: ""
      operator: "Exists"
      effect: "NoSchedule"

longhornDriver:
  # Let the control plane nodes with
  # disks mounted be available for
  # scheduling.
  tolerations:
    - key: ""
      operator: "Exists"
      effect: "NoSchedule"
Notable values are persistence.defaultClassReplicaCount, defaultSettings.defaultReplicaCount and defaultSettings.createDefaultDiskLabeledNodes. By default Longhorn wants 3 replicas of volumes. With only 2 dedicated storage nodes, the installation is instructed to make only 2 replicas.
Both storage nodes were labeled node.longhorn.io/create-default-disk=true as part of the Ansible playbook. Adding that specific label in combination with the setting defaultSettings.createDefaultDiskLabeledNodes=true ensures replicas are only scheduled to nodes containing that specific label and excludes the rest of the nodes. This is important since by default any node in the cluster is considered to be eligible as a storage node.
At the end of the script there is a patching of the StorageClass for local-path:
kubectl patch storageclass local-path -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'
The patching of the StorageClass is only necessary since K3s ships with its own default StorageClass local-path. By patching the StorageClass to no longer be the default, the Longhorn StorageClass takes over as the default after the installation:
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
local-path rancher.io/local-path Delete WaitForFirstConsumer false 42h
longhorn (default) driver.longhorn.io Delete Immediate true 40h
After the installation, any PersistentVolume created in the cluster is by default a Longhorn volume.
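As a quick illustration, a minimal PersistentVolumeClaim like the following (the name and size are arbitrary examples) would now be provisioned by Longhorn, since no StorageClass is specified and longhorn is the default:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi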
With Longhorn installed, it’s time to set up two RecurringJobs, which are basically cron jobs for taking snapshots and backups of Longhorn volumes, with the backups being pushed to the Backup Target. Adding the group default ensures that all volumes not involved in another recurring job are automatically included by the jobs. Snapshots and backups are then taken care of by default for all volumes.
recurring_snapshot_job.yaml (snapshot volumes every 12 hours):
apiVersion: longhorn.io/v1beta1
kind: RecurringJob
metadata:
  name: 12-hour-snapshots
  namespace: longhorn-system
spec:
  cron: "0 */12 * * *"
  task: "snapshot"
  groups:
    - default
  retain: 3
  concurrency: 2
  labels:
    job: 12-hour-snapshots
recurring_backup_job.yaml (backup volumes at 00:00 every day):
apiVersion: longhorn.io/v1beta1
kind: RecurringJob
metadata:
  name: nightly-backups
  namespace: longhorn-system
spec:
  cron: "0 0 * * *"
  task: "backup"
  groups:
    - default
  retain: 3
  concurrency: 2
  labels:
    job: nightly-backups
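Both RecurringJobs are just custom resources, so they are applied like anything else; for example (file names as above):

kubectl apply -f recurring_snapshot_job.yaml -f recurring_backup_job.yaml
kubectl -n longhorn-system get recurringjobs.longhorn.io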
Summary of what has been done up until now:
- Longhorn is installed and takes care of provisioning Longhorn volumes for any PersistentVolume using the default StorageClass in the cluster.
- All Longhorn volumes provisioned are replicated across the two storage nodes and can be attached to any node in the Kubernetes cluster.
- All volumes created in Longhorn have snapshots and backups taken, which are then pushed to an S3 bucket at regular intervals.
The last point is important, because it is the foundation of the upgrade strategy explained in the next section.
Cluster upgrades
Rather than performing in-place upgrades of the Kubernetes cluster as new versions are released, it should be possible to treat the cluster like immutable infrastructure that can be destroyed and recreated on demand. The cluster state and the state of all workloads must be possible to reproduce after a cluster upgrade using configuration as code, scripts and restored volumes. Given that the Pis themselves are physical, I want to be able to uninstall everything, delete the disk partitions and reflash the SD-cards.
Having a remote backup of volumes used by workloads solves the issue of data migration for this upgrade strategy. An installation of Longhorn which points to the same backup target after a reinstall has access to all the backups of all volumes from the previous cluster state. PersistentVolumes and PersistentVolumeClaims can be created from those backups. As long as workloads use consistent naming of PersistentVolumeClaims they can bind to claims with the same name.
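To make the naming idea concrete, here is a hand-written sketch of the kind of PersistentVolume/PersistentVolumeClaim pair involved, pointing at a volume restored from backup (names, namespace, and size are placeholders; in practice the Longhorn UI can generate these for a restored volume):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: restored-app-data
spec:
  capacity:
    storage: 2Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: longhorn
  csi:
    driver: driver.longhorn.io
    fsType: ext4
    # Name of the Longhorn volume restored from backup
    volumeHandle: restored-app-data
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  # Must match the claim name the workload expects
  name: app-data
  namespace: example
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 2Gi
  volumeName: restored-app-data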
This upgrade process relies on all workloads in the cluster being defined as code: YAML configuration and scripts that are checked into version control.
The steps of the upgrade process then become:
- Backup Longhorn volumes if last backup taken is too old (optional)
- Delete all non-system namespaces (depends on how graceful the uninstall should be. Can be skipped)
- Remove all cluster configuration and uninstall all components using an Ansible playbook, including the Longhorn configuration, which also deletes the disk partitions
- Change the Ansible playbook used to install K3s: bump the channel and version of Kubernetes, then run the playbook, which installs K3s and prepares the storage nodes for Longhorn
- Reinstall necessary system components required for Longhorn (only really needed to access the Longhorn UI from outside the cluster; can be skipped if ClusterIP and port forwarding is used):
- MetalLB
- Reinstall Longhorn and point it to the same backup target that the previous installation was using
- Restore all volumes back into the cluster using the Longhorn UI
- Recreate all namespaces in the cluster used by workloads
- Create PersistentVolumes and PersistentVolumeClaims for each volume in Longhorn and deploy PersistentVolumeClaims into the correct namespaces using the Longhorn UI. Ensure naming is consistent with the workload YAML
- Deploy all workloads to the cluster. All workloads should bind to the previously created PersistentVolumeClaims and spin up with the data from the backup
Several of these steps are redundant if Velero is used. There is documentation on how to make use of Velero as part of a backup and restore strategy with Longhorn.
The upgrade process described above is how I upgraded my homelab cluster from 1.21.7 to 1.22.12.
Conclusion
I find Longhorn to be a fairly straightforward solution for persistent storage. There is some setup required, but I have yet to see anything listed in the docs I couldn’t automate using Ansible or Helm. The backup feature combined with RecurringJobs is a good utility for data protection, but I mainly see it as something simplifying the upgrade process of the cluster in a true “cattle not pets” fashion.
There are some things that I think will improve with better documentation and more iteration on the solution. I don’t like that finding the correct way to format URLs for an S3-compatible service not on AWS involves digging through issues and discussions on GitHub. The placeholder region that is required as part of the S3-formatted URL, even though it isn’t actually used, is not very intuitive. But all things considered, these are small things to point out about an otherwise good solution for persistent storage in Kubernetes.