Recreating the Raspberry Pi homelab with Kubernetes
Edit (26.08.2021): The initial playbooks did not include disabling/enabling of swap as part of the setup/teardown. I’ve updated the Ansible playbooks to reflect these changes.
I first started thinking about setting up a Kubernetes cluster on a few Raspberry Pis last year. At the time I had some issues related to the setup of K3s and MetalLB, which proved to be more difficult than I had thought initially. I ended up abandoning the project in favor of a homelab based only on Docker for running applications and services. This is how I self-hosted Pi-Hole.
I decided some time ago to restart the project of setting up Kubernetes on Raspberry Pi with K3s, while also investing a bit more effort into the setup. More specifically, I wanted it to be more fault-tolerant than the previous one. As I discovered when running Pi-Hole on a single host: sometimes you’ll have issues related to SD-Cards or memory usage at the worst possible time. This is especially annoying when the Pi also serves as your primary DNS server.
The Build
Part of the motivation behind this project was to tidy up the current setup. Previously I had 1x Raspberry Pi 4B and 2x Raspberry Pi 3B+, each hooked up to its own power outlet and connected to the network over Wi-Fi. To improve the setup I wanted all Pis colocated in a rack tower, with only wired connections to the network.
Last time I tried running K3s I kept having reliability issues with larger workloads. While the server node (4B) seemed to manage fine, the agents (3B+) struggled much more when running certain monitoring and logging stacks. To avoid this issue, I purchased another 4B to increase the total throughput of the cluster. One of the 4Bs will continue as the designated server, while the other will be an agent. Overall this should increase performance quite a bit. If the 3B+ agents continue to struggle, I can look into having the scheduler give them lighter loads, or replace them with 4B agents.
For the networking part, my router was one Ethernet port short of being able to connect all 4x Raspberry Pis. I ended up purchasing an 8-port switch to fix the issue, which also gives me some room for adding more devices in the future.
The final list of parts in the cluster ended up being the following (including devices I already had, and new additions):
- S2Pi ZP-0088 Rack Tower
- TP-Link TL-SG108E
- 2x Raspberry Pi 4B
- 2x Raspberry Pi 3B+
- 4x Cat6 Ethernet cables, 0.5 meters
- 4x SD-Cards
- Power adapters
The Network
The physical network setup is fairly straightforward:
Router
|
V
Switch
|
V
Raspberry Pi [1:4]
The Raspberry Pis are all set up with DHCP reservations in the router, which also gives each Pi a DNS A record. Another option would have been to go with static IP addresses; I’ll explore that option if I see a need for it.
I considered creating a VLAN in the router or the switch but decided against it to keep things simple for now.
The Cluster
I want the setup of the cluster structured as code. If an SD-card breaks, hardware fails, or something else should happen, I want to know that I can recreate parts of the cluster or the entire cluster by flashing a few SD-cards and running some commands.
There are several pre-configured options for setting up a Kubernetes cluster. I decided to stick with K3s since it is lightweight and I already had some experience with it. I considered using k3sup since it has a dedicated guide for setting up K3s on Raspberry Pi. A drawback for me was that I would still have to manually modify kernel settings each time I flashed a new SD-card. I could have opted for a shell script to make the modifications, but then I would also have to make the script idempotent. In the end I opted to automate everything by writing my own Ansible playbooks.
Creating the Ansible playbook to set up the cluster was one of the more time-consuming parts of the project, especially the bit related to configuring the kernel settings in /boot/cmdline.txt. I chose not to include removal of those same kernel settings in the playbook for uninstalling K3s, since most of my projects run as containers anyway.
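For reference, /boot/cmdline.txt on Raspberry Pi OS is a single line of kernel parameters, and after the playbook has run it should end with the two cgroup flags. An illustrative example (the root PARTUUID is a placeholder and will differ per card, and everything stays on one line):
console=serial0,115200 console=tty1 root=PARTUUID=xxxxxxxx-02 rootfstype=ext4 fsck.repair=yes rootwait cgroup_memory=1 cgroup_enable=memory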
Ansible playbooks
Playbook which sets up the cluster:
# site.yml
---
- name: Setup baseline for all k3s nodes
  hosts: k3s_servers, k3s_agents
  vars:
    boot_file: /boot/cmdline.txt
  tasks:
    - name: Check if cgroups are set correctly
      become: yes
      shell: grep -c 'cgroup_memory=1 cgroup_enable=memory' '{{ boot_file }}' | cat
      register: cmdline
      ignore_errors: yes
    - name: Ensure cgroups are correctly set by updating raspberry pi
      become: yes
      ansible.builtin.lineinfile:
        state: present
        backrefs: yes
        path: '{{ boot_file }}'
        regexp: '(.+)(?!cgroup_memory=1 cgroup_enable=memory)'
        line: '\1 cgroup_memory=1 cgroup_enable=memory'
      when: cmdline.stdout == "0"
    - name: Disable swap with dphys-swapfile
      become: yes
      shell: dphys-swapfile swapoff && dphys-swapfile uninstall && update-rc.d dphys-swapfile remove
    - name: Disable dphys-swapfile service
      become: yes
      systemd:
        name: dphys-swapfile
        enabled: no
      register: swapfile_service
    - name: Reboot host if system settings were updated
      become: yes
      ansible.builtin.reboot:
        reboot_timeout: 3600
      when: cmdline.stdout == "0" or swapfile_service.changed
- name: Setup k3s servers
  hosts: k3s_servers
  tasks:
    - name: Check if k3s is already installed
      ansible.builtin.stat:
        path: /usr/local/bin/k3s
      register: k3s
    - name: Install k3s on server
      become: yes
      shell: curl -sfL https://get.k3s.io | sh -
      environment:
        K3S_NODE_NAME: "{{ inventory_hostname }}"
        INSTALL_K3S_EXEC: "--disable servicelb"
      when: not k3s.stat.exists
    - name: Get node join token
      become: yes
      ansible.builtin.fetch:
        src: /var/lib/rancher/k3s/server/token
        dest: 'node_join_token'
        flat: yes
- name: Setup k3s agents
  hosts: k3s_agents
  tasks:
    - name: Check if k3s is already installed
      ansible.builtin.stat:
        path: /usr/local/bin/k3s
      register: k3s
    - name: Extract k3s server node token from control node
      local_action:
        module: shell
        cmd: cat node_join_token
      register: node_join_token
    - name: Install k3s on agent
      become: yes
      shell: curl -sfL https://get.k3s.io | sh -
      environment:
        K3S_TOKEN: "{{ node_join_token.stdout }}"
        # Select the first host in the group of k3s servers as the server for the agent
        K3S_URL: "https://{{ groups['k3s_servers'] | first }}:6443"
        K3S_NODE_NAME: "{{ inventory_hostname }}"
      when: not k3s.stat.exists
- name: Wait for all nodes to complete their registration
  hosts: k3s_servers
  vars:
    total_amount_of_nodes: "{{ groups['k3s_servers'] | count + groups['k3s_agents'] | count }}"
  tasks:
    - name: Wait until all agents are registered
      become: yes
      shell: k3s kubectl get nodes --no-headers | wc -l
      register: agents
      until: agents.stdout | int == total_amount_of_nodes | int
      retries: 10
      delay: 10
    - name: Copy kubectl config from server to temp .kube directory on control node
      become: yes
      ansible.builtin.fetch:
        src: /etc/rancher/k3s/k3s.yaml
        dest: .kube/k3s-config
        # The kubeconfig should be identical for all servers
        flat: yes
- name: Setup kubectl on control node with new context
  hosts: localhost
  tasks:
    - name: Create $HOME/.kube directory if not present
      ansible.builtin.file:
        path: $HOME/.kube
        state: directory
    - name: Replace the server reference in k3s kube config with IP of a server node
      ansible.builtin.replace:
        path: .kube/k3s-config
        regexp: '127\.0\.0\.1'
        replace: "{{ groups['k3s_servers'] | first }}"
        backup: yes
    - name: Copy k3s kube config to $HOME/.kube directory
      ansible.builtin.copy:
        src: .kube/k3s-config
        dest: $HOME/.kube
        mode: '600'
    - name: Remove node join token
      ansible.builtin.file:
        path: node_join_token
        state: absent
    - name: Remove temporary kube config directory
      ansible.builtin.file:
        path: .kube
        state: absent
Playbook for deleting the cluster, without reverting the kernel settings:
# teardown.yml
---
- name: Reset system configuration
  hosts: k3s_servers, k3s_agents
  tasks:
    - name: Enable swap with dphys-swapfile
      become: yes
      shell: dphys-swapfile setup && dphys-swapfile swapon && update-rc.d dphys-swapfile start
    - name: Enable dphys-swapfile service
      become: yes
      systemd:
        name: dphys-swapfile
        enabled: yes
        state: started
      register: swapfile_service
    - name: Reboot host if system settings were updated
      become: yes
      ansible.builtin.reboot:
        reboot_timeout: 3600
      when: swapfile_service.changed
- name: Uninstall k3s agents
  hosts: k3s_agents
  tasks:
    - name: Check if script for uninstalling k3s agent is present
      stat:
        path: /usr/local/bin/k3s-agent-uninstall.sh
      register: k3s_present
    - name: Uninstall k3s from agent
      shell: /usr/local/bin/k3s-agent-uninstall.sh
      when: k3s_present.stat.exists
- name: Uninstall k3s servers
  hosts: k3s_servers
  tasks:
    - name: Check if script for uninstalling k3s server is present
      stat:
        path: /usr/local/bin/k3s-uninstall.sh
      register: k3s_present
    - name: Uninstall k3s from server
      shell: /usr/local/bin/k3s-uninstall.sh
      when: k3s_present.stat.exists
Inventory:
[k3s_servers]
rpi4b1
[k3s_agents]
rpi4b2
rpi3b1
rpi3b2
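With the inventory and playbooks in place, creating or tearing down the cluster is a single command from the control node. A sketch, assuming the inventory above is saved as a file named hosts.ini and SSH access to the Pis is already configured:
# Create the cluster
ansible-playbook -i hosts.ini site.yml
# Verify from the control node once it finishes, using the kubeconfig the playbook copied
kubectl --kubeconfig ~/.kube/k3s-config get nodes -o wide
# Tear the cluster down again
ansible-playbook -i hosts.ini teardown.yml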
The only manual process left in the setup is flashing the SD-cards and adding configuration in the boot partition to enable the SSH server.
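On Raspberry Pi OS this only requires an empty file named ssh in the boot partition of the freshly flashed card, for example (the mount point is an assumption and depends on your machine):
touch /media/$USER/boot/ssh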
With the setup of the cluster itself automated, it was time to configure it.
Cluster configuration
MetalLB
In order to run apps such as Pi-Hole, I need a way to ensure that a Kubernetes Service of type LoadBalancer is exposed with a valid IP address on the network. By default K3s ships with a load balancer named Klipper Load Balancer, which according to the documentation works by reserving ports on the nodes. This means that Services are reached from outside the cluster via the nodes’ own IP addresses in combination with those ports, and once a certain port is reserved on all nodes, it can no longer be used for any new Service. Instead of using this approach, I would rather make use of the available address space in my network for Services.
Enter MetalLB, a network load balancer implementation for bare-metal Kubernetes clusters. MetalLB uses lower-level networking protocols to advertise each Service of type LoadBalancer. It is compatible with Flannel, which ships with K3s by default. Using MetalLB requires disabling the Klipper Load Balancer. There are two modes, Layer2 and BGP. Since my router does not support BGP, I opted for Layer2.
MetalLB needs a pool of IP addresses which it can allocate to Services. I selected a range of IP addresses outside the range used by the existing DHCP server in the router. I then created 2 pools in MetalLB: one default, and one for Pi-Hole that contains only a single IP address. The reason for locking down the IP address used for the Pi-Hole Service is to avoid having to reconfigure the static DNS servers in the router, even if the Service is recreated. With this setup, a “static” IP assignment can be configured using only pools and annotations in MetalLB (more on this in the section about installing Pi-Hole).
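To illustrate the mechanism outside of Helm, a hypothetical Service of type LoadBalancer only needs the MetalLB annotation to be handed an address from a named pool (this manifest is an example, not part of my setup):
apiVersion: v1
kind: Service
metadata:
  name: example-service
  annotations:
    # Ask MetalLB for an address from the "pihole" pool
    metallb.universe.tf/address-pool: pihole
spec:
  type: LoadBalancer
  selector:
    app: example
  ports:
    - port: 80
      targetPort: 8080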
To manage applications in the cluster I use Helm. There is an official MetalLB chart available.
The values.yaml for the MetalLB chart:
# values.yaml
configInline:
  address-pools:
    - name: default
      protocol: layer2
      addresses:
        - 192.168.1.154-192.168.1.254
    # Ensures there is only ever a single IP address
    # that can be given to the pihole service
    - name: pihole
      protocol: layer2
      addresses:
        - 192.168.1.153-192.168.1.153
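If the MetalLB chart repository has not been added to Helm yet, that is a one-time step. I believe the repository lives at the URL below, but verify it against the MetalLB documentation:
helm repo add metallb https://metallb.github.io/metallb
helm repo update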
Installation of MetalLB: helm upgrade --install metallb metallb/metallb -f values.yaml.
Installing Pi-Hole
The final step to replicate the old homelab is to install Pi-Hole. I did this using the chart from Mojo2600.
The annotation metallb.universe.tf/allow-shared-ip ensures that both UDP and TCP communication is colocated on the same Service IP address.
The values.yaml for the chart:
replicaCount: 2
dnsmasq:
  customDnsEntries:
    # Add custom DNS records in
    # dnsmasq-installation of Pi-Hole
    - address=/pihole.local/192.168.1.153
persistentVolumeClaim:
  enabled: false
serviceWeb:
  # The static LoadBalancer IP address for serviceWeb and
  # serviceDns does not have to be set, since the pool "pihole"
  # in metallb will only contain a single IP address that can
  # be allocated when using the address-pool "pihole".
  annotations:
    # Ensures that the pihole receives IP address from
    # predefined pool in metallb
    metallb.universe.tf/address-pool: pihole
    # This ensures that port 53 for TCP and UDP is colocated
    # on the same IP address.
    metallb.universe.tf/allow-shared-ip: pihole-svc
  type: LoadBalancer
serviceDns:
  annotations:
    # Ensures that the pihole receives IP address from
    # predefined pool in metallb
    metallb.universe.tf/address-pool: pihole
    # This ensures that port 53 for TCP and UDP is colocated
    # on the same IP address.
    metallb.universe.tf/allow-shared-ip: pihole-svc
  type: LoadBalancer
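As with MetalLB, the chart repository needs to be added to Helm first. The URL below is what I believe Mojo2600 publishes the chart under; double-check it against the chart’s README:
helm repo add mojo2600 https://mojo2600.github.io/pihole-kubernetes/
helm repo update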
Installation of Pi-Hole: helm upgrade --install pihole mojo2600/pihole -f values.yaml --set adminPassword=<pihole-admin-password>.
Wrapping up
At this point I have a 4-node cluster colocated in a rack tower with only wired networking. MetalLB provides the cluster with a network load balancer that hands out IP addresses from the local network’s address space. Pi-Hole is installed and serves as the primary DNS server in the router, blocking ads across the network.
The entire setup is checked into code using Ansible and Helm. In the event of a single node failure or a cluster-wide failure, I can recreate almost everything by flashing some SD-cards and running a few commands. I say almost because nothing in this setup takes storage and recovery of data into account. This is something I’ll have to look into once I start deploying workloads that rely on data persistence.
Although this is a work in progress, it serves as a good baseline for a new homelab.
Resources
While doing research for this project, I came across several well-written articles that deserve a mention: