A view into my homelab and some notes on running a small dynamic cloud on three Raspberry Pis, three mini PCs, and an old Dell laptop.
I wanted a homelab where I could play around and break things without worrying about the time it takes to set everything back up. I had three Minisforum MS-01s, a Dell Inspiron laptop that was collecting dust, and a couple of Raspberry Pis accumulated for various automation and AI tasks (with the AI HAT+). The three Minisforums were enough to build a cluster for most purposes, but that meant flashing an OS onto three machines every time I started over. So, inspired by production private cloud clusters, I decided to make most of my compute accessible on demand, provisioned dynamically through PXE. Currently, I am running a Canonical OpenStack cluster in my homelab.
My homelab hits a chicken-and-egg problem: I want to manage machines with a tool like MAAS (Metal as a Service), but MAAS has to run somewhere. I want to monitor and update my machines using Landscape, but the Landscape server must also run somewhere. If every “somewhere” depends on its own bootstrap, I end up spending my time reconfiguring services that should persist across tests and projects.
My answer is a three-tier chain. The only machines I have ever hand-installed are three Raspberry Pi 5s (I added another one so I can cluster them). Everything else, including the machine that deploys OpenStack, gets PXE-booted and provisioned from those Pis. If I wiped the MS-01s and the Dell tomorrow, I could bring the whole lab back up without plugging in a USB stick.

The three Pi 5s run Ubuntu Core with MicroCloud. Ubuntu Core gives me a transactional, all-snap base that is hard to accidentally break. MicroCloud gives me a clustered LXD environment across the three of them without the overhead of full VMs. On top of that cluster, I run MAAS, Landscape Server, an LXD container running Docker (yes, nested containerization), and a few other services I want to keep constant.
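Standing up that base layer is only a handful of commands. A minimal sketch, assuming default snap channels and that MAAS lives in a system container on the cluster (image and container names are illustrative):

```shell
# On each Pi: install the snaps that MicroCloud clusters together
sudo snap install lxd microceph microovn microcloud

# On one Pi: interactively discover the other nodes and form the cluster
sudo microcloud init

# With the cluster up, launch system containers for the persistent services
lxc launch ubuntu:24.04 maas
lxc exec maas -- snap install maas
```

The `microcloud init` step handles peer discovery, MicroCeph storage, and MicroOVN networking in one pass, which is most of why this tier is hard to break.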
The reason MAAS runs on Pis rather than on something bigger is pragmatic. MAAS is not a high-frequency workload, and I wanted the provisioner to run on the machines that consume the least power and have the fewest moving parts, since it stays idle most of the time. Three Pis at the top of the rack, pulling a handful of watts each, is a much nicer failure surface than a single x86 box that has to stay up, or nothing else can boot. If one Pi goes down, the cluster remains functional.
Once MAAS is up, it can PXE-boot and commission an x86 node, hand it an Ubuntu image, and register it with Juju. The first machine I did that on is the Dell Inspiron. It is nothing special, just an old laptop that was collecting dust, but it is an efficient machine with enough compute for a specific job: it runs a Juju controller and a small governor VM. For the current OpenStack deployment, it is the right home for services that are needed to build the cloud but sit outside the cloud itself.
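The handoff from MAAS to Juju looks roughly like this; the MAAS URL, profile name, and machine tag are assumptions for illustration:

```shell
# Log in to the MAAS CLI and check what has been commissioned
maas login admin http://10.0.0.2:5240/MAAS/api/2.0/ "$MAAS_API_KEY"
maas admin machines read | jq -r '.[] | [.hostname, .status_name] | @tsv'

# Register MAAS as a cloud in Juju (interactive prompts), then bootstrap
# a controller onto a machine MAAS has tagged "controller" (the Dell)
juju add-cloud maas-cloud
juju add-credential maas-cloud
juju bootstrap maas-cloud --constraints "tags=controller"
```

From that point on, Juju asks MAAS for machines instead of me touching them by hand.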
The three Minisforum MS-01s are the real workhorse tier. Each one has a mobile Core i9-13900H with 14 cores (20 threads), 64GB of RAM, dual 1TB NVMe drives, and, critically, an SFP+ port. They run Canonical OpenStack in a hyperconverged topology: control, compute, and storage on all nodes.
Three nodes are the minimum for a Ceph cluster that can tolerate a single-node failure, and it is also the minimum for any kind of HA control plane.
The MS-01 was the right box for this specifically because of its SFP+ ports and two 2.5G NICs. With those, I can put OAM and management traffic on the 2.5G NICs and VLANs, and Ceph replication on a 10G VLAN, without either interfering with the other. Additionally, since these machines support Intel AMT, I get remote KVM with no additional hardware, which is close to the experience of a production server with a dedicated BMC.
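On each node that separation is just a netplan layout. A sketch of the shape, where the interface names, VLAN IDs, and addresses are illustrative rather than my actual config:

```yaml
network:
  version: 2
  ethernets:
    enp2s0f0:            # 2.5G NIC: OAM/management, carries PXE untagged
      dhcp4: true
    enp87s0f0: {}        # SFP+ 10G port
  vlans:
    ceph-rep:            # Ceph replication, kept off the 2.5G links
      id: 20
      link: enp87s0f0
      addresses: [10.0.20.11/24]
    user-net:            # regular user/VM traffic
      id: 30
      link: enp2s0f0
```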
Most of what has gone wrong in this lab has been at layer 2. VLANs sound simple on paper until you are debugging why a node commissioning in MAAS sees DHCP traffic but never gets an IP, and the answer turns out to be that your switch is dropping tagged frames on an access port because of a rogue config change made a while ago. I split the traffic into separate logical networks: OAM and management, which also carries PXE, and Ceph replication on its own VLAN over the SFP+ uplinks. Regular user traffic gets its own segmentation on top of it, so I don’t get people in the house coming to me saying that I took the internet down.
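When that happens, watching the wire settles the argument quickly. This is the check I reach for (the interface name is illustrative):

```shell
# Print link-level headers (-e) so 802.1Q tags are visible, and watch
# for DHCP traffic on the commissioning interface
sudo tcpdump -ni enp2s0f0 -e 'port 67 or port 68'

# If the node's DHCPDISCOVER shows up tagged on what should be an
# access port, the switch config is the problem, not MAAS
```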
The L2/L3 boundary is where I have burned the most hours. When commissioning fails, or when the Ceph nodes decide to route replication traffic through the gateway instead of talking to each other directly (latency goes from 0.4 ms to 7 ms, by the way), the answer is invariably the switch, a forgotten trunk config, or a misconfigured networking feature.
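The quickest way to catch the Ceph case is to ask the kernel for its routing decision; the addresses and device names here are illustrative:

```shell
# Which path would this node take to a Ceph peer?
ip route get 10.0.20.12
# healthy: "10.0.20.12 dev ceph-rep src 10.0.20.11"   (direct, same L2)
# broken:  "10.0.20.12 via 10.0.0.1 dev enp2s0f0"     (hairpin via gateway)
```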
Two items in the rack are not part of the dynamic chain but are essential for daily use, forming what I jokingly call my “production” homelab. It is all safely hidden in a closet.

The first is a self-built NAS with 12 TB of storage. It runs separately from anything else and serves personal files: media, backups, and the long tail of things I do not want tied to my cluster, which I wipe every month or so. I deliberately did not fold it into Ceph because its failure domain is different. I want my data on something that survives a full homelab redeployment, not something whose uptime depends on whether I am currently messing with OpenStack.
The second tier is AI/ML: an NVIDIA Jetson Nano and an NVIDIA DGX Spark. The DGX Spark runs MicroK8s with the GPU Operator and time-slicing enabled, and it mounts the NAS over NFS, so my fine-tuning experiments have somewhere to land. These devices are less isolated from my homelab than my NAS. I occasionally extend the Kubernetes running on here into my cluster to run things like Kubeflow and offload to the Spark as a GPU-tainted node.
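Time-slicing in the GPU Operator is driven by a small ConfigMap. A sketch following the operator's documented format, with a replica count picked purely for illustration:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config
  namespace: gpu-operator-resources
data:
  any: |-
    version: v1
    sharing:
      timeSlicing:
        resources:
          - name: nvidia.com/gpu
            replicas: 4    # advertise each physical GPU as 4 schedulable GPUs
```

Pointing the operator's device plugin at this config lets several small fine-tuning jobs share the one GPU instead of queueing behind each other.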
I wrote about my DGX Spark setup in a separate post: Kubernetes on the NVIDIA DGX Spark.
The Jetson is along for a future project involving a small ground robot and, eventually, a drone. That is a while away, and it is not doing anything useful today beyond sitting on the shelf, reminding me that it exists. It used to be part of a local-AI smart home project, but it broke, and I never got around to fixing it.

I do not run any load-bearing workloads in this OpenStack. There are a handful of VMs that keep the infrastructure itself healthy: Landscape, some patch management components, COS (Canonical Observability Stack) for dashboards and monitoring, and some databases (PostgreSQL, MySQL) for testing. Calling this cluster “production” would require three things I do not have yet: real workloads I actually depend on, enough hardware for proper HA at every tier, and a push towards a hybrid architecture (ARM and amd64).
That framing is also why I am happy to keep the lab in its current split state. The interesting work is not in making this cluster bigger. It is in making the existing pieces fit together better, and in pushing some of what I learn back into the projects I am building on.
The whole rack idles at around 200 watts. That is roughly the draw of a single gaming PC, and it covers three OpenStack nodes, a MicroCloud cluster, a Juju controller, and the beginnings of a GPU tier. I think about that number a lot when deciding whether to add more hardware. Most of the time, the answer is that I can learn more by changing how the existing machines are wired together than by buying another one.
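The arithmetic behind that number is worth making explicit; the $0.15/kWh rate is an assumption for illustration, not my actual tariff:

```shell
# 200 W around the clock, converted to annual energy and cost
awk 'BEGIN {
  kwh = 200 * 24 * 365 / 1000               # watt-hours -> kWh over a year
  printf "%.0f kWh/year, ~$%.0f at $0.15/kWh\n", kwh, kwh * 0.15
}'
# prints "1752 kWh/year, ~$263 at $0.15/kWh"
```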
