gnode: IaC for a single-node Kubernetes cluster on Azure

2025-12-23

Motivation

A lot of what I know today about Software Engineering, Cloud Infrastructure, Kubernetes, Linux, etc. came from building my own physical homelab a few years ago. I bought a couple of Dell Optiplex workstations from Craigslist for $150 each and wired them up via Ethernet to my home router. Then I went through the whole process of setting them up from scratch.

Unfortunately, when I moved to San Francisco in May of 2025 I had to leave my homelab behind. I realized that beyond the learning aspect, having some persistent machines to host random stuff on is really convenient for side projects, and the cloud is really expensive. Fortunately, as a LinkedIn employee I get $150/mo in Azure credits, so I set out to see how good a homelab I could build on Azure for that $150/mo. Thus, gnode was born.

Design and Results

gnode is just two Terraform modules and a bash script. The infra module provisions the Azure VM, networking, and NSG and installs Kubernetes (k3s) on the node; the apps module deploys the Helm charts, including the kube-prometheus-stack monitoring stack, onto the cluster; and the bash script ties the two together.

To deploy the whole thing you just need to run the gnode.sh script. It prompts you for the required variables and secrets, then performs the terraform apply steps for each module.
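The script's prompts map directly onto Terraform input variables. As a purely hypothetical sketch (these names are my guesses, not the repo's actual variables), the inputs look something like:

    # Hypothetical input variables; the real names in the gnode repo may differ.
    variable "azure_subscription_id" {
      description = "Azure subscription to deploy the VM into"
      type        = string
    }

    variable "cloudflare_api_token" {
      description = "API token for the Cloudflare provider"
      type        = string
      sensitive   = true
    }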

The Terraform modules also create a local kubeconfig and add your local IP to the VM's NSG, so that as soon as the deployment is done you have access to the "cluster" from your local machine via standard kubectl. If you move around, you might need to manually add more IPs to the NSG rule for port 6443.
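For illustration, the port-6443 rule looks roughly like the following; the resource names and the IP are placeholders, not the repo's actual values:

    # Illustrative sketch: allow kubectl access (port 6443) from one trusted IP.
    resource "azurerm_network_security_rule" "kube_api_local" {
      name                        = "allow-kube-api-from-local"
      priority                    = 100
      direction                   = "Inbound"
      access                      = "Allow"
      protocol                    = "Tcp"
      source_port_range           = "*"
      destination_port_range      = "6443"
      source_address_prefix       = "203.0.113.7/32" # your current public IP
      destination_address_prefix  = "*"
      resource_group_name         = azurerm_resource_group.gnode.name
      network_security_group_name = azurerm_network_security_group.gnode.name
    }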

I also use Cloudflare as my domain registrar, mainly because it is simple to use, has a good Terraform provider, and makes it easy to add a proxy so that the server's IP is not easily visible.
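A minimal sketch of the proxied record with the cloudflare Terraform provider (the zone variable and the public-IP resource are assumptions on my part, and older provider versions call content value):

    # Proxied A record: Cloudflare answers DNS with its own edge IPs,
    # so the VM's public IP stays hidden. Names here are placeholders.
    resource "cloudflare_record" "gnode" {
      zone_id = var.cloudflare_zone_id
      name    = "gnode"
      type    = "A"
      content = azurerm_public_ip.gnode.ip_address
      proxied = true
    }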

Why Kubernetes

A lot of people will probably wonder why Kubernetes is worth the trouble if it's only running on a single node. A simple answer is: because I like it. A less simple answer is: it provides a declarative API for managing infrastructure, which is convenient at every scale, from a single node to a hundred thousand nodes. Helm is also a huge benefit: it really is a "package manager" for Kubernetes, and it makes installing large collections of software (like the monitoring stack in gnode) trivial. You'd have a hard time convincing me that any of this is easier with Docker Compose or shell scripts.
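As an illustration of how little work Helm asks of you, installing that monitoring stack through Terraform's helm provider is a single resource (the release name and namespace below are my own placeholders):

    # One release pulls in Prometheus, Grafana, Alertmanager, and the exporters.
    resource "helm_release" "monitoring" {
      name             = "monitoring"
      repository       = "https://prometheus-community.github.io/helm-charts"
      chart            = "kube-prometheus-stack"
      namespace        = "monitoring"
      create_namespace = true
    }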

Performance

Currently, this website is hosted on my personal gnode instance with a VM SKU of Standard_D4s_v3, which has 4 vCPUs and 16GB of RAM. It's kind of crazy that this is all you can get in the cloud for about $150/mo. Nonetheless, it seems to be plenty to run gnode and at least a few of your own deployments, provided they are low-traffic. Before deploying anything extra, the whole setup sits at about 5% CPU utilization and 1GB of RAM, which leaves quite a bit of headroom to play with.

In addition to this blog, I plan on deploying ClickHouse and hooking it up to the Grafana instance that comes with kube-prometheus-stack to track metrics for training jobs from my ML project gaia.

Learnings

gnode originally started off as a single Terraform module, but I ran into a chicken-and-egg problem when trying to deploy the Helm charts. The kubernetes Terraform provider requires a valid kubeconfig to be present in order to initialize, and of course there is no valid kubeconfig until the node is deployed and Kubernetes is installed on it. I first tried configuring a dummy kubeconfig but kept running into issues, so I decided the cleanest approach was to split everything into two modules: infra and apps. The apps module then points its providers at the kubeconfig the infra module writes, roughly as sketched below. In retrospect, this was probably the better design to begin with, as it's a cleaner separation of concerns.
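The apps module's provider setup ends up looking something like this; the kubeconfig path is an assumption about the repo layout, and the nested kubernetes block is the helm provider's v2 syntax:

    # apps module: providers initialize from the kubeconfig that the
    # infra module wrote to disk (the path here is a placeholder).
    provider "kubernetes" {
      config_path = "../infra/kubeconfig"
    }

    provider "helm" {
      kubernetes {
        config_path = "../infra/kubeconfig"
      }
    }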

I want CI/CD pipelines for the apps I decide to deploy to gnode. The simplest way to accomplish that is via kubectl apply within GitHub Actions. The issue is that you don't really want to publicly expose port 6443 (kube-apiserver) on your cluster, since it's an obvious attack surface. The answer is to allow only GitHub Actions IPs through the NSG on port 6443. What I came to realize is that the list of GitHub Actions IP CIDR blocks is over 6,000 entries long, which is a problem because Azure NSGs have a 4,000-address limit on security rules. I compromised by taking most of the IPv4 ranges and a few IPv6 ranges, roughly as sketched below. This means that if a job lands on a runner whose IP isn't covered, the deployment will fail. For a low-stakes situation like this I'm fine with that, and I'll just rerun the job rather than introduce additional complexity. I wanted to keep gnode as simple as possible, so adding e.g. OIDC or similar machinery for auth seemed like overkill.
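Here is a sketch of that compromise using the hashicorp/http provider; the 3,900 cutoff and the local names are my own choices:

    # Fetch GitHub's published IP ranges and keep a subset that fits under
    # the NSG's 4,000-prefix ceiling. Requires Terraform >= 1.5 for strcontains.
    data "http" "github_meta" {
      url = "https://api.github.com/meta"
    }

    locals {
      actions_cidrs = jsondecode(data.http.github_meta.response_body).actions
      ipv4_cidrs    = [for c in local.actions_cidrs : c if !strcontains(c, ":")]
      nsg_cidrs     = slice(local.ipv4_cidrs, 0, min(length(local.ipv4_cidrs), 3900))
    }

The resulting local.nsg_cidrs list can then feed the source_address_prefixes of an NSG rule for port 6443, like the one sketched earlier.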

References

gnode repo

k3s docs

My original homelab repo