```
ansible/
├── ansible.cfg            ← tells Ansible where the inventory is + SSH settings
├── site.yml               ← the entry point — defines the 3 plays in order
├── inventory/hosts.ini    ← lists all nodes + their IPs + SSH key paths
├── group_vars/all.yml     ← shared variables (k8s version, pod CIDR, Calico version)
└── roles/
    ├── common/            ← Play 1: runs on ALL 3 nodes
    │   ├── tasks/main.yml     ← installs containerd + kubelet/kubeadm/kubectl
    │   └── handlers/main.yml  ← restarts containerd / kubelet when config changes
    ├── master/            ← Play 2: runs on k8s-master ONLY
    │   └── tasks/main.yml     ← kubeadm init, kubeconfig, Calico, join command
    └── worker/            ← Play 3: runs on k8s-worker-1 + k8s-worker-2
        └── tasks/main.yml     ← joins cluster using master's join command
```
Ansible reads ansible.cfg first to find the inventory and SSH settings.
You run ansible-playbook site.yml and the three plays execute in sequence —
common finishes on all nodes before master starts, and master finishes before workers join.
```ini
[defaults]
inventory = inventory/hosts.ini
host_key_checking = False
remote_user = vagrant

[ssh_connection]
pipelining = True
```
inventory = inventory/hosts.ini means you never have to pass
-i inventory/hosts.ini on every command. The path is relative to
where ansible.cfg lives.
remote_user = vagrant — Vagrant boxes ship with a vagrant
user that has passwordless sudo. This tells Ansible to SSH as that user on every host.
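Before running site.yml, a quick connectivity check confirms these settings work end to end. A minimal sketch (this ping.yml is hypothetical, not part of the repo):

```yaml
# Hypothetical smoke-test play — confirms the inventory path, SSH keys, and
# remote_user from ansible.cfg actually work on every node.
# Run with: ansible-playbook ping.yml
- name: Verify SSH connectivity to all nodes
  hosts: all
  gather_facts: false
  tasks:
    - name: Ping each host over SSH
      ansible.builtin.ping:
```

The same check is available ad hoc as `ansible all -m ping`.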
```yaml
# Kubernetes
k8s_version: "1.32"
pod_cidr: "10.244.0.0/16"
master_ip: "192.168.56.10"

# Calico CNI
calico_version: "v3.29.1"

# MetalLB (Phase 02)
metallb_version: "0.14.9"
metallb_ip_range: "192.168.56.200-192.168.56.250"

# NGINX Ingress (Phase 02)
nginx_ingress_version: "4.11.3"

# ArgoCD (Phase 03)
argocd_version: "v2.13.3"

# Online Boutique (Phase 03)
boutique_version: "HEAD"

# Monitoring (Phase 04)
# chart version auto-resolved at deploy time via helm search repo
grafana_admin_password: "grafana"
```
Ansible automatically loads any file named all.yml inside group_vars/
and makes every variable available to every play and role. This is the single source of truth
for cluster-wide settings — change a version here and it updates everywhere.
k8s_version builds the apt repository URL
pkgs.k8s.io/core:/stable:/v1.32/deb/. Bumping it to
"1.33" automatically updates the repo and package installs in the
common role.
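Any of these values can also be overridden for a single run without editing the file. An illustrative sketch (this debug play is not part of the repo):

```yaml
# Illustrative only: print the resolved versions. Override any variable at
# runtime with -e, e.g.  ansible-playbook site.yml -e k8s_version=1.33
- name: Show effective cluster variables
  hosts: master
  gather_facts: false
  tasks:
    - name: Print versions from group_vars/all.yml
      ansible.builtin.debug:
        msg: "k8s {{ k8s_version }} | pod CIDR {{ pod_cidr }} | Calico {{ calico_version }}"
```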
pod_cidr feeds both kubeadm init --pod-network-cidr and the Calico manifest.
Both must match — if they differ, Calico rejects pod traffic.
master_ip is passed to kubeadm init --apiserver-advertise-address. It tells the
API server which NIC to bind to — this must be the private-network IP, not the
Vagrant NAT interface.
metallb_ip_range (192.168.56.200-192.168.56.250) sits above the node IPs
(.10/.11/.12) in the same host-only subnet, so your laptop can reach the load-balancer addresses directly.
nginx_ingress_version is passed to helm upgrade --install --version in the nginx-ingress role.
It pins the Helm chart version so every run installs the same controller binary.
boutique_version: "HEAD" tracks the latest commit on the default branch — this ensures
current k8s API compatibility without pinning to a specific tag.
grafana_admin_password maps to the chart value grafana.adminPassword.
It sets the password for the built-in admin account at
http://grafana.lab.local. Change it before deploying outside a local lab.
The monitoring chart version is deliberately not pinned — it is resolved at deploy time via helm search repo.
The playbook queries the latest available chart version automatically and
prints it before installing, so every run uses the newest stable release.
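The monitoring role itself isn't shown here, but the resolution step could look roughly like this (the repo and chart names are assumptions):

```yaml
# Sketch of deploy-time chart-version resolution. Assumes the
# prometheus-community Helm repo has already been added on the host.
- name: Query latest kube-prometheus-stack chart version
  ansible.builtin.command: >
    helm search repo prometheus-community/kube-prometheus-stack --output json
  register: chart_search
  changed_when: false

- name: Print the version that will be installed
  ansible.builtin.debug:
    msg: "Installing chart version {{ (chart_search.stdout | from_json)[0].version }}"
```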
```ini
[master]
k8s-master ansible_host=192.168.56.10 ansible_ssh_private_key_file=../vagrant/.vagrant/machines/k8s-master/virtualbox/private_key

[workers]
k8s-worker-1 ansible_host=192.168.56.11 ansible_ssh_private_key_file=../vagrant/.vagrant/machines/k8s-worker-1/virtualbox/private_key
k8s-worker-2 ansible_host=192.168.56.12 ansible_ssh_private_key_file=../vagrant/.vagrant/machines/k8s-worker-2/virtualbox/private_key

[k8s_cluster:children]
master
workers

[k8s_cluster:vars]
ansible_user = vagrant
```
The inventory is Ansible's map of the world — it defines every host and how to reach them.
The group names [master] and [workers] are what the plays in site.yml
reference directly: hosts: master runs only on the master group,
and hosts: workers runs on both workers simultaneously.
The inventory hostname (k8s-master) is just a label — without ansible_host, Ansible
would try to DNS-resolve "k8s-master", which doesn't exist.
Vagrant generates a unique SSH key per VM under .vagrant/machines/<name>/virtualbox/private_key.
ansible_ssh_private_key_file points Ansible at that key so it can log in without a password.
[k8s_cluster:children] defines a parent group containing both master and workers. It is used in site.yml Play 1:
hosts: k8s_cluster targets all 3 nodes at once.
[k8s_cluster:vars] applies to every host in the k8s_cluster group. ansible_user = vagrant sets the SSH
login user for all nodes in one place.
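To see how the groups resolve without touching any node, `ansible-inventory --graph` prints the tree; inside a play, the same information is exposed through the `group_names` magic variable. A hypothetical sanity-check play:

```yaml
# Illustrative only: show which groups each host belongs to.
- name: Show group membership
  hosts: k8s_cluster
  gather_facts: false
  tasks:
    - name: Print this host's groups
      ansible.builtin.debug:
        msg: "{{ inventory_hostname }} is in groups: {{ group_names }}"
```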
```yaml
---
# Run: ansible-playbook site.yml

- name: "Phase 01 | Common — containerd & Kubernetes packages"
  hosts: k8s_cluster        # all 3 nodes
  become: true
  roles:
    - common

- name: "Phase 01 | Master — kubeadm init & Calico CNI"
  hosts: master             # k8s-master only
  become: true
  roles:
    - master

- name: "Phase 01 | Workers — join cluster"
  hosts: workers            # both workers in parallel
  become: true
  roles:
    - worker
```
site.yml is the only file you ever run directly. It defines three plays that
execute in strict order — the next play only starts when the previous one succeeds on every
targeted host.
become: true elevates every task to root — installing packages, writing under /etc, and running kubeadm all need
root access. The vagrant user has passwordless sudo, so no prompt appears.
```yaml
# ── Install containerd ──────────────────────────────
- name: Install containerd
  apt:
    name: containerd
    state: present

- name: Generate default containerd config
  shell: containerd config default > /etc/containerd/config.toml
  args:
    creates: /etc/containerd/config.toml   # idempotent — skip if file exists
  notify: Restart containerd

- name: Enable SystemdCgroup in containerd config
  replace:
    path: /etc/containerd/config.toml
    regexp: 'SystemdCgroup\s*=\s*false'
    replace: 'SystemdCgroup = true'
  notify: Restart containerd

- name: Flush handlers   # restart containerd NOW before K8s install
  meta: flush_handlers

# ── Add Kubernetes apt repo ─────────────────────────
- name: Download Kubernetes apt signing key
  shell: >
    curl -fsSL https://pkgs.k8s.io/core:/stable:/v{{ k8s_version }}/deb/Release.key
    | gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
  args:
    creates: /etc/apt/keyrings/kubernetes-apt-keyring.gpg

- name: Add Kubernetes apt repository
  apt_repository:
    repo: "deb [signed-by=...] https://pkgs.k8s.io/core:/stable:/v{{ k8s_version }}/deb/ /"
    filename: kubernetes

# ── Install + lock K8s components ───────────────────
- name: Install kubelet, kubeadm, kubectl
  apt:
    name: [ kubelet, kubeadm, kubectl ]
    state: present

- name: Hold Kubernetes packages at current version
  dpkg_selections:
    name: "{{ item }}"
    selection: hold
  loop: [ kubelet, kubeadm, kubectl ]

# ── Pin kubelet to private NIC ──────────────────────
- name: Set kubelet node IP to private network interface
  copy:
    content: "KUBELET_EXTRA_ARGS=--node-ip={{ ansible_host }}\n"
    dest: /etc/default/kubelet
```
containerd defaults to the cgroupfs cgroup driver, but systemd-based systems (like
Ubuntu) use systemd. If kubelet and containerd disagree, the node crashes at random
intervals due to conflicting memory accounting.
Handlers normally run at the end of a play; flush_handlers forces them to run
right now, ensuring containerd is restarted with the new config before the K8s
packages are installed.
Holding the packages with dpkg_selections means apt upgrade never touches them. A surprise kubeadm upgrade mid-cluster
would break the control plane.
Without --node-ip, kubelet advertises the Vagrant NAT address (10.0.2.15 — the same on every VM) and nodes can't find each other.
{{ ansible_host }} resolves to the host's IP from the inventory
(192.168.56.10/.11/.12).
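A quick way to confirm the pin took effect is to check the file after the role runs. An illustrative verification task (not part of the role):

```yaml
# Illustrative check: fail fast if /etc/default/kubelet doesn't carry the
# expected --node-ip flag for this host.
- name: Confirm kubelet is pinned to the host-only IP
  ansible.builtin.command: grep -- "--node-ip={{ ansible_host }}" /etc/default/kubelet
  changed_when: false
```

Once the cluster is up, `kubectl get nodes -o wide` should show 192.168.56.x addresses in the INTERNAL-IP column, not 10.0.2.15.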
```yaml
---
- name: Restart containerd
  systemd:
    name: containerd
    state: restarted
    daemon_reload: yes

- name: Restart kubelet
  systemd:
    name: kubelet
    state: restarted
    daemon_reload: yes
```
Handlers are special tasks that only run when notified by another task — and only once, even if notified multiple times. They are Ansible's way of restarting services only when something actually changed.
Two tasks in the common role carry notify: Restart containerd.
If either task makes a change, the handler fires. If neither changed (a re-run on an
already-configured node), the handler is skipped entirely.
daemon_reload: yes re-reads systemd unit definitions in case any .service files were changed;
it is harmless if they weren't.
A plain task with state: restarted would restart the service every single time the playbook runs,
even when nothing changed. Handlers restart only on actual change — keeping re-runs fast
and non-disruptive.
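The difference is easiest to see side by side. An illustrative contrast (not taken from the repo):

```yaml
# Anti-pattern: restarts containerd on EVERY run, changed or not.
- name: Restart containerd unconditionally
  ansible.builtin.systemd:
    name: containerd
    state: restarted

# Handler pattern: only a real change to the config triggers the restart.
- name: Mark containerd config as managed
  ansible.builtin.lineinfile:
    path: /etc/containerd/config.toml
    line: "# managed by ansible"
  notify: Restart containerd
```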
```yaml
# ── Idempotency check ───────────────────────────────
- name: Check if cluster is already initialised
  stat:
    path: /etc/kubernetes/admin.conf
  register: kubeadm_init

- name: Initialise Kubernetes cluster with kubeadm
  command: >
    kubeadm init
    --pod-network-cidr={{ pod_cidr }}
    --apiserver-advertise-address={{ master_ip }}
    --node-name=k8s-master
  when: not kubeadm_init.stat.exists

# ── kubeconfig for vagrant user ─────────────────────
- name: Create .kube directory for vagrant user
  file:
    path: /home/vagrant/.kube
    state: directory
    owner: vagrant
    mode: '0700'

- name: Copy admin.conf to vagrant user kubeconfig
  copy:
    src: /etc/kubernetes/admin.conf
    dest: /home/vagrant/.kube/config
    owner: vagrant
    mode: '0600'
    remote_src: yes

# ── Calico CNI (Tigera Operator) ────────────────────
- name: Install Tigera Operator
  become: false
  command: kubectl apply -f https://.../tigera-operator.yaml
  environment:
    KUBECONFIG: /home/vagrant/.kube/config

- name: Set pod CIDR in Calico manifest to match kubeadm init
  replace:
    path: /tmp/calico-custom-resources.yaml
    regexp: '192\.168\.0\.0/16'
    replace: "{{ pod_cidr }}"

# ── Wait + generate join command ────────────────────
- name: Wait for k8s-master node to be Ready (up to 3 min)
  become: false
  command: kubectl wait --for=condition=Ready node/k8s-master --timeout=180s
  environment:
    KUBECONFIG: /home/vagrant/.kube/config

- name: Generate worker join command
  command: kubeadm token create --print-join-command
  register: kubeadm_join_output

- name: Expose join command as Ansible fact
  set_fact:
    join_command: "{{ kubeadm_join_output.stdout }}"
```
If /etc/kubernetes/admin.conf already exists, kubeadm init
has already run — skip it. Re-running the playbook is safe.
--pod-network-cidr must match the
Calico config; --apiserver-advertise-address binds the API server to the
private NIC (not the NAT interface); --node-name must match the VM hostname.
remote_src: yes tells the copy module that the source
file is already on the remote machine (not on your laptop). Without it, Ansible would
look for admin.conf locally and fail.
The kubeconfig belongs to the vagrant user, so kubectl must run as vagrant, not root. These tasks
override the play-level become: true with become: false.
Calico's stock custom-resources.yaml
hardcodes 192.168.0.0/16 as the pod CIDR. We patch it to
10.244.0.0/16 to match kubeadm init. If they differ,
Calico creates the wrong IP pool and pods get unroutable addresses.
kubectl wait blocks until the master node reports Ready. Without this gate, workers could try to join before the API server
and Calico are fully operational, causing flaky join failures.
register captures the command output
into a variable. set_fact promotes it to a persistent host fact that survives
across plays. Workers read it in the next play via hostvars.
```yaml
---
- name: Check if node has already joined the cluster
  stat:
    path: /etc/kubernetes/kubelet.conf
  register: kubelet_conf

- name: Join cluster using master join command
  command: "{{ hostvars['k8s-master']['join_command'] }}"
  when: not kubelet_conf.stat.exists
```
Two tasks. That's all workers need — the heavy lifting was done by the master role.
/etc/kubernetes/kubelet.conf is written by kubeadm join. If it exists, the node is already in the
cluster — skip the join.
join_command was stored via set_fact in the master role. hostvars is a special Ansible
dictionary that holds all facts for all hosts collected during the run. This bridges
the data from Play 2 (master) into Play 3 (workers) with no files written to disk.
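Stripped of the Kubernetes details, the pattern is just this (host names node-a/node-b are hypothetical, illustrative only):

```yaml
# Play 1: produce a fact on one host.
- hosts: node-a
  gather_facts: false
  tasks:
    - name: Store a value in this host's facts
      ansible.builtin.set_fact:
        shared_value: "computed on {{ inventory_hostname }}"

# Play 2: a different host reads it via hostvars.
- hosts: node-b
  gather_facts: false
  tasks:
    - name: Read the fact produced in Play 1
      ansible.builtin.debug:
        msg: "{{ hostvars['node-a']['shared_value'] }}"
```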
The generated join command looks like this:

```
kubeadm join 192.168.56.10:6443 --token abc123... \
  --discovery-token-ca-cert-hash sha256:xyz...
```

```
Play 2 — master role                            Play 3 — worker role
──────────────────────────────────────────      ──────────────────────────────────
kubeadm token create --print-join-command
        │
        ▼
register: kubeadm_join_output
        │
        ▼
set_fact:
  join_command: "{{ kubeadm_join_output.stdout }}"
        │
        │ stored in Ansible's in-memory hostvars
        └──────────────────────────────────────▶ hostvars['k8s-master']
                                                          ['join_command']
                                                          │
                                                          ▼
                                                 command: "{{ hostvars['k8s-master']
                                                              ['join_command'] }}"
                                                          │
                                                          ▼
                                                 k8s-worker-1 joins cluster
                                                 k8s-worker-2 joins cluster
```
This is the most important design pattern in the whole playbook. The join command is generated on the master and consumed by two different hosts in a completely different play — with no files written to disk and no manual copy-paste.
A simpler approach would write the join command to
/tmp/join.sh and have workers read it. But that leaves a file containing a
valid cluster token sitting on disk. hostvars lives purely in Ansible's
memory for the duration of the run — cleaner and more secure.
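To tighten this further: bootstrap tokens expire after 24 hours by default, and you can revoke one as soon as the workers have joined. A hedged sketch (this cleanup task is not in the playbook above):

```yaml
# Optional hardening: inspect tokens after the join; revoke one early with
# `kubeadm token delete <token>` instead of waiting for the 24h expiry.
- name: List active bootstrap tokens
  ansible.builtin.command: kubeadm token list
  register: token_list
  changed_when: false
```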