Phase 02 · Local Networking
MetalLB — deep dive
ansible/roles/metallb/  ·  Kubernetes LoadBalancer for bare metal
In the cloud, creating a LoadBalancer service instantly gives you a real IP. Locally, that request just hangs forever — there's no cloud provider to fulfill it. MetalLB is the missing piece: it watches for LoadBalancer services and assigns real IPs from a pool you define, announcing them to your network via ARP.
01 The Problem MetalLB Solves
Cloud cluster (AWS/GCP/Azure)
─────────────────────────────────────────────────────────────
kubectl expose ... --type=LoadBalancer
        ↓
Cloud provider API  ← automatically provisions a real external IP
        ↓
Service gets  EXTERNAL-IP: 34.102.x.x  ← reachable from anywhere


Local cluster (our setup)
─────────────────────────────────────────────────────────────
kubectl expose ... --type=LoadBalancer
        ↓
??? nobody handles this ???
        ↓
Service stays  EXTERNAL-IP: <pending>  ← forever, nothing happens


Local cluster + MetalLB
─────────────────────────────────────────────────────────────
kubectl expose ... --type=LoadBalancer
        ↓
MetalLB controller  ← watches for pending LoadBalancer services
        ↓
Service gets  EXTERNAL-IP: 192.168.56.200  ← from our lab-pool
        ↓
MetalLB speaker  ← announces the IP via ARP so your laptop can reach it
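The <pending> state is easy to reproduce. A minimal Service manifest like this is all it takes (the name, selector, and ports are illustrative):

```yaml
# Illustrative LoadBalancer Service -- name/selector are made up.
# Without MetalLB its EXTERNAL-IP stays <pending> forever;
# with MetalLB installed it gets an IP from lab-pool within seconds.
apiVersion: v1
kind: Service
metadata:
  name: demo
spec:
  type: LoadBalancer
  selector:
    app: demo
  ports:
    - port: 80
      targetPort: 8080
```

Watching with `kubectl get svc demo -w` shows the EXTERNAL-IP flip from `<pending>` to an address from the pool.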
02 Installation — Native Manifest
# roles/metallb/tasks/main.yml
- name: Install MetalLB
  become: false
  command: >
    kubectl apply --server-side --force-conflicts -f
    https://raw.githubusercontent.com/metallb/metallb/v{{ metallb_version }}/config/manifests/metallb-native.yaml
  environment:
    KUBECONFIG: /home/vagrant/.kube/config

MetalLB's native manifest creates everything it needs in one apply: the metallb-system namespace, RBAC rules, the Controller Deployment, and the Speaker DaemonSet.

  • metallb-native.yaml vs metallb-frr.yaml — native uses the standard Linux networking stack for L2 mode; the FRR (FRRouting) variant bundles a full routing daemon and matters only for BGP deployments. We're using L2, so native is the right choice.
  • --server-side --force-conflicts — same fix as Calico. MetalLB's CRDs are also large enough that client-side apply would exceed the 262144-byte limit on the last-applied-configuration annotation.
  • metallb_version — pinned in group_vars/all.yml to 0.14.9. Changing it there updates both the install URL and any other reference in one place.
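One thing the apply doesn't guarantee is that MetalLB's admission webhook is up before any pool manifests are applied. A hedged sketch of a wait task that could follow the install (the task name and timeout are assumptions, not part of the role shown above):

```yaml
# Illustrative follow-up task -- waits for MetalLB pods to be Ready.
# The app=metallb label is set by metallb-native.yaml itself.
- name: Wait for MetalLB pods
  become: false
  command: >
    kubectl wait --namespace metallb-system
    --for=condition=Ready pod --selector=app=metallb
    --timeout=120s
  environment:
    KUBECONFIG: /home/vagrant/.kube/config
  changed_when: false
```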
03 Two Components — Controller & Speaker
Controller (Deployment) Runs as a single replica. Watches for LoadBalancer services whose external IP is still pending. When it finds one, it picks a free IP from the pool and assigns it to the service. Think of it as the IP allocator.
Speaker (DaemonSet) Runs on every node. In L2 mode it handles ARP — when your laptop asks "who has 192.168.56.200?", the Speaker on the node owning that IP replies "I do". This is how traffic finds its way to the cluster.
Why DaemonSet for Speaker? The Speaker must be present on whichever node currently owns a given IP. If that node goes down, another Speaker on another node takes over and re-announces the IP via ARP. Running on every node ensures this failover is always possible.
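A quick sanity check that both components are healthy could look like this (the check task is illustrative and not part of the role; the object names `controller` and `speaker` are the ones metallb-native.yaml creates):

```yaml
# Illustrative verification task -- not part of the role.
- name: Verify MetalLB controller and speaker
  become: false
  command: kubectl get deploy/controller ds/speaker -n metallb-system
  environment:
    KUBECONFIG: /home/vagrant/.kube/config
  register: metallb_status
  changed_when: false
```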
04 IPAddressPool
# roles/metallb/files/ip-pool.yaml
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name:      lab-pool
  namespace: metallb-system
spec:
  addresses:
    - 192.168.56.200-192.168.56.250

The IPAddressPool is a custom resource (CRD) that tells MetalLB which IPs it is allowed to hand out. Without this, the Controller doesn't know what range to allocate from and services stay pending.

  • 192.168.56.200-192.168.56.250 — we reserved this range at the top of the host-only subnet when designing the Vagrantfile. The cluster nodes sit at .10/.11/.12, leaving the upper range free for MetalLB.
  • Why 50 IPs? Each LoadBalancer service gets one IP. We have NGINX Ingress (1 IP), ArgoCD, Grafana — but realistically all external traffic goes through NGINX Ingress which uses 1 IP. The range is generous for future expansion.
  • autoAssign: true (default) — MetalLB automatically picks the next available IP from the pool. You can also request a specific IP by annotating the service, but auto-assignment is fine for this project.
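Requesting a fixed IP looks like this (the `metallb.io/loadBalancerIPs` annotation is MetalLB's documented mechanism; the service name and ports are illustrative):

```yaml
# Illustrative: pin a Service to a specific IP from lab-pool
# instead of relying on autoAssign.
apiVersion: v1
kind: Service
metadata:
  name: demo
  annotations:
    metallb.io/loadBalancerIPs: 192.168.56.201
spec:
  type: LoadBalancer
  selector:
    app: demo
  ports:
    - port: 80
      targetPort: 8080
```

The requested IP must fall inside an IPAddressPool, otherwise the service stays pending.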
05 L2Advertisement — ARP Announcements
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name:      lab-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - lab-pool   # references the pool above

The IPAddressPool defines the IPs. The L2Advertisement defines how those IPs are announced. Without this resource, IPs are assigned but never advertised — your laptop still can't route to them.

  • L2 mode (ARP) — when a LoadBalancer service gets IP 192.168.56.200, the Speaker on the owning node responds to ARP requests for that IP. Your laptop sends "who has .200?" and the node replies with its MAC address. Standard Ethernet, no special router config needed.
  • vs BGP mode — BGP mode is more production-grade (proper routing, ECMP load balancing) but requires a BGP router/peer. L2 mode works on any flat network — perfect for a local lab.
  • ipAddressPools: [lab-pool] — you can have multiple pools (e.g. one for internal services, one for external) and multiple advertisements with different rules. We only need one of each.
L2 mode limitation: all traffic for a given IP goes to one node (the one MetalLB elected to announce it). There's no true load balancing at the IP level — Kubernetes handles distribution at the service level via kube-proxy. This is fine for a lab but BGP would be used in production.
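If you ever need to control which nodes may answer ARP for pool IPs, L2Advertisement supports node selectors. A sketch (the hostname label value is an assumption about this lab's node names):

```yaml
# Illustrative: only let a specific node announce lab-pool IPs.
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name:      lab-l2-pinned
  namespace: metallb-system
spec:
  ipAddressPools:
    - lab-pool
  nodeSelectors:
    - matchLabels:
        kubernetes.io/hostname: worker-1   # hypothetical node name
```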
06 Full Traffic Flow
Your Laptop
    │
    │  curl http://app.lab.local
    │       ↓  (hosts file resolves to 192.168.56.200)
    │
    │  ARP: "who has 192.168.56.200?"
    │       ↓  (MetalLB Speaker replies with node MAC)
    │
    ▼
192.168.56.200  ← MetalLB assigns this to ingress-nginx Service
    │
    │  Traffic arrives at the node running the Speaker
    │       ↓  kube-proxy DNAT rule
    ▼
NGINX Ingress Controller Pod
    │
    │  Routes by hostname  (Host: app.lab.local)
    ▼
Application Pod  (Online Boutique / ArgoCD / Grafana)

MetalLB handles one thing: getting traffic from your laptop to the cluster edge. Everything after that (hostname routing to the right pod) is NGINX Ingress's job. They're designed to work together — MetalLB gives NGINX one stable IP, NGINX routes everything behind it.
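The IPAddressPool and L2Advertisement still have to be applied by the role after the install task. A hedged sketch of how that might look (the destination path, task name, and retry values are assumptions; only the install task is actually shown from the role). Retries matter because MetalLB's webhook rejects CRs until it is ready:

```yaml
# Illustrative sketch -- apply the pool and advertisement after install.
- name: Apply MetalLB IP pool and L2 advertisement
  become: false
  command: kubectl apply -f /tmp/{{ item }}   # destination path is an assumption
  loop:
    - ip-pool.yaml
    - l2-advertisement.yaml
  environment:
    KUBECONFIG: /home/vagrant/.kube/config
  register: metallb_apply
  retries: 5     # webhook may not be ready right after install
  delay: 10
  until: metallb_apply is succeeded
```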