Part 2: Talos Installation - Building Your First Cluster

TL;DR

In this article, you’ll learn how to install and configure your first Talos Linux Kubernetes cluster. We’ll cover installation methods, generating machine configurations, and bootstrapping a 3-node cluster (1 control plane, 2 workers). By the end, you’ll have a fully functional Kubernetes cluster running on Talos Linux.

Introduction

Why This Matters

Now that you understand what Talos Linux is and why it’s the ideal foundation for Kubernetes (from Part 1: Talos Linux Introduction), it’s time to get your hands dirty. This article walks you through the complete installation process, from creating custom Talos images using the Image Factory to verifying your cluster is healthy and ready for workloads.

What You’ll Learn

  • How to install and use the talosctl CLI tool
  • Different installation methods (bare metal, VMs, cloud)
  • Generating machine configurations for control plane and worker nodes
  • Applying configurations and bootstrapping Kubernetes
  • Verifying cluster health and functionality
  • Network and storage considerations for your homelab

Prerequisites

Before starting, you should have:

  • Completed Part 1: Talos Linux Introduction
  • Basic Linux command-line knowledge
  • Understanding of virtualization or access to bare metal hardware
  • 3 nodes ready (physical machines or VMs) - minimum 2GB RAM, 2 CPU cores per node
  • Network connectivity between nodes
  • A machine to run talosctl from (can be your workstation)

Hardware Setup

My Homelab Configuration

For this installation, I set up a 3-node cluster using affordable small form-factor PCs. The control plane runs on a Fujitsu Futro S720 with an AMD GX-217GA processor (2 cores @ 1.65 GHz) and 4GB of DDR3 RAM. I upgraded the original 2GB mSATA SSD to 128GB to accommodate Kubernetes workloads; if you can get one, a Dell Optiplex 3050 Micro makes a better control plane node.

Both worker nodes are Dell Optiplex 3050 Micro units, each equipped with an Intel Core i5 processor, 8GB of RAM, and 256GB SSDs. These compact machines are ideal for worker nodes and could also serve as control plane nodes if needed. The total hardware investment came to approximately 270€, purchased as used/refurbished units from eBay.

The network is configured on a standard home network using a FritzBox router. The subnet is 192.168.178.0/24 with the gateway at 192.168.178.1. To ensure stable IP addresses, I configured DHCP reservations in the FritzBox based on each node’s MAC address. The control plane is assigned 192.168.178.55, with worker nodes at 192.168.178.56 and 192.168.178.57. The cluster endpoint is configured as https://192.168.178.55:6443.

For storage, each node uses its single internal disk exclusively for the Talos OS installation. The control plane’s 128GB SSD and the workers’ 256GB SSDs provide ample space for the minimal Talos OS and Kubernetes system components. Persistent storage for workloads will be handled via NFS shares from a NAS on the local network, which we’ll configure in a later part of this series.

Hardware Recommendations:

If you’re building a similar setup, I’d recommend the Dell Optiplex 3050 Micro (Intel Core i5, 8GB RAM, 256GB SSD) for both control plane and worker nodes. These units offer excellent performance and are readily available as used/refurbished units at great prices. The Fujitsu Futro S720 serves as a more affordable alternative for the control plane, though keep in mind that the original 2GB storage requires an upgrade for production use. Search for used small office computers on eBay, local marketplaces, or refurbished computer dealers—these compact form-factor machines are perfect for homelabs.

Minimum Requirements

For a functional Talos Linux cluster, each node needs:

  • CPU: 2 cores minimum (4+ recommended)
  • RAM: 2GB minimum (4GB+ recommended)
  • Storage: 20GB minimum (50GB+ recommended for production)
  • Network: Gigabit Ethernet recommended

Bare Metal Installation

This guide covers bare metal installation, which is ideal for homelab setups using affordable small office computers.

Installation Process

The bare metal installation process follows a straightforward workflow. First, you’ll create a custom Talos ISO using the Image Factory (detailed in the “Downloading Talos Images” section below). This ISO is tailored to your hardware and includes the extensions you need.

Next, prepare a USB drive with Ventoy, which allows you to boot multiple ISO files from a single USB drive. Once Ventoy is set up, simply copy the Talos ISO to the USB drive—no need to create a separate bootable USB for each ISO. This is especially convenient when installing Talos on multiple nodes.

With the USB drive ready, boot each node from USB. The Ventoy boot menu will appear, showing all ISO files on the drive. Select the Talos ISO and follow the installation prompts. The installation process is automated and requires minimal interaction.

After the nodes are installed and booted, configure your router’s DHCP reservations to ensure each node receives a consistent IP address. In my case, I used the FritzBox admin interface to map MAC addresses to specific IPs: the control plane at 192.168.178.55, and the worker nodes at 192.168.178.56 and 192.168.178.57. This approach provides the stability of static IPs without requiring manual network configuration on each node.
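
If you don’t know a node’s MAC address for the DHCP reservation, you can read it remotely while the node is booted from the Talos ISO but not yet configured (maintenance mode). This assumes talosctl is already installed, which is covered in the next section.

# Show network links (including MAC addresses) of an unconfigured node
talosctl -n 192.168.178.55 get links --insecure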

Installing talosctl

The talosctl command-line tool is your primary interface for managing Talos Linux clusters. It’s used to generate configurations, apply them to nodes, and interact with the cluster.

Download and Install talosctl

On macOS:

# Install using Homebrew tap
brew install siderolabs/tap/talosctl

# Verify installation
talosctl version --client

On Linux:

# Download latest release (sudo is required to write to /usr/local/bin)
sudo curl -Lo /usr/local/bin/talosctl https://github.com/siderolabs/talos/releases/latest/download/talosctl-linux-amd64

# Make executable
sudo chmod +x /usr/local/bin/talosctl

# Verify installation
talosctl version --client

On Windows:

# Download using PowerShell
Invoke-WebRequest -Uri "https://github.com/siderolabs/talos/releases/latest/download/talosctl-windows-amd64.exe" -OutFile "talosctl.exe"

# Add to PATH or use directly
.\talosctl.exe version --client

Verifying talosctl Installation

# Check version
talosctl version --client

# Get help
talosctl --help
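
Optionally, enable shell completion so that talosctl subcommands and flags tab-complete. The completion subcommand covers bash, zsh, and fish; run talosctl completion --help to see the variant for your shell.

# Load bash completion for the current shell session
source <(talosctl completion bash)

# zsh users would use: source <(talosctl completion zsh)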

The installation completed without issues using Homebrew on macOS; the installed client was v1.11.5 (SHA: bc34de6e) for darwin/arm64.

Downloading Talos Images

Before installing Talos on your nodes, you need to create a custom image using Talos Image Factory. Unlike traditional Linux distributions, Talos doesn’t provide pre-made ISOs. Instead, you use the Image Factory service to generate a custom ISO tailored to your hardware and configuration.

Image Types

Talos Image Factory can generate different image types:

  • ISO: For bare metal installation via USB/CD
  • Disk Image: For VM installation (raw disk image)
  • Cloud Images: Provider-specific images (AWS AMI, GCP, Azure)

Creating Images with Image Factory

For Bare Metal (ISO):

Talos uses the Image Factory to create custom ISOs. Generate your installation ISO as follows:

# Generate custom ISO using Image Factory
# Visit the Image Factory web interface to get the schematic ID:
# https://factory.talos.dev/?arch=amd64&cmdline-set=true&extensions=-&extensions=siderolabs%2Famd-ucode&extensions=siderolabs%2Fintel-ucode&extensions=siderolabs%2Fiscsi-tools&platform=metal&target=metal&version=1.11.6
# The page will display a schematic ID - use it in the download URL below

# Download the ISO using the schematic ID
# Schematic ID for this configuration: fd133e9b6aab5833f80f727eeb1327afe55ec65e65098a77f2a3178b05921850
curl -L --progress-bar \
  https://factory.talos.dev/image/fd133e9b6aab5833f80f727eeb1327afe55ec65e65098a77f2a3178b05921850/v1.11.6/metal-amd64.iso \
  -o talos-amd64.iso

# Verify download
ls -lh talos-amd64.iso

# Verify file type
file talos-amd64.iso

# Optional: Calculate checksum for verification
md5 talos-amd64.iso

Expected Output:

-rw-r--r--  1 user  staff   333M Dec 22 12:34 talos-amd64.iso
talos-amd64.iso: ISO 9660 CD-ROM filesystem data (DOS/MBR boot sector) 'TALOS_V1_11_6' (bootable)
MD5 (talos-amd64.iso) = 53d1bb01ba6ac5ed7ee7a8bb72426b72

Alternative: Using Image Factory Web Interface:

  1. Visit the Image Factory with your desired configuration: https://factory.talos.dev/?arch=amd64&cmdline-set=true&extensions=-&extensions=siderolabs%2Famd-ucode&extensions=siderolabs%2Fintel-ucode&extensions=siderolabs%2Fiscsi-tools&platform=metal&target=metal&version=1.11.6
  2. The page will generate a schematic ID and display download links
  3. Click the ISO download link or copy the schematic ID to use with curl
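
If you prefer to script schematic creation instead of using the web UI, the Image Factory also exposes an HTTP API: you POST a schematic definition and it returns the schematic ID. Here is a minimal sketch with the same extensions as above; double-check the payload format against the current Image Factory documentation.

# Describe the desired customizations
cat > schematic.yaml <<'EOF'
customization:
  systemExtensions:
    officialExtensions:
      - siderolabs/amd-ucode
      - siderolabs/intel-ucode
      - siderolabs/iscsi-tools
EOF

# POST the schematic; the JSON response contains the schematic ID
curl -s -X POST --data-binary @schematic.yaml https://factory.talos.dev/schematics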

For Virtual Machines (Disk Image):

For VM installations, the Image Factory provides several disk image formats. Visit the Image Factory web interface with your desired configuration to get the schematic ID, then use it to download the appropriate format:

# Visit the Image Factory web interface to get the schematic ID:
# https://factory.talos.dev/?arch=amd64&cmdline-set=true&extensions=-&extensions=siderolabs%2Famd-ucode&extensions=siderolabs%2Fintel-ucode&extensions=siderolabs%2Fiscsi-tools&platform=metal&target=metal&version=1.11.6
# The page will display a schematic ID and download links for different formats

# Download raw disk image (compressed with zstd)
# Replace [SCHEMATIC_ID] with the schematic ID from the Image Factory page
curl -L --progress-bar \
  https://factory.talos.dev/image/[SCHEMATIC_ID]/v1.11.6/metal-amd64.raw.zst \
  -o talos-amd64.raw.zst

# Or download qcow2 disk image (for QEMU/KVM)
curl -L --progress-bar \
  https://factory.talos.dev/image/[SCHEMATIC_ID]/v1.11.6/metal-amd64.qcow2 \
  -o talos-amd64.qcow2

# Verify download
ls -lh talos-amd64.raw.zst
# or
ls -lh talos-amd64.qcow2

Alternative: Using Image Factory Web Interface:

  1. Visit the Image Factory with your desired configuration: https://factory.talos.dev/?arch=amd64&cmdline-set=true&extensions=-&extensions=siderolabs%2Famd-ucode&extensions=siderolabs%2Fintel-ucode&extensions=siderolabs%2Fiscsi-tools&platform=metal&target=metal&version=1.11.6
  2. The page will generate a schematic ID and display download links for:
    • Disk Image (raw): Compressed raw disk image (.raw.zst format)
    • Disk Image (qcow2): QEMU disk image format (.qcow2 format)
    • PXE boot (iPXE script): For network boot installations
  3. Click the appropriate download link for your virtualization platform, or copy the schematic ID to use with curl

For PXE Boot:

If you’re using network boot (PXE), you can use the iPXE script URL:

https://pxe.factory.talos.dev/pxe/[SCHEMATIC_ID]/v1.11.6/metal-amd64

Replace [SCHEMATIC_ID] with the schematic ID from the Image Factory page. See the PXE documentation for more details on PXE boot setup.

The ISO was successfully created using the Image Factory with Talos version v1.11.6. I used the direct download method with schematic ID fd133e9b6aab5833f80f727eeb1327afe55ec65e65098a77f2a3178b05921850, which included the amd-ucode, intel-ucode, and iscsi-tools extensions. The resulting file (talos-amd64.iso) was 333MB (349,515,776 bytes) with MD5 checksum 53d1bb01ba6ac5ed7ee7a8bb72426b72. The download completed without any issues.

Creating Bootable USB with Ventoy

Ventoy is a powerful tool that allows you to create a multiboot USB stick. Instead of creating a separate bootable USB for each ISO, you can set up Ventoy once and then simply copy ISO files to the USB drive. This is especially useful when installing Talos on multiple nodes or when you need to test different operating systems.

Why Use Ventoy?

  • Multiboot Support: Store multiple ISO files on a single USB drive
  • Easy Management: Just copy ISO files to the USB drive (no need to recreate the bootable USB each time)
  • No Formatting Required: After initial setup, you can add/remove ISO files without reformatting
  • Works with Any ISO: Supports Linux, Windows, and other bootable ISO images

Setting Up Ventoy

Important: Ventoy installation must be done on a Windows or Linux host. macOS cannot be used to set up Ventoy on the USB drive. However, once Ventoy is set up, you can use macOS to copy ISO files to the USB drive, and the USB will boot correctly on the target host.

On Windows:

  1. Download Ventoy from https://www.ventoy.net/en/download.html
  2. Extract the ZIP file
  3. Run Ventoy2Disk.exe as administrator
  4. Select your USB drive
  5. Click “Install” (this will format the USB drive)
  6. Wait for installation to complete

On Linux:

# Download Ventoy
wget https://github.com/ventoy/Ventoy/releases/latest/download/ventoy-1.1.10-linux.tar.gz

# Extract
tar -xzf ventoy-1.1.10-linux.tar.gz
cd ventoy-*-linux

# Run Ventoy (replace /dev/sdX with your USB device)
sudo ./Ventoy2Disk.sh -i /dev/sdX

I used a 16GB USB drive with Ventoy version 1.1.10, set up on a Linux system. The setup process completed without any issues.

Copying ISO Files to Ventoy USB

After Ventoy is installed, the USB drive will have two partitions:

  • A small Ventoy partition (bootloader)
  • A large data partition where you copy ISO files

On Windows/Linux: Simply copy the Talos ISO file to the USB drive like any regular file.

On macOS: Even though you can’t set up Ventoy on macOS, you can still copy ISO files to an already-prepared Ventoy USB drive:

# Mount the Ventoy USB drive (it should appear as a regular drive)
# Then copy the ISO file
cp talos-amd64.iso /Volumes/Ventoy/

Or using Finder:

  1. Insert the Ventoy USB drive
  2. Open Finder and locate the Ventoy drive
  3. Drag and drop the Talos ISO file to the drive

The ISO file was successfully copied to a talos subfolder on the USB drive from macOS. The file appeared correctly in the USB drive’s file list and was visible in the Ventoy boot menu.

Booting from Ventoy USB

  1. Insert the Ventoy USB drive into the target hardware
  2. Boot from USB (may require BIOS/UEFI boot menu selection)
  3. Ventoy boot menu will appear showing all ISO files on the USB drive
  4. Select the Talos ISO from the menu
  5. The system will boot from the selected ISO

The boot process worked flawlessly. The Ventoy boot menu appeared correctly on all nodes, the Talos ISO was visible and selectable, and the boot process completed successfully without any issues.

Preparing Your Nodes

Before generating configurations, ensure your nodes are ready.

Node Preparation Checklist

  • All nodes have Talos installed and booted
  • Nodes are accessible on the network
  • IP addresses are known (static or DHCP)
  • Network connectivity verified between nodes
  • Firewall rules configured (if applicable)
  • DNS resolution working (if using hostnames)

Verifying Node Accessibility

# Test connectivity to control plane node
ping -c 4 192.168.178.55

# Test connectivity to worker nodes
ping -c 4 192.168.178.56
ping -c 4 192.168.178.57

Expected Output:

# Control Plane Node (192.168.178.55)
PING 192.168.178.55 (192.168.178.55): 56 data bytes
64 bytes from 192.168.178.55: icmp_seq=0 ttl=64 time=4.898 ms
64 bytes from 192.168.178.55: icmp_seq=1 ttl=64 time=4.711 ms
64 bytes from 192.168.178.55: icmp_seq=2 ttl=64 time=4.815 ms
64 bytes from 192.168.178.55: icmp_seq=3 ttl=64 time=3.631 ms

--- 192.168.178.55 ping statistics ---
4 packets transmitted, 4 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 3.631/4.514/4.898/0.514 ms

# Worker Node 1 (192.168.178.56)
PING 192.168.178.56 (192.168.178.56): 56 data bytes
64 bytes from 192.168.178.56: icmp_seq=0 ttl=64 time=3.692 ms
64 bytes from 192.168.178.56: icmp_seq=1 ttl=64 time=5.691 ms
64 bytes from 192.168.178.56: icmp_seq=2 ttl=64 time=5.287 ms
64 bytes from 192.168.178.56: icmp_seq=3 ttl=64 time=3.544 ms

--- 192.168.178.56 ping statistics ---
4 packets transmitted, 4 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 3.544/4.553/5.691/0.948 ms

# Worker Node 2 (192.168.178.57)
PING 192.168.178.57 (192.168.178.57): 56 data bytes
64 bytes from 192.168.178.57: icmp_seq=0 ttl=64 time=8.206 ms
64 bytes from 192.168.178.57: icmp_seq=1 ttl=64 time=3.400 ms
64 bytes from 192.168.178.57: icmp_seq=2 ttl=64 time=3.624 ms
64 bytes from 192.168.178.57: icmp_seq=3 ttl=64 time=5.706 ms

--- 192.168.178.57 ping statistics ---
4 packets transmitted, 4 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 3.400/5.234/8.206/1.937 ms

All three nodes were successfully booted from the Ventoy USB drive and are now running the installed Talos OS; after confirming the boot, the USB drive was removed from all nodes. Connectivity tests showed 0% packet loss to every node, with average round-trip latencies of roughly 4.5 ms (control plane), 4.6 ms (worker 1), and 5.2 ms (worker 2). All nodes are reachable and responding correctly.

Generating Machine Configurations

Talos uses machine configuration files to define how each node should be configured. These are YAML files that specify network settings, storage, and Kubernetes parameters.

Understanding Machine Configurations

Each node type requires a different configuration:

  • Control Plane: Runs Kubernetes API server, etcd, scheduler, controller-manager
  • Worker: Runs kubelet and container runtime, executes workloads

Generating Configurations

Basic Configuration Generation:

# Set up variables for easier management
export CONTROL_PLANE_IP=192.168.178.55
WORKER_NODES_IPS=(192.168.178.56 192.168.178.57)

# Generate configurations for a cluster
talosctl gen config discworld-homelab https://$CONTROL_PLANE_IP:6443

Expected Output:

generating PKI and tokens
Created /path/to/controlplane.yaml
Created /path/to/worker.yaml
Created /path/to/talosconfig

Files Generated:

  • controlplane.yaml - Configuration for control plane node (33KB)
  • worker.yaml - Configuration for worker nodes (27KB)
  • talosconfig - Client configuration for talosctl (1.6KB)

I generated the configurations for a cluster named discworld-homelab with the endpoint set to https://192.168.178.55:6443. The configuration files were created in the current working directory using default settings, which worked perfectly for this homelab setup.
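
If the defaults don’t fit your setup, they can be overridden at generation time. A sketch of the most common flags (flag names from recent talosctl releases; confirm with talosctl gen config --help):

# Pin the Kubernetes version, set the install disk explicitly, and generate
# leaner files without inline docs and examples
talosctl gen config discworld-homelab https://${CONTROL_PLANE_IP}:6443 \
  --kubernetes-version 1.34.1 \
  --install-disk /dev/sda \
  --with-docs=false \
  --with-examples=false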

Reviewing Generated Configurations

# View control plane configuration
cat controlplane.yaml

# View worker configuration
cat worker.yaml

# View talosconfig
cat talosconfig

Configuration File Structure:

The generated configuration files follow Talos’s declarative machine configuration format. Here’s an overview of what each file contains:

controlplane.yaml Structure:

The control plane configuration is organized into two main sections:

machine: section contains node-specific settings:

  • type: controlplane - Defines this node’s role
  • token: - Machine token for joining the PKI (sensitive - keep secure)
  • ca: - Root certificate authority (crt and key - sensitive)
  • kubelet: - Kubelet configuration with image version (ghcr.io/siderolabs/kubelet:v1.34.1)
  • install: - Installation settings specifying disk (/dev/sda) and installer image
  • features: - Enabled features like RBAC, KubePrism (port 7445), host DNS caching
  • network: {} - Network configuration (empty means DHCP will be used)
  • nodeLabels: - Kubernetes node labels

cluster: section contains cluster-wide settings:

  • id: - Globally unique cluster identifier (base64 encoded)
  • secret: - Shared cluster secret (sensitive - keep secure)
  • controlPlane.endpoint: - Cluster endpoint (https://192.168.178.55:6443)
  • clusterName: - Your cluster name (discworld-homelab)
  • network: - Pod and service subnets (10.244.0.0/16 and 10.96.0.0/12)
  • token: - Bootstrap token for joining nodes (sensitive)
  • secretboxEncryptionSecret: - Key for encrypting secrets at rest (sensitive)
  • ca: - Kubernetes root CA (crt and key - sensitive)
  • aggregatorCA: - Aggregator CA for front-proxy certificates (sensitive)
  • serviceAccount.key: - Private key for service account tokens (sensitive)
  • apiServer: - API server configuration including PodSecurity admission controller
  • controllerManager: - Controller manager image and settings
  • scheduler: - Scheduler image and settings
  • proxy: - Kube-proxy configuration
  • etcd: - etcd CA certificates (sensitive)
  • discovery: - Cluster member discovery settings

worker.yaml Structure:

The worker configuration is similar but simpler:

machine: section:

  • type: worker - Defines this as a worker node
  • token: - Same machine token as control plane (for PKI)
  • ca: - Same root CA certificate (key is empty for workers)
  • kubelet: - Same kubelet configuration
  • install: - Same installation settings
  • features: - Same feature flags
  • network: {} - Empty (uses DHCP)

cluster: section:

  • Contains the same cluster-wide settings (cluster ID, secrets, endpoint, etc.)
  • Note: Worker nodes don’t have control plane components (apiServer, controllerManager, scheduler, etc.)

talosconfig Structure:

The talosconfig file is much simpler and contains client-side configuration:

  • context: - Current context name (discworld-homelab)

  • contexts: - Context definitions section

    Within the contexts section, you’ll find:

    • endpoints: - List of control plane node IPs (192.168.178.55)
    • ca: - Certificate authority for verifying Talos API connections (sensitive)
    • crt: - Client certificate for authentication (sensitive)
    • key: - Client private key (sensitive - most critical to protect)

Key Security Considerations:

All three files contain sensitive information that must be protected:

  • Certificates and keys - Used for secure communication and authentication
  • Tokens and secrets - Used for cluster and node authentication
  • Cluster identifiers - Unique identifiers for your cluster

Store these files securely, preferably encrypted, and never commit them to public repositories. The talosconfig file is particularly sensitive as it provides full access to manage your cluster.
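
At a minimum, tighten the file permissions right after generation; keeping an encrypted copy (gpg is shown here purely as one example tool) adds another layer if you back these files up.

# Only the owner should be able to read the generated secrets
chmod 600 controlplane.yaml worker.yaml talosconfig

# Optional: keep a symmetrically encrypted backup copy
gpg --symmetric --cipher-algo AES256 talosconfig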

Configuration Review Checklist:

When reviewing the generated configurations, verify:

  • Cluster name matches your intended name
  • Control plane endpoint is correct (IP address or hostname)
  • Network subnets don’t conflict with your existing network
  • Disk selection (/dev/sda) matches your hardware
  • Kubernetes version matches your requirements
  • Feature flags are appropriate for your use case

For this homelab setup, the default configurations were sufficient and no modifications were required. The configurations include secure defaults with proper certificate management, RBAC enabled, and PodSecurity admission controller configured with baseline enforcement.

Setting TALOSCONFIG Environment Variable

To simplify talosctl commands throughout the setup process, set the TALOSCONFIG environment variable to point to your talosconfig file. This allows talosctl to automatically use the correct configuration without needing to specify --talosconfig in every command.

# Set TALOSCONFIG environment variable
# Update the path to match your actual talosconfig file location
export TALOSCONFIG=/path/to/talosconfig

# Verify it's set
echo $TALOSCONFIG

Note: Adjust the path in the export TALOSCONFIG= command to match where your talosconfig file is actually stored.
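
Alternatively, you can merge the context into talosctl’s default client configuration at ~/.talos/config so no environment variable is needed. A sketch using subcommands from recent talosctl releases (confirm with talosctl config --help):

# Merge this cluster's context into ~/.talos/config
talosctl config merge ./talosconfig

# List contexts and make the new one active
talosctl config contexts
talosctl config context discworld-homelab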

Checking Available Disks on Nodes

Before applying configurations, you need to identify the available disks on each node. This information will be used to update the machine configuration files to specify which disks Talos should use for the OS installation and data storage.

# Set up variables (if not already set)
export CONTROL_PLANE_IP=192.168.178.55
WORKER_NODES_IPS=(192.168.178.56 192.168.178.57)

# Check disks on the control plane node
echo "=== Checking disks on Control Plane (${CONTROL_PLANE_IP}) ==="
talosctl -n ${CONTROL_PLANE_IP} get disks --insecure

# Check disks on worker nodes
echo ""
echo "=== Checking disks on Worker Nodes ==="
for worker_ip in "${WORKER_NODES_IPS[@]}"; do
    echo ""
    echo "--- Worker Node: ${worker_ip} ---"
    talosctl -n ${worker_ip} get disks --insecure
done

Expected Output:

=== Checking disks on Control Plane (192.168.178.55) ===
NODE   NAMESPACE   TYPE   ID      VERSION   SIZE     READ ONLY   TRANSPORT   ROTATIONAL   WWID                   MODEL              SERIAL
       runtime     Disk   loop0   2         73 MB    true
       runtime     Disk   sda     2         128 GB   false       sata                     naa.5001b44a1db9fa02   SanDisk SD6SF1M1

=== Checking disks on Worker Nodes ===

--- Worker Node: 192.168.178.56 ---
NODE   NAMESPACE   TYPE   ID      VERSION   SIZE     READ ONLY   TRANSPORT   ROTATIONAL   WWID                   MODEL              SERIAL
       runtime     Disk   loop0   2         73 MB    true
       runtime     Disk   sda     2         256 GB   false       sata                     naa.5001b448b606815b   SanDisk X600 2.5

--- Worker Node: 192.168.178.57 ---
NODE   NAMESPACE   TYPE   ID      VERSION   SIZE     READ ONLY   TRANSPORT   ROTATIONAL   WWID                   MODEL              SERIAL
       runtime     Disk   loop0   2         73 MB    true
       runtime     Disk   sda     2         256 GB   false       sata                     naa.5001b448b6069776   SanDisk X600 2.5

Important Notes:

  • Target Disk: For all three nodes (control plane and both worker nodes), sda is the disk we want to use for Talos OS installation
  • Model Verification: The disk model should be checked to match the hardware version used in each node:
    • Control Plane (192.168.178.55): SanDisk SD6SF1M1 - 128GB (matches the 128GB SSD upgrade mentioned in hardware setup)
    • Worker Node 1 (192.168.178.56): SanDisk X600 2.5 - 256GB (matches the 256GB SSD in Dell Optiplex 3050 Micro)
    • Worker Node 2 (192.168.178.57): SanDisk X600 2.5 - 256GB (matches the 256GB SSD in Dell Optiplex 3050 Micro)
  • USB Drive Exclusion: The loop0 device (73 MB, read-only) is the loopback-mounted Talos boot image from the live ISO environment, not a physical disk, and should not be used for installation
  • Disk Selection: Only use disks that match your node’s hardware specifications and are not the USB boot device

Disk verification confirmed that all nodes use /dev/sda as their primary disk. The control plane node has a SanDisk SD6SF1M1 128GB SSD, while both worker nodes are equipped with SanDisk X600 2.5 256GB SSDs. All disks are SATA SSDs, and Talos will automatically use /dev/sda for the OS installation, so no manual disk configuration is needed in the machine configs.

Updating Machine Configurations with Disk Information

After identifying the available disks, update the machine configuration files (controlplane.yaml and worker.yaml) to specify which disks Talos should use. This ensures Talos installs to the correct disk and can use additional disks for data storage.

Note: For security reasons, the machine configuration files (controlplane.yaml and worker.yaml) should be stored outside of the project directory and kept secure, as they contain sensitive cluster information.

Steps to update disk configuration:

  1. Review the disk output from the previous step to identify:

    • The primary disk for OS installation (typically /dev/sda or /dev/nvme0n1)
    • Any additional disks available for data storage
  2. Edit the controlplane.yaml file to add disk configuration under the machine section

  3. Edit the worker.yaml file to add disk configuration for worker nodes

  4. Save the updated configuration files

Example disk configuration structure:

machine:
  install:
    disk: /dev/sda  # Primary disk for the Talos OS installation
  # machine.disks is only for *additional* data disks and must not list the
  # install disk again; a hypothetical extra disk would look like this:
  # disks:
  #   - device: /dev/sdb
  #     partitions:
  #       - mountpoint: /var/mnt/data

Since Talos automatically detects and uses /dev/sda on all nodes, no disk configuration updates were needed in the machine configuration files. This simplifies the setup process significantly.
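
If your hardware does differ (for example, NVMe disks that show up as /dev/nvme0n1), one option is to regenerate the configs with a patch instead of hand-editing the YAML. This is a sketch assuming the --config-patch-worker flag of recent talosctl releases and a purely illustrative device name:

# Hypothetical patch overriding the install disk for worker nodes only
cat > worker-disk-patch.yaml <<'EOF'
machine:
  install:
    disk: /dev/nvme0n1
EOF

# Regenerate the configs with the patch applied to worker.yaml
# (add --force if the files from the earlier run are still in this directory)
talosctl gen config discworld-homelab https://${CONTROL_PLANE_IP}:6443 \
  --config-patch-worker @worker-disk-patch.yaml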

Applying Configurations

Once configurations are generated, apply them to your nodes.

Applying Control Plane Configuration

Important: Apply the control plane configuration first, as worker nodes depend on it.

# Set up variables (if not already set)
export CONTROL_PLANE_IP=192.168.178.55
WORKER_NODES_IPS=(192.168.178.56 192.168.178.57)

# Apply control plane configuration
talosctl apply-config \
  --insecure \
  --nodes ${CONTROL_PLANE_IP} \
  --file controlplane.yaml

Applying Worker Node Configurations

# Apply to worker nodes using loop
for worker_ip in "${WORKER_NODES_IPS[@]}"; do
    echo "Applying configuration to worker node: ${worker_ip}"
    talosctl apply-config \
      --insecure \
      --nodes ${worker_ip} \
      --file worker.yaml
done

# Configure endpoints in talosconfig for future commands
# This allows us to omit --endpoints from subsequent talosctl commands
talosctl config endpoints ${CONTROL_PLANE_IP}

The configurations were successfully applied to all nodes. The control plane configuration was applied to 192.168.178.55, and the worker node configuration was applied to both 192.168.178.56 and 192.168.178.57. After applying the configurations, I ran talosctl config endpoints ${CONTROL_PLANE_IP} to configure the endpoints in the talosconfig file, which allows subsequent talosctl commands to omit the --endpoints flag.
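
To confirm the endpoint change was written to the talosconfig, recent talosctl releases provide a summary command (verify with talosctl config --help):

# Show the active context, endpoints, and nodes
talosctl config info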

Bootstrapping Kubernetes

After applying configurations, bootstrap the Kubernetes cluster on the control plane node.

Bootstrap Process

The bootstrap process initializes etcd and starts the Kubernetes control plane components.

# Bootstrap the cluster
# TALOSCONFIG environment variable should be set (see "Setting TALOSCONFIG Environment Variable" section)
# Endpoints are configured in talosconfig (from previous step), so --endpoints is not needed
talosctl bootstrap \
  --nodes ${CONTROL_PLANE_IP}
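
Bootstrapping takes a few minutes while etcd and the control plane components come up. Rather than polling manually, you can watch progress with the commands below; the health flags follow the talosctl documentation, so adjust the node lists to your IPs.

# Run client-side health checks until the cluster converges
talosctl health \
  --nodes ${CONTROL_PLANE_IP} \
  --control-plane-nodes ${CONTROL_PLANE_IP} \
  --worker-nodes 192.168.178.56,192.168.178.57

# Or follow a node's live dashboard (Ctrl+C to exit)
talosctl dashboard --nodes ${CONTROL_PLANE_IP}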

Upgrading Talos to Latest Version

After bootstrapping, ensure all nodes are running the latest Talos version. This step upgrades Talos to the specified version (v1.11.6 in this example).

# Upgrade control plane node
echo "Upgrading control plane node: ${CONTROL_PLANE_IP}"
talosctl upgrade \
  --nodes ${CONTROL_PLANE_IP} \
  --image "ghcr.io/siderolabs/installer:v1.11.6"

# Upgrade worker nodes
for worker_ip in "${WORKER_NODES_IPS[@]}"; do
    echo "Upgrading worker node: ${worker_ip}"
    talosctl upgrade \
      --nodes ${worker_ip} \
      --image "ghcr.io/siderolabs/installer:v1.11.6"
done

The bootstrap process completed successfully, initializing etcd and starting the Kubernetes control plane components. All components are running and healthy. Following the bootstrap, I upgraded all nodes from the ISO version to Talos v1.11.6. The upgrade process completed without any errors, and all nodes are now running the latest Talos version.

Upgrading Kubernetes

After upgrading Talos, you may also want to upgrade Kubernetes to a newer version. The Talos ISO comes with a specific Kubernetes version (e.g., v1.34.1), and you can upgrade to a newer patch or minor version.

Upgrade Kubernetes:

# Upgrade Kubernetes on all nodes
# Example: Upgrade from v1.34.1 (ISO version) to v1.34.3
talosctl upgrade-k8s \
  --nodes ${CONTROL_PLANE_IP} \
  --to 1.34.3

Expected Output:

automatically detected the lowest Kubernetes version 1.34.1
discovered controlplane nodes ["192.168.178.55"]
discovered worker nodes ["192.168.178.56" "192.168.178.57"]
> "192.168.178.56": Talos version 1.11.6 is compatible with Kubernetes version 1.34.3
> "192.168.178.55": Talos version 1.11.6 is compatible with Kubernetes version 1.34.3
> "192.168.178.57": Talos version 1.11.6 is compatible with Kubernetes version 1.34.3
 > "192.168.178.55": pre-pulling registry.k8s.io/kube-apiserver:v1.34.3
 > "192.168.178.55": pre-pulling registry.k8s.io/kube-controller-manager:v1.34.3
 > "192.168.178.55": pre-pulling registry.k8s.io/kube-scheduler:v1.34.3
 > "192.168.178.55": pre-pulling ghcr.io/siderolabs/kubelet:v1.34.3
 > "192.168.178.56": pre-pulling ghcr.io/siderolabs/kubelet:v1.34.3
 > "192.168.178.57": pre-pulling ghcr.io/siderolabs/kubelet:v1.34.3
updating "kube-apiserver" to version "1.34.3"
 > "192.168.178.55": starting update
 > update kube-apiserver: v1.34.1 -> 1.34.3
 > "192.168.178.55": machine configuration patched
 > "192.168.178.55": waiting for kube-apiserver pod update
 > "192.168.178.55": kube-apiserver: waiting, config version mismatch: got "1", expected "2"
[... multiple waiting messages ...]
 < "192.168.178.55": successfully updated
updating "kube-controller-manager" to version "1.34.3"
 > "192.168.178.55": starting update
 > update kube-controller-manager: v1.34.1 -> 1.34.3
 > "192.168.178.55": machine configuration patched
 > "192.168.178.55": waiting for kube-controller-manager pod update
 > "192.168.178.55": kube-controller-manager: waiting, config version mismatch: got "1", expected "2"
[... multiple waiting messages ...]
 > "192.168.178.55": kube-controller-manager: pod is not ready, waiting
 < "192.168.178.55": successfully updated
updating "kube-scheduler" to version "1.34.3"
 > "192.168.178.55": starting update
 > update kube-scheduler: v1.34.1 -> 1.34.3
 > "192.168.178.55": machine configuration patched
 > "192.168.178.55": waiting for kube-scheduler pod update
 > "192.168.178.55": kube-scheduler: waiting, config version mismatch: got "1", expected "2"
[... multiple waiting messages ...]
 < "192.168.178.55": successfully updated
updating kube-proxy to version "1.34.3"
 > "192.168.178.55": starting update
updating kubelet to version "1.34.3"
 > "192.168.178.55": starting update
 > update kubelet: 1.34.1 -> 1.34.3
 > "192.168.178.55": machine configuration patched
 > "192.168.178.55": waiting for kubelet restart
 > "192.168.178.55": waiting for node update
 < "192.168.178.55": successfully updated
 > "192.168.178.56": starting update
 > update kubelet: 1.34.1 -> 1.34.3
 > "192.168.178.56": machine configuration patched
 > "192.168.178.56": waiting for kubelet restart
 > "192.168.178.56": waiting for node update
 < "192.168.178.56": successfully updated
 > "192.168.178.57": starting update
 > update kubelet: 1.34.1 -> 1.34.3
 > "192.168.178.57": machine configuration patched
 > "192.168.178.57": waiting for kubelet restart
 > "192.168.178.57": waiting for node update
 < "192.168.178.57": successfully updated
updating manifests
 > processing manifest v1.Secret/kube-system/bootstrap-token-azux2t
 < no changes
[... processing various manifests ...]
 > processing manifest apps/v1.DaemonSet/kube-system/kube-proxy
--- a/apps/v1.DaemonSet/kube-system/kube-proxy
+++ b/apps/v1.DaemonSet/kube-system/kube-proxy
@@ -2,7 +2,7 @@
 kind: DaemonSet
 metadata:
   annotations:
-    deprecated.daemonset.template.generation: "1"
+    deprecated.daemonset.template.generation: "2"
   labels:
     k8s-app: kube-proxy
     tier: node
@@ -39,7 +39,7 @@
             fieldRef:
               apiVersion: v1
               fieldPath: status.podIP
-        image: registry.k8s.io/kube-proxy:v1.34.1
+        image: registry.k8s.io/kube-proxy:v1.34.3
         imagePullPolicy: IfNotPresent
         name: kube-proxy
         resources: {}

 < applied successfully
[... processing more manifests ...]
 > processing manifest apps/v1.Deployment/kube-system/coredns
--- a/apps/v1.Deployment/kube-system/coredns
+++ b/apps/v1.Deployment/kube-system/coredns
@@ -76,7 +76,6 @@
           timeoutSeconds: 1
         resources:
           limits:
-            cpu: 200m
             memory: 170Mi
           requests:
             cpu: 100m

 < applied successfully
[... processing remaining manifests ...]
waiting for all manifests to be applied
 > waiting for apps/v1.Deployment/kube-system/coredns
 > waiting for apps/v1.DaemonSet/kube-system/kube-proxy
[... upgrade completes successfully ...]

Note: After retrieving kubeconfig (see next section), you can verify the Kubernetes version using:

kubectl version        # reports both the client and the server (cluster) version
kubectl get nodes -o wide

Note: Before upgrading to a new minor version (e.g., from 1.34.x to 1.35.0), check:

  • Upgrade Path Support: Talos may not support direct upgrades across minor versions. Try the upgrade command first to verify if the path is supported:
    talosctl upgrade-k8s \
      --nodes ${CONTROL_PLANE_IP} \
      --to 1.35.0
    
    If you receive an error like “unsupported upgrade path 1.34->1.35”, you may need to:
    • Check the Talos Kubernetes Support Matrix for supported upgrade paths
    • Upgrade through intermediate versions if required
    • Wait for Talos to support the direct upgrade path in a future release
    • Note: Kubernetes 1.35.0 will be supported in Talos 1.12.0. Upgrade Talos first, then upgrade Kubernetes.
  • Compatibility with your workloads
  • Release notes for breaking changes
  • Whether all required features are available in the target version

Example Error for Unsupported Upgrade Path:

automatically detected the lowest Kubernetes version 1.34.3
unsupported upgrade path 1.34->1.35 (from "1.34.3" to "1.35.0")

The Talos ISO came with Kubernetes v1.34.1, which I upgraded to v1.34.3 using talosctl upgrade-k8s. The upgrade process completed successfully without any errors. I attempted to upgrade directly to Kubernetes 1.35.0, but Talos 1.11.6 doesn’t support this upgrade path, reporting “unsupported upgrade path 1.34->1.35”. Kubernetes 1.35.0 will be supported in Talos 1.12.0, so to reach that version, you’ll need to upgrade Talos first, then upgrade Kubernetes.
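
Recent talosctl releases also support a dry-run mode for upgrade-k8s, which prints the planned component changes without applying them; it’s worth running before any upgrade (confirm the flag with talosctl upgrade-k8s --help).

# Preview the upgrade plan without changing the cluster
talosctl upgrade-k8s --nodes ${CONTROL_PLANE_IP} --to 1.34.3 --dry-run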

Retrieving kubeconfig

After bootstrapping, retrieve the kubeconfig file to interact with your Kubernetes cluster. You can save it with a specific name to avoid merging it into your default Kubernetes configuration.

# Retrieve kubeconfig and save it with a specific name to ~/.kube/
# This keeps it separate from your default kubeconfig
talosctl kubeconfig ~/.kube/discworld-homelab \
  --nodes ${CONTROL_PLANE_IP}

# Verify the file was created
ls -lh ~/.kube/discworld-homelab

Alternative: Save to current directory

# If you prefer to save it in the current directory first
talosctl kubeconfig discworld-homelab \
  --nodes ${CONTROL_PLANE_IP}

# Then move it to ~/.kube/ if desired
mv discworld-homelab ~/.kube/

The kubeconfig was saved to ~/.kube/discworld-homelab with secure file permissions (600 - owner read/write only), which is the default and recommended setting for kubeconfig files.

Setting Up kubectl

After retrieving the kubeconfig, set it up for use with kubectl. Since we saved it with a specific name (discworld-homelab), you can use it in several ways:

# Option 1: Set KUBECONFIG environment variable (recommended for temporary use)
export KUBECONFIG=~/.kube/discworld-homelab

# Option 2: Use kubectl with the --kubeconfig flag (use a space rather than "=" so the shell expands ~)
kubectl --kubeconfig ~/.kube/discworld-homelab cluster-info

# Option 3: Merge into default config (if you want it as your default)
# Note: This merges the context into ~/.kube/config
KUBECONFIG=~/.kube/config:~/.kube/discworld-homelab kubectl config view --flatten > ~/.kube/config.tmp
mv ~/.kube/config.tmp ~/.kube/config

# Verify kubectl access
kubectl cluster-info

# Check node status
kubectl get nodes

Expected Output:

# Output from kubectl cluster-info
Kubernetes control plane is running at https://192.168.178.55:6443
CoreDNS is running at https://192.168.178.55:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

# Output from kubectl get nodes
NAME            STATUS   ROLES           AGE   VERSION
talos-0t7-m1u   Ready    <none>          59m   v1.34.3
talos-0x7-u21   Ready    control-plane   59m   v1.34.3
talos-eim-ifj   Ready    <none>          59m   v1.34.3

kubectl was successfully configured using the kubeconfig at ~/.kube/discworld-homelab. Access verification confirmed a successful connection to the cluster at https://192.168.178.55:6443, with all three nodes visible and in Ready status.
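
If you went with Option 3 and merged the kubeconfig into ~/.kube/config, you can switch clusters by context. talosctl typically names the context admin@<cluster-name>, but verify the exact name first:

# List contexts and switch to the homelab cluster
kubectl config get-contexts
kubectl config use-context admin@discworld-homelab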

Verifying Cluster Health

Once bootstrapped, verify that your cluster is healthy and all nodes are ready.

Checking Node Status

# Get all nodes
kubectl get nodes

# Get detailed node information
kubectl get nodes -o wide

# Describe nodes
kubectl describe nodes

Expected Output:

# Output from kubectl get nodes
NAME            STATUS   ROLES           AGE   VERSION
talos-0t7-m1u   Ready    <none>          61m   v1.34.3
talos-0x7-u21   Ready    control-plane   61m   v1.34.3
talos-eim-ifj   Ready    <none>          61m   v1.34.3

# Output from kubectl get nodes -o wide
NAME            STATUS   ROLES           AGE   VERSION   INTERNAL-IP      EXTERNAL-IP   OS-IMAGE          KERNEL-VERSION   CONTAINER-RUNTIME
talos-0t7-m1u   Ready    <none>          61m   v1.34.3   192.168.178.56   <none>        Talos (v1.11.6)   6.12.62-talos    containerd://2.1.5
talos-0x7-u21   Ready    control-plane   61m   v1.34.3   192.168.178.55   <none>        Talos (v1.11.6)   6.12.62-talos    containerd://2.1.5
talos-eim-ifj   Ready    <none>          61m   v1.34.3   192.168.178.57   <none>        Talos (v1.11.6)   6.12.62-talos    containerd://2.1.5

# Output from kubectl describe nodes (abbreviated - showing key information for each node)
Name:               talos-0t7-m1u
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=talos-0t7-m1u
                    kubernetes.io/os=linux
Addresses:
  InternalIP:  192.168.178.56
  Hostname:    talos-0t7-m1u
Capacity:
  cpu:                4
  ephemeral-storage:  247681884Ki
  memory:             7926660Ki
  pods:               110
Allocatable:
  cpu:                3950m
  ephemeral-storage:  227995188461
  memory:             7431044Ki
  pods:               110
System Info:
  Machine ID:                 6d37d180d5bc0c1a26ce71c3e6fa8e4f
  Kernel Version:             6.12.62-talos
  OS Image:                   Talos (v1.11.6)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  containerd://2.1.5
  Kubelet Version:            v1.34.3
PodCIDR:                      10.244.0.0/24
Conditions:
  Type                 Status  Reason                       Message
  ----                 ------  ------                       -------
  NetworkUnavailable   False   FlannelIsUp                  Flannel is running on this node
  MemoryPressure       False   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True    KubeletReady                 kubelet is posting ready status

Name:               talos-0x7-u21
Roles:              control-plane
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=talos-0x7-u21
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/control-plane=
                    node.kubernetes.io/exclude-from-external-load-balancers=
Addresses:
  InternalIP:  192.168.178.55
  Hostname:    talos-0x7-u21
Capacity:
  cpu:                2
  ephemeral-storage:  119837Mi
  memory:             3657232Ki
  pods:               110
Allocatable:
  cpu:                1950m
  ephemeral-storage:  112823946258
  memory:             3030544Ki
  pods:               110
System Info:
  Machine ID:                 b381777f74ed850c0f374e2d2cfea76d
  Kernel Version:             6.12.62-talos
  OS Image:                   Talos (v1.11.6)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  containerd://2.1.5
  Kubelet Version:            v1.34.3
PodCIDR:                      10.244.2.0/24
Taints:             node-role.kubernetes.io/control-plane:NoSchedule
Conditions:
  Type                 Status  Reason                       Message
  ----                 ------  ------                       -------
  NetworkUnavailable   False   FlannelIsUp                  Flannel is running on this node
  MemoryPressure       False   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True    KubeletReady                 kubelet is posting ready status

Name:               talos-eim-ifj
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=talos-eim-ifj
                    kubernetes.io/os=linux
Addresses:
  InternalIP:  192.168.178.57
  Hostname:    talos-eim-ifj
Capacity:
  cpu:                4
  ephemeral-storage:  247681884Ki
  memory:             7926664Ki
  pods:               110
Allocatable:
  cpu:                3950m
  ephemeral-storage:  227995188461
  memory:             7431048Ki
  pods:               110
System Info:
  Machine ID:                 c682524e3f37a649cafc29e69c6dd5b5
  Kernel Version:             6.12.62-talos
  OS Image:                   Talos (v1.11.6)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  containerd://2.1.5
  Kubelet Version:            v1.34.3
PodCIDR:                      10.244.1.0/24
Conditions:
  Type                 Status  Reason                       Message
  ----                 ------  ------                       -------
  NetworkUnavailable   False   FlannelIsUp                  Flannel is running on this node
  MemoryPressure       False   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True    KubeletReady                 kubelet is posting ready status

All three nodes (one control plane and two workers) are showing Ready status. Every node is running Kubernetes v1.34.3 and Talos v1.11.6, and there are no NotReady nodes—the entire cluster is healthy and operational.

Checking System Pods

# Check control plane pods
kubectl get pods -n kube-system

# Check pod details
kubectl get pods -n kube-system -o wide

# Check pod logs if needed
kubectl logs -n kube-system [POD_NAME]

Expected Output:

# Output from kubectl get pods -n kube-system
NAME                                    READY   STATUS    RESTARTS      AGE
coredns-5fdf8c5556-d5s68                1/1     Running   0             22m
coredns-5fdf8c5556-szdcf                1/1     Running   0             22m
kube-apiserver-talos-0x7-u21            1/1     Running   0             23m
kube-controller-manager-talos-0x7-u21   1/1     Running   3 (23m ago)   25m
kube-flannel-8znhd                      1/1     Running   0             49m
kube-flannel-m8fwh                      1/1     Running   1 (57m ago)   64m
kube-flannel-p5dlw                      1/1     Running   0             48m
kube-proxy-j7kx2                        1/1     Running   0             22m
kube-proxy-qvk8z                        1/1     Running   0             22m
kube-proxy-t7k6z                        1/1     Running   0             22m
kube-scheduler-talos-0x7-u21            1/1     Running   3 (23m ago)   23m

# Output from kubectl get pods -n kube-system -o wide
NAME                                    READY   STATUS    RESTARTS      AGE   IP               NODE            NOMINATED NODE   READINESS GATES
coredns-5fdf8c5556-d5s68                1/1     Running   0             22m   10.244.1.4       talos-eim-ifj   <none>           <none>
coredns-5fdf8c5556-szdcf                1/1     Running   0             22m   10.244.0.6       talos-0t7-m1u   <none>           <none>
kube-apiserver-talos-0x7-u21            1/1     Running   0             23m   192.168.178.55   talos-0x7-u21   <none>           <none>
kube-controller-manager-talos-0x7-u21   1/1     Running   3 (23m ago)   25m   192.168.178.55   talos-0x7-u21   <none>           <none>
kube-flannel-8znhd                      1/1     Running   0             49m   192.168.178.56   talos-0t7-m1u   <none>           <none>
kube-flannel-m8fwh                      1/1     Running   1 (57m ago)   64m   192.168.178.55   talos-0x7-u21   <none>           <none>
kube-flannel-p5dlw                      1/1     Running   0             48m   192.168.178.57   talos-eim-ifj   <none>           <none>
kube-proxy-j7kx2                        1/1     Running   0             22m   192.168.178.56   talos-0t7-m1u   <none>           <none>
kube-proxy-qvk8z                        1/1     Running   0             22m   192.168.178.55   talos-0x7-u21   <none>           <none>
kube-proxy-t7k6z                        1/1     Running   0             22m   192.168.178.57   talos-eim-ifj   <none>           <none>
kube-scheduler-talos-0x7-u21            1/1     Running   3 (23m ago)   23m   192.168.178.55   talos-0x7-u21   <none>           <none>

All 11 system pods are running with 1/1 Ready status, and there are no failing pods. Some expected restarts occurred during cluster initialization: kube-controller-manager and kube-scheduler each had 3 restarts about 23 minutes ago, and one Flannel pod had a single restart 57 minutes ago. All other pods have zero restarts, indicating a stable and healthy cluster.

Testing Cluster Functionality

# Create a test deployment
kubectl create deployment nginx-test --image=nginx

# Wait for the deployment to be ready (this will wait until pods are running)
kubectl rollout status deployment/nginx-test

# Check deployment status
kubectl get deployments

# Check pods (should all be Running now)
kubectl get pods

# Get pod name automatically using label selector
POD_NAME=$(kubectl get pods -l app=nginx-test -o jsonpath='{.items[0].metadata.name}')

# Get pod details using the automatically retrieved pod name
kubectl describe pod ${POD_NAME}
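
# Optional: expose the deployment and check HTTP reachability via a NodePort
# (the node IP below matches the worker that ran the pod in my setup; adjust
# it to your environment, and note the port is assigned dynamically)
kubectl expose deployment nginx-test --port=80 --type=NodePort
NODE_PORT=$(kubectl get svc nginx-test -o jsonpath='{.spec.ports[0].nodePort}')
curl -I http://192.168.178.56:${NODE_PORT}

# Remove the test service before deleting the deployment
kubectl delete service nginx-test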

# Clean up test
kubectl delete deployment nginx-test

Expected Output:

# Output from kubectl create deployment nginx-test --image=nginx
Warning: would violate PodSecurity "restricted:latest": allowPrivilegeEscalation != false (container "nginx" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "nginx" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "nginx" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "nginx" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
deployment.apps/nginx-test created

# Output from kubectl rollout status deployment/nginx-test
Waiting for deployment spec update to be observed...
Waiting for deployment "nginx-test" rollout to finish: 0 out of 1 new replicas have been updated...
Waiting for deployment "nginx-test" rollout to finish: 0 of 1 updated replicas are available...
deployment "nginx-test" successfully rolled out

# Output from kubectl get deployments
NAME         READY   UP-TO-DATE   AVAILABLE   AGE
nginx-test   1/1     1            1           2s

# Output from kubectl get pods (all pods are Running after rollout completes)
NAME                          READY   STATUS    RESTARTS   AGE
nginx-test-586bbf5c4c-9t5k7   1/1     Running   0          2s

# Output from kubectl describe pod ${POD_NAME} (where POD_NAME=nginx-test-586bbf5c4c-9t5k7)
Name:             nginx-test-586bbf5c4c-9t5k7
Namespace:        default
Priority:         0
Service Account:  default
Node:             talos-0t7-m1u/192.168.178.56
Start Time:       Mon, 22 Dec 2025 07:50:05 +0100
Labels:           app=nginx-test
                  pod-template-hash=586bbf5c4c
Annotations:      <none>
Status:           Running
IP:               10.244.0.10
IPs:
  IP:           10.244.0.10
Controlled By:  ReplicaSet/nginx-test-586bbf5c4c
Containers:
  nginx:
    Container ID:   containerd://44d2566a9e8b8882d6ba03e49fde5ad4ef117a0dffcb61dbd5d8d9a064120c1d
    Image:          nginx
    Image ID:        docker.io/library/nginx@sha256:fb01117203ff38c2f9af91db1a7409459182a37c87cced5cb442d1d8fcc66d19
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Mon, 22 Dec 2025 07:50:06 +0100
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-sgnsn (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True
  Initialized                 True
  Ready                       True
  ContainersReady             True
  PodScheduled                True
Volumes:
  kube-api-access-sgnsn:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    Optional:                false
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  2s    default-scheduler  Successfully assigned default/nginx-test-586bbf5c4c-9t5k7 to talos-0t7-m1u
  Normal  Pulling    2s    kubelet            spec.containers{nginx}: Pulling image "nginx"
  Normal  Pulled     1s    kubelet            spec.containers{nginx}: Successfully pulled image "nginx" in 802ms (802ms including waiting). Image size: 59795293 bytes.
  Normal  Created    1s    kubelet            spec.containers{nginx}: Created container: nginx
  Normal  Started    1s    kubelet            spec.containers{nginx}: Started container nginx

# Output from kubectl delete deployment nginx-test
deployment.apps "nginx-test" deleted from default namespace

The test deployment was successful. The nginx deployment was created and the pod was scheduled to worker node talos-0t7-m1u (192.168.178.56). The kubectl rollout status command showed the deployment progressing through its stages, and the pod transitioned from ContainerCreating to Running status very quickly—the entire rollout completed in just 2 seconds. The container image was pulled successfully in 802ms, and the container started without any errors. The PodSecurity warning that appeared is expected for the default nginx image, as the restricted policy requires security context configuration—this is normal and doesn’t indicate a problem. Pod scheduling is working correctly, with no scheduling issues encountered. The kubectl rollout status command provides a much cleaner way to wait for deployments than manually polling with kubectl get pods.

Checking Talos Node Information

# Get node information via talosctl
# Endpoints are configured in talosconfig, so --endpoints is not needed
talosctl --nodes ${CONTROL_PLANE_IP} get members

# Get version information
talosctl --nodes ${CONTROL_PLANE_IP} version

Expected Output:

# Output from talosctl get members
NODE             NAMESPACE   TYPE     ID              VERSION   HOSTNAME        MACHINE TYPE   OS                ADDRESSES
192.168.178.55   cluster     Member   talos-0t7-m1u   3         talos-0t7-m1u   worker         Talos (v1.11.6)   ["192.168.178.56","[IPv6_ADDRESS]","[IPv6_ADDRESS]"]
192.168.178.55   cluster     Member   talos-0x7-u21   3         talos-0x7-u21   controlplane   Talos (v1.11.6)   ["192.168.178.55","[IPv6_ADDRESS]","[IPv6_ADDRESS]"]
192.168.178.55   cluster     Member   talos-eim-ifj   3         talos-eim-ifj   worker         Talos (v1.11.6)   ["192.168.178.57","[IPv6_ADDRESS]","[IPv6_ADDRESS]"]

# Output from talosctl version
Client:
        Tag:         v1.11.5
        SHA:         bc34de6e
        Built:
        Go version:  go1.24.9
        OS/Arch:     darwin/arm64
Server:
        NODE:        192.168.178.55
        Tag:         v1.11.6
        SHA:         6dd14300
        Built:
        Go version:  go1.24.11
        OS/Arch:     linux/amd64
        Enabled:     RBAC

The cluster is running Talos v1.11.6 on all nodes (the client is v1.11.5, but all server nodes are on v1.11.6). Kubernetes version is v1.34.3 across the cluster. All three cluster members are visible and properly configured: the control plane node talos-0x7-u21 and two worker nodes talos-0t7-m1u and talos-eim-ifj.
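
Beyond members and version, a few other read-only talosctl views are useful for day-to-day checks:

# List Talos system services (apid, etcd, kubelet, ...) and their state
talosctl --nodes ${CONTROL_PLANE_IP} services

# Stream kernel and service log output from a node
talosctl --nodes ${CONTROL_PLANE_IP} dmesg

# Inspect etcd membership on the control plane
talosctl --nodes ${CONTROL_PLANE_IP} etcd members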

Network Configuration

Proper network configuration is crucial for cluster communication.

Network Requirements

  • Control Plane: Must be accessible on port 6443 (Kubernetes API)
  • Nodes: Must be able to communicate with each other
  • Pod Network: CNI will be installed (covered in Part 6)

Verifying Network Connectivity

# Test API server accessibility
curl -k https://${CONTROL_PLANE_IP}:6443/version

# Test from worker nodes (if SSH access available during setup)
# Note: Talos doesn't have SSH, but you can test from another machine

Expected Output:

{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "Unauthorized",
  "reason": "Unauthorized",
  "code": 401
}

The API server is accessible and responding correctly. The 401 Unauthorized response from the curl test is expected and actually confirms that the API server is running: it is simply requiring authentication, which is why we use kubectl with a kubeconfig instead of raw curl commands. The response came back immediately, so network connectivity is working and port 6443 is reachable through any firewalls in between.
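
For an authenticated check, kubectl can hit the same endpoint using the kubeconfig we set up earlier; and if you only want to confirm that the relevant ports are open, a plain TCP check works too (this assumes nc/netcat is available on your workstation):

# Authenticated version check through kubectl (uses your kubeconfig)
kubectl get --raw /version

# Plain TCP reachability checks
nc -vz ${CONTROL_PLANE_IP} 6443    # Kubernetes API
nc -vz ${CONTROL_PLANE_IP} 50000   # Talos API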

Network Troubleshooting

# Set up variables (if not already set)
export CONTROL_PLANE_IP=192.168.178.55
WORKER_NODES_IPS=(192.168.178.56 192.168.178.57)

# Check network configuration for all nodes
# Combine control plane and worker nodes into a single array
ALL_NODES=(${CONTROL_PLANE_IP} "${WORKER_NODES_IPS[@]}")

for node_ip in "${ALL_NODES[@]}"; do
    echo "=== Checking network configuration for node: ${node_ip} ==="

    # Check network links
    echo "Network links:"
    talosctl --nodes ${node_ip} get links

    # Check network addresses
    echo "Network addresses:"
    talosctl --nodes ${node_ip} get addresses

    # Check routes
    echo "Network routes:"
    talosctl --nodes ${node_ip} get routes

    echo ""
done

Expected Output:

# Output for Control Plane Node (192.168.178.XX)
=== Checking network configuration for node: 192.168.178.XX ===
Network links:
NODE             NAMESPACE   TYPE         ID          VERSION   TYPE       KIND     HW ADDR                                           OPER STATE   LINK STATE
192.168.178.XX   network     LinkStatus   bond0       1         ether      bond     XX:XX:XX:XX:XX:XX                                 down         false
192.168.178.XX   network     LinkStatus   cni0        4         ether      bridge   XX:XX:XX:XX:XX:XX                                 down         false
192.168.178.XX   network     LinkStatus   dummy0      1         ether      dummy    XX:XX:XX:XX:XX:XX                                 down         false
192.168.178.XX   network     LinkStatus   enp1s0      3         ether               XX:XX:XX:XX:XX:XX                                 up           true
192.168.178.XX   network     LinkStatus   flannel.1   2         ether      vxlan    XX:XX:XX:XX:XX:XX                                 unknown      true
192.168.178.XX   network     LinkStatus   ip6tnl0     1         tunnel6    ip6tnl   00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00   down         false
192.168.178.XX   network     LinkStatus   lo          2         loopback            00:00:00:00:00:00                                 unknown      true
192.168.178.XX   network     LinkStatus   sit0        1         sit        sit      00:00:00:00                                       down         false
192.168.178.XX   network     LinkStatus   teql0       1         void                                                                  down         false
192.168.178.XX   network     LinkStatus   tunl0       1         ipip       ipip     00:00:00:00                                       down         false
Network addresses:
NODE             NAMESPACE   TYPE            ID                                                 VERSION   ADDRESS                                     LINK
192.168.178.XX   network     AddressStatus   cni0/10.244.2.1/24                                 1         10.244.2.1/24                               cni0
192.168.178.XX   network     AddressStatus   cni0/[IPv6_ADDRESS]/64                             2         [IPv6_ADDRESS]/64                          cni0
192.168.178.XX   network     AddressStatus   enp1s0/192.168.178.XX/24                           1         192.168.178.XX/24                           enp1s0
192.168.178.XX   network     AddressStatus   enp1s0/[IPv6_ADDRESS]/64                          1         [IPv6_ADDRESS]/64                          enp1s0
192.168.178.XX   network     AddressStatus   enp1s0/[IPv6_ADDRESS]/64                          2         [IPv6_ADDRESS]/64                          enp1s0
192.168.178.XX   network     AddressStatus   enp1s0/[IPv6_ADDRESS]/64                           2         [IPv6_ADDRESS]/64                          enp1s0
192.168.178.XX   network     AddressStatus   flannel.1/10.244.2.0/32                            1         10.244.2.0/32                               flannel.1
192.168.178.XX   network     AddressStatus   flannel.1/[IPv6_ADDRESS]/64                        2         [IPv6_ADDRESS]/64                          flannel.1
192.168.178.XX   network     AddressStatus   lo/127.0.0.1/8                                     1         127.0.0.1/8                                 lo
192.168.178.XX   network     AddressStatus   lo/169.254.116.108/32                              1         169.254.116.108/32                          lo
192.168.178.XX   network     AddressStatus   lo/::1/128                                         1         ::1/128                                     lo
Network routes:
NODE             NAMESPACE   TYPE          ID                                                                 VERSION   DESTINATION                                  GATEWAY                     LINK        METRIC
192.168.178.XX   network     RouteStatus   cni0/inet6//fe80::/64/256                                          2         fe80::/64                                                                cni0        256
192.168.178.XX   network     RouteStatus   enp1s0/inet6//[IPv6_PREFIX]::/64/256                                 1         [IPv6_PREFIX]::/64                                                 enp1s0      256
192.168.178.XX   network     RouteStatus   enp1s0/inet6//[IPv6_PREFIX]::/64/256                                1         [IPv6_PREFIX]::/64                                                      enp1s0      256
192.168.178.XX   network     RouteStatus   enp1s0/inet6//fe80::/64/256                                         1         fe80::/64                                                                enp1s0      256
192.168.178.XX   network     RouteStatus   enp1s0/inet6/fe80::[MAC_SUFFIX]//1024                               1                                                      fe80::[MAC_SUFFIX]   enp1s0      1024
192.168.178.XX   network     RouteStatus   flannel.1/inet6//fe80::/64/256                                      1         fe80::/64                                                                flannel.1   256
192.168.178.XX   network     RouteStatus   inet4//10.244.2.0/24/0                                              2         10.244.2.0/24                                                            cni0        0
192.168.178.XX   network     RouteStatus   inet4//192.168.178.0/24/1024                                        1         192.168.178.0/24                                                         enp1s0      1024
192.168.178.XX   network     RouteStatus   inet4/10.244.0.0/10.244.0.0/24/0                                    1         10.244.0.0/24                                10.244.0.0                  flannel.1   0
192.168.178.XX   network     RouteStatus   inet4/10.244.1.0/10.244.1.0/24/0                                    1         10.244.1.0/24                                10.244.1.0                  flannel.1   0
192.168.178.XX   network     RouteStatus   inet4/192.168.178.1//1024                                           1                                                      192.168.178.1               enp1s0      1024
[... additional route entries with masked IPv6 addresses ...]

# Output for Worker Node 1 (192.168.178.YY)
=== Checking network configuration for node: 192.168.178.YY ===
Network links:
NODE             NAMESPACE   TYPE         ID             VERSION   TYPE       KIND     HW ADDR                                           OPER STATE   LINK STATE
192.168.178.YY   network     LinkStatus   bond0          1         ether      bond     XX:XX:XX:XX:XX:XX                                 down         false
192.168.178.YY   network     LinkStatus   cni0           3         ether      bridge   XX:XX:XX:XX:XX:XX                                 up           true
192.168.178.YY   network     LinkStatus   dummy0         1         ether      dummy    XX:XX:XX:XX:XX:XX                                 down         false
192.168.178.YY   network     LinkStatus   enp1s0         3         ether               XX:XX:XX:XX:XX:XX                                 up           true
192.168.178.YY   network     LinkStatus   flannel.1      2         ether      vxlan    XX:XX:XX:XX:XX:XX                                 unknown      true
192.168.178.YY   network     LinkStatus   ip6tnl0        1         tunnel6    ip6tnl   00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00   down         false
192.168.178.YY   network     LinkStatus   lo             1         loopback            00:00:00:00:00:00                                 unknown      true
192.168.178.YY   network     LinkStatus   sit0           1         sit        sit      00:00:00:00                                       down         false
192.168.178.YY   network     LinkStatus   teql0          1         void                                                                  down         false
192.168.178.YY   network     LinkStatus   tunl0          1         ipip       ipip     00:00:00:00                                       down         false
192.168.178.YY   network     LinkStatus   veth[ID]       3         ether      veth     XX:XX:XX:XX:XX:XX                                 up           true
Network addresses:
NODE             NAMESPACE   TYPE            ID                                                  VERSION   ADDRESS                                      LINK
192.168.178.YY   network     AddressStatus   cni0/10.244.0.1/24                                  1         10.244.0.1/24                                cni0
192.168.178.YY   network     AddressStatus   cni0/[IPv6_ADDRESS]/64                              2         [IPv6_ADDRESS]/64                           cni0
192.168.178.YY   network     AddressStatus   enp1s0/192.168.178.YY/24                            1         192.168.178.YY/24                            enp1s0
192.168.178.YY   network     AddressStatus   enp1s0/[IPv6_ADDRESS]/64                            2         [IPv6_ADDRESS]/64                            enp1s0
192.168.178.YY   network     AddressStatus   enp1s0/[IPv6_ADDRESS]/64                            1         [IPv6_ADDRESS]/64                            enp1s0
192.168.178.YY   network     AddressStatus   enp1s0/[IPv6_ADDRESS]/64                            2         [IPv6_ADDRESS]/64                            enp1s0
192.168.178.YY   network     AddressStatus   flannel.1/10.244.0.0/32                             1         10.244.0.0/32                                flannel.1
192.168.178.YY   network     AddressStatus   flannel.1/[IPv6_ADDRESS]/64                         2         [IPv6_ADDRESS]/64                            flannel.1
192.168.178.YY   network     AddressStatus   lo/127.0.0.1/8                                      1         127.0.0.1/8                                  lo
192.168.178.YY   network     AddressStatus   lo/169.254.116.108/32                               1         169.254.116.108/32                           lo
192.168.178.YY   network     AddressStatus   lo/::1/128                                          1         ::1/128                                      lo
192.168.178.YY   network     AddressStatus   veth[ID]/[IPv6_ADDRESS]/64                          2         [IPv6_ADDRESS]/64                            veth[ID]
Network routes:
[... routes similar to control plane, with appropriate PodCIDR (10.244.0.0/24) and masked addresses ...]

# Output for Worker Node 2 (192.168.178.ZZ)
=== Checking network configuration for node: 192.168.178.ZZ ===
Network links:
NODE             NAMESPACE   TYPE         ID             VERSION   TYPE       KIND     HW ADDR                                           OPER STATE   LINK STATE
192.168.178.ZZ   network     LinkStatus   bond0          1         ether      bond     XX:XX:XX:XX:XX:XX                                 down         false
192.168.178.ZZ   network     LinkStatus   cni0           3         ether      bridge   XX:XX:XX:XX:XX:XX                                 up           true
192.168.178.ZZ   network     LinkStatus   dummy0         1         ether      dummy    XX:XX:XX:XX:XX:XX                                 down         false
192.168.178.ZZ   network     LinkStatus   enp1s0         4         ether               XX:XX:XX:XX:XX:XX                                 up           true
192.168.178.ZZ   network     LinkStatus   flannel.1      1         ether      vxlan    XX:XX:XX:XX:XX:XX                                 unknown      true
192.168.178.ZZ   network     LinkStatus   ip6tnl0        1         tunnel6    ip6tnl   00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00   down         false
192.168.178.ZZ   network     LinkStatus   lo             1         loopback            00:00:00:00:00:00                                 unknown      true
192.168.178.ZZ   network     LinkStatus   sit0           1         sit        sit      00:00:00:00                                       down         false
192.168.178.ZZ   network     LinkStatus   teql0          1         void                                                                  down         false
192.168.178.ZZ   network     LinkStatus   tunl0          1         ipip       ipip     00:00:00:00                                       down         false
192.168.178.ZZ   network     LinkStatus   veth[ID]       2         ether      veth     XX:XX:XX:XX:XX:XX                                 up           true
Network addresses:
NODE             NAMESPACE   TYPE            ID                                                  VERSION   ADDRESS                                      LINK
192.168.178.ZZ   network     AddressStatus   cni0/10.244.1.1/24                                  1         10.244.1.1/24                                cni0
192.168.178.ZZ   network     AddressStatus   cni0/[IPv6_ADDRESS]/64                             2         [IPv6_ADDRESS]/64                            cni0
192.168.178.ZZ   network     AddressStatus   enp1s0/192.168.178.ZZ/24                            1         192.168.178.ZZ/24                            enp1s0
192.168.178.ZZ   network     AddressStatus   enp1s0/[IPv6_ADDRESS]/64                            2         [IPv6_ADDRESS]/64                            enp1s0
192.168.178.ZZ   network     AddressStatus   enp1s0/[IPv6_ADDRESS]/64                            1         [IPv6_ADDRESS]/64                            enp1s0
192.168.178.ZZ   network     AddressStatus   enp1s0/[IPv6_ADDRESS]/64                            2         [IPv6_ADDRESS]/64                            enp1s0
192.168.178.ZZ   network     AddressStatus   flannel.1/10.244.1.0/32                              1         10.244.1.0/32                                flannel.1
192.168.178.ZZ   network     AddressStatus   flannel.1/[IPv6_ADDRESS]/64                          2         [IPv6_ADDRESS]/64                            flannel.1
192.168.178.ZZ   network     AddressStatus   lo/127.0.0.1/8                                      1         127.0.0.1/8                                  lo
192.168.178.ZZ   network     AddressStatus   lo/169.254.116.108/32                                 1         169.254.116.108/32                           lo
192.168.178.ZZ   network     AddressStatus   lo/::1/128                                          1         ::1/128                                      lo
192.168.178.ZZ   network     AddressStatus   veth[ID]/[IPv6_ADDRESS]/64                          2         [IPv6_ADDRESS]/64                            veth[ID]
Network routes:
[... routes similar to other nodes, with appropriate PodCIDR (10.244.1.0/24) and masked addresses ...]

Note: The output shows network configuration for all three nodes. Key observations:

  • Control Plane (192.168.178.XX): PodCIDR 10.244.2.0/24
  • Worker Node 1 (192.168.178.YY): PodCIDR 10.244.0.0/24
  • Worker Node 2 (192.168.178.ZZ): PodCIDR 10.244.1.0/24
  • All nodes have Flannel CNI configured with VXLAN interfaces
  • Physical interfaces (enp1s0) are up and operational
  • CNI bridge interfaces (cni0) are configured for pod networking
  • IPv6 addresses and MAC addresses have been masked for privacy

Network troubleshooting confirmed that all nodes have their physical interfaces (enp1s0) up and operational. The CNI bridge (cni0) is properly configured for pod networking, and the Flannel VXLAN interface (flannel.1) is set up for the overlay network. Each node has its management IP address assigned, along with PodCIDR ranges: the control plane uses 10.244.2.0/24, worker node 1 uses 10.244.0.0/24, and worker node 2 uses 10.244.1.0/24. IPv6 addresses are also configured on all nodes.

Routing is properly configured with no issues. Flannel CNI routes are correctly set up, enabling communication between pod networks across all three nodes via the flannel.1 interface. The default gateway (192.168.178.1) is configured on all nodes, and local routes for loopback and interface addresses are properly established. Worker nodes also show veth interfaces for running pods, indicating active pod networking.
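
If you want to verify the overlay network end to end, a throwaway pod can ping a pod on another node. The busybox image tag and the target IP below are placeholders, so substitute a real pod IP from the first command:

# Find a pod IP that lives on a different node
kubectl get pods -o wide --all-namespaces

# Ping it from a temporary pod (replace [POD_IP] with a real pod IP)
kubectl run nettest --image=busybox:1.36 --rm -it --restart=Never -- ping -c 3 [POD_IP]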

Storage Configuration

This section summarizes how storage is laid out in this homelab cluster: what each node’s internal disk is used for, and where persistent storage for workloads will come from.

Storage Considerations

  • OS Disk: Used by Talos (minimal, immutable) - single internal SSD on each node (/dev/sda)
    • Control plane: 128GB SSD (SanDisk SD6SF1M1)
    • Worker nodes: 256GB SSDs (SanDisk X600 2.5)
  • Data Storage: Each node has only one internal disk, which is dedicated to the Talos OS installation
  • Shared Storage: NFS/SMB share from a NAS in the local network will be configured later for persistent volumes (covered in Part 5)

Storage Planning

After checking disks and updating machine configurations (see “Checking Available Disks on Nodes” section above), consider the following for your storage setup:

  • OS Disk: Each node (control-plane and workers) has only one internal disk (/dev/sda) which is used exclusively for the Talos OS installation
  • Data Storage: Since all nodes only have one internal disk dedicated to the OS, persistent storage for Kubernetes workloads will be provided via NFS/SMB share from a NAS in the local network (to be configured in Part 5)

Storage planning is straightforward for this setup: each node’s single internal disk (/dev/sda) is dedicated to the Talos OS installation, a 128GB SSD (SanDisk SD6SF1M1) on the control plane and 256GB SSDs (SanDisk X600 2.5) on the workers. Persistent storage for Kubernetes workloads will come from NFS/SMB shares on a NAS in the local network, which we’ll configure in Part 5 of this series.
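
As a quick cross-check that each node really does use its single internal disk for the OS, you can list the disks Talos has discovered (this reuses the ALL_NODES array from the network troubleshooting section above):

# List the disks Talos sees on every node
for node_ip in "${ALL_NODES[@]}"; do
    echo "=== Disks on node: ${node_ip} ==="
    talosctl --nodes ${node_ip} get disks
done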

Best Practices

Based on the Talos Production Clusters documentation, here are key considerations for building a production-grade Talos Linux cluster:

High Availability Considerations

  • Multiple Control Plane Nodes: For true high availability, use multiple control plane nodes (minimum 3 recommended for production)
  • Kubernetes Endpoint Configuration: Point the Kubernetes API endpoint at a load balancer or at DNS records that cover all control plane nodes, so API clients can reach any control plane node
  • Endpoint Configuration: Use talosctl config endpoint to configure multiple control plane endpoints, enabling automatic load balancing and failover when individual nodes become unavailable (see the example after this list)
  • Load Balancer Setup: For production or homelab with multiple control planes, consider setting up a load balancer solution such as MetalLB (to be covered in a future part of this series) or a dedicated load balancer (HAProxy, NGINX reverse proxy) to route to your control plane nodes
  • DNS Records: As an alternative to a load balancer, create multiple DNS records that point to all your control plane nodes (or use MetalLB for a Kubernetes-native solution - to be covered in a future part)
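
As an illustration of the endpoint configuration mentioned above, a multi-control-plane setup could be registered like this; the two extra IPs are hypothetical, since this homelab has a single control plane at 192.168.178.55:

# Register all control plane nodes as talosctl endpoints (extra IPs are hypothetical)
talosctl config endpoint 192.168.178.55 192.168.178.58 192.168.178.59

# Confirm the configured endpoints and current context
talosctl config info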

Security Best Practices

  • Secure Configuration Storage: Store machine configurations securely (encrypted, version controlled) - the talosconfig file is your key to managing the cluster
  • Strong Certificates: Talos Linux uses gRPC and mutual TLS for API access - ensure strong cluster endpoint certificates
  • Network Access Control: Limit network access to control plane nodes (port 50000 for Talos API, port 6443 for Kubernetes API)
  • Firewall Configuration: If nodes are behind a firewall or in a private network, configure a TCP load balancer to forward port 50000 for Talos API access (note: HTTP/S proxies cannot be used due to gRPC and mutual TLS requirements)
  • Regular Updates: Regularly update Talos and Kubernetes to maintain security (a sketch of the upgrade commands follows this list)
  • Configuration Patches: Use configuration patches for customization instead of modifying base configurations directly
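
A minimal sketch of what regular updates look like with talosctl; the schematic ID is a placeholder for your own Image Factory schematic, and the versions shown are simply the ones used in this guide:

# Upgrade Talos on a node (installer image from your Image Factory schematic)
talosctl --nodes ${CONTROL_PLANE_IP} upgrade \
  --image factory.talos.dev/installer/[SCHEMATIC_ID]:v1.11.6

# Upgrade Kubernetes (driven from the control plane node)
talosctl --nodes ${CONTROL_PLANE_IP} upgrade-k8s --to 1.34.3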

Configuration Management Best Practices

  • Machine Configuration Patching: Use talosctl machineconfig patch to modify configurations rather than editing files directly - this allows for cleaner, more maintainable configurations (see the sketch after this list)
  • Verify Node Configuration: Before applying configurations, verify network interfaces and disk settings match your hardware using talosctl get links and talosctl get disks
  • Talosconfig Management: Properly manage your talosconfig file - either merge into ~/.talos/config or set the TALOSCONFIG environment variable
  • Endpoint Configuration: Configure endpoints using talosctl config endpoint to enable automatic failover between control plane nodes
  • Version Control: Version control your machine configurations and patches (to be covered in a future part - main goal for now is cluster setup)
  • Configuration Backups: Keep backups of critical configurations (to be covered in a future part - main goal for now is cluster setup)
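
Here is a minimal sketch of the patch workflow referenced above; the file names and the hostname value are examples only:

# Write a small strategic-merge patch (example values)
cat > hostname-patch.yaml <<'EOF'
machine:
  network:
    hostname: talos-worker-01
EOF

# Apply the patch to a generated machine config without editing it by hand
talosctl machineconfig patch worker.yaml --patch @hostname-patch.yaml --output worker-patched.yaml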

Operational Best Practices

  • Unmount Installation Media: Unplug installation USB drives or unmount ISOs from all nodes after installation to prevent accidental installation to USB drives
  • Node Verification: Verify nodes are running correctly using kubectl get nodes after cluster setup
  • Documentation: Document your cluster configuration, including IP addresses, hardware specifications, and network topology
  • Testing: Test configurations in a lab environment before applying to production (to be covered in monitoring setup part of this series)
  • Monitoring: Monitor cluster health regularly using kubectl and talosctl commands (to be covered in monitoring setup part of this series)

Network Configuration Best Practices

  • Network Interface Verification: Check network interfaces using talosctl get links and ensure operational state is “up” before applying configurations
  • Multihoming Support: If machines have multiple network interfaces (multihomed), refer to the Multihoming documentation for additional configuration requirements
  • Static IP Configuration: Use static IPs or DHCP reservations for stability (as done in this homelab setup)
  • Network Device Selector: For environments where network interface names may change (virtualized environments, systems with multiple interfaces, or when interface names are unpredictable), consider using network device selectors to match interfaces based on stable hardware attributes (MAC address, PCI ID, driver, bus path) rather than interface names. For this homelab setup with predictable interface names (enp1s0), using the interface field is sufficient.
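
As an illustration of the device-selector approach from the last bullet, a config patch might look like the following sketch; the MAC address is a placeholder and DHCP is assumed, so adapt it to your own addressing scheme:

# Sketch of a config patch that matches the NIC by MAC address instead of by name
cat > network-selector-patch.yaml <<'EOF'
machine:
  network:
    interfaces:
      - deviceSelector:
          hardwareAddr: "aa:bb:cc:dd:ee:ff"   # placeholder MAC address
        dhcp: true
EOF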

Homelab-Specific Tips

  • Single Control Plane: For homelab use, a single control plane node is acceptable (as shown in this guide), though multiple control planes provide better availability
  • Static IPs: Use static IPs via DHCP reservations (as configured in FritzBox) for stability
  • Hardware Documentation: Document hardware specifications for each node type
  • Power Management: Not needed for this homelab setup - nodes run 24/7 without special power management requirements
  • Storage Planning: Already planned - NFS/SMB shares will be added later for persistent volumes (mentioned in Storage Configuration section)
  • Spare Hardware: Can be added later if needed for testing configuration changes - not a current requirement for this setup

Troubleshooting

Common Issue 1: Node Not Joining Cluster

Problem: Worker node shows as NotReady or doesn’t appear in kubectl get nodes

Solution:

# Check node status via talosctl
talosctl --nodes [NODE_IP] get members

# Check kubelet logs
talosctl --nodes [NODE_IP] logs kubelet

# Verify network connectivity
talosctl --nodes [NODE_IP] get links
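
# Check the kubelet service itself (sketch; "kubelet" is the Talos service name)
talosctl --nodes [NODE_IP] service kubelet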

Notes:

  • Issue encountered: None - all nodes joined the cluster successfully without issues
  • Resolution: N/A - no issues encountered during cluster setup. Command history above kept for reference in case issues occur in the future.

Common Issue 2: Bootstrap Fails

Problem: talosctl bootstrap command fails or times out

Solution:

# Set up variable (if not already set)
export CONTROL_PLANE_IP=192.168.178.55

# Check control plane node status
talosctl --nodes ${CONTROL_PLANE_IP} get members

# Check etcd status
talosctl --nodes ${CONTROL_PLANE_IP} logs etcd

# Verify API server is running
curl -k https://${CONTROL_PLANE_IP}:6443/healthz

Notes:

  • Issue encountered: None - bootstrap completed successfully without issues
  • Resolution: N/A - no issues encountered during bootstrap. Command history above kept for reference in case issues occur in the future.

Common Issue 3: Configuration Apply Fails

Problem: talosctl apply-config fails with connection errors

Solution:

# Verify node is accessible
ping [NODE_IP]

# Check if Talos is running
# (If using VMs, check console output)

# Try with verbose output
talosctl apply-config --insecure --nodes [NODE_IP] --file [CONFIG_FILE] --debug

Notes:

  • Issue encountered: None - configuration applied successfully to all nodes without issues
  • Resolution: N/A - no issues encountered during configuration application. Command history above kept for reference in case issues occur in the future.

Common Issue 4: API Server Not Accessible

Problem: Cannot connect to Kubernetes API server

Solution:

# Check control plane node
talosctl --nodes [CONTROL_PLANE_IP] get services

# Verify API server is listening
talosctl --nodes [CONTROL_PLANE_IP] processes | grep kube-apiserver

# Check firewall rules
# (Platform-specific commands)
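
# List Kubernetes containers via the CRI (sketch; kube-apiserver should appear here)
talosctl --nodes [CONTROL_PLANE_IP] containers -k | grep kube-apiserver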

Notes:

  • Issue encountered: None - API server is accessible and responding correctly
  • Resolution: N/A - no issues encountered with API server accessibility. Command history above kept for reference in case issues occur in the future.

Real-World Example

My Homelab Setup Experience

The entire installation and configuration process went remarkably smoothly. During the planning phase, I decided on a 3-node cluster with one control plane and two workers, selecting hardware based on cost-effectiveness. I chose a Fujitsu Futro S720 for the control plane and Dell Optiplex 3050 Micro units for the workers. The network setup was planned with DHCP reservations in the FritzBox router to provide static IPs without manual configuration. I selected Talos v1.11.6 with Kubernetes v1.34.1 (later upgraded to v1.34.3).

Hardware acquisition was straightforward—I purchased all three nodes from eBay for a total of 270€, with delivery taking just 2 days. The control plane received a storage upgrade from the original 2GB mSATA SSD to 128GB, while the worker nodes came with 256GB SSDs each.

The initial installation completed without any issues. I installed talosctl via Homebrew on macOS, downloaded the 333MB Talos ISO from the Image Factory, set up a 16GB USB drive with Ventoy version 1.1.10, and successfully booted all nodes. After confirming successful boots, I removed the USB drives, and all nodes were accessible on the network.

Configuration was equally smooth. I applied machine configurations to all nodes, configured endpoints using talosctl config endpoint to enable automatic failover, and upgraded both Talos (to v1.11.6) and Kubernetes (to v1.34.3). The kubeconfig was saved as discworld-homelab in the ~/.kube/ directory, and all nodes verified correctly.

Testing confirmed everything was working perfectly. All three nodes joined the cluster and showed Ready status. All 11 system pods were running and healthy, a test nginx deployment was created and scheduled successfully (running on a worker node within ~13 seconds), network connectivity was excellent with 0% packet loss and average latency of ~4-5ms, and the API server was accessible and responding correctly.

Lessons Learned:

The most valuable lesson was using DHCP reservations in the FritzBox for static IPs. This approach provides the stability of static IPs without requiring manual network configuration on each node, and all nodes received consistent IP addresses (192.168.178.55, .56, .57).

Another key insight was the importance of configuring endpoints with talosctl config endpoint. This simple step eliminates the need for the --endpoints flag in every subsequent talosctl command, significantly improving workflow efficiency and reducing command complexity.

For storage, dedicating a single internal disk per node exclusively to the Talos OS works well for homelabs. Planning to add NFS/SMB shares from a NAS later provides a clean separation between OS and data storage, simplifying management and allowing for flexible storage expansion.

Performance Observations:

Boot times were excellent—nodes booted successfully from USB and transitioned smoothly to the installed Talos OS. After removing the USB drives, nodes run reliably from the installed OS on internal SSDs.

Resource usage is well-balanced. The control plane node (Fujitsu Futro S720 with 4GB RAM and 2 cores) handles the control plane workload adequately, while the worker nodes (Dell Optiplex 3050 Micro with 8GB RAM and Intel Core i5 processors) provide good performance for workloads. All nodes show healthy resource allocation.

Network performance has been excellent. All nodes show 0% packet loss with average latencies around 4-5ms. Pod scheduling works efficiently—the test pod was scheduled within ~13 seconds, and image pulls completed in 730ms when cached. The Flannel CNI is properly configured with VXLAN interfaces on all nodes, enabling seamless inter-pod communication.

Summary

This installation process demonstrates how Talos Linux provides a streamlined approach to deploying Kubernetes. The entire process, from creating custom ISOs to verifying cluster health, showcases several key advantages of the Talos approach.

Machine configurations are declarative and version-controllable, making it easy to reproduce and maintain your cluster setup. The API-driven approach eliminates the need for SSH access, significantly reducing the attack surface while providing a more secure management interface. Proper network configuration proved essential for cluster communication, and the verification steps we performed ensured a healthy cluster before proceeding with workloads.

Throughout this guide, we accomplished a complete cluster setup. We installed the talosctl CLI tool via Homebrew on macOS, created a custom Talos ISO using the Image Factory with v1.11.6 and necessary extensions, and generated machine configurations for our 3-node cluster. The configurations were applied successfully to all nodes, Kubernetes was bootstrapped on the control plane, and both Talos and Kubernetes were upgraded to their latest compatible versions. We configured endpoints and kubeconfig for cluster access, and verified that everything was working correctly—all nodes are Ready, system pods are running, and test deployments succeed.

Next Steps

Now that your Talos Linux cluster is up and running:

  • Explore your cluster with kubectl commands
  • Plan for storage solutions (covered in Part 5)
  • Learn to customize and manage your cluster configurations (Part 3)

Recommended Reading

If you want to dive deeper into Talos Linux and Kubernetes, here are some excellent books that complement this series:

Note: The Amazon links below are affiliate links for Amazon Influencers and Associates. If you make a purchase through these links, I may earn a small commission at no additional cost to you.

Talos Linux Books

Kubernetes Books


Resources

Official Documentation

Related Articles

Tools and Utilities

Community Resources


Series Navigation

Previous: Part 1 - Talos Linux Introduction

Current: Part 2 - Talos Installation - Building Your First Cluster

Next: Part 3 - Talos Configuration Management - GitOps for Infrastructure

Full Series:

  1. Talos Linux Introduction
  2. Talos Installation - Building Your First Cluster (You are here)
  3. Talos Configuration Management - GitOps for Infrastructure
  4. High Availability Setup - Production-Grade Cluster
  5. Storage Configuration - Persistent Storage for Kubernetes (Coming Soon)
  6. Networking - CNI, Load Balancing, and Ingress (Coming Soon)
  7. Security Hardening - Securing Your Homelab Cluster (Coming Soon)
  8. Monitoring and Maintenance - Keeping Your Cluster Healthy (Coming Soon)

This article is part of the “Talos Linux Homelab” series. Follow along as we build a production-grade Kubernetes homelab from the ground up.

Questions or feedback? Reach out via email or connect on LinkedIn.