TrueK8S Part 10
Destroying and Rebuilding our Talos Cluster
For the final post of this series, we’ll put our cluster through a disaster recovery scenario. Because our cluster’s configuration is defined declaratively and all of our volumes and databases are replicated to cloud storage, we can re-create the cluster with little more than our git repo and our SOPS key. For this exercise, I’ll simulate a complete disaster: assume my current cluster and administration machine have both been destroyed, then work through rebuilding the Talos cluster, rebuilding the administration workstation, and bootstrapping the new cluster with our config.
Before getting started (and especially before destroying any of your work) you’ll need to make sure you have the following things lined up:
- Your Talos patch files (optional, but convenient)
- Access to your repo with the same or a new personal access token
- Your SOPS key
- Your Flux bootstrap command (optional, but convenient)
Destroy the Cluster
Go ahead and give your existing cluster nodes a graceful shutdown. If you’re feeling brave you can delete the VMs; otherwise power them off and keep them around as a backup.
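If your old nodes are still reachable, Talos can shut them down cleanly from the CLI. A quick sketch, assuming your old talosconfig is still active; substitute your own node IPs:
$ talosctl shutdown --nodes 192.168.XX.101,192.168.XX.111,192.168.XX.112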
Rebuilding the Cluster
I’ll start by creating three new VMs, just like we did in part 1.

Don’t forget to install Talosctl if you haven’t already.
curl -sL https://talos.dev/install | sh
Save your IPs and cluster name as variables, then generate your initial Talos config files.
export CONTROL_IP=192.168.XX.101
export WORKER1_IP=192.168.XX.111
export WORKER2_IP=192.168.XX.112
export CLUSTERNAME=truek8s
$ talosctl gen config $CLUSTERNAME https://$CONTROL_IP:6443 --install-image factory.talos.dev/installer/82866c01b2842b490c27a6f1a4996aae05f096c83db40ecda166b03da9deae46:v1.9.2
generating PKI and tokens
Created /home/bmel/Documents/truek8s-talos/controlplane.yaml
Created /home/bmel/Documents/truek8s-talos/worker.yaml
Created /home/bmel/Documents/truek8s-talos/talosconfig
💡 Remember, you can specify any version using the same schematic ID. This guide was written with v1.9.2, but I’ve tested v1.10.8 and the process is the same.
Then recreate or copy your patch files into your working directory; check part 1 for details on recreating them. Remember, the only difference between the three is the hostname field (a minimal example is sketched after the list).
- truek8s-c1.patch
- truek8s-w1.patch
- truek8s-w2.patch
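Here’s a minimal sketch of what the control plane patch might look like. Your patches from part 1 probably set a few more fields, but the hostname is the only part that differs between nodes:
$ cat truek8s-c1.patch
machine:
  network:
    hostname: truek8s-c1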
And then patch your new machine config files:
$ talosctl machineconfig patch controlplane.yaml --patch @truek8s-c1.patch --output truek8s-c1.yaml
$ talosctl machineconfig patch worker.yaml --patch @truek8s-w1.patch --output truek8s-w1.yaml
$ talosctl machineconfig patch worker.yaml --patch @truek8s-w2.patch --output truek8s-w2.yaml
Double-check the machine config files you just generated, and then apply them to your cluster. Start with the control node, wait for it to come up, and then do your workers.
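For that double-check, talosctl ships a validator. A quick sketch; metal is the right mode for a bare-metal or VM install:
$ talosctl validate --config truek8s-c1.yaml --mode metal
$ talosctl validate --config truek8s-w1.yaml --mode metal
$ talosctl validate --config truek8s-w2.yaml --mode metal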
$ talosctl apply-config --insecure -n $CONTROL_IP --file truek8s-c1.yaml
$ talosctl apply-config --insecure -n $WORKER1_IP --file truek8s-w1.yaml
$ talosctl apply-config --insecure -n $WORKER2_IP --file truek8s-w2.yaml
When all three are back up and show healthy, finalize the install by bootstrapping the cluster:
$ talosctl bootstrap --nodes $CONTROL_IP --endpoints $CONTROL_IP --talosconfig talosconfig
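Bootstrapping starts etcd and the Kubernetes control plane, which takes a minute or two. If you want a definitive signal that it finished, talosctl can run its built-in health checks; a sketch using the same variables and talosconfig as above:
$ talosctl health --nodes $CONTROL_IP --endpoints $CONTROL_IP --talosconfig talosconfig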
Finally, generate and save your talos and kube config files.
- Copy and edit the Talos config file
$ cp talosconfig ~/.talos/config
$ nano ~/.talos/config
$ cat ~/.talos/config
context: truek8s
contexts:
    truek8s:
        endpoints: [192.168.XX.101]
        ############################
        ############################
        ############################
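If you’d rather not hand-edit the file, talosctl can make the same changes for you. A sketch, assuming you want ~/.talos/config as your active config:
$ talosctl config merge ./talosconfig
$ talosctl config endpoint $CONTROL_IP
$ talosctl config node $CONTROL_IP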
- Generate kubeconfig
$ talosctl kubeconfig -n $CONTROL_IP
- If you don’t already have kubectl installed, install it:
$ apt install kubectl
And finally, check to make sure talosctl and kubectl are working.
$ talosctl dashboard -n $CONTROL_IP
$ kubectl get nodes
NAME         STATUS   ROLES           AGE   VERSION
truek8s-c1   Ready    control-plane   62s   v1.33.2
truek8s-w1   Ready    <none>          55s   v1.33.2
truek8s-w2   Ready    <none>          66s   v1.33.2
Clone your repo
Once the cluster is up, clone your Flux repo to your admin workstation so you can make changes. You’ll need a personal access token, but it doesn’t have to be the same one; check part 2 for details on how to make a new one if you need it.
#You don't need quotes here
export GITHUB_TOKEN=<your-PAT>
export GITHUB_USER=<your-username>
export REPO=truek8s
git clone https://$GITHUB_TOKEN@github.com/$GITHUB_USER/$REPO
Cloning into 'truek8s'...
Configure SOPS
Before we can move on to bootstrapping Flux, we need to configure SOPS. We’ll provide the key to Flux as a Kubernetes secret later, but we also need a copy of it on our admin workstation so we can decrypt our config map.
If you haven’t already, install SOPS and AGE.
# Download the binary
curl -LO https://github.com/getsops/sops/releases/download/v3.10.2/sops-v3.10.2.linux.amd64
# Move the binary into your PATH
mv sops-v3.10.2.linux.amd64 /usr/local/bin/sops
# Make the binary executable
chmod +x /usr/local/bin/sops
#Install age
$ apt install age
💡 This command installs version 3.10.2. Check the SOPS documentation to see if a newer version is available.
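You can confirm both tools are on your PATH before moving on:
$ sops --version
$ age --version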
Copy your sops-key.txt file to ~/.config/sops/age/keys.txt. You can test that SOPS is configured and that you have the right key by decrypting and re-encrypting your configmap.
$ sops -d -i cluster-config.yaml
$ sops -e -i cluster-config.yaml
Make sure you keep that key file handy. You’ll need to provide it to flux in a later step.
VolSync and CNPG Configuration
Before we bootstrap Flux and commit our config to the cluster, we need to make sure that any charts using VolSync or CNPG are configured to restore. In the examples we’ve worked on through this guide, that simply means uncommenting the mode: recovery line in the relevant Helm values.
Using the configuration we made for nextcloud as an example:
cnpg:
  main:
    cluster:
      singleNode: true
      pgVersion: 15
    # Uncomment this line
    mode: recovery
    backups:
      enabled: true
      credentials: backblaze
      scheduledBackups:
        - name: daily-backup
          schedule: "0 0 0 * * *"
          backupOwnerReference: self
          immediate: true
          suspend: false
    recovery:
      method: object_store
      credentials: backblaze
Make sure you do this for any CNPG-enabled charts in your repo. When you’re all done, commit your changes:
⚠️ Make sure your config map is encrypted before committing.
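Recent SOPS releases can confirm this for you; the filestatus subcommand reports whether a file is encrypted:
$ sops filestatus cluster-config.yaml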
git add -A && git commit -m 'set recovery'
git push
Flux Bootstrap
Now all you need to do is bootstrap Flux with your repo and hand it the SOPS key; it will take care of everything from there. For the GitHub PAT, you can create a new one or reuse the same one from earlier.
If you don’t already have it installed, go ahead and install Flux:
curl -s https://fluxcd.io/install.sh | sudo bash
And then re-run the flux bootstrap command:
$ export GITHUB_TOKEN=<PAT>
$ export GITHUB_USER=<Github Username>
$ export REPO=truek8s
$ flux bootstrap github --token-auth --owner=$GITHUB_USER --repository=$REPO --branch=main --path=./clusters/$REPO --personal
► connecting to github.com
► cloning branch "main" from Git repository "https://github.com/bmel343/truek8s.git"
✔ cloned repository
► generating component manifests
✔ generated component manifests
✔ committed sync manifests to "main" ("6b49c159dff1f604c5df9e542731c41e0d4412f1")
► pushing component manifests to "https://github.com/bmel343/truek8s.git"
► installing components in "flux-system" namespace
✔ installed components
✔ reconciled components
► determining if source secret "flux-system/flux-system" exists
► generating source secret
► applying source secret "flux-system/flux-system"
✔ reconciled source secret
► generating sync manifests
✔ generated sync manifests
✔ sync manifests are up to date
► applying sync manifests
✔ reconciled sync configuration
◎ waiting for Kustomization "flux-system/flux-system" to be reconciled
✔ Kustomization reconciled successfully
► confirming components are healthy
✔ helm-controller: deployment ready
✔ kustomize-controller: deployment ready
✔ notification-controller: deployment ready
✔ source-controller: deployment ready
✔ all components are healthy
As soon as Flux is done bootstrapping, create the sops-age secret in the flux-system namespace. Don’t worry about the timing: thanks to our dependency configuration, Flux will wait for the secret to exist before reconciling the config Kustomization, and every other Kustomization depends on that one.
$ kubectl -n flux-system create secret generic sops-age --from-file=age.agekey=sops-key.txt
secret/sops-age created
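For reference, this works because the config Kustomization we created back in part 2 points at that secret for SOPS decryption. Roughly, it looks like the manifest below; the file path and interval here are assumptions, so check your own repo for the real values:
$ cat clusters/truek8s/config.yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: config
  namespace: flux-system
spec:
  interval: 10m
  path: ./config
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
  decryption:
    provider: sops
    secretRef:
      name: sops-age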
Sit Back and Watch
Flux will take it from here, working diligently to apply all the resources defined in our repo to the cluster. In my experience, it can take 20-30 minutes for the actual apps to start working again. I like to watch Flux events and check on specific Kustomizations while I wait. Flux will be pretty noisy, and you may even see a few errors in there, but it can often resolve them by waiting and trying again. In this example it’s complaining about not finding the sops-age secret, but it later finds it and continues along.
$ flux events --watch
LAST SEEN TYPE REASON OBJECT MESSAGE
11s Normal DependencyNotReady Kustomization/apps Dependencies do not meet ready condition, retrying in 30s
10s Normal NewArtifact HelmRepository/cloudnative-pg stored fetched index of size 77.48kB from 'https://cloudnative-pg.github.io/charts'
11s Warning BuildFailed Kustomization/config secrets "sops-age" not found
18s Normal NewArtifact GitRepository/flux-system stored artifact for commit 'set recovery'
12s Normal Progressing Kustomization/flux-system CustomResourceDefinition/alerts.notification.toolkit.fluxcd.io configured
CustomResourceDefinition/buckets.source.toolkit.fluxcd.io configured
CustomResourceDefinition/externalartifacts.source.toolkit.fluxcd.io configured
CustomResourceDefinition/gitrepositories.source.toolkit.fluxcd.io configured
CustomResourceDefinition/helmcharts.source.toolkit.fluxcd.io configured
CustomResourceDefinition/helmreleases.helm.toolkit.fluxcd.io configured
CustomResourceDefinition/helmrepositories.source.toolkit.fluxcd.io configured
CustomResourceDefinition/kustomizations.kustomize.toolkit.fluxcd.io configured
CustomResourceDefinition/ocirepositories.source.toolkit.fluxcd.io configured
CustomResourceDefinition/providers.notification.toolkit.fluxcd.io configured
CustomResourceDefinition/receivers.notification.toolkit.fluxcd.io configured
Namespace/flux-system configured
ClusterRole/crd-controller-flux-system configured
ClusterRole/flux-edit-flux-system configured
ClusterRole/flux-view-flux-system configured
ClusterRoleBinding/cluster-reconciler-flux-system configured
ClusterRoleBinding/crd-controller-flux-system configured
ResourceQuota/flux-system/critical-pods-flux-system configured
ServiceAccount/flux-system/helm-controller configured
ServiceAccount/flux-system/kustomize-controller configured
ServiceAccount/flux-system/notification-controller configured
ServiceAccount/flux-system/source-controller configured
Service/flux-system/notification-controller configured
Service/flux-system/source-controller configured
Service/flux-system/webhook-receiver configured
Deployment/flux-system/helm-controller configured
Deployment/flux-system/kustomize-controller configured
Deployment/flux-system/notification-controller configured
Deployment/flux-system/source-controller configured
Kustomization/flux-system/apps created
Kustomization/flux-system/config created
Kustomization/flux-system/flux-system configured
Kustomization/flux-system/infrastructure created
Kustomization/flux-system/repos created
NetworkPolicy/flux-system/allow-egress configured
NetworkPolicy/flux-system/allow-scraping configured
NetworkPolicy/flux-system/allow-webhooks configured
GitRepository/flux-system/flux-system configured
12s Normal ReconciliationSucceeded Kustomization/flux-system Reconciliation finished in 6.02411383s, next run in 10m0s
11s Normal DependencyNotReady Kustomization/infrastructure Dependencies do not meet ready condition, retrying in 30s
10s Normal NewArtifact HelmRepository/ingress-nginx stored fetched index of size 224.7kB from 'https://kubernetes.github.io/ingress-nginx'
10s Normal NewArtifact HelmRepository/jetstack stored fetched index of size 492.1kB from 'https://charts.jetstack.io/'
10s Normal NewArtifact HelmRepository/longhorn stored fetched index of size 70.95kB from 'https://charts.longhorn.io'
9s Normal NewArtifact HelmRepository/metallb stored fetched index of size 29.23kB from 'https://metallb.github.io/metallb'
11s Normal Progressing Kustomization/repos HelmRepository/flux-system/cloudnative-pg created
HelmRepository/flux-system/ingress-nginx created
HelmRepository/flux-system/jetstack created
HelmRepository/flux-system/longhorn created
HelmRepository/flux-system/metallb created
HelmRepository/flux-system/podinfo created
HelmRepository/flux-system/strrl created
HelmRepository/flux-system/truecharts created
6s Normal Progressing Kustomization/repos Health check passed in 5.044890457s
Pay particular attention to the order in which kustomizations are applied. Think back to part 2, when we considered how best to structure our repository and configure dependencies. For example, the apps kustomization waits for the infrastructure kustomization to be deployed first. An application like Vaultwarden or Nextcloud depends on VolSync, CNPG, the cluster issuer, MetalLB, Longhorn, and so on. The way we’ve configured things, Flux knows to wait until all of those resources are available before deploying any apps, which prevents all sorts of potential deployment errors.
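That ordering comes down to the dependsOn field on each Kustomization. As a refresher, the apps Kustomization looks roughly like this (the file path and interval are assumptions, and I’m only showing the infrastructure dependency here):
$ cat clusters/truek8s/apps.yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps
  namespace: flux-system
spec:
  dependsOn:
    - name: infrastructure
  interval: 10m
  path: ./apps
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
You can watch those dependencies play out with flux get ks: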
$ flux get ks
NAME                  REVISION            SUSPENDED  READY    MESSAGE
apps                                      False      False    dependency 'flux-system/infrastructure' is not ready
cert-manager          main@sha1:0cf3b4e3  False      True     Applied revision: main@sha1:0cf3b4e3
cf-tunnel-ingress     main@sha1:0cf3b4e3  False      True     Applied revision: main@sha1:0cf3b4e3
cloudnative-pg        main@sha1:0cf3b4e3  False      False    dependency 'flux-system/longhorn' is not ready
cluster-issuer        main@sha1:0cf3b4e3  False      False    dependency 'flux-system/cert-manager' is not ready
config                main@sha1:0cf3b4e3  False      True     Applied revision: main@sha1:0cf3b4e3
flux-system           main@sha1:0cf3b4e3  False      True     Applied revision: main@sha1:0cf3b4e3
infrastructure                            False      Unknown  Reconciliation in progress
ingress-nginx         main@sha1:0cf3b4e3  False      True     Applied revision: main@sha1:0cf3b4e3
kubernetes-reflector  main@sha1:0cf3b4e3  False      True     Applied revision: main@sha1:0cf3b4e3
longhorn              main@sha1:0cf3b4e3  False      True     Applied revision: main@sha1:0cf3b4e3
longhorn-config                           False      False    dependency 'flux-system/longhorn' is not ready
metallb               main@sha1:0cf3b4e3  False      True     Applied revision: main@sha1:0cf3b4e3
metallb-config        main@sha1:0cf3b4e3  False      True     Applied revision: main@sha1:0cf3b4e3
repos                 main@sha1:0cf3b4e3  False      True     Applied revision: main@sha1:0cf3b4e3
snapshot-controller   main@sha1:0cf3b4e3  False      True     Applied revision: main@sha1:0cf3b4e3
volsync               main@sha1:0cf3b4e3  False      True     Applied revision: main@sha1:0cf3b4e3
volsync-config        main@sha1:0cf3b4e3  False      True     Applied revision: main@sha1:0cf3b4e3
Your first sign that things are working is when the infrastructure ks reconciles. Once it does, infrastructure components like Longhorn, MetalLB, and ingress-nginx should come up. One of my first checks is the Longhorn UI, which will show volumes being created for any restore tasks or new deployments.

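If you prefer the CLI to the UI, the same information is exposed through Longhorn’s custom resources (assuming Longhorn is installed in its default longhorn-system namespace):
$ kubectl -n longhorn-system get volumes.longhorn.io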
Once Flux starts deploying your applications, you can keep an eye on the pods created in each application’s namespace. It can be anxiety-inducing to wait for a large deployment like Nextcloud to reach ready status. Just know that if the CNPG and VolSync containers reach Completed or Running status, you’re probably fine.
$ kubectl get pods -n nextcloud-demo
NAME                                             READY   STATUS              RESTARTS   AGE
nextcloud-demo-5c4777bf8d-f8dhp                  0/1     Init:0/3            0          8m11s
nextcloud-demo-clamav-76666fbd67-tsrbd           0/1     Init:0/2            0          8m11s
nextcloud-demo-cnpg-main-1-full-recovery-pws75   1/1     Running             0          8m10s
nextcloud-demo-collabora-644c55596f-rrglv        0/1     Init:0/2            0          8m11s
nextcloud-demo-imaginary-5d55d79fcf-2jbzd        0/1     Init:0/2            0          8m11s
nextcloud-demo-nextcloud-cron-29407260-7c2pq     0/1     ContainerCreating   0          7m8s
nextcloud-demo-nginx-675b945674-8n5d4            0/1     Init:0/3            0          8m11s
nextcloud-demo-notify-66bc88d785-zrfkj           0/1     Init:0/3            0          8m12s
nextcloud-demo-preview-cron-29407260-748xr       0/1     ContainerCreating   0          7m8s
nextcloud-demo-redis-0                           1/1     Running             0          8m11s
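If you want more detail than the pod list gives you, the CNPG and VolSync custom resources report restore progress directly. A sketch using the same namespace; the resource names will match whatever your charts created:
$ kubectl -n nextcloud-demo get clusters.postgresql.cnpg.io
$ kubectl -n nextcloud-demo get replicationdestinations.volsync.backube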
And when it’s all done, your applications should be back online. Here I am checking for my test record in vaultwarden, and logging back into nextcloud.


Conclusion
I could go on writing about this little project of mine for another 10 posts, but I need to draw a line in the sand so I can close this out and return to some of my other projects. All in all, it’s been over a year since I set out to migrate all of my TrueCharts apps into the Talos/Kubernetes ecosystem, and six months since I started writing this series. I think I had everything ‘working’ a few months ago, but challenging myself to write this series of posts forced me to do things the right way. I’m sure I will have other posts related to this, but let’s call this the ‘core’ guide for my TrueK8S project. Thanks for reading, and I hope you found something useful here.