Remove a failed node in TKG after MachineHealthCheck marks it failed

Date: 27 May 2021

By: gubi

Tag: k8s

In Tanzu Kubernetes Grid, Cluster API implements MachineHealthCheck, which provides node health monitoring and node auto-repair for Tanzu Kubernetes clusters.
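
To make the health-monitoring part concrete, here is a minimal Python sketch of the kind of rule a MachineHealthCheck applies: a node counts as unhealthy once one of its conditions has held an unhealthy status for longer than a timeout. The `type`, `status` and `timeout` fields mirror the shape of the `unhealthyConditions` entries in Cluster API, but the class names, the `needs_remediation` helper and the 5-minute values are assumptions for illustration, not the actual controller code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class UnhealthyCondition:
    type: str           # node condition type to watch, e.g. "Ready"
    status: str         # status that counts as unhealthy: "False" or "Unknown"
    timeout: timedelta  # how long that status must persist before remediation


@dataclass
class NodeCondition:
    type: str
    status: str
    last_transition: datetime  # when the condition last changed status


def needs_remediation(node_conditions, rules, now):
    """Return True if any node condition has matched a rule for at least its timeout."""
    for cond in node_conditions:
        for rule in rules:
            if (cond.type == rule.type
                    and cond.status == rule.status
                    and now - cond.last_transition >= rule.timeout):
                return True
    return False


# Hypothetical rules: Ready stuck at Unknown or False for 5 minutes (assumed values).
RULES = [
    UnhealthyCondition("Ready", "Unknown", timedelta(minutes=5)),
    UnhealthyCondition("Ready", "False", timedelta(minutes=5)),
]
```

A node whose Ready condition went to Unknown ten minutes ago would be flagged by this check, while one that only flipped a minute ago would not, which is roughly why a short blip does not trigger a rebuild.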

In practice, it automatically reconciles the node configuration with the desired state when a failure occurs. I saw this in action when the power went out and corrupted one of my worker nodes.

In response, Cluster API provisioned a new node to bring the Kubernetes cluster back to its desired state. At the same time, it tried to recover the failed node by powering on the VM.