In this post, I'll describe how to remove a particular node from a Kubernetes cluster on GKE. Why would you want to do that? In my case, I'm running JupyterHub, and I need it as part of implementing cluster scaling. That's probably a rare need, but working through it helped me understand more about the GCE structures behind a Kubernetes cluster.
So let's start. The first thing you need to do is:
Drain your node
Let's look at my nodes (kl is my shell alias for kubectl):
$ kl get nodes
NAME                                      STATUS    AGE
gke-jcluster-default-pool-9cc4e660-rx9p   Ready     1d
gke-jcluster-default-pool-9cc4e660-xr4z   Ready     2d
I want to remove rx9p. I'll first drain it:
$ kl drain gke-jcluster-default-pool-9cc4e660-rx9p --force
node "gke-jcluster-default-pool-9cc4e660-rx9p" cordoned
error: pods with local storage (use --delete-local-data to override): jupyter-petko-1
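Because the output above stops at the local-storage check, here is a minimal sketch of a drain helper that passes the override the error message names. The function name drain_node is hypothetical, and the sketch calls kubectl directly on the assumption that kl is an alias for it (aliases are not expanded in scripts). Newer kubectl releases call the same flag --delete-emptydir-data.

```shell
# drain_node is a hypothetical helper name, not something kubectl provides.
# It cordons the node and evicts its pods, including pods with local storage.
drain_node() {
  # --force: also evict pods that are not managed by a controller
  # --delete-local-data: the override named in the error message;
  #   the pod's local (emptyDir) data is lost when the pod is evicted
  kubectl drain "$1" --force --delete-local-data
}

# Usage (not run here):
#   drain_node gke-jcluster-default-pool-9cc4e660-rx9p
```

Keep in mind that the override discards whatever the pod kept in local storage, so it is only appropriate when that data is disposable.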
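Note that this first attempt only cordons the node: the drain itself stops because jupyter-petko-1 uses local storage. Rerunning the command with --delete-local-data, as the error message suggests, evicts that pod too. Great, the node is now drained. Next is: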
Removing the GCE VM
Your Kubernetes cluster runs in an instance group. We'll need to know what this group is. Here's how to do it from the command line.
$ export GROUP_ID=$(gcloud container clusters describe jcluster --format json | jq --raw-output '.instanceGroupUrls[0]' | rev | cut -d'/' -f 1 | rev)
$ echo $GROUP_ID
gke-jcluster-default-pool-9cc4e660-grp
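The rev | cut | rev pipeline only keeps the last path component of the instance group URL. As a sketch of an alternative, shell parameter expansion does the same job without extra processes. The helper name group_from_url is hypothetical. Since instanceGroupUrls is a list, the usage comment picks the first entry with [0]; a cluster with several node pools would have several entries.

```shell
# group_from_url is a hypothetical helper: it prints the last path
# component of an instance group URL, which is the group's name.
group_from_url() {
  # ${1##*/} strips the longest prefix that ends in "/"
  printf '%s\n' "${1##*/}"
}

# Usage (not run here; assumes gcloud and jq are installed and configured):
#   url=$(gcloud container clusters describe jcluster --format json \
#           | jq --raw-output '.instanceGroupUrls[0]')
#   export GROUP_ID=$(group_from_url "$url")
```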
Let's check my instances:
$ gcloud compute instances list
NAME                                     ZONE           MACHINE_TYPE   PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP     STATUS
gke-jcluster-default-pool-9cc4e660-rx9p  us-central1-b  n1-standard-1               10.128.0.2   18.104.22.168   RUNNING
gke-jcluster-default-pool-9cc4e660-xr4z  us-central1-b  n1-standard-1               10.128.0.4   22.214.171.124  RUNNING
If I just run gcloud compute instances delete, that won't work! That's because my instance group is managed and has a target size of 2, so if I delete one of the machines directly, GCE will start a new one to replace it. Instead, I have to use the gcloud compute instance-groups managed delete-instances command, which removes the instance and shrinks the group, followed by gcloud compute instance-groups managed wait-until-stable if I want to wait until the job is done.
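For scripting, the two commands can be chained so that the wait only starts if the delete request was accepted. This is a minimal sketch; the function name remove_instance is hypothetical, and it assumes a default zone is already configured for gcloud, as in the commands in this post.

```shell
# remove_instance is a hypothetical helper.
#   $1: managed instance group name (e.g. the value of $GROUP_ID)
#   $2: name of the instance to delete
remove_instance() {
  # Ask the group manager to delete the instance and shrink the group,
  # then block until the group reports no pending operations.
  gcloud compute instance-groups managed delete-instances "$1" --instances="$2" &&
    gcloud compute instance-groups managed wait-until-stable "$1"
}

# Usage (not run here):
#   remove_instance "$GROUP_ID" gke-jcluster-default-pool-9cc4e660-rx9p
```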
Let's see what that looks like:
$ gcloud compute instance-groups managed delete-instances $GROUP_ID --instances=gke-jcluster-default-pool-9cc4e660-rx9p
Updated [https://www.googleapis.com/compute/v1/projects/myhub-161019/zones/us-central1-b/instanceGroupManagers/gke-jcluster-default-pool-9cc4e660-grp].
---
baseInstanceName: gke-jcluster-default-pool-9cc4e660
creationTimestamp: '2017-03-25T02:52:22.040-07:00'
currentActions:
  abandoning: 0
  creating: 0
  creatingWithoutRetries: 0
  deleting: 1
  none: 1
  recreating: 0
  refreshing: 0
  restarting: 0
fingerprint: kUg7ggCEudY=
id: '6475008099735012154'
instanceGroup: gke-jcluster-default-pool-9cc4e660-grp
instanceTemplate: gke-jcluster-default-pool-9cc4e660
kind: compute#instanceGroupManager
name: gke-jcluster-default-pool-9cc4e660-grp
selfLink: https://www.googleapis.com/compute/v1/projects/myhub-161019/zones/us-central1-b/instanceGroupManagers/gke-jcluster-default-pool-9cc4e660-grp
targetSize: 1
zone: us-central1-b
$ gcloud compute instance-groups managed wait-until-stable $GROUP_ID
Waiting for group to become stable, current operations: deleting: 1
...
Waiting for group to become stable, current operations: deleting: 1
Group is stable
And indeed, we only have one node left now:
$ kl get nodes
NAME                                      STATUS    AGE
gke-jcluster-default-pool-9cc4e660-xr4z   Ready     2d
$ gcloud compute instances list
NAME                                     ZONE           MACHINE_TYPE   PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP     STATUS
gke-jcluster-default-pool-9cc4e660-xr4z  us-central1-b  n1-standard-1               10.128.0.4   126.96.36.199   RUNNING
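In a script, the same check can be made without reading the table by relying on kubectl's exit status. This is a sketch only; node_gone is a hypothetical helper name, and it again assumes kl stands for kubectl.

```shell
# node_gone is a hypothetical helper: it succeeds when looking up the
# named node fails, i.e. when kubectl get exits with a non-zero status.
node_gone() {
  ! kubectl get node "$1" >/dev/null 2>&1
}

# Usage (not run here):
#   node_gone gke-jcluster-default-pool-9cc4e660-rx9p && echo "node removed"
```

Note that kubectl also exits non-zero when it cannot reach the cluster at all, so a success here only means the lookup failed, not why.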
So that's it. Even if you never need to delete an individual node yourself, it's interesting to see how it can be done.