In this post, I'll describe how to remove a particular node from a Kubernetes cluster on GKE. Why would you want to do that? In my case, I'm running JupyterHub and need to remove nodes as part of implementing cluster scaling. That's probably a rare need, but it helped me understand more about the GCE structures behind a Kubernetes cluster.
So let's start. The first thing you need to do is:
Drain your node
Let's look at my nodes (kl is my alias for kubectl):

```
$ kl get nodes
NAME                                      STATUS    AGE
gke-jcluster-default-pool-9cc4e660-rx9p   Ready     1d
gke-jcluster-default-pool-9cc4e660-xr4z   Ready     2d
```
I want to remove rx9p. I'll first drain it:
```
$ kl drain gke-jcluster-default-pool-9cc4e660-rx9p --force
node "gke-jcluster-default-pool-9cc4e660-rx9p" cordoned
error: pods with local storage (use --delete-local-data to override): jupyter-petko-1
```
Note that the first attempt actually fails: the jupyter-petko-1 pod uses local storage, so kubectl drain refuses to evict it. Re-running the same command with the --delete-local-data flag (as the error message suggests) drains the node. With the node drained, next is:
Removing the GCE VM
The nodes of your Kubernetes cluster run in a GCE managed instance group, so we first need to find out which group that is. Here's how to do it from the command line:
```
$ export GROUP_ID=$(gcloud container clusters describe jcluster --format json | jq --raw-output '.instanceGroupUrls[0]' | rev | cut -d'/' -f 1 | rev)
$ echo $GROUP_ID
gke-jcluster-default-pool-9cc4e660-grp
```
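As an aside, the rev | cut | rev dance just extracts the last path component of the instance group URL. A small sketch of the same idea using only shell parameter expansion (with the URL hard-coded from the output above, instead of calling gcloud):

```shell
# A made-up URL of the shape that `gcloud container clusters describe`
# returns in instanceGroupUrls.
URL="https://www.googleapis.com/compute/v1/projects/myhub-161019/zones/us-central1-b/instanceGroupManagers/gke-jcluster-default-pool-9cc4e660-grp"

# ${URL##*/} strips everything up to and including the last '/'.
GROUP_ID="${URL##*/}"
echo "$GROUP_ID"   # gke-jcluster-default-pool-9cc4e660-grp
```

This avoids the double rev, though the jq pipeline above works just as well.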
Let's check my instances:
```
$ gcloud compute instances list
NAME                                      ZONE           MACHINE_TYPE   PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP      STATUS
gke-jcluster-default-pool-9cc4e660-rx9p   us-central1-b  n1-standard-1               10.128.0.2   188.8.131.52     RUNNING
gke-jcluster-default-pool-9cc4e660-xr4z   us-central1-b  n1-standard-1               10.128.0.4   184.108.40.206   RUNNING
```
If I just run gcloud compute instances delete, that won't work! The nodes belong to a managed instance group with a target size of 2, so if I delete one of the machines directly, GCE will simply start a replacement to get back to that size. Instead, I have to use the gcloud compute instance-groups managed delete-instances command, which deletes the instance through the group and shrinks the target size at the same time, followed by gcloud compute instance-groups managed wait-until-stable if I want to wait until the job is done.
Let's see how that looks:
```
$ gcloud compute instance-groups managed delete-instances $GROUP_ID --instances=gke-jcluster-default-pool-9cc4e660-rx9p
Updated [https://www.googleapis.com/compute/v1/projects/myhub-161019/zones/us-central1-b/instanceGroupManagers/gke-jcluster-default-pool-9cc4e660-grp].
---
baseInstanceName: gke-jcluster-default-pool-9cc4e660
creationTimestamp: '2017-03-25T02:52:22.040-07:00'
currentActions:
  abandoning: 0
  creating: 0
  creatingWithoutRetries: 0
  deleting: 1
  none: 1
  recreating: 0
  refreshing: 0
  restarting: 0
fingerprint: kUg7ggCEudY=
id: '6475008099735012154'
instanceGroup: gke-jcluster-default-pool-9cc4e660-grp
instanceTemplate: gke-jcluster-default-pool-9cc4e660
kind: compute#instanceGroupManager
name: gke-jcluster-default-pool-9cc4e660-grp
selfLink: https://www.googleapis.com/compute/v1/projects/myhub-161019/zones/us-central1-b/instanceGroupManagers/gke-jcluster-default-pool-9cc4e660-grp
targetSize: 1
zone: us-central1-b

$ gcloud compute instance-groups managed wait-until-stable $GROUP_ID
Waiting for group to become stable, current operations: deleting: 1
...
Waiting for group to become stable, current operations: deleting: 1
Group is stable
```
And indeed, we only have one node left now:
```
$ kl get nodes
NAME                                      STATUS    AGE
gke-jcluster-default-pool-9cc4e660-xr4z   Ready     2d

$ gcloud compute instances list
NAME                                      ZONE           MACHINE_TYPE   PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP      STATUS
gke-jcluster-default-pool-9cc4e660-xr4z   us-central1-b  n1-standard-1               10.128.0.4   220.127.116.11   RUNNING
```
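The whole procedure can be tied together in one small function. This is a sketch, not something I've battle-tested: the default cluster name jcluster comes from this post, the --ignore-daemonsets flag is an extra I'd add, and setting DRY_RUN=1 makes it print the commands instead of running them.

```shell
# Sketch: drain a node, then delete its VM through the managed instance
# group. Assumptions: cluster name defaults to "jcluster";
# --ignore-daemonsets is my addition. DRY_RUN=1 prints commands only.
remove_node() {
  node="$1"
  cluster="${CLUSTER:-jcluster}"

  run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "+ $*"; else "$@"; fi
  }

  # 1. Drain the node so its pods are rescheduled elsewhere.
  run kubectl drain "$node" --force --delete-local-data --ignore-daemonsets

  # 2. Find the managed instance group behind the node pool.
  if [ "${DRY_RUN:-0}" = 1 ]; then
    group="GROUP_ID"  # placeholder; looked up live otherwise
  else
    group=$(gcloud container clusters describe "$cluster" --format json \
      | jq --raw-output '.instanceGroupUrls[0]' | rev | cut -d'/' -f 1 | rev)
  fi

  # 3. Delete the VM through the group (so GCE shrinks the group instead
  #    of replacing the machine), then wait for the resize to finish.
  run gcloud compute instance-groups managed delete-instances "$group" \
    --instances="$node"
  run gcloud compute instance-groups managed wait-until-stable "$group"
}
```

With DRY_RUN=1, remove_node gke-jcluster-default-pool-9cc4e660-rx9p just prints the three commands it would run, which is handy for checking the plan before doing it for real.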
So that's it. Even if you never need to delete individual nodes yourself, it's interesting to see how the GCE machinery behind a GKE cluster fits together.