Removing a node from a Kubernetes cluster on GKE (Google Container Engine)

In this post, I'll describe how to remove a particular node from a Kubernetes cluster on GKE. Why would you want to do that? In my case, I'm running JupyterHub and need to remove specific nodes as part of implementing cluster scaling. That's probably a rare need, but working through it taught me a lot about the GCE structures behind a Kubernetes cluster.

So let's start. The first thing you need to do is:

Drain your node

Let's look at my nodes (kl is my shell alias for kubectl):

$ kl get nodes
NAME                                      STATUS    AGE
gke-jcluster-default-pool-9cc4e660-rx9p   Ready     1d
gke-jcluster-default-pool-9cc4e660-xr4z   Ready     2d

I want to remove rx9p. I'll first drain it:

$ kl drain gke-jcluster-default-pool-9cc4e660-rx9p  --force
node "gke-jcluster-default-pool-9cc4e660-rx9p" cordoned
error: pods with local storage (use --delete-local-data to override): jupyter-petko-1

Not so fast: the node was cordoned, but the drain itself failed because a pod is using local storage. Since I don't mind losing that data, I'll rerun the command with --delete-local-data:

$ kl drain gke-jcluster-default-pool-9cc4e660-rx9p --force --delete-local-data

With that, the node is drained. Next is:

Removing the GCE VM

Your Kubernetes cluster's nodes run in a managed instance group. We'll need to know which group that is; here's how to find it from the command line.

$ export GROUP_ID=$(gcloud container clusters describe jcluster --format json | jq  --raw-output '.instanceGroupUrls[0]' | rev | cut -d'/' -f 1 | rev)
$ echo $GROUP_ID
gke-jcluster-default-pool-9cc4e660-grp
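As an aside, the rev | cut | rev dance just takes the last path component of the instance-group URL; basename does the same thing in one step. A quick sketch, using the URL from the output above:

```shell
# The instance-group URL as returned by gcloud (taken from the example above)
URL="https://www.googleapis.com/compute/v1/projects/myhub-161019/zones/us-central1-b/instanceGroupManagers/gke-jcluster-default-pool-9cc4e660-grp"

# basename strips everything up to and including the last '/',
# which is exactly what rev | cut -d'/' -f 1 | rev does
GROUP_ID=$(basename "$URL")
echo "$GROUP_ID"   # gke-jcluster-default-pool-9cc4e660-grp
```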

Let's check my instances:

$ gcloud compute instances list
NAME                                     ZONE           MACHINE_TYPE   PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP      STATUS
gke-jcluster-default-pool-9cc4e660-rx9p  us-central1-b  n1-standard-1               10.128.0.2   104.198.174.222  RUNNING
gke-jcluster-default-pool-9cc4e660-xr4z  us-central1-b  n1-standard-1               10.128.0.4   104.197.237.135  RUNNING

If I just run gcloud compute instances delete, that won't work! The node is part of a managed instance group with a target size of 2, so if I delete one of the machines directly, GCE will start a new one to replace it. Instead, I have to use the gcloud compute instance-groups managed delete-instances command, which also shrinks the group's target size, followed by gcloud compute instance-groups managed wait-until-stable if I want to wait until the job is done.

Let's see how that looks:

$ gcloud compute instance-groups managed delete-instances $GROUP_ID --instances=gke-jcluster-default-pool-9cc4e660-rx9p
Updated [https://www.googleapis.com/compute/v1/projects/myhub-161019/zones/us-central1-b/instanceGroupManagers/gke-jcluster-default-pool-9cc4e660-grp].
---
baseInstanceName: gke-jcluster-default-pool-9cc4e660
creationTimestamp: '2017-03-25T02:52:22.040-07:00'
currentActions:
  abandoning: 0
  creating: 0
  creatingWithoutRetries: 0
  deleting: 1
  none: 1
  recreating: 0
  refreshing: 0
  restarting: 0
fingerprint: kUg7ggCEudY=
id: '6475008099735012154'
instanceGroup: gke-jcluster-default-pool-9cc4e660-grp
instanceTemplate: gke-jcluster-default-pool-9cc4e660
kind: compute#instanceGroupManager
name: gke-jcluster-default-pool-9cc4e660-grp
selfLink: https://www.googleapis.com/compute/v1/projects/myhub-161019/zones/us-central1-b/instanceGroupManagers/gke-jcluster-default-pool-9cc4e660-grp
targetSize: 1
zone: us-central1-b
$ gcloud compute instance-groups managed wait-until-stable $GROUP_ID
Waiting for group to become stable, current operations: deleting: 1
...
Waiting for group to become stable, current operations: deleting: 1
Group is stable

And indeed, we only have one node left now:

$ kl get nodes
NAME                                      STATUS    AGE
gke-jcluster-default-pool-9cc4e660-xr4z   Ready     2d
$ gcloud compute instances list
NAME                                     ZONE           MACHINE_TYPE   PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP      STATUS
gke-jcluster-default-pool-9cc4e660-xr4z  us-central1-b  n1-standard-1               10.128.0.4   104.197.237.135  RUNNING
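Putting it all together, the whole procedure can be wrapped in a small shell function. This is just a sketch: it assumes kubectl and gcloud are installed and authenticated against the right cluster and project, that the node and group names are passed in as arguments, and that losing local pod data is acceptable.

```shell
# Drain a node, then remove its VM through the managed instance group.
# Sketch only: remove_node is a hypothetical helper, not a gcloud command.
remove_node() {
  local node=$1 group=$2
  # Cordon the node and evict its pods (local pod data is discarded)
  kubectl drain "$node" --force --delete-local-data
  # Delete the VM via the instance group so the target size shrinks too
  gcloud compute instance-groups managed delete-instances "$group" \
      --instances="$node"
  # Block until the group has finished deleting the instance
  gcloud compute instance-groups managed wait-until-stable "$group"
}

# Usage (names from the example above):
# remove_node gke-jcluster-default-pool-9cc4e660-rx9p gke-jcluster-default-pool-9cc4e660-grp
```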

So that's it. Even if you never need to delete individual nodes, it's worth knowing how the GCE instance groups behind a GKE cluster behave.
