A visual guide on troubleshooting Kubernetes deployments

Cantaro86 | 240 points

Note that this flowchart includes monitoring, logging, and networking, in addition to typical app stuff. Of course it's complicated.

The article does contain a lot of useful information. I don't think it was intended to ride the "k8s is too complicated compared to vanilla X" bandwagon, despite the huge flowchart at the top.

organsnyder | 4 years ago

I like how it has quite a few boxes that say "no idea what the reason might be". Each one of them hides another chart as big as this one, if not bigger.

orthoxerox | 4 years ago

Fun fact, on GCP Kubernetes you can have green lightbulbs on every single dashboard, and your entire site can be down anyway.

Our CI/CD was leaking "review" deployments. I forgot about them until one day I upgraded a node and the entire site went down, even though everything was green. It turned out there is some sort of maximum number of nginx entries in an ingress, and we were hitting it. That was some frantic debugging; the solution was just to delete the spurious review deployments.

tinco | 4 years ago

I enjoyed petulantly answering every question with “no” and then finding the final state to be “Consult Stackoverflow”

nathanwh | 4 years ago

Is there a way to prevent Kubernetes from killing and restarting a pod (from a deployment) when you are debugging it with kubectl exec -it? I.e., to inform Kubernetes that I am using this pod and it shouldn't restart it automatically.

nodesocket | 4 years ago

Haven't had to work with k8s yet, but that flowchart looks really detailed. It must have taken a lot of effort to make, which makes me wonder how long it will remain current. Six months down the line, will it have become invalid in subtle ways, so that it only adds to the confusion?

billfruit | 4 years ago

What about kubectl randomly hanging and some PLEG errors in the kubelet log?

de_watcher | 4 years ago

Looking at this I think it would be really cool for someone to build a Wheel of Misfortune to help dry run some of these scenarios for debugging.

gravypod | 4 years ago

Tickles my funny bone every time I see the industry jump on a new bandwagon that starts simple and then inevitably adapts to the real world that the old technology was addressing. When are we gonna learn, folks?

Props to the author for the chart. He's actually providing real value to all the suckers, err, SREs stuck dealing with this stuff.

fheyfhth14353 | 4 years ago

Yikes

jtdev | 4 years ago

I guess I can just print this out and show it to people any time somebody wants to move to k8s.

StreamBright | 4 years ago