Cloud Native Rejekts NA (Los Angeles + Hybrid) 2021

Troubleshooting Kubernetes CRDs is too damn hard
10-09, 16:30–17:00 (US/Pacific), Main stage

When something to do with your CRDs breaks, how many kubectl commands do you need to understand what’s happened?

This talk argues that “how many kubectls?” is a key metric that captures the usability of any set of CRDs, and suggests guidelines CRD authors can use to keep that number low and improve the UX of their CRDs.


Here’s the situation. You’ve installed the latest piece of awesome Kubernetes tech, which comes with a set of CRDs. You set up everything you want, and then… nothing. What do you do? How do you figure out what’s gone wrong?

How many kubectl commands do you need to understand what’s happened? A lot of the time, this number is surprisingly large.

CRD authors can help by designing the “status” section of their CRDs with this metric in mind. Hear from a builder of CRDs about what has worked and what hasn’t. There’s a lot to learn from prior art in this area, from the murder mystery of troubleshooting Ingress to the far better experience that comes from following the Kubernetes API conventions and making good use of Conditions.

The talk finishes with guidelines you can use when building your own CRDs or reviewing other people’s CRDs for ease of use.

Nick has spent 20 years working to prevent the entropic downfall of systems, across Windows and Linux, datacenters and clouds, networking, storage and compute. He is currently a Staff Engineer at VMware and the tech lead on Contour, a CNCF Incubating project, where, in addition to his primary task of always having Simpsons quotes available, he works on improving the Ingress experience. He was a co-chair of the now-completed Kubernetes LTS working group. He spends his spare time with his young family, and with whatever's left he works on maintaining his jack-of-all-geeks card. Random fact: he took notes at university in Tengwar, Tolkien's elvish script.