In my experience of debugging large-scale systems, it always starts exactly the same way: with me staring at a bug asking, “howtf did that happen?”

This is a collection of deep dives into those exact moments. I will try to de-mystify the bugs that live at the deepest levels of the systems stack.

There is an immense, almost ridiculous pleasure in finally stripping away the magic and feeling that computers can actually be understood. Or, at least, I like to tell myself that until the next crash.


I’m Deep Shah. I’m currently building systems for agentic workflows at a stealth startup. Before this, I was at Meta working on large-scale pre-training infrastructure for Llama, collective communications libraries, and bringing up the GB200 training stack. Before Meta, I was at VMware on the Virtual Machine Monitor team, working on core virtualization.