Dev Therapy part II: Recoveries

Also check out Dev Therapy I: How to not get stuck (as a solo dev)

I’ve never really thought of myself as a programmer. Most of my experience over the last 20+ years has been either as a consultant or building my own tools and products.

Working as a consultant, I specialized in identifying and resolving performance issues for teams that were stuck or overwhelmed as they were scaling up.

The devs were often smarter than me. They definitely knew their code better than me. And they sometimes knew their frameworks better than me. But I had the advantage of seeing things with fresh eyes. And more importantly: because I was unencumbered by internal pressures like product deadlines, I had the patience to employ a process that had some amount of rigor.

When I built my own products in the past, programming was just the dirty means, not the end. I found feature dev boring and laborious. Ironically (given my day job) I didn’t have much patience for it.

Now that I’ve been working on a product in C++ for…. 3 years (!), this topic has been back on my mind. I fail a LOT. I refactor a LOT.

And that can be disheartening. Failures cascade. I can move into a mental space that catastrophizes these failures: I’ll never get to release. I’ll never be able to make something at the quality level I want. I’m not good enough or fast enough or resilient enough to make it.

Luckily I’m not in that mental space often. But there are lows and highs. Especially in those lows, I’ve found it valuable to dig a bit deeper to ensure I recover more gracefully from failure.

Recoveries

When failure happens, I’ve been trying to take a step back and be mindful of it as a part of the process.

It helps me to think about it generally. This is the koolaid I’ve been selling myself: Recovering from failure is an underrated but critical skill of almost every human endeavor.

Recovering can be defined as “playing through mistakes” or “minimizing the impact of a mistake” or “transmogrifying failure to success.”

Or the visceral “get back up and try again.”

Multiple failures in a row can be disheartening. It can be something I laugh off. Something to learn from. Definitely depends on my mental state.

Recoveries in Music Performance

Any beginner learning a musical instrument tends to get tripped up and stop playing when they make a mistake. Surprise, disappointment and frustration are normal. Flow state is guaranteed to be interrupted.

Learning how to “play through” those mistakes involves making so many mistakes that a mistake is no longer an “event.”

When it happens, it’s no longer a surprise. It’s a natural part of practicing, a natural part of performing. It’s informational (need to practice your scales again, or work on those arpeggios). A fact of life.

Getting to this level requires honing the skill to observe and correct oneself (by “landing” or transitioning to a “good” note) in real time, without interrupting the flow state.

An experienced Jazz improviser might no longer consider a wrong note to be “wrong”. It can be re-contextualized as “interesting” or “innovative,” an opportunity to explore a new direction.

Recoveries in Sports

In skateboarding, learning to fall is critical.

One could argue recovery is the most important skateboarding skill to improve in the long term. Injuries set people back for months or years.

In eSports, another real-time performance domain, recoveries are critical:

If you are able to recover, you can go for riskier things and be able to get back without conceding. It really opens up your possibilities

JOREUZ, professional Rocket League player

Recoveries in Dev

Dev isn’t “real-time mechanical” like musical or athletic performance. It’s not imperative that you mechanically type class or def with ripping speed and crystal clean accuracy.

But I’m convinced the human brain is subject to the same tendencies when learning and applying our skillset.

Our days are spent trying to convince a computer to do something. Often the computer says no. Our job is to work around known and unknown limitations.

We are almost always learning new things, hitting roadblocks, getting stuck. We have dedicated tools, like Stack Overflow and GitHub Copilot, to help us jump these hurdles.

In other words, we live in a world of trial and error.

Our ability to recover from that error defines both our success and the amount of time we sink into the trials.

Dev Recoveries

In dev, I think of recovering as “the actions you take when you hit a problem.”

A debugging example: You are building a feature. In order to build it, you need to integrate a new-to-you library or technique. This seems straightforward…

Until you hit your first problem. An esoteric error. A lack of familiarity with the tool means you don’t even know where your problem is.

It could be a problem with your mental model (how you thought the software would work), a mechanical issue (typo), an ecosystem issue (version mismatch), or a bug with the framework (unlikely if you are a beginner).

How do you proceed? How do you make sure you don’t lose hours or days down the rabbit hole, ending in frustration or worse, giving up on something entirely?

Employing a bit of rigor

Here’s the general flow:

  1. Expect problems to arise
  2. When they do, identify that it’s time to “recover”
  3. Refine your ability to recover efficiently from various types of problems to reduce future impact

This isn’t a novel idea. Any process with some amount of rigor and repeatability is likely to outperform the lazy and messy “poke at it with a stick for a few hours.” Although that can be fun too.

Recovering well is more about remembering the cheesy adage that failure presents an opportunity for improvement.

I don’t always remember right away. Sometimes I “wake up” and realize I’ve been stick-poking for an hour before I catch myself and start to employ some rigor.

Example Dev Recovery Workflow

Here’s the process I used as a consultant, debugging complex system performance:

1. Identify that a clear particular problem exists. Sometimes we can just plow forward, not really consciously admitting there’s a roadblock, temporarily working around it, etc. Explicitly giving a name to the problem (or problems!!) is step 1.

2. Gather data about when the problem exists. Problems are often fuzzy. Something’s not working right. Identifying when the problem occurs is critical to understanding the issue. For example, maybe a web app only goes down during high traffic periods or only at midnight. Or your audio plugin crashes, but only when loading presets from old versions.

3. Gather data about where the problem could exist. This is often answered by the “when.” For the audio plugin example, it’s obvious there’s a problem in the preset loading logic. Ideally at this point, explicit reproduction of the issue is possible (though sometimes it’s never possible in complex systems).

4. State the problem out loud and try to reason through potential solutions. This is rubber duck debugging. For the audio plugin example: perhaps some new parameters were introduced, some old parameters were removed, some parameter was renamed, some parameter range was modified, or the serialization library was updated! This can be the fun part: you know the codebase well, you can place bets for which solution is most likely.

5. Clearly eliminate hypotheses, one at a time. This can be attempting quick fixes, stepping through the debugger (for C++), reviewing recent commits, or sure, just removing large chunks of code.

6. Once a quick fix is found, assess what the ideal fix would be. Yes, this implies that the “quick fix” might not be sustainable. It may leave you with tech debt or in a precarious situation. This is where tests come in. A refactor might be necessary. If a big refactor is needed, it might make sense to decide to decide later.

7. Clearly document for future-you / your team what the issue was that led to the fix. Whether it be a code comment, a commit message or on an issue tracker, documentation is the valuable fruit of your failure. Communicate it forward before it becomes a fuzzy memory (well I think it was something to do with the…).
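To make the audio plugin example concrete: the hypotheses from step 4 (new, removed, renamed or re-ranged parameters) can each map to one explicit branch in the loading code. What follows is only a minimal sketch under stated assumptions. `Preset`, `ParamSpec` and `loadOldPreset` are hypothetical names invented for illustration, not part of any real plugin framework, and a real plugin would likely store presets in a richer format than a name-to-float map.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>

// Hypothetical types for illustration: a preset is just parameter name -> value.
using Preset = std::map<std::string, float>;

struct ParamSpec
{
    float min;
    float max;
    float defaultValue;
};

// Loads values saved by an older version into the current parameter set.
// Each branch handles one of the hypotheses from step 4.
Preset loadOldPreset(const Preset& saved,
                     const std::map<std::string, ParamSpec>& currentParams,
                     const std::map<std::string, std::string>& renamed)
{
    Preset loaded;

    // New parameters: start from defaults so anything the old preset
    // never knew about still has a valid value.
    for (const auto& [name, spec] : currentParams)
    {
        assert(spec.min <= spec.max);
        loaded[name] = spec.defaultValue;
    }

    for (const auto& [savedName, value] : saved)
    {
        // Renamed parameters: translate the old name to the current one.
        const auto r = renamed.find(savedName);
        const std::string& name = (r != renamed.end()) ? r->second : savedName;

        // Removed parameters: skip values that no longer have a home.
        const auto spec = currentParams.find(name);
        if (spec == currentParams.end())
            continue;

        // Modified ranges: clamp old values into the current range.
        loaded[name] = std::clamp(value, spec->second.min, spec->second.max);
    }

    return loaded;
}
```

Keeping one branch per hypothesis means each can get its own test, and the comments double as the documentation from step 7.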

Recovering well means doing a bit more

Whether it be jazz, skateboarding or dev, the general idea is to invest more, inline, at the point of failure. It requires employing hygiene, some process…

At minimum, it requires understanding the context around the failure.

This can be tough! You just spent 2 days banging your head against the wall feeling like an idiot. The last thing you want to do is work harder or read those damn API docs one more time.

But it probably just means another 30 minutes cleaning up, learning the topic a bit more deeply, thinking through the strategies you employed, noting what to do and what not to do next time. Oh, and documentation!

This extra effort inline makes all the difference. Future-you and/or your teammates will thank you.

Limiting trial and error

As a consultant, it was clear that not only are most of us hoping to “poke it with a stick,” but we also want to “throw random quick fixes at the problem.”

If there are 4 hypotheses for a bug, we’ll work on all 4 fronts at the same time and perhaps even commit 2 or 3 “fixes” in blind hopes the problem will be solved.

Codebase complexity aside, this eliminates learning opportunities from the failure. Which is a bummer! If the problem is fixed, yay! But we’re still robbed of a clear understanding of why it was fixed and how to prevent future failure. If it wasn’t fixed, a new concoction of “quick fixes” might be cooked up, putting us in a spot where we can’t clearly eliminate individual hypotheses.

This is subtle, but important: when problems don’t have clear resolutions, they all tend to feel “unique”: we lose the ability to generalize about the codebase, our techniques, processes or mental models. We don’t really learn that much from resolving them.

So eliminating one hypothesis at a time is critical and is probably the biggest difference-maker when recovering from bugs and issues.
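As a rough illustration of "one hypothesis at a time," here is a small sketch that applies each candidate fix in isolation, checks whether the bug still reproduces, and then reverts before moving on. Everything in it (`Hypothesis`, `Result`, `testOneAtATime`, and the idea that a fix can be applied and reverted through callbacks) is a hypothetical assumption made for the example. In practice the "apply" and "revert" steps are usually manual edits or git operations rather than function calls.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical description of one candidate explanation for a bug.
struct Hypothesis
{
    std::string description;
    std::function<void()> applyFix;  // applies only this candidate fix
    std::function<void()> revertFix; // restores the original state
};

struct Result
{
    std::string description;
    bool fixedTheBug;
};

// Tries each fix alone against the same baseline, so every result
// answers exactly one question.
std::vector<Result> testOneAtATime(const std::vector<Hypothesis>& hypotheses,
                                   const std::function<bool()>& bugReproduces)
{
    // Without a reliable reproduction at baseline, no result below means anything.
    assert(bugReproduces());

    std::vector<Result> results;
    for (const auto& h : hypotheses)
    {
        h.applyFix();
        results.push_back({h.description, !bugReproduces()});
        h.revertFix(); // back to baseline so the next hypothesis is tested alone
    }
    return results;
}
```

The point is the shape rather than the code: one change, one check, one recorded answer, and a clean baseline in between.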

Something I’m failing at when working solo these days: Committing every “working” WIP state to git. Sometimes I break something, I look at my working tree and it turns out I went the whole day without committing! I have 2 half-features and 5 bug fixes peppered around the code base — a lot of extra mental overhead when looking for the where of the problem.

Automating “trial and error”

Another way to recover well is to automate trial and error so you only become involved in more difficult cases. (Yeah, tests!)

I’ve consulted for folks who just add tests alongside random quick fixes but aren’t able to say why something broke or why it was fixed. I’ve done it before. However, a good suite will enumerate through expected behaviors and help clarify the mental model, making the why easier to obtain.

Although less prevalent in the Digital Signal Processing world, I’ve found tests absolutely crucial and underrated. The entire output of my program is just a stream of float data! Ideal for tests.
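Because the output is only a buffer of floats, a test can render a block and compare it to a reference within a tolerance. Below is a minimal sketch of that idea, not the synth's actual test code: `renderSine` is a hypothetical stand-in for a real render call, `buffersMatch` is an illustrative helper, and the default tolerance of `1e-5f` is an arbitrary assumption.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a synth's render call: fills a buffer with a sine wave.
std::vector<float> renderSine(float frequencyHz, float sampleRate, std::size_t numSamples)
{
    assert(sampleRate > 0.0f);
    constexpr double twoPi = 6.283185307179586;
    std::vector<float> out(numSamples);
    for (std::size_t i = 0; i < numSamples; ++i)
        out[i] = static_cast<float>(
            std::sin(twoPi * frequencyHz * static_cast<double>(i) / sampleRate));
    return out;
}

// True when both buffers have the same length and every pair of samples
// differs by at most `tolerance`. A NaN in either buffer counts as a mismatch.
bool buffersMatch(const std::vector<float>& actual,
                  const std::vector<float>& expected,
                  float tolerance = 1e-5f)
{
    if (actual.size() != expected.size())
        return false;
    for (std::size_t i = 0; i < actual.size(); ++i)
        if (!(std::fabs(actual[i] - expected[i]) <= tolerance))
            return false;
    return true;
}
```

A test would render a known scenario, compare it against a stored or freshly computed reference with `buffersMatch`, and fail as soon as a change shifts any sample by more than the tolerance.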

This synthesizer I’m building is one of the most complex things I’ve ever built. It would be so much more difficult to identify or debug changing behavior without tests. Plus now I can launch a debugger from a test to step through myriad different scenarios, which helps reproduction.

Micro-recoveries!

This is the subtle but actual impetus for this post.

Sometimes my reaction when developing is to flinch if I hit an error state (for example, not being able to compile).

Or maybe my heart drops a bit when I look at a section of code that is more fleshed out in my mind than in my IDE.

Or maybe I’m using my product (aka making music, lol) and a bug annoys me and bumps me out of my flow state.

All perfectly normal reactions! But I’ve been thinking lately that those very quick reactions and impulses are the key to all of this. They can build up and make development feel heavy, like a chore. Like a mountain of work is sitting in front of me, unchanging no matter how much effort is spent.

Or…. those moments can be seen as a factual part of the process. I can “play through” the bugs and failure states, each time learning a bit more about how to recover from them more gracefully.


Responses

  1. Kris Keillor

    Thank you Sudara, great reminders and a wonderful detailed perspective on the SMART steps to take when recovering from a bug (or other challenge). This is the kind of article that is begging to be an infographic. Maybe you can make it in Melatonin Blur ;P

    1. sudara

      Thanks Kris! Infographic is a great idea!!

  2. Eloi

    As a graduate musician who works as a freelance software developer and has skateboarding as a hobby (it blew my mind when I saw the subtitle of the article…), I can totally relate from beginning to end.
    Specifically, I’m right in the process of steering my software career into audio development, so I need to be ready to handle a lot of failures in the near future… your article felt just perfect at this moment.

    Thanks for your words, I’ll take the #micro-recoveries hashtag with me as a reminder.

  3. Jim

    Thanks for writing this! I needed the reminder about this perspective. I’m running into a people problem, rather than a technical problem, at work, and started to feel myself catastrophizing. Maybe something can be learned here, though.
