Fail fast, fix fast

I recently listened to a podcast about Elon Musk that quoted him as saying something like, "If 20% of attempts aren't failing, you aren't taking enough risk".

In a software context, I'm not advocating that one in five production releases should fail, but I like trying new ideas and approaches.

If you're releasing small changes regularly or practising continuous deployment, changes are easy to revert if there's a problem. And the smaller the deployment and the more recently the code was written, the easier it should be to resolve the issue and "fix forward" instead of rolling back.

Feature flags let you quickly turn off a feature while you investigate and resolve the issue, without needing another deployment.
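As a rough illustration, here's a minimal Python sketch of the idea. It assumes flags are read at runtime from a hypothetical in-memory store; in a real application they'd more likely live in configuration, a database or a feature-flag service. The names `FLAGS`, `is_enabled`, `set_flag` and `render_checkout` are made up for this example.

```python
# Hypothetical runtime flag store. In a real system this would be read from
# configuration, a database or a feature-flag service, so it can be changed
# without a new deployment.
FLAGS: dict[str, bool] = {"new_checkout": True}


def is_enabled(flag: str) -> bool:
    """Return whether a flag is on; unknown flags default to off."""
    return FLAGS.get(flag, False)


def set_flag(flag: str, enabled: bool) -> None:
    """Toggle a flag at runtime (for example, from an admin screen)."""
    FLAGS[flag] = enabled


def render_checkout() -> str:
    # Both code paths are deployed; the flag decides which one runs.
    if is_enabled("new_checkout"):
        return "new checkout"
    return "old checkout"


if __name__ == "__main__":
    print(render_checkout())  # new checkout
    # An issue is reported: switch the feature off while investigating.
    set_flag("new_checkout", False)
    print(render_checkout())  # old checkout
```

The key design point is that both the old and new code paths ship together, so turning a feature off is a data change rather than a code change.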

Having an appropriate plan to follow when something goes wrong mitigates the risk and minimises the impact of any issue, making it quicker to resolve it and restore the service.

Two of the four DORA metrics refer to failure rate and restoration time:

  • Deployment frequency
  • Lead time for changes
  • Change failure rate
  • Time to restore service
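To make the last two metrics more concrete, here's a minimal Python sketch that calculates a change failure rate and a mean time to restore from a hypothetical deployment history. The `Deployment` record and its fields are assumptions for illustration, and it simplifies by treating the deployment time as the start of the incident, whereas in practice you'd measure from when the incident actually started.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Deployment:
    """Hypothetical record of one production deployment."""
    deployed_at: datetime
    failed: bool = False
    # When service was restored after a failure (None if it didn't fail).
    restored_at: Optional[datetime] = None


def change_failure_rate(deployments: list[Deployment]) -> float:
    """Fraction of deployments that caused a failure in production."""
    if not deployments:
        return 0.0
    return sum(d.failed for d in deployments) / len(deployments)


def mean_time_to_restore(deployments: list[Deployment]) -> timedelta:
    """Average time from a failed deployment to service being restored.

    Simplification: the deployment time is used as the incident start.
    """
    durations = [
        d.restored_at - d.deployed_at
        for d in deployments
        if d.failed and d.restored_at is not None
    ]
    if not durations:
        return timedelta(0)
    return sum(durations, timedelta(0)) / len(durations)


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 9, 0)
    # Eight successful deployments, one per hour...
    history = [Deployment(start + timedelta(hours=h)) for h in range(8)]
    # ...followed by two that failed and were restored 10 and 30 minutes later.
    history.append(Deployment(start + timedelta(hours=8), failed=True,
                              restored_at=start + timedelta(hours=8, minutes=10)))
    history.append(Deployment(start + timedelta(hours=9), failed=True,
                              restored_at=start + timedelta(hours=9, minutes=30)))
    print(change_failure_rate(history))   # 0.2
    print(mean_time_to_restore(history))  # 0:20:00
```

Here, two failures out of ten deployments gives a 20% change failure rate, and restoring in 10 and 30 minutes averages out at 20 minutes.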

What counts as an acceptable failure rate then depends on your organisation's tolerance for risk.

But the more frequent the releases, the lower the failure rate and the quicker it will be to restore the service if there is an issue.

- Oliver


About me


I'm an Acquia-certified Drupal Triple Expert with 17 years of experience, an open-source software maintainer and Drupal core contributor, public speaker, live streamer, and host of the Beyond Blocks podcast.