Fail fast, fix fast

December 17th, 2023

I recently listened to a podcast that discussed Elon Musk and quoted something like, "If 20% of attempts aren't failing, you aren't taking enough risk".

In a software context, I'm not advocating that one in five production releases should fail, but I like trying new ideas and approaches.

If you're releasing small changes regularly or practising continuous deployment, changes are easy to revert if there's a problem or the smaller the deployment and the more recently the code was written, then it should be easier to resolve the issue and "fix forward" instead of rolling back.

Using feature flags lets you quickly turn off a feature flag while investigating and resolving the issue without needing another deployment.

If you have an appropriate plan to follow in the case of an issue, that mitigates the risk and minimises the impact of a potential issue - making it quicker to resolve and restore the service.

Two of the DORA metrics refer to failure rate and restoration time:

Deployment frequency
Lead time for changes
Change failure rate
Time to restore service

Then, it depends on your organisation's tolerance for risk and what's acceptable.

But, the more frequent the releases, the lower the failure rate and the quicker it will be to restore the service if there is an issue.

- Oliver

Was this interesting?

About me