AMP – A retrospective and post-mortem

It would be easy to blame the failure of AMP 2342 on it being the first update we tried to release after recovering from COVID and call it a day. In practice, several things went wrong, and the result was a messy release that needed cleaning up afterwards.

What was the actual problem?

The next ‘big’ AMP release was due to include new functionality controlling how AMP handles networking in Docker containers. Part of this was moving the host mode networking toggle from a global setting in ADS to a per-instance setting, with ADS merely controlling the default for new instances.

The problem was that host mode networking was enabled by default in ADS, whereas the new per-instance field defaulted to false. Newly created instances were unaffected because they were created with the setting enabled from the start.

Existing instances, however, were left with the new field's default of false rather than the value they had been running with. That gave them the wrong network configuration and prevented them from starting.

This only affected Docker-based instances on Linux. Non-Docker instances and Windows users were unaffected.
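To make the failure mode concrete, here is a minimal sketch in Python. It is not AMP's actual code: the class, the function names, and the `host_mode_networking` key are all hypothetical, and the sketch assumes that an instance saved by the previous version simply has no per-instance value stored. It contrasts a naive load, which falls back to the new field's default, with a migration-aware load, which falls back to the old global setting.

```python
from dataclasses import dataclass


@dataclass
class InstanceConfig:
    """Hypothetical per-instance settings (illustrative, not AMP's real schema)."""
    name: str
    use_docker: bool
    # The new per-instance field, false by default as described above.
    host_mode_networking: bool = False


def load_naive(saved: dict) -> InstanceConfig:
    """Load a saved instance, using the field default when the key is absent."""
    return InstanceConfig(
        name=saved["name"],
        use_docker=saved["use_docker"],
        host_mode_networking=saved.get("host_mode_networking", False),
    )


def load_with_migration(saved: dict, old_global_host_mode: bool) -> InstanceConfig:
    """Load a saved instance, inheriting the old global setting when the key is absent."""
    return InstanceConfig(
        name=saved["name"],
        use_docker=saved["use_docker"],
        host_mode_networking=saved.get("host_mode_networking", old_global_host_mode),
    )


# An instance saved by the previous version: no per-instance key exists yet.
legacy = {"name": "ExistingInstance", "use_docker": True}

broken = load_naive(legacy)
fixed = load_with_migration(legacy, old_global_host_mode=True)

print(broken.host_mode_networking)  # False: differs from what the instance ran with
print(fixed.host_mode_networking)   # True: matches the old global default
```

The point of the sketch is only that a new setting's default and the value that existing data should inherit are two separate decisions.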

How was this release able to pass testing?

The new functionality was covered by the existing tests that verify the creation of new instances, but no test looked at how instances carried over from the previous version were affected. The human error was in assuming that, because the existing tests passed, the new functionality was working as intended.

The tests that did verify upgrade scenarios did not take Docker into account.

How are we going to fix this?

We’re going to implement a number of new measures to prevent this kind of mistake from happening again:

  • The test suites are going to be updated to include more upgrade scenarios, taking the current mainline release and applying the version to be tested as an upgrade, and verifying that existing instances start correctly.
  • The existing test suites that cover upgrade scenarios will be updated to include additional tests for Docker-based instances.
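The two measures above amount to running an "upgrade, then start existing instances" check across each combination of platform and instance type. The following Python sketch shows one possible shape for that. Everything in it is an assumption for illustration: the scenario list, the function names, and `simulated_bad_release`, which stands in for a real harness that would install the current mainline release, create instances, apply the candidate version as an upgrade, and report whether the existing instances still start.

```python
from typing import Callable, List, Tuple

# A scenario is (platform, instance type). Hypothetical values for illustration.
Scenario = Tuple[str, str]

SCENARIOS: List[Scenario] = [
    ("linux", "docker"),
    ("linux", "native"),
    ("windows", "native"),
]


def failing_scenarios(
    scenarios: List[Scenario],
    upgrade_and_start: Callable[[str, str], bool],
) -> List[Scenario]:
    """Return the scenarios whose existing instances fail to start after the upgrade."""
    return [s for s in scenarios if not upgrade_and_start(*s)]


def simulated_bad_release(platform: str, instance_type: str) -> bool:
    """Stand-in for a real harness: mimics a release that breaks Docker on Linux."""
    return not (platform == "linux" and instance_type == "docker")


failures = failing_scenarios(SCENARIOS, simulated_bad_release)
print(failures)  # [('linux', 'docker')]
```

A suite that only contained the two non-Docker scenarios would report no failures for the same simulated release, which is the gap the second bullet is meant to close.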

We’re also looking to introduce an LTS release stream that is updated less frequently and receives extended testing. Users will be able to ‘downgrade’ to this release should any problems arise in Mainline.

We’d like to apologise for the inconvenience and thank you all for your patience as we sort this out. We’ve since released AMP which addresses the issues above, and also repairs existing instances to undo the bad configuration change.