ADS Randomly Shuts Down After Upgrade to 2.3.2.0

ScienceDad02 · 1 February 2022 19:36

OS Name/Version: Ubuntu 20.04 LXC (Proxmox 7.1)

Product Name/Version: 2.3.2.0

Problem Description:

I’m running 6 nodes: 1 front end for management/auth and 5 nodes for game servers that I host internally on my network. Of the 5, one specific node will not keep ADS01 running. For the life of me, I can’t figure out what’s sending the SIGUSR1 that shuts it down.

[19:05:02] [Core Info]            : OS: Linux / x86_64
[19:05:02] [Core Info]            : CPU: Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz (6C/0T)
[19:05:02] [Core Info]            : AMP Instance ID: aea0c3af-dbd5-4b24-abbe-342de6b37ce8
[19:05:02] [ModuleLoader Info]    : Loaded ADSModule version 1.0.0.0 by CubeCoders Limited
[19:05:02] [Core Info]            : Metrics publishing is enabled at udp://amp.lahni.com:12820.
[19:05:02] [ModuleLoader Info]    : Loaded FileManagerPlugin version 1.0.0.0 by CubeCoders Limited
[19:05:02] [ModuleLoader Info]    : Loaded EmailSenderPlugin version 1.0.0.0 by CubeCoders Limited
[19:05:02] [ModuleLoader Info]    : Loaded WebRequestPlugin version 1.0.0.0 by CubeCoders Limited
[19:05:02] [ModuleLoader Info]    : Loaded LocalFileBackupPlugin version 1.0.0.0 by CubeCoders Limited
[19:05:02] [ModuleLoader Info]    : Loaded CommonCorePlugin version 1.0.0.0 by CubeCoders Limited
[19:05:02] [ModuleLoader Info]    : ADSModule requests dependency InstanceManagerPlugin...
[19:05:03] [ModuleLoader Info]    : Loaded InstanceManagerPlugin version 1.0.0.0 by CubeCoders Limited
[19:05:03] [ModuleLoader Info]    : ADSModule requests dependency SystemUserManagerPlugin...
[19:05:03] [ModuleLoader Info]    : Loaded SystemUserManagerPlugin version 1.0.0.0 by CubeCoders Limited
[19:05:03] [ModuleLoader Info]    : Loaded steamcmdplugin version 1.0.0.0 by CubeCoders Limited
[19:05:03] [Core Info]            : Metrics server started OK on port 12820
[19:05:03] [FileManager Notice]   : Using keypair with fingerprint <Redacted>
[19:05:03] [FileManager Info]     : SFTP Server started on 0.0.0.0:2223
[19:05:03] [WebServer Info]       : Websockets are enabled.
[19:05:03] [WebServer Info]       : Webserver started on http://0.0.0.0:8080
[19:05:03] [Logger Warning]       : RouterTimer@10Hz with 2 jobs started
[19:05:03] [Core Info]            : Checking for AMP updates...
[19:05:04] [Core Info]            : AMP is up-to-date.
[19:06:00] [RemoteAMPAuth:Anonymous Activity] : Authentication attempt for user AMP_SYSTEM from 192.168.65.60
[19:07:08] [RemoteAMPAuth:Anonymous Activity] : Authentication success
[19:07:17] [API:AMP_SYSTEM Activity] : Changing setting ADSModule.NewInstanceDefaults.DefaultAuthServerURL to http://192.168.65.60:8080/
[19:11:34] [Core Info]            : Stop requested: SIGUSR1 Recieved
[19:11:34] [Core Notice]          : AMP shutdown requested.
[19:11:34] [Core Notice]          : Stopping Application...
[19:11:39] [Core Notice]          : Stopping Web Server...
[19:11:39] [WebServer Info]       : Web server shutdown.
[19:11:39] [Core Notice]          : Goodbye!
Rest in peace - Stephen Hawking 1942-2018
[19:11:39] [Logger Warning]       : RouterTimer@10Hz stopped

Steps to reproduce:

ADS01 starts successfully on boot, and 2 more instances also boot successfully. After a few minutes, ADS01 shuts itself down.
I just started it manually using ampinstmgr -a to see whether this only happens at system startup or whether it dies this way when started by hand too. So far, when started manually, it’s staying up.

Actions taken to resolve so far: Checked the logs for any obvious error messages, but nothing jumped out at me.
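
When hunting for a trigger in a log like the one above, it can help to pull out how long the instance stayed up and what was logged just before the stop request. The following is a minimal sketch, assuming the line format shown in the excerpt ([HH:MM:SS] [Source Level] : message). The names LINE_RE, SAMPLE and summarize_stop are made up for illustration and are not part of AMP, and the sketch does not handle a log that crosses midnight.

```python
import re
from datetime import datetime

# Assumed line shape, taken from the excerpt: [HH:MM:SS] [Source Level] : message
LINE_RE = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\] \[([^\]]+)\]\s*: (.*)$")

# A few lines copied from the log above (the spelling "Recieved" is AMP's own).
SAMPLE = """\
[19:05:02] [Core Info]            : OS: Linux / x86_64
[19:07:17] [API:AMP_SYSTEM Activity] : Changing setting ADSModule.NewInstanceDefaults.DefaultAuthServerURL to http://192.168.65.60:8080/
[19:11:34] [Core Info]            : Stop requested: SIGUSR1 Recieved
[19:11:39] [Core Notice]          : Goodbye!
"""

def summarize_stop(log_text):
    """Return uptime and the last event before the first stop request, or None."""
    entries = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if match:  # skip lines without a timestamp, such as the sign-off banner
            stamp = datetime.strptime(match.group(1), "%H:%M:%S")
            entries.append((stamp, match.group(2), match.group(3)))
    if not entries:
        return None
    started = entries[0][0]
    for index, (stamp, _source, message) in enumerate(entries):
        if message.startswith("Stop requested"):
            previous = entries[index - 1][2] if index > 0 else None
            return {
                "uptime_seconds": int((stamp - started).total_seconds()),
                "reason": message,
                "last_event_before_stop": previous,
            }
    return None

summary = summarize_stop(SAMPLE)
print(summary)
```

On the sample lines this reports the gap between the first entry (19:05:02) and the stop request (19:11:34), which is the "few minutes" described in the steps to reproduce. Comparing that gap across several reboots would show whether the shutdown follows a fixed delay.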

Mike · 2 February 2022 12:08

What else is installed on the system? SIGUSR1 is the signal AMP uses to request a clean shutdown, but nothing else on the system should be randomly sending signals to the AMP process.
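
For readers unfamiliar with the pattern, here is a minimal sketch of how a process can treat SIGUSR1 as a clean-shutdown request. It is a generic Linux/POSIX illustration, not AMP's actual implementation, and the names on_sigusr1 and run_until_stopped are hypothetical. The handler only sets a flag, and the main loop notices the flag and exits in an orderly way, which is why the log shows a tidy stop sequence rather than a crash.

```python
import os
import signal
import time

# Flag flipped by the handler; the main loop polls it.
stop_requested = False

def on_sigusr1(signum, frame):
    """Record that a clean shutdown was requested; do no real work here."""
    global stop_requested
    stop_requested = True

def run_until_stopped(poll_interval=0.05, max_polls=100):
    """Poll the flag and return a short shutdown log once it is set."""
    log = []
    for _ in range(max_polls):
        if stop_requested:
            log.append("Stop requested: SIGUSR1 received")
            log.append("Goodbye!")
            return log
        time.sleep(poll_interval)
    log.append("Still running")
    return log

# Install the handler (must be done from the main thread).
signal.signal(signal.SIGUSR1, on_sigusr1)

# Any process with permission can send the signal; here we signal ourselves
# to stand in for whatever external sender triggered the shutdown.
os.kill(os.getpid(), signal.SIGUSR1)

shutdown_log = run_until_stopped()
print("\n".join(shutdown_log))
```

Because any process running as the same user (or as root) may send the signal, a clean shutdown like this one points to an external sender such as a service manager or script rather than a fault inside the application.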

ScienceDad02 · 2 February 2022 14:51

Yeah, it’s definitely not crashing. It’s shutting down cleanly, and for the life of me I cannot figure out why. I can now confirm it only happens on the initial startup after a reboot. If I start the process manually, it continues to run without any issues.

There is nothing else running in this container other than 2 Minecraft instances within AMP. It’s an Ubuntu container sitting inside Proxmox. For some added topology, I have 6 hosts: amp (head node), gs01, gs02 (this container), gs03, gs04, gs05. They’re all built identically but live on different Proxmox nodes, and gs02 has only exhibited this behavior since the update. I’ll try to dive into gs02’s system logs to see if the issue lies there.

Is there any easy way to migrate non-ADS01 instances to other nodes? For example, how easy is it to move my Minecraft instances on gs02 to gs05? Just scp the folders? That way I can either try to recreate the issue on gs05 or just rebuild gs02 without having to spend a bunch of time hunting a ghost.

Mike · 2 February 2022 22:22

You can copy the instance directory over, then run ampinstmgr repair.

ScienceDad02 · 4 February 2022 07:20

Bingo, back up and running. No idea what the kink was but that resolved it easily enough.