ArangoDB Starter Recovery Procedure

This procedure recovers a cluster that was started with the ArangoDB Starter when one of its machines is broken beyond repair (e.g. complete HD failure). In this procedure it does not matter whether the replacement machine uses the old or a new IP address.

To recover from this scenario, you must:

  • Create a new (replacement) machine with ArangoDB (including Starter) installed.
  • Create a file called RECOVERY in the directory you are going to use as the data directory of the Starter (the one that is passed via the option --starter.data-dir). This file must contain the IP address and port of the broken Starter (the one that this new machine replaces).

For example:

echo "192.168.1.25:8528" > $DATADIR/RECOVERY
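
A typo in the RECOVERY file is easy to miss, so it can help to check the address before writing it. The following is only a sketch: write_recovery_file is a hypothetical helper (not part of the Starter), and it assumes the broken Starter is identified by an IPv4 address followed by a port.

```shell
#!/bin/sh
# Hypothetical helper, not part of the ArangoDB Starter.
# Usage: write_recovery_file <data-dir> <ip:port of the broken Starter>
write_recovery_file() {
    datadir=$1
    broken=$2

    # Assumption: the address is IPv4 followed by a port, e.g. 192.168.1.25:8528.
    if ! printf '%s\n' "$broken" | grep -Eq '^[0-9]{1,3}(\.[0-9]{1,3}){3}:[0-9]{1,5}$'; then
        echo "expected IP:PORT, got: $broken" >&2
        return 1
    fi

    # Create the data directory if it does not exist yet, then write the file.
    mkdir -p "$datadir" || return 1
    printf '%s\n' "$broken" > "$datadir/RECOVERY"
}
```

Calling write_recovery_file "$DATADIR" "192.168.1.25:8528" produces the same file as the echo command above, but refuses input that does not look like an address and port.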

After creating the RECOVERY file, start the Starter using all the normal command line arguments.
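
As a rough sketch of this step, the wrapper below refuses to launch the Starter unless a non-empty RECOVERY file is present in the data directory. The function name start_replacement_starter and the STARTER_BIN variable are hypothetical and exist only for this illustration. The remaining arguments are passed through unchanged and stand for whatever the broken Starter was normally started with.

```shell
#!/bin/sh
# Hypothetical wrapper, not part of the ArangoDB Starter.
# Usage: start_replacement_starter <data-dir> [normal Starter arguments...]
start_replacement_starter() {
    datadir=$1
    shift

    # Without the RECOVERY file the Starter would not run the recovery procedure.
    if [ ! -s "$datadir/RECOVERY" ]; then
        echo "missing or empty $datadir/RECOVERY" >&2
        return 1
    fi

    # STARTER_BIN is an assumption of this sketch; it defaults to the
    # Starter executable "arangodb" found on the PATH.
    "${STARTER_BIN:-arangodb}" --starter.data-dir="$datadir" "$@"
}
```

A call might look like start_replacement_starter "$DATADIR" --starter.join=192.168.1.23, where the address of a remaining Starter is illustrative only.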

The Starter will now:

  1. Talk to the remaining Starters to find the ID of the Starter it replaces and use that ID to join the remaining Starters.
  2. Talk to the remaining Agents to find the ID of the Agent it replaces and adjust the command-line arguments of the Agent it is about to start so that it uses that ID. This step is skipped if the broken Starter was not running an Agent.
  3. Remove the RECOVERY file from the data directory.

The cluster will now recover automatically. It will, however, have one more Coordinator and one more DBServer than expected. Exactly one Coordinator and one DBServer will be listed "red" in the web UI of the database. They have to be removed manually using the ArangoDB Web UI.