ArangoDB Starter Recovery Procedure
This procedure recovers a cluster that was started with the ArangoDB Starter when one machine of that cluster is broken beyond recovery (e.g. a complete hard disk failure). In this procedure it does not matter whether the replacement machine uses the old IP address or a new one.
To recover from this scenario, you must:
- Create a new (replacement) machine with ArangoDB (including Starter) installed.
- Create a file called RECOVERY in the directory you are going to use as the data directory of the Starter (the one that is passed via the option --starter.data-dir). This file must contain the IP address and port of the Starter that has been broken (and will be replaced with this new machine).
echo "192.168.1.25:8528" > $DATADIR/RECOVERY
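The same step can be wrapped in a small helper that checks the address before writing it. This is only a sketch: the function name `write_recovery_file` is made up for illustration, and the check for an IPv4 `IP:PORT` pair is an assumption of this example, not a rule taken from the Starter.

```shell
# Hypothetical helper (not part of the Starter): write the RECOVERY file.
# Usage: write_recovery_file <data-dir> <ip:port of the broken Starter>
write_recovery_file() {
    datadir=$1
    addr=$2
    # Rough sanity check for an IPv4 "IP:PORT" pair. This check is an
    # assumption of the sketch; it does not validate number ranges.
    if ! printf '%s\n' "$addr" | grep -Eq '^[0-9]{1,3}(\.[0-9]{1,3}){3}:[0-9]{1,5}$'; then
        echo "expected IP:PORT, got '$addr'" >&2
        return 1
    fi
    # Create the data directory if it does not exist yet, then write the file.
    mkdir -p "$datadir" || return 1
    printf '%s\n' "$addr" > "$datadir/RECOVERY"
}
```

For example, `write_recovery_file "$DATADIR" 192.168.1.25:8528` produces the same file as the `echo` command above.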
After creating the RECOVERY file, start the Starter using all the normal command-line arguments.
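As a rough sketch, the start can be guarded so that the Starter is only launched once a non-empty RECOVERY file is in place. The function name `start_replacement_starter` and the `ARANGODB_BIN` variable are hypothetical; the sketch assumes the Starter executable is called `arangodb` and that every argument after the data directory is whatever you normally pass.

```shell
# Hypothetical wrapper: refuse to start unless the RECOVERY file exists.
# Usage: start_replacement_starter <data-dir> [normal Starter arguments...]
start_replacement_starter() {
    datadir=$1
    shift
    # -s is true only if the file exists and is not empty.
    if [ ! -s "$datadir/RECOVERY" ]; then
        echo "no RECOVERY file in '$datadir'; not starting" >&2
        return 1
    fi
    # ARANGODB_BIN is an assumption of this sketch; it defaults to "arangodb".
    "${ARANGODB_BIN:-arangodb}" --starter.data-dir="$datadir" "$@"
}
```

A call might look like `start_replacement_starter "$DATADIR" --starter.join=192.168.1.23`, where the join address is a placeholder for one of the remaining Starters.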
The Starter will now:
- Talk to the remaining Starters to find the ID of the Starter it replaces and use that ID to join the remaining Starters.
- Talk to the remaining Agents to find the ID of the Agent it replaces and adjust the command-line arguments of the Agent (it will start) to use that ID. This is skipped if the Starter was not running an Agent.
- Remove the RECOVERY file from the data directory.
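Since the Starter removes the RECOVERY file as part of these steps, a script can watch for the file to disappear. The sketch below assumes that the file's absence means the Starter has processed it; it says nothing about the health of the cluster. The function name `wait_for_recovery_done` and the 60-second default are made up for illustration.

```shell
# Hypothetical helper: poll until the RECOVERY file is gone.
# Usage: wait_for_recovery_done <data-dir> [timeout in seconds]
wait_for_recovery_done() {
    datadir=$1
    timeout=${2:-60}   # arbitrary default chosen for this sketch
    elapsed=0
    while [ -e "$datadir/RECOVERY" ]; do
        if [ "$elapsed" -ge "$timeout" ]; then
            echo "RECOVERY file still present after ${timeout}s" >&2
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}
```

The function returns 0 as soon as the file is absent and 1 if it is still there when the timeout expires.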
The cluster will now recover automatically. However, it will have one more Coordinator and one more DBServer than expected. Exactly one Coordinator and one DBServer will be listed "red" in the web UI of the database. They have to be removed manually using the ArangoDB Web UI.