ArangoDB v3.13 is under development and not released yet. This documentation is not final and potentially incomplete.
HTTP interface for clusters
The cluster-specific endpoints let you get information about individual cluster nodes and the cluster as a whole, as well as monitor and administrate cluster deployments.
Monitoring
The permissions required to use the `/_admin/cluster/*` endpoints depend on the setting of the `--cluster.api-jwt-policy` startup option.
Get the statistics of a DB-Server
Get the cluster health
Queries the health of the cluster as assessed by the supervision (Agency) for
monitoring purposes. The response is a JSON object that contains the standard `code`, `error`, `errorNum`, and `errorMessage` fields as appropriate.
The endpoint-specific fields are as follows:
- `ClusterId`: A UUID string identifying the cluster.
- `Health`: An object containing a descriptive sub-object for each node in the cluster.
  - `<nodeID>`: Each entry in `Health` is keyed by the node ID and contains the following attributes:
    - `Endpoint`: A string representing the network endpoint of the server.
    - `Role`: The role the server plays. Possible values are `"AGENT"`, `"COORDINATOR"`, and `"DBSERVER"`.
    - `CanBeDeleted`: A boolean indicating whether the node can safely be removed from the cluster.
    - `Version`: The version string of ArangoDB used by that node.
    - `Engine`: The storage engine used by that node.
    - `Status`: A string indicating the health of the node as assessed by the supervision (Agency). This should be considered the primary source of truth for the health of Coordinator and DB-Server nodes. If the node is responding normally to requests, it is `"GOOD"`. If it has missed one heartbeat, it is `"BAD"`. If it has been declared failed by the supervision, which occurs after missing heartbeats for about 15 seconds, it is marked `"FAILED"`.
Additionally, each node has the following attributes depending on its role:

Coordinators and DB-Servers

- `SyncStatus`: The last sync status reported by the node. This value is primarily used to determine the value of `Status`. Possible values include `"UNKNOWN"`, `"UNDEFINED"`, `"STARTUP"`, `"STOPPING"`, `"STOPPED"`, `"SERVING"`, and `"SHUTDOWN"`.
- `LastAckedTime`: An ISO 8601 timestamp specifying the last heartbeat received.
- `ShortName`: A string representing the short name of the server, e.g. `"Coordinator0001"`.
- `Timestamp`: An ISO 8601 timestamp specifying the last heartbeat received. (deprecated)
- `Host`: An optional string specifying the host machine if known.
Coordinators only

- `AdvertisedEndpoint`: A string representing the advertised endpoint, if set (e.g. external IP address or load balancer; optional).
Agents

- `Leader`: The ID of the Agent this node regards as leader.
- `Leading`: Whether this Agent is the leader (`true`) or not (`false`).
- `LastAckedTime`: Time since the last `acked` in seconds.
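As a small illustration of consuming this endpoint, the following sketch counts nodes by role and status. The response shape is assumed from the attribute list above, and the sample data (node IDs, cluster ID) is made up for illustration:

```python
from collections import Counter

# Sketch: count the nodes in a cluster-health response by (Role, Status).
# The response shape is assumed from the attribute list above.
def summarize_health(response):
    counts = Counter()
    for node in response["Health"].values():
        counts[(node["Role"], node["Status"])] += 1
    return dict(counts)

# Minimal made-up sample for illustration:
sample = {
    "ClusterId": "example-cluster-id",
    "Health": {
        "CRDN-0001": {"Role": "COORDINATOR", "Status": "GOOD"},
        "PRMR-0001": {"Role": "DBSERVER", "Status": "GOOD"},
        "PRMR-0002": {"Role": "DBSERVER", "Status": "BAD"},
    },
}
print(summarize_health(sample))
```

A monitoring job could alert whenever any count with status `"BAD"` or `"FAILED"` is non-zero.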
Endpoints
List all Coordinator endpoints
The response contains an attribute `endpoints`, which contains an array of objects, each of which has the attribute `endpoint`, whose value is a string with the endpoint description. There is an entry for each Coordinator in the cluster. This method only works on Coordinators in cluster mode. In case of an error, the `error` attribute is set to `true`.

Cluster node information
Get the server ID
Get the server role
The role is returned in the `role` attribute of the result.

200 OK
Is returned in all cases.

Response Body (application/json object)

role* (string)
The server role. Possible values:

- `SINGLE`: the server is a standalone server without clustering
- `COORDINATOR`: the server is a Coordinator in a cluster
- `PRIMARY`: the server is a DB-Server in a cluster
- `SECONDARY`: this role is not used anymore
- `AGENT`: the server is an Agency node in a cluster
- `UNDEFINED`: in a cluster, this is returned if the server role cannot be determined
Maintenance
Set the cluster maintenance mode
Enable or disable the cluster supervision (Agency) maintenance mode.
This API allows you to temporarily enable the supervision maintenance mode. Please be aware that no automatic failovers of any kind take place while the maintenance mode is enabled. The cluster supervision reactivates itself automatically when the maintenance duration expires.
To enable the maintenance mode, the request body must contain the string `"on"` (lowercase and including the quotes). This enables the maintenance mode for 60 minutes, i.e. the supervision reactivates itself after 60 minutes.

Since ArangoDB 3.8.3, you can enable the maintenance mode for a different duration than 60 minutes by sending the desired duration (in seconds) as a string in the request body. For example, sending `"7200"` (including the quotes) enables the maintenance mode for 7200 seconds, i.e. 2 hours.

To disable the maintenance mode, the request body must contain the string `"off"` (lowercase and including the quotes).
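Because the body must be a JSON-encoded string (quotes included), it is easy to get the encoding wrong. This hypothetical helper (not part of ArangoDB) shows one way to build the body correctly:

```python
import json

# Hypothetical helper: build the request body for the cluster maintenance
# endpoint. The API expects a JSON string, so the double quotes are part
# of the body itself.
def maintenance_body(mode):
    """mode is "on", "off", or a duration in seconds (e.g. 7200)."""
    if mode in ("on", "off"):
        return json.dumps(mode)        # -> '"on"' or '"off"'
    return json.dumps(str(int(mode)))  # e.g. 7200 -> '"7200"'

print(maintenance_body("on"))   # prints "on" (with quotes)
print(maintenance_body(7200))   # prints "7200" (with quotes)
```

Using `json.dumps` guarantees the quotes are included, which is the detail the paragraphs above emphasize.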
Get the maintenance status of a DB-Server
Set the maintenance status of a DB-Server
Enable or disable the maintenance mode of a DB-Server.
For rolling upgrades or rolling restarts, DB-Servers can be put into maintenance mode so that no attempts are made to redistribute the data in the cluster for such planned events. DB-Servers in maintenance mode are not considered viable failover targets because they are likely to be restarted soon.
Rebalance
As of version 3.10, ArangoDB has built-in capabilities to rebalance the distribution of shards. This can become necessary since imbalances can lead to uneven disk usage across the DBServers (data imbalance) or to uneven load distribution across the DBServers (leader imbalance).
If the data is distributed relatively evenly across the DBServers, then the leader imbalance can usually be fixed relatively cheaply, since you only have to transfer the leadership for a number of shards to a different replica which already has the data. This is not true in all cases, but it holds as a rule of thumb.
If, however, data needs to be moved between DBServers, then this is a costly and potentially lengthy operation. This is inevitable, but the operation is designed to run in the background so that it does not lead to service interruption.
Nevertheless, data movement requires I/O, CPU, and network resources and thus it always puts an additional load on the cluster.
Rebalancing shards is a rather complex optimization problem, in particular if there are many shards in the system. Fortunately, in most situations it is relatively easy to find operations to make good progress towards a better state, but “perfection” is hard, and finding a “cheap” way to get there is even harder.
The APIs described here help with the following approach: an "imbalance score" is computed for a given shard distribution, which essentially says how "imbalanced" the cluster is. This score takes both leader imbalance and data imbalance into account. A higher score means more imbalance; the actual numerical value has no meaning.
The `GET` API call can be used to evaluate this score and report how imbalanced the cluster currently is. The `POST` API call does the same and additionally computes a list of shard movements that the system suggests to lower the imbalance score. A variant of the `POST` API call can then take this (or another) suggestion and execute it in the background. For convenience, you can use the `PUT` API call to do everything at once: compute the score, suggest moves, and execute them right away. Since the execution can take some time, the `GET` API call also tells you how many of the moves are still outstanding.
There are three types of moves:
- Switch leadership of one shard from the leader to a follower, which is currently in sync. This is a fast move.
- Move the data of a leader to a new DBServer and make it the new leader. This is a slow move, since it needs to copy the data over the network and then switch the leadership.
- Move the data of a follower to a new DBServer and make it a new follower, then drop the data on the previous follower. This is a slow move, since it needs to copy the data over the network.
The suggestion engine behind the `POST` and `PUT` API calls has three switches to activate or deactivate these three types of moves independently. If a type of move is activated, the engine considers all possible such moves; if it is deactivated, no such moves are considered. The three flags are:

- `leaderChanges` (default `true`): consider moves of type 1.
- `moveLeaders` (default `false`): consider moves of type 2.
- `moveFollowers` (default `false`): consider moves of type 3.
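A request body combining these flags can be sketched as follows. The field names come from this section; treating them all as top-level body fields alongside `version` is an assumption beyond what the text states:

```python
import json

# Sketch: build the options body for the rebalance endpoints, using the
# flags and defaults described above. The exact set of accepted fields is
# an assumption for illustration.
def rebalance_body(leader_changes=True, move_leaders=False,
                   move_followers=False, maximum_number_of_moves=1000):
    if not 0 < maximum_number_of_moves <= 5000:
        raise ValueError("maximumNumberOfMoves is capped at 5000")
    return json.dumps({
        "version": 1,
        "leaderChanges": leader_changes,
        "moveLeaders": move_leaders,
        "moveFollowers": move_followers,
        "maximumNumberOfMoves": maximum_number_of_moves,
    })

# The defaults request only cheap leader changes (move type 1):
print(rebalance_body())
```

Passing `move_leaders=True` and `move_followers=True` would additionally allow the slow, data-moving operations of types 2 and 3.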
The engine then enumerates all possible moves of the various types. The first choice is the one which improves the imbalance score the most. After that move, it reevaluates the imbalance score and again looks for the move which improves it the most. Altogether, it suggests up to `maximumNumberOfMoves` moves and then stops. The default value for this maximum is `1000`, and it is capped at `5000` to avoid overly long optimization computations.

It is conceivable that for large clusters, `1000` or even `5000` moves might not be enough to achieve full balance. In such cases, you simply have to repeat the API call, potentially multiple times.
Other considerations
First, in the case of SmartGraphs or OneShard databases, not all shards can be moved freely. Rather, some shards are "coupled" and can only move their place in the cluster, or even their leadership, together. This severely limits the possibilities of shard movement and sometimes makes a good balance impossible. A prominent example is a single OneShard database in the cluster: all leaders have to reside on the same server, so no good leader distribution is possible at all.
Secondly, the current implementation does not take actual shard sizes into account. It essentially works on the number of shards and tries to distribute these numbers evenly. It computes weights based on how many shards are "coupled" together, but it does not take the actual data size into account. This means it is possible to arrive at a "good" data distribution with respect to the number of shards, but not with respect to their disk usage.
Thirdly, the current implementation does not take the compute load on different collections and shards into account. Therefore, it is possible to end up with a shard distribution that distributes the leader numbers evenly across the cluster, even though the actual compute load is unevenly distributed, since some collections/shards are simply hit by more queries than others.
How to use the rebalancing API
By far the easiest way to rebalance a cluster is to call the `PUT` variant of the API, which analyzes the situation, comes up with a plan to balance things out, and directly schedules it. To rebalance leaders, you can use `curl` like this:
curl -X PUT https://<endpoint>:<port>/_admin/cluster/rebalance -d '{"version":1}' --user root:<rootpassword>
You need admin rights, so you should use the user `root` or another user with write permissions on the `_system` database. Alternatively, you can use a header with a valid JWT token (for superuser access).
Since the default value for `leaderChanges` is `true`, and the defaults for `moveLeaders` and `moveFollowers` are `false`, this only schedules cheap leader changes. So it can address leader imbalance, but not data imbalance.
You can monitor progress with this command:
curl https://<endpoint>:<port>/_admin/cluster/rebalance --user root:<rootpassword>
The resulting object looks roughly like this:
{
"error": false,
"code": 200,
"result": {
"leader": {
"weightUsed": [
51,
54,
53
],
"targetWeight": [
52.666666666666664,
52.666666666666664,
52.666666666666664
],
"numberShards": [
31,
54,
21
],
...
"imbalance": 1920000004.6666667
},
"shards": {
"sizeUsed": [
60817408,
106954752,
54525952
],
"targetSize": [
74099370.66666666,
74099370.66666666,
74099370.66666666
],
"numberShards": [
58,
102,
52
],
...
"imbalance": 1639005333138090.8
},
"pendingMoveShards": 0,
"todoMoveShards": 0
}
}
Of particular relevance are the two attributes `pendingMoveShards` and `todoMoveShards`. These show how many move operations are still to do (scheduled, but not begun) and how many are pending (scheduled and started, but not yet finished). Once both numbers have reached 0, the rebalancing operation is finished.
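This completion check can be sketched against a response shaped like the example above:

```python
# Sketch: decide from a rebalance status response (shaped like the example
# above) whether the rebalancing has finished, i.e. whether both the
# pending and the to-do counters have reached zero.
def rebalance_finished(response):
    result = response["result"]
    return result["pendingMoveShards"] == 0 and result["todoMoveShards"] == 0

sample = {"error": False, "code": 200,
          "result": {"pendingMoveShards": 0, "todoMoveShards": 0}}
print(rebalance_finished(sample))  # True
```

A monitoring script would poll the `GET` endpoint and call this check on each response.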
In the `leader` section you see statistics about the distribution of the leader shards; in the `shards` section you see statistics about the distribution of the data (leader shards and follower shards). Both sections show `numberShards` and the current distribution (`weightUsed` for leaders and `sizeUsed` for shards), as well as the target distribution. Finally, the `imbalance` number is the "imbalance score": its absolute value is not meaningful, but the smaller the score, the better the balance.
If you actually want to allow the system to move data to improve the data distribution, use this command:
curl -X PUT https://<endpoint>:<port>/_admin/cluster/rebalance -d '{"version":1, "moveLeaders": true, "moveFollowers": true}' --user root:<rootpassword>
This operation is monitored with the same `GET` request as above; it is expected to take considerably longer to finish.
There are a few more knobs to turn in this API, but they should usually not be necessary and are intended only for expert use.
Note that both of these API calls only ever schedule up to 1000 move shard jobs. For large data sets, you might want to repeat the call after completion.
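The repeat-until-balanced workflow can be sketched as a loop. Here, `put_rebalance` and `get_status` stand in for the HTTP calls to the `PUT` and `GET` endpoints and are injected so the control flow can be shown without a live cluster; assuming that the `PUT` call reports how many moves it scheduled is part of the sketch:

```python
# Hypothetical sketch: repeat the rebalance call until a round schedules no
# further moves. put_rebalance() is assumed to return the number of move
# shard jobs it scheduled; get_status() returns a status response shaped
# like the GET example above.
def rebalance_until_done(put_rebalance, get_status, wait=lambda: None,
                         max_rounds=10):
    for _ in range(max_rounds):
        scheduled = put_rebalance()
        # Drain: poll until no moves are pending or still to do.
        while True:
            r = get_status()["result"]
            if r["pendingMoveShards"] == 0 and r["todoMoveShards"] == 0:
                break
            wait()  # e.g. time.sleep(10) in a real deployment
        if scheduled == 0:
            return True   # the last round had nothing left to schedule
    return False          # still unbalanced after max_rounds
```

Bounding the number of rounds avoids looping forever on a cluster whose balance cannot be improved further (see the "Other considerations" above).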
Get the current cluster imbalance
Compute a set of move shard operations to improve balance
The options for the rebalance plan:

- `piFactor` (number): A weighting factor that should remain untouched (default: `256e6`). If a collection has more shards than there are DB-Servers, there can be a subtle form of leader imbalance: some DB-Servers may be responsible for more shards as leader than others. The `piFactor` adjusts how much weight such imbalances get in the overall imbalance score.
Execute a set of move shard operations
Use the `POST /_admin/cluster/rebalance` endpoint to calculate these operations to improve the balance of shards, leader shards, and follower shards.

Compute and execute a set of move shard operations to improve balance
The options for the rebalancing:

- `piFactor` (number): A weighting factor that should remain untouched (default: `256e6`). If a collection has more shards than there are DB-Servers, there can be a subtle form of leader imbalance: some DB-Servers may be responsible for more shards as leader than others. The `piFactor` adjusts how much weight such imbalances get in the overall imbalance score.