Administrate ArangoDB cluster deployments
This section includes information related to the administration of an ArangoDB Cluster.
For a general introduction to the ArangoDB Cluster, please refer to the Cluster chapter.
There is also a detailed Cluster Administration Course for download.
Enabling synchronous replication
For an introduction about Synchronous Replication in Cluster, please refer to the Cluster Architecture section.
You can enable synchronous replication per collection. When you create a
collection, you may specify the number of replicas using the
replicationFactor
parameter. You can also adjust it later. The default value
is set to 1
, which effectively disables synchronous replication among
DB-Servers.
3
means
that there is one leader replica and two follower replicas, and that the data
exists three times in total.Whenever you specify a replication factor greater than 1
, synchronous
replication is activated for this collection. The Cluster determines suitable
leaders and followers for every requested shard (numberOfShards
) within
the Cluster.
An example of creating a collection in arangosh with a replication factor of
3
, requiring three replicas to report success for any write operation in this
collection:
db._create("test", { "replicationFactor": 3 })
The replicationFactor
value can be between the minimum and maximum
replication factor (inclusive) as defined by the following startup options:
The default replication factor for regular and for system collections is defined by the following startup options:
Preparing growth
You may create a collection with a higher replication factor than available DB-Servers. When additional DB-Servers become available, the shards are automatically replicated to the newly available DB-Servers.
You need to set the enforceReplicationFactor
option to false
when creating
a collection with a higher replication factor than available DB-Servers
(the default value is true
). For example, in arangosh you can pass
a third argument to the db._create()
method with this option:
db._create("test", { replicationFactor: 4 }, { enforceReplicationFactor: false });
This option is not available in the web interface.
Sharding
For an introduction about Sharding in Cluster, please refer to the Cluster Sharding section.
Number of shards can be configured at collection creation time, e.g. the UI, or the ArangoDB Shell:
db._create("sharded_collection", {"numberOfShards": 4});
To configure a custom hashing for another attribute (default is _key):
db._create("sharded_collection", {"numberOfShards": 4, "shardKeys": ["country"]});
The example above, where ‘country’ has been used as shardKeys can be useful to keep data of every country in one shard, which would result in better performance for queries working on a per country base.
It is also possible to specify multiple shardKeys
.
Note however that if you change the shard keys from their default ["_key"]
,
then finding a document in the collection by its primary key involves a request
to every single shard. However this can be mitigated: All CRUD APIs and AQL
support taking the shard keys as a lookup hint. Just make sure that the shard
key attributes are present in the documents you send, or in case of AQL, that
you use a document reference or an object for the UPDATE, REPLACE or REMOVE
operation which includes the shard key attributes:
FOR doc IN sharded_collection
FILTER doc._key == "123"
UPDATE doc WITH { … } IN sharded_collection
UPDATE { _key: "123", country: "…" } WITH { … } IN sharded_collection
Using a string with just the document key as key expression instead is processed without shard hints and thus perform slower:
UPDATE "123" WITH { … } IN sharded_collection
If custom shard keys are used, you can no longer specify the primary key value
for a new document, but must let the server generate one automatically. This
restriction comes from the fact that ensuring uniqueness of the primary key
would be very inefficient if the user could specify the document key.
If custom shard keys are used, trying to store documents with the primary key value
(_key
attribute) set results in a runtime error (“must not specify _key
for this collection”).
Unique indexes on sharded collections are only allowed if the fields used to determine the shard key are also included in the list of attribute paths for the index:
shardKeys | indexKeys | |
---|---|---|
a | a | allowed |
a | b | not allowed |
a | a, b | allowed |
a, b | a | not allowed |
a, b | b | not allowed |
a, b | a, b | allowed |
a, b | a, b, c | allowed |
a, b, c | a, b | not allowed |
a, b, c | a, b, c | allowed |
On which DB-Server in a Cluster a particular shard is kept is undefined. There is no option to configure an affinity based on certain shard keys.
Sharding strategy
Strategy to use for the collection. There are
different sharding strategies to select from when creating a new
collection. The selected shardingStrategy
value remains
fixed for the collection and cannot be changed afterwards. This is
important to make the collection keep its sharding settings and
always find documents already distributed to shards using the same
initial sharding algorithm.
The available sharding strategies are:
community-compat
: default sharding used by ArangoDB Community Edition before version 3.4enterprise-compat
: default sharding used by ArangoDB Enterprise Edition before version 3.4enterprise-smart-edge-compat
: default sharding used by smart edge collections in ArangoDB Enterprise Edition before version 3.4hash
: default sharding used for new collections starting from version 3.4 (excluding smart edge collections)enterprise-hash-smart-edge
: default sharding used for new smart edge collections starting from version 3.4enterprise-hex-smart-vertex
: sharding used for vertex collections of EnterpriseGraphs
If no sharding strategy is specified, the default is hash
for
all normal collections, enterprise-hash-smart-edge
for all smart edge
collections, and enterprise-hex-smart-vertex
for EnterpriseGraph
vertex collections (the latter two require the Enterprise Edition of ArangoDB).
Manually overriding the sharding strategy does not yet provide a
benefit, but it may later in case other sharding strategies are added.
The OneShard
feature does not have its own sharding strategy, it uses hash
instead.
Moving/Rebalancing shards
Rebalancing redistributes resources in the cluster to optimize resource allocation - shards and location of leaders/followers.
It aims to achieve, for example, a balanced load, fair shard distribution, and resiliency.
Rebalancing might occur, amongst other scenarios, when:
- There is a change in the number of nodes in the cluster and more (or fewer) resources are available to the cluster.
- There is a detectable imbalance in the distribution of shards (i.e. specific nodes holding high number of shards while others don’t) or in the distribution of leaders/followers across the nodes, resulting in computational imbalance on the nodes.
- There are changes in the number or size of data collections.
A shard can be moved from a DB-Server to another, and the entire shard distribution can be rebalanced using the corresponding buttons in the web UI.
You can also do any of the following by using the API:
- Calculate the current cluster imbalance.
- Compute a set of move shard operations to improve balance.
- Execute the given set of move shard operations.
- Compute a set of move shard operations to improve balance and immediately execute them.
For more information, see the Cluster section of the HTTP API documentation.
Replacing/Removing a Coordinator
Coordinators are effectively stateless and can be replaced, added and removed without more consideration than meeting the necessities of the particular installation.
To take out a Coordinator stop the
Coordinator’s instance by issuing kill -SIGTERM <pid>
.
About 15 seconds later, the cluster UI on any other Coordinator marks the Coordinator in question as failed. Almost simultaneously, the recycle bin icon appears to the right of the name of the Coordinator. Clicking that icon removes the Coordinator from the Coordinator registry.
Any new Coordinator instance that is informed of where to find any/all
Agent(s), --cluster.agency-endpoint
<some agent endpoint>
is
integrated as a new Coordinator into the cluster. You may also just
restart the Coordinator as before and it reintegrates itself into
the cluster.
Replacing/Removing a DB-Server
DB-Servers are where the data of an ArangoDB cluster is stored. They do not publish a web interface and are not meant to be accessed by any other entity than Coordinators to perform client requests or other DB-Servers to uphold replication and resilience.
The clean way of removing a DB-Server is to first relieve it of all
its responsibilities for shards. This applies to followers as well as
leaders of shards. The requirement for this operation is that no
collection in any of the databases has a replicationFactor
greater than
the current number of DB-Servers minus one. In other words, the highest
replication factor must not exceed the future DB-Server count. To clean
out DBServer004
, for example, you can issue the following command to
any Coordinator in the cluster:
curl <coord-ip:coord-port>/_admin/cluster/cleanOutServer -d '{"server":"DBServer004"}'
After the DB-Server has been cleaned out, you find the recycle bin icon to the right of the name of the DB-Server on any Coordinators’ UI. Clicking it removes the DB-Server in question from the cluster.
Firing up any DB-Server from a clean data directory, while specifying any of the available Agency endpoints, integrates the new DB-Server into the cluster.
To distribute shards onto the new DB-Server, go to the Distribution tab in the CLUSTER page of the web interface and click the Rebalance button at the bottom of the Shard distribution section.
The clean out process can be monitored using the following script, which periodically prints the amount of shards that still need to be moved. It is basically a countdown to when the process finishes.
Save the code below to a file named serverCleanMonitor.js
:
var dblist = db._databases();
var internal = require("internal");
var arango = internal.arango;
var server = ARGUMENTS[0];
var sleep = ARGUMENTS[1] | 0;
if (!server) {
print("\nNo server name specified. Provide it like:\n\narangosh <options> -- DBServerXXXX");
process.exit();
}
if (sleep <= 0) sleep = 10;
console.log("Checking shard distribution every %d seconds...", sleep);
var count;
do {
count = 0;
for (dbase in dblist) {
var sd = arango.GET("/_db/" + dblist[dbase] + "/_admin/cluster/shardDistribution");
var collections = sd.results;
for (collection in collections) {
var current = collections[collection].Current;
for (shard in current) {
if (current[shard].leader == server) {
++count;
}
}
}
}
console.log("Shards to be moved away from node %s: %d", server, count);
if (count == 0) break;
internal.wait(sleep);
} while (count > 0);
This script has to be executed in arangosh
by issuing the following command:
arangosh --server.username <username> --server.password <password> --javascript.execute <path/to/serverCleanMonitor.js> -- DBServer<number>
The output should be similar to the one below:
arangosh --server.username root --server.password pass --javascript.execute ~./serverCleanMonitor.js -- DBServer0002
[7836] INFO Checking shard distribution every 10 seconds...
[7836] INFO Shards to be moved away from node DBServer0002: 9
[7836] INFO Shards to be moved away from node DBServer0002: 4
[7836] INFO Shards to be moved away from node DBServer0002: 1
[7836] INFO Shards to be moved away from node DBServer0002: 0
The current status is logged every 10 seconds. You may adjust the
interval by passing a number after the DB-Server name, e.g.
arangosh <options> -- DBServer0002 60
for every 60 seconds.
Once the count is 0
, all shards of the underlying DB-Server have been moved
and the cleanOutServer
process has finished.