Replication applier commands
The applier commands allow you to remotely start, stop, and query the state and configuration of an ArangoDB database’s replication applier.
Get the replication applier configuration
Returns the configuration of the replication applier.
The body of the response is a JSON object with the configuration. The following attributes may be present in the configuration:
endpoint: the logger server to connect to (e.g. “tcp://192.168.173.13:8529”).
database: the name of the database to connect to (e.g. “_system”).
username: an optional ArangoDB username to use when connecting to the endpoint.
password: the password to use when connecting to the endpoint.
maxConnectRetries: the maximum number of connection attempts the applier will make in a row. If the applier cannot establish a connection to the endpoint in this number of attempts, it will stop itself.
connectTimeout: the timeout (in seconds) when attempting to connect to the endpoint. This value is used for each connection attempt.
requestTimeout: the timeout (in seconds) for individual requests to the endpoint.
chunkSize: the requested maximum size for log transfer packets that is used when the endpoint is contacted.
autoStart: whether or not to auto-start the replication applier on (next and following) server starts.
adaptivePolling: whether or not the replication applier will use adaptive polling.
includeSystem: whether or not system collection operations will be applied.
autoResync: whether or not the follower should perform a full automatic resynchronization with the leader in case the leader cannot serve log data requested by the follower, or when the replication is started and no tick value can be found.
autoResyncRetries: number of resynchronization retries that will be performed in a row when automatic resynchronization is enabled and kicks in. Setting this to 0 will effectively disable autoResync. Setting it to some other value will limit the number of retries that are performed. This helps prevent endless retries in case resynchronizations always fail.
initialSyncMaxWaitTime: the maximum wait time (in seconds) that the initial synchronization will wait for a response from the leader when fetching initial collection data. This wait time can be used to control after what time the initial synchronization will give up waiting for a response and fail. This value is relevant even for continuous replication when autoResync is set to true because this may re-start the initial synchronization when the leader cannot provide log data the follower requires. This value will be ignored if set to 0.
connectionRetryWaitTime: the time (in seconds) that the applier will intentionally idle before it retries connecting to the leader in case of connection problems. This value will be ignored if set to 0.
idleMinWaitTime: the minimum wait time (in seconds) that the applier will intentionally idle before fetching more log data from the leader in case the leader has already sent all its log data. This wait time can be used to control the frequency with which the replication applier sends HTTP log fetch requests to the leader in case there is no write activity on the leader. This value will be ignored if set to 0.
idleMaxWaitTime: the maximum wait time (in seconds) that the applier will intentionally idle before fetching more log data from the leader in case the leader has already sent all its log data and there have been previous log fetch attempts that resulted in no more log data. This wait time can be used to control the maximum frequency with which the replication applier sends HTTP log fetch requests to the leader in case there is no write activity on the leader for longer periods. This configuration value will only be used if the option adaptivePolling is set to true. This value will be ignored if set to 0.
requireFromPresent: if set to true, then the replication applier will check at start whether the start tick from which it starts or resumes replication is still present on the leader. If not, then there would be data loss. If requireFromPresent is true, the replication applier will abort with an appropriate error message. If set to false, then the replication applier will still start, and ignore the data loss.
verbose: if set to true, then a log line will be emitted for all operations performed by the replication applier. This should be used for debugging replication problems only.
restrictType: the configuration for restrictCollections.
restrictCollections: the optional array of collections to include or exclude, based on the setting of restrictType.
Examples
curl --header 'accept: application/json' --dump - 'http://localhost:8529/_api/replication/applier-config'
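The same request can also be issued from a script. The following is a minimal Python sketch, assuming the server from the curl example at http://localhost:8529 and no authentication. The helper name fetch_applier_config and its opener parameter are illustrative and not part of ArangoDB; opener exists only so the function can be tried without a running server.

```python
import io
import json
import urllib.request


def fetch_applier_config(base_url, opener=urllib.request.urlopen):
    """Return the replication applier configuration as a dict.

    `opener` is a hypothetical injection point (defaults to urlopen) so the
    function can be exercised offline with a canned response.
    """
    request = urllib.request.Request(
        base_url.rstrip("/") + "/_api/replication/applier-config",
        headers={"accept": "application/json"},
        method="GET",
    )
    # The response body is a JSON object with the configuration attributes.
    with opener(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Offline demonstration: a fake opener returns a canned JSON body.
    def fake_opener(_request):
        return io.BytesIO(b'{"autoStart": false, "adaptivePolling": true}')

    print(fetch_applier_config("http://localhost:8529", opener=fake_opener))
```

Against a real server, calling fetch_applier_config("http://localhost:8529") without the opener argument would perform the actual HTTP request.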
Update the replication applier configuration
Sets the configuration of the replication applier. The configuration can only be changed while the applier is not running. The updated configuration will be saved immediately but only become active with the next start of the applier.
In case of success, the body of the response is a JSON object with the updated configuration.
The request body is a JSON object that can contain the following attributes:
adaptivePolling* boolean
if set to true, the replication applier will go to sleep for an increasingly long period in case the logger server at the endpoint does not have any more replication events to apply. Using adaptive polling is thus useful to reduce the amount of work for both the applier and the logger server for cases when there are only infrequent changes. The downside is that when using adaptive polling, it might take longer for the replication applier to detect that there are new replication events on the logger server. Setting adaptivePolling to false will make the replication applier contact the logger server at a constant interval, regardless of whether the logger server provides updates frequently or seldom.
autoResyncRetries integer
number of resynchronization retries that will be performed in a row when automatic resynchronization is enabled and kicks in. Setting this to 0 will effectively disable autoResync. Setting it to some other value will limit the number of retries that are performed. This helps prevent endless retries in case resynchronizations always fail.
idleMaxWaitTime integer
the maximum wait time (in seconds) that the applier will intentionally idle before fetching more log data from the leader in case the leader has already sent all its log data and there have been previous log fetch attempts that resulted in no more log data. This wait time can be used to control the maximum frequency with which the replication applier sends HTTP log fetch requests to the leader in case there is no write activity on the leader for longer periods. This configuration value will only be used if the option adaptivePolling is set to true. This value will be ignored if set to 0.
idleMinWaitTime integer
the minimum wait time (in seconds) that the applier will intentionally idle before fetching more log data from the leader in case the leader has already sent all its log data. This wait time can be used to control the frequency with which the replication applier sends HTTP log fetch requests to the leader in case there is no write activity on the leader. This value will be ignored if set to 0.
initialSyncMaxWaitTime integer
the maximum wait time (in seconds) that the initial synchronization will wait for a response from the leader when fetching initial collection data. This wait time can be used to control after what time the initial synchronization will give up waiting for a response and fail. This value is relevant even for continuous replication when autoResync is set to true because this may re-start the initial synchronization when the leader cannot provide log data the follower requires. This value will be ignored if set to 0.
requireFromPresent* boolean
if set to true, then the replication applier will check at start whether the start tick from which it starts or resumes replication is still present on the leader. If not, then there would be data loss. If requireFromPresent is true, the replication applier will abort with an appropriate error message. If set to false, then the replication applier will still start, and ignore the data loss.
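To illustrate how adaptivePolling, idleMinWaitTime, and idleMaxWaitTime might interact, here is a small Python sketch of an idle-wait schedule. It rests on two assumptions that the description above does not confirm: the wait doubles after each empty fetch, and with adaptive polling disabled the constant interval equals idleMinWaitTime. The function names are illustrative only, and a value of 0 (which is ignored by the applier) is not modeled.

```python
def next_idle_wait(previous_wait, idle_min, idle_max, adaptive_polling):
    """Seconds to idle before the next log fetch after an empty fetch.

    Assumption: the wait doubles per empty fetch, bounded by
    [idle_min, idle_max]. The real growth rule may differ.
    """
    if not adaptive_polling or previous_wait is None:
        # Constant polling, or the first empty fetch: use the minimum wait.
        return idle_min
    return min(max(previous_wait * 2, idle_min), idle_max)


def idle_schedule(empty_fetches, idle_min, idle_max, adaptive_polling):
    """Return the list of waits for a run of consecutive empty fetches."""
    waits, wait = [], None
    for _ in range(empty_fetches):
        wait = next_idle_wait(wait, idle_min, idle_max, adaptive_polling)
        waits.append(wait)
    return waits


if __name__ == "__main__":
    print(idle_schedule(4, 0.5, 2.5, adaptive_polling=True))
    print(idle_schedule(4, 0.5, 2.5, adaptive_polling=False))
```

Under these assumptions, the adaptive schedule grows until it reaches idleMaxWaitTime, while the non-adaptive schedule stays flat, which mirrors the trade-off between load and detection latency described above.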
Examples
curl -X PUT --header 'accept: application/json' --data-binary @- --dump - 'http://localhost:8529/_api/replication/applier-config' <<'EOF'
{
"endpoint": "tcp://127.0.0.1:8529",
"username": "replicationApplier",
"password": "applier1234@foxx",
"chunkSize": 4194304,
"autoStart": false,
"adaptivePolling": true
}
EOF
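When the request body is assembled programmatically, a misspelled attribute name is easy to miss. The following Python sketch builds the JSON body and rejects names that do not appear in the configuration list of this section. The function build_applier_config and the KNOWN_ATTRIBUTES set are illustrative helpers, not part of ArangoDB, and how the server itself treats unknown attributes is not covered here.

```python
import json

# Attribute names as listed in the applier configuration description.
KNOWN_ATTRIBUTES = {
    "endpoint", "database", "username", "password", "maxConnectRetries",
    "connectTimeout", "requestTimeout", "chunkSize", "autoStart",
    "adaptivePolling", "includeSystem", "autoResync", "autoResyncRetries",
    "initialSyncMaxWaitTime", "connectionRetryWaitTime", "idleMinWaitTime",
    "idleMaxWaitTime", "requireFromPresent", "verbose", "restrictType",
    "restrictCollections",
}


def build_applier_config(**settings):
    """Return a JSON request body for PUT /_api/replication/applier-config."""
    unknown = sorted(set(settings) - KNOWN_ATTRIBUTES)
    if unknown:
        # Catch typos such as "chunksize" before sending the request.
        raise ValueError("unknown applier attributes: " + ", ".join(unknown))
    return json.dumps(settings, sort_keys=True)


if __name__ == "__main__":
    print(build_applier_config(
        endpoint="tcp://127.0.0.1:8529",
        chunkSize=4194304,
        autoStart=False,
        adaptivePolling=True,
    ))
```

The resulting string can be passed as the request body in place of the inline JSON document in the curl example.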
Start the replication applier
Starts the replication applier. This will return immediately if the replication applier is already running.
If the replication applier is not already running, the applier configuration will be checked, and if it is complete, the applier will be started in a background thread. This means that if the applier encounters any errors while running, they will not be reported in the response to this method.
To detect replication applier errors after the applier was started, use the /_api/replication/applier-state API instead.
Examples
curl -X PUT --header 'accept: application/json' --dump - 'http://localhost:8529/_api/replication/applier-start'
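Because errors are not reported by the start call itself, a client typically starts the applier and then polls the state. The Python sketch below shows one way to do that. The callables start and fetch_state are hypothetical wrappers around PUT /_api/replication/applier-start and GET /_api/replication/applier-state, and the sketch assumes that a non-zero errorNum in state.lastError indicates a failure.

```python
import time


def start_and_check(start, fetch_state, attempts=5, delay=1.0, sleep=time.sleep):
    """Start the applier, then poll its state; return (ok, message).

    `start` and `fetch_state` are hypothetical callables supplied by the
    caller; `fetch_state` must return the parsed applier-state JSON object.
    """
    start()
    for _ in range(attempts):
        state = fetch_state().get("state", {})
        last_error = state.get("lastError") or {}
        if last_error.get("errorNum"):
            # Assumption: a non-zero error number means the applier failed.
            return False, last_error.get("errorMessage", "")
        if state.get("running"):
            return True, ""
        sleep(delay)
    return False, "applier did not report running"


if __name__ == "__main__":
    # Offline demonstration with canned callables instead of HTTP requests.
    healthy = {"state": {"running": True, "lastError": {"errorNum": 0}}}
    print(start_and_check(lambda: None, lambda: healthy, sleep=lambda s: None))
```

Note that an error recorded before the start call could still be visible in lastError, so a more careful client might also compare the time sub-attribute of lastError with the time of the start request.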
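Stop the replication applier
Stops the replication applier. This will return immediately if the replication applier is not running.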
Examples
curl -X PUT --header 'accept: application/json' --dump - 'http://localhost:8529/_api/replication/applier-stop'
Get the replication applier state
Returns the state of the replication applier, regardless of whether the applier is currently running or not.
The response is a JSON object with the following attributes:
state: a JSON object with the following sub-attributes:
  running: whether or not the applier is active and running.
  lastAppliedContinuousTick: the last tick value from the continuous replication log the applier has applied.
  lastProcessedContinuousTick: the last tick value from the continuous replication log the applier has processed.
    Normally, the last applied and last processed tick values should be identical. For transactional operations, the replication applier will first process incoming log events before applying them, so the processed tick value might be higher than the applied tick value. This will be the case until the applier encounters the transaction commit log event for the transaction.
  lastAvailableContinuousTick: the last tick value the remote server can provide, for all databases.
  ticksBehind: this attribute will be present only if the applier is currently running. It provides the number of log ticks between what the applier has applied/seen and the last log tick value provided by the remote server. If this value is zero, then both servers are in sync. If it is non-zero, then the remote server has additional data that the applier has not yet fetched and processed, or the remote server may have more data that is not applicable to the applier.
    Client applications can use it to determine approximately how far the applier is behind the remote server, and can periodically check if the value is increasing (applier is falling behind) or decreasing (applier is catching up).
    Please note that because the remote server only keeps one last log tick value for all of its databases, but replication may be restricted to just certain databases on the applier, this value is more meaningful when the global applier is used. Additionally, the last log tick provided by the remote server may increase due to writes into system collections that are not replicated because of the replication configuration. The reported value may therefore overstate the actual lag somewhat in some scenarios.
  time: the time on the applier server.
  totalRequests: the total number of requests the applier has made to the endpoint.
  totalFailedConnects: the total number of failed connection attempts the applier has made.
  totalEvents: the total number of log events the applier has processed.
  totalOperationsExcluded: the total number of log events excluded because of restrictCollections.
  progress: a JSON object with details about the replication applier progress. It contains the following sub-attributes if there is progress to report:
    message: a textual description of the progress.
    time: the date and time the progress was logged.
    failedConnects: the current number of failed connection attempts.
  lastError: a JSON object with details about the last error that happened on the applier. It contains the following sub-attributes if there was an error:
    errorNum: a numerical error code.
    errorMessage: a textual error description.
    time: the date and time the error occurred.
    In case no error has occurred, lastError will be empty.
server: a JSON object with the following sub-attributes:
  version: the applier server’s version.
  serverId: the applier server’s id.
endpoint: the endpoint the applier is connected to (if applier is active) or will connect to (if applier is currently inactive).
database: the name of the database the applier is connected to (if applier is active) or will connect to (if applier is currently inactive).
Please note that the “tick” values returned do not have a specific unit. Tick values are only meaningful when compared to each other. Higher tick values mean “later in time” than lower tick values.
Examples
Fetching the state of an inactive applier:
curl --header 'accept: application/json' --dump - 'http://localhost:8529/_api/replication/applier-state'
Fetching the state of an active applier:
curl --header 'accept: application/json' --dump - 'http://localhost:8529/_api/replication/applier-state'
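The description of ticksBehind suggests that clients sample the value periodically and watch its direction. A minimal Python sketch of that classification follows. The function name lag_trend and its return labels are illustrative, and the int() conversion is an assumption made so that the function accepts the value whether it arrives as a JSON number or as a numeric string.

```python
def lag_trend(ticks_behind_samples):
    """Classify a series of ticksBehind readings, ordered oldest first."""
    # Assumption: readings may be numbers or numeric strings.
    samples = [int(value) for value in ticks_behind_samples]
    if not samples:
        raise ValueError("need at least one ticksBehind sample")
    if samples[-1] == 0:
        # Zero means both servers are in sync.
        return "in sync"
    if len(samples) == 1 or samples[-1] == samples[0]:
        return "steady"
    # Only the first and last readings are compared in this sketch.
    return "catching up" if samples[-1] < samples[0] else "falling behind"


if __name__ == "__main__":
    print(lag_trend([500, 300, 120]))
    print(lag_trend([10, 40]))
```

Since tick values have no unit, the result only indicates a direction, not an amount of time or data.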
Turn a server into a follower of another
Changes the role of the server to follower, starts a full data synchronization from a remote endpoint into the local ArangoDB database, and afterwards starts continuous replication.
The operation works on a per-database level.
All local database data will be removed prior to the synchronization.
In case of success, the body of the response is a JSON object with the following attributes:
state: a JSON object with the following sub-attributes:
  running: whether or not the applier is active and running.
  lastAppliedContinuousTick: the last tick value from the continuous replication log the applier has applied.
  lastProcessedContinuousTick: the last tick value from the continuous replication log the applier has processed.
    Normally, the last applied and last processed tick values should be identical. For transactional operations, the replication applier will first process incoming log events before applying them, so the processed tick value might be higher than the applied tick value. This will be the case until the applier encounters the transaction commit log event for the transaction.
  lastAvailableContinuousTick: the last tick value the remote server can provide.
  ticksBehind: this attribute will be present only if the applier is currently running. It provides the number of log ticks between what the applier has applied/seen and the last log tick value provided by the remote server. If this value is zero, then both servers are in sync. If it is non-zero, then the remote server has additional data that the applier has not yet fetched and processed, or the remote server may have more data that is not applicable to the applier.
    Client applications can use it to determine approximately how far the applier is behind the remote server, and can periodically check if the value is increasing (applier is falling behind) or decreasing (applier is catching up).
    Please note that because the remote server only keeps one last log tick value for all of its databases, but replication may be restricted to just certain databases on the applier, this value is more meaningful when the global applier is used. Additionally, the last log tick provided by the remote server may increase due to writes into system collections that are not replicated because of the replication configuration. The reported value may therefore overstate the actual lag somewhat in some scenarios.
  time: the time on the applier server.
  totalRequests: the total number of requests the applier has made to the endpoint.
  totalFailedConnects: the total number of failed connection attempts the applier has made.
  totalEvents: the total number of log events the applier has processed.
  totalOperationsExcluded: the total number of log events excluded because of restrictCollections.
  progress: a JSON object with details about the replication applier progress. It contains the following sub-attributes if there is progress to report:
    message: a textual description of the progress.
    time: the date and time the progress was logged.
    failedConnects: the current number of failed connection attempts.
  lastError: a JSON object with details about the last error that happened on the applier. It contains the following sub-attributes if there was an error:
    errorNum: a numerical error code.
    errorMessage: a textual error description.
    time: the date and time the error occurred.
    In case no error has occurred, lastError will be empty.
server: a JSON object with the following sub-attributes:
  version: the applier server’s version.
  serverId: the applier server’s id.
endpoint: the endpoint the applier is connected to (if applier is active) or will connect to (if applier is currently inactive).
database: the name of the database the applier is connected to (if applier is active) or will connect to (if applier is currently inactive).
Please note that the “tick” values returned do not have a specific unit. Tick values are only meaningful when compared to each other. Higher tick values mean “later in time” than lower tick values.
The following attributes can be set in the request body:
autoResyncRetries integer
number of resynchronization retries that will be performed in a row when automatic resynchronization is enabled and kicks in. Setting this to 0 will effectively disable autoResync. Setting it to some other value will limit the number of retries that are performed. This helps prevent endless retries in case resynchronizations always fail.
idleMaxWaitTime integer
the maximum wait time (in seconds) that the applier will intentionally idle before fetching more log data from the leader in case the leader has already sent all its log data and there have been previous log fetch attempts that resulted in no more log data. This wait time can be used to control the maximum frequency with which the replication applier sends HTTP log fetch requests to the leader in case there is no write activity on the leader for longer periods. This configuration value will only be used if the option adaptivePolling is set to true. This value will be ignored if set to 0.
idleMinWaitTime integer
the minimum wait time (in seconds) that the applier will intentionally idle before fetching more log data from the leader in case the leader has already sent all its log data. This wait time can be used to control the frequency with which the replication applier sends HTTP log fetch requests to the leader in case there is no write activity on the leader. This value will be ignored if set to 0.
initialSyncMaxWaitTime integer
the maximum wait time (in seconds) that the initial synchronization will wait for a response from the leader when fetching initial collection data. This wait time can be used to control after what time the initial synchronization will give up waiting for a response and fail. This value is relevant even for continuous replication when autoResync is set to true because this may re-start the initial synchronization when the leader cannot provide log data the follower requires. This value will be ignored if set to 0.
requireFromPresent boolean
if set to true, then the replication applier will check at the start of its continuous replication whether the start tick from the dump phase is still present on the leader. If not, then there would be data loss. If requireFromPresent is true, the replication applier will abort with an appropriate error message. If set to false, then the replication applier will still start, and ignore the data loss.
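The retry budget described for autoResyncRetries can be pictured with a short Python sketch. It is only an illustration of the documented behavior, not the server's implementation: resync is a hypothetical callable that returns True on success, and the sketch assumes that the configured value counts the total number of consecutive attempts.

```python
def resync_with_retries(resync, auto_resync_retries):
    """Call `resync` until it succeeds or the retry budget is used up.

    Returns the number of attempts made. Raises RuntimeError if automatic
    resynchronization is disabled (budget of 0) or every attempt fails.
    """
    if auto_resync_retries <= 0:
        # A value of 0 effectively disables automatic resynchronization.
        raise RuntimeError("automatic resynchronization is disabled")
    for attempt in range(1, auto_resync_retries + 1):
        if resync():
            return attempt
    # The budget prevents endless retries when resynchronization always fails.
    raise RuntimeError(
        f"resynchronization failed {auto_resync_retries} times in a row"
    )


if __name__ == "__main__":
    outcomes = iter([False, True])
    print(resync_with_retries(lambda: next(outcomes), 3))
```

In this sketch the caller decides what to do once the budget is exhausted, for example stopping the applier and reporting the error.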