Hot Backup Examples
How to create a consistent snapshot with the arangobackup tool
Create
Hot backups are created near instantaneously. The single server as well as the other deployment modes try to obtain a global write transaction lock to enforce consistency across all servers, databases, collections, etc. Hot backups also require that no Data Definition operations (e.g. create database, create collection) are active at the time of the hot backup; please review the requirements and limitations for more details.
Once that lock has been acquired, the hot backup itself is most readily described as a consistent snapshot on the local file system.
arangobackup create --server.endpoint tcp://myserver:8529 --label my-label
The above creates a hot backup with a unique identifier consisting of the UTC time according to the local computer clock and the specified label, and reports success like below:
2019-05-15T13:57:11Z [15213] INFO {backup} Server version: 3.5.1
2019-05-15T14:20:16Z [15397] INFO {backup} Backup succeeded. Generated identifier '2019-05-15T14.20.15Z_my-label'
If the --label option is omitted, a unique identifier string is generated instead.
There are more options for the cluster mode regarding the acquisition of the global write transaction lock (an example invocation follows the list):
--max-wait-for-lock
: Configures how long the system tries to get the global write transaction lock before it reports failure. Its value must be a number of seconds (default: 120 seconds).
--allow-inconsistent
: If set to false (default), the operation is considered to have failed if the maximal waiting time for the lock is exceeded. If set to true, the system takes a potentially non-consistent hot backup when the timeout is exceeded.
--force
: Makes arangobackup abort ongoing write transactions in order to acquire the global write transaction lock more quickly. This option should be used with caution, as it potentially aborts valid write transactions, meaning client applications will see errors for otherwise valid operations and queries. The force option currently only aborts Stream Transactions but not JavaScript Transactions.
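For example, the following invocation (endpoint, label, and option values are placeholders) would wait up to 300 seconds for the global write transaction lock and fall back to a potentially inconsistent hot backup if the lock cannot be acquired in time:
arangobackup create --server.endpoint tcp://myserver:8529 --label nightly --max-wait-for-lock 300 --allow-inconsistent true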
Restore
Once a hot backup is created, one can use the generated backup identifier, for example 2019-05-15T14.36.38Z_my-label, to restore the entire instance to that “snapshot”.
arangobackup restore --server.username root --identifier 2019-05-15T14.36.38Z_my-label
The output will reflect the restore operation’s success:
2019-05-15T15:24:14Z [16201] INFO {backup} Server version: 3.5.1
2019-05-15T15:24:14Z [16201] INFO {backup} Successfully restored '2019-05-15T14.36.38Z_my-label'
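If the deployment does not listen on the default local endpoint, the endpoint can be passed explicitly, just as in the other examples (the endpoint below is a placeholder):
arangobackup restore --server.endpoint tcp://myserver:8529 --server.username root --identifier 2019-05-15T14.36.38Z_my-label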
Delete
Hot backups, analogous to virtual machine snapshots, cause additional disk usage. Every hot backup freezes a consistent state in time; later changes then have to be kept as differences to the older hot backups, and compactions can no longer cover events before the last hot backup. Naturally, one may want to free up disk space once hot backups become obsolete.
arangobackup delete --server.username root --identifier <identifier>
The result of the operation is reported like this:
2019-05-15T15:34:34Z [16257] INFO {backup} Server version: 3.5.1
2019-05-15T15:34:34Z [16257] INFO {backup} Successfully deleted '2019-05-15T13.57.03Z'
List
Multiple hot backups can be kept at the same time, and all of them are available to restore from. To get a listing of the available hot backups, use the list command.
arangobackup list
The output lists all available hot backups:
2019-05-15T15:28:17Z [16224] INFO {backup} Server version: 3.5.1
2019-05-15T15:28:17Z [16224] INFO {backup} The following backups are available:
2019-05-15T15:28:17Z [16224] INFO {backup} - 2019-05-15T13.57.11Z_my-label
2019-05-15T15:28:17Z [16224] INFO {backup} - 2019-05-15T13.57.03Z-other-label
Upload
Hot backups can be uploaded to a remote repository. Here is an example that uses the S3 protocol:
arangobackup upload --server.endpoint tcp://myserver:8529 --rclone-config-file /path/to/remote.json --identifier 2019-05-13T07.15.43Z_some-label --remote-path S3://remote-endpoint/remote-directory
The output will look like this:
2019-07-30T08:10:10Z [17184] INFO [06792] {backup} Server version: 3.5.1
2019-07-30T08:10:10Z [17184] INFO [a9597] {backup} Backup initiated, use
2019-07-30T08:10:10Z [17184] INFO [4c459] {backup} arangobackup upload --status-id=114
2019-07-30T08:10:10Z [17184] INFO [5cd70] {backup} to query progress.
This uses the file remote.json, passed via the --rclone-config-file option, to configure credentials for the remote site. Here is an example:
{
  "my-s3": {
    "type": "s3",
    "provider": "aws",
    "env_auth": "false",
    "access_key_id": "XXXXXXXXXXXXXXXXXXXX",
    "secret_access_key": "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
    "region": "xx-xxxx-x",
    "acl": "private"
  }
}
This process may take as long as it needs to upload the data from the single server or from all of the cluster's DB-Servers to the remote location. However, the upload takes advantage of previously uploaded hot backups which might contain identical files. Therefore, the functionality is incremental if regular hot backups are taken and uploaded to the same remote site.
The status of the upload process may be queried at any later time:
arangobackup upload --server.endpoint tcp://myserver:8529 --status-id=114
where the number given in the --status-id
option is the one which was
reported in the original upload command.
The output will look like this:
2019-07-30T08:11:09Z [17465] INFO [06792] {backup} Server version: 3.5.1
2019-07-30T08:11:09Z [17465] INFO [24d75] {backup} SNGL Status: COMPLETED
2019-07-30T08:11:09Z [17465] INFO [68cc8] {backup} Last progress update 2019-07-30T08:10:10Z: 5/5 files done
See Rclone Configuration below for details about the remote.json file and how to configure the remote site for rclone with protocols other than S3.
Download
Hot backups can be downloaded from a remote repository like this:
arangobackup download --server.endpoint tcp://myserver:8529 --rclone-config-file /path/to/remote.json --identifier 2019-05-13T07.15.43Z_some-label --remote-path S3://remote-endpoint/remote-directory
The output will look like this:
2019-07-30T08:14:43Z [17621] INFO [06792] {backup} Server version: 3.5.1
2019-07-30T08:14:43Z [17621] INFO [a9597] {backup} Backup initiated, use
2019-07-30T08:14:43Z [17621] INFO [4c459] {backup} arangobackup download --status-id=250
2019-07-30T08:14:43Z [17621] INFO [5cd70] {backup} to query progress.
This process may take as long as it needs to download the data to the single server or to all of the cluster's DB-Servers from the remote endpoint, given network limitations. However, the download takes advantage of other hot backups that might already or still be present locally and contain identical files. Therefore, the functionality is incremental if a hot backup is downloaded and a similar one is already present.
The status of the download process may be queried at any later time:
arangobackup download --server.endpoint tcp://myserver:8529 --status-id=250
The output will look like this:
2019-07-30T08:18:07Z [17753] INFO [06792] {backup} Server version: 3.5.1
2019-07-30T08:18:07Z [17753] INFO [24d75] {backup} SNGL Status: COMPLETED
2019-07-30T08:18:07Z [17753] INFO [68cc8] {backup} Last progress update 2019-07-30T08:14:43Z: 5/5 files done
Rclone Configuration
Rclone is a versatile open-source remote file sync program that can deal with over 30 different remote file I/O protocols. Enterprise Editions of ArangoDB come with a bundled version of rclone, which is distributed under the MIT license. It is used to both upload and download hot backup sets to and from local and cloud-operated storage resources.
To configure rclone, use the --rclone-config-file startup option to point arangobackup to a JSON configuration file. The expected format is an object with user-chosen remote names as attribute keys, and the actual configuration as attribute value (a nested object). The option names and values in the rclone documentation directly translate into attribute/value pairs in the JSON file. Note that "true" and "false" must be enclosed in double quotes.
{
  "my-remote": {
    "option": "value",
    "boolean": "true"
  }
}
The remote path can be specified via the --remote-path startup option. The syntax for remote paths is remote:path, where remote is the name of a top-level attribute in the configuration file and path is a path on the remote, both separated by a colon (e.g. my-remote:/a/b/c).
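For example, assuming the my-remote entry from the configuration file sketched above and placeholder values for the server endpoint and backup identifier, an upload to that remote could look like this:
arangobackup upload --server.endpoint tcp://myserver:8529 --rclone-config-file /path/to/remote.json --identifier <identifier> --remote-path my-remote:/a/b/c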
The possible parameters for S3, for example, are found at rclone.org/s3/.
Every parameter can be passed as an option to the program invocation, say --s3-upload-cutoff=0, as an environment variable like export RCLONE_S3_UPLOAD_CUTOFF=0, or most importantly for use with ArangoDB, as a key/value pair in the JSON files below, { ..., "upload_cutoff": 0, ... }.
S3
… --rclone-config-file ~/my-s3.json --remote-path my-s3://remote-endpoint/remote-directory
The file my-s3.json
could look like this:
{
  "my-s3": {
    "type": "s3",
    "provider": "aws",
    "env_auth": "false",
    "access_key_id": "XXXXXXXXXXXXXXXXXXXX",
    "secret_access_key": "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
    "region": "xx-xxxx-x",
    "acl": "private"
  }
}
More examples and details for S3 configurations can be found at rclone.org/s3/.
Locally mounted local or remote volumes
… --rclone-config-file ~/my-local.json --remote-path my-local://mnt/backup/arangodb
The file my-local.json
could look like this:
{
  "my-local": {
    "type": "local",
    "copy-links": "false",
    "links": "false",
    "one_file_system": "false"
  }
}
More examples and details for local configurations can be found at rclone.org/local/.
WebDAV
… --rclone-config-file ~/my-dav.json --remote-path my-dav://remote-endpoint/remote-directory
This file my-dav.json
could look like this:
{
  "my-dav": {
    "pass": "A0OeLviBmwqKyCi7S6Rnn6dG576cJeRN1Nh0Dm5h8k0",
    "type": "webdav",
    "url": "https://dav.myserver.com",
    "user": "davuser",
    "vendor": "other"
  }
}
More examples and details on WebDAV configurations can be found at rclone.org/webdav/.