While Kubernetes and the ArangoDB Kubernetes operator will automatically resolve a lot of issues, there are always cases where human attention is needed.
This chapter gives you tips and tricks to help you troubleshoot deployments.
Where to look
In Kubernetes, all resources can be inspected with kubectl, using either the get or the describe command.
To get all details of the resource (both specification & status), run the following command:
kubectl get <resource-type> <resource-name> -n <namespace> -o yaml
For example, to get the entire specification and status of an ArangoDeployment resource named my-arango in the default namespace, run:
kubectl get ArangoDeployment my-arango -n default -o yaml
# or shorter
kubectl get arango my-arango -o yaml
Several types of resources (including all ArangoDB custom resources) support events. These events show what happened to the resource over time.
To show the events (and most important resource data) of a resource, run the following command:
kubectl describe <resource-type> <resource-name> -n <namespace>
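As a minimal sketch of the describe step for an ArangoDeployment, the helper below also lists only the events recorded for an object with the given name. The function name show_deployment_events is hypothetical, and the resource name and namespace are whatever you pass in; the namespace falls back to default as an assumption.

```shell
# Hypothetical helper: show the description and the events of an ArangoDeployment.
# Usage: show_deployment_events <resource-name> [namespace]
show_deployment_events() {
  # Human-readable summary; recent events are listed at the end of the output.
  kubectl describe ArangoDeployment "$1" -n "${2:-default}"
  # Only the events that refer to an object with this name, oldest first.
  kubectl get events -n "${2:-default}" \
    --field-selector "involvedObject.name=$1" \
    --sort-by=.lastTimestamp
}
```

For the example deployment used earlier, this would be called as show_deployment_events my-arango default.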
Another invaluable source of information is the logs of the containers being run. These logs are accessible through the Pods that group these containers.
To fetch the logs of the default container running in a Pod, run:
kubectl logs <pod-name> -n <namespace>
# or with follow option to keep inspecting logs while they are written
kubectl logs <pod-name> -n <namespace> -f
To inspect the logs of a specific container in a Pod, add -c <container-name> to the command. The output of kubectl describe pod ... lists the names of the containers in the Pod.
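To make the container-specific case concrete, here is a minimal sketch built on the -c flag of kubectl logs. The function names list_pod_containers and container_logs are hypothetical, and the namespace falls back to default as an assumption. The first helper uses a jsonpath query as an alternative to reading the describe output by eye.

```shell
# Hypothetical helpers for inspecting a single container of a Pod.

# Print the names of the regular containers declared in a Pod.
# Usage: list_pod_containers <pod-name> [namespace]
list_pod_containers() {
  kubectl get pod "$1" -n "${2:-default}" \
    -o jsonpath='{.spec.containers[*].name}'
}

# Fetch the logs of one named container of a Pod.
# Usage: container_logs <pod-name> <container-name> [namespace]
container_logs() {
  kubectl logs "$1" -c "$2" -n "${3:-default}"
}
```

Run list_pod_containers first, then pass one of the printed names to container_logs.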
Note that the ArangoDB operators are themselves deployed as a Kubernetes Deployment with 2 replicas. This means that you will have to fetch the logs of 2 Pods.
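One way to read the logs of both operator replicas at once is to select the Pods by label instead of by name. The sketch below assumes you already know a label selector that matches the operator Pods (kubectl get pods --show-labels displays the labels in use); the function name logs_for_selector and the default namespace fallback are hypothetical.

```shell
# Hypothetical helper: print the logs of every Pod matching a label selector.
# Usage: logs_for_selector <label-selector> [namespace]
logs_for_selector() {
  # --prefix tags each line with the Pod and container it came from.
  # --tail=-1 prints all lines; with a selector, kubectl otherwise
  # prints only the last 10 lines per Pod.
  kubectl logs -l "$1" -n "${2:-default}" --all-containers --prefix --tail=-1
}
```

Because every line carries the Pod name as a prefix, the interleaved output of the 2 replicas stays attributable.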
The Pods of a deployment stay in Pending state
There are two common causes for this.
1) The Pods cannot be scheduled because there are not enough nodes available.
This is usually only the case with a spec.environment setting that has a value of Production.
Solution: Add more nodes.
2) There are no PersistentVolumes available to be bound to the PersistentVolumeClaims created by the operator.
Solution: Use kubectl get persistentvolumes to inspect the available PersistentVolumes and, if needed, use the ArangoLocalStorage operator to provision PersistentVolumes.
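The two causes above can be told apart with a few read-only queries. This is a minimal sketch, not an official procedure: the function name inspect_pending is hypothetical, and the namespace falls back to default as an assumption.

```shell
# Hypothetical helper: gather the usual evidence for Pods that are stuck.
# Usage: inspect_pending [namespace]
inspect_pending() {
  # Pods that have not been scheduled or started yet.
  kubectl get pods -n "${1:-default}" --field-selector=status.phase=Pending
  # Claims in this namespace; unbound claims show STATUS Pending.
  kubectl get persistentvolumeclaims -n "${1:-default}"
  # PersistentVolumes are cluster-scoped, so no namespace is given.
  kubectl get persistentvolumes
  # Nodes that are available for scheduling.
  kubectl get nodes
}
```

If the claims are all bound but Pods are still listed, the events shown by kubectl describe pod for one of them normally state why scheduling failed.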
When restarting a Node, the Pods scheduled on that node remain in Terminating state
When a Node no longer makes regular calls to the Kubernetes API server, it is marked as not available. Depending on specific settings in your Pods, Kubernetes will at some point decide to terminate the Pod. As long as the Node is not completely removed from the Kubernetes API server, Kubernetes will try to use the Node itself to terminate the Pod.
The ArangoDeployment operator recognizes this condition and will try to replace those Pods with Pods on different nodes. The exact behavior differs per type of server.
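To see which Pods are affected and on which node they were scheduled, a minimal sketch is to filter the wide Pod listing on its STATUS column. The function name terminating_pods is hypothetical, the namespace falls back to default as an assumption, and the filter relies on STATUS being the third column of the kubectl get pods output.

```shell
# Hypothetical helper: list the Pods whose STATUS column reads Terminating.
# Usage: terminating_pods [namespace]
terminating_pods() {
  # -o wide adds the NODE column; --no-headers drops the title row.
  # The listing starts with NAME READY STATUS, so STATUS is field 3.
  kubectl get pods -n "${1:-default}" -o wide --no-headers |
    awk '$3 == "Terminating"'
}
```

Comparing the node names in the result with the output of kubectl get nodes shows whether the Pods sit on a node that is marked as not ready.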
What happens when a Node with local data is broken
When a Node with PersistentVolumes hosted on that Node is broken and cannot be repaired, the data in those PersistentVolumes is lost.
If an ArangoDeployment of type Single was using one of those PersistentVolumes, the database is lost and must be restored from a backup.
If an ArangoDeployment of type ActiveFailover or Cluster was using one of those PersistentVolumes, it depends on the type of server that was using the volume.
- If an Agent was using the volume, it can be repaired as long as 2 other agents are still healthy.
- If a DBServer was using the volume, and the replication factor of all database collections is 2 or higher, and the remaining dbservers are still healthy, the cluster will duplicate the remaining replicas to bring the number of replicas back to the original number.
- If a DBServer was using the volume, and the replication factor of a database collection is 1 and the collection happens to be stored on that dbserver, the data is lost.
- If a single server of an ActiveFailover deployment was using the volume, and the other single server is still healthy, the other single server will become leader. After replacing the failed single server, the new follower will synchronize with the leader.