Features and Improvements in ArangoDB 3.12
Improved memory accounting, wildcard search, and dump performance, multi-dimensional indexes, external versioning, and transparent compression
The following list shows in detail which features have been added or improved in ArangoDB 3.12. ArangoDB 3.12 also contains several bug fixes that are not listed here.
All Enterprise Edition features in Community Edition
Introduced in: v3.12.5
Up to version 3.12.4, the Community Edition of ArangoDB didn’t include certain query, performance, compliance, and security features. They used to be exclusive to the Enterprise Edition.
From version 3.12.5 onward, the Community Edition includes all Enterprise Edition features without time restrictions. You still need a license to use version 3.12 or later for commercial purposes or for a dataset size over 100 GiB.
The following features are now available in the Community Edition:
Performance
- SmartGraphs
- EnterpriseGraphs
- SmartGraphs using SatelliteCollections
- SatelliteGraphs
- SatelliteCollections
- SmartJoins
- OneShard
- Traversal Parallelization
- Traversal Projections
- Parallel index creation
- minhash Analyzer
- geo_s2 Analyzer
- ArangoSearch column cache
- ArangoSearch WAND optimization
- Read from followers in clusters
Querying
- Search highlighting
- Nested search
- classification and nearest_neighbors Analyzers (experimental)
- Skip inaccessible collections
Security
- Auditing
- Encryption at Rest
- Encrypted Backups
- Hot Backups
- Enhanced Data Masking
- Key rotation for JWT secrets and on-disk encryption
- Server Name Indication (SNI)
ArangoSearch
WAND optimization (Enterprise Edition)
For arangosearch Views and inverted indexes (and by extension search-alias
Views), you can define a list of sort expressions that you want to optimize.
This is also known as WAND optimization.
If you query a View with the SEARCH operation in combination with a
SORT and LIMIT operation, search results can be retrieved faster if the
SORT expression matches one of the optimized expressions.
Only sorting by highest rank is supported, that is, sorting by the result
of a scoring function in descending order (DESC).
See Optimizing View and inverted index query performance for examples.
This feature is only available in the Enterprise Edition up to v3.12.4 and included in all Editions from v3.12.5 onward.
SEARCH parallelization 
In search queries against Views, you can set the new parallelism option for
SEARCH operations to optionally process index segments in parallel using
multiple threads. This can speed up search queries.
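For example (a sketch; viewName stands for an arangosearch View and the attribute is an example):

FOR doc IN viewName
  SEARCH doc.category == "electronics"
  OPTIONS { parallelism: 4 }
  RETURN doc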
The default value for the parallelism option is defined by the new
--arangosearch.default-parallelism startup option that defaults to 1.
The new --arangosearch.execution-threads-limit startup option controls how
many threads can be used in total for search queries. The new
arangodb_search_execution_threads_demand metric reports the number of threads
that queries request. If it is below the configured thread limit, it coincides
with the number of active threads. If it exceeds the limit, some queries cannot
currently get the threads as requested and may have to use a single thread until
more become available.
See SEARCH operation in AQL
for details.
Analyzers
wildcard Analyzer 
You can use the new wildcard Analyzer in combination with an inverted index or
View to accelerate wildcard searches, especially if you want to find non-prefix
partial matches in long strings.
The Analyzer can apply another Analyzer of your choice before creating n-grams
that are then used in LIKE searches with _ and % wildcards.
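A minimal sketch of creating such an Analyzer in arangosh (the Analyzer name and the ngramSize value are examples):

var analyzers = require("@arangodb/analyzers");
var a = analyzers.save("wildcard_4", "wildcard", { ngramSize: 4 }, []);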
See Transforming data with Analyzers for details.
multi_delimiter Analyzer 
The new multi_delimiter Analyzer type accepts an array of strings to define
multiple delimiters to split the input at. Each string is considered as one
delimiter that can be one or multiple characters long.
Unlike with the delimiter Analyzer, the multi_delimiter Analyzer does not
support quoting fields.
If you want to split text using multiple delimiters and don’t require CSV-like
quoting, use the multi_delimiter Analyzer instead of chaining multiple
delimiter Analyzers in a pipeline Analyzer.
var analyzers = require("@arangodb/analyzers");
var a = analyzers.save("delimiter_multiple", "multi_delimiter", {
  delimiters: [",", ";", "||"]
}, []);
db._query(`RETURN TOKENS("differently,delimited;words||one|token", "delimiter_multiple")`).toArray();
// [ ["differently", "delimited", "words", "one|token"] ]
See Analyzers for details.
Improved memory accounting and usage
Version 3.12 features multiple improvements to the observability of ArangoDB deployments. Memory usage is more accurately tracked and additional metrics have been added for monitoring the memory consumption.
AQL queries may now report a higher memory usage and thus run into memory limits sooner, see Higher reported memory usage for AQL queries.
The RocksDB block cache metric rocksdb_block_cache_usage now also includes the
memory used for table building, table reading, file metadata, flushing and
compactions by default.
Furthermore, the memory usage of some subsystems has been optimized. When dropping a database, all contained collections are now marked as dropped immediately. Ongoing operations on these collections can be stopped earlier, and memory for the underlying collections and indexes can be reclaimed sooner. Memory used for index selectivity estimates is now also released early. ArangoSearch now has a smaller memory footprint for removal operations.
The following new metrics have been added for memory observability:
| Label | Description | 
|---|---|
| arangodb_agency_node_memory_usage | Memory used by Agency store/cache. | 
| arangodb_aql_cursors_memory_usage | Total memory usage of active AQL query result cursors. | 
| arangodb_index_estimates_memory_usage | Total memory usage of all index selectivity estimates. | 
| arangodb_internal_cluster_info_memory_usage | Amount of memory spent in ClusterInfo. | 
| arangodb_requests_memory_usage | Memory consumed by incoming, queued, and currently processed requests. | 
| arangodb_revision_tree_buffered_memory_usage | Total memory usage of buffered updates for all revision trees. | 
| arangodb_scheduler_queue_memory_usage | Number of bytes allocated for tasks in the scheduler queue. | 
| arangodb_scheduler_stack_memory_usage | Approximate stack memory usage of worker threads. | 
| arangodb_search_consolidations_memory_usage | Amount of memory in bytes that is used for consolidating an ArangoSearch index. | 
| arangodb_search_mapped_memory | Amount of memory in bytes that is mapped for an ArangoSearch index. | 
| arangodb_search_readers_memory_usage | Amount of memory in bytes that is used for reading from an ArangoSearch index. | 
| arangodb_search_writers_memory_usage | Amount of memory in bytes that is used for writing to an ArangoSearch index. | 
| arangodb_transactions_internal_memory_usage | Total memory usage of internal transactions. | 
| arangodb_transactions_rest_memory_usage | Total memory usage of user transactions (excluding top-level AQL queries). | 
Introduced in: v3.12.6
The memory accounting for AQL queries has been extended to track the memory usage of the following:
- Grouping with the COLLECT operation if the sorted method is used.
- Aggregating with COLLECT ... AGGREGATE.
- Deduplicating results with RETURN DISTINCT.
- Using the MERGE() function to combine objects.
- Internal buffers for execution blocks, late materialization, building results, and distributing AQL queries in cluster deployments.
- Internal buffers for decay, distance, and replace functions.
- Internal buffers used by the query parser and optimizer.
It is expected that the reported memory consumption is now higher.
Web interface
Shard rebalancing
The feature for rebalancing shards in cluster deployments has been moved from the Rebalance Shards tab in the Nodes section to the Distribution tab in the Cluster section of the web interface.
The updated interface now offers the following options:
- Move Leaders
- Move Followers
- Include System Collections
Unified list view
ArangoDB 3.12 brings a significant enhancement to the display of collections, Views, graphs, users, services, and databases in the web interface. The previous tile format has been replaced with a user-friendly tabular layout, providing a consistent and intuitive experience that is visually aligned across all components. The existing tabular views have also been reworked to ensure a seamless transition.
The new tabular format includes the following features:
- Dynamic filters on columns: Each column now has a dynamic filter box, allowing you to efficiently search and filter based on keywords. This makes it easy to locate specific items within the list.
- Dynamic sorting on columns: Sort elements easily based on column data such as name, date, or size. This functionality provides a flexible way to organize and view your data.
Options & optimizer rules management in the Query Editor
You can now specify extra options for your queries via the query editor. You can find all available options in the new Options tab in the right-hand pane of the editor view, positioned alongside the Bind Variables tab.
In the new Options tab, you also have access to the optimizer rules. This section allows you to selectively disable multiple optimizer rules, giving you more control over query optimization according to your specific requirements.
Swagger UI
The interactive tool for exploring HTTP APIs has been updated to version 5.4.1.
You can find it in the web interface in the Rest API tab of the Support
section, as well as in the API tab of Foxx services and Foxx routes that use
module.context.createDocumentationRouter().
The new version adds support for OpenAPI 3.x specifications in addition to Swagger 2.x compatibility.
External versioning support
Document operations that update or replace documents now support a versionAttribute option.
If set, the attribute with the name specified by the option is looked up in the
stored document and the attribute value is compared numerically to the value of
the versioning attribute in the supplied document that is supposed to update/replace it.
The document is only changed if the new number is higher.
This simple versioning can help to avoid overwriting existing data with older versions in case data is transferred from an external system into ArangoDB and the copies are currently not in sync.
This new feature is supported in AQL, the JavaScript API, and the HTTP API for
the respective operations to update and replace documents, including the insert
operations when used to update/replace a document with overwriteMode: "update"
or overwriteMode: "replace".
Examples:
Insert a new document normally using arangosh:

db.collection.insert({ _key: "123", externalVersion: 1 });

Update the document if the versioning attribute is higher in the new document, which is true in this case:

db.collection.update("123",
  { externalVersion: 5, anotherAttribute: true },
  { versionAttribute: "externalVersion" });

Updating the document is skipped if the versioning attribute is lower or equal in the supplied document compared to what is currently stored in ArangoDB:

db.collection.update("123",
  { externalVersion: 4, anotherAttribute: false },
  { versionAttribute: "externalVersion" });

You can also make use of the versionAttribute option in an insert-update
operation for the update case, including in AQL:

db.collection.insert({ _key: "123", externalVersion: 6, value: "foo" },
  { overwriteMode: "update", versionAttribute: "externalVersion" });

db._query(`UPDATE { _key: "123", externalVersion: 7, value: "bar" } IN collection 
  OPTIONS { versionAttribute: "externalVersion" }`);

External versioning is opt-in and no version checking is performed for
operations for which the versionAttribute option isn’t set. Document removal
operations do not support external versioning. Removal operations are always
carried out normally.
Note that version checking is performed only if both the existing version of
the document in the database and the new document version contain the version
attribute with numeric values of 0 or greater. If neither the existing document
in the database nor the new document contains the version attribute, or if the
version attribute in any of the two is not a number inside the valid range, the
update/replace operations behave as if no version checking was requested.
This may overwrite the versioning attribute in the database.
Also see:
- The AQL INSERT, UPDATE and REPLACE operations
- The insert(), update(), and replace() methods of the collection object in the JavaScript API
- The endpoints to create, update, and replace a single or multiple documents in the HTTP API
AQL
Improved joins
The AQL optimizer now automatically recognizes whether a better strategy for
joining collections can be used, using the new join-index-nodes optimizer rule.
If two or more collections are joined using nested FOR loops and the
attributes you join on are indexed by primary indexes or persistent indexes,
then a merge join can be performed because they are sorted.
Note that returning document attributes from the outer loop is limited to attributes covered by the index, or the improved join strategy cannot be used.
The following example query shows an inner join between orders and users on
user ID. Each document in the orders collection references a user, and the
users collection stores the user ID in the _key attribute. The query returns
the total attribute of every order along with the user information:
FOR o IN orders
  FOR u IN users
    FILTER o.user == u._key
    RETURN { orderTotal: o.total, user: u }

The _key attribute is covered by the primary index of the users collection.
If the orders collection has a persistent index defined over the user
attribute and additionally includes the total attribute in
storedValues,
then the query is eligible for a merge join.
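For instance, the secondary index could be created like this in arangosh (a sketch matching the example above):

db.orders.ensureIndex({
  type: "persistent",
  fields: ["user"],
  storedValues: ["total"]
});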
Execution plan:
 Id   NodeType          Par     Est.   Comment
  1   SingletonNode                1   * ROOT
 10   JoinNode            ✓   500000     - JOIN
 10   JoinNode                500000       - FOR o IN orders   LET #8 = o.`total`   /* index scan (projections: `total`) */
 10   JoinNode                     1       - FOR u IN users   /* index scan + document lookup */
  6   CalculationNode     ✓   500000     - LET #4 = { "orderTotal" : #8, "user" : u }   /* simple expression */   /* collections used: u : users */
  7   ReturnNode              500000     - RETURN #4
Indexes used:
 By   Name                      Type         Collection   Unique   Sparse   Cache   Selectivity   Fields       Stored values   Ranges
 10   idx_1784521139132825600   persistent   orders       false    false    false      100.00 %   [ `user` ]   [ `total` ]     *
 10   primary                   primary      users        true     false    false      100.00 %   [ `_key` ]   [  ]            (o.`user` == u.`_key`)

Filter matching syntax for UPSERT operations
Version 3.12 introduces an alternative syntax for
UPSERT operations that allows
you to use more flexible filter conditions to look up documents. Previously,
the expression used to look up a document had to be an object literal, and this
is now called the “exact-value matching” syntax.
The new “filter matching” syntax lets you use a FILTER statement for the
UPSERT operation. The filter condition for the lookup can make use of the
CURRENT pseudo-variable to access the lookup document.
UPSERT FILTER CURRENT.name == 'superuser'
INSERT { name: 'superuser', logins: 1, dateCreated: DATE_NOW() }
UPDATE { logins: OLD.logins + 1 } IN users

The same query using the exact-value matching syntax:

UPSERT { name: 'superuser' }
INSERT { name: 'superuser', logins: 1, dateCreated: DATE_NOW() }
UPDATE { logins: OLD.logins + 1 } IN users

The FILTER expression can contain operators such as AND and OR to create
complex filter conditions, and you can apply more filters on the CURRENT
pseudo-variable than just equality matches.
UPSERT FILTER CURRENT.age < 30 AND (STARTS_WITH(CURRENT.name, "Jo") OR CURRENT.gender IN ["f", "x"])
INSERT { name: 'Jordan', age: 29, logins: 1, dateCreated: DATE_NOW() }
UPDATE { logins: OLD.logins + 1 } IN users

Read more about UPSERT operations in AQL.
readOwnWrites option for UPSERT operations 
A readOwnWrites option has been added for UPSERT operations. The default
value is true and the behavior is identical to previous versions of ArangoDB that
do not have this option. When enabled, an UPSERT operation processes its
inputs one by one. This way, the operation can observe its own writes and can
handle modifying the same target document multiple times in the same query.
When the option is set to false, an UPSERT operation processes its inputs
in batches. Normally, a batch has 1000 inputs, which can lead to a faster execution.
However, when using batches, the UPSERT operation cannot observe its own writes.
Therefore, you should only set the readOwnWrites option to false if you can
guarantee that the input of the UPSERT leads to disjoint documents being
inserted, updated, or replaced.
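For example, if every input produces a distinct lookup value, batching is safe (a sketch; the coll collection and value attribute are hypothetical):

FOR i IN 1..1000
  UPSERT { value: i }
  INSERT { value: i, count: 1 }
  UPDATE { count: OLD.count + 1 } IN coll
  OPTIONS { readOwnWrites: false }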
Parallel execution within an AQL query
The new async-prefetch optimizer rule allows certain operations of a query to
asynchronously prefetch the next batch of data while processing the current batch,
allowing parts of the query to run in parallel. This can lead to performance
improvements if there is still reserve (scheduler) capacity.
The new Par column in a query explain output shows which nodes of a query are
eligible for asynchronous prefetching. Write queries, graph execution nodes,
nodes inside subqueries, LIMIT nodes and their dependencies above, as well as
all query parts that include a RemoteNode are not eligible.
Execution plan:
 Id   NodeType                  Par   Est.   Comment
  1   SingletonNode                      1   * ROOT
  2   EnumerateCollectionNode     ✓     18     - FOR doc IN places   /* full collection scan  */   FILTER (doc.`label` IN [ "Glasgow", "Aberdeen" ])   /* early pruning */
  5   CalculationNode             ✓     18       - LET #2 = doc.`label`   /* attribute expression */   /* collections used: doc : places */
  6   SortNode                    ✓     18       - SORT #2 ASC   /* sorting strategy: standard */
  7   ReturnNode                        18       - RETURN doc

The profiling output for queries includes a new Par column as well, but it
shows the number of successful parallel asynchronous prefetch calls.
To not overwhelm the server, async prefetching is restricted and the limits are adjustable. See Configurable async prefetch limits.
Added AQL functions
The new PARSE_COLLECTION() and PARSE_KEY() functions let you extract the
collection name respectively the document key from a document identifier with
less overhead.
See Document and object functions in AQL.
The new REPEAT() function repeats the input value a given number of times,
optionally with a separator between repetitions, and returns the resulting string.
The new TO_CHAR() function lets you specify a numeric Unicode codepoint and
returns the corresponding character as a string.
A numeric function RANDOM() has been added as an alias for the existing RAND().
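A few usage sketches (the document identifiers and values are examples):

RETURN PARSE_COLLECTION("users/12345")  // "users"
RETURN PARSE_KEY("users/12345")         // "12345"
RETURN REPEAT("ab", 3, "-")             // "ab-ab-ab"
RETURN TO_CHAR(65)                      // "A"
RETURN RANDOM()                         // a pseudo-random number between 0 and 1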
Introduced in: v3.12.1
The new ENTRIES() function returns the top-level attributes of an object in
pairs of attribute keys and values.
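For example (a minimal sketch):

RETURN ENTRIES({ name: "Luna Miller", age: 39 })
// [ ["name", "Luna Miller"], ["age", 39] ]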
See Document and object functions in AQL.
Timezone parameter for date functions
The following AQL date functions now accept an optional timezone argument to perform date and time calculations in certain timezones:
- DATE_DAYOFWEEK(date, timezone)
- DATE_YEAR(date, timezone)
- DATE_MONTH(date, timezone)
- DATE_DAY(date, timezone)
- DATE_HOUR(date, timezone)
- DATE_MINUTE(date, timezone)
- DATE_DAYOFYEAR(date, timezone)
- DATE_ISOWEEK(date, timezone)
- DATE_ISOWEEKYEAR(date, timezone)
- DATE_LEAPYEAR(date, timezone)
- DATE_QUARTER(date, timezone)
- DATE_DAYS_IN_MONTH(date, timezone)
- DATE_TRUNC(date, unit, timezone)
- DATE_ROUND(date, amount, unit, timezone)
- DATE_FORMAT(date, format, timezone)
- DATE_ADD(date, amount, unit, timezone)
- DATE_SUBTRACT(date, amount, unit, timezone)
The following two functions accept up to two timezone arguments. If you only specify the first, then both input dates are assumed to be in this one timezone. If you specify two timezones, then the first date is assumed to be in the first timezone, and the second date in the second timezone:
- DATE_DIFF(date1, date2, unit, asFloat, timezone1, timezone2) (asFloat can be left out)
- DATE_COMPARE(date1, date2, unitRangeStart, unitRangeEnd, timezone1, timezone2)
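For example, you can determine the hour of a UTC timestamp in a specific timezone using IANA timezone names (a minimal sketch):

RETURN DATE_HOUR("2024-06-01T12:00:00Z", "Europe/Berlin")
// 14, because Central European Summer Time is UTC+2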
Improved move-filters-into-enumerate optimizer rule 
The move-filters-into-enumerate optimizer rule can now also move filters into
EnumerateListNodes for early pruning. This can significantly improve the
performance of queries that do a lot of filtering on longer lists of
non-collection data.
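For example, a query of the following shape, which filters a long list of generated values, can now evaluate the filter during the list enumeration itself (a sketch):

LET values = 1..1000000
FOR v IN values
  FILTER v % 2 == 0
  RETURN v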
Improved late document materialization
When FILTER operations can be covered by primary, edge, or persistent
indexes, ArangoDB utilizes the index information to only request documents from
the storage engine that fulfill the criteria. This late document materialization
has been improved to load documents in batches for efficiency. It now also
supports projections to fetch subsets of the documents if only a few attributes
are accessed in an AQL query.
For example, a query like below can use late materialization if there is a
persistent index over the x attribute:
FOR doc IN coll
  FILTER doc.x > 5
  RETURN [doc.y, doc.z, doc.a]

If the y, z, and a attributes are not covered by the index (e.g. storedValues),
then they need to be fetched from the storage engine. This no longer requires
loading the full documents, as indicated by the /* (projections: … ) */ comment
on the MaterializeNode due to an improved reduce-extraction-to-projection
optimizer rule. The loading is performed in batches due to the new
batch-materialize-documents optimization.
Execution plan:
 Id   NodeType          Par   Est.   Comment
  1   SingletonNode              1   * ROOT
  7   IndexNode           ✓   1000     - FOR doc IN coll   /* persistent index scan, index scan + document lookup */    /* with late materialization */
  8   MaterializeNode         1000       - MATERIALIZE doc /* (projections: `a`, `y`, `z`) */
  5   CalculationNode     ✓   1000       - LET #2 = [ doc.`y`, doc.`z`, doc.`a` ]   /* simple expression */   /* collections used: doc : coll */
  6   ReturnNode              1000       - RETURN #2
Indexes used:
 By   Name                      Type         Collection   Unique   Sparse   Cache   Selectivity   Fields    Stored values   Ranges
  7   idx_1788354556957032448   persistent   coll         false    false    false      100.00 %   [ `x` ]   [ `b` ]         (doc.`x` > 5)
Optimization rules applied:
 Id   Rule Name                                 Id   Rule Name                                 Id   Rule Name                        
  1   move-calculations-up                       5   use-indexes                                9   batch-materialize-documents      
  2   move-filters-up                            6   remove-filter-covered-by-index            10   async-prefetch                   
  3   move-calculations-up-2                     7   remove-unnecessary-calculations-2
  4   move-filters-up-2                          8   reduce-extraction-to-projection  
The number of document lookups caused by late materialization is now reported
under extra.stats.documentLookups by the Cursor HTTP API and shown in a column
of the query profile output:
Query Statistics:
 Writes Exec      Writes Ign      Doc. Lookups      Scan Full      Scan Index      Cache Hits/Misses      Filtered      Peak Mem [b]      Exec Time [s]
           0               0               290              0            3708                  0 / 0          3418             32768            0.01247

Introduced in: v3.12.1
The late materialization has been further improved via two new optimizer rules:
- push-down-late-materialization
- materialize-into-separate-variable
Loading documents is deferred in more cases, namely when attributes accessed in
FILTER and SORT operations are covered by an index.
For example, if you have a collection coll with a persistent index over the
attribute x and additionally the attribute b in storedValues, then the
materialization would previously occur before the SORT operation of the
following query:
FOR doc IN coll
  FILTER doc.x > 5
  SORT doc.b
  RETURN doc

Execution plan:
 Id   NodeType          Par   Est.   Comment
  1   SingletonNode              1   * ROOT
  8   IndexNode           ✓    120     - FOR doc IN coll   /* persistent index scan, scan only */    /* with late materialization */
  9   MaterializeNode          120       - MATERIALIZE doc INTO #4
  5   CalculationNode     ✓    120       - LET #2 = #4.`b`   /* attribute expression */
  6   SortNode            ✓    120       - SORT #2 ASC   /* sorting strategy: standard */
  7   ReturnNode               120       - RETURN #4
Indexes used:
 By   Name                      Type         Collection   Unique   Sparse   Cache   Selectivity   Fields    Stored values   Ranges
  8   idx_1799108758565027840   persistent   coll         false    false    false       83.33 %   [ `x` ]   [ `b` ]         (doc.`x` > 5)

The push-down-late-materialization rule moves the materialization down in the
execution plan, below the SORT operation:
Execution plan:
 Id   NodeType          Par   Est.   Comment
  1   SingletonNode              1   * ROOT
  8   IndexNode           ✓    120     - FOR doc IN coll   /* persistent index scan, index only (projections: `b`) */    LET #5 = doc.`b`   /* with late materialization */
  6   SortNode            ✓    120       - SORT #5 ASC   /* sorting strategy: standard */
  9   MaterializeNode          120       - MATERIALIZE doc INTO #4
  7   ReturnNode               120       - RETURN #4
Indexes used:
 By   Name                      Type         Collection   Unique   Sparse   Cache   Selectivity   Fields    Stored values   Ranges
  8   idx_1799108758565027840   persistent   coll         false    false    false       83.33 %   [ `x` ]   [ `b` ]         (doc.`x` > 5)

The rule can also push the materialization past FOR loops in case they are
optimized to a merge join (JoinNode, see Improved joins).
The materialize-into-separate-variable rule optimizes how projections of
materializations are managed. Using separate internal variables for projected
attributes avoids the creation of temporary objects and having to look up the
attributes by key in these objects, which is faster. Example:
FOR doc IN coll
  FILTER doc.x > 5
  RETURN [doc.a, doc.b]

In previous versions, the execution plan looks like this:
Execution plan:
 Id   NodeType          Est.   Comment
  1   SingletonNode        1   * ROOT
  7   IndexNode          100     - FOR doc IN coll   /* persistent index scan, index scan + document lookup (projections: `a`, `b`) */    
  5   CalculationNode    100       - LET #3 = [ doc.`a`, doc.`b` ]   /* simple expression */   /* collections used: doc : coll */
  6   ReturnNode         100       - RETURN #3

While not directly visible, the extracted document attributes a and b are
internally stored in a temporary object with only these two attributes.
The attributes need to be looked up in the object for constructing the result.
From v3.12.1 onward, the execution plan indicates the use of separate internal variables and that no attribute access is needed to construct the result:
Execution plan:
 Id   NodeType          Par   Est.   Comment
  1   SingletonNode              1   * ROOT
  7   IndexNode           ✓    100     - FOR doc IN coll   /* persistent index scan, scan only */    /* with late materialization */
  8   MaterializeNode          100       - MATERIALIZE doc INTO #4 /* (projections: `a`, `b`) */   LET #5 = #4.`a`, #6 = #4.`b`
  5   CalculationNode     ✓    100       - LET #2 = [ #5, #6 ]   /* simple expression */
  6   ReturnNode               100       - RETURN #2

Short-circuiting subquery evaluation
Introduced in: v3.12.1
A ternary operator generally evaluates a condition to decide whether to execute
the true branch or the false branch. For example, count < 5 ? "few" : "many"
evaluates to the string "few" if the count variable is less than five,
or to the string "many" otherwise.
In AQL, the branches are not limited to simple expressions but can also be
subqueries. Up until ArangoDB v3.12.0, the catch was that subqueries were pulled
out by the query optimizer into separate LET operations that are executed
before the condition is evaluated. Regardless of which branch was taken,
the subquery code was always executed.
Similarly, when you use subqueries as sub-expressions that are combined with
logical AND or OR, the subqueries are executed unconditionally. This is
typically not what you want and can lead to logical errors.
Ternary operator example:
LET doc = FIRST(FOR d IN coll FILTER d._key == "A" RETURN d)
RETURN doc ?: FIRST(INSERT { _key: "A" } INTO coll RETURN NEW)
// doc ?: FIRST(...) is a shorthand for doc ? doc : FIRST(...)
The above query looks up a document with the key A in a collection called coll.
If it exists (the doc variable is truthy), then it is returned. Otherwise, the
document is supposed to be created. However, the false branch is a subquery that
gets executed unconditionally. If the document with key A already exists, an
attempt to create this document is still made (but fails because of a key conflict).
Execution plan:
 Id   NodeType            Est.   Comment
  1   SingletonNode          1   * ROOT
 10   CalculationNode        1     - LET #9 = { "_key" : "A" }   /* json expression */   /* const assignment */
 20   SubqueryStartNode      1     - LET #5 = ( /* subquery begin */
 11   InsertNode             1       - INSERT #9 IN coll 
 21   SubqueryEndNode        1       - RETURN  $NEW ) /* subquery end */
 18   SubqueryStartNode      1     - LET #1 = ( /* subquery begin */
 17   IndexNode              1       - FOR d IN coll   /* primary index scan, index scan + document lookup */    
 16   LimitNode              1         - LIMIT 0, 1
 19   SubqueryEndNode        1         - RETURN  d ) /* subquery end */
 14   CalculationNode        1     - LET #11 = (FIRST(#1) ?: #5)   /* simple expression */
 15   ReturnNode             1     - RETURN #11

As you can see, the false branch execution starts at node 18, and the
evaluation of the ternary operator happens afterwards in node 14.
From v3.12.1 onward, the evaluation behavior is changed so that only the applicable branch of a ternary operator is executed and subqueries that are used as sub-expressions are effectively evaluated lazily. For example, you can observe the query optimizer rewriting ternary operators differently to support short-circuiting when using subqueries:
Execution plan:
 Id   NodeType            Par   Est.   Comment
  1   SingletonNode                1   * ROOT
 22   SubqueryStartNode            1     - LET #1 = ( /* subquery begin */
 19   IndexNode                    1       - FOR d IN coll   /* primary index scan, index scan + document lookup */    
 18   LimitNode                    1         - LIMIT 0, 1
 23   SubqueryEndNode              1         - RETURN  d ) /* subquery end */
  8   CalculationNode              1     - LET doc = FIRST(#1)   /* simple expression */
 20   SubqueryStartNode            1     - LET #6 = ( /* subquery begin */
 10   CalculationNode              1       - LET #9 = ! doc   /* simple expression */
 11   FilterNode                   1       - FILTER #9
 12   CalculationNode              1       - LET #10 = { "_key" : "A" }   /* json expression */   /* const assignment */
 13   InsertNode                   1       - INSERT #10 IN coll 
 21   SubqueryEndNode              1       - RETURN  $NEW ) /* subquery end */
 16   CalculationNode              1     - LET #11 = (doc ?: FIRST(#6))   /* simple expression */
 17   ReturnNode                   1     - RETURN #11

The false branch subquery starts at node 20, still before the evaluation of
the ternary operator at node 16. However, the subquery has a FILTER operation
using the negated condition. This prevents the INSERT operation from running
when the doc variable is truthy because the opposite is false and
FILTER false leaves nothing to process.
Also see Evaluation of subqueries.
Index hints for traversals
Introduced in: v3.12.1
For graph traversals, you can now specify index hints in AQL.
If vertex-centric indexes are known to perform better than the edge index but aren’t chosen automatically, you can make the optimizer prefer the indexes you specify. This can be done per edge collection, direction, and level/depth:
FOR v, e, p IN 1..4 OUTBOUND startNode edgeCollection
OPTIONS {
  indexHint: {
    "edgeCollection": {
      "outbound": {
        "base": ["edge"],
        "1": "myIndex1",
        "2": ["myIndex2", "myIndex1"],
        "3": "myIndex3"
      }
    }
  }
    }
  }
}
FILTER p.edges[1].foo == "bar" AND
       p.edges[2].foo == "bar" AND
       p.edges[2].baz == "qux"

See the Traversal options for details.
Query logging
Introduced in: v3.12.2
A new feature for logging metadata of past AQL queries has been added.
You can optionally let ArangoDB store information such as run time, memory usage,
and failure reasons to the _queries system collection in the _system database
with a configurable sampling probability and retention period. This allows you
to analyze the metadata directly in the database system to debug query issues
and understand usage patterns.
See Query logging for details.
Bypass edge cache for graph operations
Introduced in: v3.12.2
The useCache option is now supported for graph traversals and path searches.
You can set this option to false to prevent a large graph operation from
polluting the edge cache.
FOR v, e, p IN 1..5 OUTBOUND "nodes/123" edges
  OPTIONS { useCache: false }
  ...

Push limit into index optimization
Introduced in: v3.12.2
A new push-limit-into-index optimizer rule has been added to better utilize
persistent indexes over multiple fields when there are subsequent SORT and
LIMIT operations. The following conditions need to be met for the rule to be
applied:
- The index must be a compound index
- The index condition must use the IN comparison operator
- The attributes used for the IN comparison must be the ones used by the index
- There must not be an outer loop and there must not be post-filtering
- The attributes to sort by must be the same ones as in the index and in the same
order, and they must all be in the same direction (all ASC or all DESC).
Under these circumstances, fetching all the data to sort it is unnecessary because it is already sorted in the index. This makes it possible to get results from the index in batches. This can greatly improve the performance of certain queries.
Example query:
FOR doc IN coll
  FILTER doc.x IN ["foo", "bar"]
  SORT doc.y
  LIMIT 10
  RETURN doc

With a persistent index over x and y but without the optimization, the
query explain output looks as follows:
Execution plan:
 Id   NodeType          Par   Est.   Comment
  1   SingletonNode              1   * ROOT 
  9   IndexNode                500     - FOR doc IN coll   /* persistent index scan, index only (projections: `y`) */    LET #5 = doc.`y`   /* with late materialization */
  6   SortNode                 500       - SORT #5 ASC   /* sorting strategy: constrained heap */
  7   LimitNode                 10       - LIMIT 0, 10
 10   MaterializeNode           10       - MATERIALIZE doc INTO #4
  8   ReturnNode                10       - RETURN #4
Indexes used:
 By   Name                      Type         Collection   Unique   Sparse   Cache   Selectivity   Fields         Stored values   Ranges
  9   idx_1806008994935865344   persistent   coll         false    false    false      100.00 %   [ `x`, `y` ]   [  ]            (doc.`x` IN [ "bar", "foo" ])

With the optimization, the LIMIT 10 is pushed into the index node with the
comment /* early reducing results */:
Execution plan:
 Id   NodeType          Par   Est.   Comment
  1   SingletonNode              1   * ROOT 
  9   IndexNode                500     - FOR doc IN coll   /* persistent index scan, index only (projections: `y`) */    LET #5 = doc.`y`   LIMIT 10 /* early reducing results */   /* with late materialization */
  6   SortNode                 500       - SORT #5 ASC   /* sorting strategy: constrained heap */
  7   LimitNode                 10       - LIMIT 0, 10
 10   MaterializeNode           10       - MATERIALIZE doc INTO #4
  8   ReturnNode                10       - RETURN #4
Indexes used:
 By   Name                      Type         Collection   Unique   Sparse   Cache   Selectivity   Fields         Stored values   Ranges
  9   idx_1806008994935865344   persistent   coll         false    false    false      100.00 %   [ `x`, `y` ]   [  ]            (doc.`x` IN [ "bar", "foo" ])

Array and object destructuring
Introduced in: v3.12.2
Destructuring lets you assign array values and object attributes to one or
multiple variables with a single LET operation and as part of regular FOR
loops. This can be convenient to extract a subset of values and name them in a
concise manner.
Array values are assigned by position and you can skip elements by leaving out variable names.
Object attributes are assigned by name but you can also map them to different variable names.
You can mix both array and object destructuring.
LET [x, y] = [1, 2, 3]   // Assign 1 to variable x and 2 to y
LET [, y, z] = [1, 2, 3] // Assign 2 to variable y and 3 to z
// Assign "Luna Miller" to variable name and 39 to age
LET { name, age } = { vip: true, age: 39, name: "Luna Miller" }
// Assign the vip attribute value to variable status
LET { vip: status } = { vip: true, age: 39, name: "Luna Miller" }
// Assign 1 to variable x, 2 to y, and 3 to z
LET { obj: [x, [y, z]] } = { obj: [1, [2, 3]] }
// Iterate over array of objects and extract the firstName attribute
LET names = [ { firstName: "Luna"}, { firstName: "Sam" } ]
FOR { firstName } IN names
  RETURN firstName

See Array destructuring and Object destructuring for details.
Improved utilization of sorting order for COLLECT 
Introduced in: v3.12.3
The query optimizer now automatically recognizes additional cases that allow
using the faster sorted method for a COLLECT operation. For example, the
following query previously created a query plan with two SORT operations and
used the hash method for COLLECT.
FOR doc IN coll
  SORT doc.value DESC
  COLLECT val = doc.value
  RETURN val

Execution plan:
 Id   NodeType                  Par   Est.   Comment
  1   SingletonNode                      1   * ROOT 
  2   EnumerateCollectionNode     ✓     36     - FOR doc IN coll   /* full collection scan (projections: `value`)  */   LET #4 = doc.`value`
  4   SortNode                    ✓     36       - SORT #4 DESC   /* sorting strategy: standard */
  6   CollectNode                 ✓     28       - COLLECT val = #4   /* hash */
  8   SortNode                    ✓     28       - SORT val ASC   /* sorting strategy: standard */
  7   ReturnNode                        28       - RETURN val

Now, the optimizer checks whether all grouping values are covered by the
user-requested SORT, ignoring the direction, and doesn’t create an additional
SORT node in that case. A sort in descending order can thus now be utilized
for grouping using the sorted method, and possibly even utilize an index.
Execution plan:
 Id   NodeType                  Par   Est.   Comment
  1   SingletonNode                      1   * ROOT 
  2   EnumerateCollectionNode     ✓     36     - FOR doc IN coll   /* full collection scan (projections: `value`)  */   LET #5 = doc.`value`
  4   SortNode                    ✓     36       - SORT #5 DESC   /* sorting strategy: standard */
  6   CollectNode                 ✓     28       - COLLECT val = #5   /* sorted */
  7   ReturnNode                        28       - RETURN val

Fast object enumeration with ENTRIES()
Introduced in: v3.12.3
A new optimization has been implemented to improve the efficiency of the
ENTRIES() function in AQL.
The new replace-entries-with-object-iteration optimizer rule can recognize a
query pattern like FOR obj IN source FOR [key, value] IN ENTRIES(obj) ...
and use a faster code path for iterating over the object that avoids copying a
lot of key/value pairs and storing intermediate results.
LET source = [ { a: 1, b: 2 }, { c: 3 } ]
FOR doc IN source // collection or array of objects
  FOR [key, value] IN ENTRIES(doc)
  RETURN CONCAT(key, value)

Without the optimization, the node with ID 6 uses a regular list iteration:
Execution plan:
 Id   NodeType            Par   Est.   Comment
  1   SingletonNode                1   * ROOT
  2   CalculationNode       ✓      1     - LET source = [ { "a" : 1, "b" : 2 }, { "c" : 3 } ]   /* json expression */   /* const assignment */
  4   EnumerateListNode     ✓      2     - FOR doc IN source   /* list iteration */
  5   CalculationNode       ✓      2       - LET #7 = ENTRIES(doc)   /* simple expression */
  6   EnumerateListNode     ✓    200       - FOR #4 IN #7   /* list iteration */
  9   CalculationNode       ✓    200         - LET #8 = CONCAT(#4[0], #4[1])   /* simple expression */
 10   ReturnNode                 200         - RETURN #8

With the optimization applied, the faster object iteration is used:
Execution plan:
 Id   NodeType            Par   Est.   Comment
  1   SingletonNode                1   * ROOT
  2   CalculationNode       ✓      1     - LET source = [ { "a" : 1, "b" : 2 }, { "c" : 3 } ]   /* json expression */   /* const assignment */
  4   EnumerateListNode     ✓      2     - FOR doc IN source   /* list iteration */
  6   EnumerateListNode     ✓    200       - FOR [key, value] OF doc   /* object iteration */
  9   CalculationNode       ✓    200         - LET #8 = CONCAT(key, value)   /* simple expression */
 10   ReturnNode                 200         - RETURN #8

Improved graph path searches
Introduced in: v3.12.3
Due to a refactoring in version 3.11, the performance of certain graph queries
regressed while others improved. In particular, shortest path queries like
K_SHORTEST_PATHS became slower for certain datasets compared to
version 3.10. The performance should now be similar again due to a switch from
a Dijkstra-like algorithm back to Yen’s algorithm and the re-enabling of
neighbor node caching in one case.
In addition, shortest path searches may finish earlier now due to some optimizations to disregard candidate paths for which better candidates have been found already.
Cache for query execution plans (experimental)
Introduced in: v3.12.4
An optional execution plan cache for AQL queries has been added to let you skip query planning and optimization when running the same queries repeatedly. This can significantly reduce the total time for running particular queries where a lot of time is spent on the query planning and optimization passes in proportion to the actual execution.
Query plans are not cached by default. You need to set the new usePlanCache
query option to true to utilize cached plans as well as to add plans to the
cache. Otherwise, the plan cache is bypassed.
db._query("FOR doc IN coll FILTER doc.attr == @val RETURN doc", { val: "foo" }, { usePlanCache: true });

Not all AQL queries are eligible for plan caching. You can generally not cache plans of queries where bind variables affect the structure of the execution plan or the index utilization. See Cache eligibility for details.
HTTP API endpoints and a JavaScript API module have been added for clearing the contents of the query plan cache and for retrieving the current plan cache entries. See The execution plan cache for AQL queries for details.
require("@arangodb/aql/plan-cache").toArray();

[
  {
    "hash" : "2757239675060883499",
    "query" : "FOR doc IN coll FILTER doc.attr == @val RETURN doc",
    "queryHash" : 11382508862770890000,
    "bindVars" : {
    },
    "fullCount" : false,
    "dataSources" : [
      "coll"
    ],
    "created" : "2024-11-20T17:21:34Z",
    "hits" : 0,
    "memoryUsage" : 3070
  }
]
The following startup options have been added to let you configure the plan cache:
- --query.plan-cache-max-entries: The maximum number of plans in the query plan cache per database. The default value is 128.
- --query.plan-cache-max-memory-usage: The maximum total memory usage for the query plan cache in each database. The default value is 8MB.
- --query.plan-cache-max-entry-size: The maximum size of an individual entry in the query plan cache in each database. The default value is 2MB.
- --query.plan-cache-invalidation-time: The time in seconds after which a query plan is invalidated in the query plan cache.
The following metrics have been added to monitor the query plan cache:
| Label | Description | 
|---|---|
| arangodb_aql_query_plan_cache_hits_total | Total number of lookup hits in the AQL query plan cache. | 
| arangodb_aql_query_plan_cache_memory_usage | Total memory usage of all query plan caches across all databases. | 
| arangodb_aql_query_plan_cache_misses_total | Total number of lookup misses in the AQL query plan cache. | 
PUSH() function available for aggregations 
Introduced in: v3.12.4
The COLLECT and WINDOW operations now support the PUSH() function in
AGGREGATE expressions.
For grouping data with COLLECT, it means that you can rewrite a
COLLECT ... INTO var = <projectionExpression> construct to
COLLECT ... AGGREGATE var = PUSH(<projectionExpression>), for instance.
You can add more assignments to the AGGREGATE clause to perform additional
calculations in one go, making it more powerful than COLLECT ... INTO.
For sliding window calculations with WINDOW, the PUSH() function can be handy
when developing queries to understand what values are in the sliding window:
FOR t IN observations
  SORT t.time
  WINDOW { preceding: "unbounded", following: 0 }
  AGGREGATE sum = SUM(t.val), values = PUSH(t.val)
  RETURN { sum, values }

| sum | values | 
|---|---|
| 10 | [10] | 
| 10 | [10,0] | 
| 19 | [10,0,9] | 
| 29 | [10,0,9,10] | 
| 54 | [10,0,9,10,25] | 
| 59 | [10,0,9,10,25,5] | 
| 79 | [10,0,9,10,25,5,20] | 
| 109 | [10,0,9,10,25,5,20,30] | 
| 134 | [10,0,9,10,25,5,20,30,25] | 
See the COLLECT operation
and the WINDOW operation for details.
Improved index utilization for SORT 
Introduced in: v3.12.4
The existing use-index-for-sort optimizer rule has been extended to take
advantage of persistent indexes for sorting in more cases, namely when only a
prefix of the indexed fields is used to sort by.
Persistent indexes are sorted and could already be utilized to optimize SORT
operations away if the attributes you sort by match the attributes the index is
over. Now, the sortedness can be further exploited if there is a persistent index
(e.g. over ["a", "b"]) that starts with the same fields as the SORT operation
(e.g. a), but you sort by additional attributes not covered by the index (e.g. c):
FOR { a, b, c } IN coll
  SORT a, c
  RETURN { a, b, c }

The data is already sorted in the index by the first attributes (here: a) and
each group of values only needs to be sorted by the remaining attributes (here: c).
This reduces the amount of work necessary to establish the total sorting order.
The grouped sorting strategy that is applied in such cases is indicated in the query explain output:
Execution plan:
 Id   NodeType          Par   Est.   Comment
  1   SingletonNode              1   * ROOT 
  9   IndexNode           ✓    100     - FOR #3 IN coll   /* persistent index scan, index only (projections: `a`) */    LET #10 = #3.`a`   /* with late materialization */
 10   MaterializeNode          100       - MATERIALIZE #3 INTO #7 /* (projections: `b`, `c`) */   LET #8 = #7.`b`, #9 = #7.`c`
  6   SortNode            ✓    100       - SORT #9 ASC; GROUPED BY #10   /* sorting strategy: grouped */
  7   CalculationNode     ✓    100       - LET #5 = { "a" : #10, "b" : #8, "c" : #9 }   /* simple expression */
  8   ReturnNode               100       - RETURN #5
Indexes used:
 By   Name                      Type         Collection   Unique   Sparse   Cache   Selectivity   Fields         Stored values   Ranges
  9   idx_1821511732613349376   persistent   coll         false    false    false      100.00 %   [ `a`, `b` ]   [  ]            *
Optimization rules applied:
 Id   Rule Name                                  Id   Rule Name                                  Id   Rule Name                         
  1   move-calculations-up                        6   move-calculations-down                     11   optimize-projections              
  2   remove-unnecessary-calculations             7   reduce-extraction-to-projection            12   remove-unnecessary-calculations-4 
  3   move-calculations-up-2                      8   batch-materialize-documents                13   async-prefetch                    
  4   use-indexes                                 9   push-down-late-materialization    
  5   use-index-for-sort                         10   materialize-into-separate-variable

Improved index utilization for COLLECT
Introduced in: v3.12.4
When grouping data with COLLECT to determine distinct values, such operations
can now benefit from persistent indexes. The new use-index-for-collect
optimizer rule speeds up the scanning for distinct values up to orders of
magnitude if the selectivity is low, i.e. if there are few different values.
FOR doc IN coll
  COLLECT a = doc.a, b = doc.b
  RETURN { a, b }

If there is a persistent index over the attributes a and b, then the query
explain output looks like this with the optimization applied:
Execution plan:
 Id   NodeType           Par   Est.   Comment
  1   SingletonNode               1   * ROOT 
 10   IndexCollectNode          100     - FOR doc IN coll COLLECT a = doc.`a`, b = doc.`b` /* distinct value index scan */
  6   CalculationNode      ✓    100     - LET #5 = { "a" : a, "b" : b }   /* simple expression */
  7   ReturnNode                100     - RETURN #5
Indexes used:
 By   Name                      Type         Collection   Unique   Sparse   Cache   Selectivity   Fields         Stored values   Ranges
 10   idx_1821499964373598208   persistent   coll         false    false    false       10.00 %   [ `a`, `b` ]   [  ]            *
Optimization rules applied:
 Id   Rule Name                               Id   Rule Name                               Id   Rule Name                      
  1   move-calculations-up                     4   use-index-for-sort                       7   async-prefetch                 
  2   move-calculations-up-2                   5   reduce-extraction-to-projection
  3   use-indexes                              6   use-index-for-collect   
You can disable the optimization for individual COLLECT operations by setting
the new disableIndex option to true:
FOR doc IN coll
  COLLECT a = doc.a, b = doc.b OPTIONS { disableIndex: true }
  RETURN { a, b }

Execution plan:
 Id   NodeType          Par   Est.   Comment
  1   SingletonNode              1   * ROOT 
  9   IndexNode           ✓   1000     - FOR doc IN coll   /* persistent index scan, index only (projections: `a`, `b`) */    LET #8 = doc.`a`, #9 = doc.`b`   
  5   CollectNode         ✓    800       - COLLECT a = #8, b = #9   /* sorted */
  6   CalculationNode     ✓    800       - LET #5 = { "a" : a, "b" : b }   /* simple expression */
  7   ReturnNode               800       - RETURN #5
Indexes used:
 By   Name                      Type         Collection   Unique   Sparse   Cache   Selectivity   Fields         Stored values   Ranges
  9   idx_1821499964373598208   persistent   coll         false    false    false       10.00 %   [ `a`, `b` ]   [  ]            *
Optimization rules applied:
 Id   Rule Name                                 Id   Rule Name                                 Id   Rule Name                        
  1   move-calculations-up                       4   use-index-for-sort                         7   remove-unnecessary-calculations-4
  2   move-calculations-up-2                     5   reduce-extraction-to-projection            8   async-prefetch                   
  3   use-indexes                                6   optimize-projections         
The optimization is automatically disabled if the selectivity is high, i.e.
there are many different values, or if there is filtering or an INTO or
AGGREGATE clause. Other optimizations may still be able to utilize the index
to some extent.
See the COLLECT operation
for details.
Introduced in: v3.12.5
The use-index-for-collect optimizer rule has been further extended.
Queries where a COLLECT operation has an AGGREGATE clause that exclusively
refers to attributes covered by a persistent index (and neither refers to other
variables nor contains calls of aggregation functions with constant values) can
now utilize this index. The index must not be sparse.
Reading the data from the index instead of the stored documents for aggregations can increase the performance by a factor of two.
FOR doc IN coll
  COLLECT a = doc.a AGGREGATE b = MAX(doc.b)
  RETURN { a, b }

If there is a persistent index over the attributes a and b, then the above
example query has an IndexCollectNode in the explain output and the index
usage is indicated if the optimization is applied:
Execution plan:
 Id   NodeType           Par   Est.   Comment
  1   SingletonNode               1   * ROOT 
 10   IndexCollectNode         4999     - FOR doc IN coll COLLECT a = doc.`a` AGGREGATE b = MAX(doc.`b`) /* full index scan */
  6   CalculationNode      ✓   4999     - LET #5 = { "a" : a, "b" : b }   /* simple expression */
  7   ReturnNode               4999     - RETURN #5
Indexes used:
 By   Name                      Type         Collection   Unique   Sparse   Cache   Selectivity   Fields         Stored values   Ranges
 10   idx_1836452431376941056   persistent   coll   
Deduction of node collection for graph queries
Introduced in: v3.12.6
AQL graph traversals and path searches using anonymous graphs / collection sets
require that you declare all involved node collections upfront for cluster
deployments. That is, you need to use the WITH operation to list the collections
edges may point to, as well as the collection of the start node if not declared
otherwise. This also applies to single servers if the --query.require-with
startup option is enabled for parity between both deployment modes.
For example, assume you have two node collections, person and movie, and
edges pointing from one to the other stored in an acts_in edge collection.
If you want to run a traversal query starting from a person that you specify
with its document ID, you need to declare both node collections at the
beginning of the query:
WITH person, movie
FOR v IN 0..1 OUTBOUND "person/1544" acts_in
  LIMIT 4
  RETURN v.label

From v3.12.6 onward, the node collections can be automatically inferred if there is a named graph using the same edge collection(s).
For example, assume there is a named graph that includes an edge definition for
the acts_in edge collection, with person as the from collection and movie
as the to collection. If you now specify acts_in as an edge collection in
an anonymous graph query, all named graphs are checked for this edge collection,
and if there is a matching edge definition, its node collections are automatically
added as data sources to the query. You no longer have to manually declare the
person and movie collections:
FOR v IN 0..1 OUTBOUND "person/1544" acts_in
  LIMIT 4
  RETURN v.label

You can still declare collections manually, in which case they are added as data sources in addition to automatically deduced collections.
Optional elevation for GeoJSON Points
Introduced in: v3.11.14-2, v3.12.6
The GEO_POINT() function now accepts an optional third argument to create a
GeoJSON Point with three coordinates: [longitude, latitude, elevation].
GeoJSON Points may now have three coordinates in general. However, ArangoDB does not take any elevation into account in geo-spatial calculations.
Points with an elevation no longer fail the validation in the GEO_POLYGON()
and GEO_MULTIPOLYGON() functions. Moreover, GeoJSON with three coordinates is
now indexed by geo indexes and thus also matched by geo-spatial queries, which
means you may find more results than before.
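A minimal sketch (the coordinates are arbitrary examples):

RETURN GEO_POINT(6.093, 50.775, 180)
// { "type": "Point", "coordinates": [6.093, 50.775, 180] }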
Also see Geo-spatial functions in AQL.
Indexing
Multi-dimensional indexes
The previously experimental zkd index type is now stable and has been renamed
to mdi. Existing indexes keep the zkd type.
Multi-dimensional indexes can now be declared as sparse to exclude documents
from the index that do not have the defined attributes or are explicitly set to
null values. If a value other than null is set, it still needs to be numeric.
Multi-dimensional indexes now support storedValues to cover queries for better
performance.
An additional mdi-prefixed index variant has been added that lets you specify
additional attributes for the index to narrow down the search space using
equality checks. It can be used as a vertex-centric index for graph traversals
if created on an edge collection with the first attribute in prefixFields set
to _from or _to.
See Multi-dimensional indexes for details.
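As a minimal arangosh sketch (collection and attribute names are illustrative), a sparse mdi index with stored values and an mdi-prefixed variant for an edge collection could be created like this:

db.coll.ensureIndex({
  type: "mdi",
  fields: ["x", "y"],
  fieldValueTypes: "double", // the indexed values must be numeric
  sparse: true,              // skip documents without the attributes or with null values
  storedValues: ["z"]        // cover queries that additionally access z
});

db.edges.ensureIndex({
  type: "mdi-prefixed",
  fields: ["x", "y"],
  fieldValueTypes: "double",
  prefixFields: ["_from"]    // usable as a vertex-centric index in traversals
});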
Native strict ranges
Introduced in: v3.12.3
Multi-dimensional indexes no longer require post-filtering when using strict ranges like in the following query:
FOR d IN coll
  FILTER 0 < d.x && d.x < 1
  RETURN d.x

Up to v3.12.2, the index is queried with inclusive bounds and the strict conditions are re-checked by a post-filter:

Execution plan:
 Id   NodeType        Par   Est.   Comment
  1   SingletonNode            1   * ROOT
  7   IndexNode         ✓     71     - FOR d IN coll   /* mdi index scan, index scan + document lookup (filter projections: `x`) (projections: `x`) */    LET #3 = d.`x`   FILTER ((d.`x` > 0) && (d.`x` < 1))   /* early pruning */   
  6   ReturnNode              71       - RETURN #3
Indexes used:
 By   Name                      Type   Collection   Unique   Sparse   Cache   Selectivity   Fields         Stored values   Ranges
  7   idx_1812443690233233408   mdi    coll         false    false    false           n/a   [ `x`, `y` ]   [  ]            ((d.`x` >= 0) && (d.`x` <= 1))

Native support for strict ranges removes a potential bottleneck when working with large datasets. From v3.12.3 onward, the index evaluates the strict bounds directly and the post-filter disappears from the plan:
Execution plan:
 Id   NodeType        Par   Est.   Comment
  1   SingletonNode            1   * ROOT 
  7   IndexNode         ✓     71     - FOR d IN coll   /* mdi index scan, index scan + document lookup (projections: `x`) */    LET #3 = d.`x`   
  6   ReturnNode              71       - RETURN #3
Indexes used:
 By   Name                      Type   Collection   Unique   Sparse   Cache   Selectivity   Fields         Stored values   Ranges
  7   idx_1812443856099082240   mdi    coll         false    false    false           n/a   [ `x`, `y` ]   [  ]            ((d.`x` > 0) && (d.`x` < 1))

Extended utilization of sparse indexes
Introduced in: v3.12.3
The null value is less than all other values in AQL. Therefore, range queries
without a lower bound need to include null but sparse indexes do not include
null values. However, if you explicitly exclude null in range queries,
sparse indexes can be utilized after all. This was not previously supported for
multi-dimensional indexes. Even with a sparse mdi index over the fields x
and y and the exclusion of null, the following example query cannot take
advantage of the index up to v3.12.2:
FOR d IN coll
  FILTER d.x < 10 && d.x != null
  RETURN d

Execution plan:
 Id   NodeType                  Par   Est.   Comment
  1   SingletonNode                      1   * ROOT
  2   EnumerateCollectionNode     ✓    100     - FOR d IN coll   /* full collection scan  */   FILTER ((d.`x` < 10) && (d.`x` != null))   /* early pruning */
  5   ReturnNode                       100       - RETURN d

From v3.12.3 onward, such a query gets optimized to utilize the sparse
multi-dimensional index, and the condition for excluding null is removed from
the query plan because it is unnecessary – a sparse index only contains values
other than null:
Execution plan:
 Id   NodeType        Par   Est.   Comment
  1   SingletonNode            1   * ROOT
  6   IndexNode         ✓     71     - FOR d IN coll   /* mdi index scan, index scan + document lookup */
  5   ReturnNode              71       - RETURN d

Indexes used:
 By   Name                      Type   Collection   Unique   Sparse   Cache   Selectivity   Fields         Stored values   Ranges
  6   idx_1812445012396343296   mdi    coll         false    true     false           n/a   [ `x`, `y` ]   [  ]            (d.`x` < 10)
Stored values can contain the _id attribute 
The usage of the _id system attribute was previously disallowed for
persistent indexes inside of storedValues. This is now allowed in v3.12.
Note that it is still forbidden to use _id as a top-level attribute or
sub-attribute in fields of persistent indexes. Inverted indexes, on the other
hand, already allowed indexing and storing the _id system attribute.
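A minimal arangosh sketch (collection and attribute names are illustrative):

db.coll.ensureIndex({
  type: "persistent",
  fields: ["value"],     // _id is still not allowed among the fields
  storedValues: ["_id"]  // allowed from v3.12 onward
});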
Vector indexes
Introduced in: v3.12.4
A new vector index type has been added that enables
you to find items with similar properties by comparing vector embeddings, which
are numerical representations generated by machine learning models.
To use this feature, start an ArangoDB server (arangod) with the --vector-index
startup option (or --experimental-vector-index in v3.12.4 and v3.12.5).
You need to generate vector embeddings before creating a vector index. For more
information about the vector index type including the available settings, see the
Vector indexes
documentation.
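As a rough arangosh sketch of what creating such an index can look like (the attribute name and parameter values are illustrative; refer to the linked documentation for the authoritative settings):

db.coll.ensureIndex({
  name: "vector_cosine",
  type: "vector",
  fields: ["vector"],  // the attribute that holds the precomputed embeddings
  params: { metric: "cosine", dimension: 128, nLists: 100 }
});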
You can also follow the guide in this blog post: Vector Search in ArangoDB: Practical Insights and Hands-On Examples
If the vector index type is enabled, the following new AQL functions are available to retrieve similar documents:
- APPROX_NEAR_COSINE()
- APPROX_NEAR_L2()
For how to use these functions as well as query examples, see Vector search functions in AQL.
Two startup options for the storage engine related to vector indexes have been added:
- --rocksdb.max-write-buffer-number-vector
- --rocksdb.partition-files-for-vector-index
The new use-vector-index AQL optimizer rule is responsible for
utilizing vector indexes in queries.
Furthermore, a new error code ERROR_QUERY_VECTOR_SEARCH_NOT_APPLIED (1554)
has been added.
Introduced in: v3.12.6
Vector indexes now support filtering. You can add FILTER operations between
FOR and SORT that are then applied during the lookup in the vector index.
Note that a LIMIT 5, for example, does not guarantee 5 results: the search does
not visit as many neighboring Voronoi cells as necessary but only as many as
configured via the nProbes parameter. Example:
FOR doc IN coll
  FILTER doc.val > 3
  SORT APPROX_NEAR_COSINE(doc.vector, @q) DESC
  LIMIT 5
  RETURN doc

Vector indexes can now be sparse to exclude documents whose embedding attribute
is missing or set to null.
Another similarity metric has been added: innerProduct is a vector similarity
measure calculated using the dot product of two vectors without normalizing
them. Therefore, it compares not only the angle but also the magnitudes.
The accompanying AQL function is the following:
- APPROX_NEAR_INNER_PRODUCT()
Server options
Effective and available startup options
The new GET /_admin/options and GET /_admin/options-description HTTP API
endpoints allow you to return the effective configuration and the available
startup options of the queried arangod instance.
Previously, it was only possible to fetch the current configuration on
single servers and Coordinators using a JavaScript Transaction, and to list
the available startup options with --dump-options.
See the HTTP interface for administration for details.
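For example, to fetch the effective configuration of a locally running instance (endpoint and credentials are illustrative):

curl -u root: http://localhost:8529/_admin/options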
Protocol aliases for endpoints
You can now use http:// and https:// as aliases for tcp:// and ssl://
in the --server.endpoint startup option of the server.
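For example, the following endpoint specifications are equivalent:

--server.endpoint tcp://127.0.0.1:8529
--server.endpoint http://127.0.0.1:8529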
Adjustable Stream Transaction size
The previously fixed limit of 128 MiB for Stream Transactions
can now be configured with the new --transaction.streaming-max-transaction-size
startup option. The default value remains 128 MiB up to v3.12.3.
From v3.12.4 onward, the default value is 512 MiB.
Transparent compression of requests and responses between ArangoDB servers and client tools
The following startup options have been added to all client tools (except the ArangoDB Starter) and can be used to enable transparent compression of the data that is sent between a client tool and an ArangoDB server:
- --compress-transfer
- --compress-request-threshold
If the --compress-transfer option is set to true, the client tool adds an
extra Accept-Encoding: deflate HTTP header to all requests made to the server.
This allows the server to compress its responses before sending them back to the
client tool.
The client also transparently compresses its own requests to the server if the
size of the request body (in bytes) is at least the value of the
--compress-request-threshold startup option. The default value is 0, which
disables the compression of the request bodies in the client tool. To opt in to
sending compressed data, set the option to a value greater than 0.
The client tool adds a Content-Encoding: deflate HTTP header to the request
if the request body is compressed using the deflate compression algorithm.
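For example, a hypothetical arangoimport invocation that requests compressed responses and compresses its own request bodies from 250 bytes onward:

arangoimport --compress-transfer true --compress-request-threshold 250 --file data.json --collection coll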
The following options have been added to the ArangoDB server:
- --http.compress-response-threshold
- --http.handle-content-encoding-for-unauthenticated-requests
The value of the --http.compress-response-threshold startup option specifies
the threshold value (in bytes) from which on response bodies are sent out
compressed by the server. The default value is 0, which disables sending out
compressed response bodies. To enable compression, the option should be set to a
value greater than 0. The selected value should be large enough to justify the
compression overhead. Regardless of the value of this option, the client has to
signal that it expects a compressed response body by sending an
Accept-Encoding: gzip or Accept-Encoding: deflate HTTP header with its request.
If that header is missing, no response compression is performed by the server.
If the --http.handle-content-encoding-for-unauthenticated-requests
startup option is set to true, the ArangoDB server automatically decompresses
incoming HTTP requests with a Content-Encoding: gzip or
Content-Encoding: deflate HTTP header even if the request is not authenticated.
If the option is set to false, any unauthenticated request that has a
Content-Encoding header set is rejected. This is the default setting.
As compression uses CPU cycles, it should be activated only when the network communication between the server and clients is slow and there is enough CPU capacity left for the extra compression/decompression work.
Furthermore, requests and responses should only be compressed when they exceed a certain minimum size, e.g. 250 bytes.
LZ4 compression for values in the in-memory edge cache
Introduced in: v3.11.2
LZ4 compression of edge index cache values allows you to store more data in main memory than without compression, so the available memory can be used more efficiently. The compression is transparent and does not require any change to queries or applications. The compression can add CPU overhead for compressing values when storing them in the cache, and for decompressing values when fetching them from the cache.
The new startup option --cache.min-value-size-for-edge-compression can be
used to set a threshold value size for compressing edge index cache payload
values. The default value is 1 GB, which effectively turns compression
off. Setting the option to a lower value (e.g. 100) turns on the
compression for any payloads whose size exceeds this value.
The new startup option --cache.acceleration-factor-for-edge-compression can
be used to fine-tune the compression. The default value is 1.
Higher values typically mean less compression but faster speeds.
The following new metrics can be used to determine the usefulness of compression:
- rocksdb_cache_edge_inserts_effective_entries_size_total: returns the total number of bytes of all entries that were stored in the in-memory edge cache, after compression was attempted/applied. This metric is populated regardless of whether compression is used or not.
- rocksdb_cache_edge_inserts_uncompressed_entries_size_total: returns the total number of bytes of all entries that were ever stored in the in-memory edge cache, before compression was applied. This metric is populated regardless of whether compression is used or not.
- rocksdb_cache_edge_compression_ratio: returns the effective compression ratio for all edge cache entries ever stored in the cache.
Note that these metrics are increased upon every insertion into the edge cache, but not decreased when data gets evicted from the cache.
Limit the number of databases in a deployment
Introduced in: v3.10.10, v3.11.2
The --database.max-databases startup option allows you to limit the
number of databases that can exist in parallel in a deployment. You can use this
option to limit the resources used by database objects. If the option is used
and there are already as many databases as configured by this option, any
attempt to create an additional database fails with error
32 (ERROR_RESOURCE_LIMIT). Additional databases can then only be created
if other databases are dropped first. The default value for this option is
unlimited, so an arbitrary number of databases can be created.
--database.extended-names enabled by default 
The --database.extended-names startup option is now enabled by default.
This allows you to use Unicode characters inside database names, collection names,
view names and index names by default, unless you explicitly turn off the
functionality.
Note that once a server in your deployment has been started with the flag set to
true, it stores this setting permanently. Switching the startup option back to
false raises a warning about the option change at startup, but it does not
block the startup.
Existing databases, collections, views and indexes with extended names can still
be used even with the option set back to false, but no new database objects
with extended names can be created with the option disabled. This state is only
meant to facilitate downgrading or reverting the option change. When the option
is set to false, all database objects with extended names that were created
in the meantime should be removed manually.
Cluster-internal connectivity checks
Introduced in: v3.11.5
This feature makes Coordinators and DB-Servers in a cluster periodically send check requests to each other, in order to see if all nodes can connect to each other. If a cluster-internal connection to another Coordinator or DB-Server cannot be established within 10 seconds, a warning is now logged.
The new --cluster.connectivity-check-interval startup option can be used
to control the frequency of the connectivity check, in seconds.
If set to a value greater than zero, the initial connectivity check is
performed approximately 15 seconds after the instance start, and subsequent
connectivity checks are executed with the specified frequency.
If set to 0, connectivity checks are disabled.
You can also use the following metrics to monitor and detect temporary or permanent connectivity issues:
- arangodb_network_connectivity_failures_coordinators_total: Number of failed connectivity check requests sent by this instance to Coordinators.
- arangodb_network_connectivity_failures_dbservers_total: Number of failed connectivity check requests sent to DB-Servers.
Configurable maximum for queued log entries
Introduced in: v3.10.12, v3.11.5
The new --log.max-queued-entries startup option lets you configure how many
log entries are queued in a background thread.
Log entries are pushed on a queue for asynchronous writing unless you enable the
--log.force-direct startup option. If you use a slow log output (e.g. syslog),
the queue might grow and eventually overflow.
You can configure the upper bound of the queue with this option. If the queue is full, log entries are written synchronously until the queue has space again.
Configurable maximal size for cache entries
Introduced in: v3.12.1
A new --cache.max-cache-value-size startup option has been added to limit the
maximum size of individual values stored in the in-memory cache. The cache is
used for edge indexes, persistent indexes with caching enabled, and collections
with document caches enabled.
The startup option defaults to 4 MiB, meaning that no individual values larger than this are stored in the in-memory cache. It avoids storing very large values in the cache for huge documents or for all the connections of super nodes, and thus leaves more memory for other purposes.
Configurable async prefetch limits
Introduced in: v3.12.1
While async prefetching is normally beneficial for query performance, the async prefetch operations may cause congestion in the ArangoDB scheduler and interfere with other operations. The number of concurrent prefetch operations is therefore limited, and you can adjust these limits with the following startup options:
- --query.max-total-async-prefetch-slots: The maximum total number of slots available for asynchronous prefetching, across all AQL queries. Default: 256
- --query.max-query-async-prefetch-slots: The maximum per-query number of slots available for asynchronous prefetching inside any AQL query. Default: 32
The total number of concurrent prefetch operations across all AQL queries can be limited using the first option, and the maximum number of prefetch operations in every single AQL query can be capped with the second option.
These options prevent a large number of AQL queries with async prefetching from fully congesting the scheduler queue, and they also prevent large AQL queries from using up all async prefetching capacity on their own.
New server scheduler type
Introduced in: v3.12.1
A new --server.scheduler startup option has been added to let you select the
scheduler type. The scheduler currently used by ArangoDB has the value
supervised. A new work-stealing scheduler is being implemented and can be
selected using the value threadpools. This new scheduler is experimental and
should not be used in production.
Query logging options
Introduced in: v3.12.2
The following startup options related to the Query logging feature have been added:
- --query.collection-logger-enabled: Whether to enable the logging of metadata for past AQL queries
- --query.collection-logger-include-system-database: Whether to log queries that run in the _system database
- --query.collection-logger-probability: The sampling probability for logging queries (in percent)
- --query.collection-logger-all-slow-queries: Whether to always log slow queries regardless of whether they are selected for sampling or not
- --query.collection-logger-retention-time: The retention period for entries in the _queries system collection (in seconds)
- --query.collection-logger-cleanup-interval: The interval for running the cleanup process for the retention configuration (in milliseconds)
- --query.collection-logger-push-interval: How long to buffer query log entries in memory before they are actually written to the system collection (in milliseconds)
- --query.collection-logger-max-buffered-queries: The number of query log entries to buffer in memory before they are flushed to the system collection, discarding additional query metadata if the logging thread cannot keep up
Cluster management options
Introduced in: v3.12.4
The following startup options for cluster deployments have been added:
- --agency.supervision-expired-servers-grace-period: The supervision time after which a server is removed from the Agency if it no longer sends heartbeats (in seconds).
- --cluster.no-heartbeat-delay-before-shutdown: The delay (in seconds) before shutting down a Coordinator if no heartbeat can be sent. Set to 0 to deactivate this shutdown.
Limit for shard synchronization actions
Introduced in: v3.11.14, v3.12.5
The number of SynchronizeShard actions that can be scheduled internally by the
cluster maintenance has been restricted to prevent these actions from blocking
TakeoverShardLeadership actions with a higher priority, which could lead to
service interruption during upgrades and after failovers.
The new --server.maximal-number-sync-shard-actions startup option controls
how many SynchronizeShard actions can be queued at any given time.
Full RocksDB compaction on upgrade
Introduced in: v3.12.5-2
A new --database.auto-upgrade-full-compaction startup option has been added
that you can use together with --database.auto-upgrade for upgrading.
With the new option enabled, the server will perform a full RocksDB compaction
after the database upgrade has completed successfully but before shutting down.
This performs a complete compaction of all column families with the changeLevel
and compactBottomMostLevel options enabled, which can help optimize the
database files after an upgrade.
The server process terminates with the new exit code 30
(EXIT_FULL_COMPACTION_FAILED) if the compaction fails.
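A hypothetical invocation for a manual upgrade followed by a full compaction:

arangod --database.auto-upgrade true --database.auto-upgrade-full-compaction true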
Miscellaneous changes
V8 and ICU library upgrades
The bundled V8 JavaScript engine has been upgraded from version 7.9.317 to 12.1.165. As part of this upgrade, the Unicode character handling library ICU has been upgraded as well, from version 64.2 to 73.1 (but only for JavaScript contexts, see Incompatible changes in ArangoDB 3.12).
Note that ArangoDB’s build of V8 has pointer compression disabled to allow for more than 4 GB of heap memory.
The V8 upgrade brings various language features to JavaScript contexts in ArangoDB, like arangosh and Foxx. These features are part of the ECMAScript specifications ES2020 through ES2024. The following list is non-exhaustive:
- Optional chaining, like obj.foo?.bar?.length to easily access an object property or call a function but stop evaluating the expression as soon as the value is undefined or null and return undefined instead of throwing an error
- Nullish coalescing operator, like foo ?? bar to evaluate to the value of bar if foo is null or undefined, otherwise to the value of foo
- Return the array element at the given index, allowing positive as well as negative integer values, like [1,2,3].at(-1)
- Copying versions of the array methods reverse(), sort(), and splice() that perform in-place operations, and a copying version of the bracket notation for changing the value at a given index:
  - Return a new array with the elements in reverse order, like [1,2,3].toReversed()
  - Return a new array with the elements sorted in ascending order, like [2,3,1].toSorted()
  - Return a new array with elements removed and optionally inserted at a given index, like [1,2,3,4].toSpliced(1,2,"new", "items")
  - Return a new array with one element replaced at a given index, allowing positive and negative integer values, like [1,2,3,4].with(-2, "three")
- Find array elements from the end with findLast() and findLastIndex(), like [1,2,3,4].findLast(v => v % 2 == 1)
- Return a new string with all matches of a pattern replaced with a provided value, not requiring a regular expression, like "foo bar foo".replaceAll("foo", "baz")
- If the matchAll() method of a string is used with a regular expression that misses the global g flag, an error is thrown
- A new regular expression flag d to include capture group start and end indices, like /f(o+)/d.exec("foobar").indices[1]
- A new regular expression flag v to enable the Unicode sets mode, like /^\p{RGI_Emoji}$/v.test("👨🏾⚕️") or /[\p{Script_Extensions=Greek}--[α-γ]]/v.test('β')
- A static method to check whether an object directly defines a property, like Object.hasOwn({ foo: 42 }, "toString"), superseding Object.prototype.hasOwnProperty.call(…)
- Object.groupBy() and Map.groupBy() to group the elements of an iterable according to the string values returned by a provided callback function
- Logical assignment operators &&=, ||=, ??=
- Private properties that cannot be referenced outside of the class, like class P { #privField = 42; #privMethod() { } }
- Static initialization blocks in classes that run when the class itself is evaluated, like class S { static { console.log("init block") } }
- WeakRef to hold a weak reference to another object, without preventing that object from getting garbage-collected
- A cause property for Error instances to indicate the original cause of the error
- Extended internationalization APIs. Examples:

  let egyptLocale = new Intl.Locale("ar-EG")
  egyptLocale.numberingSystems // [ "arab" ]
  egyptLocale.calendars // [ "gregory", "coptic", "islamic", "islamic-civil", "islamic-tbla" ]
  egyptLocale.hourCycles // [ "hc12" ]
  egyptLocale.timeZones // [ "Africa/Cairo" ]
  egyptLocale.textInfo // { "direction": "rtl" }
  egyptLocale.weekInfo // { "firstDay": 6, "weekend" : [5, 6], "minimalDays": 1 }

  Intl.supportedValuesOf("collation"); // [ "compat", "emoji", "eor", "phonebk", ... ]
  Intl.supportedValuesOf("calendar"); // [ "buddhist", "chinese", "coptic", "dangi", ... ]
  // Other supported values: "currency", "numberingSystem", "timeZone", "unit"

  let germanLocale = new Intl.Locale("de")
  germanLocale.collations // [ "emoji", "eor", "phonebk" ]
  germanLocale.weekInfo // { "firstDay": 1, "weekend" : [6, 7], "minimalDays": 4 }

  let ukrainianCalendarNames = new Intl.DisplayNames(["uk"], { type: "calendar" })
  ukrainianCalendarNames.of("buddhist") // "буддійський календар"

  let frenchDateTimeFieldNames = new Intl.DisplayNames(["fr"], { type: "dateTimeField" })
  frenchDateTimeFieldNames.of("day") // "jour"

  let japaneseDialectLangNames = new Intl.DisplayNames(["ja"], { type: "language" })
  let japaneseStandardLangNames = new Intl.DisplayNames(["ja"], { type: "language", languageDisplay: "standard" })
  japaneseDialectLangNames.of('en-US') // "アメリカ英語"
  japaneseDialectLangNames.of('en-GB') // "イギリス英語"
  japaneseStandardLangNames.of('en-US') // "英語 (アメリカ合衆国)"
  japaneseStandardLangNames.of('en-GB') // "英語 (イギリス)"

  let americanDateTimeFormat = new Intl.DateTimeFormat("en-US", { timeZoneName: "longGeneric" })
  americanDateTimeFormat.formatRange(new Date(0), new Date())
  // e.g. with a German local time:
  // "1/1/1970, Central European Standard Time – 1/16/2024, Central European Time"

  let swedishCurrencyNames = new Intl.DisplayNames(["sv"], { type: "currency" })
  swedishCurrencyNames.of("TZS") // "tanzanisk shilling"

  let americanNumberFormat = new Intl.NumberFormat("en-US", { style: "currency", currency: "EUR", maximumFractionDigits: 0 })
  americanNumberFormat.formatRange(1.5, 10) // "€2 – €10"

  let welshPluralRules = new Intl.PluralRules("cy")
  welshPluralRules.selectRange(1, 3) // "few"
Active AQL query cursors metric
The arangodb_aql_cursors_active metric has been added and shows the number
of active AQL query cursors.
AQL query cursors are created for queries that produce more results than
specified in the batchSize query option (default value: 1000). Such results
can be fetched incrementally by client operations in chunks.
As it is unclear if and when a client will fetch any remaining data from a
cursor, every cursor has a server-side timeout value (TTL) after which it is
considered inactive and garbage-collected.
Per-collection compaction in cluster
The PUT /_api/collection/{collection-name}/compact endpoint of the HTTP API
can now be used to start the compaction for a specific collection in cluster
deployments. This feature was previously available for single servers only.
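For example (endpoint and collection name are illustrative):

curl -X PUT http://localhost:8529/_api/collection/myCollection/compact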
RocksDB .sst file partitioning (experimental)
The following experimental startup options for RocksDB .sst file partitioning have been added:
- --rocksdb.partition-files-for-documents
- --rocksdb.partition-files-for-primary-index
- --rocksdb.partition-files-for-edge-index
- --rocksdb.partition-files-for-persistent-index
Enabling any of these options makes RocksDB’s compaction write the data for different collections/shards/indexes into different .sst files. Otherwise, the document data from different collections/shards/indexes can be mixed and written into the same .sst files.
When these options are enabled, the RocksDB compaction is more efficient since a lot of different collections/shards/indexes are written to in parallel. The disadvantage of enabling these options is that there can be more .sst files than when the option is turned off, and the disk space used by these .sst files can be higher. In particular, on deployments with many collections/shards/indexes this can lead to a very high number of .sst files, with the potential of outgrowing the maximum number of file descriptors the ArangoDB process can open. Thus, these options should only be enabled on deployments with a limited number of collections/shards/indexes.
ArangoSearch file descriptor metric
The following metric has been added:
- arangodb_search_file_descriptors: Current count of opened file descriptors for an ArangoSearch index.
More instant Hot Backups
Introduced in: v3.10.10, v3.11.3
Cluster deployments no longer wait for all in-progress transactions to get committed when a user requests a Hot Backup. The waiting could cause deadlocks and thus Hot Backups to fail, in particular in ArangoGraph. Now, Hot Backups are created immediately and commits have to wait until the backup process is done.
In-memory edge cache startup options and metrics
Introduced in: v3.11.4
The following startup options have been added:
- --cache.max-spare-memory-usage: The maximum memory usage for spare tables in the in-memory cache.
- --cache.high-water-multiplier: Controls the cache’s effective memory usage limit. The user-defined memory limit (i.e. --cache.size) is multiplied by this value to create the effective memory limit, from which on the cache tries to free up memory by evicting the oldest entries. The default value is 0.56, matching the previously hardcoded 56% for the cache subsystem. You can increase the multiplier to make the cache subsystem use more memory, but this may overcommit memory because the cache memory reclamation procedure is asynchronous and can run in parallel to other tasks that insert new data. In case a deployment’s memory usage is already close to the maximum, increasing the multiplier can lead to out-of-memory (OOM) kills.
The following metrics have been added:
| Label | Description | 
|---|---|
| rocksdb_cache_edge_compressed_inserts_total | Total number of compressed inserts into the in-memory edge cache. | 
| rocksdb_cache_edge_empty_inserts_total | Total number of insertions into the in-memory edge cache for non-connected edges. | 
| rocksdb_cache_edge_inserts_total | Total number of insertions into the in-memory edge cache. | 
Observability of in-memory cache subsystem
Introduced in: v3.10.11, v3.11.4
The following metrics have been added to improve the observability of the in-memory cache subsystem:
- rocksdb_cache_free_memory_tasks_total: Total number of free memory tasks that were scheduled by the in-memory edge cache subsystem. This metric will be increased whenever the cache subsystem schedules a task to free up memory in one of the managed in-memory caches. It is expected to see this metric rising when the cache subsystem hits its global memory budget.
- rocksdb_cache_free_memory_tasks_duration_total: Total amount of time spent inside the free memory tasks of the in-memory cache subsystem. Free memory tasks are scheduled by the cache subsystem to free up memory in existing cache hash tables.
- rocksdb_cache_migrate_tasks_total: Total number of migrate tasks that were scheduled by the in-memory edge cache subsystem. This metric will be increased whenever the cache subsystem schedules a task to migrate an existing cache hash table to a bigger or smaller size.
- rocksdb_cache_migrate_tasks_duration_total: Total amount of time spent inside the migrate tasks of the in-memory cache subsystem. Migrate tasks are scheduled by the cache subsystem to migrate existing cache hash tables to a bigger or smaller table.
Detached scheduler threads
Introduced in: v3.10.13, v3.11.5
A scheduler thread now has the capability to detach itself from the scheduler if it observes the need to perform a potentially long running task, like waiting for a lock. This allows a new scheduler thread to be started and prevents scenarios where all threads are blocked waiting for a lock, which has previously led to deadlock situations. From v3.12.1 onward, coroutines are used instead of detaching threads as the underlying mechanism.
Threads waiting for more than 1 second on a collection lock will detach themselves.
The following startup option has been added:
- --server.max-number-detached-threads: The maximum number of detached scheduler threads. Note that this startup option is deprecated from v3.12.1 onward because a different mechanism than detaching is used, obsoleting the option and no longer having an effect.
The following metric has been added:
- arangodb_scheduler_num_detached_threads: The number of worker threads currently started and detached from the scheduler.
Monitoring per collection/database/user
Introduced in: v3.10.13, v3.11.7
The following metrics have been introduced to track per-shard requests on DB-Servers:
- arangodb_collection_leader_reads_total: The number of read requests on leaders, per shard, and optionally also split by user.
- arangodb_collection_leader_writes_total: The number of write requests on leaders, per shard, and optionally also split by user.
- arangodb_collection_requests_bytes_read_total: The number of bytes read in read requests on leaders.
- arangodb_collection_requests_bytes_written_total: The number of bytes written in write requests on leaders and followers.
To opt into these metrics, you can use the new --server.export-shard-usage-metrics
startup option. It can be set to one of the following values on DB-Servers:
- disabled: No shard usage metrics are recorded or exported. This is the default value.
- enabled-per-shard: This makes DB-Servers collect per-shard usage metrics.
- enabled-per-shard-per-user: This makes DB-Servers collect per-shard and per-user metrics. This is more granular than enabled-per-shard but can produce a lot of metrics.
Whenever a shard is accessed in read or write mode by one of the following operations, the metrics are populated dynamically, either with a per-user label or not, depending on the above setting. The metrics are retained in memory on DB-Servers. Removing databases, collections, or users that are already included in the metrics won’t remove the metrics until the DB-Server is restarted.
The following operations increase the metrics:
- AQL queries: an AQL query increases the read or write counters exactly once for each involved shard. For shards that are accessed in read/write mode, only the write counter is increased.
- Single-document insert, update, replace, and remove operations: for each such operation, the write counter is increased once for the affected shard.
- Multi-document insert, update, replace, and remove operations: for each such operation, the write counter is increased once for each shard that is affected by the operation. Note that this includes collection truncate operations.
- Single and multi-document read operations: for each such operation, the read counter is increased once for each shard that is affected by the operation.
The metrics are increased when any of the above operations start, and they are not decreased should an operation abort or if an operation does not lead to any actual reads or writes.
As there can be many of these dynamic metrics based on the number of shards
and/or users in the deployment, these metrics are turned off by default.
When turned on, the metrics are exposed only via the new
GET /_admin/usage-metrics endpoint. They are not exposed via the existing
metrics GET /_admin/metrics endpoint.
Note that internal operations, such as internal queries executed for statistics gathering, internal garbage collection, and TTL index cleanup are not counted in these metrics. Additionally, all requests that are using the superuser JWT for authentication and that do not have a specific user set are not counted.
Enabling these metrics can likely result in a small latency overhead of a few percent for write operations. The exact overhead depends on several factors, such as the type of operation (single or multi-document operation), replication factor, network latency, etc.
Compression for cluster-internal traffic
The following startup options have been added to optionally compress relevant cluster-internal traffic:
- --network.compression-method: The compression method used for cluster-internal requests.
- --network.compress-request-threshold: The HTTP request body size from which on cluster-internal requests are transparently compressed.
If the --network.compression-method startup option is set to none (default), then no
compression is performed. To enable compression for cluster-internal requests,
you can set this option to either deflate, gzip, lz4, or auto.
The deflate and gzip compression methods are general purpose but can
have significant CPU overhead for performing the compression work.
The lz4 compression method compresses slightly worse but has a lot lower
CPU overhead for performing the compression.
The auto compression method uses deflate by default and lz4 for
requests that have a size that is at least 3 times the configured threshold
size.
The compression method only matters if --network.compress-request-threshold
is set to a value greater than zero. This option configures a threshold value
from which on the outgoing requests will be compressed. If the threshold is
set to a value of 0, then no compression is performed. If the threshold
is set to a value greater than 0, then the size of the request body is
compared against the threshold value, and compression happens if the
uncompressed request body size exceeds the threshold value.
The threshold can thus be used to avoid futile compression attempts for too
small requests.
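For example, to compress cluster-internal requests of 4 KiB or larger using the lz4 method (the values are illustrative):

--network.compression-method lz4
--network.compress-request-threshold 4096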
Compression for all Agency traffic is disabled regardless of the settings of these options.
Read-only shards metric
Introduced in: v3.12.1
The following cluster health metric has been added:
| Label | Description | 
|---|---|
| arangodb_vocbase_shards_read_only_by_write_concern | Number of shards that are read-only due to an undercut of the write concern. | 
Log queue overwhelm metric
Introduced in: v3.12.1
The following metric has been added, indicating whether the log queue is overwhelmed:
| Label | Description | 
|---|---|
| arangodb_logger_messages_dropped_total | Total number of dropped log messages. | 
The related startup option for controlling the size of the log queue now has a
default of 16384 instead of 10000.
Scheduler dequeue time metrics
Introduced in: v3.12.1
The following scheduler metrics have been added, reporting how long it takes to pick items from different job queues.
| Label | Description | 
|---|---|
| arangodb_scheduler_high_prio_dequeue_hist | Time required to take an item from the high priority queue. | 
| arangodb_scheduler_medium_prio_dequeue_hist | Time required to take an item from the medium priority queue. | 
| arangodb_scheduler_low_prio_dequeue_hist | Time required to take an item from the low priority queue. | 
| arangodb_scheduler_maintenance_prio_dequeue_hist | Time required to take an item from the maintenance priority queue. | 
Option to skip fast lock round for Stream Transactions
Introduced in: v3.12.1
A skipFastLockRound option has been added to let you disable the fast lock
round for Stream Transactions. The option defaults to false so that
fast locking is tried.
Disabling the fast lock round is only necessary if many concurrent Stream Transactions are queued that all try to lock the same collection exclusively. In this case, the fast locking is subpar because exclusive locks prevent it from succeeding. Skipping the fast locking makes each actual locking operation take longer than with fast locking, but it guarantees a deterministic locking order. This avoids the deadlocking and retrying that can occur with fast locking. Overall, skipping the fast lock round can be faster in this scenario.
The fast lock round should not be skipped for read-only Stream Transactions as it degrades performance if there are no concurrent transactions that use exclusive locks on the same collection.
See the JavaScript API and the HTTP API for details.
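A minimal arangosh sketch, assuming a collection named coll that concurrent transactions lock exclusively:

let trx = db._createTransaction({
  collections: { exclusive: ["coll"] },
  skipFastLockRound: true
});
// ... perform operations via trx.collection("coll") ...
trx.commit();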
Individual log levels per log output
Introduced in: v3.12.2
You can now configure the log level for each log topic per log output. You can use this feature to log verbosely to a file but print less information in the command-line, for instance.
The repeatable --log.level startup option lets you set the log levels for
log topics as before. You can now additionally specify the levels for individual
topics in --log.output by appending a semicolon and a comma-separated mapping
of log topics and levels after the destination to override the --log.level
configuration:
--log.level memory=warning
--log.output "file:///path/to/file;queries=trace,requests=info"
--log.output "-;all=error"
This sets the log level to warning for the memory topic, which applies to
all outputs unless overridden. The first output is a file with verbose trace
logging for the queries topic, info-level logging for requests,
warning-level logging for memory, and default levels for all other topics.
The second output is to the standard output (command-line), using the all
pseudo-topic to set the log levels to error for all topics.
Furthermore, the HTTP API has been extended to let you query and set the log levels for individual outputs at runtime.
Lost subordinate transactions metric
Introduced in: v3.12.4
The following metric about partially committed or aborted transactions on DB-Servers in a cluster has been added:
| Label | Description | 
|---|---|
| arangodb_vocbase_transactions_lost_subordinates_total | Counts the number of lost subordinate transactions on database servers. | 
RocksDB upgrade
Introduced in: v3.12.6
The RocksDB library has been upgraded from version 7.2.0 to 9.5.0.
As a result, you may see performance improvements while using slightly fewer resources, especially for mixed workloads.
The following new RocksDB functionality is exposed in ArangoDB:
- Different types of block caches, LRU and HyperClockCache (HCC), selectable via
the new --rocksdb.block-cache-type startup option
- A --rocksdb.block-cache-estimated-entry-charge startup option to configure the HCC.
- RocksDB table format version 6 (not downward-compatible with older versions of RocksDB).
- RocksDB blob caching (if blobs are enabled for the documents column family),
which you can enable via --rocksdb.enable-blob-cache.
- Using blob files only from a certain level onwards (if blobs are enabled for
the documents column family), which you can enable via
--rocksdb.blob-file-starting-level.
- Blob cache prepopulation, which you can enable via --rocksdb.prepopulate-blob-cache.
- An option to generate Bloom/Ribbon filters that minimize memory internal
fragmentation, which you can enable with --rocksdb.optimize-filters-for-memory.
The following RocksDB metrics have been added:
| Label | Description | 
|---|---|
| rocksdb_block_cache_charge_per_entry | Average size of entries in RocksDB block cache. | 
| rocksdb_block_cache_entries | Number of entries in the RocksDB block cache. | 
| rocksdb_live_blob_file_garbage_size | Size of garbage in live RocksDB .blob files. | 
| rocksdb_live_blob_file_size | Size of live RocksDB .blob files. | 
| rocksdb_num_blob_files | Number of live RocksDB .blob files. | 
API call recording
Introduced in: v3.12.5
A new /_admin/server/api-calls endpoint has been added to let you retrieve a
list of the most recent requests with a timestamp and the endpoint. This feature
is for debugging purposes.
You can configure the memory limit for this feature with the following startup option:
- --server.api-recording-memory-limit: Size limit for the list of API call records (default: 26214400).
This means that 25 MiB of memory is reserved by default.
API call recording is enabled by default but you can disable it via the new
--server.api-call-recording startup option.
The /_admin/server/api-calls endpoint that exposes the recorded API calls is
likewise enabled by default. You can disable the endpoint altogether by setting
the new --log.recording-api-enabled startup option to false.
A metric has been added for the time spent on API call recording to track the impact of this feature:
| Label | Description | 
|---|---|
| arangodb_api_recording_call_time | Execution time histogram for API recording calls in nanoseconds. | 
See HTTP interface for server logs for details.
AQL query recording
Introduced in: v3.12.6
A new /_admin/server/aql-queries endpoint has been added to let you retrieve a
list of the most recent AQL queries with a timestamp and information about the
submitted query. This feature is for debugging purposes.
You can configure the memory limit for this feature with the following startup option:
- --server.aql-recording-memory-limit: Size limit for the list of AQL query records (default: 26214400 bytes)
This means that 25 MiB of memory is reserved by default.
AQL query recording is enabled by default but you can disable it via the new
--server.aql-query-recording startup option.
This and the /_admin/server/api-calls endpoint are referred to as the
recording API, which exposes the recorded API calls and AQL queries. It is
enabled by default. Users with administrative access to the _system database
can call the endpoints. You can further restrict the access to only the
superuser by setting --log.recording-api-enabled to jwt, or disable the
endpoints altogether by setting the option to false.
A metric has been added for the time spent on AQL query recording to track the impact of this feature:
| Label | Description | 
|---|---|
| arangodb_aql_recording_call_time | Execution time histogram for AQL recording calls in nanoseconds. | 
See HTTP interface for server logs for details.
Access tokens
Introduced in: v3.12.5
A new authentication feature has been added that lets you use access tokens to either create JWT session tokens or directly authenticate with an access token instead of a password.
You can create multiple access tokens for a single user account, set expiration dates, and individually revoke tokens.
See the HTTP API documentation.
@PID@ and @TEMP_BASE_DIR@ placeholders for startup options 
Introduced in: v3.12.6
You can use special placeholders in startup options like --log.output that get
substituted as follows:
- @PID@: Replaced at runtime with the actual process ID. This has already been supported using a literal occurrence of $PID.
- @TEMP_BASE_DIR@: Replaced at runtime with the current temporary directory, e.g. /tmp/arangodb_i37Xxh (automatically created on startup with a randomly generated suffix).
Keep in mind that @NAME@ is also the syntax for using the value of an
environment variable NAME. If there is an environment variable called PID or
TEMP_BASE_DIR, then @PID@ or @TEMP_BASE_DIR@ is substituted with the
value of the respective environment variable.
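For example, to write per-process log files (the path is illustrative):

--log.output file:///var/log/arangodb3/arangod-@PID@.log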
Client tools
Protocol aliases for endpoints
You can now use http:// and https:// as aliases for tcp:// and ssl://
in the --server.endpoint startup option with all client tools.
arangodump
--ignore-collection startup option 
arangodump now supports a --ignore-collection startup option that you can
specify multiple times to exclude the specified collections from a dump.
It cannot be used together with the existing --collection option for specifying
collections to include.
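A hypothetical invocation that dumps all collections except two:

arangodump --output-directory dump --ignore-collection logs --ignore-collection sessions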
Improved dump performance and size
arangodump has extended parallelization capabilities to work not only at the collection level, but also at the shard level. In combination with the newly added support for the VelocyPack format that ArangoDB uses internally, database dumps can now be created and restored more quickly and occupy less disk space. This major performance boost makes dumps and restores up to several times faster, which is extremely useful when dealing with large shards.
- Whether the new parallel dump variant is used is controlled by the newly added --use-parallel-dump startup option. The default value is true.
- To achieve the best dump performance and the smallest data dumps in terms of size, you can additionally use the --dump-vpack option. The resulting dump data is then stored in the more compact but binary VelocyPack format instead of the text-based JSON format. The output file size can be smaller than even compressed JSON. It can also lead to faster dumps because there is less data to transfer and no conversion from the server-internal format (VelocyPack) to JSON is needed. Note, however, that this option is experimental and disabled by default.
- Optionally, you can make arangodump write multiple output files per collection/shard. The file splitting allows for better parallelization when writing the results to disk, which in case of non-split files must be serialized. You can enable it by setting the --split-files option to true. This option is disabled by default because dumps created with this option enabled cannot be restored into previous versions of ArangoDB.
- You can enable the new --compress-transfer startup option for compressing the dump data on the server for a faster transfer. This is helpful especially if the network is slow or its capacity is maxed out. The data is decompressed on the client side and recompressed if you enable the --compress-output option.
Server-side resource usage limits and metrics
The following arangod startup options can be used to limit
the resource usage of parallel arangodump invocations:
- --dump.max-memory-usage: Maximum memory usage (in bytes) to be used by the server-side parts of all ongoing arangodump invocations. This option can be used to limit the amount of memory for prefetching and keeping results on the server side when arangodump is invoked with the --parallel-dump option. It does not have an effect for arangodump invocations that did not use the --parallel-dump option. Note that the memory usage limit is not exact and that it can be slightly exceeded in some situations to guarantee progress.
- --dump.max-docs-per-batch: Maximum number of documents per batch that can be used in a dump. If an arangodump invocation requests higher values than configured here, the value is automatically capped to this value. This is only honored for arangodump invocations that use the --parallel-dump option.
- --dump.max-batch-size: Maximum batch size value (in bytes) that can be used in a dump. If an arangodump invocation requests larger batch sizes than configured here, the actual batch size is capped to this value. This is only honored for arangodump invocations that use the --parallel-dump option.
- --dump.max-parallelism: Maximum parallelism (number of server-side threads) that can be used in a dump. If an arangodump invocation requests a higher number of prefetch threads than configured here, the actual number of server-side prefetch threads is capped to this value. This is only honored for arangodump invocations that use the --parallel-dump option.
The following metrics have been added to observe the behavior of parallel arangodump operations on the server:
- arangodb_dump_memory_usage: Current memory usage of all ongoing arangodump operations on the server.
- arangodb_dump_ongoing: Number of currently ongoing arangodump operations on the server.
- arangodb_dump_threads_blocked_total: Number of times a server-side dump thread was blocked because it honored the server-side memory limit for dumps.
Automatic retries
Introduced in: v3.12.1
arangodump retries dump requests to the server in more cases: read, write and connection errors. This makes the creation of dumps more likely to succeed despite any temporary errors that can occur.
arangorestore
The following startup option has been added that allows arangorestore to override
the writeConcern value specified in a database dump when creating new
collections:
- --write-concern: Override the writeConcern value for collections. Can be specified multiple times, e.g. --write-concern 2 --write-concern myCollection=3 to set the write concern for the collection called myCollection to 3 and for all other collections to 2.
arangoimport
Maximum number of import errors
The following startup option has been added to arangoimport:
- --max-errors: The maximum number of errors after which the import is stopped. The default value is 20.
You can use this option to limit the number of errors displayed by arangoimport, and to abort the import after this value has been reached.
Automatic file format detection
The default value for the --type startup option has been changed from json
to auto. arangoimport now automatically detects the type of the import file
based on the file extension.
The following file extensions are automatically detected:
- .json
- .jsonl
- .csv
- .tsv
If the file extension doesn’t correspond to any of the mentioned types, the
import defaults to the json format.
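For example, the following hypothetical invocation imports a CSV file without specifying --type:

arangoimport --file users.csv --collection users
# detected as CSV based on the .csv file extension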
Transparent compression of requests and responses
Startup options to enable transparent compression of the data that is sent between a client tool and the ArangoDB server have been added. See the Server options section above that includes a description of the added client tool options.
Internal changes
Upgraded bundled library versions
For ArangoDB 3.12, the bundled version of rclone is 1.65.2. Check if your rclone configuration files require changes.
The bundled version of the OpenSSL library has been upgraded to 3.2.1.
From version 3.11.10 onward, ArangoDB uses the glibc C standard library implementation with an LGPL-3.0 license instead of libmusl. Notably, it features string functions that are better optimized for common CPUs.
