ArangoDB v3.13 is under development and not released yet. This documentation is not final and potentially incomplete.

Combining queries with subqueries in AQL

Subqueries let you form complex requests and allow you to process more data in with a single query

How to use subqueries

Wherever an expression is allowed in AQL, a subquery can be placed. A subquery is a query part that can introduce its own local variables without affecting variables and values in its outer scope(s).

It is required that subqueries be put inside parentheses ( and ) to explicitly mark their start and end points:

FOR p IN persons
  LET recommendations = ( // subquery start
    FOR r IN recommendations
      FILTER p.id == r.personId
      SORT p.rank DESC
      LIMIT 10
      RETURN r
  ) // subquery end
  RETURN { person : p, recommendations : recommendations }

A subquery’s result can be assigned to a variable with LET as shown above, so that it can be referenced multiple times or just to improve the query readability.

Function calls also use parentheses and AQL allows you to omit an extra pair if you want to use a subquery as sole argument for a function, e.g. MAX(<subquery>) instead of MAX((<subquery>)):

FOR p IN persons
  COLLECT city = p.city INTO g
  RETURN {
    city : city,
    numPersons : LENGTH(g),
    maxRating: MAX( // subquery start
      FOR r IN g
      RETURN r.p.rating
    ) // subquery end
  }

The extra wrapping is required if there is more than one function argument, however, e.g. NOT_NULL((RETURN "ok"), "fallback").

Subqueries may also include other subqueries.

Subquery results and unwinding

Subqueries always return a result array, even if there is only a single return value:

RETURN ( RETURN 1 )
[ [ 1 ] ]

To avoid such a nested data structure, FIRST() can be used for example:

RETURN FIRST( RETURN 1 )
[ 1 ]

To unwind the result array of a subquery so that each element is returned as top-level element in the overall query result, you can use a FOR loop:

FOR elem IN (RETURN 1..3) // [1,2,3]
  RETURN elem
[
  1,
  2,
  3
]

Without unwinding, the query would be RETURN (RETURN 1..3) and the result a nested array [ [ 1, 2, 3 ] ] with a single top-level element.

Evaluation of subqueries

Subqueries that are used inside expressions are pulled out of these expressions and conditionally executed beforehand. The effective behavior is short-circuiting evaluation. An example is the ternary operator.

Consider the following query:

RETURN RAND() > 0.5 ? (RETURN 1) : 0

It get transformed into something more like this, with the calculation of the subquery happening before the evaluation of the condition:

LET temp1 = RAND() > 0.5
LET temp2 = (FILTER temp1 RETURN 1)
RETURN temp1 ? temp2 : 0

The condition is evaluated separately and stored in a variable, then the subquery is executed conditionally using a FILTER operation. Finally, the ternary operator reuses the result of the evaluated condition to determine which value to return.

The short-circuiting behavior allows you to write queries like the following:

LET maybe = DOCUMENT("coll/does_not_exist")
LET dependent = maybe ? (
  FOR attr IN ATTRIBUTES(maybe)
    RETURN attr
) : "document not found"
RETURN dependent

The maybe variable can be null, which cannot be iterated over with FOR and calling ATTRIBUTES() with null as argument would raise a query warning. However, the subquery is only executed if DOCUMENT() found a document, so this query works without issues.

Similarly, when you use subqueries as sub-expressions that are combined with logical AND or OR, the subqueries are only executed if the outcome is yet to be determined:

RETURN false AND (RETURN ASSERT(false, "executed"))
RETURN true OR (RETURN ASSERT(false, "executed"))

If the first operand of a logical AND is false, the overall result is false regardless of the second operand. If the first operand of a logical OR is true, the overall result is true regardless of the second operand. As the outcome is already determined, there is no need to execute the subqueries in these cases.

Subqueries that are used inside expressions are pulled out of these expressions and executed beforehand. That means that subqueries do not participate in lazy evaluation of operands, for example in the ternary operator.

Consider the following query:

RETURN RAND() > 0.5 ? (RETURN 1) : 0

It get transformed into something more like this, with the calculation of the subquery happening before the evaluation of the condition:

LET temp1 = (RETURN 1)
LET temp2 = RAND() > 0.5 ? temp1 : 0
RETURN temp2

The subquery is executed regardless of the condition. In other words, there is no short-circuiting that would avoid the subquery from running in the case that the condition evaluates to false. You may need to take this into account to avoid query errors like

Query: AQL: collection or array expected as operand to FOR loop; you provided a value of type ’null’ (while executing)

LET maybe = DOCUMENT("coll/does_not_exist")
LET dependent = maybe ? (
  FOR attr IN ATTRIBUTES(maybe)
    RETURN attr
) : "document not found"
RETURN dependent

The problem is that the subquery is executed under all circumstances, despite the check whether DOCUMENT() found a document or not. It does not take into account that maybe can be null, which cannot be iterated over with FOR. A possible solution is to fall back to an empty array in the subquery to effectively prevent the loop body from being run:

LET maybe = DOCUMENT("coll/does_not_exist")
LET dependent = maybe ? (
  FOR attr IN NOT_NULL(ATTRIBUTES(maybe || {}), [])
    RETURN attr
) : "document not found"
RETURN dependent

The additional fallback maybe || {} prevents a query warning

invalid argument type in call to function ‘ATTRIBUTES()’

that originates from a null value getting passed to the ATTRIBUTES() function that expects an object.

Similarly, when you use subqueries as sub-expressions that are combined with logical AND or OR, the subqueries are always executed:

RETURN false AND (RETURN ASSERT(false, "executed"))
RETURN true OR (RETURN ASSERT(false, "executed"))

If the first operand of a logical AND is false, the overall result is false regardless of the second operand. If the first operand of a logical OR is true, the overall result is true regardless of the second operand. However, the subqueries are run nonetheless, causing both example queries to fail.

You can prevent the subqueries from executing by prepending a FILTER operation with the value of the logical operator’s first operand and negating it in case of an OR:

LET cond = false
RETURN cond AND (FILTER cond RETURN ASSERT(false, "executed"))
LET cond = true
RETURN cond OR (FILTER !cond RETURN ASSERT(false, "executed"))