Common Errors in AQL

Avoid injection vulnerabilities and avoid pitfalls like incorrect operator usage performance issues when using ArangoDB’s query language

Trailing semicolons in query strings

Many SQL databases allow sending multiple queries at once. In this case, multiple queries are separated using the semicolon character. Often it is also supported to execute a single query that has a semicolon at its end.

AQL does not support this, and it is a parse error to use a semicolon at the end of an AQL query string.

String concatenation

In AQL, strings must be concatenated using the CONCAT() function. Joining them together with the + operator is not supported. Especially as JavaScript programmer it is easy to walk into this trap:

RETURN "foo" + "bar" // [ 0 ]
RETURN "foo" + 123   // [ 123 ]
RETURN "123" + 200   // [ 323 ]

The arithmetic plus operator expects numbers as operands, and will try to implicitly cast them to numbers if they are of different type. "foo" and "bar" are casted to 0 and then added to together (still zero). If an actual number is added, that number will be returned (adding zero doesn’t change the result). If the string is a valid string representation of a number, then it is casted to a number. Thus, adding "123" and 200 results in two numbers being added up to 323.

To concatenate elements (with implicit casting to string for non-string values), do:

RETURN CONCAT("foo", "bar") // [ "foobar" ]
RETURN CONCAT("foo", 123)   // [ "foo123" ]
RETURN CONCAT("123", 200)   // [ "123200" ]

Parameter injection vulnerability

Parameter injection means that potentially malicious content is inserted into a query which may change its meaning. It is a security issue that may allow an attacker to execute arbitrary queries on the database data.

It often occurs if applications trustfully insert user-provided inputs into a query string, and do not fully or incorrectly filter them. It also occurs often when applications build queries naively, without using security mechanisms often provided by database software or querying mechanisms.

AQL is not vulnerable to parameter injection in itself, but queries might be constructed on the client-side, on an application server or in a Foxx service. Assembling query strings with simple string concatenation looks trivial, but is potentially unsafe. You should use bind parameters instead whenever possible, use query building functionality if provided by a driver (see arangojs AQL Helpers  for example) or at least sanitize user input with great care.

Parameter injection examples

Below you find a simple query using the JavaScript API that is fed with some dynamic input value, pretending it coming from a web form. This could be the case in a Foxx service. The route happily picks up the input value, and puts it into a query:

// evil!
var what = req.params("searchValue");  // user input value from web form
// ...
var query = "FOR doc IN collection FILTER doc.value == " + what + " RETURN doc";
db._query(query, params).toArray();

The above will probably work fine for numeric input values.

What could an attacker do to this query? Here are a few suggestions to use for the searchValue parameter:

  • for returning all documents in the collection:
    1 || true
  • for removing all documents:
    1 || true REMOVE doc IN collection //
  • for inserting new documents:
    1 || true INSERT { foo: "bar" } IN collection //

It should have become obvious that this is extremely unsafe and should be avoided. A pattern often seen to counteract this is trying to quote and escape potentially unsafe input values before putting them into query strings. This may work in some situations, but it is easy to overlook something or get it subtly wrong:

// We are sanitizing now, but it is still evil!
var value = req.params("searchValue").replace(/'/g, '');
// ...
var query = "FOR doc IN collection FILTER doc.value == '" + value + "' RETURN doc";
db._query(query, params).toArray();

The above example uses single quotes for enclosing the potentially unsafe user input, and also replaces all single quotes in the input value beforehand. Not only may that change the user input (leading to subtle errors such as “why does my search for O'Brien not return any results?”), but it is also still unsafe. If the user input contains a backslash at the end (e.g. foo bar\), that backslash will escape the closing single quote, allowing the user input to break out of the string fence again.

It gets worse if user input is inserted into the query at multiple places. Let us assume we have a query with two dynamic values:

query = "FOR doc IN collection FILTER doc.value == '" + value +
        "' && doc.type == '" + type + "' RETURN doc";

If an attacker inserted \ for parameter value and || true REMOVE doc IN collection // for parameter type, then the effective query would become:

FOR doc IN collection
  FILTER doc.value == '\' && doc.type == ' || true
  REMOVE doc IN collection //' RETURN doc

… which is highly undesirable. The backslash escapes the closing single quote, turning the doc.type condition into a string, which gets compared to doc.value. Further more, an always true or-condition as well as a remove operation are injected, changing the query purpose entirely. The original return operation gets commented out and the query will truncate the collection instead of returning a few documents.

Avoiding parameter injection

Instead of mixing query string fragments with user inputs naively via string concatenation, use either bind parameters or a query builder. Both can help to avoid the problem of injection, because they allow separating the actual query operations (like FOR, INSERT, REMOVE) from (user input) values.

Below, the focus is on bind parameters. This is not to say that query builders shouldn’t be used. They were simply omitted here for the sake of simplicity.

What bind parameters are

Bind parameters in AQL queries are special tokens that act as placeholders for actual values. Here’s an example:

FOR doc IN collection
  FILTER doc.value == @what
  RETURN doc

In the above query, @what is a bind parameter. In order to execute this query, a value for bind parameter @what must be specified. Otherwise query execution will fail with error 1551 (no value specified for declared bind parameter). If a value for @what gets specified, the query can be executed. However, the query string and the bind parameter values (i.e. the contents of the @what bind parameter) will be handled separately. What’s in the bind parameter will always be treated as a value, and it can’t get out of its sandbox and change the semantic meaning of a query.

How bind parameters are used

To execute a query with bind parameters, the query string (containing the bind parameters) and the bind parameter values are specified separately (note that when the bind parameter value is assigned, the prefix @ needs to be omitted):

// query string with bind parameter
var query = "FOR doc IN collection FILTER doc.value == @what RETURN doc";

// actual value for bind parameter
var params = { what: 42 };

// run query, specifying query string and bind parameter separately
db._query(query, params).toArray();

If a malicious user would set @what to a value of 1 || true, this wouldn’t do any harm. AQL would treat the contents of @what as a single string token, and the meaning of the query would remain unchanged. The actually executed query would be:

FOR doc IN collection
  FILTER doc.value == "1 || true"
  RETURN doc

Thanks to bind parameters it is also impossible to turn a selection (i.e. read-only) query into a data deletion query.

Using JavaScript variables as bind parameters

There is also a template string generator function aql that can be used to safely (and conveniently) built AQL queries using JavaScript variables and expressions. It can be invoked as follows:

const aql = require('@arangodb').aql; // not needed in arangosh

var value = "some input value";
var query = aql`FOR doc IN collection
  FILTER doc.value == ${value}
  RETURN doc`;
var result = db._query(query).toArray();

Note that an ES6 template string is used for populating the query variable. The string is assembled using the aql generator function which is bundled with ArangoDB. The template string can contain references to JavaScript variables or expressions via ${...}. In the above example, the query references a variable named value. The aql function generates an object with two separate attributes: the query string, containing references to bind parameters, and the actual bind parameter values.

Bind parameter names are automatically generated by the aql function:

var value = "some input value";
aql`FOR doc IN collection FILTER doc.value == ${value} RETURN doc`;

{
  "query" : "FOR doc IN collection FILTER doc.value == @value0 RETURN doc",
  "bindVars" : {
    "value0" : "some input value"
  }
}

Using bind parameters in dynamic queries

Bind parameters are helpful, so it makes sense to use them for handling the dynamic values. You can even use them for queries that itself are highly dynamic, for example with conditional FILTER and LIMIT parts. Here’s how to do this:

// Note: this example has a slight issue... hang on reading
var query = "FOR doc IN collection";
var params = { };

if (useFilter) {
  query += " FILTER doc.value == @what";
  params.what = req.params("searchValue");
}

if (useLimit) {
  // not quite right, see below
  query += " LIMIT @offset, @count";
  params.offset = req.params("offset");
  params.count = req.params("count");
}

query += " RETURN doc";
db._query(query, params).toArray();

Note that in this example we’re back to string concatenation, but without the problem of the query being vulnerable to arbitrary modifications.

Input value validation and sanitation

Still you should prefer to be paranoid, and try to detect invalid input values as early as possible, at least before executing a query with them. This is because some input parameters may affect the runtime behavior of queries negatively or, when modified, may lead to queries throwing runtime errors instead of returning valid results. This isn’t something an attacker should deserve.

LIMIT is a good example for this: if used with a single argument, the argument should be numeric. When LIMIT is given a string value, executing the query will fail. You may want to detect this early and don’t return an HTTP 500 (as this would signal attackers that they were successful breaking your application).

Another problem with LIMIT is that high LIMIT values are likely more expensive than low ones, and you may want to disallow using LIMIT values exceeding a certain threshold.

Here is what you could do in such cases:

var query = "FOR doc IN collection LIMIT @count RETURN doc";

// some default value for limit
var params = { count: 100 };

if (useLimit) {
  var count = req.params("count");

  // abort if value does not look like an integer
  if (! preg_match(/^d+$/, count)) {
    throw "invalid count value!";
  }

  // actually turn it into an integer
  params.count = parseInt(count, 10); // turn into numeric value
}

if (params.count < 1 || params.count > 1000) {
  // value is outside of accepted thresholds
  throw "invalid count value!";
}

db._query(query, params).toArray();

This is a bit more complex, but that is a price you are likely willing to pay for a bit of extra safety. In reality you may want to use a framework for validation (such as joi  which comes bundled with ArangoDB) instead of writing your own checks all over the place.

Bind parameter types

There are two types of bind parameters in AQL:

  • Bind parameters for values:
    Those are prefixed with a single @ in AQL queries, and are specified without the prefix when they get their value assigned. These bind parameters can contain any valid JSON value.

    Examples: @what, @searchValue

  • Bind parameters for collections:
    These are prefixed with @@ in AQL queries, and are replaced with the name of a collection. When the bind parameter value is assigned, the parameter itself must be specified with a single @ prefix. Only string values are allowed for this type of bind parameters.

    Examples: @@collection, @@edgeColl

The latter type of bind parameter is probably not used as often, and it should not be used together with user input. Otherwise users may freely determine on which collection your AQL queries will operate on (this might be a valid use case, but normally it is extremely undesired).

Unexpected long running queries

Slow queries can have various reasons and be legitimate for queries with a high computational complexity or if they touch a lot of data. Use the Explain feature to inspect execution plans and verify that appropriate indexes are utilized. Also check for mistakes such as references to the wrong variables.

A literal collection name, which is not part of constructs like FOR, UPDATE ... IN etc., stands for an array of all documents of that collection and can cause an entire collection to be materialized before further processing. It should thus be avoided.

Check the execution plan for /* all collection documents */ and verify that it is intended. You should also see a warning if you execute such a query:

collection ‘coll’ used as expression operand

For example, instead of:

RETURN coll[* LIMIT 1]

… with the execution plan …

Execution plan:
 Id   NodeType          Est.   Comment
  1   SingletonNode        1   * ROOT
  2   CalculationNode      1     - LET #2 = coll   /* all collection documents */[* LIMIT  0, 1]   /* v8 expression */
  3   ReturnNode           1     - RETURN #2

… you can use the following equivalent query:

FOR doc IN coll
    LIMIT 1
    RETURN doc

… with the (better) execution plan:

Execution plan:
 Id   NodeType                  Est.   Comment
  1   SingletonNode                1   * ROOT
  2   EnumerateCollectionNode     44     - FOR doc IN Characters   /* full collection scan */
  3   LimitNode                    1       - LIMIT 0, 1
  4   ReturnNode                   1       - RETURN doc

Similarly, make sure you have not confused any variable names with collection names by accident:

LET names = ["John", "Mary", ...]
// supposed to refer to variable "names", not collection "Names"
FOR name IN Names
    ...

You can set the startup option --query.allow-collections-in-expressions to false to disallow collection names in arbitrary places in AQL expressions to prevent such mistakes. Also see ArangoDB Server Query Options