ArangoDB v3.10 reached End of Life (EOL) and is no longer supported.

This documentation is outdated. Please see the most recent stable version.

Create Test Data with AQL

How to fill a collection with dummy documents

We assume that there is already a collection to the hold documents called myCollection in below example queries.

One of the easiest ways to fill a collection with test data is to use an AQL query that iterates over a range.

Run the following AQL query e.g. from the AQL Editor in the web interface to insert 1,000 documents into the collection:

FOR i IN 1..1000
  INSERT { name: CONCAT("test", i) } IN myCollection

The number of documents to create can be modified easily be adjusting the range boundary values.

If you want to inspect the result immediately, add RETURN NEW at the end of the query.

To create more complex test data, adjust the AQL query. Let us say we also want a status attribute, and fill it with integer values between 1 to 5 (inclusive), with equal distribution. A good way to achieve this is to use the modulo operator (%):

FOR i IN 1..1000
  INSERT {
    name: CONCAT("test", i),
    status: 1 + (i % 5)
  } IN myCollection

To create pseudo-random values, use the RAND() function. It creates pseudo-random numbers between 0 and 1. Use some factor to scale the random numbers, and FLOOR() to convert the scaled number back to an integer.

For example, the following query populates the value attribute with numbers between 100 and 150 (inclusive):

FOR i IN 1..1000
  INSERT {
    name: CONCAT("test", i),
    value: 100 + FLOOR(RAND() * (150 - 100 + 1))
  } IN myCollection

After the test data has been created, it is often helpful to verify it. The RAND() function is also a good candidate for retrieving a random sample of the documents in the collection. This query will retrieve 10 random documents:

FOR doc IN myCollection
  SORT RAND()
  LIMIT 10
  RETURN doc

The COLLECT clause is an easy mechanism to run an aggregate analysis on some attribute. Let us say we wanted to verify the data distribution inside the status attribute. In this case we could run:

FOR doc IN myCollection
  COLLECT value = doc.value WITH COUNT INTO count
  RETURN {
    value: value,
    count: count
  }

The above query will provide the number of documents per distinct value.

We can make the JSON result a bit more compact by using the value as attribute key, the count as attribute value and merge everything into a single result object. Note that attribute keys can only be strings, but for our purposes here it is acceptable.

RETURN MERGE(
  FOR doc IN myCollection
    COLLECT value = doc.value WITH COUNT INTO count
    RETURN {
      [value]: count
    }
)