ArangoDB v3.13 is under development and not released yet. This documentation is not final and potentially incomplete.

HTTP interface for vector indexes

Introduced in: v3.12.4

Create a vector index

POST /_db/{database-name}/_api/index
Creates a vector index for the collection collection-name, if it does not already exist.
Path Parameters
  • database-name: The name of the database.

Query Parameters
  • collection: The collection name.
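
As a minimal sketch of how the path and query parameters combine into the request URL, the following Python helper assembles it. The base URL, the example database and collection names, and the helper name `vector_index_url` are hypothetical; the query parameter is assumed to be named `collection`.

```python
from urllib.parse import quote, urlencode


def vector_index_url(base_url: str, database: str, collection: str) -> str:
    """Build the URL for POST /_db/{database-name}/_api/index."""
    # The database name is a path segment, so percent-encode it fully.
    path = f"/_db/{quote(database, safe='')}/_api/index"
    # Assumption: the collection is passed as the `collection` query parameter.
    query = urlencode({"collection": collection})
    return f"{base_url.rstrip('/')}{path}?{query}"


print(vector_index_url("http://localhost:8529", "_system", "products"))
```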

Request Body (application/json object)
    • fields: A list with exactly one attribute path that specifies where the vector embedding is stored in each document. The vector data needs to be populated before creating the index.

      If you want to index another vector embedding attribute, you need to create a separate vector index.

    • inBackground: Set this option to true to keep the collection/shards available for write operations by not using an exclusive write lock for the duration of the index creation.

    • name: A user-defined name for the index for easier identification. If not specified, a name is automatically generated.

    • parallelism: The number of threads to use for indexing. Default: 2.

    • params: The parameters as used by the Faiss library.

      • defaultNProbe: How many neighboring centroids to consider for the search results by default. The larger the number, the slower the search but the better the search results. The default is 1. You should generally use a higher value, either here or per query via the nProbe option of the vector similarity functions.

      • dimension: The vector dimension. The attribute to index needs to have this many elements in the array that stores the vector embedding.

      • factory: You can specify an index factory string that is forwarded to the underlying Faiss library, allowing you to combine different advanced options. Examples:

        • "IVF100_HNSW10,Flat"
        • "IVF100,SQ4"
        • "IVF10_HNSW5,Flat"
        • "IVF100_HNSW5,PQ256x16"

        The base index must be an inverted file (IVF) to work with ArangoDB. If you don’t specify an index factory, the value is equivalent to IVF<nLists>,Flat. For more information on how to create these custom indexes, see the Faiss Wiki.

      • metric: Whether to use cosine or l2 (Euclidean) distance calculation.

        Possible values: "cosine", "l2"

      • nLists: The number of Voronoi cells to partition the vector space into, which is also the number of centroids in the index. What value to choose depends on the data distribution and the chosen metric. According to the Faiss library paper, it should be around 15 * sqrt(N), where N is the number of documents in the collection or, for cluster deployments, the number of documents in the shard. A bigger value produces more correct results but increases the training time and thus how long it takes to build the index. It cannot be bigger than the number of documents.

      • trainingIterations: The number of iterations in the training process. The default is 25. Smaller values lead to faster index creation but may yield worse search results.

    • type: The index type. Needs to be "vector".
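
The options above can be combined into a request body as in the following sketch. It also applies the 15 * sqrt(N) guideline for the number of Voronoi cells, capped at the document count. The helper names `suggest_n_lists` and `vector_index_body`, the attribute path `embedding`, and the example numbers are hypothetical. The JSON attribute names (`type`, `fields`, `params`, `metric`, `dimension`, `nLists`) are assumptions to verify against the server version in use.

```python
import json
import math


def suggest_n_lists(doc_count: int) -> int:
    """Heuristic from the text: about 15 * sqrt(N), never more than N."""
    if doc_count < 1:
        raise ValueError("the collection needs at least one document")
    return max(1, min(doc_count, round(15 * math.sqrt(doc_count))))


def vector_index_body(field: str, dimension: int, doc_count: int,
                      metric: str = "cosine") -> dict:
    """Assemble a vector index definition (attribute names are assumed)."""
    if metric not in ("cosine", "l2"):
        raise ValueError("metric must be 'cosine' or 'l2'")
    return {
        "type": "vector",
        "fields": [field],  # exactly one attribute path
        "params": {
            "metric": metric,
            "dimension": dimension,
            "nLists": suggest_n_lists(doc_count),
        },
    }


body = vector_index_body("embedding", 128, 10_000)
print(json.dumps(body, indent=2))
```

For small collections, the cap matters: with 100 documents, the guideline gives 150, so the helper falls back to 100.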

Responses
    • 200: The index exists already.

    • 201: The index is created as there is no such existing index.

    • 404: The collection is unknown.
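
A client can distinguish the three outcomes by the HTTP status code. The sketch below assumes that they correspond to 200, 201, and 404, in the order listed above. The function names `describe_outcome` and `create_vector_index` are hypothetical, and authentication is left out for brevity.

```python
import json
import urllib.error
import urllib.request

# Assumption: these status codes correspond to the three documented outcomes.
OUTCOMES = {
    200: "index already exists",
    201: "index created",
    404: "collection unknown",
}


def describe_outcome(status: int) -> str:
    """Translate a status code into one of the documented outcomes."""
    return OUTCOMES.get(status, f"unexpected status {status}")


def create_vector_index(url: str, body: dict) -> str:
    """POST the index definition and describe the result (no authentication)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request) as response:
            return describe_outcome(response.status)
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses; the status code is still available.
        return describe_outcome(err.code)
```

Because the request is a no-op when the index already exists, repeating the call with an identical definition should be safe.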