Pregel HTTP API

The HTTP API for Pregel lets you execute, cancel, and list Pregel jobs.

See Distributed Iterative Graph Processing (Pregel) for details.

Start a Pregel job execution

POST /_api/control_pregel
To start an execution, specify the algorithm name and a named graph (a SmartGraph in cluster deployments). Alternatively, specify the vertex and edge collections. You can also pass custom parameters, which vary for each algorithm.
Request Body application/json object
  • algorithm: Name of the algorithm. One of:

    • "pagerank" - PageRank
    • "sssp" - Single-Source Shortest Path
    • "connectedcomponents" - Connected Components
    • "wcc" - Weakly Connected Components
    • "scc" - Strongly Connected Components
    • "hits" - Hyperlink-Induced Topic Search
    • "effectivecloseness" - Effective Closeness
    • "linerank" - LineRank
    • "labelpropagation" - Label Propagation
    • "slpa" - Speaker-Listener Label Propagation

  • edgeCollections: List of edge collection names. Note that collections must meet special sharding requirements to be used with Pregel.

  • graphName: Name of a graph. Either this or the parameters vertexCollections and edgeCollections are required. Note that graphs must meet special sharding requirements to be used with Pregel.

  • params: General as well as algorithm-specific options.

    The most important general option is "store", which controls whether the results computed by the Pregel job are written back into the source collections.

    Another important general option is "parallelism", which controls the maximum number of parallel threads that work on the Pregel job. If "parallelism" is not specified, a default value may be used. In addition, the value of "parallelism" may be effectively capped at a server-specific value.

    The option "useMemoryMaps" controls whether to use disk-based files to store temporary results. This might make the computation disk-bound, but allows you to run computations that would not fit into main memory. It is recommended to set this flag for larger datasets.

    The option "shardKeyAttribute" specifies the shard key that the edge collections are sharded by (default: "vertex").

  • vertexCollections: List of vertex collection names. Note that collections must meet special sharding requirements to be used with Pregel.
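The rule that a job needs either a graph name or both collection lists can be checked before a request is sent. The following is a minimal Python sketch, not part of the API itself: the function name build_pregel_body and the collection names are hypothetical, while the attribute names are the ones used on this page.

```python
import json


def build_pregel_body(algorithm, graph_name=None, vertex_collections=None,
                      edge_collections=None, params=None):
    """Assemble the JSON body for POST /_api/control_pregel (illustrative helper)."""
    body = {"algorithm": algorithm}
    if graph_name is not None:
        # A named graph takes precedence in this sketch.
        body["graphName"] = graph_name
    elif vertex_collections and edge_collections:
        body["vertexCollections"] = list(vertex_collections)
        body["edgeCollections"] = list(edge_collections)
    else:
        raise ValueError(
            "specify graphName, or both vertexCollections and edgeCollections")
    if params:
        body["params"] = dict(params)
    return body


body = build_pregel_body(
    "pagerank",
    vertex_collections=["vertices"],  # hypothetical collection name
    edge_collections=["edges"],       # hypothetical collection name
    params={"store": False, "parallelism": 4},
)
print(json.dumps(body, indent=2))
```

Validating on the client side only catches a missing graph or collection list. The sharding requirements and privileges are still checked by the server.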

Responses
  • HTTP 200 is returned if the Pregel job was successfully created. The reply body is a string with the job ID, which you can use to query the status or to cancel the execution.

  • An HTTP 400 error is returned if the set of collections for the Pregel job includes a system collection, or if the collections do not conform to the sharding requirements for Pregel jobs.

  • An HTTP 403 error is returned if there are not sufficient privileges to access the collections specified for the Pregel job.

  • An HTTP 404 error is returned if the specified "algorithm" is not found, or the graph specified in "graphName" is not found, or at least one of the collections specified in "vertexCollections" or "edgeCollections" is not found.
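A client can turn these status codes into either a job ID or a readable error. The sketch below assumes the 200 reply body is a JSON-encoded string such as "68814". The names START_ERRORS and parse_start_response are hypothetical.

```python
import json

# Reasons paraphrased from the response list above, keyed by HTTP status code.
START_ERRORS = {
    400: "system collection included, or sharding requirements not met",
    403: "insufficient privileges for the specified collections",
    404: "algorithm, graph, or one of the collections not found",
}


def parse_start_response(status, raw_body):
    """Return the job ID for a 200 reply, otherwise raise with the documented reason."""
    if status == 200:
        # Assumption: the body is a JSON string holding the job ID.
        return json.loads(raw_body)
    reason = START_ERRORS.get(status, "unexpected status")
    raise RuntimeError(
        f"starting the Pregel job failed: HTTP {status} ({reason})")
```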

Examples

Run the Weakly Connected Components (WCC) algorithm against a graph and store the results in the vertices as attribute component:

curl -X POST --header 'accept: application/json' --data-binary @- --dump - 'http://localhost:8529/_api/control_pregel' <<'EOF'
{
  "algorithm": "wcc",
  "graphName": "connectedComponentsGraph",
  "params": {
    "maxGSS": 36,
    "resultField": "component"
  }
}
EOF
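For clients that do not shell out to curl, the same request can be built with the Python standard library. This is a sketch under the same assumptions as the curl example (local server on port 8529, no authentication). The send helper is hypothetical and is not called here, because it needs a reachable server.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8529"  # same host and port as the curl example

payload = {
    "algorithm": "wcc",
    "graphName": "connectedComponentsGraph",
    "params": {"maxGSS": 36, "resultField": "component"},
}

request = urllib.request.Request(
    BASE_URL + "/_api/control_pregel",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Accept": "application/json", "Content-Type": "application/json"},
    method="POST",
)


def send(req):
    """Send the request and return (status, decoded body)."""
    with urllib.request.urlopen(req) as response:
        return response.status, response.read().decode("utf-8")


# Against a running server:
# status, body = send(request)
```

Note that urllib.request.urlopen raises urllib.error.HTTPError for 4xx and 5xx replies, so the error cases listed under Responses surface as exceptions rather than as return values.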

Get a Pregel job execution status

GET /_api/control_pregel/{id}
Returns the current state of the execution, the current global superstep, the runtime, the global aggregator values, as well as the number of sent and received messages.
Path Parameters
  • id: The Pregel execution identifier.

      Responses
      • HTTP 200 is returned if the job execution ID is valid. The state is returned in the response body.

          Response Body application/json object
          The information about the Pregel job.
        • The algorithm used by the job.

        • The algorithm execution time. Shown once the computation has started.

        • The date and time when the job was created.

        • The Pregel run details.

          • The aggregated details of the full Pregel run. The values are totals across all DB-Servers.

            • Information about the global supersteps.

              • A list of objects with details for each global superstep.

                • The number of bytes used in memory for the messages in this step.

                • The number of messages received in this step.

                • The number of messages sent in this step.

                • The number of vertices that have been processed in this step.

            • The status of the in-memory graph.

              • The number of edges that are loaded from the database into memory.

              • The number of bytes used in memory for the loaded graph.

              • The number of vertices that are loaded from the database into memory.

              • The number of vertices that are written back to the database after the Pregel computation finished. It is only set if the store parameter is set to true.

            • The time at which the status was measured.

          • The details of the Pregel run for every DB-Server. Each object key is a DB-Server ID, and each value is a nested object similar to the aggregatedStatus attribute. In a single server deployment, there is only a single entry with an empty string as the key.

        • The total number of edges processed.

        • The date and time when the job results expire. The expiration date is only meaningful for jobs that were completed, canceled or resulted in an error. Such jobs are cleaned up by the garbage collection when they reach their expiration date/time.

        • The number of global supersteps executed.

        • Computation time of each global superstep. Shown once the computation has started.

        • The ID of the Pregel job, as a string.

        • This attribute is used by Programmable Pregel Algorithms (ppa, experimental). The value is only populated once the algorithm has finished.

        • The startup runtime of the execution. The startup time includes the data loading time and can be substantial.

        • The state of the execution. The following values can be returned:

          • "none": The Pregel run has not started yet.
          • "loading": The graph is being loaded from the database into memory before executing the algorithm.
          • "running": The algorithm is executing normally.
          • "storing": The algorithm finished, but the results are still being written back into the collections. Only occurs if the store parameter is set to true.
          • "done": The execution is done. This means that storing is also done. This event is announced in the server log (requires at least the info log level for the pregel log topic).
          • "canceled": The execution was permanently canceled, either by the user or by an error.
          • "in error": The execution is in an error state. This can be caused by primary DB-Servers being unreachable or unresponsive. The execution might recover later, or switch to "canceled" if it is not able to recover successfully.
          • "recovering": The execution is actively recovering and switches back to "running" if the recovery is successful.
          • "fatal error": The execution has failed and cannot recover.

        • The time for storing the results if the job includes results storage. Shown once storing has started.

        • The total runtime of the execution up to now (if the execution is still ongoing).

        • The TTL (time to live) value for the job results, specified in seconds. The TTL is used to calculate the expiration date for the job’s results.

        • The total number of vertices processed.

      • An HTTP 404 error is returned if no Pregel job with the specified execution ID is found or the execution ID is invalid.

      Examples

      Get the execution status of a Pregel job:

      curl --header 'accept: application/json' --dump - 'http://localhost:8529/_api/control_pregel/68814'
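Because a job moves through several states, a client typically polls this endpoint until a final state is reached. The sketch below treats "done", "canceled", and "fatal error" as final, based on the state descriptions above. The function wait_for_job and its fetch_state callback are hypothetical: the callback is expected to call GET /_api/control_pregel/{id} and return the state string.

```python
import time

# States after which the job no longer changes, per the state list above.
TERMINAL_STATES = {"done", "canceled", "fatal error"}


def wait_for_job(fetch_state, poll_seconds=1.0, max_polls=600, sleep=time.sleep):
    """Poll until the job reaches a terminal state and return that state."""
    for _ in range(max_polls):
        state = fetch_state()
        if state in TERMINAL_STATES:
            return state
        # "in error" and "recovering" may still recover, so keep polling.
        sleep(poll_seconds)
    raise TimeoutError("Pregel job did not reach a terminal state in time")
```

Passing the fetch and sleep functions in as arguments keeps the loop independent of any HTTP library.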

      List the running Pregel jobs

      GET /_api/control_pregel
      Returns a list of currently running and recently finished Pregel jobs without retrieving their results.
      Responses
      • HTTP 200 is returned if the list of jobs can be retrieved successfully.

          Response Body application/json
        • A list of objects describing the Pregel jobs.

          • The algorithm used by the job.

          • The algorithm execution time. Shown once the computation has started.

          • The date and time when the job was created.

          • The Pregel run details.

            • The aggregated details of the full Pregel run. The values are totals across all DB-Servers.

              • Information about the global supersteps.

                • A list of objects with details for each global superstep.

                  • The number of bytes used in memory for the messages in this step.

                  • The number of messages received in this step.

                  • The number of messages sent in this step.

                  • The number of vertices that have been processed in this step.

              • The status of the in-memory graph.

                • The number of edges that are loaded from the database into memory.

                • The number of bytes used in memory for the loaded graph.

                • The number of vertices that are loaded from the database into memory.

                • The number of vertices that are written back to the database after the Pregel computation finished. It is only set if the store parameter is set to true.

              • The time at which the status was measured.

            • The details of the Pregel run for every DB-Server. Each object key is a DB-Server ID, and each value is a nested object similar to the aggregatedStatus attribute. In a single server deployment, there is only a single entry with an empty string as the key.

          • The total number of edges processed.

          • The date and time when the job results expire. The expiration date is only meaningful for jobs that were completed, canceled or resulted in an error. Such jobs are cleaned up by the garbage collection when they reach their expiration date/time.

          • The number of global supersteps executed.

          • Computation time of each global superstep. Shown once the computation has started.

          • The ID of the Pregel job, as a string.

          • This attribute is used by Programmable Pregel Algorithms (ppa, experimental). The value is only populated once the algorithm has finished.

          • The startup runtime of the execution. The startup time includes the data loading time and can be substantial.

          • The state of the execution. The following values can be returned:

            • "none": The Pregel run has not started yet.
            • "loading": The graph is being loaded from the database into memory before executing the algorithm.
            • "running": The algorithm is executing normally.
            • "storing": The algorithm finished, but the results are still being written back into the collections. Only occurs if the store parameter is set to true.
            • "done": The execution is done. This means that storing is also done. This event is announced in the server log (requires at least the info log level for the pregel log topic).
            • "canceled": The execution was permanently canceled, either by the user or by an error.
            • "in error": The execution is in an error state. This can be caused by primary DB-Servers being unreachable or unresponsive. The execution might recover later, or switch to "canceled" if it is not able to recover successfully.
            • "recovering": The execution is actively recovering and switches back to "running" if the recovery is successful.
            • "fatal error": The execution has failed and cannot recover.

          • The time for storing the results if the job includes results storage. Shown once storing has started.

          • The total runtime of the execution up to now (if the execution is still ongoing).

          • The TTL (time to live) value for the job results, specified in seconds. The TTL is used to calculate the expiration date for the job’s results.

          • The total number of vertices processed.

      Examples

      Get the status of all active Pregel jobs:

      curl --header 'accept: application/json' --dump - 'http://localhost:8529/_api/control_pregel/'
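The returned list can be summarized on the client, for example to see which jobs are still active. The sketch below assumes that each job object exposes its ID and state under the keys "id" and "state". These key names are not spelled out in the text above, so verify them against an actual response. The sample data is made up.

```python
def jobs_by_state(jobs):
    """Group Pregel job IDs by their state."""
    grouped = {}
    for job in jobs:
        # Assumed key names: "state" and "id".
        grouped.setdefault(job["state"], []).append(job["id"])
    return grouped


# Made-up sample in the assumed response shape.
sample = [
    {"id": "101", "state": "running"},
    {"id": "102", "state": "done"},
    {"id": "103", "state": "running"},
]
print(jobs_by_state(sample))
```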

      Cancel a Pregel job execution

      DELETE /_api/control_pregel/{id}

      Cancels an execution that is still running and discards any intermediate results. This immediately frees all memory taken up by the execution, and all intermediate data is lost.

      You might get inconsistent results if you requested to store the results and then cancel an execution when it is already in its "storing" state (or "done" state in versions prior to 3.7.1). The data is written into all collection shards at once using multiple threads, which means that multiple transactions run simultaneously. A transaction might already be committed when you cancel the execution. Therefore, you might see some updated documents, while other documents have no results or stale results from a previous execution.

      Path Parameters
      • id: The Pregel execution identifier.

          Responses
          • HTTP 200 is returned if the job execution ID was valid.

          • An HTTP 404 error is returned if no Pregel job with the specified execution ID is found or the execution ID is invalid.

          Examples

          Cancel a Pregel job to stop the execution or to free up the results if it was started with "store": false and is in the done state:

          curl -X DELETE --header 'accept: application/json' --dump - 'http://localhost:8529/_api/control_pregel/69149'
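Since cancelling during the "storing" state can leave partially written results, a client may want to refuse the cancel in that case. The following sketch shows one such guard. The function safe_cancel and its cancel callback are hypothetical: the callback is expected to issue DELETE /_api/control_pregel/{id}.

```python
def safe_cancel(state, store, cancel):
    """Cancel the job unless it is currently writing results back.

    Returns True if the cancel callback was invoked, False if it was skipped.
    """
    if store and state == "storing":
        # Some shards may already be committed, so results could end up mixed.
        return False
    cancel()
    return True
```

Whether skipping is the right policy depends on the use case. Waiting for "done" and then discarding the results is another option.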

          Get the execution statistics of a Pregel job

          GET /_api/control_pregel/history/{id}

          Returns the current state of the execution, the current global superstep, the runtime, the global aggregator values, as well as the number of sent and received messages.

          The execution statistics are persisted to a system collection and kept until you remove them, whereas the /_api/control_pregel/{id} endpoint only keeps the information temporarily in memory.

          Path Parameters
          • id: The Pregel job identifier.

              Responses
              • HTTP 200 is returned if the Pregel job ID is valid. The execution statistics are returned in the response body.

                  Response Body application/json object
                  The information about the Pregel job.
                • The algorithm used by the job.

                • The algorithm execution time. Shown once the computation has started.

                • The date and time when the job was created.

                • The Pregel run details.

                  • The aggregated details of the full Pregel run. The values are totals across all DB-Servers.

                    • Information about the global supersteps.

                      • A list of objects with details for each global superstep.

                        • The number of bytes used in memory for the messages in this step.

                        • The number of messages received in this step.

                        • The number of messages sent in this step.

                        • The number of vertices that have been processed in this step.

                    • The status of the in-memory graph.

                      • The number of edges that are loaded from the database into memory.

                      • The number of bytes used in memory for the loaded graph.

                      • The number of vertices that are loaded from the database into memory.

                      • The number of vertices that are written back to the database after the Pregel computation finished. It is only set if the store parameter is set to true.

                    • The time at which the status was measured.

                  • The details of the Pregel run for every DB-Server. Each object key is a DB-Server ID, and each value is a nested object similar to the aggregatedStatus attribute. In a single server deployment, there is only a single entry with an empty string as the key.

                • The total number of edges processed.

                • The date and time when the job results expire. The expiration date is only meaningful for jobs that were completed, canceled or resulted in an error. Such jobs are cleaned up by the garbage collection when they reach their expiration date/time.

                • The number of global supersteps executed.

                • Computation time of each global superstep. Shown once the computation has started.

                • The ID of the Pregel job, as a string.

                • This attribute is used by Programmable Pregel Algorithms (ppa, experimental). The value is only populated once the algorithm has finished.

                • The startup runtime of the execution. The startup time includes the data loading time and can be substantial.

                • The state of the execution. The following values can be returned:

                  • "none": The Pregel run has not started yet.
                  • "loading": The graph is being loaded from the database into memory before executing the algorithm.
                  • "running": The algorithm is executing normally.
                  • "storing": The algorithm finished, but the results are still being written back into the collections. Only occurs if the store parameter is set to true.
                  • "done": The execution is done. This means that storing is also done. This event is announced in the server log (requires at least the info log level for the pregel log topic).
                  • "canceled": The execution was permanently canceled, either by the user or by an error.
                  • "in error": The execution is in an error state. This can be caused by primary DB-Servers being unreachable or unresponsive. The execution might recover later, or switch to "canceled" if it is not able to recover successfully.
                  • "recovering": The execution is actively recovering and switches back to "running" if the recovery is successful.
                  • "fatal error": The execution has failed and cannot recover.

                • The time for storing the results if the job includes results storage. Shown once storing has started.

                • The total runtime of the execution up to now (if the execution is still ongoing).

                • The TTL (time to live) value for the job results, specified in seconds. The TTL is used to calculate the expiration date for the job’s results.

                • The total number of vertices processed.

              • An HTTP 404 error is returned if no Pregel job with the specified ID is found or if the ID is invalid.

              Examples

              Get the persisted execution statistics of a Pregel job:

              curl --header 'accept: application/json' --dump - 'http://localhost:8529/_api/control_pregel/history/69299'
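Because the in-memory endpoint only keeps job information temporarily while the history endpoint persists it, a client can try the former first and fall back to the latter. The sketch below illustrates that lookup order. The function get_job_info and the fetch callback are hypothetical: fetch(path) is expected to perform a GET request and return a (status, parsed_body) tuple.

```python
def get_job_info(fetch, job_id):
    """Look up a job in memory first, then in the persisted history.

    Returns the parsed body, or None if neither endpoint knows the job.
    """
    paths = (
        f"/_api/control_pregel/{job_id}",
        f"/_api/control_pregel/history/{job_id}",
    )
    for path in paths:
        status, body = fetch(path)
        if status == 200:
            return body
        if status != 404:
            raise RuntimeError(f"unexpected HTTP {status} for {path}")
    return None
```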

              Get the execution statistics of all Pregel jobs

              GET /_api/control_pregel/history

              Returns a list of currently running and finished Pregel jobs without retrieving their results.

              The execution statistics are persisted to a system collection and kept until you remove them, whereas the /_api/control_pregel endpoint only keeps the information temporarily in memory.

              Responses
              • HTTP 200 is returned if the list of jobs can be retrieved successfully.

                  Response Body application/json
                • A list of objects describing the Pregel jobs.

                  • The algorithm used by the job.

                  • The algorithm execution time. Shown once the computation has started.

                  • The date and time when the job was created.

                  • The Pregel run details.

                    • The aggregated details of the full Pregel run. The values are totals across all DB-Servers.

                      • Information about the global supersteps.

                        • A list of objects with details for each global superstep.

                          • The number of bytes used in memory for the messages in this step.

                          • The number of messages received in this step.

                          • The number of messages sent in this step.

                          • The number of vertices that have been processed in this step.

                      • The status of the in-memory graph.

                        • The number of edges that are loaded from the database into memory.

                        • The number of bytes used in memory for the loaded graph.

                        • The number of vertices that are loaded from the database into memory.

                        • The number of vertices that are written back to the database after the Pregel computation finished. It is only set if the store parameter is set to true.

                      • The time at which the status was measured.

                    • The details of the Pregel run for every DB-Server. Each object key is a DB-Server ID, and each value is a nested object similar to the aggregatedStatus attribute. In a single server deployment, there is only a single entry with an empty string as the key.

                  • The total number of edges processed.

                  • The date and time when the job results expire. The expiration date is only meaningful for jobs that were completed, canceled or resulted in an error. Such jobs are cleaned up by the garbage collection when they reach their expiration date/time.

                  • The number of global supersteps executed.

                  • Computation time of each global superstep. Shown once the computation has started.

                  • The ID of the Pregel job, as a string.

                  • This attribute is used by Programmable Pregel Algorithms (ppa, experimental). The value is only populated once the algorithm has finished.

                  • The startup runtime of the execution. The startup time includes the data loading time and can be substantial.

                  • The state of the execution. The following values can be returned:

                    • "none": The Pregel run has not started yet.
                    • "loading": The graph is being loaded from the database into memory before executing the algorithm.
                    • "running": The algorithm is executing normally.
                    • "storing": The algorithm finished, but the results are still being written back into the collections. Only occurs if the store parameter is set to true.
                    • "done": The execution is done. This means that storing is also done. This event is announced in the server log (requires at least the info log level for the pregel log topic).
                    • "canceled": The execution was permanently canceled, either by the user or by an error.
                    • "in error": The execution is in an error state. This can be caused by primary DB-Servers being unreachable or unresponsive. The execution might recover later, or switch to "canceled" if it is not able to recover successfully.
                    • "recovering": The execution is actively recovering and switches back to "running" if the recovery is successful.
                    • "fatal error": The execution has failed and cannot recover.

                  • The time for storing the results if the job includes results storage. Shown once storing has started.

                  • The total runtime of the execution up to now (if the execution is still ongoing).

                  • The TTL (time to live) value for the job results, specified in seconds. The TTL is used to calculate the expiration date for the job’s results.

                  • The total number of vertices processed.

              Examples

              Get the status of all active and past Pregel jobs:

              curl --header 'accept: application/json' --dump - 'http://localhost:8529/_api/control_pregel/history'

              Remove the execution statistics of a past Pregel job

              DELETE /_api/control_pregel/history/{id}
              Removes the persisted execution statistics of a finished Pregel job.
              Path Parameters
              • id: The Pregel job identifier.

                  Responses
                  • HTTP 200 is returned if the Pregel job ID is valid.

                  • An HTTP 404 error is returned if no Pregel job with the specified ID is found or if the ID is invalid.

                  Examples

                  Remove the persisted execution statistics of a finished Pregel job:

                  curl -X DELETE --header 'accept: application/json' --dump - 'http://localhost:8529/_api/control_pregel/history/69633'

                  Remove the execution statistics of all past Pregel jobs

                  DELETE /_api/control_pregel/history
                  Removes the persisted execution statistics of all past Pregel jobs.
                  Responses
                  • HTTP 200 is returned if all persisted execution statistics have been successfully deleted.

                  Examples

                  Remove the persisted execution statistics of all past Pregel jobs:

                  curl -X DELETE --header 'accept: application/json' --dump - 'http://localhost:8529/_api/control_pregel/history'
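The two removal endpoints differ only in whether a job ID is appended to the path, so one helper can cover both. This is a sketch using the Python standard library. The function history_delete_request is hypothetical, and the base URL matches the curl examples.

```python
import urllib.request


def history_delete_request(base_url, job_id=None):
    """Build a DELETE request for one job's statistics, or for all if job_id is None."""
    path = "/_api/control_pregel/history"
    if job_id is not None:
        path += f"/{job_id}"
    return urllib.request.Request(base_url + path, method="DELETE")


one = history_delete_request("http://localhost:8529", "69633")
everything = history_delete_request("http://localhost:8529")
print(one.get_method(), one.full_url)
print(everything.get_method(), everything.full_url)
```

The requests are only constructed here. Pass them to urllib.request.urlopen to send them to a running server.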