Data Science

ArangoDB lets you apply analytics and machine learning to graph data at scale

ArangoDB’s Graph Analytics and GraphML capabilities provide various solutions in data science and data analytics. Multiple data science personas within the engineering space can make use of ArangoDB’s set of tools and technologies that enable analytics and machine learning on graph data.

ArangoDB, as the foundation for GraphML, comes with the following key features:

  • Scalable: designed to support true scalability with high performance for enterprise use cases.
  • Simple Ingestion: easy integration in existing data infrastructure with connectors to all leading data processing and data ecosystems.
  • Source-Available: extensibility and community.
  • NLP Support: built-in text processing, search, and similarity ranking.

ArangoDB Machine Learning Architecture

Graph Analytics vs. GraphML

This section classifies the complexity of the queries we can answer - like running a simple query that shows what is the path that goes from one node to another, or more complex tasks like node classification, link prediction, and graph classification.

Graph Query

When running a query with AQL on a graph, the query goes from a vertex to an edge, and then the edge indicates what the next connected vertex will be.

Graph queries can answer questions like Who can introduce me to person X?

Graph Query

Graph Analytics

Graph analytics or graph algorithms is what you run on a graph if you want to know aggregate information about your graph, while analyzing the entire graph.

Graph analytics can answer questions like Who are the most connected persons?

Graph Analytics

GraphML

When applying machine learning on a graph, you can predict connections, get better product recommendations, and also classify vertices, edges, and graphs.

GraphML can answer questions like:

  • Is there a connection between person X and person Y?
  • Will a customer churn?
  • Is this particular transaction Anomalous?

Graph ML

Use Cases

This section contains an overview of different use cases where Graph Analytics and GraphML can be applied.

GraphML

GraphML capabilities of using more data outperform conventional deep learning methods and solve high-computational complexity graph problems, such as:

  • Drug discovery, repurposing, and predicting adverse effects.
  • Personalized product/service recommendation.
  • Supply chain and logistics.

With GraphML, you can also predict relationships and structures, such as:

  • Predict molecules for treating diseases (precision medicine).
  • Predict fraudulent behavior, credit risk, purchase of product or services.
  • Predict relationships among customers, accounts.

ArangoDB uses well-known GraphML frameworks like Deep Graph Library  and PyTorch Geometric  and connects to these external machine learning libraries. When coupled to ArangoDB, you are essentially integrating them with your graph dataset.

Example: ArangoFlix

ArangoFlix is a complete movie recommendation application that predicts missing links between a user and the movies they have not watched yet.

This interactive tutorial  demonstrates how to integrate ArangoDB with PyTorch Geometric to build recommendation systems using Graph Neural Networks (GNNs).

The full ArangoFlix demo website is accessible from the ArangoGraph Insights Platform, the managed cloud for ArangoDB. You can open the demo website that connects to your running database from the Examples tab of your deployment.

You can try out the ArangoGraph Insights Platform free of charge for 14 days. Sign up at dashboard.arangodb.cloud .

The ArangoFlix demo uses five different recommendation methods:

  • Content-Based using AQL
  • Collaborative Filtering using AQL
  • Content-Based using ML
  • Matrix Factorization
  • Graph Neural Networks

ArangoFlix demo

The ArangoFlix website not only offers an example of how the user recommendations might look like in real life, but it also provides information on a recommendation method, an AQL query, a custom graph visualization for each movie, and more.

Sample datasets

If you want to try out ArangoDB’s data science features, you may use the arango_datasets Python package to load sample datasets into a deployment.