Software-as-a-Service (SaaS) and elastic computing capacity were already well established in the cloud while databases were only just getting started there. The reasons included the infrastructure requirements of database technologies, compliance concerns, and data gravity. In recent years, however, Database-as-a-Service (DBaaS) has caught up, and it now includes graph databases.
The use of graph technology – connected to the cloud – is considered one of the most important trends for 2021. According to Gartner, 30 percent of companies worldwide will use graph databases in the next two to three years to quickly access the right data context for decision-making.
The Perfect Mix: Graphs, Data Science & Cloud
Graph analytics gives data scientists insight into the relationships between entities such as organizations, people, and transactions. The aim is to recognize and verify patterns in data that remain undetected by conventional analyses (e.g., queries against relational databases).
Graphs become more powerful when combined with machine learning and graph algorithms, which allow data sources and documents to be searched flexibly and in real time. The areas of application are wide. In medicine, for example, graph data science is used to research new treatment options for diabetes. Manufacturers, in turn, use graph technology in supply chain management (SCM) and product data management (PDM) to identify causes of errors during quality controls. Banks and insurance companies use graphs to fight money laundering, tax evasion, and fraud.
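As an illustration of the kind of pattern a graph view makes easy to find, here is a minimal Python sketch. It assumes a hypothetical list of transfers between made-up account IDs and looks for money that flows in a circle back to its origin, a pattern that would need one self-join per hop in a relational table. The function name, the data, and the hop limit are assumptions for illustration only and do not come from any product.

```python
from collections import defaultdict

def find_cycles(transfers, max_hops=4):
    """Return circular money flows of at most max_hops transfers."""
    # Build an adjacency list: sender -> list of receivers.
    graph = defaultdict(list)
    for sender, receiver in transfers:
        graph[sender].append(receiver)

    cycles = []

    def walk(start, node, path):
        for nxt in graph[node]:
            if nxt == start:
                # The money is back where it started.
                cycles.append(path + [start])
            elif nxt not in path and nxt > start and len(path) < max_hops:
                # "nxt > start" reports each cycle once, from its smallest ID.
                walk(start, nxt, path + [nxt])

    for account in list(graph):
        walk(account, account, [account])
    return cycles

# Hypothetical sample data.
transfers = [
    ("acct-1", "acct-2"),
    ("acct-2", "acct-3"),
    ("acct-3", "acct-1"),
    ("acct-3", "acct-4"),
]
print(find_cycles(transfers))
```

A graph database would express the same search as a path query over stored relationships rather than as application code; the sketch only shows why following connections is the natural way to pose the question.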
Moving to the cloud is the next logical step for graph databases, giving developers and graph applications more freedom and agility. The result is Graph Database-as-a-Service (GDBaaS). This move is urgently needed if companies are to react flexibly and drive innovation even in uncertain times.
Ease Of Use And Flexibility Of DBaaS
There are two reasons for the trend towards Database-as-a-Service (DBaaS): ease of use and flexibility. First, developers can focus on programming applications without managing the infrastructure. Second, the transition to a cloud service shortens the time to value and enables applications to be delivered much faster than on-premises. Further advantages are significantly faster app development and reduced costs. For example, the back end automatically grows with an application's requirements, while only what is actually used is billed.
The trend towards GDBaaS is becoming more and more established: the graph database provider Neo4j reported that around 90 percent of customers were running their graph-based applications in the cloud in 2020. For many companies, it is also the first time they have used graph technology at all. In January, Neo4j introduced the enterprise version of Aura, which gives users a dedicated Virtual Private Cloud (VPC), so that their data and infrastructure are isolated from those of other enterprise and graph users. The Neo4j GDBaaS is available on the Google Cloud Platform and in an early access program on Amazon Web Services (AWS). Aura’s self-monitoring and self-healing architecture is based on two fundamental technologies: Kubernetes and causal clustering.
Kubernetes Container Orchestration
As a standard container orchestration system, Kubernetes offers a reliable and efficient way to manage server, network, and storage infrastructure. Processes are distributed across the existing servers so that workloads and services always have the resources they need to run. Faulty processes are automatically restarted, which increases availability and reduces downtime.
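The self-healing behavior described above can be pictured as a control loop that compares a desired state with what is actually running. The following Python sketch is a deliberately simplified model of that idea, not the Kubernetes API: the function name, the dictionaries, and the action tuples are all assumptions made for illustration.

```python
def reconcile(desired, observed):
    """Compare desired and observed state and return corrective actions.

    desired:  {process_name: wanted_replica_count}
    observed: {process_name: [health flag for each running replica]}
    """
    actions = []
    for name, wanted in desired.items():
        replicas = observed.get(name, [])
        failed = sum(1 for healthy in replicas if not healthy)
        if failed:
            # Faulty replicas are restarted rather than left broken.
            actions.append(("restart", name, failed))
        missing = wanted - len(replicas)
        if missing > 0:
            # Too few replicas are running, so more are started.
            actions.append(("start", name, missing))
    return actions

# Hypothetical state: three replicas wanted, two running, one of them faulty.
print(reconcile({"graph-db": 3}, {"graph-db": [True, False]}))
```

In a real cluster this comparison runs continuously, so a failure is corrected without anyone having to notice it first.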
The container-centric management environment is the basic requirement for efficient IaaS, PaaS, and DBaaS. However, the consistent distribution of systems such as graph databases poses particularly complex requirements. Tens of thousands of databases have to be orchestrated simultaneously, automatically, and reliably. Algorithms alone are not enough here. Neo4j Aura used the modular building blocks of Kubernetes to develop a custom Kubernetes operator. As a result, the GDBaaS can define its own update process (rolling updates) while still using the functionality provided by Kubernetes.
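To make the idea of a rolling update concrete, the Python sketch below upgrades the members of a small cluster one at a time and refuses to proceed if taking a member offline would leave fewer than a majority running. It is a toy model under stated assumptions: the member names and version strings are invented, and it does not reflect how the Aura operator is actually implemented.

```python
def rolling_update(members, new_version):
    """Upgrade one member at a time so that a majority always stays online.

    members: {member_name: version}, updated in place.
    Returns the order in which members were upgraded.
    """
    quorum = len(members) // 2 + 1
    steps = []
    for name in sorted(members):
        # The member being upgraded is offline while it restarts.
        online = len(members) - 1
        if online < quorum:
            raise RuntimeError("update would leave the cluster without a majority")
        members[name] = new_version
        steps.append(name)
    return steps

# Hypothetical three-member cluster.
cluster = {"core-0": "v1", "core-1": "v1", "core-2": "v1"}
print(rolling_update(cluster, "v2"))
print(cluster)
```

The sketch ignores details such as health checks between steps and the handling of the current leader, which a production operator would have to cover.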
High Availability Through Causal Clustering
To make the architecture of the graph database highly available, the Raft consensus algorithm was implemented. Raft was developed as an alternative to the Paxos family of algorithms and provides the basis for distributed transactions. In the graph database, Raft allows a cluster of servers to work together. Developers can thus direct transactions locally or redirect them to other members of the cluster and flexibly control the load distribution in the cluster. This ensures high scalability and high availability. Updates, security patches, and on-demand scaling of the database can be performed without downtime.
With Neo4j Aura Enterprise, each database is operated in a causal cluster of three servers distributed across different data centers. If one of these servers or even an entire data center fails, the data remains protected, and the application continues to run.
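Why three servers are enough to survive the loss of one follows from the majority rule at the heart of Raft: an entry counts as committed once a majority of the cluster has stored it. The short Python sketch below works through that arithmetic. The function names and server labels are assumptions for illustration, and the sketch leaves out everything else Raft does, such as leader election and log matching.

```python
def majority(cluster_size):
    """Smallest number of servers that forms a majority."""
    return cluster_size // 2 + 1

def is_committed(acknowledged_by, cluster_size):
    """True once a majority of servers has stored the entry."""
    return len(set(acknowledged_by)) >= majority(cluster_size)

def tolerated_failures(cluster_size):
    """How many servers can fail while a majority is still reachable."""
    return cluster_size - majority(cluster_size)

# A three-server cluster, as in the setup described above.
print(majority(3))                                # 2
print(is_committed({"server-a", "server-b"}, 3))  # True
print(is_committed({"server-a"}, 3))              # False
print(tolerated_failures(3))                      # 1
```

With three servers in different data centers, any two still form a majority, so writes can continue when a single server or location is lost.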
Infrastructure And Security
However, a Graph Database-as-a-Service cannot rely on its cloud instances alone. The infrastructure in the cloud also includes network components and storage devices. Wherever possible, a GDBaaS should therefore fall back on the existing, scalable services of the public cloud providers. In this way, the databases inherit the security and resilience properties of the underlying system. End-to-end encryption and isolation within dedicated virtual networks in the cloud provider’s systems provide additional data security.
The cloud means more flexibility, security, reliability, and lower costs for most companies. Databases are no exception. Once the requirements for storing, querying, and managing data in the cloud are met, nothing stands in the way of the “perfect mix” of graphs, data science, and cloud.