Alibaba released few days ago have made the decision to release the source code of your distributed database management system "PolarDB" which is based on PostgreSQL, the code is open under the Apache 2.0 license.
For those unfamiliar with PolarDB, you should know that this is a relational database based on the cloud developed by Alibaba that extends PostgreSQL capabilities for distributed data storage with integrity and support for ACID transactions in the context of the entire global database, distributed across different cluster nodes.
PolarDB too supports distributed SQL query processing, providing fault tolerance and redundant data storage to replenish information after one or more nodes fail. If you need to expand your storage, just add new nodes to the cluster.
PolarDB consists of two parts: extensions and a set of patches for PostgreSQL. The patches extend the capabilities of the PostgreSQL core and the extensions include separately implemented components of PostgreSQL, such as a distributed transaction management mechanism, global services, a distributed SQL query processor, additional metadata, tools to manage a cluster, implement a cluster, and simplify the migration of existing systems to it.
The patches add a distributed version of the multiversion concurrency control mechanism (MVCC) to the PostgreSQL core for different isolation levels. Most of the PolarDB functionality has been moved to extensions, which reduces dependency on PostgreSQL and simplifies the upgrade and deployment of PolarDB-based solutions (simplifies the transition to new versions of PostgreSQL and maintains full PostgreSQL compatibility).
There are three basic components in a cluster: database nodes (DN), cluster manager (CM) and transaction management service (TM), additionally, a proxy load balancer may be involved. Each of the components is a separate process and can run on different physical servers. The database nodes serve client SQL queries and at the same time act as coordinators of the execution of distributed queries with the participation of other database nodes.
The cluster administrator monitors the status of each database node, stores cluster configuration and provides tools for managing, backing up, load balancing, updating, starting, and stopping nodes. The transaction management service is responsible for maintaining overall integrity throughout the cluster.
PolarDB is based on the Shared-nothing distributed computing architecture according to which data is distributed during storage to different nodes, without using a common storage for all nodes and each node is responsible for the piece of data linked to it and executes related query data.
Each table is fragmented using primary key hashes. If the request covers data located on different nodes, the distributed transaction execution engine and the transaction coordinator are connected to ensure atomicity, consistency, isolation, and reliability (ACID).
To ensure fault tolerance, each segment is replicated across at least three nodes. To save resources, the full data includes only two replicas and one is limited to storing the write-behind log (WAL). One of the two full replica nodes is chosen as the leader and participates in request processing, while the second node acts as a spare for the data segment under consideration, and the third participates in the selection of the primary node and can be used to restore information in case of failure of two nodes with full replicas.
Data replication between cluster nodes is organized using the Paxos algorithm, which ensures consistent consensus determination in a network with potentially untrusted nodes. It should be noted that the full functionality of the PolarDB DBMS is planned to be released in three versions.
Finally, if you are interested in knowing more about it, you can consult the details in the following link.