Chapter 10: Database and Distributed Computing

Key Features and Design Principles

For high-performance distributed computing, DolphinDB implements a cluster-based design featuring a distributed file system, distributed databases, distributed SQL queries, parallel computing, remote function calls, and a generic distributed computing framework.

  • Distributed file system (DFS)

DolphinDB clusters work in two modes: standalone mode and DFS mode. In standalone mode, the system stores data directly on the operating system's file system and uses static routing to locate partitions. In DFS mode, DolphinDB creates distributed databases and conducts distributed computing on top of the distributed file system. Compared to the file system used in standalone mode, DFS adds support for transactions, fault tolerance, and load balancing.

  • Data partitions and distributed database

DolphinDB provides various partitioning schemes, including sequential, value, range and list partitions. It also supports a composite partition with up to 3 partitioning columns.
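To make the partitioning schemes concrete, the following is a minimal Python sketch of how a key might be routed to a partition under value, range, and list schemes, and how schemes combine into a composite partition. It is a conceptual illustration only, not DolphinDB code: the function names (value_partition, range_partition, list_partition, composite_partition) and the half-open range convention are assumptions made for this example.

```python
from bisect import bisect_right

# Hypothetical helpers for illustration; none of these are DolphinDB functions.

def value_partition(key):
    """Value scheme: every distinct key value is its own partition."""
    return key

def range_partition(key, boundaries):
    """Range scheme: sorted boundaries [b0, ..., bn] define n half-open
    ranges [b_i, b_(i+1)); return the index of the range holding key."""
    if not boundaries[0] <= key < boundaries[-1]:
        raise ValueError(f"{key!r} is outside the partitioned range")
    return bisect_right(boundaries, key) - 1

def list_partition(key, groups):
    """List scheme: each partition is an explicit list of key values."""
    for index, group in enumerate(groups):
        if key in group:
            return index
    raise ValueError(f"{key!r} does not belong to any list partition")

def composite_partition(row, schemes):
    """Composite scheme: apply one scheme per partitioning column
    (at most 3 columns) and combine the results into a tuple."""
    if len(schemes) > 3:
        raise ValueError("a composite partition uses at most 3 columns")
    return tuple(route(row[column]) for column, route in schemes)

# Route one row by month (range) and by ticker symbol (list).
row = {"month": 7, "sym": "MSFT"}
schemes = [
    ("month", lambda m: range_partition(m, [1, 4, 7, 10, 13])),
    ("sym", lambda s: list_partition(s, [["IBM", "MSFT"], ["GOOG"]])),
]
partition_id = composite_partition(row, schemes)
```

Here partition_id identifies the third quarter range combined with the first symbol group, so rows sharing both properties land in the same partition.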

  • Distributed SQL queries

Distributed SQL queries in DolphinDB use the same syntax as regular SQL queries. When a SQL query is executed in a distributed environment, the system determines on the fly whether it needs to fetch data from remote nodes.
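The local-versus-remote decision can be pictured with a small Python sketch. This is a hypothetical illustration, not DolphinDB's actual query planner: plan_query, the partition names, and the node names are all invented for the example, which assumes each partition lives on exactly one node.

```python
def plan_query(needed_partitions, partition_locations, local_node):
    """Split the partitions a query touches into those stored on the
    local node and those that must be requested from remote nodes."""
    local = []
    remote = {}  # remote node name -> partitions to request from it
    for partition in needed_partitions:
        node = partition_locations[partition]
        if node == local_node:
            local.append(partition)
        else:
            remote.setdefault(node, []).append(partition)
    return local, remote

# Hypothetical cluster layout: three partitions spread over two nodes.
locations = {"2024.01": "node1", "2024.02": "node2", "2024.03": "node1"}
local, remote = plan_query(["2024.01", "2024.02"], locations, "node1")
```

In this sketch the query text never mentions nodes; the split between local reads and remote requests is derived entirely from partition metadata, which is why the same SQL syntax can serve both cases.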

  • Parallel computing

DolphinDB supports parallel computing through multi-threading. Unlike process-based parallelism, threads share memory, so data does not have to be copied between workers, which improves computing performance.
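The memory-sharing model can be sketched in Python as follows. This illustrates the idea only and says nothing about DolphinDB's internals: parallel_sum is a hypothetical helper, and in CPython the global interpreter lock limits the speedup of pure-Python loops, so the example shows how threads read one shared data structure without copying it rather than demonstrating a performance gain.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_sum(values, workers=4):
    """Sum a list by giving each thread an index range of the same
    shared list; no thread receives its own copy of the data."""
    chunk = max(1, -(-len(values) // workers))  # ceiling division
    bounds = [(start, min(start + chunk, len(values)))
              for start in range(0, len(values), chunk)]

    def partial_sum(bound):
        start, stop = bound
        total = 0
        for i in range(start, stop):
            total += values[i]  # direct read from shared memory
        return total

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum, bounds))

total = parallel_sum(list(range(101)))
```

A process-based version of the same computation would have to serialize each slice and send it to a worker process, which is the overhead that shared memory avoids.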

  • Remote function call

Scripts can be sent directly to remote nodes for execution.

  • Distributed computing framework

The distributed computing framework in DolphinDB supports MapReduce and iterative computation. DolphinDB provides built-in functions for both, and users can implement their own MapReduce or iterative functions with DolphinDB's programming interfaces.
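As a rough illustration of the MapReduce pattern, the Python sketch below computes a mean over partitioned data. It is not DolphinDB code and does not reproduce the signatures of DolphinDB's built-in functions: map_reduce and its parameters are hypothetical names chosen for this example, and the map step runs sequentially here for simplicity.

```python
from functools import reduce

def map_reduce(partitions, map_func, reduce_func, final_func=None):
    """Apply map_func to each partition, merge the partial results
    pairwise with reduce_func, then optionally post-process the
    merged result with final_func."""
    mapped = [map_func(partition) for partition in partitions]
    combined = reduce(reduce_func, mapped)
    return final_func(combined) if final_func else combined

# Mean of a column split across three partitions.
partitions = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
mean = map_reduce(
    partitions,
    map_func=lambda p: (sum(p), len(p)),                   # per-partition (sum, count)
    reduce_func=lambda a, b: (a[0] + b[0], a[1] + b[1]),   # merge partial results
    final_func=lambda t: t[0] / t[1],                      # total sum / total count
)
```

Each partition contributes only a small (sum, count) pair, so in a distributed setting the partial results, not the raw rows, would travel between nodes.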