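In a parallel database, a query plan is divided into multiple sub-plans. Different systems use different names for these sub-plans; this post summarizes the terminology based on a survey of their documentation.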

| DB | Sub Plan Name |
| --- | --- |
| OceanBase | DFO |
| Oracle | DFO |
| SQL Server | Branch |
| Greenplum | Slice |
| Flink | Operator Chain |
| TiDB | Task |
| Presto | Stage / Fragment |
| ClickHouse | N/A |

OceanBase

DFO

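A distributed plan is cut at its data redistribution points into logical sub-plans that can be executed in parallel. Each sub-plan is encapsulated in a DFO.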

source

The figure below shows a plan tree containing 6 DFOs.

[Figure: plan tree containing 6 DFOs]
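The cutting rule described above (a sub-plan boundary at every data redistribution point) can be illustrated with a minimal sketch. This is not OceanBase code: `PlanNode`, `is_exchange`, and `split_into_subplans` are hypothetical names introduced only for illustration, and the exchange operator itself is simply treated as the boundary rather than being assigned to either side.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlanNode:
    """Hypothetical plan operator; is_exchange marks a redistribution point."""
    name: str
    children: List["PlanNode"] = field(default_factory=list)
    is_exchange: bool = False


def split_into_subplans(root: PlanNode) -> List[List[str]]:
    """Cut the plan tree at exchange nodes; return operator names per sub-plan."""
    subplans: List[List[str]] = []

    def visit(node: PlanNode, current: List[str]) -> None:
        if node.is_exchange:
            # The exchange is the boundary: everything below each child
            # starts a new sub-plan.
            for child in node.children:
                below: List[str] = []
                subplans.append(below)
                visit(child, below)
            return
        current.append(node.name)
        for child in node.children:
            visit(child, current)

    top: List[str] = []
    subplans.append(top)
    visit(root, top)
    return subplans


# A join whose two inputs are both redistributed before joining.
plan = PlanNode("HashJoin", [
    PlanNode("Exchange", [PlanNode("Scan t1")], is_exchange=True),
    PlanNode("Exchange", [PlanNode("Scan t2")], is_exchange=True),
])
subplans = split_into_subplans(plan)
```

In this example the plan is cut into three sub-plans: the join itself and one scan under each exchange. The same cutting idea applies to the other systems surveyed below, which mainly differ in what they call the resulting pieces.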

Oracle

DFO

A parallel execution plan is carried out as a series of producer/consumer operations. Parallel execution (PX) servers that produce data for subsequent operations are called producers, PX servers that require the output of other operations are called consumers. Each producer or consumer parallel operation is performed by a set of PX servers called PX server sets. The number of PX servers in PX server set is called Degree of Parallelism (DOP). The basic unit of work for a PX server set is called a data flow operation (DFO).

source

SQL Server

branch

If you think of an execution plan as a tree, a branch is an area of the plan that groups one or more operators between Parallelism operators, also called Exchange Iterators.

source

Greenplum

slice

To achieve maximum parallelism during query execution, Greenplum divides the work of the query plan into slices. A slice is a portion of the plan that segments can work on independently. A query plan is sliced wherever a motion operation occurs in the plan, with one slice on each side of the motion.

source

Flink

operator chain

An Operator Chain consists of two or more consecutive Operators without any repartitioning in between. Operators within the same Operator Chain forward records to each other directly without going through serialization or Flink’s network stack.

source

task (instance of operator chain?)

Multiple operations/operators can be chained together using a feature called chaining. A group of one or multiple (chained) operators that Flink considers as a unit of scheduling is called a task.

source
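The relationship between operators, chains, and tasks quoted above can be sketched for a linear pipeline. This is a simplification, not Flink's API: `group_into_tasks` and the `repartition_before` flag are hypothetical, and real Flink applies further chaining conditions (for example, matching parallelism) that are ignored here.

```python
from typing import List, Tuple


def group_into_tasks(pipeline: List[Tuple[str, bool]]) -> List[List[str]]:
    """Group consecutive operators into tasks.

    pipeline holds (operator_name, repartition_before) pairs in dataflow
    order. An operator joins the previous group unless data is
    repartitioned in front of it, in which case it starts a new group.
    """
    tasks: List[List[str]] = []
    for name, repartition_before in pipeline:
        if not tasks or repartition_before:
            tasks.append([name])      # boundary: start a new task
        else:
            tasks[-1].append(name)    # no repartitioning: chain onto the previous operator
    return tasks


# Hypothetical job: the keyed aggregation needs its input repartitioned by key.
pipeline = [
    ("source", False),
    ("map", False),
    ("keyed_aggregate", True),
    ("sink", False),
]
tasks = group_into_tasks(pipeline)
```

Here `source` and `map` end up in one task and `keyed_aggregate` and `sink` in another. A group containing a single operator is still a task, but by the definition quoted above it would not be called an operator chain.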

TiDB

task (TBD)

task is a new version of PhysicalPlanInfo. It stores cost information for a task. A task may be CopTask, RootTask, MPPTaskMeta or a ParallelTask.

source

Presto

stage

When Presto executes a query, it does so by breaking up the execution into a hierarchy of stages. For example, if Presto needs to aggregate data from one billion rows stored in Hive, it does so by creating a root stage to aggregate the output of several other stages all of which are designed to implement different sections of a distributed query plan.

The hierarchy of stages that comprises a query resembles a tree. Every query has a root stage which is responsible for aggregating the output from other stages. Stages are what the coordinator uses to model a distributed query plan, but stages themselves don’t run on Presto workers.

source

fragment

Each plan fragment is executed by a single or multiple Presto nodes. Fragment separation represents the data exchange between Presto nodes. Fragment type specifies how the fragment is executed by Presto nodes and how the data is distributed between fragments.

source

ClickHouse

Not Available

There is no global query plan for distributed query execution. Each node has its local query plan for its part of the job. We only have simple one-pass distributed query execution: we send queries for remote nodes and then merge the results. But this is not feasible for complicated queries with high cardinality GROUP BYs or with a large amount of temporary data for JOIN. In such cases, we need to “reshuffle” data between servers, which requires additional coordination. ClickHouse does not support that kind of query execution, and we need to work on it.

source
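The "send the query to remote nodes, then merge the results" model described in the quote can be sketched as follows. This is an in-memory illustration only, not ClickHouse code: the shards are plain Python lists, and `local_group_count` and `one_pass_distributed_count` are hypothetical names.

```python
from collections import Counter
from typing import Dict, Iterable, List


def local_group_count(rows: Iterable[str]) -> Counter:
    """What each node does on its own data: a local GROUP BY key, count(*)."""
    return Counter(rows)


def one_pass_distributed_count(shards: List[List[str]]) -> Dict[str, int]:
    """Initiator side: collect every shard's partial result and merge them."""
    partials = [local_group_count(rows) for rows in shards]
    merged: Counter = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds counts key by key
    return dict(merged)


# Two hypothetical shards holding the grouping key of each row.
shards = [["a", "b", "a"], ["b", "c"]]
result = one_pass_distributed_count(shards)
```

Note that the initiator has to hold the merged groups from every shard, with no reshuffle step to spread that work across servers. That is consistent with the limitation the quote describes for high-cardinality GROUP BYs.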