主要介绍limit...offset...是如何实现的。首先看下数据结构。

1数据结构

PG/GP limit...offset...实现机制_大数据

Limit算子的描述结构式LimitState,它的结构如上图。PlanState是计划节点的描述信息;重要结构成员limitOffset和limitCount分别是limit算子计算offset和limit返回数量的表达式计算步骤,这个结构在ExecInitLimit中进行初始化;offset和count分别保存表达式计算的结果,也就是offset值和limit值;noCount表示是否有Limit,比如仅有offset语句;lstate表示算子执行的状态机;position作为中间使用值,表示最近一次返回tuple的位置;subSlot作为指针,指向子节点中获取的tuple。

2 执行

Limit算子的执行由函数ExecLimit来完成,看下它的代码:

PG/GP limit...offset...实现机制_数据结构_02

主要分为2步:1)通过ExecLimit_guts函数来执行limit算子主要步骤,返回值result即为limit算子返回的tuple值;2)如果result为NULL也就是limit返回了所有tuple,并且设置了没有设置expect_rescan,就需要执行ExecSquelchNode通知下层节点,我已经获取了所有需要的tuple了,不再需要后续的tuple了。

我们主要看ExecLimit_guts函数。

ExecLimit_guts

主要是根据状态机状态来执行对应步骤。

typedef enum
{
LIMIT_INITIAL, /*LIMIT node初始状态 */
LIMIT_RESCAN, /* rescan after recomputing parameters */
LIMIT_EMPTY, /* 没有要返回的行了*/
LIMIT_INWINDOW, /* have returned a row in the window */
LIMIT_SUBPLANEOF, /* at EOF of subplan (within window) */
LIMIT_WINDOWEND, /* stepped off end of window */
LIMIT_WINDOWSTART /* stepped off beginning of window */
} LimitStateCond;
tatic TupleTableSlot *  ExecLimit_guts(PlanState *pstate)
{
switch (node->lstate){
case LIMIT_INITIAL:
//计算limit及offset
recompute_limits(node);
//计算完立即进入LIMIT_RESCAN
case LIMIT_RESCAN://fetch tuple知道到达offset处
if (!ScanDirectionIsForward(direction))
return NULL;
//limit窗口为0,返回null
if (node->count <= 0 && !node->noCount){
node->lstate = LIMIT_EMPTY;
return NULL;
}
for (;;){
slot = ExecProcNode(outerPlan);
if (TupIsNull(slot)){
//子节点没有slot了
node->lstate = LIMIT_EMPTY;
return NULL;
}
node->subSlot = slot;//下层节点获取后存入这里
if (++node->position > node->offset)
break;//满足offset窗口
}
//获取了满足窗口的第一个tuple
node->lstate = LIMIT_INWINDOW;
break;
case LIMIT_EMPTY:
return NULL;
case LIMIT_INWINDOW:
if (ScanDirectionIsForward(direction)){
if (!node->noCount && node->position - node->offset >= node->count){
node->lstate = LIMIT_WINDOWEND;//达到了limit个数
if (!(node->ps.state->es_top_eflags & EXEC_FLAG_BACKWARD))
(void) ExecShutdownNode(outerPlan);
return NULL;
}//fetch一个tuple
slot = ExecProcNode(outerPlan);
if (TupIsNull(slot)){
node->lstate = LIMIT_SUBPLANEOF;
return NULL;
}
node->subSlot = slot;
node->position++;
}else{
if (node->position <= node->offset + 1){
node->lstate = LIMIT_WINDOWSTART;
return NULL;
}
slot = ExecProcNode(outerPlan);
if (TupIsNull(slot))
elog(ERROR, "LIMIT subplan failed to run backwards");
node->subSlot = slot;
node->position--;
}
break;
case LIMIT_SUBPLANEOF:
if (ScanDirectionIsForward(direction))
return NULL;
slot = ExecProcNode(outerPlan);
if (TupIsNull(slot))
elog(ERROR, "LIMIT subplan failed to run backwards");
node->subSlot = slot;
node->lstate = LIMIT_INWINDOW;
/* position does not change 'cause we didn't advance it before */
break;


case LIMIT_WINDOWEND:
if (ScanDirectionIsForward(direction))
return NULL;


slot = node->subSlot;
node->lstate = LIMIT_INWINDOW;
/* position does not change 'cause we didn't advance it before */
break;


case LIMIT_WINDOWSTART:
if (!ScanDirectionIsForward(direction))
return NULL;
slot = node->subSlot;
node->lstate = LIMIT_INWINDOW;
break;
default:
elog(ERROR, "impossible LIMIT state: %d",
(int) node->lstate);
slot = NULL; /* keep compiler quiet */
break;
}
return slot;
}

计算处limit值和offset后,首先进入LIMIT_RESCAN状态,不断从下层节点fetch tuple直到到达offset处,然后进入LIMIT_INWINDOW状态:获取一个记录并返回,直到达到limit个数。