PG Postmaster daemon: keeping the first-class background processes alive


mb62de8abf75c00, 2022-07-30 00:02:41


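© Copyright belongs to the author. This is an original work by 51CTO blog author mb62de8abf75c00; contact the author for permission before reposting, otherwise legal action may be taken.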

pmdie/reaper/sigusr1_handler --> PostmasterStateMachine --> StartupDataBase
ServerLoop(postmaster.c)/reaper/sigusr1_handler --> StartBackgroundWriter
ServerLoop(postmaster.c)/reaper/PostmasterStateMachine/sigusr1_handler --> StartCheckpointer
ServerLoop(postmaster.c)/reaper --> StartWalWriter
ServerLoop(postmaster.c)/sigusr1_handler --> MaybeStartWalReceiver --> StartWalReceiver

Keeping the startup process (StartupDataBase) alive

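Because the startup process is the first of the first-class background processes to be launched, its keep-alive mechanism is described separately from the others. PostmasterStateMachine, which advances the postmaster's state machine and takes the appropriate actions, calls StartupDataBase. The branch below handles recovery from a crash: once every child other than the syslogger has exited, the postmaster resets shared memory and launches the startup process again. PM_NO_CHILDREN means that all important children have exited. This is the only place where the startup process has to be relaunched.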

if (FatalError && pmState == PM_NO_CHILDREN) {
    ereport(LOG, (errmsg("all server processes terminated; reinitializing")));    
    ResetBackgroundWorkerCrashTimes(); /* allow background workers to immediately restart */
    shmem_exit(1);    
    LocalProcessControlFile(true); /* re-read control file into local memory */
    reset_shared(PostPortNumber);
    StartupPID = StartupDataBase();
    Assert(StartupPID != 0);
    StartupStatus = STARTUP_RUNNING;
    pmState = PM_STARTUP;    
    AbortStartTime = 0; /* crash recovery started, reset SIGKILL flag */
  }
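The branch above can be read as a single state transition. The following is a minimal sketch of that reading, not PostgreSQL code: `ToyPostmaster`, `toy_start_startup()` and `toy_try_reinitialize()` are hypothetical names made up for illustration, and only the field and state names mirror the excerpt.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: every Toy/toy_ name is hypothetical, not PostgreSQL's. */
typedef enum { PM_STARTUP, PM_RUN, PM_WAIT_BACKENDS, PM_NO_CHILDREN } ToyPMState;

typedef struct {
    ToyPMState pmState;
    bool       FatalError;
    int        StartupPID;   /* 0 means "no startup process" */
    int        next_pid;     /* stand-in for the PID that fork() would return */
} ToyPostmaster;

/* Stand-in for StartupDataBase(): pretend to fork and return the child's PID. */
static int toy_start_startup(ToyPostmaster *pm)
{
    return pm->next_pid++;
}

/* Returns true if the crash-reinitialization branch was taken. */
static bool toy_try_reinitialize(ToyPostmaster *pm)
{
    if (!(pm->FatalError && pm->pmState == PM_NO_CHILDREN))
        return false;
    /* The real branch resets shared memory before relaunching; omitted here. */
    pm->StartupPID = toy_start_startup(pm);
    assert(pm->StartupPID != 0);
    pm->pmState = PM_STARTUP;
    return true;   /* FatalError is deliberately left set, as in the excerpt */
}
```

Calling `toy_try_reinitialize()` on a model with `FatalError` set and `pmState == PM_NO_CHILDREN` moves it to `PM_STARTUP` with a fresh `StartupPID`; in any other situation nothing changes.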

Keeping the other first-class background processes alive

Keep-alive in ServerLoop

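The first-class background processes kept alive inside ServerLoop are the background writer, the checkpointer, the WAL writer and the WAL receiver, started through StartBackgroundWriter, StartCheckpointer, StartWalWriter and StartWalReceiver respectively. Within ServerLoop the keep-alive checks run right after new client connections have been accepted, as outlined below.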

static int ServerLoop(void){
 | -- nSockets = initMasks(&readmask);
 | -- for (;;){
       | -- if (pmState == PM_WAIT_DEAD_END)
       | -- else
             | -- selres = select(nSockets, &rmask, NULL, NULL, &timeout); // wait for new connections
       | -- if (selres > 0)  
             | -- for (i = 0; i < MAXLISTEN; i++)  
                  | -- port = ConnCreate(ListenSocket[i]);
       | -- keep-alive checks of ServerLoop

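If no background writer process is running and the postmaster is not in a state that prevents it, ServerLoop starts one with StartBackgroundWriter. It does not matter if the start fails, because a later pass through the loop tries again. The checkpointer is handled the same way with StartCheckpointer.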

/* If no background writer process is running, and we are not in a state that prevents it, start one.  It doesn't matter if this fails, we'll just try again later.  Likewise for the checkpointer. */
    if (pmState == PM_RUN || pmState == PM_RECOVERY || pmState == PM_HOT_STANDBY) {
      if (CheckpointerPID == 0) CheckpointerPID = StartCheckpointer();
      if (BgWriterPID == 0) BgWriterPID = StartBackgroundWriter();
    }

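Likewise, if the walwriter process has been lost, ServerLoop tries to start a new one. This is needed only in normal operation (PM_RUN), because in any other state no new WAL can be written.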

/* Likewise, if we have lost the walwriter process, try to start a new one.  But this is needed only in normal operation (else we cannot be writing any new WAL). */
    if (WalWriterPID == 0 && pmState == PM_RUN) WalWriterPID = StartWalWriter();

If WalReceiverRequested is set, ServerLoop tries to start the WAL receiver process; see the earlier post in this blog for details.

/* If we need to start a WAL receiver, try to do that now */
    if (WalReceiverRequested) MaybeStartWalReceiver();
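The checks above share one pattern: a PID of 0 means "not running", and a child is restarted only when the current state permits it. A minimal sketch of one pass, under the assumption that a failed start simply leaves the PID at 0; `ToyPostmaster`, `toy_start_child()` and `toy_keepalive_pass()` are hypothetical names, and real forking is replaced by a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: every Toy/toy_ name is hypothetical, not PostgreSQL's. */
typedef enum { PM_STARTUP, PM_RECOVERY, PM_HOT_STANDBY, PM_RUN, PM_SHUTDOWN } ToyPMState;

typedef struct {
    ToyPMState pmState;
    int CheckpointerPID;   /* 0 means "not running" */
    int BgWriterPID;
    int WalWriterPID;
    int next_pid;          /* stand-in for fork(); 0 simulates a failed fork */
} ToyPostmaster;

/* Stand-in for the Start*() functions: a new PID, or 0 on failure. */
static int toy_start_child(ToyPostmaster *pm)
{
    return pm->next_pid != 0 ? pm->next_pid++ : 0;
}

/* One pass of the keep-alive checks; returns how many children were started. */
static int toy_keepalive_pass(ToyPostmaster *pm)
{
    int  started = 0;
    bool writers_allowed = pm->pmState == PM_RUN ||
                           pm->pmState == PM_RECOVERY ||
                           pm->pmState == PM_HOT_STANDBY;

    if (writers_allowed) {
        if (pm->CheckpointerPID == 0) {
            pm->CheckpointerPID = toy_start_child(pm);
            started += (pm->CheckpointerPID != 0);
        }
        if (pm->BgWriterPID == 0) {
            pm->BgWriterPID = toy_start_child(pm);
            started += (pm->BgWriterPID != 0);
        }
    }
    /* The walwriter is only needed in normal operation. */
    if (pm->WalWriterPID == 0 && pm->pmState == PM_RUN) {
        pm->WalWriterPID = toy_start_child(pm);
        started += (pm->WalWriterPID != 0);
    }
    assert(started >= 0 && started <= 3);
    return started;
}
```

Because a failed start leaves the PID at 0, the next pass retries without any extra bookkeeping, which is why the source comment can say that failure does not matter.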

Keep-alive in reaper

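reaper is installed as the postmaster's SIGCHLD handler and deals with the termination of child processes. It calls waitpid to collect each exited child together with its exit status, exitstatus. The excerpt below shows how the exit status of the startup process is handled; once that is done, reaper starts some of the first-class background processes.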

while ((pid = waitpid(-1, &exitstatus, WNOHANG)) > 0) {    
    if (pid == StartupPID) { /* Check if this child was a startup process. */
      StartupPID = 0;

      if (Shutdown > NoShutdown && (EXIT_STATUS_0(exitstatus) || EXIT_STATUS_1(exitstatus))) {
        StartupStatus = STARTUP_NOT_RUNNING;
        pmState = PM_WAIT_BACKENDS;    
        continue; /* PostmasterStateMachine logic does the rest */
      }
      if (EXIT_STATUS_3(exitstatus)) {
        ereport(LOG,(errmsg("shutdown at recovery target")));
        StartupStatus = STARTUP_NOT_RUNNING;
        Shutdown = Max(Shutdown, SmartShutdown);
        TerminateChildren(SIGTERM);
        pmState = PM_WAIT_BACKENDS;    
        continue; /* PostmasterStateMachine logic does the rest */
      }
      if (pmState == PM_STARTUP && !EXIT_STATUS_0(exitstatus)) {
        LogChildExit(LOG, _("startup process"), pid, exitstatus);
        ereport(LOG, (errmsg("aborting startup due to startup process failure")));
        ExitPostmaster(1);
      }
      if (!EXIT_STATUS_0(exitstatus)) {
        if (StartupStatus == STARTUP_SIGNALED) StartupStatus = STARTUP_NOT_RUNNING;
        else StartupStatus = STARTUP_CRASHED;
        HandleChildCrash(pid, exitstatus, _("startup process"));
        continue;
      }

      /* Startup succeeded, commence normal operations */
      StartupStatus = STARTUP_NOT_RUNNING;
      FatalError = false;
      Assert(AbortStartTime == 0);
      ReachedNormalRunning = true;
      pmState = PM_RUN;
      connsAllowed = ALLOW_ALL_CONNS;

      /* Crank up the background tasks, if we didn't do that already
       * when we entered consistent recovery state.  It doesn't matter
       * if this fails, we'll just try again later. */
      if (CheckpointerPID == 0) CheckpointerPID = StartCheckpointer();
      if (BgWriterPID == 0) BgWriterPID = StartBackgroundWriter();
      if (WalWriterPID == 0) WalWriterPID = StartWalWriter();

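Once the startup process has finished its work successfully, reaper tries to start the checkpointer, the background writer and the WAL writer through StartCheckpointer, StartBackgroundWriter and StartWalWriter, each only if it is not already running.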
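The order of the checks on the startup process's exit status matters, so it may help to see them collapsed into one decision function. This is a simplified sketch, not the real logic: `toy_on_startup_exit()` and the `ToyOutcome` values are hypothetical, `exit_code` is a plain integer standing in for the wait status (any value other than 0, 1 or 3 also covers death by signal), and `shutdown_requested` stands in for `Shutdown > NoShutdown`.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: every Toy/toy_/TOY_ name is hypothetical, not PostgreSQL's. */
typedef enum { PM_STARTUP, PM_RECOVERY, PM_HOT_STANDBY, PM_RUN } ToyPMState;

typedef enum {
    TOY_WAIT_BACKENDS,    /* shutdown in progress, or recovery target reached */
    TOY_EXIT_POSTMASTER,  /* startup failed while still in PM_STARTUP: give up */
    TOY_HANDLE_CRASH,     /* any other failure: treat as a child crash */
    TOY_RUN               /* success: enter PM_RUN and start the writers */
} ToyOutcome;

/* Applies the checks in the same order as the excerpt. */
static ToyOutcome toy_on_startup_exit(ToyPMState pmState,
                                      bool shutdown_requested,
                                      int exit_code)
{
    assert(exit_code >= 0);
    if (shutdown_requested && (exit_code == 0 || exit_code == 1))
        return TOY_WAIT_BACKENDS;
    if (exit_code == 3)
        return TOY_WAIT_BACKENDS;
    if (pmState == PM_STARTUP && exit_code != 0)
        return TOY_EXIT_POSTMASTER;
    if (exit_code != 0)
        return TOY_HANDLE_CRASH;
    return TOY_RUN;
}
```

Only the last outcome reaches the code that starts the checkpointer, background writer and WAL writer; every earlier branch leaves that work to PostmasterStateMachine or terminates the postmaster.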

Keep-alive in sigusr1_handler

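The postmaster starts the checkpointer and the background writer (StartCheckpointer, StartBackgroundWriter) when it receives the PMSIGNAL_RECOVERY_STARTED signal from the startup process, but only while pmState is PM_STARTUP and Shutdown is NoShutdown. The RECOVERY_STARTED and BEGIN_HOT_STANDBY signals are ignored in unexpected states: if the startup process starts up quickly, completes recovery and exits, the postmaster may process the death of the startup process first, and in that case it must not go back to the recovery state.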

/* RECOVERY_STARTED and BEGIN_HOT_STANDBY signals are ignored in
   * unexpected states. If the startup process quickly starts up, completes
   * recovery, exits, we might process the death of the startup process
   * first. We don't want to go back to recovery in that case. */
  if (CheckPostmasterSignal(PMSIGNAL_RECOVERY_STARTED) && pmState == PM_STARTUP && Shutdown == NoShutdown) {    
    FatalError = false; /* WAL redo has started. We're out of reinitialization. */
    /* Crank up the background tasks.  It doesn't matter if this fails, we'll just try again later. */
    CheckpointerPID = StartCheckpointer();
    BgWriterPID = StartBackgroundWriter();
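The guard on pmState is what makes a late RECOVERY_STARTED signal harmless. A minimal sketch of that behaviour, with hypothetical names throughout (`ToyPostmaster`, `toy_check_signal()`, `toy_sigusr1()`): the pending flag is checked and cleared in one step, in the spirit of CheckPostmasterSignal, and the move to PM_RECOVERY is taken from upstream postmaster.c since the excerpt stops before it.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: every Toy/toy_/TOY_ name is hypothetical, not PostgreSQL's. */
typedef enum { PM_STARTUP, PM_RECOVERY, PM_RUN } ToyPMState;
typedef enum { TOY_RECOVERY_STARTED, TOY_NUM_SIGNALS } ToySignal;

typedef struct {
    ToyPMState pmState;
    bool shutdown_requested;        /* stands in for Shutdown != NoShutdown */
    bool FatalError;
    bool pending[TOY_NUM_SIGNALS];  /* set by a child before it raises SIGUSR1 */
    bool writers_started;           /* checkpointer and bgwriter in the real code */
} ToyPostmaster;

/* Check-and-clear: reports whether the reason was signaled and resets it. */
static bool toy_check_signal(ToyPostmaster *pm, ToySignal sig)
{
    bool was_set;

    assert(sig >= 0 && sig < TOY_NUM_SIGNALS);
    was_set = pm->pending[sig];
    pm->pending[sig] = false;
    return was_set;
}

/* The part of the SIGUSR1 handler that reacts to RECOVERY_STARTED. */
static void toy_sigusr1(ToyPostmaster *pm)
{
    if (toy_check_signal(pm, TOY_RECOVERY_STARTED) &&
        pm->pmState == PM_STARTUP && !pm->shutdown_requested) {
        pm->FatalError = false;       /* WAL redo has started */
        pm->writers_started = true;
        pm->pmState = PM_RECOVERY;    /* not shown in the excerpt above */
    }
}
```

If the model is already in PM_RUN when the flag is seen, the flag is still consumed but nothing else happens, which is the "ignored in unexpected states" case.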

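When the startup process sends PMSIGNAL_START_WALRECEIVER, the postmaster sets WalReceiverRequested to true and tries to start the WAL receiver process right away; if that is not possible, the request is remembered for later.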

if (CheckPostmasterSignal(PMSIGNAL_START_WALRECEIVER)) {
    /* Startup Process wants us to start the walreceiver process. */
    /* Start immediately if possible, else remember request for later. */
    WalReceiverRequested = true;
    MaybeStartWalReceiver();
  }
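"Start immediately if possible, else remember request for later" ties this handler to the WalReceiverRequested check in ServerLoop. The sketch below illustrates that hand-off under two assumptions that the excerpts do not show: the request flag is cleared only after a successful start, and a start is attempted only in PM_STARTUP, PM_RECOVERY or PM_HOT_STANDBY. All `Toy`/`toy_` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: every Toy/toy_ name is hypothetical, not PostgreSQL's. */
typedef enum { PM_STARTUP, PM_RECOVERY, PM_HOT_STANDBY, PM_RUN } ToyPMState;

typedef struct {
    ToyPMState pmState;
    bool WalReceiverRequested;
    int  WalReceiverPID;   /* 0 means "not running" */
    int  next_pid;         /* stand-in for fork(); 0 simulates a failed fork */
} ToyPostmaster;

/* Assumed behaviour of MaybeStartWalReceiver(): try once, clear the request on success. */
static void toy_maybe_start_walreceiver(ToyPostmaster *pm)
{
    bool state_ok = pm->pmState == PM_STARTUP ||
                    pm->pmState == PM_RECOVERY ||
                    pm->pmState == PM_HOT_STANDBY;

    if (pm->WalReceiverPID == 0 && state_ok) {
        pm->WalReceiverPID = pm->next_pid;   /* pretend to fork */
        if (pm->WalReceiverPID != 0)
            pm->WalReceiverRequested = false;
    }
}

/* SIGUSR1 path: remember the request, then try immediately. */
static void toy_on_start_walreceiver(ToyPostmaster *pm)
{
    pm->WalReceiverRequested = true;
    toy_maybe_start_walreceiver(pm);
}

/* ServerLoop path: retry for as long as a request is outstanding. */
static void toy_serverloop_pass(ToyPostmaster *pm)
{
    if (pm->WalReceiverRequested)
        toy_maybe_start_walreceiver(pm);
    assert(!(pm->WalReceiverRequested && pm->WalReceiverPID != 0 &&
             pm->WalReceiverPID == pm->next_pid && pm->pmState != PM_RUN));
}
```

With this split the signal handler never has to block or retry: a failed attempt just leaves the flag set, and the main loop picks it up on a later pass.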

Keep-alive in PostmasterStateMachine

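When the state machine is in a state that implies waiting for backends to exit, it checks whether they have all gone. The PM_WAIT_BACKENDS state ends when there are no regular backends (including autovacuum workers), no background workers (including unconnected ones), and no walwriter, autovacuum launcher or bgwriter. During crash recovery or an immediate shutdown the checkpointer is expected to exit as well; otherwise it is not. The archiver, stats and syslogger processes are disregarded because they are not connected to shared memory, and dead_end children are disregarded here too. Walsenders are also disregarded: like the archiver, they are terminated later, after the checkpoint record has been written.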

/* If we are in a state-machine state that implies waiting for backends to exit, see if they're all gone, and change state if so. */
  if (pmState == PM_WAIT_BACKENDS) {
    /* PM_WAIT_BACKENDS state ends when we have no regular backends
     * (including autovac workers), no bgworkers (including unconnected
     * ones), and no walwriter, autovac launcher or bgwriter.  If we are
     * doing crash recovery or an immediate shutdown then we expect the
     * checkpointer to exit as well, otherwise not. The archiver, stats,
     * and syslogger processes are disregarded since they are not
     * connected to shared memory; we also disregard dead_end children
     * here. Walsenders are also disregarded, they will be terminated
     * later after writing the checkpoint record, like the archiver
     * process. */
    if (CountChildren(BACKEND_TYPE_ALL - BACKEND_TYPE_WALSND) == 0 && StartupPID == 0 && WalReceiverPID == 0 &&
      BgWriterPID == 0 && (CheckpointerPID == 0 || (!FatalError && Shutdown < ImmediateShutdown)) && WalWriterPID == 0 && AutoVacPID == 0) {

If the shutdown mode is Immediate or a fatal error has occurred, pmState moves to PM_WAIT_DEAD_END.

if (Shutdown >= ImmediateShutdown || FatalError) {
        /* Start waiting for dead_end children to die.  This state change causes ServerLoop to stop creating new ones */
        pmState = PM_WAIT_DEAD_END;
        /* We already SIGQUIT'd the archiver and stats processes, if any, when we started immediate shutdown or entered FatalError state. */
      } else {

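Otherwise this is a normal shutdown: all the regular children are gone, and it is time to tell the checkpointer to write the shutdown checkpoint XLOG record. If the checkpointer is no longer running, StartCheckpointer is called to start it. If a checkpointer is then running, the postmaster sends it SIGUSR2 and sets pmState to PM_SHUTDOWN. If the checkpointer still could not be forked, the postmaster simply shuts down, and any required cleanup happens at the next restart. In that case FatalError is set so that an "abnormal shutdown" message is logged on exit, pmState is set to PM_WAIT_DEAD_END, and the walsenders, archiver and stats collector are sent SIGQUIT.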

/* If we get here, we are proceeding with normal shutdown. All the regular children are gone, and it's time to tell the checkpointer to do a shutdown checkpoint. */        
        if (CheckpointerPID == 0) CheckpointerPID = StartCheckpointer(); /* Start the checkpointer if not running */    
        if (CheckpointerPID != 0) /* And tell it to shut down */
        {
          signal_child(CheckpointerPID, SIGUSR2);
          pmState = PM_SHUTDOWN;
        }else {
          /* If we failed to fork a checkpointer, just shut down. Any required cleanup will happen at next restart. We set FatalError so that an "abnormal shutdown" message gets logged when we exit. */
          FatalError = true;
          pmState = PM_WAIT_DEAD_END;      
          SignalChildren(SIGQUIT); /* Kill the walsenders, archiver and stats collector too */
          if (PgArchPID != 0) signal_child(PgArchPID, SIGQUIT);
          if (PgStatPID != 0) signal_child(PgStatPID, SIGQUIT);
        }
      }
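The exit from PM_WAIT_BACKENDS has three possible results, and the checkpointer is the only process that may be started on the way out. The following sketch condenses that decision; it is a simplification with hypothetical names (`ToyPostmaster`, `toy_leave_wait_backends()`, the `TOY_` constants), where `busy_children` lumps together everything the real condition counts apart from the checkpointer and `can_fork` simulates whether StartCheckpointer would succeed.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: every Toy/toy_/TOY_ name is hypothetical, not PostgreSQL's. */
typedef enum { TOY_NO_SHUTDOWN, TOY_SMART, TOY_FAST, TOY_IMMEDIATE } ToyShutdown;
typedef enum { TOY_STAY, TOY_GO_SHUTDOWN, TOY_GO_WAIT_DEAD_END } ToyNext;

typedef struct {
    int         busy_children;         /* backends, workers, startup, walreceiver, ... */
    bool        checkpointer_running;
    bool        can_fork;              /* would StartCheckpointer() succeed? */
    bool        FatalError;
    ToyShutdown Shutdown;
} ToyPostmaster;

/* Decides what happens when the state machine is in PM_WAIT_BACKENDS. */
static ToyNext toy_leave_wait_backends(ToyPostmaster *pm)
{
    bool checkpointer_ok;

    assert(pm->busy_children >= 0);
    /* A live checkpointer is tolerated only on the normal-shutdown path. */
    checkpointer_ok = !pm->checkpointer_running ||
                      (!pm->FatalError && pm->Shutdown < TOY_IMMEDIATE);
    if (pm->busy_children != 0 || !checkpointer_ok)
        return TOY_STAY;

    if (pm->Shutdown >= TOY_IMMEDIATE || pm->FatalError)
        return TOY_GO_WAIT_DEAD_END;

    /* Normal shutdown: make sure a checkpointer exists, then signal it. */
    if (!pm->checkpointer_running)
        pm->checkpointer_running = pm->can_fork;
    if (pm->checkpointer_running)
        return TOY_GO_SHUTDOWN;        /* the real code sends SIGUSR2 here */

    pm->FatalError = true;             /* could not fork: abnormal shutdown */
    return TOY_GO_WAIT_DEAD_END;
}
```

The case worth noticing is the last one: a process is started during shutdown purely so that it can write the shutdown checkpoint, and failing to start it turns a clean shutdown into an abnormal one.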