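This afternoon I ran into the following quiz question. I had never paid much attention to the topic, so I decided to study it and avoid pitfalls when configuring primary/standby replication.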

PostgreSQL source code study (54): parameters that a Hot Standby must set greater than or equal to the primary

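Searching for the question text turned up nothing useful, so I roughly translated it into English and searched again, which led to the official documentation: PostgreSQL: Documentation: 14: 27.4. Hot Standby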

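I. Parameters that must be >= the primary on the standby

I read the documentation for several versions and found some interesting differences.

1. PostgreSQL 11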

https://www.postgresql.org/docs/11/hot-standby.html

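Four parameters: max_connections, max_prepared_transactions, max_locks_per_transaction, max_worker_processes

  • When increasing a parameter: raise it on the standby first, then on the primary
  • When decreasing a parameter: lower it on the primary first, then on the standby

What happens if they are set wrong? It is fairly serious: the standby will not start.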

   

If these parameters are not set high enough then the standby will refuse to start. Higher values can then be supplied and the server restarted to begin recovery again.

2. PostgreSQL 12

PostgreSQL: Documentation: 12: 26.5. Hot Standby

max_wal_senders was added to the list, making five parameters. The version 13 documentation is the same as version 12.

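3. PostgreSQL 14

These parameters determine the size of the shared memory used for tracking transaction IDs, locks, and prepared transactions. To keep the standby from running out of that shared memory during recovery, these shared memory structures on the standby must be at least as large as on the primary.

For example, if the primary has used a prepared transaction but the standby has not allocated any shared memory for tracking prepared transactions, recovery cannot continue until the parameter is set correctly.

The documentation also gives an example of the log output when a setting is wrong: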

WARNING:  hot standby is not possible because of insufficient parameter settings
DETAIL:  max_connections = 80 is a lower setting than on the primary server, where its value was 100.
LOG:  recovery has paused
DETAIL:  If recovery is unpaused, the server will shut down.
HINT:  You can then restart the server after making the necessary configuration changes.

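Note the DETAIL lines: when a parameter is set too low, the standby no longer simply refuses to start; it pauses recovery instead. (I had read the new-features document just a couple of days earlier and went through it again without finding this item, so maybe it only counts as a small optimization?)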

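II. Reading the source code

1. Finding the relevant function

The simplest approach is to search the source tree for the warning text.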


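2. The RecoveryRequiresIntParameter function

It also turns out that these parameter requirements apply only to a Hot Standby: the pause path runs only when LocalHotStandbyActive is set, and (as CheckRequiredParameterValues in item 5 shows) the checks themselves are made only when EnableHotStandby is on. If hot standby is not active, the function goes straight to FATAL. SetRecoveryPause is clearly the function that pauses recovery.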

static void
RecoveryRequiresIntParameter(const char *param_name, int currValue, int minValue)
{
    if (currValue < minValue)
    {
        if (LocalHotStandbyActive)
        {
            bool        warned_for_promote = false;

            ereport(WARNING,
                    (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                     errmsg("hot standby is not possible because of insufficient parameter settings"),
                     errdetail("%s = %d is a lower setting than on the primary server, where its value was %d.",
                               param_name,
                               currValue,
                               minValue)));

            SetRecoveryPause(true);

            ereport(LOG,
                    (errmsg("recovery has paused"),
                     errdetail("If recovery is unpaused, the server will shut down."),
                     errhint("You can then restart the server after making the necessary configuration changes.")));

            while (GetRecoveryPauseState() != RECOVERY_NOT_PAUSED)
            {
                HandleStartupProcInterrupts();

                if (CheckForStandbyTrigger())
                {
                    if (!warned_for_promote)
                        ereport(WARNING,
                                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                                 errmsg("promotion is not possible because of insufficient parameter settings"),

                        /*
                         * Repeat the detail from above so it's easy to find
                         * in the log.
                         */
                                 errdetail("%s = %d is a lower setting than on the primary server, where its value was %d.",
                                           param_name,
                                           currValue,
                                           minValue),
                                 errhint("Restart the server after making the necessary configuration changes.")));
                    warned_for_promote = true;
                }

                /*
                 * If recovery pause is requested then set it paused.  While
                 * we are in the loop, user might resume and pause again so
                 * set this every time.
                 */
                ConfirmRecoveryPaused();

                /*
                 * We wait on a condition variable that will wake us as soon
                 * as the pause ends, but we use a timeout so we can check the
                 * above conditions periodically too.
                 */
                ConditionVariableTimedSleep(&XLogCtl->recoveryNotPausedCV, 1000,
                                            WAIT_EVENT_RECOVERY_PAUSE);
            }
            ConditionVariableCancelSleep();
        }

        ereport(FATAL,
                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                 errmsg("recovery aborted because of insufficient parameter settings"),
        /* Repeat the detail from above so it's easy to find in the log. */
                 errdetail("%s = %d is a lower setting than on the primary server, where its value was %d.",
                           param_name,
                           currValue,
                           minValue),
                 errhint("You can restart the server after making the necessary configuration changes.")));
    }
}
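
The control flow above boils down to three outcomes. The following is a minimal standalone sketch of that decision, not PostgreSQL code: the CheckOutcome enum, classify_setting, and the hot_standby_active flag (standing in for LocalHotStandbyActive) are hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome labels for this sketch; not PostgreSQL identifiers. */
typedef enum
{
    CHECK_OK,               /* setting is high enough, recovery continues */
    CHECK_PAUSE_THEN_FATAL, /* hot standby active: pause, FATAL once unpaused */
    CHECK_FATAL             /* hot standby not active: FATAL right away */
} CheckOutcome;

/*
 * Simplified model of the decision made by RecoveryRequiresIntParameter.
 * hot_standby_active stands in for LocalHotStandbyActive (an assumption of
 * this sketch); logging, locking, and the wait loop are left out.
 */
static CheckOutcome
classify_setting(int curr_value, int min_value, bool hot_standby_active)
{
    if (curr_value >= min_value)
        return CHECK_OK;
    return hot_standby_active ? CHECK_PAUSE_THEN_FATAL : CHECK_FATAL;
}

/* Usage example with the values from the documentation's log excerpt. */
void
example_classify_setting(void)
{
    assert(classify_setting(80, 100, true) == CHECK_PAUSE_THEN_FATAL);
    assert(classify_setting(80, 100, false) == CHECK_FATAL);
    assert(classify_setting(100, 100, true) == CHECK_OK);
}
```

Either way the server ends in FATAL; the pause only gives read-only sessions on an active hot standby a chance to continue until an operator resumes recovery or restarts with corrected settings.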

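3. The SetRecoveryPause function

Let's see how the pause is done. The function takes the info_lck spinlock and changes recoveryPauseState in the shared-memory XLogCtl structure (this is shared memory, not the pg_control control file). It only moves the state to RECOVERY_PAUSE_REQUESTED; the actual paused state is set by ConfirmRecoveryPaused.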

/*
 * Set the recovery pause state.
 *
 * If recovery pause is requested then sets the recovery pause state to
 * 'pause requested' if it is not already 'paused'.  Otherwise, sets it
 * to 'not paused' to resume the recovery.  The recovery pause will be
 * confirmed by the ConfirmRecoveryPaused.
 */
void
SetRecoveryPause(bool recoveryPause)
{
    SpinLockAcquire(&XLogCtl->info_lck);

    if (!recoveryPause)
        XLogCtl->recoveryPauseState = RECOVERY_NOT_PAUSED;
    else if (XLogCtl->recoveryPauseState == RECOVERY_NOT_PAUSED)
        XLogCtl->recoveryPauseState = RECOVERY_PAUSE_REQUESTED;

    SpinLockRelease(&XLogCtl->info_lck);

    if (!recoveryPause)
        ConditionVariableBroadcast(&XLogCtl->recoveryNotPausedCV);
}

4. The ConfirmRecoveryPaused function

It changes the state from RECOVERY_PAUSE_REQUESTED to RECOVERY_PAUSED.

/*
 * Confirm the recovery pause by setting the recovery pause state to
 * RECOVERY_PAUSED.
 */
static void
ConfirmRecoveryPaused(void)
{
    /* If recovery pause is requested then set it paused */
    SpinLockAcquire(&XLogCtl->info_lck);
    if (XLogCtl->recoveryPauseState == RECOVERY_PAUSE_REQUESTED)
        XLogCtl->recoveryPauseState = RECOVERY_PAUSED;
    SpinLockRelease(&XLogCtl->info_lck);
}
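
Taken together, SetRecoveryPause and ConfirmRecoveryPaused form a small three-state machine. Here is a rough standalone sketch of just the transitions, assuming no concurrency: the spinlock and condition variable are dropped, the enum is redeclared locally so the snippet compiles on its own, and set_pause / confirm_pause are hypothetical helper names.

```c
#include <assert.h>
#include <stdbool.h>

/* Redeclared locally so the sketch is self-contained. */
typedef enum
{
    RECOVERY_NOT_PAUSED,
    RECOVERY_PAUSE_REQUESTED,
    RECOVERY_PAUSED
} RecoveryPauseState;

/* Transition logic of SetRecoveryPause, minus spinlock and CV broadcast. */
static RecoveryPauseState
set_pause(RecoveryPauseState state, bool pause)
{
    if (!pause)
        return RECOVERY_NOT_PAUSED;
    if (state == RECOVERY_NOT_PAUSED)
        return RECOVERY_PAUSE_REQUESTED;
    return state;               /* already requested or paused: unchanged */
}

/* Transition logic of ConfirmRecoveryPaused, minus the spinlock. */
static RecoveryPauseState
confirm_pause(RecoveryPauseState state)
{
    if (state == RECOVERY_PAUSE_REQUESTED)
        return RECOVERY_PAUSED;
    return state;
}

/* Walk through request, confirm, repeated request, and resume. */
void
example_pause_transitions(void)
{
    RecoveryPauseState s = RECOVERY_NOT_PAUSED;

    s = set_pause(s, true);
    assert(s == RECOVERY_PAUSE_REQUESTED);
    s = confirm_pause(s);
    assert(s == RECOVERY_PAUSED);
    s = set_pause(s, true);     /* a second request does not undo the pause */
    assert(s == RECOVERY_PAUSED);
    s = set_pause(s, false);
    assert(s == RECOVERY_NOT_PAUSED);
    s = confirm_pause(s);       /* nothing requested, nothing to confirm */
    assert(s == RECOVERY_NOT_PAUSED);
}
```

The two-step design separates "someone asked for a pause" from "the startup process has actually stopped replaying", which is why the wait loop in RecoveryRequiresIntParameter calls ConfirmRecoveryPaused on every iteration.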

5. The CheckRequiredParameterValues function

Searching for our five parameters shows that all of them are checked in this one function.

/*
 * Check to see if required parameters are set high enough on this server
 * for various aspects of recovery operation.
 */
static void
CheckRequiredParameterValues(void)
{
    /*
     * For archive recovery, the WAL must be generated with at least 'replica'
     * wal_level.
     */
    if (ArchiveRecoveryRequested && ControlFile->wal_level == WAL_LEVEL_MINIMAL)
    {
        ereport(FATAL,
                (errmsg("WAL was generated with wal_level=minimal, cannot continue recovering"),
                 errdetail("This happens if you temporarily set wal_level=minimal on the server."),
                 errhint("Use a backup taken after setting wal_level to higher than minimal.")));
    }

    /*
     * For Hot Standby, the WAL must be generated with 'replica' mode, and we
     * must have at least as many backend slots as the primary.
     */
    if (ArchiveRecoveryRequested && EnableHotStandby)
    {
        /* We ignore autovacuum_max_workers when we make this test. */
        RecoveryRequiresIntParameter("max_connections",
                                     MaxConnections,
                                     ControlFile->MaxConnections);
        RecoveryRequiresIntParameter("max_worker_processes",
                                     max_worker_processes,
                                     ControlFile->max_worker_processes);
        RecoveryRequiresIntParameter("max_wal_senders",
                                     max_wal_senders,
                                     ControlFile->max_wal_senders);
        RecoveryRequiresIntParameter("max_prepared_transactions",
                                     max_prepared_xacts,
                                     ControlFile->max_prepared_xacts);
        RecoveryRequiresIntParameter("max_locks_per_transaction",
                                     max_locks_per_xact,
                                     ControlFile->max_locks_per_xact);
    }
}
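
To make the comparison concrete, here is a small standalone sketch that applies the same five checks, in the same order, to two plain structs. It is only an illustration: ReplicaSettings and first_insufficient_setting are hypothetical names, whereas the real code compares GUC variables against the values recorded in ControlFile.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical container for the five settings; not a PostgreSQL type. */
typedef struct
{
    int max_connections;
    int max_worker_processes;
    int max_wal_senders;
    int max_prepared_transactions;
    int max_locks_per_transaction;
} ReplicaSettings;

/*
 * Return the name of the first setting that is lower on the standby than on
 * the primary, or NULL if every setting is high enough.  The order follows
 * the calls in CheckRequiredParameterValues.
 */
static const char *
first_insufficient_setting(const ReplicaSettings *standby,
                           const ReplicaSettings *primary)
{
    if (standby->max_connections < primary->max_connections)
        return "max_connections";
    if (standby->max_worker_processes < primary->max_worker_processes)
        return "max_worker_processes";
    if (standby->max_wal_senders < primary->max_wal_senders)
        return "max_wal_senders";
    if (standby->max_prepared_transactions < primary->max_prepared_transactions)
        return "max_prepared_transactions";
    if (standby->max_locks_per_transaction < primary->max_locks_per_transaction)
        return "max_locks_per_transaction";
    return NULL;
}

/* Usage example; the numbers are illustrative values only. */
void
example_first_insufficient_setting(void)
{
    ReplicaSettings primary = {100, 8, 10, 0, 64};
    ReplicaSettings standby = primary;

    /* Equal settings pass, since the requirement is >=, not >. */
    assert(first_insufficient_setting(&standby, &primary) == NULL);

    /* The case from the documentation's log excerpt: 80 versus 100. */
    standby.max_connections = 80;
    assert(strcmp(first_insufficient_setting(&standby, &primary),
                  "max_connections") == 0);
}
```

Because the real function stops at the first failing parameter, a standby with several low settings reports them one restart at a time.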

III. What do these five parameters actually do?

1. max_connections

The maximum number of concurrent connections to the instance (Sets the maximum number of concurrent connections.).

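2. max_wal_senders

The maximum number of WAL sender processes running at the same time (Sets the maximum number of simultaneously running WAL sender processes.).

Starting with PostgreSQL 12, WAL sender processes no longer count toward max_connections, so replication between primary and standby is not affected even when client connections are exhausted. From the release notes: Make max_wal_senders not count as part of max_connections.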

https://www.postgresql.org/docs/12/release-12.html

3. max_worker_processes

The maximum number of worker processes running at the same time (Maximum number of concurrent worker processes.).

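4. max_locks_per_transaction

The average number of object locks allocated per transaction. The default is 64.

A single transaction can hold more object locks than max_locks_per_transaction at the same time, as long as there is still free space in the shared lock table.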
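
A quick calculation shows why this is an average rather than a hard per-transaction limit. The sketch below assumes the sizing rule given in the PostgreSQL documentation, where the shared lock table tracks max_locks_per_transaction * (max_connections + max_prepared_transactions) objects. The function name lock_table_capacity is hypothetical, and the server's internal sizing may also account for background processes, so treat the result as an approximation.

```c
#include <assert.h>

/*
 * Approximate number of lockable objects in the shared lock table, using the
 * formula from the documentation.  This is an assumption-laden sketch, not
 * the server's internal computation.
 */
static long
lock_table_capacity(int max_locks_per_xact, int max_conns,
                    int max_prepared_xacts)
{
    return (long) max_locks_per_xact *
        ((long) max_conns + (long) max_prepared_xacts);
}

/* Usage example with illustrative settings. */
void
example_lock_table_capacity(void)
{
    /* 64 locks per transaction, 100 connections, no prepared transactions */
    assert(lock_table_capacity(64, 100, 0) == 6400L);

    /* A standby with only 80 connections gets a smaller table. */
    assert(lock_table_capacity(64, 80, 0) == 5120L);
}
```

Under this rule, a standby with a smaller max_connections has less lock-table space than the primary, which is consistent with the shared-memory argument the version 14 documentation gives for the >= requirement.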

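5. max_prepared_transactions

Sets the maximum number of prepared transactions for the two-phase commit used by distributed transactions. The default is 0, which disables prepared transactions and therefore distributed transactions. If you do set it, max_prepared_transactions should be at least max_connections, so that every session can have at least one prepared transaction available.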

References

PostgreSQL: Documentation: 14: 27.4. Hot Standby

PG 12 new feature: max_wal_senders separated from max_connections (瀚高PG实验室, Highgo PG Lab)

http://t.zoukankan.com/f2flow-p-6050469.html

Database startup problems caused by setting max_locks_per_transaction too high (墨天轮问答, Modb Q&A)