PgStat_Msg statistics messages

The PgStat_Msg union covers all of the statistics messages. PgStat_MsgHdr is the common header of every message and is itself one member of the union; it holds m_type of type StatMsgType and m_size, an int giving the message's data size. PgStat_MsgDummy is the empty message, containing nothing but a PgStat_MsgHdr as its only member; every other message adds its own type-specific members after the header.

| PgStat_Msg member type | Union member name | StatMsgType | Purpose | Handler function |
|---|---|---|---|---|
| PgStat_MsgHdr | msg_hdr | — | common message header | — |
| PgStat_MsgDummy | msg_dummy | PGSTAT_MTYPE_DUMMY | empty message | — |
| PgStat_MsgInquiry | msg_inquiry | PGSTAT_MTYPE_INQUIRY | asks the collector to write the stats files | pgstat_recv_inquiry |
| PgStat_MsgTabstat | msg_tabstat | PGSTAT_MTYPE_TABSTAT | reports table and buffer-access statistics | pgstat_recv_tabstat |
| PgStat_MsgTabpurge | msg_tabpurge | PGSTAT_MTYPE_TABPURGE | reports dead (dropped) tables | pgstat_recv_tabpurge |
| PgStat_MsgDropdb | msg_dropdb | PGSTAT_MTYPE_DROPDB | reports a dropped database | pgstat_recv_dropdb |
| PgStat_MsgResetcounter | msg_resetcounter | PGSTAT_MTYPE_RESETCOUNTER | tells the collector to reset counters | pgstat_recv_resetcounter |
| PgStat_MsgResetsharedcounter | msg_resetsharedcounter | PGSTAT_MTYPE_RESETSHAREDCOUNTER | tells the collector to reset a cluster-wide shared counter | pgstat_recv_resetsharedcounter |
| PgStat_MsgResetsinglecounter | msg_resetsinglecounter | PGSTAT_MTYPE_RESETSINGLECOUNTER | tells the collector to reset a single counter | pgstat_recv_resetsinglecounter |
| PgStat_MsgResetslrucounter | msg_resetslrucounter | PGSTAT_MTYPE_RESETSLRUCOUNTER | tells the collector to reset SLRU counters | pgstat_recv_resetslrucounter |
| PgStat_MsgResetreplslotcounter | msg_resetreplslotcounter | PGSTAT_MTYPE_RESETREPLSLOTCOUNTER | tells the collector to reset replication-slot counters | pgstat_recv_resetreplslotcounter |
| PgStat_MsgAutovacStart | msg_autovacuum_start | PGSTAT_MTYPE_AUTOVAC_START | announces that a database is about to be vacuumed | pgstat_recv_autovac |
| PgStat_MsgVacuum | msg_vacuum | PGSTAT_MTYPE_VACUUM | sent after VACUUM or VACUUM ANALYZE completes | pgstat_recv_vacuum |
| PgStat_MsgAnalyze | msg_analyze | PGSTAT_MTYPE_ANALYZE | sent after ANALYZE completes | pgstat_recv_analyze |
| PgStat_MsgArchiver | msg_archiver | PGSTAT_MTYPE_ARCHIVER | reports WAL archiver activity | pgstat_recv_archiver |
| PgStat_MsgBgWriter | msg_bgwriter | PGSTAT_MTYPE_BGWRITER | bgwriter statistics update | pgstat_recv_bgwriter |
| PgStat_MsgWal | msg_wal | PGSTAT_MTYPE_WAL | reports WAL activity | pgstat_recv_wal |
| PgStat_MsgSLRU | msg_slru | PGSTAT_MTYPE_SLRU | reports SLRU cache activity | pgstat_recv_slru |
| PgStat_MsgFuncstat | msg_funcstat | PGSTAT_MTYPE_FUNCSTAT | reports function-usage statistics | pgstat_recv_funcstat |
| PgStat_MsgFuncpurge | msg_funcpurge | PGSTAT_MTYPE_FUNCPURGE | reports dead (dropped) functions | pgstat_recv_funcpurge |
| PgStat_MsgRecoveryConflict | msg_recoveryconflict | PGSTAT_MTYPE_RECOVERYCONFLICT | reports a recovery conflict | pgstat_recv_recoveryconflict |
| PgStat_MsgDeadlock | msg_deadlock | PGSTAT_MTYPE_DEADLOCK | reports a deadlock | pgstat_recv_deadlock |
| PgStat_MsgTempFile | msg_tempfile | PGSTAT_MTYPE_TEMPFILE | reports temporary file usage | pgstat_recv_tempfile |
| PgStat_MsgChecksumFailure | msg_checksumfailure | PGSTAT_MTYPE_CHECKSUMFAILURE | reports page checksum failures | pgstat_recv_checksum_failure |
| PgStat_MsgReplSlot | msg_replslot | PGSTAT_MTYPE_REPLSLOT | reports replication-slot statistics | pgstat_recv_replslot |
| PgStat_MsgConnect | msg_connect | PGSTAT_MTYPE_CONNECT | reports a session connect | pgstat_recv_connect |
| PgStat_MsgDisconnect | msg_disconnect | PGSTAT_MTYPE_DISCONNECT | reports a session disconnect | pgstat_recv_disconnect |

The on-disk statfile and the in-memory statistics structures

Recall that the database cluster directory contains files related to the statistics collector: pgstat.stat under the global subdirectory holds the current cluster-wide statistics, while pg_stat_tmp is where the PgStat process and the various backend processes exchange temporary files. This section walks through the PgStat functions that read and write the statfile, the flow shown in green in the figure below.

  • pgstat_read_statsfiles restores the statistics stored in the statfile into the DB hash table and the per-database table/function hash tables, and into the global, archiver, WAL and SLRU stats structs.
  • pgstat_write_statsfiles sets globalStats.stats_timestamp to the current timestamp, writes the globalStats, archiverStats, walStats and slruStats structs to the file, walks the DB hash table writing each PgStat_StatDBEntry (and, if required, writes that database's table and function statistics to its per-database stats file), and walks the replSlotStatHash hash table writing each PgStat_StatReplSlotEntry.

[Figure 1: PG daemon (Postmaster), auxiliary process PgStat statistics messages]

  • The function static HTAB *pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep) reads the statistics files under the database cluster directory, builds the databases hash table, and returns it. If onlydb is not InvalidOid, we only need the statistics for the database with OID onlydb plus the **shared catalogs ("DB 0")**; the DB hash table still gets an entry for every database, but table/function hash tables are not created for the other databases. permanent selects reading from the permanent file rather than the temporary one; when it is true (which is only set at PgStat process startup), the file is deleted after reading, since the in-memory state is now authoritative and any file data another reader might see should be considered stale. If deep is true, requesting a so-called deep read, table and function statistics are read as well; otherwise the table/function hash tables are left empty. The flow is:
  1. Choose the permanent or temporary .stat file according to permanent.
  2. Create the pgStatLocalContext memory context to hold the DB hash table, then create that hash table, keyed by database OID with PgStat_StatDBEntry values.
  3. Zero the global, archiver, WAL and SLRU statistics structs.
  4. Set stat_reset_timestamp in globalStats, archiverStats, walStats and slruStats to the current timestamp (these defaults survive only if no existing stats file can be loaded).
  5. Try to open the stats file. If it does not exist, backends simply get zeros and the collector starts from scratch with empty counters. If it opens, verify its format ID.
  6. Read the global, archiver, WAL and SLRU stats structs from the file.
  7. Read the remaining entries by record type. For a PgStat_StatDBEntry, add it to the DB hash table; if the database's tables and functions are of interest and deep was requested, read its PgStat_StatTabEntry and PgStat_StatFuncEntry records from the database-specific file into the per-database table/function hash tables (see pgstat_read_db_statsfile), otherwise leave those hash tables empty. For a PgStat_StatReplSlotEntry, add it to the replSlotStatHash hash table.
  8. Finally, if permanent was specified, delete the permanent statistics file.
static HTAB *pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep){
PgStat_StatDBEntry *dbentry; PgStat_StatDBEntry dbbuf;
HASHCTL hash_ctl;
HTAB *dbhash;
FILE *fpin;
int32 format_id;
bool found;
const char *statfile = permanent ? PGSTAT_STAT_PERMANENT_FILENAME : pgstat_stat_filename; /* pick the permanent or temporary .stat file based on the permanent flag */
int i;

pgstat_setup_memcxt(); /* The tables will live in pgStatLocalContext. */
hash_ctl.keysize = sizeof(Oid); /* Create the DB hashtable, keyed by database OID */
hash_ctl.entrysize = sizeof(PgStat_StatDBEntry); /* value type is PgStat_StatDBEntry */
hash_ctl.hcxt = pgStatLocalContext;
dbhash = hash_create("Databases hash", PGSTAT_DB_HASH_SIZE, &hash_ctl, HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);

memset(&globalStats, 0, sizeof(globalStats)); /* Clear out global, archiver, WAL and SLRU statistics so they start from zero in case we can't load an existing statsfile. */
memset(&archiverStats, 0, sizeof(archiverStats));
memset(&walStats, 0, sizeof(walStats));
memset(&slruStats, 0, sizeof(slruStats));
/* Set the current timestamp (will be kept only in case we can't load an existing statsfile). */
globalStats.stat_reset_timestamp = GetCurrentTimestamp();
archiverStats.stat_reset_timestamp = globalStats.stat_reset_timestamp;
walStats.stat_reset_timestamp = globalStats.stat_reset_timestamp;
for (i = 0; i < SLRU_NUM_ELEMENTS; i++) /* Set the same reset timestamp for all SLRU items too. */
slruStats[i].stat_reset_timestamp = globalStats.stat_reset_timestamp;

/* Try to open the stats file. If it doesn't exist, the backends simply return zero for anything and the collector simply starts from scratch with empty counters. ENOENT is a possibility if the stats collector is not running or has not yet written the stats file the first time. Any other failure condition is suspicious. */
if ((fpin = AllocateFile(statfile, PG_BINARY_R)) == NULL) {
if (errno != ENOENT) ereport(pgStatRunningInCollector ? LOG : WARNING,(errcode_for_file_access(),errmsg("could not open statistics file \"%s\": %m",statfile)));
return dbhash;
}
if (fread(&format_id, 1, sizeof(format_id), fpin) != sizeof(format_id) || format_id != PGSTAT_FILE_FORMAT_ID) { /* Verify it's of the expected format. */
ereport(pgStatRunningInCollector ? LOG : WARNING, (errmsg("corrupted statistics file \"%s\"", statfile)));
goto done;
}

if (fread(&globalStats, 1, sizeof(globalStats), fpin) != sizeof(globalStats)) { /* Read global stats struct */
ereport(pgStatRunningInCollector ? LOG : WARNING,(errmsg("corrupted statistics file \"%s\"", statfile)));
memset(&globalStats, 0, sizeof(globalStats));
goto done;
}

/* In the collector, disregard the timestamp we read from the permanent stats file; we should be willing to write a temp stats file immediately upon the first request from any backend. This only matters if the old file's timestamp is less than PGSTAT_STAT_INTERVAL ago, but that's not an unusual scenario. */
if (pgStatRunningInCollector) globalStats.stats_timestamp = 0;

if (fread(&archiverStats, 1, sizeof(archiverStats), fpin) != sizeof(archiverStats)){ /* Read archiver stats struct */
ereport(pgStatRunningInCollector ? LOG : WARNING,(errmsg("corrupted statistics file \"%s\"", statfile)));
memset(&archiverStats, 0, sizeof(archiverStats));
goto done;
}
if (fread(&walStats, 1, sizeof(walStats), fpin) != sizeof(walStats)){ /* Read WAL stats struct */
ereport(pgStatRunningInCollector ? LOG : WARNING,(errmsg("corrupted statistics file \"%s\"", statfile)));
memset(&walStats, 0, sizeof(walStats));
goto done;
}
if (fread(slruStats, 1, sizeof(slruStats), fpin) != sizeof(slruStats)){ /* Read SLRU stats struct */
ereport(pgStatRunningInCollector ? LOG : WARNING,(errmsg("corrupted statistics file \"%s\"", statfile)));
memset(&slruStats, 0, sizeof(slruStats));
goto done;
}

for (;;){ /* We found an existing collector stats file. Read it and put all the hashtable entries into place. */
switch (fgetc(fpin)) {
case 'D': /* 'D' A PgStat_StatDBEntry struct describing a database follows. */
if (fread(&dbbuf, 1, offsetof(PgStat_StatDBEntry, tables),fpin) != offsetof(PgStat_StatDBEntry, tables)){
ereport(pgStatRunningInCollector ? LOG : WARNING,(errmsg("corrupted statistics file \"%s\"",statfile)));
goto done;
}
dbentry = (PgStat_StatDBEntry *) hash_search(dbhash, (void *) &dbbuf.databaseid,HASH_ENTER, &found); /* Add to the DB hash */
if (found){
ereport(pgStatRunningInCollector ? LOG : WARNING,(errmsg("corrupted statistics file \"%s\"",statfile)));
goto done;
}
memcpy(dbentry, &dbbuf, sizeof(PgStat_StatDBEntry));
dbentry->tables = NULL;
dbentry->functions = NULL;
/* In the collector, disregard the timestamp we read from the permanent stats file; we should be willing to write a temp stats file immediately upon the first request from any backend. */
if (pgStatRunningInCollector) dbentry->stats_timestamp = 0;
/* Don't create tables/functions hashtables for uninteresting databases. */
if (onlydb != InvalidOid){
if (dbbuf.databaseid != onlydb && dbbuf.databaseid != InvalidOid) break;
}

hash_ctl.keysize = sizeof(Oid);
hash_ctl.entrysize = sizeof(PgStat_StatTabEntry);
hash_ctl.hcxt = pgStatLocalContext;
dbentry->tables = hash_create("Per-database table",PGSTAT_TAB_HASH_SIZE,&hash_ctl,
HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
hash_ctl.keysize = sizeof(Oid);
hash_ctl.entrysize = sizeof(PgStat_StatFuncEntry);
hash_ctl.hcxt = pgStatLocalContext;
dbentry->functions = hash_create("Per-database function",
PGSTAT_FUNCTION_HASH_SIZE,&hash_ctl,HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);

/* If requested, read the data from the database-specific file. Otherwise we just leave the hashtables empty. */
if (deep) pgstat_read_db_statsfile(dbentry->databaseid,dbentry->tables, dbentry->functions, permanent);
break;

case 'R':{ /* 'R' A PgStat_StatReplSlotEntry struct describing a replication slot follows. */
PgStat_StatReplSlotEntry slotbuf; PgStat_StatReplSlotEntry *slotent;
if (fread(&slotbuf, 1, sizeof(PgStat_StatReplSlotEntry), fpin)!= sizeof(PgStat_StatReplSlotEntry)){
ereport(pgStatRunningInCollector ? LOG : WARNING,(errmsg("corrupted statistics file \"%s\"",statfile)));
goto done;
}
if (replSlotStatHash == NULL){ /* Create hash table if we don't have it already. */
HASHCTL hash_ctl;
hash_ctl.keysize = sizeof(NameData);
hash_ctl.entrysize = sizeof(PgStat_StatReplSlotEntry);
hash_ctl.hcxt = pgStatLocalContext;
replSlotStatHash = hash_create("Replication slots hash", PGSTAT_REPLSLOT_HASH_SIZE, &hash_ctl, HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
}
slotent = (PgStat_StatReplSlotEntry *) hash_search(replSlotStatHash,(void *) &slotbuf.slotname, HASH_ENTER, NULL);
memcpy(slotent, &slotbuf, sizeof(PgStat_StatReplSlotEntry));
break;
}
case 'E':goto done;
default:
ereport(pgStatRunningInCollector ? LOG : WARNING,(errmsg("corrupted statistics file \"%s\"",statfile)));
goto done;
}
}
done:
FreeFile(fpin);
if (permanent){ /* If requested to read the permanent file, also get rid of it. */
elog(DEBUG2, "removing permanent stats file \"%s\"", statfile);
unlink(statfile);
}
return dbhash;
}

The function static bool pgstat_write_statsfile_needed(void) reports whether the in-memory statistics have pending updates: if pending_write_requests is not NIL, some statistics still need to be written out to the stats file.

static bool pgstat_write_statsfile_needed(void){
if (pending_write_requests != NIL)
return true;
return false; /* Everything was written recently */
}

The function static void pgstat_write_statsfiles(bool permanent, bool allDbs) writes the global statistics file and the requested per-database files.
permanent selects writing the permanent file rather than the temporary one. When it is true (which only happens while the PgStat process is shutting down), the temporary file is removed as well, so that backends started under a new postmaster cannot read old data before the new PgStat process is ready. When allDbs is false, only the requested databases (those listed in pending_write_requests) are written; otherwise all databases are written. The flow is:

  1. Choose the permanent or temporary .stat file (and matching .tmp file) according to permanent.
  2. Set globalStats.stats_timestamp to the current timestamp, then write the globalStats, archiverStats, walStats and slruStats structs to the file.
  3. Walk the DB hash table, writing each PgStat_StatDBEntry; if required, write that database's table and function statistics to its per-database statistics file.
  4. Walk the replSlotStatHash hash table, writing each PgStat_StatReplSlotEntry.
  5. If permanent was specified, remove the pgstat_stat_filename file.
  6. Finally, clear the pending_write_requests list.
static void pgstat_write_statsfiles(bool permanent, bool allDbs) {
HASH_SEQ_STATUS hstat;
PgStat_StatDBEntry *dbentry;
FILE *fpout;
int32 format_id;
const char *tmpfile = permanent ? PGSTAT_STAT_PERMANENT_TMPFILE : pgstat_stat_tmpname; /* permanent or temporary .tmp file */
const char *statfile = permanent ? PGSTAT_STAT_PERMANENT_FILENAME : pgstat_stat_filename; /* permanent or temporary .stat file */
int rc;

fpout = AllocateFile(tmpfile, PG_BINARY_W); /* Open the statistics temp file to write out the current values. */
if (fpout == NULL) {
ereport(LOG,(errcode_for_file_access(), errmsg("could not open temporary statistics file \"%s\": %m",tmpfile)));
return;
}

globalStats.stats_timestamp = GetCurrentTimestamp(); /* Set the timestamp of the stats file. */
format_id = PGSTAT_FILE_FORMAT_ID; /* Write the file header --- currently just a format ID. */
rc = fwrite(&format_id, sizeof(format_id), 1, fpout);
(void) rc; /* we'll check for error with ferror */
rc = fwrite(&globalStats, sizeof(globalStats), 1, fpout); /* Write global stats struct */
(void) rc; /* we'll check for error with ferror */
rc = fwrite(&archiverStats, sizeof(archiverStats), 1, fpout); /* Write archiver stats struct */
(void) rc; /* we'll check for error with ferror */
rc = fwrite(&walStats, sizeof(walStats), 1, fpout); /* Write WAL stats struct */
(void) rc; /* we'll check for error with ferror */
rc = fwrite(slruStats, sizeof(slruStats), 1, fpout); /* Write SLRU stats struct */
(void) rc; /* we'll check for error with ferror */
hash_seq_init(&hstat, pgStatDBHash); /* Walk through the database table. */
while ((dbentry = (PgStat_StatDBEntry *) hash_seq_search(&hstat)) != NULL) {
/* Write out the table and function stats for this DB into the appropriate per-DB stat file, if required. */
if (allDbs || pgstat_db_requested(dbentry->databaseid)) {
dbentry->stats_timestamp = globalStats.stats_timestamp; /* Make DB's timestamp consistent with the global stats */
pgstat_write_db_statsfile(dbentry, permanent); /* write this database's table/function stats to disk */
}
fputc('D', fpout); /* Write out the DB entry. We don't write the tables or functions pointers, since they're of no use to any other process. */
rc = fwrite(dbentry, offsetof(PgStat_StatDBEntry, tables), 1, fpout);
(void) rc; /* we'll check for error with ferror */
}

if (replSlotStatHash) { /* Write replication slot stats struct */
PgStat_StatReplSlotEntry *slotent;
hash_seq_init(&hstat, replSlotStatHash);
while ((slotent = (PgStat_StatReplSlotEntry *) hash_seq_search(&hstat)) != NULL){
fputc('R', fpout);
rc = fwrite(slotent, sizeof(PgStat_StatReplSlotEntry), 1, fpout);
(void) rc; /* we'll check for error with ferror */
}
}

fputc('E', fpout); /* No more output to be done. Close the temp file and replace the old pgstat.stat with it. The ferror() check replaces testing for error after each individual fputc or fwrite above. */
if (ferror(fpout)){
ereport(LOG,(errcode_for_file_access(),errmsg("could not write temporary statistics file \"%s\": %m",tmpfile)));
FreeFile(fpout);
unlink(tmpfile);
}else if (FreeFile(fpout) < 0){
ereport(LOG,(errcode_for_file_access(),errmsg("could not close temporary statistics file \"%s\": %m",tmpfile)));
unlink(tmpfile);
}else if (rename(tmpfile, statfile) < 0){
ereport(LOG,(errcode_for_file_access(),errmsg("could not rename temporary statistics file \"%s\" to \"%s\": %m",tmpfile, statfile)));
unlink(tmpfile);
}

if (permanent) unlink(pgstat_stat_filename);

list_free(pending_write_requests); /* Now throw away the list of requests. Note that requests sent after we started the write are still waiting on the network socket. */
pending_write_requests = NIL;
}

[Figure 2: PG daemon (Postmaster), auxiliary process PgStat statistics messages]

The PgStat_MsgInquiry message type

In the dispatch-by-message-type flow shown above, one message type is special: PgStat_MsgInquiry. A backend sends it to the PgStat auxiliary process to request that statistics be written out to the stats files. Ordinarily, an inquiry message prompts writing of the global stats file, the stats file for shared catalogs, and the stats file for the specified database; if databaseid is InvalidOid, only the first two are written. New file(s) will be written only if the existing file has a timestamp older than the specified cutoff_time; this prevents duplicated effort when multiple requests arrive at nearly the same time, assuming that backends send requests with cutoff_times a little bit in the past. clock_time should be the requestor's current local time; the collector uses this to check for the system clock going backward, but it has no effect unless that occurs. We assume clock_time >= cutoff_time, though. The code of the handler makes this concrete:

typedef struct PgStat_MsgInquiry{
PgStat_MsgHdr m_hdr;
TimestampTz clock_time; /* observed local clock time */
TimestampTz cutoff_time; /* minimum acceptable file timestamp */
Oid databaseid; /* requested DB (InvalidOid => shared only) */
} PgStat_MsgInquiry;

The pgstat_recv_inquiry function handles PgStat_MsgInquiry messages; analyzing it shows how a backend's request to write statistics out to the stats files is processed. First it checks whether pending_write_requests already contains a write request for this database; if so, it returns immediately. If there is no PgStat_StatDBEntry for the database in the pgStatLocalContext memory context, a write request for that DB OID is added to pending_write_requests unconditionally. If the requestor's clock_time is earlier than the last write time for this database (dbentry->stats_timestamp) and the collector's current time is also earlier than dbentry->stats_timestamp, the system clock must have been set backwards, so a new stats file write is forced to get back in sync (the DB OID is added to pending_write_requests); otherwise the message is just a stale request. Likewise, if msg->cutoff_time <= dbentry->stats_timestamp, the request is stale and pgstat_recv_inquiry returns without doing anything.

static void pgstat_recv_inquiry(PgStat_MsgInquiry *msg, int len) {
PgStat_StatDBEntry *dbentry;
/* If there's already a write request for this DB, there's nothing to do. Note that if a request is found, we return early and skip the below check for clock skew. This is okay, since the only way for a DB request to be present in the list is that we have been here since the last write round. It seems sufficient to check for clock skew once per write round. */
if (list_member_oid(pending_write_requests, msg->databaseid)) return;

/* Check to see if we last wrote this database at a time >= the requested cutoff time. If so, this is a stale request that was generated before we updated the DB file, and we don't need to do so again. If the requestor's local clock time is older than stats_timestamp, we should suspect a clock glitch, ie system time going backwards; though the more likely explanation is just delayed message receipt. It is worth expending a GetCurrentTimestamp call to be sure, since a large retreat in the system clock reading could otherwise cause us to neglect to update the stats file for a long time. */
dbentry = pgstat_get_db_entry(msg->databaseid, false);
if (dbentry == NULL) {
/* We have no data for this DB. Enter a write request anyway so that the global stats will get updated. This is needed to prevent backend_read_statsfile from waiting for data that we cannot supply, in the case of a new DB that nobody has yet reported any stats for. See the behavior of pgstat_read_db_statsfile_timestamp. */
}else if (msg->clock_time < dbentry->stats_timestamp){ /* requestor's clock is older than our last write for this DB */
TimestampTz cur_ts = GetCurrentTimestamp();
if (cur_ts < dbentry->stats_timestamp){
/* Sure enough, time went backwards. Force a new stats file write to get back in sync; but first, log a complaint. */
char *writetime, *mytime;
writetime = pstrdup(timestamptz_to_str(dbentry->stats_timestamp)); /* Copy because timestamptz_to_str returns a static buffer */
mytime = pstrdup(timestamptz_to_str(cur_ts));
ereport(LOG,(errmsg("stats_timestamp %s is later than collector's time %s for database %u",writetime, mytime, dbentry->databaseid)));
pfree(writetime);
pfree(mytime);
}
else { /* Nope, it's just an old request. Assuming msg's clock_time is >= its cutoff_time, it must be stale, so we can ignore it. */
return;
}
}else if (msg->cutoff_time <= dbentry->stats_timestamp){ /* Stale request, ignore it */
return;
}

pending_write_requests = lappend_oid(pending_write_requests, msg->databaseid); /* We need to write this DB, so create a request. */
}

[Figure 3: PG daemon (Postmaster), auxiliary process PgStat statistics messages]

The statistics collected by the PgStat auxiliary process are mainly used for cost estimation during query optimization. In PostgreSQL's optimizer, the alternative execution strategies for a query are expressed as different paths (Path); after generating the qualifying paths, the one with the lowest cost is chosen and converted into a plan, which is handed to the executor. The optimizer's core job is therefore to build many paths and pick the best one. The same query yields different paths mainly because of: different table access methods (sequential access, index access, or direct tuple access by TID); different join methods (nested-loop join, merge join, hash join); and different join orders (left-deep, right-deep, bushy). Paths are ranked by costs estimated from the statistics stored in the pg_statistic system catalog. The next post will describe how the statistics collected by PgStat end up in pg_statistic.