创建分区表举例
使用手册提供的例子说明问题:假定我们正在为一个大型的冰激凌公司构建数据库。该公司每天测量最高温度以及每个区域的冰激凌销售情况。概念上,我们想要一个这样的表。
CREATE TABLE measurement (
city_id int not null,
logdate date not null,
peaktemp int,
unitsales int
);
我们知道大部分查询只会访问上周的、上月的或者上季度的数据,因为这个表的主要用途是为管理层准备在线报告。为了减少需要被存放的旧数据量,我们决定只保留最近3年的数据。在每个月的开始我们将去除掉最早的那个月的数据。在这种情况下我们可以使用分区技术来帮助我们满足对measurement表的所有不同需求。要在这种情况下使用声明式分区,可采用下面的步骤:
通过指定PARTITION BY子句把measurement表创建为分区表,该子句包括分区方法(这个例子中是RANGE)以及用作分区键的列列表。
CREATE TABLE measurement (
city_id int not null,
logdate date not null,
peaktemp int,
unitsales int
) PARTITION BY RANGE (logdate);
我们可以通过select relkind, relispartition, relhassubclass from pg_class where relname = 'measurement'
来查看该表目前的相关属性。
select relkind, relispartition, relhassubclass from pg_class where relname = 'measurement';
| relkind | relispartition | relhassubclass |
| p | f | f | # measurement为分区表,该表不是子分区,目前没有下级子表
你可能需要决定在分区键中使用多列进行范围分区。当然,这通常会导致较大数量的分区,其中每一个个体都比较小。另一方面,使用较少的列可能会导致粗粒度的分区策略得到较少数量的分区。如果条件涉及这些列中的一部分或者全部,访问分区表的查询将不得不扫描较少的分区。例如,考虑一个使用列lastname和firstname(按照这样的顺序)作为分区键进行范围分区的表。
创建分区。每个分区的定义必须指定对应于父表的分区方法和分区键的边界。注意,如果指定的边界使得新分区的值会与已有分区中的值重叠,则会导致错误。向父表中插入无法映射到任何现有分区的数据将会导致错误,这种情况下应该手工增加一个合适的分区。分区以普通PostgreSQL表(或者可能是外部表)的方式创建。可以为每个分区单独指定表空间和存储参数。
CREATE TABLE measurement_y2006m02 PARTITION OF measurement
FOR VALUES FROM ('2006-02-01') TO ('2006-03-01');
CREATE TABLE measurement_y2006m03 PARTITION OF measurement
FOR VALUES FROM ('2006-03-01') TO ('2006-04-01');
...
CREATE TABLE measurement_y2007m11 PARTITION OF measurement
FOR VALUES FROM ('2007-11-01') TO ('2007-12-01');
CREATE TABLE measurement_y2007m12 PARTITION OF measurement
FOR VALUES FROM ('2007-12-01') TO ('2008-01-01')
TABLESPACE fasttablespace;
CREATE TABLE measurement_y2008m01 PARTITION OF measurement
FOR VALUES FROM ('2008-01-01') TO ('2008-02-01')
WITH (parallel_workers = 4)
TABLESPACE fasttablespace;
我们可以通过select relkind, relispartition, relhassubclass from pg_class where relname = 'measurement_y2007m11'
来查看该表目前的相关属性。
select relkind, relispartition, relhassubclass from pg_class where relname = 'measurement';
| relkind | relispartition | relhassubclass |
| p | f | t | # measurement为分区表,该表不是子分区,有下级子表
select relkind, relispartition, relhassubclass from pg_class where relname = 'measurement_y2006m02';
| relkind | relispartition | relhassubclass |
| r | t | f | # measurement为普通表,该表是子分区,没有下级子表
select relkind, relispartition, relhassubclass from pg_class where relname = 'measurement_y2006m03';
| relkind | relispartition | relhassubclass |
| r | t | f | # measurement为普通表,该表是子分区,没有下级子表
...
为了实现子分区,在创建分区的命令中指定PARTITION BY子句,例如:
CREATE TABLE measurement_y2006m01 PARTITION OF measurement
FOR VALUES FROM ('2006-01-01') TO ('2006-02-01')
PARTITION BY RANGE (peaktemp);
在创建了measurement_y2006m01的分区之后,任何被插入到measurement中且被映射到measurement_y2006m01的数据(或者直接被插入到measurement_y2006m01的数据,假定它满足这个分区的分区约束)将被基于peaktemp列进一步重定向到measurement_y2006m01的一个分区。指定的分区键可以与父亲的分区键重叠,不过在指定子分区的边界时要注意它接受的数据集合是分区自身边界允许的数据集合的一个子集,系统不会尝试检查事情情况是否如此。
我们可以通过select relkind, relispartition, relhassubclass from pg_class where relname = 'measurement_y2006m01
来查看该表目前的相关属性。
select relkind, relispartition, relhassubclass from pg_class where relname = 'measurement';
| relkind | relispartition | relhassubclass |
| p | f | t | # measurement为分区表,该表不是子分区,有下级子表
select relkind, relispartition, relhassubclass from pg_class where relname = 'measurement_y2006m01';
| relkind | relispartition | relhassubclass |
| p | t | f | # measurement为分区表,该表是子分区,没有下级子表(还没创建子分区)
select relkind, relispartition, relhassubclass from pg_class where relname = 'measurement_y2006m02';
| relkind | relispartition | relhassubclass |
| r | t | f | # measurement为普通表,该表是子分区,没有下级子表
上述流程创建的表的关系如下所示:
表名 relkind relispartition relhassubclass
measurement 分区表 父表(非分区) 有下级子表
| - measurement_y2006m01 分区表 子分区 无下级子表
| - measurement_y2006m02 普通表 子分区 无下级子表
| - measurement_y2006m03 普通表 子分区 无下级子表
|
创建继承表举例
虽然内建的声明式分区适合于大部分常见的用例,但还是有一些场景需要更加灵活的方法。分区可以使用表继承来实现,这能够带来一些声明式分区不支持的特性,例如:
- 对声明式分区来说,分区必须具有和分区表正好相同的列集合,而在表继承中,子表可以有父表中没有出现过的额外列。
- 表继承允许多继承。
- 声明式分区仅支持范围、列表以及哈希分区,而表继承允许数据按照用户的选择来划分(不过注意,如果约束排除不能有效地剪枝子表,查询性能可能会很差)。
- 在使用声明式分区时,一些操作比使用表继承时要求更长的持锁时间。例如,向分区表中增加分区或者从分区表移除分区要求在父表上取得一个ACCESS EXCLUSIVE锁,而在常规继承的情况下一个SHARE UPDATE EXCLUSIVE锁就足够了。
我们使用上面用过的同一个measurement表。我们可以通过select relkind, relispartition, relhassubclass from pg_class where relname = 'measurement'
来查看该表目前的相关属性。
select relkind, relispartition, relhassubclass from pg_class where relname = 'measurement';
| relkind | relispartition | relhassubclass |
| r | f | f | # measurement为普通,该表不是子分区,目前没有下级子表
为了使用继承实现分区,可使用下面的步骤:创建“主”表,所有的“子”表都将从它继承。这个表将不包含数据。不要在这个表上定义任何检查约束,除非想让它们应用到所有的子表上。同样,在这个表上定义索引或者唯一约束也没有意义。对于我们的例子来说,主表是最初定义的measurement表。创建数个“子”表,每一个都从主表继承。通常,这些表将不会在从主表继承的列集合之外增加任何列。正如声明性分区那样,这些表就是普通的PostgreSQL表(或者外部表)。
CREATE TABLE measurement_y2006m02 () INHERITS (measurement);
CREATE TABLE measurement_y2006m03 () INHERITS (measurement);
...
CREATE TABLE measurement_y2007m11 () INHERITS (measurement);
CREATE TABLE measurement_y2007m12 () INHERITS (measurement);
CREATE TABLE measurement_y2008m01 () INHERITS (measurement);
我们可以通过select relkind, relispartition, relhassubclass from pg_class where relname = 'measurement'
来查看该表目前的相关属性。
select relkind, relispartition, relhassubclass from pg_class where relname = 'measurement';
| relkind | relispartition | relhassubclass |
| r | f | t | # measurement为普通,该表不是子分区,有下级子表
select relkind, relispartition, relhassubclass from pg_class where relname = 'measurement_y2006m02';
| relkind | relispartition | relhassubclass |
| r | f | f | # measurement_y2006m02为普通,该表不是子分区,目前没有下级子表
如果我们再创建一个表measurement_y2006m01继承自measurement,创建一个表measurement_y2006m01d01继承自measurement_y2006m01。我们通过select relkind, relispartition, relhassubclass from pg_class where relname = 'measurement'
来查看该表目前的相关属性。
CREATE TABLE measurement_y2006m01 () INHERITS (measurement);
CREATE TABLE measurement_y2006m01d01 () INHERITS (measurement_y2006m01);
select relkind, relispartition, relhassubclass from pg_class where relname = 'measurement';
| relkind | relispartition | relhassubclass |
| r | f | t | # measurement为普通,该表不是子分区,有下级子表
select relkind, relispartition, relhassubclass from pg_class where relname = 'measurement_y2006m01';
| relkind | relispartition | relhassubclass |
| r | f | t | # measurement_y2006m01为普通表,该表不是子分区,有下级子表
select relkind, relispartition, relhassubclass from pg_class where relname = 'measurement_y2006m01d01';
| relkind | relispartition | relhassubclass |
| r | f | f | # measurement_y2006m01d01为普通表,该表不是子分区,目前没有下级子表
为子表增加不重叠的表约束来定义每个分区允许的键值。确保约束能保证不同子表允许的键值之间没有重叠。典型的例子是:
CHECK ( x = 1 )
CHECK ( county IN ( 'Oxfordshire', 'Buckinghamshire', 'Warwickshire' ))
CHECK ( outletID >= 100 AND outletID < 200 )
像下面这样创建子表会更好:
CREATE TABLE measurement_y2006m02 (
CHECK ( logdate >= DATE '2006-02-01' AND logdate < DATE '2006-03-01' )
) INHERITS (measurement);
CREATE TABLE measurement_y2006m03 (
CHECK ( logdate >= DATE '2006-03-01' AND logdate < DATE '2006-04-01' )
) INHERITS (measurement);
...
CREATE TABLE measurement_y2007m11 (
CHECK ( logdate >= DATE '2007-11-01' AND logdate < DATE '2007-12-01' )
) INHERITS (measurement);
CREATE TABLE measurement_y2007m12 (
CHECK ( logdate >= DATE '2007-12-01' AND logdate < DATE '2008-01-01' )
) INHERITS (measurement);
CREATE TABLE measurement_y2008m01 (
CHECK ( logdate >= DATE '2008-01-01' AND logdate < DATE '2008-02-01' )
) INHERITS (measurement);
上述流程创建的表的关系如下所示:
表名 relkind relispartition relhassubclass
measurement 普通表 非分区 有下级子表
| - measurement_y2006m01 普通表 非分区 有下级子表
| - measurement_y2006m01d01 普通表 非分区 无下级子表
| - measurement_y2006m02 普通表 非分区 无下级子表
| - measurement_y2006m03 普通表 非分区 无下级子表
|
ExecVacuum流程对不同表的处理
首先我们向void ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)
函数传入VacuumStmt结构体,该函数首先会处理解析VacuumStmt->options选项列表,设置VacuumParams params结构体。最后调用vacuum(vacstmt->rels, ¶ms, NULL, isTopLevel)
函数。vacstmt->rels成员存放的是VacuumRelation结构体列表,代表的是需要进行analyze的表relation和列va_cols。
void vacuum(List *relations, VacuumParams *params, BufferAccessStrategy bstrategy, bool isTopLevel)
函数的形参relations如果没有设置为NIL,那它就是要进行vacuum的表(a list of VacuumRelation to process);否则就需要处理数据库中所有相关的表。vacuum前期工作是:进行事务块检测、不可重入校验、为跨事务存储创建特殊内存上下文、建立要处理的relation列表和决定是否需要启动/提交自己的事务。这里我们关注建立要处理的relation列表流程。
/* Build list of relation(s) to process, putting any new data in vac_context for safekeeping. */
if (relations != NIL) {
List *newrels = NIL; ListCell *lc;
foreach(lc, relations) { // 遍历上图中的VacuumStmt->rels列表
VacuumRelation *vrel = lfirst_node(VacuumRelation, lc);
List *sublist;
MemoryContext old_context;
sublist = expand_vacuum_rel(vrel, params->options); // 调用expand_vacuum_rel函数获取VacuumRelation代表的表所关联的表
old_context = MemoryContextSwitchTo(vac_context);
newrels = list_concat(newrels, sublist);
MemoryContextSwitchTo(old_context);
}
relations = newrels;
} else relations = get_all_vacuum_rels(params->options); // 需要获取数据库中所有相关的表
get_all_vacuum_rels函数流程比较简单,扫描pg_class系统表,把RELKIND_RELATION、RELKIND_MATVIEW、RELKIND_PARTITIONED_TABLE类型的表做成VacuumRelation列表返回。
expand_vacuum_rel函数获取VacuumRelation代表的表所关联的哪些表呢?
- 如果VacuumRelation->oid被指定了,也就是明确指定分析该表,直接返回。
- 如果VacuumRelation->oid需要分析表的oid没有被指定(通常直接执行analyze table流程中,需要分析表的oid是没有被指定)。取
relid = RangeVarGetRelidExtended(vrel->relation, AccessShareLock, rvr_opts, NULL, NULL)
需要分析表的oid;通过通过需要分析的表oid从syscahce查找对应的pg_class条目,通过classForm->relkind对应字段,如果为RELKIND_PARTITIONED_TABLE)【除了分区叶子子表,其他分区表都属于该类,如上例子分析所示】,设置include_parts为true;如果include_parts为true,则需要调用find_all_inheritors函数(上一篇我们介绍过这个函数)查找以relid为根节点表的继承树上的所有子表;最终返回VacuumRelation列表,每个元素代表一个需要分析的表。
比如:分析measurement和measurement_y2006m01表,include_parts就应该设置为true,也就是分区表;其他measurement_y2006m02、measurement_y2006m03和measurement_y2006m04是普通表。如果分析measurement表,如下继承树上的所有子表都将作为VacuumRelation结构体返回;如果分析measurement_y2006m01表,仅仅将measurement_y2006m01表作为VacuumRelation结构体返回。
表名 relkind relispartition relhassubclass
measurement 分区表 父表(非分区) 有下级子表
| - measurement_y2006m01 分区表 子分区 无下级子表
| - measurement_y2006m02 普通表 子分区 无下级子表
| - measurement_y2006m03 普通表 子分区 无下级子表
|
static List *expand_vacuum_rel(VacuumRelation *vrel, int options){
List *vacrels = NIL;
MemoryContext oldcontext;
if (OidIsValid(vrel->oid)) { /* If caller supplied OID, there's nothing we need do here. */
oldcontext = MemoryContextSwitchTo(vac_context);
vacrels = lappend(vacrels, vrel);
MemoryContextSwitchTo(oldcontext);
}else{ /* Process a specific relation, and possibly partitions thereof */
Oid relid; HeapTuple tuple; Form_pg_class classForm;
bool include_parts; int rvr_opts;
/* We transiently take AccessShareLock to protect the syscache lookup below, as well as find_all_inheritors's expectation that the caller holds some lock on the starting relation. */ // 我们暂时使用AccessShareLock来保护下面的syscache查找,以及find_all_inheriters对调用方持有起始关系的某些锁的期望
rvr_opts = (options & VACOPT_SKIP_LOCKED) ? RVR_SKIP_LOCKED : 0;
relid = RangeVarGetRelidExtended(vrel->relation, AccessShareLock, rvr_opts, NULL, NULL);
/* If the lock is unavailable, emit the same log statement that vacuum_rel() and analyze_rel() would. */ // 如果锁不可用,则发出与vacuum_rel和analyze_rel相同的日志语句。
if (!OidIsValid(relid)){
if (options & VACOPT_VACUUM) ereport(WARNING,(errcode(ERRCODE_LOCK_NOT_AVAILABLE),errmsg("skipping vacuum of \"%s\" --- lock not available",vrel->relation->relname)));
else ereport(WARNING,(errcode(ERRCODE_LOCK_NOT_AVAILABLE),errmsg("skipping analyze of \"%s\" --- lock not available",vrel->relation->relname)));
return vacrels;
}
/* To check whether the relation is a partitioned table and its ownership, fetch its syscache entry. */ // 要检查关系是否是分区表及其所有权,请获取其syscache条目
tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relid)); // 通过需要分析的表oid从syscahce查找对应的pg_class条目
if (!HeapTupleIsValid(tuple)) elog(ERROR, "cache lookup failed for relation %u", relid);
classForm = (Form_pg_class) GETSTRUCT(tuple);
if (vacuum_is_relation_owner(relid, classForm, options)){ /* Make a returnable VacuumRelation for this rel if user is a proper owner. */
oldcontext = MemoryContextSwitchTo(vac_context);
vacrels = lappend(vacrels, makeVacuumRelation(vrel->relation,relid,vrel->va_cols)); // 将VacuumRelation结构体的信息、需要分析的表oid和需要分析的列制作成VacuumRelation结构体
MemoryContextSwitchTo(oldcontext);
}
include_parts = (classForm->relkind == RELKIND_PARTITIONED_TABLE); // 除了分区叶子子表,其他分区表都属于该类
ReleaseSysCache(tuple);
/* If it is, make relation list entries for its partitions. Note that the list returned by find_all_inheritors() includes the passed-in OID, so we have to skip that. There's no point in taking locks on the individual partitions yet, and doing so would just add unnecessary deadlock risk. For this last reason we do not check yet the ownership of the partitions, which get added to the list to process. Ownership will be checked later on anyway. */
if (include_parts){
List *part_oids = find_all_inheritors(relid, NoLock, NULL); ListCell *part_lc;
foreach(part_lc, part_oids){
Oid part_oid = lfirst_oid(part_lc);
if (part_oid == relid) continue; /* ignore original table */ // 将指定分析的表剔除,因为find_all_inheritors返回的列表第一个元素就是分析的表剔除
oldcontext = MemoryContextSwitchTo(vac_context);
vacrels = lappend(vacrels, makeVacuumRelation(NULL, part_oid, vrel->va_cols)); /* We omit a RangeVar since it wouldn't be appropriate to complain about failure to open one of these relations later. */
MemoryContextSwitchTo(oldcontext);
}
}
UnlockRelationOid(relid, AccessShareLock);
}
return vacrels;
}
analyze流程对不同表的处理
void analyze_rel(Oid relid, RangeVar *relation, VacuumParams *params, List *va_cols, bool in_outer_xact, BufferAccessStrategy bstrategy)
函数分析单个表,形参relid标识要分析的relation oid。如果提供了形参relation,则使用其中的名称报告打开/锁定表的任何失败;一旦我们成功打开表,就不要使用它,因为它可能已过时。我们知道analyze命令可以分析整个数据库、指定的某几个表、指定表的某几个列,所以这里我们可以传入va_cols代表表的某几个列。
void analyze_rel(Oid relid, RangeVar *relation, VacuumParams *params, List *va_cols, bool in_outer_xact, BufferAccessStrategy bstrategy) {
Relation onerel; int elevel;
AcquireSampleRowsFunc acquirefunc = NULL; BlockNumber relpages = 0;
...
onerel = vacuum_open_relation(relid, relation, params->options & ~(VACOPT_VACUUM), params->log_min_duration >= 0, ShareUpdateExclusiveLock);
if (!onerel)return; /* leave if relation could not be opened or locked */
if (!vacuum_is_relation_owner(RelationGetRelid(onerel),onerel->rd_rel, params->options & VACOPT_ANALYZE)){ relation_close(onerel, ShareUpdateExclusiveLock);return; }
if (RELATION_IS_OTHER_TEMP(onerel)){ relation_close(onerel, ShareUpdateExclusiveLock);return; }
/* We can ANALYZE any table except pg_statistic. See update_attstats */
if (RelationGetRelid(onerel) == StatisticRelationId){ relation_close(onerel, ShareUpdateExclusiveLock);return; }
if (onerel->rd_rel->relkind == RELKIND_RELATION || onerel->rd_rel->relkind == RELKIND_MATVIEW) { /* Check that it's of an analyzable relkind, and set up appropriately. */
acquirefunc = acquire_sample_rows; /* Regular table, so we'll use the regular row acquisition function */
relpages = RelationGetNumberOfBlocks(onerel); /* Also get regular table's size */
}else if (onerel->rd_rel->relkind == RELKIND_FOREIGN_TABLE){ /* For a foreign table, call the FDW's hook function to see whether it supports analysis. */
FdwRoutine *fdwroutine; bool ok = false;
fdwroutine = GetFdwRoutineForRelation(onerel, false);
if (fdwroutine->AnalyzeForeignTable != NULL)
ok = fdwroutine->AnalyzeForeignTable(onerel, &acquirefunc,&relpages);
if (!ok){ ereport(WARNING,(errmsg("skipping \"%s\" --- cannot analyze this foreign table",RelationGetRelationName(onerel)))); relation_close(onerel, ShareUpdateExclusiveLock); return; }
} else if (onerel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) {
/* For partitioned tables, we want to do the recursive ANALYZE below. */ }
else { /* No need for a WARNING if we already complained during VACUUM */
if (!(params->options & VACOPT_VACUUM)) ereport(WARNING,(errmsg("skipping \"%s\" --- cannot analyze non-tables or special system tables",RelationGetRelationName(onerel))));
relation_close(onerel, ShareUpdateExclusiveLock); return;
}
...
if (onerel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE) /* Do the normal non-recursive ANALYZE. We can skip this for partitioned tables, which don't contain any rows. */
do_analyze_rel(onerel, params, va_cols, acquirefunc, relpages, false, in_outer_xact, elevel);
if (onerel->rd_rel->relhassubclass) /* If there are child tables, do recursive ANALYZE. */
do_analyze_rel(onerel, params, va_cols, acquirefunc, relpages, true, in_outer_xact, elevel);
...
}
如果分析的表是measurement_y2006m02子分区(普通表),那么只要执行analyze_rel(measurement_y2006m02_oid, …),采样函数使用acquire_sample_rows,表页数使用RelationGetNumberOfBlocks(onerel)获取;另外,如果分析的表为FDW(FDW表直接定义使用或者作为分区表的叶子分区使用),调用fdwroutine->AnalyzeForeignTable(onerel, &acquirefunc,&relpages)
获取采样函数和表页数;然后直接进入第一处的do_analyze_rel(onerel, params, va_cols, acquirefunc, relpages, false, in_outer_xact, elevel)
。如果分析的表是measurement表,则需要顺序执行analyze_rel对measurement继承树进行广度遍历的的所有子分区的表进行分析;首先会对measurement表进行分析,该表是分区表且有下级子表,走第二个分支do_analyze_rel(onerel, params, va_cols, acquirefunc, relpages, true, in_outer_xact, elevel)
;当执行到measurement_y2006m01分区表时,由于其是分区表,不会走第一个do_analyze_rel
,又因为其无下级子表,也不会走第二个do_analyze_rel
,所以该表不会进行分析;当执行到measurement_y2006m02,由于是普通表且无下级子表,采样函数使用acquire_sample_rows,表页数使用RelationGetNumberOfBlocks(onerel)获取,执行第一个do_analyze_rel
。如果measurement_y2006m01表下有子分区的话,其relhassubclass为true,也就是有下级子表,其会走第二个do_analyze_rel
,因此第二个分支是用于处理有下级子表的分区表的流程。
表名 relkind relispartition relhassubclass
measurement 分区表 父表(非分区) 有下级子表
| - measurement_y2006m01 分区表 子分区 无下级子表
| - measurement_y2006m02 普通表 子分区 无下级子表
| - measurement_y2006m03 普通表 子分区 无下级子表
|
如果分析的表是measurement_y2006m02,和上述流程相似,采样函数使用acquire_sample_rows,表页数使用RelationGetNumberOfBlocks(onerel)获取,走第一个do_analyze_rel分支。如果分析的表是measurement,则需要顺序执行analyze_rel对measurement继承树进行广度遍历的的所有子分区的表进行分析;对于measurement_y2006m01表由于是普通表,所以会进入第一个do_analyze_rel分支,由于又有下级子表,又会进入第二个do_analyze_rel分支;对于measurement_y2006m02、measurement_y2006m03、measurement_y2006m04这些普通表,则直接进入第一个分支。
表名 relkind relispartition relhassubclass
measurement 普通表 非分区 有下级子表
| - measurement_y2006m01 普通表 非分区 有下级子表
| - measurement_y2006m01d01 普通表 非分区 无下级子表
| - measurement_y2006m02 普通表 非分区 无下级子表
| - measurement_y2006m03 普通表 非分区 无下级子表
|
do_analyze_rel对分区表(有下级子表)的处理
从上面的分析可以看出对于measurement和measurement_y2006m01两类表analyze_rel函数都会进入do_analyze_rel(onerel, params, va_cols, acquirefunc, relpages, true, in_outer_xact, elevel)
分支(注意第6个参数inh为true)。这里来分析一下该函数对于有下级子表的表的分析动作。我们知道do_analyze_rel函数主要分为准备工作、采样工作、更新统计信息流程三大流程。
do_analyze_rel函数准备工作流程如下:创建anl_context内存上下文,并切换到该内存上下文中;切换到表所有者的userid,以便任何索引函数都作为该用户运行;需要进一步确定要分析一个表中的哪些列:用户可能指定只分析表中的某几个列,被频繁访问的列才更有被分析的价值,然后还要打开待分析表的所有索引,看看是否有可以被分析的列。对于准备工作流程来说,分析有子表的父表时,不去获取父表索引的列。
static void do_analyze_rel(Relation onerel, VacuumParams *params, List *va_cols, AcquireSampleRowsFunc acquirefunc, BlockNumber relpages, bool inh, bool in_outer_xact, int elevel) {
int attr_cnt,tcnt,i,ind, nindexes, targrows,numrows, save_sec_context,save_nestlevel;
double totalrows,totaldeadrows; Oid save_userid;
Relation *Irel; bool hasindex;
VacAttrStats **vacattrstats; AnlIndexData *indexdata; HeapTuple *rows;
PGRUsage ru0; TimestampTz starttime = 0; MemoryContext caller_context;
anl_context = AllocSetContextCreate(CurrentMemoryContext,"Analyze",ALLOCSET_DEFAULT_SIZES); /* Set up a working context so that we can easily free whatever junk gets created. */
caller_context = MemoryContextSwitchTo(anl_context);
GetUserIdAndSecContext(&save_userid, &save_sec_context); /* Switch to the table owner's userid, so that any index functions are run as that user. Also lock down security-restricted operations and arrange to make GUC variable changes local to this command. */
SetUserIdAndSecContext(onerel->rd_rel->relowner, save_sec_context | SECURITY_RESTRICTED_OPERATION);
save_nestlevel = NewGUCNestLevel();
// 处理被分析的列,这里我们不需要看
/* Determine which columns to analyze Note that system attributes are never analyzed, so we just reject them at the lookup stage. We also reject duplicate column mentions. (We could alternatively ignore duplicates, but analyzing a column twice won't work; we'd end up making a conflicting update in pg_statistic.) */
if (va_cols != NIL) { // 有指明需要分析的列
Bitmapset *unique_cols = NULL; ListCell *le;
vacattrstats = (VacAttrStats **) palloc(list_length(va_cols) * sizeof(VacAttrStats *));
tcnt = 0;
foreach(le, va_cols){
char *col = strVal(lfirst(le));
i = attnameAttNum(onerel, col, false);
if (i == InvalidAttrNumber) ereport(ERROR,(errcode(ERRCODE_UNDEFINED_COLUMN),errmsg("column \"%s\" of relation \"%s\" does not exist",col, RelationGetRelationName(onerel))));
if (bms_is_member(i, unique_cols)) ereport(ERROR,(errcode(ERRCODE_DUPLICATE_COLUMN),errmsg("column \"%s\" of relation \"%s\" appears more than once",col, RelationGetRelationName(onerel))));
unique_cols = bms_add_member(unique_cols, i);
vacattrstats[tcnt] = examine_attribute(onerel, i, NULL);
if (vacattrstats[tcnt] != NULL) tcnt++;
}
attr_cnt = tcnt;
}else{ // 获取表的所有列
attr_cnt = onerel->rd_att->natts;
vacattrstats = (VacAttrStats **)=palloc(attr_cnt * sizeof(VacAttrStats *));
tcnt = 0;
for (i = 1; i <= attr_cnt; i++)={
vacattrstats[tcnt] = examine_attribute(onerel, i, NULL);
if (vacattrstats[tcnt] != NULL) tcnt++;
}
attr_cnt = tcnt;
}
/* Open all indexes of the relation, and see if there are any analyzable columns in the indexes. We do not analyze index columns if there was an explicit column list in the ANALYZE command, however. If we are doing a recursive scan, we don't want to touch the parent's indexes at all. */
if (!inh) vac_open_indexes(onerel, AccessShareLock, &nindexes, &Irel);
else { // 分析有下级子表的父表时,不处理父表索引
Irel = NULL; nindexes = 0;
}
hasindex = (nindexes > 0);
indexdata = NULL;
if (hasindex) {
indexdata = (AnlIndexData *) palloc0(nindexes * sizeof(AnlIndexData));
for (ind = 0; ind < nindexes; ind++){
AnlIndexData *thisdata = &indexdata[ind];
IndexInfo *indexInfo;
thisdata->indexInfo = indexInfo = BuildIndexInfo(Irel[ind]);
thisdata->tupleFract = 1.0; /* fix later if partial */
if (indexInfo->ii_Expressions != NIL && va_cols == NIL){
ListCell *indexpr_item = list_head(indexInfo->ii_Expressions);
thisdata->vacattrstats = (VacAttrStats **)palloc(indexInfo->ii_NumIndexAttrs * sizeof(VacAttrStats *));
tcnt = 0;
for (i = 0; i < indexInfo->ii_NumIndexAttrs; i++){
int keycol = indexInfo->ii_IndexAttrNumbers[i];
if (keycol == 0){/* Found an index expression */
Node *indexkey;
if (indexpr_item == NULL) /* shouldn't happen */elog(ERROR, "too few entries in indexprs list");
indexkey = (Node *) lfirst(indexpr_item);
indexpr_item = lnext(indexpr_item);
thisdata->vacattrstats[tcnt] = examine_attribute(Irel[ind], i + 1, indexkey);
if (thisdata->vacattrstats[tcnt] != NULL) tcnt++;
}
}
thisdata->attr_cnt = tcnt;
}
}
}
do_analyze_rel函数采样工作流程如下:使用所有可分析列中的最坏情况,确定需要采样的行数;使用采样函数获取采样行;计算统计数据。这里可以看出对于有子表的父表来说,需要使用指定acquire_inherited_sample_rows函数进行采样。
= 100;
for (i = 0; i < attr_cnt; i++){ if (targrows < vacattrstats[i]->minrows) targrows = vacattrstats[i]->minrows; }
for (ind = 0; ind < nindexes; ind++) {
AnlIndexData *thisdata = &indexdata[ind];
for (i = 0; i < thisdata->attr_cnt; i++){ if (targrows < thisdata->vacattrstats[i]->minrows) targrows = thisdata->vacattrstats[i]->minrows; }
}
rows = (HeapTuple *) palloc(targrows * sizeof(HeapTuple)); /* Acquire the sample rows */
if (inh) numrows = acquire_inherited_sample_rows(onerel, elevel,rows, targrows,&totalrows, &totaldeadrows);
else numrows = (*acquirefunc) (onerel, elevel, rows, targrows, &totalrows, &totaldeadrows);
do_analyze_rel函数更新统计信息流程如下:将完成的统计数据行发送到pg_statistic,替换目标列的任何以前的统计数据;构建扩展统计数据;更新pg_class中的页面/元组统计;向统计收集器报告分析
/* Compute the statistics. Temporary results during the calculations for each column are stored in a child context. The calc routines are responsible to make sure that whatever they store into the VacAttrStats structure is allocated in anl_context. */
if (numrows > 0) {
MemoryContext col_context, old_context;
col_context = AllocSetContextCreate(anl_context,"Analyze Column",ALLOCSET_DEFAULT_SIZES);
old_context = MemoryContextSwitchTo(col_context);
for (i = 0; i < attr_cnt; i++){
VacAttrStats *stats = vacattrstats[i];
AttributeOpts *aopt;
stats->rows = rows;
stats->tupDesc = onerel->rd_att;
stats->compute_stats(stats, std_fetch_func, numrows, totalrows);
/* If the appropriate flavor of the n_distinct option is specified, override with the corresponding value. */
aopt = get_attribute_options(onerel->rd_id, stats->attr->attnum);
if (aopt != NULL){
float8 n_distinct;
n_distinct = inh ? aopt->n_distinct_inherited : aopt->n_distinct; // 这里不同
if (n_distinct != 0.0) stats->stadistinct = n_distinct;
}
MemoryContextResetAndDeleteChildren(col_context);
}
if (hasindex) compute_index_stats(onerel, totalrows,indexdata, nindexes, rows, numrows, col_context);
MemoryContextSwitchTo(old_context);
MemoryContextDelete(col_context);
/*
* Emit the completed stats rows into pg_statistic, replacing any
* previous statistics for the target columns. (If there are stats in
* pg_statistic for columns we didn't process, we leave them alone.)
*/
update_attstats(RelationGetRelid(onerel), inh, attr_cnt, vacattrstats); // 这里不同
for (ind = 0; ind < nindexes; ind++){
AnlIndexData *thisdata = &indexdata[ind];
update_attstats(RelationGetRelid(Irel[ind]), false, thisdata->attr_cnt, thisdata->vacattrstats);
}
/*
* Build extended statistics (if there are any).
*
* For now we only build extended statistics on individual relations,
* not for relations representing inheritance trees.
*/
if (!inh)
BuildRelationExtStatistics(onerel, totalrows, numrows, rows, attr_cnt, vacattrstats); // 这里不同
}
/* Update pages/tuples stats in pg_class ... but not if we're doing inherited stats.
*/
if (!inh) { // 这里不同
BlockNumber relallvisible;
visibilitymap_count(onerel, &relallvisible, NULL);
vac_update_relstats(onerel,
relpages,
totalrows,
relallvisible,
hasindex,
InvalidTransactionId,
InvalidMultiXactId,
in_outer_xact);
}
/*
* Same for indexes. Vacuum always scans all indexes, so if we're part of
* VACUUM ANALYZE, don't overwrite the accurate count already inserted by
* VACUUM.
*/
if (!inh && !(params->options & VACOPT_VACUUM)) // 这里不同
{
for (ind = 0; ind < nindexes; ind++)
{
AnlIndexData *thisdata = &indexdata[ind];
double totalindexrows;
totalindexrows = ceil(thisdata->tupleFract * totalrows);
vac_update_relstats(Irel[ind],
RelationGetNumberOfBlocks(Irel[ind]),
totalindexrows,
0,
false,
InvalidTransactionId,
InvalidMultiXactId,
in_outer_xact);
}
}
/*
* Report ANALYZE to the stats collector, too. However, if doing
* inherited stats we shouldn't report, because the stats collector only
* tracks per-table stats. Reset the changes_since_analyze counter only
* if we analyzed all columns; otherwise, there is still work for
* auto-analyze to do.
*/
if (!inh) // 这里不同
pgstat_report_analyze(onerel, totalrows, totaldeadrows, (va_cols == NIL));
}
分布键系统表、优化器怎么使用分布键进行选择路径、Greenplum统计信息