PostgreSQL的存储管理器的任务是满足ACID兼容的复杂的工作。 拥有一套有据可查的一系列算法,我不会在这里赘述了。 像任何存储系统或数据库, PostgreSQL的表就可以开始采取自由空间(因为它有时也被称为,膨胀)。 虽然使用的自由空间是不是一件坏事,也有一些情况下它可以变得笨拙。
在一个高级别:
- 在一个UPDATE或DELETE Postgres的,该行不会被删除。它被标记为重新使用“自由空间”。
- 如果有适合的自由空间(或者你有没有新行)没有入境行,也被标记为空闲的“死行”,现在占用的磁盘空间。
有时候我们看到的是,只有几千行总计不到1 GB的表, 我已经看到了使用的磁盘空间100GB。 这是由一个老版本的石英调度使用的队列。 该表是高容量,纯INSERT和DELETE。 从来没有为死空间被发现并重新使用的任何机会。
因此, 我们已经确定的交通模式具有高容量UPDATE和DELETE可导致膨胀, 但是,实际上,我怎么监督呢?
有真的在这里,虽然两所学校:
- 深层扫描表, 读取所有的活的&失效的行
- 基于目录的估计
##难道膨胀一定不好?
在继续寻找膨胀之前,让我们理解其含意。 如果你已经花了很多时间与数据库管理员, 你可能听说过他们讨论的自由空间。 自由空间是已被分配的磁盘空间,并且可用于使用。 这种类型的可用空间是非常有效的使用与分配新的块。 我们的目标不是消除所有膨胀。我们的目标是消除过度膨胀。 这里的查询将要或者明确发现或估计的关系的可用空间的量。 我不建议去上对所有建成自由空间的一个堂吉诃德式的探索。只是消除多余的膨胀。
深层扫描
第一个策略将是确定膨胀的最精确的方法。 您可以看到到底有多少膨胀是在表中。没有与此战略的几个问题:
- 关系的深层扫描需要时间
- 从服务器深度扫描需求的I/O
换句话说:这将需要很长的时间,最有可能影响系统性能。
我今天不打算深入了解此选项,如果你有兴趣,看看在pgstattuple等模块。 (注:pgstattuple等也有一个pgstattuple_approx)中的最新版本的功能。 虽然它确实接近膨胀(使深扫描稍快),但它仍然扫描的关系。
##系统元数据(catalog)
Postgres的收集有关表和索引的统计信息, 以便有效地对它们进行查询(这是通过“分析”,另一天的讨论完成)。 一些良好的放置查询可以使用这些统计信息来估计表中的膨胀的量。 由于我们使用的是元数据,不需要深度扫描。
缺点是,我们不会有100%准确的统计数据,这只是一个估计值。
我喜欢一个战略,让我快速生成潜在问题的进一步挖掘之前的列表。 从本质上讲,我开始估计再钻,在寻找更多的细节。
很高兴,我们拥有两个很受欢迎的表膨胀估计查询。
最为著名的是作为check_postgres.pl(Nagios 中的PostgreSQL插件)的一部分。 就个人而言,我还发现他有一个难点是他的结果上手稍稍难一点。
我目前最喜欢的查询语句是另外一个,提供了潜在膨胀问题,结果可读性好 链接:
/* WARNING: executed with a non-superuser role,
* the query inspect only tables you are granted to read.
* This query is compatible with PostgreSQL 9.0 and more
*/
SELECT current_database(), schemaname, tblname, bs*tblpages AS real_size,
(tblpages-est_tblpages)*bs AS extra_size,
CASE WHEN tblpages - est_tblpages > 0
THEN 100 * (tblpages - est_tblpages)/tblpages::float
ELSE 0
END AS extra_ratio, fillfactor, (tblpages-est_tblpages_ff)*bs AS bloat_size,
CASE WHEN tblpages - est_tblpages_ff > 0
THEN 100 * (tblpages - est_tblpages_ff)/tblpages::float
ELSE 0
END AS bloat_ratio, is_na
-- , (pst).free_percent + (pst).dead_tuple_percent AS real_frag
FROM (
SELECT ceil( reltuples / ( (bs-page_hdr)/tpl_size ) ) + ceil( toasttuples / 4 ) AS est_tblpages,
ceil( reltuples / ( (bs-page_hdr)*fillfactor/(tpl_size*100) ) ) + ceil( toasttuples / 4 ) AS est_tblpages_ff,
tblpages, fillfactor, bs, tblid, schemaname, tblname, heappages, toastpages, is_na
-- , stattuple.pgstattuple(tblid) AS pst
FROM (
SELECT
( 4 + tpl_hdr_size + tpl_data_size + (2*ma)
- CASE WHEN tpl_hdr_size%ma = 0 THEN ma ELSE tpl_hdr_size%ma END
- CASE WHEN ceil(tpl_data_size)::int%ma = 0 THEN ma ELSE ceil(tpl_data_size)::int%ma END
) AS tpl_size, bs - page_hdr AS size_per_block, (heappages + toastpages) AS tblpages, heappages,
toastpages, reltuples, toasttuples, bs, page_hdr, tblid, schemaname, tblname, fillfactor, is_na
FROM (
SELECT
tbl.oid AS tblid, ns.nspname AS schemaname, tbl.relname AS tblname, tbl.reltuples,
tbl.relpages AS heappages, coalesce(toast.relpages, 0) AS toastpages,
coalesce(toast.reltuples, 0) AS toasttuples,
coalesce(substring(
array_to_string(tbl.reloptions, ' ')
FROM '%fillfactor=#"__#"%' FOR '#')::smallint, 100) AS fillfactor,
current_setting('block_size')::numeric AS bs,
CASE WHEN version()~'mingw32' OR version()~'64-bit|x86_64|ppc64|ia64|amd64' THEN 8 ELSE 4 END AS ma,
24 AS page_hdr,
23 + CASE WHEN MAX(coalesce(null_frac,0)) > 0 THEN ( 7 + count(*) ) / 8 ELSE 0::int END
+ CASE WHEN tbl.relhasoids THEN 4 ELSE 0 END AS tpl_hdr_size,
sum( (1-coalesce(s.null_frac, 0)) * coalesce(s.avg_width, 1024) ) AS tpl_data_size,
bool_or(att.atttypid = 'pg_catalog.name'::regtype) AS is_na
FROM pg_attribute AS att
JOIN pg_class AS tbl ON att.attrelid = tbl.oid
JOIN pg_namespace AS ns ON ns.oid = tbl.relnamespace
JOIN pg_stats AS s ON s.schemaname=ns.nspname
AND s.tablename = tbl.relname AND s.inherited=false AND s.attname=att.attname
LEFT JOIN pg_class AS toast ON tbl.reltoastrelid = toast.oid
WHERE att.attnum > 0 AND NOT att.attisdropped
AND tbl.relkind = 'r'
GROUP BY 1,2,3,4,5,6,7,8,9,10, tbl.relhasoids
ORDER BY 2,3
) AS s
) AS s2
) AS s3;
-- WHERE NOT is_na
-- AND tblpages*((pst).free_percent + (pst).dead_tuple_percent)::float4/100 >= 1
正如我所说,这些查询的复杂性是很高的。让我们来看看在输出(如图片,因为文本换行)
current_database | schemaname | tblname | real_size | extra_size | extra_ratio | fillfactor | bloat_size | bloat_ratio | is_na
------------------+------------+------------------------+------------+------------+-------------------+------------+--------------+-------------------+-------
postgres | public | pgbench_accounts | 1411891200 | 39084032 | 2.76820423556716 | 10 | -12316139520 | 0 | f
postgres | public | pgbench_branches | 1105920 | 1097728 | 99.2592592592593 | 10 | 1064960 | 96.2962962962963 | f
postgres | public | pgbench_history | 16867328 | 73728 | 0.437105390966489 | 100 | 73728 | 0.437105390966489 | f
postgres | public | pgbench_tellers | 21422080 | 21372928 | 99.7705544933078 | 10 | 20979712 | 97.9349904397706 | f
postgres | public | t_stats | 32768 | 0 | 0 | 100 | 0 | 0 | f
postgres | snapshots | snap | 65536 | 0 | 0 | 100 | 0 | 0 | f
postgres | snapshots | snap_databases | 2424832 | 327680 | 13.5135135135135 | 100 | 327680 | 13.5135135135135 | t
postgres | snapshots | snap_indexes | 9330688 | 327680 | 3.51185250219491 | 100 | 327680 | 3.51185250219491 | t
postgres | snapshots | snap_pg_locks | 5980160 | 483328 | 8.08219178082192 | 100 | 483328 | 8.08219178082192 | f
postgres | snapshots | snap_settings | 24576 | 0 | 0 | 100 | 0 | 0 | f
postgres | snapshots | snap_stat_activity | 1449984 | 65536 | 4.51977401129944 | 100 | 65536 | 4.51977401129944 | t
postgres | snapshots | snap_statio_all_tables | 29868032 | 974848 | 3.26385079539221 | 100 | 974848 | 3.26385079539221 | t
postgres | snapshots | snap_user_tables | 5472256 | 270336 | 4.94011976047904 | 100 | 270336 | 4.94011976047904 | t
我们来到这里是一个非常不错的概述, 显示我们(以字节为单位所有尺寸)的关系的real_size,有多少是额外的,什么额外的比例。最后一列,is_na(不适用)是重要的。事实上,如果你正在使用的“名称”的数据类型此列是真实的。 “名称”数据类型抛出了臃肿的估计,可以给你的无效数据。
由于这些都是估计,我们必须把它们和一粒盐。 你可能想,如果你不熟悉的工作量已经把它从pg_stat_user_tables和比较,以你的使用率。
回收空间,这是另一天的话题,但是,看看VACUUM FULL(需要全表锁) pg_repack(使用最少的锁定回收空间扩展).
现在您对臃肿的一些信息。在这一点上,你可以用pgstattuple等模块, 或者开始更积极的真空瞄准一些嫌疑/或移动压实。
除此(之前提到的)之外查询,还有一个Nagios中流行的check_postgres.pl脚本:
SELECT
current_database() AS db,
schemaname,
tablename,
reltuples::bigint AS tups,
relpages::bigint AS pages,
otta,
ROUND(CASE
WHEN otta=0
OR sml.relpages=0
OR sml.relpages=otta THEN
0.0
ELSE sml.relpages/otta::numeric END,1) AS tbloat,
CASE
WHEN relpages < otta THEN
0
ELSE relpages::bigint - otta
END AS wastedpages,
CASE
WHEN relpages < otta THEN
0
ELSE bs*(sml.relpages-otta)::bigint
END AS wastedbytes,
CASE
WHEN relpages < otta THEN
'0 bytes'::text
ELSE (bs*(relpages-otta))::bigint || ' bytes'
END AS wastedsize, iname, ituples::bigint AS itups, ipages::bigint AS ipages, iotta, ROUND(CASE
WHEN iotta=0
OR ipages=0
OR ipages=iotta THEN
0.0
ELSE ipages/iotta::numeric END,1) AS ibloat,
CASE
WHEN ipages < iotta THEN
0
ELSE ipages::bigint - iotta
END AS wastedipages,
CASE
WHEN ipages < iotta THEN
0
ELSE bs*(ipages-iotta)
END AS wastedibytes,
CASE
WHEN ipages < iotta THEN
'0 bytes'
ELSE (bs*(ipages-iotta))::bigint || ' bytes'
END AS wastedisize,
CASE
WHEN relpages < otta THEN
CASE
WHEN ipages < iotta THEN
0
ELSE bs*(ipages-iotta::bigint)
END ELSE
CASE
WHEN ipages < iotta THEN
bs*(relpages-otta::bigint)
ELSE bs*(relpages-otta::bigint + ipages-iotta::bigint)
END END AS totalwastedbytes
FROM
(SELECT nn.nspname AS schemaname,
cc.relname AS tablename,
COALESCE(cc.reltuples,
0) AS reltuples,
COALESCE(cc.relpages,
0) AS relpages,
COALESCE(bs,
0) AS bs,
COALESCE(CEIL((cc.reltuples*((datahdr+ma- (CASE
WHEN datahdr%ma=0 THEN
ma
ELSE datahdr%ma END))+nullhdr2+4))/(bs-20::float)),0) AS otta, COALESCE(c2.relname,'?') AS iname, COALESCE(c2.reltuples,0) AS ituples, COALESCE(c2.relpages,0) AS ipages, COALESCE(CEIL((c2.reltuples*(datahdr-12))/(bs-20::float)),0) AS iotta -- very rough approximation, assumes ALL cols
FROM pg_class cc
JOIN pg_namespace nn
ON cc.relnamespace = nn.oid
AND nn.nspname <> 'information_schema'
LEFT JOIN
(SELECT ma,
bs,
foo.nspname,
foo.relname,
(datawidth+(hdr+ma-(case
WHEN hdr%ma=0 THEN
ma
ELSE hdr%ma END)))::numeric AS datahdr, (maxfracsum*(nullhdr+ma-(case
WHEN nullhdr%ma=0 THEN
ma
ELSE nullhdr%ma END))) AS nullhdr2
FROM
(SELECT ns.nspname,
tbl.relname,
hdr,
ma,
bs,
SUM((1-coalesce(null_frac,
0))*coalesce(avg_width,
2048)) AS datawidth,
MAX(coalesce(null_frac,
0)) AS maxfracsum,
hdr+
(SELECT 1+count(*)/8
FROM pg_stats s2
WHERE null_frac<>0
AND s2.schemaname = ns.nspname
AND s2.tablename = tbl.relname ) AS nullhdr
FROM pg_attribute att
JOIN pg_class tbl
ON att.attrelid = tbl.oid
JOIN pg_namespace ns
ON ns.oid = tbl.relnamespace
LEFT JOIN pg_stats s
ON s.schemaname=ns.nspname
AND s.tablename = tbl.relname
AND s.inherited=false
AND s.attname=att.attname,
(SELECT
(SELECT current_setting('block_size')::numeric) AS bs,
CASE
WHEN SUBSTRING(SPLIT_PART(v, ' ', 2)
FROM '#"[0-9]+.[0-9]+#"%' for '#') IN ('8.0','8.1','8.2') THEN
27
ELSE 23
END AS hdr,
CASE
WHEN v ~ 'mingw32'
OR v ~ '64-bit' THEN
8
ELSE 4
END AS ma
FROM
(SELECT version() AS v) AS foo ) AS constants
WHERE att.attnum > 0
AND tbl.relkind='r'
GROUP BY 1,2,3,4,5 ) AS foo ) AS rs
ON cc.relname = rs.relname
AND nn.nspname = rs.nspname
LEFT JOIN pg_index i
ON indrelid = cc.oid
LEFT JOIN pg_class c2
ON c2.oid = i.indexrelid ) AS sml;