mysql磁盘空间碎片回收
存储,学习,共享
有关关系数据库磁盘碎片空间回收利用的一般性概念至少应包含三层含义:
1、数据库中已有的数据被删除(delete)后,对于这些数据原有的碎片空间,查询(select)数据时是否会被扫描
2、再次添加(insert)数据时,碎片空间是否会被重复利用
3、如何物理地回收这些碎片空间,以减小存储压力,回收碎片对系统有何影响
每一个数据库都有自己的回收机制,甚至对于同一个数据库不同的存储引擎也不尽相同,如对于Oracle关系数据库系统,运用freelist/pct_free/pct_used三个典型的存储参数来自动管理碎片回收,本文的目的即通过实验来演示:mysql在常用的innodb存储引擎独立表空间下的碎片回收机制
版本:
mysql>\!mysql--version;
mysql Ver14.14Distrib5.5.40,forLinux(x86_64)usingreadline5.1
是否启用独立表空间:
mysql>\!grepper/etc/my.cnf
innodb_file_per_table=1
测试表结构(为保证本实验对于空间碎片敏感的纯粹性,在此并不创建索引):
mysql>usetest
Databasechanged
mysql>showcreatetabletest.t\G
***************************1.row***************************
Table:t
CreateTable:CREATETABLE`t`(
`a`int(11)DEFAULTNULL,
`b`bigint(20)DEFAULTNULL,
`c`decimal(18,2)DEFAULTNULL,
`d`char(8)DEFAULTNULL,
`e`varchar(30)DEFAULTNULL,
`f`timestampNOT NULLDEFAULTCURRENT_TIMESTAMPON UPDATECURRENT_TIMESTAMP
)ENGINE=InnoDBDEFAULTCHARSET=utf8
1rowinset(0.00sec)
检查测试表存储引擎:
mysql>selectengine,table_rowsestimate_rows,data_freefrominformation_schema.tablestwheret.table_name='t';
+--------+---------------+-----------+
|engine|estimate_rows|data_free|
+--------+---------------+-----------+
|InnoDB| 0| 0|
+--------+---------------+-----------+
1rowinset(0.00sec)
查看磁盘占用:
mysql>\!ls-lh|grepibd
-rw-rw----.1mysqlmysql 96KOct3115:43t.ibd
构建10万条测试数据(自动化测试数据构建工具点击这里):
mysql>calltest.build_data('test','t',100000,@ret);
QueryOK,0rowsaffected,1warning(2min41.64sec)
抽取测试数据样本:
mysql>select*fromtest.tlimit5offset10000;
+-----------+-----------+--------------+----------+-------------------------+---------------------+
|a |b |c |d |e |f |
+-----------+-----------+--------------+----------+-------------------------+---------------------+
|847093097|257145648|744448981.01|X9508079|X5206930446631210500000|2014-10-3117:02:10|
|751041280|193127296|712512671.68|X9831815|X7783695196531428500000|2014-10-3117:02:10|
|942303127|376407105| 55126175.34|X1464095|X5666694683643704000000|2014-10-3117:02:10|
|394118594|270584598|170567235.14|X410824 |X6937103948497310000000|2014-10-3117:02:10|
|345304763|645392663|191049053.51|X190673 |X5221894239282136000000|2014-10-3117:02:10|
+-----------+-----------+--------------+----------+-------------------------+---------------------+
5rowsinset(0.00sec)
再次查看磁盘占用,统计表记录数,查看扫描计划:
mysql>\!ls-lh|grepibd
-rw-rw----.1mysqlmysql 17MOct3116:24t.ibd
mysql>select count(e)fromtest.t;
+----------+
|count(e)|
+----------+
| 100000|
+----------+
1rowinset(0.04sec)
mysql>explainselect count(e)fromtest.t;
+----+-------------+-------+------+---------------+------+---------+------+--------+-------+
|id|select_type|table|type|possible_keys|key |key_len|ref |rows |Extra|
+----+-------------+-------+------+---------------+------+---------+------+--------+-------+
| 1|SIMPLE |t |ALL |NULL |NULL|NULL |NULL|100236| |
+----+-------------+-------+------+---------------+------+---------+------+--------+-------+
1rowinset(0.00sec)
删除1/3的测试数据:
mysql>deletefromtest.twheremod(a,3)=0;
QueryOK,33279rowsaffected(0.29sec)
再次查看磁盘占用,统计表记录数,查看空间碎片和扫描计划:
mysql>\!ls-lh|grepibd
-rw-rw----.1mysqlmysql 17MOct3116:29t.ibd
mysql>select count(e)fromtest.t;
+----------+
|count(e)|
+----------+
| 66721|
+----------+
1rowinset(0.04sec)
mysql>explainselect count(e)fromtest.t;
+----+-------------+-------+------+---------------+------+---------+------+-------+-------+
|id|select_type|table|type|possible_keys|key |key_len|ref |rows |Extra|
+----+-------------+-------+------+---------------+------+---------+------+-------+-------+
| 1|SIMPLE |t |ALL |NULL |NULL|NULL |NULL|98370| |
+----+-------------+-------+------+---------------+------+---------+------+-------+-------+
1rowinset(0.00sec)
初步结论:1、已删除的碎片空间对于新的查询并不敏感,即这些空间碎片通常是也会被扫描的,扫描计划失真率=1-66721/98370=32.2%
再次构建3万条测试数据,插入测试表:
mysql>calltest.build_data('test','t',30000,@ret);
QueryOK,0rowsaffected,1warning(48.21sec)
mysql>\!ls-lh|grepibd
-rw-rw----.1mysqlmysql 19MOct3116:44t.ibd
mysql>select count(e)fromtest.t;
+----------+
|count(e)|
+----------+
| 96721|
+----------+
1rowinset(0.04sec)
mysql>explainselect count(e)fromtest.t;
+----+-------------+-------+------+---------------+------+---------+------+--------+-------+
|id|select_type|table|type|possible_keys|key |key_len|ref |rows |Extra|
+----+-------------+-------+------+---------------+------+---------+------+--------+-------+
| 1|SIMPLE |t |ALL |NULL |NULL|NULL |NULL|101968| |
+----+-------------+-------+------+---------------+------+---------+------+--------+-------+
1rowinset(0.00sec)
初步结论:2、对于新插入的数据,磁盘空间仅有少量增加,说明空间有一部分确实被重新利用了,扫描计划失真率=1-96721/101968=5.2%
重复上述验证过程,先删除全表一半的测试数据,再插入4万条测试数据,分别查看空间碎片和扫描计划:
mysql>deletefromtest.twheremod(a,3)=1;
QueryOK,43307rowsaffected(0.36sec)
mysql>\!ls-lh|grepibd
-rw-rw----.1mysqlmysql 19MOct3116:49t.ibd
mysql>select count(e)fromtest.t;
+----------+
|count(e)|
+----------+
| 53414|
+----------+
1rowinset(0.03sec)
mysql>explainselect count(e)fromtest.t;
+----+-------------+-------+------+---------------+------+---------+------+-------+-------+
|id|select_type|table|type|possible_keys|key |key_len|ref |rows |Extra|
+----+-------------+-------+------+---------------+------+---------+------+-------+-------+
| 1|SIMPLE |t |ALL |NULL |NULL|NULL |NULL|97990| |
+----+-------------+-------+------+---------------+------+---------+------+-------+-------+
1rowinset(0.00sec)
mysql>calltest.build_data('test','t',40000,@ret);
QueryOK,0rowsaffected,1warning(1min3.24sec)
mysql>\!ls-lh|grepibd
-rw-rw----.1mysqlmysql 19MOct3116:56t.ibd
mysql>select count(e)fromtest.t;
+----------+
|count(e)|
+----------+
| 93414|
+----------+
1rowinset(0.04sec)
mysql>explainselect count(e)fromtest.t;
+----+-------------+-------+------+---------------+------+---------+------+--------+-------+
|id|select_type|table|type|possible_keys|key |key_len|ref |rows |Extra|
+----+-------------+-------+------+---------------+------+---------+------+--------+-------+
| 1|SIMPLE |t |ALL |NULL |NULL|NULL |NULL|100241| |
+----+-------------+-------+------+---------------+------+---------+------+--------+-------+
1rowinset(0.00sec)
重复验证1:
1、大批量删除导致扫描失真率=1-53414/97990=45.5%,即高度失真
2、此次物理磁盘文件没有变化,足以说明碎片空间完全被重新利用了,重新再大批量插入数据后,扫描计划失真率=1-93414/100241=6.8%
再次重复上述验证过程,删除全表,再插入2万条测试数据,分别查看空间碎片和扫描计划:
mysql>deletefromtest.t;
QueryOK,93414rowsaffected(1.49sec)
mysql>\!ls-lh|grepibd
-rw-rw----.1mysqlmysql 19MOct3117:00t.ibd
mysql>select count(e)fromtest.t;
+----------+
|count(e)|
+----------+
| 0|
+----------+
1rowinset(0.00sec)
mysql>explainselect count(e)fromtest.t;
+----+-------------+-------+------+---------------+------+---------+------+-------+-------+
|id|select_type|table|type|possible_keys|key |key_len|ref |rows |Extra|
+----+-------------+-------+------+---------------+------+---------+------+-------+-------+
| 1|SIMPLE |t |ALL |NULL |NULL|NULL |NULL|97670| |
+----+-------------+-------+------+---------------+------+---------+------+-------+-------+
1rowinset(0.00sec)
mysql>calltest.build_data('test','t',20000,@ret);
QueryOK,0rowsaffected,1warning(31.63sec)
mysql>\!ls-lh|grepibd
-rw-rw----.1mysqlmysql 19MOct3117:02t.ibd
mysql>select count(e)fromtest.t;
+----------+
|count(e)|
+----------+
| 20000|
+----------+
1rowinset(0.01sec)
mysql>explainselect count(e)fromtest.t;
+----+-------------+-------+------+---------------+------+---------+------+-------+-------+
|id|select_type|table|type|possible_keys|key |key_len|ref |rows |Extra|
+----+-------------+-------+------+---------------+------+---------+------+-------+-------+
| 1|SIMPLE |t |ALL |NULL |NULL|NULL |NULL|20183| |
+----+-------------+-------+------+---------------+------+---------+------+-------+-------+
1rowinset(0.00sec)
重复验证2:
1、全表删除导致扫描失真率=97670/93414<上次的实际数据>=104.6%,即极度失真
2、碎片空间仍然被完全利用,插入批量数据后,扫描计划失真率=1-20000/20183=9%
前面关于删除后新增数据的验证都是批量插入,类似BI场景,最后模仿一次OLTP场景,删除全表,再插入1条测试数据,分别查看空间碎片和扫描计划:
mysql>deletefromtest.t;
QueryOK,20000rowsaffected(0.18sec)
mysql>\!ls-lh|grepibd
-rw-rw----.1mysqlmysql 19MOct3117:18t.ibd
mysql>select count(e)fromtest.t;
+----------+
|count(e)|
+----------+
| 0|
+----------+
1rowinset(0.00sec)
mysql>explainselect count(e)fromtest.t;
+----+-------------+-------+------+---------------+------+---------+------+-------+-------+
|id|select_type|table|type|possible_keys|key |key_len|ref |rows |Extra|
+----+-------------+-------+------+---------------+------+---------+------+-------+-------+
| 1|SIMPLE |t |ALL |NULL |NULL|NULL |NULL|19481| |
+----+-------------+-------+------+---------------+------+---------+------+-------+-------+
1rowinset(0.00sec)
mysql>calltest.build_data('test','t',1,@ret);
QueryOK,0rowsaffected,1warning(0.08sec)
mysql>\!ls-lh|grepibd
-rw-rw----.1mysqlmysql 19MOct3117:18t.ibd
mysql>select count(e)fromtest.t;
+----------+
|count(e)|
+----------+
| 1|
+----------+
1rowinset(0.00sec)
mysql>explainselect count(e)fromtest.t;
+----+-------------+-------+------+---------------+------+---------+------+-------+-------+
|id|select_type|table|type|possible_keys|key |key_len|ref |rows |Extra|
+----+-------------+-------+------+---------------+------+---------+------+-------+-------+
| 1|SIMPLE |t |ALL |NULL |NULL|NULL |NULL|19482| |
+----+-------------+-------+------+---------------+------+---------+------+-------+-------+
1rowinset(0.00sec)
重复验证3:
1、全表删除导致扫描失真率=19481/20000<上次的实际数据>=97.4%,即严重失真
2、碎片空间仍然被完全利用,插入批量数据后,扫描计划失真率=1-1/19482≈100%,即全体失真
如何回收物理磁盘空间,本文介绍两种方法:
1、基于innodb存储引擎的mysql数据库专用回收方法(类似oracle的move,保留当前数据,推荐):
mysql>\!ls-lh|grepibd
-rw-rw----.1mysqlmysql 19MOct3117:19t.ibd
mysql>altertabletest.tengine=innodb;
QueryOK,1rowaffected(0.35sec)
Records:1 Duplicates:0 Warnings:0
mysql>\!ls-lh|grepibd
-rw-rw----.1mysqlmysql 96KOct3118:28t.ibd
时间消耗测量:
– 重新构造数据
mysql>calltest.build_data('test','t',125000,@ret);
QueryOK,0rowsaffected,1warning(3min20.30sec)
mysql>insertintotselect*fromt;
QueryOK,125000rowsaffected(3.12sec)
Records:125000 Duplicates:0 Warnings:0
mysql>insertintotselect*fromt;
QueryOK,250000rowsaffected(6.56sec)
Records:250000 Duplicates:0 Warnings:0
mysql>insertintotselect*fromt;
QueryOK,500000rowsaffected(12.49sec)
Records:500000 Duplicates:0 Warnings:0
mysql>select count(1)fromtest.t;
+----------+
|count(1)|
+----------+
| 1000000|
+----------+
1rowinset(0.39sec)
mysql>altertabletest.tengine=innodb;
QueryOK,1000000rowsaffected(29.26sec)
Records:1000000 Duplicates:0 Warnings:0
mysql>altertabletest.tengine=innodb;
QueryOK,1000000rowsaffected(30.59sec)
Records:1000000 Duplicates:0 Warnings:0
mysql>altertabletest.tengine=innodb;
QueryOK,1000001rowsaffected(29.10sec)
Records:1000001 Duplicates:0 Warnings:0
-- 删除一半数据
mysql>deletefromtest.twheremod(a,2)=0;
QueryOK,499376rowsaffected(12.08sec)
mysql>select count(*)fromtest.t;
+----------+
|count(*)|
+----------+
| 500625|
+----------+
1rowinset(0.23sec)
mysql>altertabletest.tengine=innodb;
QueryOK,500625rowsaffected(15.15sec)
Records:500625 Duplicates:0 Warnings:0
-- 删除全部测试数据,仅保留1条
mysql>deletefromtest.twherea<>1;
QueryOK,500624rowsaffected(8.90sec)
mysql>select count(*)fromtest.t;
+----------+
|count(*)|
+----------+
| 1|
+----------+
1rowinset(0.01sec)
mysql>altertabletest.tengine=innodb;
QueryOK,1rowaffected(0.35sec)
Records:1 Duplicates:0 Warnings:0
可以看到,重置存储引擎消耗的时间与已删除的数据无关,与剩下数据呈现线性相关,数据与时间消耗比约为100万:30秒、50万:15秒
2、数据库通用方法,截断表回收(数据全部删除,慎用):
mysql>\!ls-lh|grepibd
-rw-rw----.1mysqlmysql 96KOct3118:28t.ibd
mysql>calltest.build_data('test','t',1000,@ret);
QueryOK,0rowsaffected,1warning(1.74sec)
mysql>\!ls-lh|grepibd
-rw-rw----.1mysqlmysql176KOct3118:30t.ibd
mysql>select count(e)fromtest.t;
+----------+
|count(e)|
+----------+
| 1001|
+----------+
1rowinset(0.01sec)
mysql>truncatetabletest.t;
QueryOK,0rowsaffected(0.14sec)
mysql>\!ls-lh|grepibd
-rw-rw----.1mysqlmysql 96KOct3118:34t.ibd
mysql>select count(e)fromtest.t;
+----------+
|count(e)|
+----------+
| 0|
+----------+
1rowinset(0.00sec)
总结: 通过上述不断重复的实验现象,可以很容易地解答本文开头提出的三个命题:
1、对于大量删除后产生的碎片空间,查询时会被扫描,这会严影响查询性能
2、再次添加数据,碎片空间会被重新利用,此时分两种情况:
类似BI场景的大批量数据插入,再查询只会存在较小的失真率(本文<10%),这里隐含地说明mysql进行了相当程度的优化
类似OLTP场景的渐进性数据插入,则会继续保持大批量删除的严重失真,但这个失真率是存在较大浮动的,浮动系数=“删除数据/全表数据”比例,简单地说就是删除的数据越多,失真越严重
3、对于基于innodb存储引擎的mysql数据库,要回收磁盘空间,推荐使用重置表存储引擎的方法:alter table table_name engine=innodb(重置期间所有写会话会被阻塞,读会话不受影响),重置消耗时间与已删除数据无关,仅与剩下数据呈线性相关,数据与时间消耗比例约为100万:30秒
延伸:关于mysql物理磁盘空间回收的问题,似乎缺乏一个系统层面的解决办法(类似于Postgresql的Vacuum,或Oracle的Freelist自动化碎片管理)。碎片空间可以重复利用,但会引发查询性能不可控地下降,这一问题往往在一些类似BI的批量作业项目优为突显,但作为创业阶段的企业,开始既没有大量资金选购商业数据库,也没有相当的研发能力部署分布式集群,更为重要的是这些在创业阶段都没有任何的必要性,这才符合mysql最原本的价值定位,企业的运作也才符合《精益创业》里的快速认知引擎,否则只能面临无端的窘境