mysql 5.1已经到了beta版,官方网站上也陆续有一些文章介绍,比如上次看到的Improving Database Performance withPartitioning。在使用分区的前提下,可以用mysql实现非常大的数据量存储。今天在mysql的站上又看到一篇进阶的文章—— 按日期分区存储。如果能够实现按日期分区,这对某些时效性很强的数据存储是相当实用的功能。下面是从这篇文章中摘录的一些内容。
错误的按日期分区例子
最直观的方法,就是直接用年月日这种日期格式来进行常规的分区:
CODE:
1. create table rms
(ddate
)
2.
->
partition by range
(d
)
3.
->
(partitionp0 values less than
(
'1995-01-01'
),
4.
->
partition p1VALUES LESS THAN
(
'2010-01-01'
)
);
上面的例子中,就是直接用"Y-m-d"的格式来对一个table进行分区,可惜想当然往往不能奏效,会得到一个错误信息:
ERROR 1064 (42000): VALUES value must be of same type aspartition function near '),
partition p1 VALUES LESS THAN ('2010-01-01'))' at line 3
上述分区方式没有成功,而且明显的不经济,老练的DBA会用整型数值来进行分区:
CODE:
1.
mysql> CREATE TABLE part_date1
2.
->
(
3.
->
c2varchar
(
30
)
4.
->
c3 date defaultNULL
)engine=myisam
5.
->
partition by range
(cast
(date_format
(c3,
'%Y%m%d'
) as signed
)
)
6.
->
(PARTITIONp0 VALUES LESS THAN
(
19950101
),
7.
->
PARTITION p1 VALUES LESS THAN
(
19960101
)
8.
->
PARTITION p2 VALUES LESS THAN
(
19970101
)
9.
->
PARTITION p3 VALUES LESS THAN
(
19980101
)
10.
->
PARTITION p4 VALUES LESS THAN
(
19990101
)
11.
->
PARTITION p5 VALUES LESS THAN
(
20000101
)
12.
->
PARTITION p6 VALUES LESS THAN
(
20010101
)
13.
->
PARTITION p7 VALUES LESS THAN
(
20020101
)
14.
->
PARTITION p8 VALUES LESS THAN
(
20030101
)
15.
->
PARTITION p9 VALUES LESS THAN
(
20040101
)
16.
->
PARTITION p10 VALUES LESS THAN
(
20100101
),
17.
->
PARTITION p11 VALUES LESS THAN MAXVALUE
);
18. 0 rows affected
(
0.
01sec
)
搞定?接着往下分析
CODE:
1.
mysql> explain partitions
2.
->
select count
(*
)
3.
->
c3> date
'1995-01-01' and c3<date
'1995-12-31'\G
4. 1.
row ***************************
5.
id:
1
6.
7.
8.
partitions:p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,p10,p11
9.
type: ALL
10.
possible_keys: NULL
11.
12.
13.
14.
rows:
8100000
15.
16. 1 row in set
(
0.
00sec
)
万恶的mysql居然对上面的sql使用全表扫描,而不是按照我们的日期分区分块查询。原文中解释到MYSQL的优化器并不认这种日期形式的分区,花了大量的篇幅来引诱俺走上歧路,过分。
正确的日期分区例子
mysql优化器支持以下两种内置的日期函数进行分区:
- TO_DAYS()
- YEAR()
看个例子:
CODE:
1.
mysql> CREATE TABLE part_date3
2.
->
(
3.
->
c2varchar
(
30
)
4.
->
c3 date defaultNULL
)engine=myisam
5.
->
partition by range
(to_days
(c3
)
)
6.
->
(PARTITIONp0 VALUES LESS THAN
(to_days
(
'1995-01-01'
)
),
7.
->
PARTITION p1 VALUES LESS THAN
(to_days
(
'1996-01-01'
)
)
8.
->
PARTITION p2 VALUES LESS THAN
(to_days
(
'1997-01-01'
)
)
9.
->
PARTITION p3 VALUES LESS THAN
(to_days
(
'1998-01-01'
)
)
10.
->
PARTITION p4 VALUES LESS THAN
(to_days
(
'1999-01-01'
)
)
11.
->
PARTITION p5 VALUES LESS THAN
(to_days
(
'2000-01-01'
)
)
12.
->
PARTITION p6 VALUES LESS THAN
(to_days
(
'2001-01-01'
)
)
13.
->
PARTITION p7 VALUES LESS THAN
(to_days
(
'2002-01-01'
)
)
14.
->
PARTITION p8 VALUES LESS THAN
(to_days
(
'2003-01-01'
)
)
15.
->
PARTITION p9 VALUES LESS THAN
(to_days
(
'2004-01-01'
)
)
16.
->
PARTITION p10 VALUES LESS THAN
(to_days
(
'2010-01-01'
)
),
17.
->
PARTITION p11 VALUES LESS THAN MAXVALUE
);
18. 0 rows affected
(
0.
00sec
)
以to_days()函数分区成功,我们分析一下看看:
CODE:
1.
mysql> explain partitions
2.
->
select count
(*
)
3.
->
c3> date
'1995-01-01' and c3<date
'1995-12-31'\G
4. 1.
row ***************************
5.
id:
1
6.
7.
8.
partitions: p1
9.
type: ALL
10.
possible_keys: NULL
11.
12.
13.
14.
rows:
808431
15.
16. 1 row in set
(
0.
00sec
)
可以看到,mysql优化器这次不负众望,仅仅在p1分区进行查询。在这种情况下查询,真的能够带来提升查询效率么?下面分别对这次建立的part_date3和之前分区失败的part_date1做一个查询对比:
CODE:
1. (*
)
2.
->
c3> date
'1995-01-01' and c3<date
'1995-12-31';
3.
+----------+
4. (*
)
5.
+----------+
6.
805114
7.
+----------+
8. 1 row in set
(
4.
11sec
)
9.
10. (*
)
11.
->
c3> date
'1995-01-01' and c3<date
'1995-12-31';
12.
+----------+
13. (*
)
14.
+----------+
15.
805114
16.
+----------+
17. 1 row in set
(
40.
33sec
)
可以看到,分区正确的话query花费时间为4秒,而分区错误则花费时间40秒(相当于没有分区),效率有90%的提升!所以我们千万要正确的使用分区功能,分区后务必用explain验证,这样才能获得真正的性能提升。
注意:
在mysql5.1中建立分区表的语句中,只能包含下列函数:
ABS()
CEILING() and FLOOR() (在使用这2个函数的建立分区表的前提是使用函数的分区键是INT类型),例如
mysql> CREATE TABLE t (c FLOAT) PARTITION BY LIST( FLOOR(c) )(
-> PARTITION p0 VALUES IN (1,3,5),
-> PARTITION p1 VALUES IN (2,4,6)
-> );;
ERROR 1491 (HY000): The PARTITION function returns the wrong type
mysql> CREATE TABLE t (c int) PARTITION BY LIST( FLOOR(c) )(
-> PARTITION p0 VALUES IN (1,3,5),
-> PARTITION p1 VALUES IN (2,4,6)
-> );
Query OK, 0 rows affected (0.01 sec)
DAY()
DAYOFMONTH()
DAYOFWEEK()
DAYOFYEAR()
DATEDIFF()
EXTRACT()
HOUR()
MICROSECOND()
MINUTE()
MOD()
MONTH()
QUARTER()
SECOND()
TIME_TO_SEC()
TO_DAYS()
WEEKDAY()
YEAR()
YEARWEEK()