mysql 5.1已经到了beta版,官方网站上也陆续有一些文章介绍,比如上次看到的Improving Database Performance withPartitioning。在使用分区的前提下,可以用mysql实现非常大的数据量存储。今天在mysql的站上又看到一篇进阶的文章—— 按日期分区存储。如果能够实现按日期分区,这对某些时效性很强的数据存储是相当实用的功能。下面是从这篇文章中摘录的一些内容。

错误的按日期分区例子

最直观的方法,就是直接用年月日这种日期格式来进行常规的分区:



PLAIN TEXT



CODE:

    1.  create table rms 
           (ddate 
           )
    2.    
           ->  
            
           partition by range 
           (d 
           )
    3.    
            -> 
           (partitionp0 values less than 
           ( 
           '1995-01-01' 
           ), 
          
    4.    
           ->  
            
           partition p1VALUES LESS THAN 
           ( 
           '2010-01-01' 
           ) 
           );




    上面的例子中,就是直接用"Y-m-d"的格式来对一个table进行分区,可惜想当然往往不能奏效,会得到一个错误信息:

    ERROR 1064 (42000): VALUES value must be of same type aspartition function near '),
    partition p1 VALUES LESS THAN ('2010-01-01'))' at line 3

    上述分区方式没有成功,而且明显的不经济,老练的DBA会用整型数值来进行分区:



    PLAIN TEXT



    CODE:

      1. 
              mysql> CREATE TABLE part_date1 
            
      2.    
             ->  
                
               
              (  
             
      3.    
             ->  
              
             c2varchar 
             ( 
             30 
             )
      4.    
             ->  
              
             c3 date defaultNULL 
             )engine=myisam 
            
      5.    
             ->  
                
               
              
             partition by range 
             (cast 
             (date_format 
             (c3, 
             '%Y%m%d' 
             ) as signed 
             ) 
             )
      6.    
              -> 
             (PARTITIONp0 VALUES LESS THAN 
             ( 
             19950101 
             ), 
            
      7.    
              -> 
             PARTITION p1 VALUES LESS THAN 
             ( 
             19960101 
             )
      8.    
              -> 
             PARTITION p2 VALUES LESS THAN 
             ( 
             19970101 
             )
      9.    
              -> 
             PARTITION p3 VALUES LESS THAN 
             ( 
             19980101 
             )
      10.    
              -> 
             PARTITION p4 VALUES LESS THAN 
             ( 
             19990101 
             )
      11.    
              -> 
             PARTITION p5 VALUES LESS THAN 
             ( 
             20000101 
             )
      12.    
              -> 
             PARTITION p6 VALUES LESS THAN 
             ( 
             20010101 
             )
      13.    
              -> 
             PARTITION p7 VALUES LESS THAN 
             ( 
             20020101 
             )
      14.    
              -> 
             PARTITION p8 VALUES LESS THAN 
             ( 
             20030101 
             )
      15.    
              -> 
             PARTITION p9 VALUES LESS THAN 
             ( 
             20040101 
             )
      16.    
              -> 
             PARTITION p10 VALUES LESS THAN 
             ( 
             20100101 
             ), 
            
      17.    
              -> 
             PARTITION p11 VALUES LESS THAN MAXVALUE 
             ); 
            
      18. 0 rows affected 
             ( 
             0. 
             01sec 
             )




      搞定?接着往下分析



      PLAIN TEXT



      CODE:

        1. 
                mysql> explain partitions 
              
        2.    
                -> 
               select count 
               (* 
               )
        3.    
               ->  
                  
                 
                
               c3> date 
               '1995-01-01' and c3<date 
               '1995-12-31'\G 
              
        4.  1. 
               row *************************** 
              
        5.    
                  
                 
                  
                  
               id: 
               1
        6. 
        7.    
                  
                 
               
        8.    
               partitions:p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,p10,p11 
              
        9.    
                  
                 
                  
               type: ALL 
              
        10. 
                possible_keys: NULL 
              
        11.    
                  
                 
                  
               
        12.    
                  
               
        13.    
                  
                 
                  
               
        14.    
                  
                 
                  
               rows: 
                8100000
        15.    
                  
                 
               
        16. 1 row in set 
                ( 
               0. 
               00sec 
               )




        万恶的mysql居然对上面的sql使用全表扫描,而不是按照我们的日期分区分块查询。原文中解释到MYSQL的优化器并不认这种日期形式的分区,花了大量的篇幅来引诱俺走上歧路,过分。

        正确的日期分区例子

        mysql优化器支持以下两种内置的日期函数进行分区:

        • TO_DAYS()
        • YEAR()

        看个例子:



        PLAIN TEXT



        CODE:

          1. 
                  mysql> CREATE TABLE part_date3 
                
          2.    
                 ->  
                    
                   
                  (  
                 
          3.    
                 ->  
                  
                 c2varchar 
                 ( 
                 30 
                 )
          4.    
                 ->  
                  
                 c3 date defaultNULL 
                 )engine=myisam 
                
          5.    
                 ->  
                    
                   
                  
                 partition by range 
                 (to_days 
                 (c3 
                 ) 
                 )
          6.    
                  -> 
                 (PARTITIONp0 VALUES LESS THAN 
                 (to_days 
                 ( 
                 '1995-01-01' 
                 ) 
                 ), 
                
          7.    
                  -> 
                 PARTITION p1 VALUES LESS THAN 
                 (to_days 
                 ( 
                 '1996-01-01' 
                 ) 
                 )
          8.    
                  -> 
                 PARTITION p2 VALUES LESS THAN 
                 (to_days 
                 ( 
                 '1997-01-01' 
                 ) 
                 )
          9.    
                  -> 
                 PARTITION p3 VALUES LESS THAN 
                 (to_days 
                 ( 
                 '1998-01-01' 
                 ) 
                 )
          10.    
                  -> 
                 PARTITION p4 VALUES LESS THAN 
                 (to_days 
                 ( 
                 '1999-01-01' 
                 ) 
                 )
          11.    
                  -> 
                 PARTITION p5 VALUES LESS THAN 
                 (to_days 
                 ( 
                 '2000-01-01' 
                 ) 
                 )
          12.    
                  -> 
                 PARTITION p6 VALUES LESS THAN 
                 (to_days 
                 ( 
                 '2001-01-01' 
                 ) 
                 )
          13.    
                  -> 
                 PARTITION p7 VALUES LESS THAN 
                 (to_days 
                 ( 
                 '2002-01-01' 
                 ) 
                 )
          14.    
                  -> 
                 PARTITION p8 VALUES LESS THAN 
                 (to_days 
                 ( 
                 '2003-01-01' 
                 ) 
                 )
          15.    
                  -> 
                 PARTITION p9 VALUES LESS THAN 
                 (to_days 
                 ( 
                 '2004-01-01' 
                 ) 
                 )
          16.    
                  -> 
                 PARTITION p10 VALUES LESS THAN 
                 (to_days 
                 ( 
                 '2010-01-01' 
                 ) 
                 ), 
                
          17.    
                  -> 
                 PARTITION p11 VALUES LESS THAN MAXVALUE 
                 ); 
                
          18. 0 rows affected 
                 ( 
                 0. 
                 00sec 
                 )




          以to_days()函数分区成功,我们分析一下看看:



          PLAIN TEXT



          CODE:

            1. 
                    mysql> explain partitions 
                  
            2.    
                    -> 
                   select count 
                   (* 
                   )
            3.    
                   ->  
                      
                     
                    
                   c3> date 
                   '1995-01-01' and c3<date 
                   '1995-12-31'\G 
                  
            4.  1. 
                   row *************************** 
                  
            5.    
                      
                     
                      
                      
                   id: 
                   1
            6. 
            7.    
                      
                     
                   
            8.    
                   partitions: p1 
                  
            9.    
                      
                     
                      
                   type: ALL 
                  
            10. 
                    possible_keys: NULL 
                  
            11.    
                      
                     
                      
                   
            12.    
                      
                   
            13.    
                      
                     
                      
                   
            14.    
                      
                     
                      
                   rows: 
                    808431
            15.    
                      
                     
                   
            16. 1 row in set 
                    ( 
                   0. 
                   00sec 
                   )



            可以看到,mysql优化器这次不负众望,仅仅在p1分区进行查询。在这种情况下查询,真的能够带来提升查询效率么?下面分别对这次建立的part_date3和之前分区失败的part_date1做一个查询对比:



            PLAIN TEXT



            CODE:

              1. (* 
                     )
              2.    
                     ->  
                        
                       
                      
                     c3> date 
                     '1995-01-01' and c3<date 
                     '1995-12-31'; 
                    
              3. 
                      +----------+ 
                    
              4. (* 
                     )
              5. 
                      +----------+ 
                    
              6.    
                     805114
              7. 
                      +----------+ 
                    
              8. 1 row in set 
                      ( 
                     4. 
                     11sec 
                     )
              9. 
              10. (* 
                     )
              11.    
                     ->  
                        
                       
                      
                     c3> date 
                     '1995-01-01' and c3<date 
                     '1995-12-31'; 
                    
              12. 
                      +----------+ 
                    
              13. (* 
                     )
              14. 
                      +----------+ 
                    
              15.    
                     805114
              16. 
                      +----------+ 
                    
              17. 1 row in set 
                      ( 
                     40. 
                     33sec 
                     )



              可以看到,分区正确的话query花费时间为4秒,而分区错误则花费时间40秒(相当于没有分区),效率有90%的提升!所以我们千万要正确的使用分区功能,分区后务必用explain验证,这样才能获得真正的性能提升。


              注意:

              在mysql5.1中建立分区表的语句中,只能包含下列函数:
              ABS()
              CEILING() and FLOOR() (在使用这2个函数的建立分区表的前提是使用函数的分区键是INT类型),例如



              mysql> CREATE TABLE t (c FLOAT) PARTITION BY LIST( FLOOR(c) )(
                  -> PARTITION p0 VALUES IN (1,3,5),
                  -> PARTITION p1 VALUES IN (2,4,6)
                  -> );;
              ERROR 1491 (HY000): The PARTITION function returns the wrong type
               
              mysql> CREATE TABLE t (c int) PARTITION BY LIST( FLOOR(c) )(
                  -> PARTITION p0 VALUES IN (1,3,5),
                  -> PARTITION p1 VALUES IN (2,4,6)
                  -> );
              Query OK, 0 rows affected (0.01 sec)



              DAY()
              DAYOFMONTH()
              DAYOFWEEK()
              DAYOFYEAR()
              DATEDIFF()
              EXTRACT()
              HOUR()
              MICROSECOND()
              MINUTE()
              MOD()
              MONTH()
              QUARTER()
              SECOND()
              TIME_TO_SEC()
              TO_DAYS()
              WEEKDAY()
              YEAR()
              YEARWEEK()