cluster_factor,这是一个判断表上的数据分布是不是和索引一样有序的一个值
它的取值范围在表的block的个数和表的行数之间,越接近block的个数说明表的数据分布越有序(和索引一样有序),越是接近行数越说明表上的数据分布是混乱的
可以这样理解,当我们按照索引上的顺序访问表的时候,每当我们需要的数据不在当前block上的时候,我们就要“跳到”其他block上进行访问。如果表上的数据排列和索引是完全一样的话,我们跳的次数等于表的Block的个数
如果是另一个极端,表的数据分布极其的混乱,我们访问每一行数据都要跳一次,那我们最后跳的次数就等于行数,跳的次数就是CLUSTER_FACTOR
CLUSTER_FACTOR对Oracle执行计划会产生重要影响。这个值越高,说明索引的使用效率将会越差。这个值会存在于一个区间内,区间的最小值为表占用的数据块数,最大值为表拥有的数据行数
创建表CF_TEST,有序插入
SQL>   CREATE TABLE CF_TEST( ID NUMBER(32), NAME VARCHAR2(80)) TABLEspace USERS  pctfree 0 INITRANS 1  maxtrans 255 STORAGE(INITIAL 1M  next 1M  minextents 1  maxextents UNLIMITED );
Table created.
SQL>   begin
for i in 1..80000 loop
insert into CF_TEST(id,name)
values(i,dbms_random.string('a',80));
end loop;
commit;
end;
  /
PL/SQL procedure successfully completed.
创建CF_TEST2无序插入
SQL> CREATE TABLE CF_TEST2(ID NUMBER(32), NAME VARCHAR2(80) )TABLESPACE USERS pctfree 0 initrans 1  maxtrans 255  STORAGE(INITIAL 1M NEXT 1M  minextents 1  maxextents UNLIMITED );
Table created.
SQL> insert into CF_TEST2 nologging  SELECT * FROM CF_TEST order by dbms_random.random;
80000 rows created.
SQL> COMMIT;
Commit complete.
SQL> alter TABLE CF_TEST add constraint pk_CF_TEST primary key(id);
Table altered.
SQL> alter TABLE CF_TEST2 add constraint pk_CF_TEST2 primary key(id);
Table altered.
SQL> col segment_name for a25
SQL> SELECT t.SEGMENT_NAME, t.SEGMENT_TYPE, t.BLOCKS FROM user_segments t WHERE t.SEGMENT_NAME like '%CF_TEST%';
SEGMENT_NAME              SEGMENT_TYPE           BLOCKS
------------------------- ------------------ ----------
CF_TEST                   TABLE                    1024
CF_TEST2                  TABLE                    1024
PK_CF_TEST                INDEX                     256
PK_CF_TEST2               INDEX                     256
分别统计数据CLUSTER_FACTOR
SQL>begin
dbms_stats.gather_TABLE_stats(ownname => user,tabname => 'CF_TEST',cascade => true);
end;
  4  /
PL/SQL procedure successfully completed.
SQL>begin
dbms_stats.gather_TABLE_stats(ownname => user,tabname => 'CF_TEST2',cascade => true);
end;
  4  /
PL/SQL procedure successfully completed.
SQL> SELECT i.index_name, i.CLUSTERING_FACTOR FROM dba_indexes i WHERE i.index_name like 'PK_CF_TEST%';
INDEX_NAME                     CLUSTERING_FACTOR
------------------------------ -----------------
PK_CF_TEST                                   908                         靠,这个接近行块数1024
PK_CF_TEST2                                79919                    这个接近于行数80000
这说明,当数据行的存储顺序和索引顺序越接近,CLUSTER_FACTOR越小,越有利于使用索引。
下面来测试下对执行计划的影响
SQL> alter system flush buffer_cache;
System altered.
SQL> set autotrace traceonly;
SQL> set linesize 180
SQL> SELECT * FROM CF_TEST WHERE id > 2000 and id < 8000;
5999 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1785914651
------------------------------------------------------------------------------------------
| Id  | Operation                   | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |            |  6006 |   498K|   83   (0)| 00:00:01 |
|   1 | TABLE ACCESS BY INDEX ROWID| CF_TEST    |  6006 |   498K|    83   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | PK_CF_TEST |  6006 |       |    14   (0)| 00:00:01 |
------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ID">2000 AND "ID"<8000)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       877  consistent gets
          0  physical reads
          0  redo size
     590743  bytes sent via SQL*Net to client
       4881  bytes received via SQL*Net from client
        401  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       5999  rows processed
SQL> alter system flush buffer_cache;
System altered.
SQL> set autotrace traceonly;
SQL> SELECT * FROM CF_TEST2 WHERE id > 2000 and id < 8000;
5999 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 4079432394
------------------------------------------------------------------------------
| Id  | Operation         | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |          |  6000 |   498K|  222   (1)| 00:00:03 |
|*  1 | TABLE ACCESS FULL| CF_TEST2 |  6000 |   498K|   222   (1)| 00:00:03 |
------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("ID"<8000 AND "ID">2000)
Statistics
----------------------------------------------------------
        150  recursive calls
          0  db block gets
       1378  consistent gets
        969  physical reads
          0  redo size
     590743  bytes sent via SQL*Net to client
       4881  bytes received via SQL*Net from client
        401  SQL*Net roundtrips to/from client
          4  sorts (memory)
          0  sorts (disk)
       5999  rows processed
从上面看出来CLUSTER_FACTOR对执行计划的巨大影响,前者走索引扫描,COST只有83。而后者全表扫描,COST为222,接近于前者的四倍左右
前者的一致性读与物理读也远远低于后者。
所以我们建议对于出于查询目的来说表的数据尽量有序,对于出于插入以及更新目的来说未必(reverse index)。