它的取值范围在表的block的个数和表的行数之间,越接近block的个数说明表的数据分布越有序(和索引一样有序),越是接近行数越说明表上的数据分布是混乱的
可以这样理解,当我们按照索引上的顺序访问表的时候,每当我们需要的数据不在当前block上的时候,我们就要“跳到”其他block上进行访问。如果表上的数据排列和索引是完全一样的话,我们跳的次数等于表的Block的个数
如果是另一个极端,表的数据分布极其的混乱,我们访问每一行数据都要跳一次,那我们最后跳的次数就等于行数,跳的次数就是CLUSTER_FACTOR
CLUSTER_FACTOR对Oracle执行计划会产生重要影响。这个值越高,说明索引的使用效率将会越差。这个值会存在于一个区间内,区间的最小值为表占用的数据块数,最大值为表拥有的数据行数
创建表CF_TEST,有序插入
SQL> CREATE TABLE CF_TEST( ID NUMBER(32), NAME VARCHAR2(80)) TABLEspace USERS pctfree 0 INITRANS 1 maxtrans 255 STORAGE(INITIAL 1M next 1M minextents 1 maxextents UNLIMITED );
Table created.
SQL> begin
for i in 1..80000 loop
insert into CF_TEST(id,name)
values(i,dbms_random.string('a',80));
end loop;
commit;
end;
/
PL/SQL procedure successfully completed.
创建CF_TEST2无序插入
SQL> CREATE TABLE CF_TEST2(ID NUMBER(32), NAME VARCHAR2(80) )TABLESPACE USERS pctfree 0 initrans 1 maxtrans 255 STORAGE(INITIAL 1M NEXT 1M minextents 1 maxextents UNLIMITED );
Table created.
SQL> insert into CF_TEST2 nologging SELECT * FROM CF_TEST order by dbms_random.random;
80000 rows created.
SQL> COMMIT;
Commit complete.
Table altered.
SQL> alter TABLE CF_TEST2 add constraint pk_CF_TEST2 primary key(id);
Table altered.
SQL> col segment_name for a25
SQL> SELECT t.SEGMENT_NAME, t.SEGMENT_TYPE, t.BLOCKS FROM user_segments t WHERE t.SEGMENT_NAME like '%CF_TEST%';
SEGMENT_NAME SEGMENT_TYPE BLOCKS
------------------------- ------------------ ----------
CF_TEST TABLE 1024
CF_TEST2 TABLE 1024
PK_CF_TEST INDEX 256
PK_CF_TEST2 INDEX 256
分别统计数据CLUSTER_FACTOR
SQL>begin
dbms_stats.gather_TABLE_stats(ownname => user,tabname => 'CF_TEST',cascade => true);
end;
4 /
PL/SQL procedure successfully completed.
SQL>begin
dbms_stats.gather_TABLE_stats(ownname => user,tabname => 'CF_TEST2',cascade => true);
end;
4 /
PL/SQL procedure successfully completed.
INDEX_NAME CLUSTERING_FACTOR
------------------------------ -----------------
PK_CF_TEST 908 靠,这个接近行块数1024
PK_CF_TEST2 79919 这个接近于行数80000
这说明,当数据行的存储顺序和索引顺序越接近,CLUSTER_FACTOR越小,越有利于使用索引。
下面来测试下对执行计划的影响
SQL> alter system flush buffer_cache;
System altered.
SQL> set autotrace traceonly;
SQL> set linesize 180
SQL> SELECT * FROM CF_TEST WHERE id > 2000 and id < 8000;
5999 rows selected.
----------------------------------------------------------
Plan hash value: 1785914651
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 6006 | 498K| 83 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| CF_TEST | 6006 | 498K| 83 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | PK_CF_TEST | 6006 | | 14 (0)| 00:00:01 |
------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("ID">2000 AND "ID"<8000)
----------------------------------------------------------
0 recursive calls
0 db block gets
877 consistent gets
0 physical reads
0 redo size
590743 bytes sent via SQL*Net to client
4881 bytes received via SQL*Net from client
401 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
5999 rows processed
System altered.
SQL> set autotrace traceonly;
SQL> SELECT * FROM CF_TEST2 WHERE id > 2000 and id < 8000;
5999 rows selected.
----------------------------------------------------------
Plan hash value: 4079432394
------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 6000 | 498K| 222 (1)| 00:00:03 |
|* 1 | TABLE ACCESS FULL| CF_TEST2 | 6000 | 498K| 222 (1)| 00:00:03 |
------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("ID"<8000 AND "ID">2000)
----------------------------------------------------------
150 recursive calls
0 db block gets
1378 consistent gets
969 physical reads
0 redo size
590743 bytes sent via SQL*Net to client
4881 bytes received via SQL*Net from client
401 SQL*Net roundtrips to/from client
4 sorts (memory)
0 sorts (disk)
5999 rows processed
从上面看出来CLUSTER_FACTOR对执行计划的巨大影响,前者走索引扫描,COST只有83。而后者全表扫描,COST为222,接近于前者的四倍左右
前者的一致性读与物理读也远远低于后者。
所以我们建议对于出于查询目的来说表的数据尽量有序,对于出于插入以及更新目的来说未必(reverse index)。