您好,登錄后才能下訂單哦!
這篇文章主要介紹“什么是Clustering Factor”,在日常操作中,相信很多人在什么是Clustering Factor問題上存在疑惑,小編查閱了各式資料,整理出簡單好用的操作方法,希望對大家解答”什么是Clustering Factor”的疑惑有所幫助!接下來,請跟著小編一起來學習吧!
Clustering Factor(CF)聚簇因子,意味著對索引的表的群集度,用來比較索引的順序程度與表的雜亂排序程度的一個度量。
集群因子用于判斷索引回表需要消耗的物理 I/O 次數(shù)。
CF是在假設(shè)內(nèi)存大小只能載入一個塊,這是伴隨索引掃描所需要的表掃描次數(shù)的計算值。
更準確地說,他表示沿著索引的葉塊,在ROWID值上代表塊編號的第1~15位的值與之前ROWID作出比較后被更替的次數(shù)。
如:
有五個塊組成的索引和由五個塊組成的表,而一個塊里有4行記錄,那么總行數(shù)就是20行。
順序的掃描索引的同時讀取相對應(yīng)的表,這時可能有兩個極端情況:
1、CF最低時
如果一個索引塊所包含的ROWID都被一個表塊所包含,則通過索引掃描表時,只掃描索引5次和表5次,就可以得到想要的數(shù)據(jù)。
此時CF=5(表塊的掃描次數(shù)),CF的最小值與表的塊數(shù)相同
2、CF最高時
如果一個索引塊所包含的ROWID分別被不同的表塊所包含,則通過索引掃描表時,
必須掃描20次[4(索引塊數(shù))+4(索引塊數(shù))+4(索引塊數(shù))+4(索引塊數(shù))+4(索引塊數(shù)) = 20次]才能獲得想要的數(shù)據(jù)。
這時CF=20(表塊的掃描次數(shù)),CF最大值與表的行數(shù)相同
聚簇因子大致的計算方法順序如下:
1、進行一次索引全掃描
2、檢查索引塊中的rowid信息。比較與前一個rowid是否指向同一個數(shù)據(jù)塊。若不同,則聚簇因子加1.
3、當完成整個的索引掃面后,即得到該索引的聚簇因子的數(shù)值。
計算過程
SELECT * FROM (SELECT ID ,ROWID FROM TMP_CF ORDER BY ID) WHERE ROWNUM <= 5; ID ROWID ---------- ------------------ 2 AAAVqxAAEAAAAfDACL 3 AAAVqxAAEAAAAfDAAf 3 AAAVqxAAEAAAAfDAJR 4 AAAVqxAAEAAAAfDAGZ 4 AAAVqxAAEAAAAfDAKP
首先我們比較 2、3 對應(yīng)的 ROWID 是否在同一個數(shù)據(jù)塊,如果在同一個數(shù)據(jù)塊,Clustering Factor +0;
如果不在同一個數(shù)據(jù)塊,那么 Clustering Factor 值加 1。
然后我們比較 3、4 對應(yīng)的 ROWID 是否在同一個數(shù)據(jù)塊,如果在同一個數(shù)據(jù)塊,Clustering Factor 值不變;
如果不在同一個數(shù)據(jù)塊,那么 Clustering Factor 值加 1。
接下來我們比較4、5對應(yīng)的ROWID是否在同一個數(shù)據(jù)塊,如果在同一個數(shù)據(jù)塊,Clustering Factor +0;
如果不在同一個數(shù)據(jù)塊,那么 Clustering Factor 值加 1。
像上面步驟一樣,一直這樣有序地比較下去,直到比較完索引中最后一個鍵值。
表中的數(shù)據(jù)塊數(shù) <= 聚簇因子的取值范圍 <= 表中的記錄數(shù)
如果集群因子與塊數(shù)接近,說明表的數(shù)據(jù)基本上是有序的,而且其順序基本與索引順序一樣。
這樣在進行索引范圍或者索引全掃描的時候,回表只需要讀取少量的數(shù)據(jù)塊就能完成。
如果集群因子與表記錄數(shù)接近,說明表的數(shù)據(jù)和索引順序差異很大,在進行索引范圍掃描或者索引全掃描的時候,回表會讀取更多的數(shù)據(jù)塊。
集群因子只會影響索引范圍掃描(INDEX RANGE SCAN)以及索引全掃描(INDEX FULL SCAN),因為只有這兩種索引掃描方式會有大量數(shù)據(jù)回表。
集群因子不會影響索引唯一掃描(INDEX UNIQUE SCAN),因為索引唯一掃描只返回一條數(shù)據(jù)。
集群因子更不會影響索引快速全掃描(INDEX FAST FULL SCAN),因為索引快速全掃描不回表。
集群因子影響的是索引回表的物理 I/O 次數(shù)。
我們假設(shè)索引范圍掃描返回了 1 000 行數(shù)據(jù),如果 buffer cache 中沒有緩存表的數(shù)據(jù)塊,
假設(shè)這 1000行數(shù)據(jù)都在同一個數(shù)據(jù)塊中,那么回表需要耗費的物理 I/O 就只需要一個;
假設(shè)這 1000 行數(shù)據(jù)都在不同的數(shù)據(jù)塊中,那么回表就需要耗費 1 000 個物理 I/O。
因此,集群因子影響索引回表的物理 I/O 次數(shù)。
假設(shè)I/O不使用內(nèi)存,CF越高讀取表塊的次數(shù)就越多,因此物理I/O也會增加。
即,CF越高,通過ROWID讀取表塊的次數(shù)就會越多,db file sequential read也會相應(yīng)的增加。
通過高速緩沖區(qū)讀取過一次的塊因為不在發(fā)生物理I/O,因此CF高未必一定會降低SQL語句的性能。
但是大量掃描CF值高的索引,則需要讀取的表塊增加,因此可能給性能帶來嚴重問題。
比如當ORACLE計算索引范圍掃描(IRS)及其回表時的成本時,
使用如下的公式:
IRS COST=I/O COST + CPU COST
其中 I/O COST = INDEX ACCESS I/O COST + TABLE ACCESS I/O COST
其中:
INDEX ACCESS I/O COST = BLEVEL+CEIL(#LEAF_BLOCKS*IX_SEL)
TABLE ACCESS I/O COST = CEIL(CLUSTERING_FACTOR*IX_SEL_WITH_FILTERS)
索引的聚簇因子越大
--》從該索引回表的成本越大
--》使用該索引的綜合成本越大
--》使用該索引的可能越小
對索引收集統(tǒng)計信息,則索引的聚簇因子會被記錄在 DBA_INDEXES的CLUSTING_FACTOR列。
如果CF與表的塊數(shù)相當就是“好現(xiàn)象”;如果CF和表的行數(shù)相當就不是“好現(xiàn)象”。
如果SQL語句的性能問題被判斷位由于聚簇因子導致,如何進行優(yōu)化?
1、將索引掃描換成全表掃描
2、使用其他索引
3、按照索引的排列順序重新創(chuàng)建表(最后的選擇)
create table new_table as select ... from old_table order by indexd_column
4、某些情況,設(shè)置table_cached_blocks參數(shù)
5、設(shè)置set_index_stats的聚簇因子值clstfct
注意: 當索引范圍掃描,索引全掃描不回表或者返回數(shù)據(jù)量很少的時候,不管集群因子多大,對 SQL 查詢性能幾乎沒有任何影響。 在進行 SQL 優(yōu)化的時候,往往會建立合適的組合索引消除回表,或者建立組合索引盡量減少回表次數(shù)。
1、聚簇因子高對執(zhí)行計劃的影響
2、聚簇因子低對執(zhí)行計劃的影響
3、調(diào)整聚簇因子
3.1、根據(jù)索引列重新創(chuàng)建表
3.2、設(shè)置table_cached_blocks參數(shù)
3.3、設(shè)置set_index_stats的聚簇因子值clstfct
/* 創(chuàng)建測試數(shù)據(jù) */ /* 為了制造出聚簇因子較高的索引,對生成的ID數(shù)值取隨機數(shù)1至1000。*/ SQL> create table tmp_cf as select ceil(dbms_random.value*1000) id,'a' name from dual connect by rownum<=1000; Table created. /* 在ID列上創(chuàng)建索引 */ SQL> create index ind_id on tmp_cf(id); Index created. /* 收集統(tǒng)計信息 */ SQL> exec dbms_stats.gather_table_stats('demo','tmp_cf'); PL/SQL procedure successfully completed. /* 開啟autotrace查看執(zhí)行計劃 */ SQL> set autot traceonly /* 執(zhí)行查詢,條件為ID<=6 */ SQL> select * from tmp_cf where id<=6; Execution Plan ---------------------------------------------------------- Plan hash value: 1290879752 ---------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 6 | 36 | 3 (0)| 00:00:01 | |* 1 | TABLE ACCESS FULL| TMP_CF | 6 | 36 | 3 (0)| 00:00:01 | ---------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("ID"<=6) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 4 consistent gets 0 physical reads 0 redo size 676 bytes sent via SQL*Net to client 524 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 6 rows processed
/* 從執(zhí)行計劃中可見,在只返回6行的情況下,估算出的走索引并回表的成本仍然大于全表掃描。 */
/* 嘗試改條件為 id<=5 ,id<=4,id<=3 進一步減少返回的行數(shù)。 */
SQL> select * from tmp_cf where id<=5; Execution Plan ---------------------------------------------------------- Plan hash value: 1290879752 ---------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 5 | 30 | 3 (0)| 00:00:01 | |* 1 | TABLE ACCESS FULL| TMP_CF | 5 | 30 | 3 (0)| 00:00:01 | ---------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("ID"<=5) Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 4 consistent gets 0 physical reads 0 redo size 668 bytes sent via SQL*Net to client 524 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 5 rows processed SQL> select * from tmp_cf where id<=4; Execution Plan ---------------------------------------------------------- Plan hash value: 1290879752 ---------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 4 | 24 | 3 (0)| 00:00:01 | |* 1 | TABLE ACCESS FULL| TMP_CF | 4 | 24 | 3 (0)| 00:00:01 | ---------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("ID"<=4) Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 4 consistent gets 0 physical reads 0 redo size 668 bytes sent via SQL*Net to client 524 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 5 rows processed SQL> select * from tmp_cf where id<=3; Execution Plan ---------------------------------------------------------- Plan hash value: 1290879752 ---------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 3 | 18 | 3 (0)| 00:00:01 | |* 1 | TABLE ACCESS FULL| TMP_CF | 3 | 18 | 3 (0)| 00:00:01 | ---------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("ID"<=3) Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 4 consistent gets 0 physical reads 0 redo size 652 bytes sent via SQL*Net to client 524 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 3 rows processed SQL> select * from tmp_cf where id<=2; Execution Plan ---------------------------------------------------------- Plan hash value: 2401189212 -------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 2 | 12 | 3 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID| TMP_CF | 2 | 12 | 3 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | IND_ID | 2 | | 2 (0)| 00:00:01 | -------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID"<=2) Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 4 consistent gets 0 physical reads 0 redo size 592 bytes sent via SQL*Net to client 524 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed
/* 可見,當返回的行數(shù)為1行時,才選擇了用索引并回表的訪問路徑。*/
/* 創(chuàng)建測試數(shù)據(jù) */ /* 創(chuàng)建測試表,但這次ID列的值為順序產(chǎn)生。 */ SQL> create table tmp_cf_2 as select rownum id,rpad(rownum,50,'a') col1 from dual connect by rownum<=1000; Table created. /* 在ID列上創(chuàng)建索引 */ SQL> create index ind_id_2 on tmp_cf_2(id); Index created. /* 執(zhí)行查詢,條件為ID<=6 */ SQL> select * from tmp_cf_2 where id<=6; Execution Plan ---------------------------------------------------------- Plan hash value: 1887041669 ---------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 6 | 390 | 3 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID| TMP_CF_2 | 6 | 390 | 3 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | IND_ID_2 | 6 | | 2 (0)| 00:00:01 | ---------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID"<=6) Note ----- - dynamic sampling used for this statement (level=2) Statistics ---------------------------------------------------------- 10 recursive calls 0 db block gets 23 consistent gets 1 physical reads 0 redo size 986 bytes sent via SQL*Net to client 524 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 6 rows processed
/* 從上面可見,此時已經(jīng)采用了先索引再回表的訪問路徑,說明此時其計算出來的成本,是小于全表掃描的。*/
/* 繼續(xù)執(zhí)行查詢,將條件依次為ID<=200,ID<=222,ID<=223,逐步增加返回的行數(shù) */
SQL> select * from tmp_cf_2 where id<=200; Execution Plan ---------------------------------------------------------- Plan hash value: 1887041669 ---------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 200 | 13000 | 5 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID| TMP_CF_2 | 200 | 13000 | 5 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | IND_ID_2 | 200 | | 3 (0)| 00:00:01 | ---------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID"<=200) Note ----- - dynamic sampling used for this statement (level=2) Statistics ---------------------------------------------------------- 7 recursive calls 0 db block gets 48 consistent gets 0 physical reads 0 redo size 14949 bytes sent via SQL*Net to client 667 bytes received via SQL*Net from client 15 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 200 rows processed SQL> select * from tmp_cf_2 where id<=222; Execution Plan ---------------------------------------------------------- Plan hash value: 1887041669 ---------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 222 | 14430 | 5 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID| TMP_CF_2 | 222 | 14430 | 5 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | IND_ID_2 | 222 | | 3 (0)| 00:00:01 | ---------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID"<=222) Note ----- - dynamic sampling used for this statement (level=2) Statistics ---------------------------------------------------------- 7 recursive calls 0 db block gets 50 consistent gets 0 physical reads 0 redo size 16455 bytes sent via SQL*Net to client 678 bytes received via SQL*Net from client 16 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 222 rows processed SQL> select * from tmp_cf_2 where id<=223; Execution Plan ---------------------------------------------------------- Plan hash value: 2467544485 ------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 223 | 14495 | 5 (0)| 00:00:01 | |* 1 | TABLE ACCESS FULL| TMP_CF_2 | 223 | 14495 | 5 (0)| 00:00:01 | ------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("ID"<=223) Note ----- - dynamic sampling used for this statement (level=2) Statistics ---------------------------------------------------------- 7 recursive calls 0 db block gets 43 consistent gets 0 physical reads 0 redo size 15679 bytes sent via SQL*Net to client 678 bytes received via SQL*Net from client 16 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 223 rows processed
/* 從上可見,當返回223行時,才開始采用全表掃描的方式。*/
/* 查看一下兩個索引的聚簇因子 */
SQL> select index_name,clustering_factor from user_indexes where index_name in ('IND_ID','IND_ID_2'); INDEX_NAME CLUSTERING_FACTOR ------------------------------ ----------------- IND_ID 443 IND_ID_2 9
當索引的聚簇因子較高時,會導致該索引回表的成本成大,使得該索引的綜合成本越大,因此使用該索引的可能性越低。
但是,如果在執(zhí)行計劃出現(xiàn)錯誤,使用了該索引,并且該索引的聚簇因子很大,那么通過索引回表的次數(shù)就會很大,同時會造成大量的db file sequential read等待事件。
/* 根據(jù)tmp_cf的表索引列排完序在創(chuàng)建新表tmp_cf_3 */
SQL> create table tmp_cf_3 as select * from tmp_cf order by id; Table created. /* 創(chuàng)建索引 */ SQL> create index ind_id_3 on tmp_cf_3(id); Index created. /* 執(zhí)行查詢 id<=6 */ SQL> select * from tmp_cf_3 where id <= 6; 6 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 2546856486 ---------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 6 | 96 | 3 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID| TMP_CF_3 | 6 | 96 | 3 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | IND_ID_3 | 6 | | 2 (0)| 00:00:01 | ---------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID"<=6) Note ----- - dynamic sampling used for this statement (level=2) Statistics ---------------------------------------------------------- 10 recursive calls 0 db block gets 15 consistent gets 1 physical reads 0 redo size 692 bytes sent via SQL*Net to client 524 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 6 rows processed
/* 從上面可見,此時已經(jīng)采用了先索引再回表的訪問路徑,說明此時其計算出來的成本,是小于全表掃描的。*/
再次查看聚簇因子 SQL> select index_name,clustering_factor from user_indexes where index_name in ('IND_ID','IND_ID_2','IND_ID_3'); INDEX_NAME CLUSTERING_FACTOR ------------------------------ ----------------- IND_ID 443 IND_ID_2 9 IND_ID_3 2
根據(jù)索引列排完序創(chuàng)建的表,然后在原索引列創(chuàng)建索引,此時的聚簇因子很小。
在設(shè)置該參數(shù)前,有一種方式可以在創(chuàng)建索引前先計算出該索引的聚簇因子。
/* 創(chuàng)建測試數(shù)據(jù) */ SQL> create table t1 as select * from dba_objects; Table created. 計算聚簇因子(需要在object_name列創(chuàng)建索引) SELECT SUM(CASE WHEN BLOCK#1 = BLOCK#2 AND FILE#1 = FILE#2 THEN 0 ELSE 1 END) CLUSTERING_FACTOR FROM (SELECT DBMS_ROWID.ROWID_RELATIVE_FNO(ROWID) FILE#1 ,LEAD(DBMS_ROWID.ROWID_RELATIVE_FNO(ROWID), 1, NULL) OVER(ORDER BY OBJECT_NAME) FILE#2 ,DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID) BLOCK#1 ,LEAD(DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID), 1, NULL) OVER(ORDER BY OBJECT_NAME) BLOCK#2 FROM T1 WHERE OBJECT_NAME IS NOT NULL); 或者 /*更推薦使用下面*/ SELECT SUM(COUNT_CHG) FROM (SELECT CASE WHEN SUBSTRB(ROWID, 1, 15) <> LAG(SUBSTRB(ROWID, 1, 15), 1, '000000000') OVER(ORDER BY OBJECT_NAME, ROWID) THEN 1 ELSE 0 END COUNT_CHG FROM T1 WHERE OBJECT_NAME IS NOT NULL); SUM(COUNT_CHG) -------------- 44076
/* 實際創(chuàng)建索引并且查看聚簇因子 */
SQL> create index t1_i1 on t1(object_name); Index created. SQL> select index_name,clustering_factor from user_indexes where table_name = 'T1' and index_name = 'T1_I1'; INDEX_NAME CLUSTERING_FACTOR ------------------------------ ----------------- T1_I1 44076
/* 可以看到計算的聚簇因子的值和實際創(chuàng)建完索引的聚簇因子的值是一樣的 */
1、該參數(shù)自12C引入(11g也有效)。
2、告訴索引統(tǒng)計信息收集工具,假設(shè)索引所在的基表上有多少個數(shù)據(jù)塊已經(jīng)在內(nèi)存中。
3、取值范圍為1至255,默認值為1。
4、對小表上索引的聚簇因子影響大,而對于大表上索引的聚簇因子影響小。
/* 接此前tmp_cf表的實驗 */ /* 查看ID列上索引的聚簇因子 */ SQL> select index_name,num_rows,clustering_factor from dba_ind_statistics where table_name='TMP_CF' and table_owner='DEMO'; INDEX_NAME NUM_ROWS CLUSTERING_FACTOR ------------------------------ ---------- ----------------- IND_ID 1000 443 查看當前的TABLE_CACHED_BLOCKS的值 SQL> select dbms_stats.get_prefs('table_cached_blocks','DEMO','TMP_CF') from dual; DBMS_STATS.GET_PREFS('TABLE_CACHED_BLOCKS','DEMO','TMP_CF') -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 1 修改為16 SQL> exec dbms_stats.set_table_prefs('DEMO','TMP_CF','TABLE_CACHED_BLOCKS',16); PL/SQL procedure successfully completed. SQL> select dbms_stats.get_prefs('table_cached_blocks','DEMO','TMP_CF') from dual; DBMS_STATS.GET_PREFS('TABLE_CACHED_BLOCKS','DEMO','TMP_CF') -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 16 查看聚簇因子 SQL> select index_name,num_rows,clustering_factor from dba_ind_statistics where table_name='TMP_CF' and table_owner='DEMO'; INDEX_NAME NUM_ROWS CLUSTERING_FACTOR ------------------------------ ---------- ----------------- IND_ID 1000 443 收集索引的統(tǒng)計信息 SQL> exec dbms_stats.gather_index_stats('DEMO','IND_ID'); PL/SQL procedure successfully completed. SQL> select index_name,num_rows,clustering_factor from dba_ind_statistics where table_name='TMP_CF' and table_owner='DEMO'; INDEX_NAME NUM_ROWS CLUSTERING_FACTOR ------------------------------ ---------- ----------------- IND_ID 1000 2
/*此時,再執(zhí)行查詢,條件為ID<=6 看此時的表現(xiàn)*/
SQL> set autot traceonly SQL> select * from tmp_cf where id<=6; Execution Plan ---------------------------------------------------------- Plan hash value: 3869640664 ---------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 6 | 330 | 3 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED| TMP_CF | 6 | 330 | 3 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | IND_ID | 6 | | 2 (0)| 00:00:01 | ---------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID"<=6) Statistics ---------------------------------------------------------- 0 recursive calls 3 db block gets 18 consistent gets 0 physical reads 0 redo size 876 bytes sent via SQL*Net to client 607 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 4 rows processed
/* 從執(zhí)行計劃中可見,原來走全表掃描,現(xiàn)在變成走索引范圍掃描了。 */
/* 嘗試改條件為 id<=100 ,id<=200,id<=300 逐步增大返回的行數(shù)。 看看何時不走索引了*/
SQL> select * from tmp_cf where id<=100; 110 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 2401189212 -------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 100 | 600 | 3 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID| TMP_CF | 100 | 600 | 3 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | IND_ID | 100 | | 2 (0)| 00:00:01 | -------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID"<=100) Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 59 consistent gets 0 physical reads 0 redo size 3034 bytes sent via SQL*Net to client 601 bytes received via SQL*Net from client 9 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 110 rows processed SQL> select * from tmp_cf where id<=200; 206 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 2401189212 -------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 200 | 1200 | 3 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID| TMP_CF | 200 | 1200 | 3 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | IND_ID | 200 | | 2 (0)| 00:00:01 | -------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID"<=200) Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 118 consistent gets 0 physical reads 0 redo size 5205 bytes sent via SQL*Net to client 667 bytes received via SQL*Net from client 15 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 206 rows processed SQL> select * from tmp_cf where id<=300; 286 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 3869640664 ---------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 300 | 16500 | 5 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED| TMP_CF | 300 | 16500 | 5 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | IND_ID | 300 | | 2 (0)| 00:00:01 | ---------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID"<=300) Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 271 consistent gets 0 physical reads 0 redo size 21347 bytes sent via SQL*Net to client 816 bytes received via SQL*Net from client 21 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 286 rows processed SQL> select * from tmp_cf where id<=332; 336 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 2401189212 -------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 333 | 1998 | 3 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID| TMP_CF | 333 | 1998 | 3 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | IND_ID | 333 | | 2 (0)| 00:00:01 | -------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID"<=332) Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 195 consistent gets 0 physical reads 0 redo size 8307 bytes sent via SQL*Net to client 766 bytes received via SQL*Net from client 24 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 336 rows processed SQL> select * from tmp_cf where id<=333; 339 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 1290879752 ---------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 334 | 2004 | 3 (0)| 00:00:01 | |* 1 | TABLE ACCESS FULL| TMP_CF | 334 | 2004 | 3 (0)| 00:00:01 | ---------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("ID"<=333) Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 26 consistent gets 0 physical reads 0 redo size 7592 bytes sent via SQL*Net to client 766 bytes received via SQL*Net from client 24 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 339 rows processed
/* 如上實驗,我們可以看到,TABLE_CACHED_BLOCKS對聚簇因子的大小是有影響的。而聚簇因子變小后,CBO會更傾向于使用該索引,而不是全表掃描。*/
下面以設(shè)置TABLE_CACHED_BLOCKS為3來描述計算過程:
準備一個存儲三條記錄的數(shù)組,每條記錄有兩列,一列存儲塊ID,一列存儲順序號。
按順序遍歷索引,并依此順序為每個索引條目提供一個順序號。
從當前索引條目中提取塊id并在數(shù)組中搜索匹配的塊ID。
如果你找到匹配的,就用當前的順序號更新數(shù)組的順序號;
如果沒有找到匹配項,那么用當前塊ID和順序號替換數(shù)組中順序號最小的記錄,并增加clustering_factor的計數(shù)。
/* 創(chuàng)建測試用表 */ /* 為了構(gòu)造一個行數(shù)為10行,但會占用多個數(shù)據(jù)塊的表,所以,我們把其中一個列C1,用長度為2000個字節(jié)的字符填充。同時,ID列中的值也是隨機的,非順序的。*/ SQL> create table tmp_cf_4 as select ceil(dbms_random.value(1,10)) id,lpad('x',2000,'x') c1 from dual connect by rownum<=10; Table created. /* 在ID列上創(chuàng)建索引 */ SQL> create index ind_tmp_cf_4_id on tmp_cf_4(id) tablespace demo; Index created. /* 查看此時該索引的聚簇因子值 */ SQL> select index_name,num_rows,clustering_factor from dba_ind_statistics where table_name='TMP_CF_4' and table_owner='DEMO'; INDEX_NAME NUM_ROWS CLUSTERING_FACTOR ------------------------------ ---------- ----------------- IND_TMP_CF_4_ID 10 8 /* 設(shè)置測試表TMP_CF_4上的TABLE_CACHED_BLOCKS的參數(shù)為3后,收集索引上的統(tǒng)計信息,并查看其聚簇因子的值*/ SQL> exec dbms_stats.set_table_prefs('DEMO','TMP_CF_4','TABLE_CACHED_BLOCKS',3); PL/SQL procedure successfully completed. SQL> exec dbms_stats.gather_index_stats('DEMO','IND_TMP_CF_4_ID'); PL/SQL procedure successfully completed. SQL> select index_name,num_rows,clustering_factor from dba_ind_statistics where table_name='TMP_CF_4' and table_owner='DEMO'; INDEX_NAME NUM_ROWS CLUSTERING_FACTOR ------------------------------ ---------- ----------------- IND_TMP_CF_4_ID 10 5 /* 此時,我們可以看到其聚簇因子的值由此前的8變小到了5. */ /* 下面,我們來看一下,這個新值是如何計算出來的? */ /* 按ID列在索引中的順序顯示其值,順序號以及對應(yīng)行所在的塊號 */ SELECT ROWNUM RN ,ID ,BLOCK_NUM FROM (SELECT ID ,DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID) BLOCK_NUM FROM TMP_CF_4 ORDER BY ID ,ROWID); RN ID BLOCK_NUM ---------- ---------- ---------- 1 2 4859 2 3 4862 3 4 4860 4 5 4861 5 6 4859 6 6 4859 7 6 4861 8 8 4860 9 8 4860 10 10 4861 10 rows selected. /* 打開一個EXCEL表,模擬一個存儲3行記錄的數(shù)組,并計算聚簇因子的值 */ RN ID BLOCK_NUM 塊ID順序號 ---------- ---------- ---------- 1 2 4859 104861 2 3 4862 64859 3 4 4860 94860 4 5 4861 5 6 4859 6 6 4859 7 6 4861 8 8 4860 CF5 9 8 4860 10 10 4861
3.3、設(shè)置set_index_stats的聚簇因子值clstfct
exec dbms_stats.set_index_stats
('DEMO', #索引的屬主
'IND_TAB_CACHED_1_ID', #索引名
clstfct=>100 #聚簇因子值
);
注:設(shè)置后,要把該索引所在的基表統(tǒng)計信息加鎖
(dbms_stats.lock_table_stats)。以避免下次收集統(tǒng)計時,被新值覆蓋。(dbms_stats包無單獨針對索引統(tǒng)計信息加鎖的方法)
/* 創(chuàng)建測試數(shù)據(jù) */ /* 為了制造出聚簇因子較高的索引,對生成的ID數(shù)值取隨機數(shù)1至1000。*/ SQL> create table tmp_cf_5 as select ceil(dbms_random.value*1000) id,'a' name from dual connect by rownum<=1000; Table created. /* 在ID列上創(chuàng)建索引 */ SQL> create index ind_tmp_cf_5_id on tmp_cf_5(id); Index created. /* 收集統(tǒng)計信息 */ SQL> exec dbms_stats.gather_table_stats('demo','tmp_cf_5'); PL/SQL procedure successfully completed. /* 開啟autotrace查看執(zhí)行計劃 */ SQL> select * from tmp_cf_5 where id<=6; Execution Plan ---------------------------------------------------------- Plan hash value: 4131497868 ------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 7 | 42 | 3 (0)| 00:00:01 | |* 1 | TABLE ACCESS FULL| TMP_CF_5 | 7 | 42 | 3 (0)| 00:00:01 | ------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("ID"<=6) Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 4 consistent gets 0 physical reads 0 redo size 668 bytes sent via SQL*Net to client 524 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 5 rows processed 當前的索引的聚簇因子 SQL> select index_name,num_rows,clustering_factor from dba_ind_statistics where table_name='TMP_CF_5' and table_owner='DEMO'; INDEX_NAME NUM_ROWS CLUSTERING_FACTOR ------------------------------ ---------- ----------------- IND_TMP_CF_5_ID 1000 438 修改當前的聚簇因子值 SQL> exec dbms_stats.set_index_stats('DEMO','IND_TMP_CF_5_ID',clstfct=>10); PL/SQL procedure successfully completed. SQL> select index_name,num_rows,clustering_factor from dba_ind_statistics where table_name='TMP_CF_5' and table_owner='DEMO'; INDEX_NAME NUM_ROWS CLUSTERING_FACTOR ------------------------------ ---------- ----------------- IND_TMP_CF_5_ID 1000 10 再次查看執(zhí)行計劃 SQL> select * from tmp_cf_5 where id<=6; Execution Plan ---------------------------------------------------------- Plan hash value: 4231491781 ----------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 7 | 42 | 3 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID| TMP_CF_5 | 7 | 42 | 3 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | IND_TMP_CF_5_ID | 7 | | 2 (0)| 00:00:01 | ----------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID"<=6) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 4 consistent gets 0 physical reads 0 redo size 668 bytes sent via SQL*Net to client 524 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 5 rows processed
/* 從上面可見,此時已經(jīng)采用了先索引再回表的訪問路徑,說明此時其計算出來的成本,是小于全表掃描的。*/
到此,關(guān)于“什么是Clustering Factor”的學習就結(jié)束了,希望能夠解決大家的疑惑。理論與實踐的搭配能更好的幫助大家學習,快去試試吧!若想繼續(xù)學習更多相關(guān)知識,請繼續(xù)關(guān)注億速云網(wǎng)站,小編會繼續(xù)努力為大家?guī)砀鄬嵱玫奈恼拢?/p>
免責聲明:本站發(fā)布的內(nèi)容(圖片、視頻和文字)以原創(chuàng)、轉(zhuǎn)載和分享為主,文章觀點不代表本網(wǎng)站立場,如果涉及侵權(quán)請聯(lián)系站長郵箱:is@yisu.com進行舉報,并提供相關(guān)證據(jù),一經(jīng)查實,將立刻刪除涉嫌侵權(quán)內(nèi)容。