<u id="66661"></u>

溫馨提示×

溫馨提示×

您好，登錄后才能下訂單哦！

密碼登錄×

忘記密碼？

登錄注冊×

獲取短信驗證碼

其他方式登錄

點擊登錄注冊即表示同意《億速云用戶服務(wù)條款》

用戶登錄×

賬戶密碼登錄

請使用微信掃描上方二維碼

使用幫助

請求超時！

請點擊重新獲取二維碼

什么是Clustering Factor

發(fā)布時間：2021-11-05 10:57:11 來源：億速云閱讀：266 作者：iii 欄目：關(guān)系型數(shù)據(jù)庫

這篇文章主要介紹“什么是Clustering Factor”，在日常操作中，相信很多人在什么是Clustering Factor問題上存在疑惑，小編查閱了各式資料，整理出簡單好用的操作方法，希望對大家解答”什么是Clustering Factor”的疑惑有所幫助！接下來，請跟著小編一起來學習吧！

一、什么是聚簇因子

Clustering Factor（CF）聚簇因子，意味著對索引的表的群集度,用來比較索引的順序程度與表的雜亂排序程度的一個度量。

集群因子用于判斷索引回表需要消耗的物理 I/O 次數(shù)。

CF是在假設(shè)內(nèi)存大小只能載入一個塊，這是伴隨索引掃描所需要的表掃描次數(shù)的計算值。

更準確地說，他表示沿著索引的葉塊，在ROWID值上代表塊編號的第1~15位的值與之前ROWID作出比較后被更替的次數(shù)。

如：

有五個塊組成的索引和由五個塊組成的表，而一個塊里有4行記錄，那么總行數(shù)就是20行。

順序的掃描索引的同時讀取相對應(yīng)的表，這時可能有兩個極端情況：

1、CF最低時

如果一個索引塊所包含的ROWID都被一個表塊所包含，則通過索引掃描表時，只掃描索引5次和表5次，就可以得到想要的數(shù)據(jù)。

此時CF=5（表塊的掃描次數(shù)），CF的最小值與表的塊數(shù)相同

2、CF最高時

如果一個索引塊所包含的ROWID分別被不同的表塊所包含，則通過索引掃描表時，

必須掃描20次[4(索引塊數(shù))+4(索引塊數(shù))+4(索引塊數(shù))+4(索引塊數(shù))+4(索引塊數(shù)) = 20次]才能獲得想要的數(shù)據(jù)。

這時CF=20（表塊的掃描次數(shù)），CF最大值與表的行數(shù)相同

二、聚簇因子計算過程

聚簇因子大致的計算方法順序如下：

1、進行一次索引全掃描

2、檢查索引塊中的rowid信息。比較與前一個rowid是否指向同一個數(shù)據(jù)塊。若不同，則聚簇因子加1.

3、當完成整個的索引掃面后，即得到該索引的聚簇因子的數(shù)值。

計算過程

SELECT *
FROM   (SELECT ID
              ,ROWID
        FROM   TMP_CF
        ORDER  BY ID)
WHERE  ROWNUM <= 5;        
        ID ROWID
---------- ------------------
         2 AAAVqxAAEAAAAfDACL
         3 AAAVqxAAEAAAAfDAAf
         3 AAAVqxAAEAAAAfDAJR
         4 AAAVqxAAEAAAAfDAGZ
         4 AAAVqxAAEAAAAfDAKP

首先我們比較 2、3 對應(yīng)的 ROWID 是否在同一個數(shù)據(jù)塊，如果在同一個數(shù)據(jù)塊，Clustering Factor +0；

如果不在同一個數(shù)據(jù)塊，那么 Clustering Factor 值加 1。

然后我們比較 3、4 對應(yīng)的 ROWID 是否在同一個數(shù)據(jù)塊，如果在同一個數(shù)據(jù)塊，Clustering Factor 值不變；

如果不在同一個數(shù)據(jù)塊，那么 Clustering Factor 值加 1。

接下來我們比較4、5對應(yīng)的ROWID是否在同一個數(shù)據(jù)塊，如果在同一個數(shù)據(jù)塊，Clustering Factor +0；

如果不在同一個數(shù)據(jù)塊，那么 Clustering Factor 值加 1。

像上面步驟一樣，一直這樣有序地比較下去，直到比較完索引中最后一個鍵值。

表中的數(shù)據(jù)塊數(shù) <= 聚簇因子的取值范圍 <= 表中的記錄數(shù)

三、聚簇因子的影響

如果集群因子與塊數(shù)接近，說明表的數(shù)據(jù)基本上是有序的，而且其順序基本與索引順序一樣。

這樣在進行索引范圍或者索引全掃描的時候，回表只需要讀取少量的數(shù)據(jù)塊就能完成。

如果集群因子與表記錄數(shù)接近，說明表的數(shù)據(jù)和索引順序差異很大，在進行索引范圍掃描或者索引全掃描的時候，回表會讀取更多的數(shù)據(jù)塊。

集群因子只會影響索引范圍掃描（INDEX RANGE SCAN）以及索引全掃描（INDEX FULL SCAN），因為只有這兩種索引掃描方式會有大量數(shù)據(jù)回表。

集群因子不會影響索引唯一掃描（INDEX UNIQUE SCAN），因為索引唯一掃描只返回一條數(shù)據(jù)。

集群因子更不會影響索引快速全掃描（INDEX FAST FULL SCAN），因為索引快速全掃描不回表。

集群因子影響的是索引回表的物理 I/O 次數(shù)。

我們假設(shè)索引范圍掃描返回了 1 000 行數(shù)據(jù)，如果 buffer cache 中沒有緩存表的數(shù)據(jù)塊，

假設(shè)這 1000行數(shù)據(jù)都在同一個數(shù)據(jù)塊中，那么回表需要耗費的物理 I/O 就只需要一個；

假設(shè)這 1000 行數(shù)據(jù)都在不同的數(shù)據(jù)塊中，那么回表就需要耗費 1 000 個物理 I/O。

因此，集群因子影響索引回表的物理 I/O 次數(shù)。

假設(shè)I/O不使用內(nèi)存，CF越高讀取表塊的次數(shù)就越多，因此物理I/O也會增加。

即，CF越高，通過ROWID讀取表塊的次數(shù)就會越多，db file sequential read也會相應(yīng)的增加。

通過高速緩沖區(qū)讀取過一次的塊因為不在發(fā)生物理I/O，因此CF高未必一定會降低SQL語句的性能。

但是大量掃描CF值高的索引，則需要讀取的表塊增加，因此可能給性能帶來嚴重問題。

比如當ORACLE計算索引范圍掃描（IRS）及其回表時的成本時，

使用如下的公式：

IRS COST=I/O COST + CPU COST

其中 I/O COST = INDEX ACCESS I/O COST + TABLE ACCESS I/O COST

其中:

INDEX ACCESS I/O COST = BLEVEL+CEIL(#LEAF_BLOCKS*IX_SEL)

TABLE ACCESS I/O COST = CEIL(CLUSTERING_FACTOR*IX_SEL_WITH_FILTERS)

索引的聚簇因子越大

--》從該索引回表的成本越大

--》使用該索引的綜合成本越大

--》使用該索引的可能越小

對索引收集統(tǒng)計信息，則索引的聚簇因子會被記錄在 DBA_INDEXES的CLUSTING_FACTOR列。

如果CF與表的塊數(shù)相當就是“好現(xiàn)象”；如果CF和表的行數(shù)相當就不是“好現(xiàn)象”。

四、如何優(yōu)化

如果SQL語句的性能問題被判斷位由于聚簇因子導致，如何進行優(yōu)化？

1、將索引掃描換成全表掃描

2、使用其他索引

3、按照索引的排列順序重新創(chuàng)建表(最后的選擇)

create table new_table as select ... from old_table order by indexd_column

4、某些情況，設(shè)置table_cached_blocks參數(shù)

5、設(shè)置set_index_stats的聚簇因子值clstfct

注意：
當索引范圍掃描，索引全掃描不回表或者返回數(shù)據(jù)量很少的時候，不管集群因子多大，對 SQL 查詢性能幾乎沒有任何影響。
在進行 SQL 優(yōu)化的時候，往往會建立合適的組合索引消除回表，或者建立組合索引盡量減少回表次數(shù)。

五、聚簇因子對執(zhí)行計劃的影響的實驗：

1、聚簇因子高對執(zhí)行計劃的影響

2、聚簇因子低對執(zhí)行計劃的影響

3、調(diào)整聚簇因子

3.1、根據(jù)索引列重新創(chuàng)建表

3.2、設(shè)置table_cached_blocks參數(shù)

3.3、設(shè)置set_index_stats的聚簇因子值clstfct

1、聚簇因子高對執(zhí)行計劃的影響

/* 創(chuàng)建測試數(shù)據(jù) */
/* 為了制造出聚簇因子較高的索引，對生成的ID數(shù)值取隨機數(shù)1至1000。*/
SQL> create table tmp_cf as select ceil(dbms_random.value*1000) id,'a' name from dual connect by rownum<=1000;
Table created.
/* 在ID列上創(chuàng)建索引 */
SQL> create index ind_id on tmp_cf(id);
Index created.
/* 收集統(tǒng)計信息 */
SQL> exec dbms_stats.gather_table_stats('demo','tmp_cf');
PL/SQL procedure successfully completed.
/* 開啟autotrace查看執(zhí)行計劃 */
SQL> set autot traceonly
/* 執(zhí)行查詢，條件為ID<=6 */
SQL> select * from tmp_cf where id<=6;
Execution Plan
----------------------------------------------------------
Plan hash value: 1290879752
----------------------------------------------------------------------------
| Id  | Operation         | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |        |     6 |    36 |     3   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TMP_CF |     6 |    36 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("ID"<=6)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          4  consistent gets
          0  physical reads
          0  redo size
        676  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          6  rows processed

/* 從執(zhí)行計劃中可見，在只返回6行的情況下，估算出的走索引并回表的成本仍然大于全表掃描。 */

/* 嘗試改條件為 id<=5 ，id<=4,id<=3 進一步減少返回的行數(shù)。 */

SQL> select * from tmp_cf where id<=5;
Execution Plan
----------------------------------------------------------
Plan hash value: 1290879752
----------------------------------------------------------------------------
| Id  | Operation         | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |        |     5 |    30 |     3   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TMP_CF |     5 |    30 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("ID"<=5)
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          4  consistent gets
          0  physical reads
          0  redo size
        668  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          5  rows processed
SQL> select * from tmp_cf where id<=4;
Execution Plan
----------------------------------------------------------
Plan hash value: 1290879752
----------------------------------------------------------------------------
| Id  | Operation         | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |        |     4 |    24 |     3   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TMP_CF |     4 |    24 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("ID"<=4)
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          4  consistent gets
          0  physical reads
          0  redo size
        668  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          5  rows processed
SQL> select * from tmp_cf where id<=3;
Execution Plan
----------------------------------------------------------
Plan hash value: 1290879752
----------------------------------------------------------------------------
| Id  | Operation         | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |        |     3 |    18 |     3   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TMP_CF |     3 |    18 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("ID"<=3)
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          4  consistent gets
          0  physical reads
          0  redo size
        652  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          3  rows processed
SQL> select * from tmp_cf where id<=2;
Execution Plan
----------------------------------------------------------
Plan hash value: 2401189212
--------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |     2 |    12 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TMP_CF |     2 |    12 |     3   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_ID |     2 |       |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ID"<=2)
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          4  consistent gets
          0  physical reads
          0  redo size
        592  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

/* 可見，當返回的行數(shù)為1行時，才選擇了用索引并回表的訪問路徑。*/

2、聚簇因子低對執(zhí)行計劃的影響

/* 創(chuàng)建測試數(shù)據(jù) */
/* 創(chuàng)建測試表，但這次ID列的值為順序產(chǎn)生。 */
SQL> create table tmp_cf_2 as select rownum id,rpad(rownum,50,'a') col1 from dual connect by rownum<=1000;
Table created.
/* 在ID列上創(chuàng)建索引 */
SQL> create index ind_id_2 on tmp_cf_2(id);
Index created.
/* 執(zhí)行查詢，條件為ID<=6 */
SQL> select * from tmp_cf_2 where id<=6;
Execution Plan
----------------------------------------------------------
Plan hash value: 1887041669
----------------------------------------------------------------------------------------
| Id  | Operation                   | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |     6 |   390 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TMP_CF_2 |     6 |   390 |     3   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_ID_2 |     6 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ID"<=6)
Note
-----
   - dynamic sampling used for this statement (level=2)
Statistics
----------------------------------------------------------
         10  recursive calls
          0  db block gets
         23  consistent gets
          1  physical reads
          0  redo size
        986  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          6  rows processed

/* 從上面可見，此時已經(jīng)采用了先索引再回表的訪問路徑，說明此時其計算出來的成本，是小于全表掃描的。*/

/* 繼續(xù)執(zhí)行查詢，將條件依次為ID<=200,ID<=222,ID<=223,逐步增加返回的行數(shù) */

SQL> select * from tmp_cf_2 where id<=200;
Execution Plan
----------------------------------------------------------
Plan hash value: 1887041669
----------------------------------------------------------------------------------------
| Id  | Operation                   | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |   200 | 13000 |     5   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TMP_CF_2 |   200 | 13000 |     5   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_ID_2 |   200 |       |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ID"<=200)
Note
-----
   - dynamic sampling used for this statement (level=2)
Statistics
----------------------------------------------------------
          7  recursive calls
          0  db block gets
         48  consistent gets
          0  physical reads
          0  redo size
      14949  bytes sent via SQL*Net to client
        667  bytes received via SQL*Net from client
         15  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        200  rows processed
SQL> select * from tmp_cf_2 where id<=222;
Execution Plan
----------------------------------------------------------
Plan hash value: 1887041669
----------------------------------------------------------------------------------------
| Id  | Operation                   | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |   222 | 14430 |     5   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TMP_CF_2 |   222 | 14430 |     5   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_ID_2 |   222 |       |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ID"<=222)
Note
-----
   - dynamic sampling used for this statement (level=2)
Statistics
----------------------------------------------------------
          7  recursive calls
          0  db block gets
         50  consistent gets
          0  physical reads
          0  redo size
      16455  bytes sent via SQL*Net to client
        678  bytes received via SQL*Net from client
         16  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        222  rows processed
SQL> select * from tmp_cf_2 where id<=223;
Execution Plan
----------------------------------------------------------
Plan hash value: 2467544485
------------------------------------------------------------------------------
| Id  | Operation         | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |          |   223 | 14495 |     5   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TMP_CF_2 |   223 | 14495 |     5   (0)| 00:00:01 |
------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("ID"<=223)
Note
-----
   - dynamic sampling used for this statement (level=2)
Statistics
----------------------------------------------------------
          7  recursive calls
          0  db block gets
         43  consistent gets
          0  physical reads
          0  redo size
      15679  bytes sent via SQL*Net to client
        678  bytes received via SQL*Net from client
         16  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        223  rows processed

/* 從上可見，當返回223行時，才開始采用全表掃描的方式。*/

/* 查看一下兩個索引的聚簇因子 */

SQL>  select index_name,clustering_factor from user_indexes where index_name in ('IND_ID','IND_ID_2');
INDEX_NAME                     CLUSTERING_FACTOR
------------------------------ -----------------
IND_ID                                       443
IND_ID_2                                       9

當索引的聚簇因子較高時，會導致該索引回表的成本成大，使得該索引的綜合成本越大，因此使用該索引的可能性越低。

但是，如果在執(zhí)行計劃出現(xiàn)錯誤，使用了該索引，并且該索引的聚簇因子很大，那么通過索引回表的次數(shù)就會很大，同時會造成大量的db file sequential read等待事件。

3.1、根據(jù)索引列重新創(chuàng)建表

/* 根據(jù)tmp_cf的表索引列排完序在創(chuàng)建新表tmp_cf_3 */

SQL> create table tmp_cf_3 as select * from tmp_cf order by id;
Table created.
/* 創(chuàng)建索引 */
SQL> create index ind_id_3 on tmp_cf_3(id);
Index created.
/* 執(zhí)行查詢 id<=6 */
SQL> select * from tmp_cf_3 where id <= 6;
6 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2546856486
----------------------------------------------------------------------------------------
| Id  | Operation                   | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |     6 |    96 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TMP_CF_3 |     6 |    96 |     3   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_ID_3 |     6 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ID"<=6)
Note
-----
   - dynamic sampling used for this statement (level=2)
Statistics
----------------------------------------------------------
         10  recursive calls
          0  db block gets
         15  consistent gets
          1  physical reads
          0  redo size
        692  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          6  rows processed

/* 從上面可見，此時已經(jīng)采用了先索引再回表的訪問路徑，說明此時其計算出來的成本，是小于全表掃描的。*/

再次查看聚簇因子
SQL> select index_name,clustering_factor from user_indexes where index_name in ('IND_ID','IND_ID_2','IND_ID_3');
INDEX_NAME                     CLUSTERING_FACTOR
------------------------------ -----------------
IND_ID                                       443
IND_ID_2                                       9
IND_ID_3                                       2

根據(jù)索引列排完序創(chuàng)建的表，然后在原索引列創(chuàng)建索引，此時的聚簇因子很小。

3.2、設(shè)置table_cached_blocks參數(shù)

在設(shè)置該參數(shù)前，有一種方式可以在創(chuàng)建索引前先計算出該索引的聚簇因子。

/* 創(chuàng)建測試數(shù)據(jù) */
SQL> create table t1 as select * from dba_objects;
Table created.
計算聚簇因子（需要在object_name列創(chuàng)建索引）
SELECT SUM(CASE
             WHEN BLOCK#1 = BLOCK#2 AND FILE#1 = FILE#2 THEN
              0
             ELSE
              1
           END) CLUSTERING_FACTOR
FROM   (SELECT DBMS_ROWID.ROWID_RELATIVE_FNO(ROWID) FILE#1
              ,LEAD(DBMS_ROWID.ROWID_RELATIVE_FNO(ROWID), 1, NULL) OVER(ORDER BY OBJECT_NAME) FILE#2
              ,DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID) BLOCK#1
              ,LEAD(DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID), 1, NULL) OVER(ORDER BY OBJECT_NAME) BLOCK#2
        FROM   T1
        WHERE  OBJECT_NAME IS NOT NULL);
或者
/*更推薦使用下面*/
SELECT SUM(COUNT_CHG)
FROM   (SELECT CASE
                 WHEN SUBSTRB(ROWID, 1, 15) <>
                      LAG(SUBSTRB(ROWID, 1, 15), 1, '000000000')
                  OVER(ORDER BY OBJECT_NAME, ROWID) THEN
                  1
                 ELSE
                  0
               END COUNT_CHG
        FROM   T1
        WHERE  OBJECT_NAME IS NOT NULL);
SUM(COUNT_CHG)
--------------
         44076

/* 實際創(chuàng)建索引并且查看聚簇因子 */

SQL> create index t1_i1 on t1(object_name);
Index created.
SQL> select  index_name,clustering_factor from user_indexes  where  table_name = 'T1' and index_name = 'T1_I1';
INDEX_NAME                     CLUSTERING_FACTOR
------------------------------ -----------------
T1_I1                                      44076

/* 可以看到計算的聚簇因子的值和實際創(chuàng)建完索引的聚簇因子的值是一樣的 */

參數(shù) TABLE_CACHED_BLOCKS：

1、該參數(shù)自12C引入(11g也有效)。

2、告訴索引統(tǒng)計信息收集工具，假設(shè)索引所在的基表上有多少個數(shù)據(jù)塊已經(jīng)在內(nèi)存中。

3、取值范圍為1至255，默認值為1。

4、對小表上索引的聚簇因子影響大，而對于大表上索引的聚簇因子影響小。

/* 接此前tmp_cf表的實驗 */
/* 查看ID列上索引的聚簇因子 */
SQL> select index_name,num_rows,clustering_factor from dba_ind_statistics where table_name='TMP_CF' and table_owner='DEMO';
INDEX_NAME                       NUM_ROWS CLUSTERING_FACTOR
------------------------------ ---------- -----------------
IND_ID                               1000               443
查看當前的TABLE_CACHED_BLOCKS的值
SQL> select dbms_stats.get_prefs('table_cached_blocks','DEMO','TMP_CF') from dual;
DBMS_STATS.GET_PREFS('TABLE_CACHED_BLOCKS','DEMO','TMP_CF')
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1
修改為16
SQL> exec dbms_stats.set_table_prefs('DEMO','TMP_CF','TABLE_CACHED_BLOCKS',16);
PL/SQL procedure successfully completed.
SQL> select dbms_stats.get_prefs('table_cached_blocks','DEMO','TMP_CF') from dual;
DBMS_STATS.GET_PREFS('TABLE_CACHED_BLOCKS','DEMO','TMP_CF')
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
16
查看聚簇因子
SQL> select index_name,num_rows,clustering_factor from dba_ind_statistics where table_name='TMP_CF' and table_owner='DEMO';
INDEX_NAME                       NUM_ROWS CLUSTERING_FACTOR
------------------------------ ---------- -----------------
IND_ID                               1000               443
收集索引的統(tǒng)計信息
SQL> exec dbms_stats.gather_index_stats('DEMO','IND_ID');
PL/SQL procedure successfully completed.
SQL> select index_name,num_rows,clustering_factor from dba_ind_statistics where table_name='TMP_CF' and table_owner='DEMO';
INDEX_NAME                       NUM_ROWS CLUSTERING_FACTOR
------------------------------ ---------- -----------------
IND_ID                               1000                 2

/*此時，再執(zhí)行查詢，條件為ID<=6 看此時的表現(xiàn)*/

SQL> set autot traceonly
SQL> select * from tmp_cf where id<=6;
Execution Plan
----------------------------------------------------------
Plan hash value: 3869640664
----------------------------------------------------------------------------------------------
| Id  | Operation                           | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |        |     6 |   330 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| TMP_CF |     6 |   330 |     3   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | IND_ID |     6 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ID"<=6)
Statistics
----------------------------------------------------------
          0  recursive calls
          3  db block gets
         18  consistent gets
          0  physical reads
          0  redo size
        876  bytes sent via SQL*Net to client
        607  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          4  rows processed

/* 從執(zhí)行計劃中可見，原來走全表掃描，現(xiàn)在變成走索引范圍掃描了。 */

/* 嘗試改條件為 id<=100 ，id<=200,id<=300 逐步增大返回的行數(shù)。看看何時不走索引了*/

SQL> select * from tmp_cf where id<=100;
110 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2401189212
--------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |   100 |   600 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TMP_CF |   100 |   600 |     3   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_ID |   100 |       |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ID"<=100)
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
         59  consistent gets
          0  physical reads
          0  redo size
       3034  bytes sent via SQL*Net to client
        601  bytes received via SQL*Net from client
          9  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        110  rows processed
SQL> select * from tmp_cf where id<=200;
206 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2401189212
--------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |   200 |  1200 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TMP_CF |   200 |  1200 |     3   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_ID |   200 |       |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ID"<=200)
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
        118  consistent gets
          0  physical reads
          0  redo size
       5205  bytes sent via SQL*Net to client
        667  bytes received via SQL*Net from client
         15  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        206  rows processed
SQL> select * from tmp_cf where id<=300;
286 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3869640664
----------------------------------------------------------------------------------------------
| Id  | Operation                           | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |        |   300 | 16500 |     5   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| TMP_CF |   300 | 16500 |     5   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | IND_ID |   300 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ID"<=300)
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
        271  consistent gets
          0  physical reads
          0  redo size
      21347  bytes sent via SQL*Net to client
        816  bytes received via SQL*Net from client
         21  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        286  rows processed
SQL> select * from tmp_cf where id<=332;  
336 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2401189212
--------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |   333 |  1998 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TMP_CF |   333 |  1998 |     3   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_ID |   333 |       |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ID"<=332)
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
        195  consistent gets
          0  physical reads
          0  redo size
       8307  bytes sent via SQL*Net to client
        766  bytes received via SQL*Net from client
         24  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        336  rows processed
SQL> select * from tmp_cf where id<=333;
339 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1290879752
----------------------------------------------------------------------------
| Id  | Operation         | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |        |   334 |  2004 |     3   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TMP_CF |   334 |  2004 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("ID"<=333)
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
         26  consistent gets
          0  physical reads
          0  redo size
       7592  bytes sent via SQL*Net to client
        766  bytes received via SQL*Net from client
         24  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        339  rows processed

/* 如上實驗，我們可以看到，TABLE_CACHED_BLOCKS對聚簇因子的大小是有影響的。而聚簇因子變小后，CBO會更傾向于使用該索引，而不是全表掃描。*/

參數(shù) TABLE_CACHED_BLOCKS是如何影響聚簇因子的計算過程的：

下面以設(shè)置TABLE_CACHED_BLOCKS為3來描述計算過程：

準備一個存儲三條記錄的數(shù)組，每條記錄有兩列，一列存儲塊ID，一列存儲順序號。

按順序遍歷索引，并依此順序為每個索引條目提供一個順序號。

從當前索引條目中提取塊id并在數(shù)組中搜索匹配的塊ID。

如果你找到匹配的，就用當前的順序號更新數(shù)組的順序號；

如果沒有找到匹配項，那么用當前塊ID和順序號替換數(shù)組中順序號最小的記錄，并增加clustering_factor的計數(shù)。

/* 創(chuàng)建測試用表 */
/* 為了構(gòu)造一個行數(shù)為10行，但會占用多個數(shù)據(jù)塊的表，所以，我們把其中一個列C1，用長度為2000個字節(jié)的字符填充。同時，ID列中的值也是隨機的，非順序的。*/
SQL> create table tmp_cf_4 as select ceil(dbms_random.value(1,10)) id,lpad('x',2000,'x') c1 from dual connect by rownum<=10;
Table created.
/* 在ID列上創(chuàng)建索引 */
SQL> create index ind_tmp_cf_4_id on tmp_cf_4(id) tablespace demo;
Index created.
/* 查看此時該索引的聚簇因子值 */
SQL> select index_name,num_rows,clustering_factor from dba_ind_statistics where table_name='TMP_CF_4' and table_owner='DEMO';
INDEX_NAME                       NUM_ROWS CLUSTERING_FACTOR
------------------------------ ---------- -----------------
IND_TMP_CF_4_ID                        10                 8
/* 設(shè)置測試表TMP_CF_4上的TABLE_CACHED_BLOCKS的參數(shù)為3后，收集索引上的統(tǒng)計信息，并查看其聚簇因子的值*/
SQL> exec dbms_stats.set_table_prefs('DEMO','TMP_CF_4','TABLE_CACHED_BLOCKS',3);
PL/SQL procedure successfully completed.
SQL> exec dbms_stats.gather_index_stats('DEMO','IND_TMP_CF_4_ID');
PL/SQL procedure successfully completed.
SQL> select index_name,num_rows,clustering_factor from dba_ind_statistics where table_name='TMP_CF_4' and table_owner='DEMO';
INDEX_NAME                       NUM_ROWS CLUSTERING_FACTOR
------------------------------ ---------- -----------------
IND_TMP_CF_4_ID                        10                 5
/* 此時，我們可以看到其聚簇因子的值由此前的8變小到了5. */
/* 下面，我們來看一下，這個新值是如何計算出來的？ */
/* 按ID列在索引中的順序顯示其值，順序號以及對應(yīng)行所在的塊號 */
SELECT ROWNUM RN
      ,ID
      ,BLOCK_NUM
FROM   (SELECT ID
              ,DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID) BLOCK_NUM
        FROM   TMP_CF_4
        ORDER  BY ID
                 ,ROWID);
        RN         ID  BLOCK_NUM
---------- ---------- ----------
         1          2       4859
         2          3       4862
         3          4       4860
         4          5       4861
         5          6       4859
         6          6       4859
         7          6       4861
         8          8       4860
         9          8       4860
        10         10       4861
10 rows selected.
/* 打開一個EXCEL表，模擬一個存儲3行記錄的數(shù)組，并計算聚簇因子的值 */
        RN         ID  BLOCK_NUM     塊ID順序號
---------- ---------- ----------     
         1          2       4859     104861
         2          3       4862     64859
         3          4       4860     94860
         4          5       4861     
         5          6       4859     
         6          6       4859     
         7          6       4861     
         8          8       4860     CF5
         9          8       4860
        10         10       4861

3.3、設(shè)置set_index_stats的聚簇因子值clstfct

exec dbms_stats.set_index_stats

('DEMO', #索引的屬主

'IND_TAB_CACHED_1_ID', #索引名

clstfct=>100 #聚簇因子值

);

注：設(shè)置后，要把該索引所在的基表統(tǒng)計信息加鎖

（dbms_stats.lock_table_stats）。以避免下次收集統(tǒng)計時，被新值覆蓋。（dbms_stats包無單獨針對索引統(tǒng)計信息加鎖的方法）

/* 創(chuàng)建測試數(shù)據(jù) */
/* 為了制造出聚簇因子較高的索引，對生成的ID數(shù)值取隨機數(shù)1至1000。*/
SQL> create table tmp_cf_5 as select ceil(dbms_random.value*1000) id,'a' name from dual connect by rownum<=1000;
Table created.
/* 在ID列上創(chuàng)建索引 */
SQL> create index ind_tmp_cf_5_id on tmp_cf_5(id);
Index created.
/* 收集統(tǒng)計信息 */
SQL> exec dbms_stats.gather_table_stats('demo','tmp_cf_5');
PL/SQL procedure successfully completed.
/* 開啟autotrace查看執(zhí)行計劃 */
SQL> select * from tmp_cf_5 where id<=6;
Execution Plan
----------------------------------------------------------
Plan hash value: 4131497868
------------------------------------------------------------------------------
| Id  | Operation         | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |          |     7 |    42 |     3   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TMP_CF_5 |     7 |    42 |     3   (0)| 00:00:01 |
------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("ID"<=6)
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          4  consistent gets
          0  physical reads
          0  redo size
        668  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          5  rows processed
當前的索引的聚簇因子
SQL> select index_name,num_rows,clustering_factor from dba_ind_statistics where table_name='TMP_CF_5' and table_owner='DEMO';
INDEX_NAME                       NUM_ROWS CLUSTERING_FACTOR
------------------------------ ---------- -----------------
IND_TMP_CF_5_ID                      1000               438
修改當前的聚簇因子值
SQL> exec dbms_stats.set_index_stats('DEMO','IND_TMP_CF_5_ID',clstfct=>10);
PL/SQL procedure successfully completed.
SQL> select index_name,num_rows,clustering_factor from dba_ind_statistics where table_name='TMP_CF_5' and table_owner='DEMO';
INDEX_NAME                       NUM_ROWS CLUSTERING_FACTOR
------------------------------ ---------- -----------------
IND_TMP_CF_5_ID                      1000                10
再次查看執(zhí)行計劃
SQL> select * from tmp_cf_5 where id<=6;
Execution Plan
----------------------------------------------------------
Plan hash value: 4231491781
-----------------------------------------------------------------------------------------------
| Id  | Operation                   | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                 |     7 |    42 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TMP_CF_5        |     7 |    42 |     3   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_TMP_CF_5_ID |     7 |       |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ID"<=6)
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          4  consistent gets
          0  physical reads
          0  redo size
        668  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          5  rows processed

/* 從上面可見，此時已經(jīng)采用了先索引再回表的訪問路徑，說明此時其計算出來的成本，是小于全表掃描的。*/

到此，關(guān)于“什么是Clustering Factor”的學習就結(jié)束了，希望能夠解決大家的疑惑。理論與實踐的搭配能更好的幫助大家學習，快去試試吧！若想繼續(xù)學習更多相關(guān)知識，請繼續(xù)關(guān)注億速云網(wǎng)站，小編會繼續(xù)努力為大家?guī)砀鄬嵱玫奈恼拢?/p>

向AI問一下細節(jié)

推薦閱讀：

免責聲明：本站發(fā)布的內(nèi)容（圖片、視頻和文字）以原創(chuàng)、轉(zhuǎn)載和分享為主，文章觀點不代表本網(wǎng)站立場，如果涉及侵權(quán)請聯(lián)系站長郵箱：is@yisu.com進行舉報，并提供相關(guān)證據(jù)，一經(jīng)查實，將立刻刪除涉嫌侵權(quán)內(nèi)容。

上一篇新聞：
怎樣在Linux桌面添加用戶
下一篇新聞：
Oracle數(shù)據(jù)庫中怎么實現(xiàn)數(shù)據(jù)行遷移與行鏈接

猜你喜歡

AI
助
手

產(chǎn)品服務(wù)

地區(qū)劃分

專題活動

幫助支持

關(guān)于我們

售后咨詢

7*24小時在線電話：400-100-2938

7*24小時在線 QQ：800811969

關(guān)注億速云

億速云公眾號

手機網(wǎng)站二維碼