How does the clustering factor affect execution plans in a database?

Published: 2021-11-04 10:58:32 Source: Yisu Cloud Author: iii Category: Relational databases


Impact of the clustering factor on execution plans

Test environment: Linux 7.6 + Oracle 19.6.1

1. Create the test environment

1.1 Create the test table and insert data

CZH@czhpdb > create table test_ffs as select * from hr.employees;
 
Table created.
 
CZH@czhpdb > insert into test_ffs select * from test_ffs;
 
Execution Plan
----------------------------------------------------------
Plan hash value: 296244252
 
---------------------------------------------------------------------------------------------
| Id  | Operation                        | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT                 |          |   107 |  7383 |     3   (0)| 00:00:01 |
|   1 |  LOAD TABLE CONVENTIONAL         | TEST_FFS |       |       |            |          |
|   2 |   OPTIMIZER STATISTICS GATHERING |          |   107 |  7383 |     3   (0)| 00:00:01 |
|   3 |    TABLE ACCESS FULL             | TEST_FFS |   107 |  7383 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------
 
Note
-----
   - dynamic statistics used: statistics for conventional DML
 
 
Statistics
----------------------------------------------------------
         72  recursive calls
         89  db block gets
         81  consistent gets
         12  physical reads
      21576  redo size
        195  bytes sent via SQL*Net to client
        394  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          3  sorts (memory)
          0  sorts (disk)
        107  rows processed

The autotrace execution plan above shows two new features:

1.2 Two new features from 12c R1 and 19c

1.2.1 12c R1 new feature: OPTIMIZER STATISTICS GATHERING

# OPTIMIZER STATISTICS GATHERING: a feature introduced in 12c R1. During a direct-path load, statistics are gathered automatically the first time data is loaded into an empty table.

# Oracle Database 12c introduced online statistics gathering for CREATE TABLE AS SELECT statements and direct-path inserts.

1.2.2 19c new feature: real-time statistics

Oracle Database 19c introduces real-time statistics, which extend online support to conventional DML statements. Because statistics can go stale between DBMS_STATS jobs, real-time statistics helps the optimizer generate more optimal plans. Whereas bulk load operations gather all necessary statistics, real-time statistics augment rather than replace traditional statistics.

· Oracle introduced new parameters

· "_optimizer_gather_stats_on_conventional_dml" and "_optimizer_use_stats_on_conventional_dml", both true by default

· "_optimizer_stats_on_conventional_dml_sample_rate" at 100%

· How do real-time statistics work?

· By default "_optimizer_gather_stats_on_conventional_dml" is true, so gathering kicks off automatically

· When a DML operation is currently modifying a table (conventional), Oracle Database dynamically computes values for the most essential statistics if the above parameter is on.

· Consider a table that receives many inserts, so its row count keeps growing. Real-time statistics keep track of the increasing row count as rows are being inserted. If the optimizer performs a hard parse of a new query, then the optimizer can use the real-time statistics to obtain a more accurate cost estimate.

· DBA_TAB_COL_STATISTICS and DBA_TAB_STATISTICS have a NOTES column that shows STATS_ON_CONVENTIONAL_DML when real-time statistics have been used

SELECT NVL(PARTITION_NAME, 'GLOBAL') PARTITION_NAME, NUM_ROWS, BLOCKS, NOTES
FROM   USER_TAB_STATISTICS
WHERE  TABLE_NAME = 'SALES'
ORDER BY 1, 4;

PARTITION_NAM   NUM_ROWS     BLOCKS NOTES
------------- ---------- ---------- -------------------------
GLOBAL           1837686       3315 STATS_ON_CONVENTIONAL_DML

1.3 Insert a large volume of data and gather statistics

CZH@czhpdb > set autot off
CZH@czhpdb > insert into test_ffs select * from test_ffs;
CZH@czhpdb > insert into test_ffs select * from test_ffs;
CZH@czhpdb > insert into test_ffs select * from test_ffs;
CZH@czhpdb > insert into test_ffs select * from test_ffs;
CZH@czhpdb > insert into test_ffs select * from test_ffs;
CZH@czhpdb > insert into test_ffs select * from test_ffs;
CZH@czhpdb > insert into test_ffs select * from test_ffs;
CZH@czhpdb > insert into test_ffs select * from test_ffs;
CZH@czhpdb > insert into test_ffs select * from test_ffs;
CZH@czhpdb > insert into test_ffs select * from test_ffs;
CZH@czhpdb > commit;
 
CZH@czhpdb > CREATE INDEX IDX_TEST_FFS ON TEST_FFS(EMPLOYEE_ID);
 
CZH@czhpdb > EXEC DBMS_STATS.GATHER_TABLE_STATS(user,'TEST_FFS',cascade=>true);
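As a quick sanity check on the data volume, each `insert into test_ffs select * from test_ffs` doubles the table. The sketch below (plain arithmetic, not Oracle code) reproduces the row count that later appears in the 10053 trace, assuming the 107 starting rows reported by the autotrace output and the eleven self-inserts shown above.

```python
# Row count after the initial CTAS and the repeated self-inserts.
initial_rows = 107        # "107 rows processed" in the autotrace output
self_inserts = 1 + 10     # one insert in section 1.1, ten more in section 1.3

rows = initial_rows
for _ in range(self_inserts):
    rows += rows          # insert into test_ffs select * from test_ffs

print(rows)               # 219136, the #Rows value in the 10053 trace
```

Because every pass copies the whole key range again, rows for the same EMPLOYEE_ID end up scattered across the entire table, which matters for the clustering factor discussed later.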

1.4 Use the /*+ gather_plan_statistics */ hint to get the actual execution plan

# In SQL*Plus, both set autotrace and explain plan for show the plan estimated by the CBO, which may differ from the plan that is actually executed, so we use the hint below to capture the actual execution plan.

CZH@czhpdb > SELECT /*+ gather_plan_statistics */ salary from test_ffs where employee_id < 100;
 
no rows selected
 
Actual execution plan:
 
SYS@orcl2 > select * from table(dbms_xplan.display_cursor('c9qg9su5khysd',null,'allstats last'));
 
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  c9qg9su5khysd, child number 0
-------------------------------------
SELECT /*+ gather_plan_statistics */ salary from test_ffs where
employee_id < 100
 
Plan hash value: 296244252
 
----------------------------------------------------------------------------------------
| Id  | Operation         | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |          |      1 |        |      0 |00:00:00.01 |    2288 |
|*  1 |  TABLE ACCESS FULL| TEST_FFS |      1 |      1 |      0 |00:00:00.01 |    2288 |
----------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("EMPLOYEE_ID"<100)

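# The table contains no rows with employee_id < 100, so we would expect this SQL to use the index, yet it did not. For some reason the CBO decided the index was not the best access path, so we use a 10053 trace to find out why the CBO costed the full table scan lower.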

SYS@orcl2 > alter system flush shared_pool;
 
System altered.

# If the shared pool is not flushed or the cursor is not invalidated, the statement is only soft parsed, and enabling event 10053 produces no trace file.

CZH@czhpdb > ALTER SESSION SET EVENTS '10053 trace name context forever,level 1';
 
Session altered.
 
CZH@czhpdb > SELECT /*+ gather_plan_statistics */ salary from test_ffs where employee_id < 100;
 
no rows selected
 
CZH@czhpdb > ALTER SESSION SET EVENTS '10053 trace name context off';
 
Session altered.

19c 10053 trace:

# The 10053 trace below shows DK (distinct keys), CLUF (clustering factor), and IX_SEL. The next step uses these values to work out why the CBO costs the index path higher than the full table scan.

***************************************
BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table: TEST_FFS  Alias: TEST_FFS
online table stats for conventional DML (block count: 2263 row count: 219029)   used on (TEST_FFS) block count: 5 -> 2263, row count: 107 -> 219136
  #Rows: 219136  SSZ: 0  LGR: 0  #Blks:  2263  AvgRowLen:  69.00  NEB: 0  ChainCnt:  0.00  ScanRate:  0.00  SPC: 0  RFL: 0  RNF: 0  CBK: 0  CHR: 0  KQDFLG: 193
  #IMCUs: 0  IMCRowCnt: 0  IMCJournalRowCnt: 0  #IMCBlocks: 0  IMCQuotient: 0.000000
Index Stats::
  Index: IDX_TEST_FFS  Col#: 1
  LVLS: 1  #LB: 458  #DK: 107  LB/K: 4.00  DB/K: 1524.00  CLUF: 163174.00  NRW: 219136.00 SSZ: 0.00 LGR: 0.00 CBK: 0.00 GQL: 0.00 CHR: 0.00 KQDFLG: 8192 BSZ: 1
  KKEISFLG: 1 
try to generate single-table filter predicates from ORs for query block SEL$1 (#0)
finally: "TEST_FFS"."EMPLOYEE_ID"<100
 
=======================================
SPD: BEGIN context at query block level
=======================================
Query Block SEL$1 (#0)
Return code in qosdSetupDirCtx4QB: NOCTX
=====================================
SPD: END context at query block level
=====================================
Access path analysis for TEST_FFS
***************************************
SINGLE TABLE ACCESS PATH 
  Single Table Cardinality Estimation for TEST_FFS[TEST_FFS] 
  SPD: Return code in qosdDSDirSetup: NOCTX, estType = TABLE
 
 kkecdn: Single Table Predicate:"TEST_FFS"."EMPLOYEE_ID"<100
online column stats for conventional DML used on (TEST_FFS.EMPLOYEE_ID) min: 100.00 -> 100.00, max: 206.00 -> 206.00, nnl: 0 -> 0, acl: 4 -> 0 
  Column (#1): EMPLOYEE_ID(NUMBER)
    AvgLen: 22 NDV: 107 Nulls: 0 Density: 0.009346 Min: 100.000000 Max: 206.000000
  Using density: 0.009346 of col #1 as selectivity of unpopular value pred
  Table: TEST_FFS  Alias: TEST_FFS
    Card: Original: 219136.000000  Rounded: 2048  Computed: 2048.000000  Non Adjusted: 2048.000000
  Scan IO  Cost (Disk) =   615.000000
  Scan CPU Cost (Disk) =   49272938.720000
  Cost of predicates:
    io = NOCOST, cpu = 50.000000, sel = 0.009346 flag = 2048  ("TEST_FFS"."EMPLOYEE_ID"<100)
  Total Scan IO  Cost  =   615.000000 (scan (Disk))
                         + 0.000000 (io filter eval) (= 0.000000 (per row) * 219136.000000 (#rows))
                       =   615.000000
  Total Scan CPU  Cost =   49272938.720000 (scan (Disk))
                         + 10956800.000000 (cpu filter eval) (= 50.000000 (per row) * 219136.000000 (#rows))
                       =   60229738.720000
  Access Path: TableScan
    Cost:  621.167026  Resp: 621.167026  Degree: 0
      Cost_io: 615.000000  Cost_cpu: 60229739
      Resp_io: 615.000000  Resp_cpu: 60229739
 ****** Costing Index IDX_TEST_FFS
  SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_SCAN
  SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_FILTER
  Using density: 0.009346 of col #1 as selectivity of unpopular value pred
  Access Path: index (RangeScan)
    Index: IDX_TEST_FFS
    resc_io: 1531.000000  resc_cpu: 11906445
    ix_sel: 0.009346  ix_sel_with_filters: 0.009346 
    Cost: 1532.219121  Resp: 1532.219121  Degree: 1
  Best:: AccessPath: TableScan
         Cost: 621.167026  Degree: 1  Resp: 621.167026  Card: 2048.000000  Bytes: 0.000000
 
online column stats for conventional DML used on (TEST_FFS.SALARY) min: 2100.00 -> 2100.00, max: 24000.00 -> 24000.00, nnl: 0 -> 0, acl: 4 -> 0 
***************************************
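The cardinality and CPU figures in the trace above can be reproduced with a few lines of arithmetic. This is a minimal sketch that copies its inputs from the trace. The last step infers a CPU-to-cost divisor from the reported totals; that divisor is an assumption derived from these numbers, not a documented constant.

```python
# Inputs copied from the 10053 trace above.
num_rows = 219136            # #Rows
ndv = 107                    # NDV of EMPLOYEE_ID
scan_cpu = 49272938.72       # Scan CPU Cost (Disk)
filter_cpu_per_row = 50.0    # cpu cost of evaluating the predicate once

# The predicate is costed with the column density (1/NDV) as its selectivity.
density = 1 / ndv
card = round(num_rows * density)
print(card)                  # 2048, the "Card: ... Rounded: 2048" line

# Total CPU cost = scan CPU + per-row filter evaluation over every row.
total_cpu = scan_cpu + filter_cpu_per_row * num_rows
print(total_cpu)             # the trace reports 60229738.720000

# Assumption: final cost = I/O cost + CPU cost / divisor. Inferring the
# divisor from both access paths shows they agree closely.
divisor_table = total_cpu / (621.167026 - 615.0)
divisor_index = 11906445 / (1532.219121 - 1531.0)
print(divisor_table, divisor_index)
```

Both inferred divisors come out at roughly 9.77 million, which suggests the two access paths are costed on the same scale and that I/O dominates both totals.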

2. Adjust the clustering factor

2.1 About the clustering factor

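The clustering factor measures how closely the order of the index matches the physical order of the rows in the table. As the index is scanned in key order, the clustering factor is incremented by 1 each time the table block holding the row for the current index entry differs from the block of the previous entry. Its minimum is therefore the number of table blocks and its maximum is the number of table rows. It depends heavily on how the table is stored: if rows were inserted in key order the clustering factor is low, and if they were inserted in no particular order it is high. This is why the same table data can sometimes produce different execution plans.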
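The counting rule can be illustrated with a small simulation. This is a sketch of the definition above, not Oracle's implementation: `ROWS_PER_BLOCK`, the 20 copies per key, and the helper names are all hypothetical, and rows are simply assigned to blocks by insertion position.

```python
ROWS_PER_BLOCK = 100          # hypothetical block capacity
KEYS = range(100, 207)        # the 107 EMPLOYEE_ID values 100..206
COPIES = 20                   # hypothetical number of rows per key

def clustering_factor(blocks_in_index_order):
    """Count block changes while walking index entries in key order."""
    factor = 0
    previous = None
    for block in blocks_in_index_order:
        if block != previous:
            factor += 1
            previous = block
    return factor

def blocks_in_index_order(keys_in_insert_order):
    """Place rows into blocks by insert position, then order them as an index would."""
    placed = [(key, pos // ROWS_PER_BLOCK)
              for pos, key in enumerate(keys_in_insert_order)]
    # Stable sort by key: entries for equal keys stay in physical (rowid) order.
    placed.sort(key=lambda entry: entry[0])
    return [block for _, block in placed]

# Load 1: rows inserted sorted by key (like CREATE TABLE ... ORDER BY employee_id).
sorted_load = [key for key in KEYS for _ in range(COPIES)]
# Load 2: the full key range repeated over and over (like the self-inserts).
cyclic_load = [key for _ in range(COPIES) for key in KEYS]

cf_sorted = clustering_factor(blocks_in_index_order(sorted_load))
cf_cyclic = clustering_factor(blocks_in_index_order(cyclic_load))
print(cf_sorted, cf_cyclic)   # 22 2140
```

With 2140 rows in 22 blocks, the sorted load reaches the minimum (one increment per block) while the repeating load reaches the maximum (one increment per row), matching the bounds described above.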

Index scan cost formulas:

INDEX ACCESS I/O COST=BLEVEL+CEIL(#LEAF_BLOCKS*IX_SEL)

TABLE_ACCESS I/O COST=CEIL(CLUSTERING_FACTOR*IX_SEL_WITH_FILTERS)

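IX_SEL and IX_SEL_WITH_FILTERS are the index selectivity and the index selectivity including filter predicates, typically 1/(distinct keys). In this example, where the full table scan was chosen, IX_SEL = 1/107 ≈ 0.009346, BLEVEL is 1 (LVLS: 1 in the trace), #LEAF_BLOCKS is 458, and CLUSTERING_FACTOR is 163174, so the index path costs:

INDEX PATH I/O COST=INDEX ACCESS I/O COST + TABLE ACCESS I/O COST=1+CEIL(458/107)+CEIL(163174/107)=1+5+1525=1531

This matches the resc_io of 1531 that the CBO reports for the index range scan (total cost 1532.22 once CPU is included). That is well above the full table scan cost of 621.17 (I/O cost 615), so the CBO chose the full table scan.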
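The same calculation can be scripted against the trace. The sketch below pulls LVLS, #LB, #DK, and CLUF from the Index Stats line with a regular expression and applies the two formulas above. The helper name `stat` is hypothetical, and LVLS is taken to be the BLEVEL used in the formula.

```python
import math
import re

# Index Stats line copied from the 10053 trace (truncated after NRW).
index_stats_line = (
    "LVLS: 1  #LB: 458  #DK: 107  LB/K: 4.00  DB/K: 1524.00  "
    "CLUF: 163174.00  NRW: 219136.00"
)

def stat(name, line):
    """Return the numeric value that follows 'name:' in a trace line."""
    match = re.search(re.escape(name) + r":\s*([\d.]+)", line)
    if match is None:
        raise ValueError(f"{name} not found in trace line")
    return float(match.group(1))

blevel = int(stat("LVLS", index_stats_line))
leaf_blocks = int(stat("#LB", index_stats_line))
distinct_keys = int(stat("#DK", index_stats_line))
clustering_factor = int(stat("CLUF", index_stats_line))

ix_sel = 1 / distinct_keys    # also used as IX_SEL_WITH_FILTERS here

index_access_io = blevel + math.ceil(leaf_blocks * ix_sel)
table_access_io = math.ceil(clustering_factor * ix_sel)
index_path_io = index_access_io + table_access_io

full_scan_io = 615            # Cost_io of the TableScan access path
print(index_access_io, table_access_io, index_path_io)   # 6 1525 1531
print(index_path_io > full_scan_io)                      # True
```

Nearly all of the index path cost comes from the table access term, which is the term the clustering factor controls.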

2.2 Adjust the clustering factor

Rebuild the table with its rows sorted by ORDER BY to lower the clustering factor.

CZH@czhpdb > create table test_ffs_03 as select * from test_ffs_02 order by employee_id;
 
Table created.
 
CZH@czhpdb > create index idx_test_ffs_03 on test_ffs_03(employee_id);
 
Index created.
 
CZH@czhpdb > select clustering_factor,index_name from user_indexes where index_name='IDX_TEST_FFS_03';
 
                       CLUSTERING_FACTOR INDEX_NAME
---------------------------------------- --------------------
                                    1128 IDX_TEST_FFS_03

# The clustering factor is now clearly much lower.
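To see how much this changes the estimate, the table access term CEIL(CLUSTERING_FACTOR*IX_SEL_WITH_FILTERS) can be evaluated for both clustering factors. This is a rough sketch: it assumes the same selectivity of 1/107 for the rebuilt table, which the article does not confirm, and it leaves out the index access term because the new index's BLEVEL and leaf block count are not shown.

```python
import math

ix_sel_with_filters = 1 / 107     # assumption: same selectivity as the first test

clustering_factors = {
    "IDX_TEST_FFS": 163174,       # CLUF from the 10053 trace
    "IDX_TEST_FFS_03": 1128,      # CLUSTERING_FACTOR from USER_INDEXES
}

table_access_io = {
    index_name: math.ceil(cluf * ix_sel_with_filters)
    for index_name, cluf in clustering_factors.items()
}

for index_name, cost in table_access_io.items():
    print(index_name, cost)
# IDX_TEST_FFS 1525
# IDX_TEST_FFS_03 11
```

Under that assumption the table access term drops from 1525 to 11, small enough for an index range scan to undercut a full table scan.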

CZH@czhpdb > select /*+ gather_plan_statistics */ salary from test_ffs_03 where employee_id < 100;
 
no rows selected
 
SYS@orcl2 > select * from table(dbms_xplan.display_cursor('8fpk2b8vzn5y2',null,'allstats last'));
 
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  8fpk2b8vzn5y2, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ salary from test_ffs_03 where
employee_id < 100
 
Plan hash value: 704625359
 
--------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
--------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                 |      1 |        |      0 |00:00:00.01 |       2 |      1 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| TEST_FFS_03     |      1 |   1024 |      0 |00:00:00.01 |       2 |      1 |
|*  2 |   INDEX RANGE SCAN                  | IDX_TEST_FFS_03 |      1 |   1024 |      0 |00:00:00.01 |       2 |      1 |
--------------------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("EMPLOYEE_ID"<100)
