您好,登錄后才能下訂單哦!
http://blog.itpub.net/17203031/viewspace-681311/
實(shí)際中,我們常常遇到數(shù)據(jù)列值傾斜的情況。就是說,整個(gè)列數(shù)據(jù)取值有限。但是大部分?jǐn)?shù)據(jù)值都集中在少數(shù)一兩個(gè)取值里,其他取值比例極少。比如:一個(gè)數(shù)據(jù)列值有“N”、“B”、“M”、“P”、“Q”幾個(gè)取值,其中55%數(shù)據(jù)行取值為“N”,40%數(shù)據(jù)行取值為“B”,剩下的取值分布在5%的數(shù)據(jù)行中。對(duì)于這種結(jié)構(gòu)的數(shù)據(jù)列加索引,是存在一些問題的。
首先,默認(rèn)數(shù)據(jù)庫是會(huì)為所有的列值(非空)建立索引結(jié)構(gòu)。也就意味著無論是高頻度取值,還是低頻度取值,都會(huì)在索引結(jié)構(gòu)的葉節(jié)點(diǎn)上出現(xiàn)。當(dāng)然,這樣的大部分葉節(jié)點(diǎn)都是這些重復(fù)值。
其次,在CBO(基于成本優(yōu)化器)的作用下,對(duì)高頻度取值的搜索一般都不會(huì)選擇索引作為搜索路徑,因?yàn)檫M(jìn)行全表掃描可能效率更高。我們?yōu)閿?shù)據(jù)列建立了索引,但高頻詞的查詢永遠(yuǎn)不會(huì)走到索引路徑(下面的實(shí)驗(yàn)會(huì)證明這一點(diǎn))。
最后,建立的索引空間和時(shí)間消耗比較大。建立的索引涵蓋所有取值,對(duì)海量數(shù)據(jù)表而言,占有的空間勢(shì)必較大。同時(shí),在進(jìn)行小頻度數(shù)據(jù)查詢的時(shí)候,雖然會(huì)去走索引路徑,但是引起的邏輯物理讀也是有一些損耗。
下面引入一個(gè)解決方法
思路:既然高頻度值在查詢的時(shí)候不會(huì)走到索引路徑,可以考慮將其剔出構(gòu)建索引的過程,只為那些低頻度數(shù)據(jù)值建立索引結(jié)構(gòu)。這樣,建立的索引樹結(jié)構(gòu)相對(duì)較小,而且索引查詢的效率也能提升。
具體的方法是使用decode函數(shù)。decode(a,b,c,d,e…f)含義:如果a=b,則返回c,等于d,返回e,最后沒有匹配的情況下,返回f。針對(duì)上面的例子,可以使用decode(列名,‘N’, null,‘B’,null,列名),含義是,如果該列取值為N或者B,直接設(shè)置為null,否則才返回列值,并且以此建立函數(shù)索引。
這樣做借助了Oracle兩個(gè)功能:1、對(duì)null值不生成索引;2、函數(shù)索引;
下面通過實(shí)驗(yàn)來證明該方法:
1、構(gòu)建實(shí)驗(yàn)測(cè)試環(huán)境
--創(chuàng)建測(cè)試數(shù)據(jù)表tb_wjq
SEIANG@seiang11g>create table tb_wjq as select * from dba_objects where owner
in ('SEIANG','PUBLIC','HR','SYSMAN','XDB','BI','SYS');
Table created.
--使用腳本插入大量數(shù)據(jù)
begin
for i in 1..8 loop
insert /*+ append */ into tb_wjq select * from tb_wjq;
commit;
end loop;
end;
/
SEIANG@seiang11g>select count(*) from tb_wjq;
COUNT(*)
----------
9804160
Elapsed: 00:00:01.54
--用于實(shí)驗(yàn)的數(shù)據(jù)量分布情況
SEIANG@seiang11g>select owner,count(*) from tb_wjq group by owner;
OWNER COUNT(*)
------------------------------ ----------
SEIANG 3072
PUBLIC 4352256
HR 4352
SYSMAN 454912
XDB 149760
BI 1024
SYS 4838784
7 rows selected.
Elapsed: 00:00:02.44
可以看到,九萬多條數(shù)據(jù),絕大部分?jǐn)?shù)據(jù)集中到了PUBLIC、SYSMAN、SYS上,其他數(shù)據(jù)取值頻數(shù)較小。數(shù)據(jù)傾斜趨勢(shì)明顯。
2、建索引
--分別對(duì)owner列建立常規(guī)、函數(shù)索引。
SEIANG@seiang11g>create index idx_tb_wjq_owner_normal on tb_wjq(owner);
Index created.
Elapsed: 00:00:24.72
SEIANG@seiang11g>create index func_idx_tb_wjq_owner on tb_wjq(decode
(owner,'PUBLIC',null,'SYSMAN',null,'SYS',null,owner));
Index created.
Elapsed: 00:00:12.34
索引func_idx_tb_wjq_owner將PUBLIC、SYSMAN、SYS值轉(zhuǎn)化為null,剔出了建立索引的過程。不僅可以從上面創(chuàng)建索引所用的時(shí)間可以看出,而且從下面的索引段信息看,兩個(gè)索引所占的空間差異比較大,也證明了這點(diǎn)。
SEIANG@seiang11g>select
owner,segment_name,segment_type,bytes/1024/1024,blocks,extents from
dba_segments where segment_name='IDX_TB_WJQ_OWNER_NORMAL';
OWNER SEGMENT_NAME SEGMENT_TYPE BYTES/1024/1024 BLOCKS
EXTENTS
--------------- ------------------------------ ------------------
--------------- ---------- ----------
SEIANG IDX_TB_WJQ_OWNER_NORMAL INDEX 184 23552 94
Elapsed: 00:00:00.01
SEIANG@seiang11g>select
owner,segment_name,segment_type,bytes/1024/1024,blocks,extents from
dba_segments where segment_name='FUNC_IDX_TB_WJQ_OWNER';
OWNER SEGMENT_NAME SEGMENT_TYPE BYTES/1024/1024 BLOCKS
EXTENTS
--------------- ------------------------------ ------------------
--------------- ---------- ----------
SEIANG
FUNC_IDX_TB_WJQ_OWNER INDEX 3 384 18
由上可以看出,同樣是對(duì)一個(gè)數(shù)據(jù)列加索引。普通索引類型IDX_TB_WJQ_OWNER_NORMAL占據(jù)94個(gè)區(qū),23552個(gè)數(shù)據(jù)塊,空間約占184M。而函數(shù)索引FUNC_IDX_TB_WJQ_OWNER的空間只用了初始分配的18個(gè)區(qū),384個(gè)數(shù)據(jù)塊,空間約占3M。由此,空間優(yōu)勢(shì)直觀體現(xiàn)!
--收集統(tǒng)計(jì)數(shù)據(jù),由于是實(shí)驗(yàn)性質(zhì),而且數(shù)據(jù)量大,采用高采樣率收集統(tǒng)計(jì)信息。
SEIANG@seiang11g>exec dbms_stats.gather_table_stats('SEIANG', 'TB_WJQ',
cascade => true, estimate_percent => 100,method_opt => 'for all
indexed columns');
PL/SQL procedure successfully completed.
Elapsed: 00:00:49.67
3、檢索效率分析
針對(duì)owner數(shù)據(jù)量149760的XDB取值進(jìn)行分析。
--直接索引搜索:
SEIANG@seiang11g>select * from tb_wjq where owner='XDB';
149760 rows selected.
Elapsed: 00:00:01.89
Execution Plan
----------------------------------------------------------
Plan hash value: 3735191644
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 149K|
13M| 4708 (1)| 00:00:57 |
| 1 |
TABLE ACCESS BY INDEX ROWID| TB_WJQ | 149K|
13M| 4708 (1)| 00:00:57 |
|* 2 | INDEX RANGE SCAN | IDX_TB_WJQ_OWNER_NORMAL | 149K| |
348 (1)| 00:00:05 |
-------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OWNER"='XDB')
Statistics
----------------------------------------------------------
8 recursive calls
0 db block gets
26000
consistent gets
6349 physical reads
0 redo size
16482673 bytes sent via SQL*Net to client
110336 bytes received via SQL*Net from client
9985 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
149760 rows processed
發(fā)現(xiàn)采用BI作為搜索值時(shí),是進(jìn)行了索引搜索。下面是用函數(shù)索引搜索進(jìn)行對(duì)比。
SEIANG@seiang11g>select * from tb_wjq where decode
(owner,'PUBLIC',null,'SYSMAN',null,'SYS',null,owner)='XDB';
149760 rows selected.
Elapsed: 00:00:01.54
Execution Plan
----------------------------------------------------------
Plan hash value: 3652333940
-----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 149K|
13M| 7091 (1)| 00:01:26 |
| 1 |
TABLE ACCESS BY INDEX ROWID| TB_WJQ | 149K|
13M| 7091 (1)| 00:01:26 |
|* 2 | INDEX
RANGE SCAN |
FUNC_IDX_TB_WJQ_OWNER |
149K| | 316
(1)| 00:00:04 |
-----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 -
access(DECODE("OWNER",'PUBLIC',NULL,'SYSMAN',NULL,'SYS',NULL,"OWNER")='XDB')
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
25998
consistent gets
0 physical reads
0 redo size
9017261 bytes sent via SQL*Net to client
110336 bytes received via SQL*Net from client
9985 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
149760 rows processed
對(duì)比后,我們可以發(fā)現(xiàn),使用函數(shù)索引的方法,在執(zhí)行時(shí)間、物理邏輯讀、CPU使用上有一定差異。
|
普通索引 |
函數(shù)索引 |
執(zhí)行時(shí)間 |
00: 00: 01.89 |
00: 00: 01.54 |
CPU使用 |
4708 |
7091 |
consistent gets |
26000 |
25998 |
physical reads |
6349 |
0 |
結(jié)論:使用函數(shù)索引處理偏值方法,在一定長度上優(yōu)化查詢效率和索引結(jié)構(gòu)。上表的數(shù)據(jù)表明,會(huì)使邏輯物理讀的消耗很大程度的減少(索引結(jié)構(gòu)簡化),同時(shí)連帶影響執(zhí)行時(shí)間的縮小。因?yàn)槭褂煤瘮?shù)要進(jìn)行計(jì)算,CPU使用率相對(duì)較高,在可以接受的范圍內(nèi)。
但是,這種方法是存在一些限制的,應(yīng)用前一定要仔細(xì)規(guī)劃。
首先,數(shù)據(jù)表數(shù)據(jù)要保證較大。因?yàn)楫吘购瘮?shù)索引的建立和搜索較普通索引消耗大,如果數(shù)據(jù)表小,帶來的優(yōu)化程度不能彌補(bǔ)消耗的成本,結(jié)果可能得不償失。筆者進(jìn)行的一系列實(shí)驗(yàn)中,也發(fā)現(xiàn)在數(shù)據(jù)量中等偏小時(shí),這種性能優(yōu)勢(shì)不能凸顯。
其次,列值傾斜趨勢(shì)明顯。通過開篇的討論我們不難發(fā)現(xiàn),列值傾斜的程度越高,使用函數(shù)索引剔出的數(shù)據(jù)量也就越大,生成的索引樹結(jié)構(gòu)也就越小越優(yōu)化。這一點(diǎn)是本方法的核心!
最后,使用函數(shù)索引搜索時(shí),搜索的取值頻數(shù)越高,優(yōu)化效果越好。在本例中,取值XDB的列有149760行,可以看出明顯的性能優(yōu)化。但是當(dāng)我們選擇值有1024條數(shù)據(jù)的BI值時(shí),這種優(yōu)化趨勢(shì)可以看到,但是明顯程度降低(實(shí)驗(yàn)結(jié)果如下所示)。這里的原因可能是數(shù)據(jù)量小時(shí),兩種方法邏輯物理讀的差異度縮小。
--直接索引
SEIANG@seiang11g>select * from tb_wjq where owner='BI';
1024 rows selected.
Elapsed: 00:00:00.03
Execution Plan
----------------------------------------------------------
Plan hash value: 3735191644
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1024 |
98K| 35 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| TB_WJQ | 1024 |
98K| 35 (0)| 00:00:01 |
|* 2 | INDEX
RANGE SCAN |
IDX_TB_WJQ_OWNER_NORMAL |
1024 | | 5
(0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OWNER"='BI')
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
280 consistent gets
151 physical reads
0 redo size
98579 bytes sent via SQL*Net to client
1271 bytes received via SQL*Net from client
70 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1024 rows processed
--函數(shù)索引
SEIANG@seiang11g>select * from tb_wjq where decode
(owner,'PUBLIC',null,'SYSMAN',null,'SYS',null,owner)='BI';
1024 rows selected.
Elapsed: 00:00:00.01
Execution Plan
----------------------------------------------------------
Plan hash value: 3652333940
-----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1024 |
98K| 50 (0)| 00:00:01 |
| 1 |
TABLE ACCESS BY INDEX ROWID| TB_WJQ | 1024 |
98K| 50 (0)| 00:00:01 |
|* 2 | INDEX
RANGE SCAN |
FUNC_IDX_TB_WJQ_OWNER | 1024
| | 3
(0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 -
access(DECODE("OWNER",'PUBLIC',NULL,'SYSMAN',NULL,'SYS',NULL,"OWNER")='BI')
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
279
consistent gets
3 physical reads
0 redo size
33969 bytes sent via SQL*Net to client
1271 bytes received via SQL*Net from client
70 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1024 rows processed
但是如果是對(duì)SYS、PUBLIC或SYSMAN進(jìn)行查詢時(shí),將會(huì)跳過所有的索引,直徑進(jìn)行全表掃描。
SEIANG@seiang11g>select * from tb_wjq where owner='SYS';
4838784 rows selected.
Elapsed: 00:00:45.85
Execution Plan
----------------------------------------------------------
Plan hash value: 1501781665
----------------------------------------------------------------------------
| Id | Operation | Name | Rows
| Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT |
| 4838K| 452M| 39893
(1)| 00:07:59 |
|* 1 | TABLE
ACCESS FULL| TB_WJQ |
4838K| 452M| 39893 (1)| 00:07:59 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("OWNER"='SYS')
Statistics
----------------------------------------------------------
8 recursive calls
0 db block gets
457638 consistent gets
139684 physical reads
0 redo size
255169095 bytes sent via SQL*Net to client
3548958 bytes received via SQL*Net from client
322587 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
4838784 rows processed
作者:SEian.G(苦練七十二變,笑對(duì)八十一難)
免責(zé)聲明:本站發(fā)布的內(nèi)容(圖片、視頻和文字)以原創(chuàng)、轉(zhuǎn)載和分享為主,文章觀點(diǎn)不代表本網(wǎng)站立場(chǎng),如果涉及侵權(quán)請(qǐng)聯(lián)系站長郵箱:is@yisu.com進(jìn)行舉報(bào),并提供相關(guān)證據(jù),一經(jīng)查實(shí),將立刻刪除涉嫌侵權(quán)內(nèi)容。