溫馨提示×

您好,登錄后才能下訂單哦!

密碼登錄×
登錄注冊(cè)×
其他方式登錄
點(diǎn)擊 登錄注冊(cè) 即表示同意《億速云用戶服務(wù)條款》

海量數(shù)據(jù)遷移之通過(guò)rowid切分大表

發(fā)布時(shí)間:2020-08-12 14:40:38 來(lái)源:ITPUB博客 閱讀:113 作者:不一樣的天空w 欄目:關(guān)系型數(shù)據(jù)庫(kù)

在之前的章節(jié)中,討論過(guò)了通過(guò) 分區(qū)+并行等方式來(lái)進(jìn)行超大的表的切分,通過(guò)這種方式能夠極大的提高數(shù)據(jù)的平均分布,但是不是最完美的。
比如在數(shù)據(jù)量再提高幾個(gè)層次,我們假設(shè)這個(gè)表目前有1T的大小。有10個(gè)分區(qū),最大的分區(qū)有400G,那么如果我們想盡可能的平均的導(dǎo)出數(shù)據(jù),使用并行就不一定能夠那么奏效了。
比方說(shuō)我們要求每個(gè)dump文件控制在200M總有,那樣的話400G的分區(qū)就需要800個(gè)并行才能完成,在實(shí)際的數(shù)據(jù)庫(kù)維護(hù)中,我們知道默認(rèn)的并行數(shù)只有64個(gè),提高幾倍,也不可能超過(guò)800
所以在數(shù)據(jù)量極大的情況下,如果資源緊張,可能生成的dump就會(huì)比較大。

我們考慮使用rowid來(lái)滿足我們的需求。
我們可以根據(jù)需要來(lái)指定需要生成幾個(gè)dump文件。比如表subscriber有600M,那么如果按照200M為一個(gè)單位,我們需要生成3個(gè)dump文件。
如果想數(shù)據(jù)足夠平均,就需要在rowid上做點(diǎn)功夫。
我們先設(shè)定一個(gè)參數(shù)文件,如下的格式。
可以看到表memo數(shù)據(jù)量極大,按照200M一個(gè)單位,最大的分區(qū)(P9_A3000_E5)需要800個(gè)并行。
表ICE_AGREEMENT比較小,不是分區(qū)表,我們以x來(lái)臨時(shí)作為分區(qū)表的代名,在處理的時(shí)候可以方便的甄別

MEMO                                 P9_A3000_E0                           156
MEMO                                 P9_A3000_E1                           170
MEMO                                 P9_A3000_E2                           190
MEMO                                 P9_A3000_E3                           200
MEMO                                 P9_A3000_E4                           180
MEMO                                 P9_A3000_E5                           800
MEMO                                 PMAXVALUE_AMAXVALUE_EMAXVALUE         1
ICE_AGREEMENT                        x                                    36
CRIBER_HISTORY                       x                                    11

可以使用如下的腳本來(lái)完成rowid的切分。

#### $1 dba conn details
#### $2 table owner
#### $3 table_name
#### $4 subobject_name
#### $5 parallel_no
function normal_split
{
sqlplus -s $1 <<1eof
 set linesize 200
set pages 0
set feedback off
spool list/rowid_range_$3_x.lst
select rownum || ', ' ||' rowid between '||
chr(39)||dbms_rowid.rowid_create( 1, DOI, lo_fno, lo_block, 0 ) ||chr(39)|| ' and  ' ||
chr(39)||dbms_rowid.rowid_create( 1, DOI, hi_fno, hi_block, 1000000 )||chr(39) data
from (
SELECT DISTINCT DOI, grp,
first_value(relative_fno) over (partition BY DOI,grp order by relative_fno, block_id rows BETWEEN unbounded preceding AND unbounded following) lo_fno,
first_value(block_id ) over (partition BY DOI,grp order by relative_fno, block_id rows BETWEEN unbounded preceding AND unbounded following) lo_block,
last_value(relative_fno) over (partition BY DOI,grp order by relative_fno, block_id rows BETWEEN unbounded preceding AND unbounded following) hi_fno,
last_value(block_id+blocks-1) over (partition BY DOI,grp order by relative_fno, block_id rows BETWEEN unbounded preceding AND unbounded following) hi_block,
SUM(blocks) over (partition BY DOI,grp) sum_blocks,SUBOBJECT_NAME
     FROM(
SELECT   obj.OBJECT_ID,
                 obj.SUBOBJECT_NAME,
                 obj.DATA_OBJECT_ID     as DOI,
                 ext.relative_fno,
         ext.block_id,
         ( SUM(blocks) over () ) SUM,
         (SUM(blocks) over (ORDER BY DATA_OBJECT_ID,relative_fno, block_id)-0.01 ) sum_fno ,
         TRUNC( (SUM(blocks) over (ORDER BY DATA_OBJECT_ID,relative_fno, block_id)-0.01) / (SUM(blocks) over ()/ $5 ) ) grp,
         ext.blocks
FROM     dba_extents ext, dba_objects obj
WHERE    ext.segment_name = UPPER('$3')
AND      ext.owner        = UPPER('$2')
AND      obj.owner       =  ext.owner
AND      obj.object_name     = ext.segment_name
AND      obj.DATA_OBJECT_ID IS NOT NULL
ORDER BY DATA_OBJECT_ID, relative_fno, block_id
) order by  DOI,grp
);
spool off;
EOF
}

function partition_split
{
sqlplus -s $1 <<1eof
 set linesize 200
set pages 0
set feedback off
spool list/rowid_range_$3_$4.lst
select rownum || ', ' ||' rowid between '||
chr(39)||dbms_rowid.rowid_create( 1, DOI, lo_fno, lo_block, 0 ) ||chr(39)|| ' and  ' ||
chr(39)||dbms_rowid.rowid_create( 1, DOI, hi_fno, hi_block, 1000000 )||chr(39) data
from (
SELECT DISTINCT DOI, grp,
first_value(relative_fno) over (partition BY DOI,grp order by relative_fno, block_id rows BETWEEN unbounded preceding AND unbounded following) lo_fno,
first_value(block_id ) over (partition BY DOI,grp order by relative_fno, block_id rows BETWEEN unbounded preceding AND unbounded following) lo_block,
last_value(relative_fno) over (partition BY DOI,grp order by relative_fno, block_id rows BETWEEN unbounded preceding AND unbounded following) hi_fno,
last_value(block_id+blocks-1) over (partition BY DOI,grp order by relative_fno, block_id rows BETWEEN unbounded preceding AND unbounded following) hi_block,
SUM(blocks) over (partition BY DOI,grp) sum_blocks,SUBOBJECT_NAME
     FROM(
SELECT   obj.OBJECT_ID,
                 obj.SUBOBJECT_NAME,
                 obj.DATA_OBJECT_ID     as DOI,
                 ext.relative_fno,
         ext.block_id,
         ( SUM(blocks) over () ) SUM,
         (SUM(blocks) over (ORDER BY DATA_OBJECT_ID,relative_fno, block_id)-0.01 ) sum_fno ,
         TRUNC( (SUM(blocks) over (ORDER BY DATA_OBJECT_ID,relative_fno, block_id)-0.01) / (SUM(blocks) over ()/ $5 ) ) grp,
         ext.blocks
FROM     dba_extents ext, dba_objects obj
WHERE    ext.segment_name = UPPER('$3')
AND      ext.owner        = UPPER('$2')
AND      obj.owner       =  ext.owner
AND      obj.object_name     = ext.segment_name
AND      obj.DATA_OBJECT_ID IS NOT NULL
AND      obj.subobject_name=UPPER('$4')
ORDER BY DATA_OBJECT_ID, relative_fno, block_id
) order by  DOI,grp
);
spool off
EOF
}

sub_partition_name=$4

if [[ $sub_partition_name = 'x' ]]
then
normal_split $1 $2 $3 x $5
else
partition_split $1 $2 $3 $4 $5
fi

腳本比較長(zhǎng),需要的參數(shù)有5個(gè),因?yàn)樵L問(wèn)dba_extents,dba_objects需要一定的權(quán)限,可以使用dba權(quán)限的賬號(hào)即可。
第2個(gè)參數(shù)是表的owner,第3個(gè)參數(shù)是表名,第4個(gè)參數(shù)是分區(qū)表名(如果是分區(qū)表就是分區(qū)表名,如果不是就填x),第5個(gè)參數(shù)就是期望使用的并行度,能夠在一定程度上加快速度
簡(jiǎn)單演示一下,可以通過(guò)下面的方式來(lái)運(yùn)行腳本,我們指定生成10個(gè)dump這個(gè)表不是分區(qū)表。

ksh gen_rowid.sh n1/n1 prdowner subscriber_history x 10
1,  where  rowid between 'AAB4VPAAJAAD7qAAAA' and  'AAB4VPAAJAAD/R/EJA'
2,  where  rowid between 'AAB4VPAAJAAD/SAAAA' and  'AAB4VPAAKAABV5/EJA'
3,  where  rowid between 'AAB4VPAAKAABV6AAAA' and  'AAB4VPAALAAE/p/EJA'
4,  where  rowid between 'AAB4VPAALAAE/qAAAA' and  'AAB4VPAAMAAFFh/EJA'
5,  where  rowid between 'AAB4VPAAMAAFFiAAAA' and  'AAB4VPAAyAACuh/EJA'
6,  where  rowid between 'AAB4VPAAyAACuiAAAA' and  'AAB4VPAAzAACe5/EJA'
7,  where  rowid between 'AAB4VPAAzAACe6AAAA' and  'AAB4VPAA1AACZR/EJA'
8,  where  rowid between 'AAB4VPAA1AACZSAAAA' and  'AAB4VPAA2AACWR/EJA'
9,  where  rowid between 'AAB4VPAA2AACWSAAAA' and  'AAB4VPAA4AACP5/EJA'
10,  where  rowid between 'AAB4VPAA4AACQCAAAA' and  'AAB4VPAA5AACHx/EJA'
然后我們來(lái)看看數(shù)據(jù)是否足夠平均。
可以類似下面的方式驗(yàn)證,我們抽第1,2,10個(gè)。

SQL> select count(*)from subscriber_history  where  rowid between 'AAB4VPAAJAAD7qAAAA' and  'AAB4VPAAJAAD/R/EJA'
  2  ;

  COUNT(*)
----------
    328759

SQL> select count(*)from  subscriber_history   where  rowid between 'AAB4VPAAJAAD/SAAAA' and  'AAB4VPAAKAABV5/EJA'
  2  /

  COUNT(*)
----------
    318021

SQL> select count(*)from subscriber_history  where  rowid between 'AAB4VPAA4AACQCAAAA' and  'AAB4VPAA5AACHx/EJA';

  COUNT(*)
----------
    332638

可以看到數(shù)據(jù)還是很平均的,達(dá)到了我們的期望。


向AI問(wèn)一下細(xì)節(jié)

免責(zé)聲明:本站發(fā)布的內(nèi)容(圖片、視頻和文字)以原創(chuàng)、轉(zhuǎn)載和分享為主,文章觀點(diǎn)不代表本網(wǎng)站立場(chǎng),如果涉及侵權(quán)請(qǐng)聯(lián)系站長(zhǎng)郵箱:is@yisu.com進(jìn)行舉報(bào),并提供相關(guān)證據(jù),一經(jīng)查實(shí),將立刻刪除涉嫌侵權(quán)內(nèi)容。

AI