溫馨提示×

您好,登錄后才能下訂單哦!

密碼登錄×
登錄注冊(cè)×
其他方式登錄
點(diǎn)擊 登錄注冊(cè) 即表示同意《億速云用戶服務(wù)條款》

記一次數(shù)據(jù)庫(kù)事故-ORA-15038

發(fā)布時(shí)間:2020-08-10 22:38:03 來(lái)源:ITPUB博客 閱讀:346 作者:abstractcyj 欄目:關(guān)系型數(shù)據(jù)庫(kù)

        背景:客戶有兩套較大的數(shù)據(jù)庫(kù)(一套為10T(A數(shù)據(jù)庫(kù)), 一套為20T(B數(shù)據(jù)庫(kù)))想遷移存儲(chǔ),版本均為12.2.0.1, 采用了ASM管理磁盤(pán),單實(shí)例。操作系統(tǒng)分別為linux 6, linux 7??蛻粝雽數(shù)據(jù)庫(kù)的存儲(chǔ)遷移到較差的存儲(chǔ),作為只讀用途。騰空B數(shù)據(jù)庫(kù)的存儲(chǔ)之后,將B數(shù)據(jù)庫(kù)占用的性能較好的存儲(chǔ)分配給A數(shù)據(jù)庫(kù),再將A數(shù)據(jù)庫(kù)的數(shù)據(jù)遷移到新分配的性能較好的存儲(chǔ)。

A數(shù)據(jù)庫(kù)的存儲(chǔ)為較好存儲(chǔ)與較差存儲(chǔ)的混搭,需要遷移兩次(第一次遷移騰出存儲(chǔ)空間,第二次遷移到新回收的存儲(chǔ))。這里總共就有3次遷移。因?yàn)閿?shù)據(jù)庫(kù)較大與備份所需磁盤(pán)空間的問(wèn)題,兩個(gè)數(shù)據(jù)庫(kù)均沒(méi)有備份。

       4月17日,客戶告訴我,領(lǐng)導(dǎo)想要下周二能將兩個(gè)庫(kù)遷移好(我將將要寫(xiě)遷移方案)。這樣就完全沒(méi)有時(shí)間寫(xiě)方案。同時(shí)兩個(gè)數(shù)據(jù)庫(kù)需要同時(shí)遷移。于是我們負(fù)責(zé)存儲(chǔ)的同事趕過(guò)來(lái),給兩個(gè)主機(jī)劃好了存儲(chǔ)。我做好了多路徑映射并創(chuàng)建了ASM磁盤(pán)組。同時(shí)停庫(kù),開(kāi)始做遷移。

      4月18日晚上,A數(shù)據(jù)庫(kù)數(shù)據(jù)文件拷貝完成。4月19日調(diào)整OCR,日志文件,臨時(shí)文件等。在此期間,B數(shù)據(jù)庫(kù)通過(guò)RMAN做COPY已經(jīng)完成了接近16T左右, 還剩余大致4T數(shù)據(jù)文件。A數(shù)據(jù)庫(kù)調(diào)整完成后,為了驗(yàn)證修改正確與否,重啟了操作系統(tǒng)。

操作系統(tǒng)起來(lái)之后,發(fā)現(xiàn)數(shù)據(jù)庫(kù)起不來(lái),同時(shí)新掛載的ASM磁盤(pán)組一直是offline。嘗試手動(dòng)mount, 報(bào)錯(cuò): 

    

ORA-15032: not all alterations performed

ORA-15038: disk '/dev/mapper/newdisk01' mismatch on 'Time Stamp' with target disk group [2434962992] [2434985720]

ORA-15038: disk '/dev/mapper/newdisk02' mismatch on 'Time Stamp' with target disk group [2434962992] [2434985720]

ORA-15038: disk '/dev/mapper/newdisk03' mismatch on 'Time Stamp' with target disk group [2434962992] [2434985720]

ORA-15038: disk '/dev/mapper/newdisk04' mismatch on 'Time Stamp' with target disk group [2434962992] [2434985720]

ORA-15038: di

      通過(guò)谷歌,MOS搜索, 可能是磁盤(pán)給其他ASM實(shí)例使用過(guò),或者是多路徑配置的問(wèn)題。嘗試修改asm_diskstring,嘗試手動(dòng)mount, 這次錯(cuò)誤變了:

     

WARNING: Disk Group NEWDATA containing spfile for this instance is not mounted

ORA-15032: not all alterations performed

ORA-15038: disk '/dev/mapper/3600c0ff0003af211de24985e01000000' mismatch on 'Time Stamp' with target disk group [2434985720] [2434962992]

      可以看到提示的磁盤(pán)變了。這個(gè)盤(pán)我并未加入ASM磁盤(pán)組,為什么也提示這個(gè)。我推測(cè)可能與原asm_diskstring='/dev/mapper/*'有關(guān)。雖然盤(pán)并未加入磁盤(pán)組,ASM實(shí)例啟動(dòng)的時(shí)候,還是去掃描了磁盤(pán)頭。通過(guò)kfed, amdu等工具掃描ASM磁盤(pán),都沒(méi)有問(wèn)題。同時(shí), amdu可以讀出數(shù)據(jù)文件。這樣就沒(méi)有丟失數(shù)據(jù)的風(fēng)險(xiǎn)(這里已經(jīng)被嚇尿了)。

    根據(jù)MOS文檔, Doc ID 2643105.1, 也有類似的癥狀,一個(gè)沒(méi)有加入到ASM磁盤(pán)組的磁盤(pán),阻止了磁盤(pán)組的正常啟動(dòng),同時(shí)也是發(fā)生在主機(jī)重啟之后:
    

SQL> alter diskgroup ACFS mount;
alter diskgroup ACFS mount
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15017: diskgroup "ACFS" cannot be mounted
ORA-15040: diskgroup is incomplete
ORA-15038: disk '/dev/<asmdevices>/asm2' mismatch on 'Time Stamp' with target
disk group [2026420570] [2026683750]
ORA-15038: disk '/dev/<asmdevices>/asm1' mismatch on 'Time Stamp' with target
disk group [2026420570] [2026683750]
ORA-15038: disk '/dev/<asmdevices>/asm0' mismatch on 'Time Stamp' with target

disk group [2026420570] [2026683750]

A Disk at OS level has this diskgroup information in its header incorrectly:

$GI_HOME/bin:>  kfed read <OS ASM device path>

kfdhdb.dskname: ACFS_4 ; 0x028: length=11
kfdhdb.grpname: ACFS ; 0x048: length=6                                          >>>>>>>>>>>>>>>>>>>Here
kfdhdb.fgname: ACFS_4 ; 0x068: length=11


Disks at OS level should not be updated manually after they have been added to a diskgroup. This will cause diskgroup to fail to mount. In this case, the bad disk was only on 1 node in the cluster. The Diskgroup mounted successfully on the other nodes.

ASM tried to mount the diskgroup with the bad OS disk:

SQL> ALTER DISKGROUP ACFS MOUNT /* asm agent *//* {1:52280:1676} */
NOTE: cache registered group ACFS number=1 incarn=0x47016be4
NOTE: cache began mount (not first) of group ACFS number=1 incarn=0x47016be4
NOTE: Assigning number (1,4) to disk (/dev/<asmdevices>/asm04)                   >>>>>>>>>>>>>>>>>>Here



However, only 3 disks at OS level are associated with this diskgroup in the ASM:

0 30 CLOSED MEMBER ONLINE NORMAL 245760 0 0 /dev/<asmdevices>/asm0
0 31 CLOSED MEMBER ONLINE NORMAL 245760 0 0 /dev/<asmdevices>/asm1
0 32 CLOSED MEMBER ONLINE NORMAL 245760 0 0 /dev/<asmdevices>/asm2

The device, /dev/<asmdevices>/asm04, is not listed.

    MOS給出的方案是dd掉問(wèn)題磁盤(pán)的磁盤(pán)頭。確實(shí),我在dd磁盤(pán)頭之后,手動(dòng)mount磁盤(pán)組成功。但是悲劇的是,B數(shù)據(jù)庫(kù)的RMAN Copy腳本出錯(cuò)了。

   

RMAN-03009: 位于 04/19/2020 19:13:28 的 c1 通道上的 backup 命令失敗

ORA-19502: 文件 "+NEWDATA/TESTDB/DATAFILE/tbs_pdata.697.1038107047", 塊編號(hào) 165696 (塊大小=16384) 上出現(xiàn)寫(xiě)入錯(cuò)誤

ORA-15079: ASM 文件已關(guān)閉

ORA-15079: ASM 文件已關(guān)閉


ASM日志:

NOTE: AMDU dump of disk group NEWDATA initiated at /u01/app/grid/diag/asm/+asm/+ASM/trace

Errors in file /u01/app/grid/diag/asm/+asm/+ASM/trace/+ASM_arb0_29651.trc  (incident=14697):

ORA-15335: ASM metadata corruption detected in disk group 'NEWDATA'

ORA-15130: diskgroup "NEWDATA" is being dismounted

ORA-15066: offlining disk "NEWDATA_0011" in group "NEWDATA" may result in a data loss

ORA-15196: invalid ASM block header [kfc.c:29757] [endian_kfbh] [2147483659] [1] [0 != 1]

ORA-15196: invalid ASM block header [kfc.c:29757] [endian_kfbh] [2147483659] [1] [0 != 1]

在B數(shù)據(jù)庫(kù)主機(jī)上確認(rèn),dd掉的那個(gè)磁盤(pán)正好是B數(shù)據(jù)庫(kù)新加入的那個(gè)磁盤(pán)組的一個(gè)成員。

無(wú)法,只得讓存儲(chǔ)工程師將這個(gè)盤(pán)從A數(shù)據(jù)庫(kù)主機(jī)的映射斷開(kāi),同時(shí)檢查有無(wú)其它磁盤(pán)映射了多個(gè)主機(jī)。

B庫(kù)的遷移,只得重新創(chuàng)建磁盤(pán)組重新開(kāi)始。

教訓(xùn):1. 這種很急,比較復(fù)雜的遷移,多半要出事。要學(xué)會(huì)拒絕。

           2. 存儲(chǔ)映射好之后,應(yīng)當(dāng)確認(rèn)兩個(gè)主機(jī)之間沒(méi)有映射同一個(gè)盤(pán)。

           3. 必須做好備份?;謴?fù)慢是一碼事,但是至少數(shù)據(jù)不會(huì)丟失,擔(dān)驚受怕的程度能少點(diǎn)

向AI問(wèn)一下細(xì)節(jié)

免責(zé)聲明:本站發(fā)布的內(nèi)容(圖片、視頻和文字)以原創(chuàng)、轉(zhuǎn)載和分享為主,文章觀點(diǎn)不代表本網(wǎng)站立場(chǎng),如果涉及侵權(quán)請(qǐng)聯(lián)系站長(zhǎng)郵箱:is@yisu.com進(jìn)行舉報(bào),并提供相關(guān)證據(jù),一經(jīng)查實(shí),將立刻刪除涉嫌侵權(quán)內(nèi)容。

AI