Analyzing an NBU Backup Error: A Worked Example

Published: 2021-11-17 11:14:35 | Source: 億速云 | Views: 295 | Author: 小新 | Category: Cloud Computing

This article walks through a real NBU (VERITAS NetBackup) backup failure in detail. I found the case practical, so I am sharing it here as a reference.

During a routine system check, I found that the daily backup had failed.

The error message was:

RMAN> backup incremental level 0 database;  

Starting backup at 10-MAR-08
using target database controlfile instead of recovery catalog
allocated channel: ORA_SBT_TAPE_1
channel ORA_SBT_TAPE_1: sid=120 devtype=SBT_TAPE
channel ORA_SBT_TAPE_1: VERITAS NetBackup for Oracle - Release 5.0GA (2003103006)
channel ORA_SBT_TAPE_1: starting incremental level 0 datafile backupset
channel ORA_SBT_TAPE_1: specifying datafile(s) in backupset
input datafile fno=00001 name=/dev/vx/rdsk/maindbdg/lv_main00
input datafile fno=00008 name=/opt/oracle/oradata/oradata/bjdb01/users01.dbf
input datafile fno=00039 name=/opt/oracle/oradata/oradata/bjdb01/xdb02.dbf
input datafile fno=00009 name=/opt/oracle/oradata/oradata/bjdb01/xdb01.dbf
input datafile fno=00003 name=/opt/oracle/oradata/oradata/bjdb01/cwmlite01.dbf
input datafile fno=00004 name=/opt/oracle/oradata/oradata/bjdb01/drsys01.dbf
input datafile fno=00006 name=/opt/oracle/oradata/oradata/bjdb01/odm01.dbf
input datafile fno=00007 name=/opt/oracle/oradata/oradata/bjdb01/tools01.dbf
channel ORA_SBT_TAPE_1: starting piece 1 at 10-MAR-08
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on ORA_SBT_TAPE_1 channel at 03/10/2008 11:31:12
ORA-19506: failed to create sequential file, name="tpjatl1b_1_1", parms=""
ORA-27028: skgfqcre: sbtbackup returned error
ORA-19511: Error received from media manager layer, error text:
VxBSACreateObject: Failed with error:
Server Status: unable to allocate new media for backup, storage unit has none available
 

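At first glance this message points to insufficient space: the storage unit has no media available. A subsequent backup, however, failed with a different error: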

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on ch00 channel at 03/10/2008 05:14:15
ORA-19502: write error on file "bk_26552_1_648968690", blockno 664577 (blocksize=512)
ORA-27030: skgfwrt: sbtwrite2 returned error
ORA-19511: Error received from media manager layer, error text:
VxBSASendData: Failed with error:
Server Status: Communication with the server has not been iniatated or the server status has not been retrieved from the server.
 

Judging from this second error, the problem is not just a matter of space.
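
When comparing several failed runs, it can help to reduce each RMAN error stack to its error codes plus the media manager's "Server Status" text. The following is a minimal Python sketch of that idea, not part of the original troubleshooting session. The function name `summarize_stack` is hypothetical, the sample text is copied from the first failure above, and the sketch assumes the stack has been saved as plain text with one message per line.

```python
import re

# Error stack copied from the first failed backup shown above.
STACK = """\
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on ORA_SBT_TAPE_1 channel at 03/10/2008 11:31:12
ORA-19506: failed to create sequential file, name="tpjatl1b_1_1", parms=""
ORA-27028: skgfqcre: sbtbackup returned error
ORA-19511: Error received from media manager layer, error text:
VxBSACreateObject: Failed with error:
Server Status: unable to allocate new media for backup, storage unit has none available
"""

CODE_RE = re.compile(r"^(RMAN|ORA)-(\d{5}):", re.MULTILINE)
STATUS_RE = re.compile(r"^\s*Server Status:\s*(.+)$", re.MULTILINE)

# RMAN-00571 and RMAN-00569 only draw the banner around the stack.
BANNER_NUMBERS = {"00571", "00569"}


def summarize_stack(text):
    """Return (error codes in order of appearance, media manager status or None)."""
    codes = [
        f"{m.group(1)}-{m.group(2)}"
        for m in CODE_RE.finditer(text)
        if m.group(2) not in BANNER_NUMBERS
    ]
    status = STATUS_RE.search(text)
    return codes, (status.group(1).strip() if status else None)


codes, status = summarize_stack(STACK)
print(codes)   # ['RMAN-03009', 'ORA-19506', 'ORA-27028', 'ORA-19511']
print(status)  # unable to allocate new media for backup, storage unit has none available
```

Run against the second stack, the same function would surface ORA-19502 and ORA-27030 together with the "Communication with the server" status, which makes the difference between the two failures easy to see side by side.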

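In the jnbSA graphical console, many administration options responded very slowly when clicked and mostly never returned a result. I therefore switched to bpadm on the command line and found the following entries in the PROBLEM report under the REPORT menu: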

03/11/2008 01:45:04 backupcenter240 bpexpdate Could not build host list: client hostname could not be found
03/11/2008 02:13:34 backupcenter240 bjdb01 cannot write image to media id 000013, drive index 0, I/O error
03/11/2008 02:13:48 backupcenter240 bjdb01 backup by oracle on client bjdb01 using policy oracle: media write error
03/11/2008 02:14:04 backupcenter240 bjdb01 backup of client bjdb01 exited with status 6 (the backup failed to back up the requested files)
03/11/2008 02:22:58 backupcenter240 bjdb01 cannot write image to media id 000013, drive index 0, I/O error
03/11/2008 02:23:12 backupcenter240 bjdb01 backup by oracle on client bjdb01 using policy oracle: media write error
03/11/2008 02:23:19 backupcenter240 bjdb01 suspending further backup attempts for client bjdb01, policy oracle, schedule Cumulative-Inc because it has exceeded the configured number of tries
03/11/2008 02:23:19 backupcenter240 bjdb01 backup of client bjdb01 exited with status 6 (the backup failed to back up the requested files)
03/11/2008 02:23:20 backupcenter240 - scheduler exiting - the backup failed to back up the requested files (6)
03/11/2008 09:32:42 backupcenter240 data03 cannot write image to media id 000016, drive index 0, I/O error
03/11/2008 09:32:53 backupcenter240 data03 DOWN'ing drive index 0, it has had at least 3 errors in last 12 hour(s)
03/11/2008 09:32:55 backupcenter240 data03 backup by oracle on client data03 using policy bjdb03-ora: media write error
03/11/2008 09:33:02 backupcenter240 data03 backup of client data03 exited with status 6 (the backup failed to back up the requested files)
03/11/2008 10:48:34 backupcenter240 data03 media manager terminated during mount of media id 000016, possible media mount timeout
03/11/2008 10:48:36 backupcenter240 data03 media manager terminated by parent process
03/11/2008 10:48:37 backupcenter240 data03 backup by oracle on client data03 using policy bjdb03-ora: the backup failed to back up the requested files
03/11/2008 10:48:38 backupcenter240 data03 suspending further backup attempts for client data03, policy bjdb03-ora, schedule diff because it has exceeded the configured number of tries
03/11/2008 10:48:38 backupcenter240 data03 backup of client data03 exited with status 6 (the backup failed to back up the requested files)
03/11/2008 13:55:03 backupcenter240 bpexpdate Could not build host list: client hostname could not be found
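
A report like this is easier to read once it is tallied. The sketch below counts exit statuses per client and collects the drive indexes that were marked down. It is an illustration only: the function name `summarize_report` is hypothetical, the regular expressions are derived solely from the message wording quoted above, and the sample is a subset of those lines.

```python
import re
from collections import Counter

# A subset of the problem report lines quoted above.
REPORT = """\
03/11/2008 02:13:48 backupcenter240 bjdb01 backup by oracle on client bjdb01 using policy oracle: media write error
03/11/2008 02:14:04 backupcenter240 bjdb01 backup of client bjdb01 exited with status 6 (the backup failed to back up the requested files)
03/11/2008 02:23:19 backupcenter240 bjdb01 backup of client bjdb01 exited with status 6 (the backup failed to back up the requested files)
03/11/2008 09:32:53 backupcenter240 data03 DOWN'ing drive index 0, it has had at least 3 errors in last 12 hour(s)
03/11/2008 09:33:02 backupcenter240 data03 backup of client data03 exited with status 6 (the backup failed to back up the requested files)
03/11/2008 10:48:38 backupcenter240 data03 backup of client data03 exited with status 6 (the backup failed to back up the requested files)
"""

EXIT_RE = re.compile(r"backup of client (\S+) exited with status (\d+)")
DOWN_RE = re.compile(r"DOWN'ing drive index (\d+)")


def summarize_report(text):
    """Count (client, exit status) pairs and collect drive indexes marked down."""
    exits = Counter()
    downed_drives = set()
    for line in text.splitlines():
        exit_match = EXIT_RE.search(line)
        if exit_match:
            exits[(exit_match.group(1), int(exit_match.group(2)))] += 1
        down_match = DOWN_RE.search(line)
        if down_match:
            downed_drives.add(int(down_match.group(1)))
    return exits, downed_drives


exits, downed_drives = summarize_report(REPORT)
for (client, status), count in sorted(exits.items()):
    print(f"{client}: status {status} x{count}")
print("drives marked down:", sorted(downed_drives))
```

On this sample both clients fail with status 6 and drive index 0 is the only drive marked down, which already hints that the fault sits on the shared hardware rather than on either client.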
 

Digging into the more detailed log revealed a large number of errors:

03/11/2008 18:23:59 backupcenter240 - cleaning job DB
03/11/2008 18:23:59 backupcenter240 - all drives are down for the specified robot number = 0, robot type = TLD and density = hcart
03/11/2008 18:23:59 backupcenter240 - no drives up on storage unit <backupcenter240-hcart-robot-tld-0>
03/11/2008 18:24:00 bjdb01 - all drives are down for the specified robot number = 0, robot type = TLD and density = hcart
03/11/2008 18:24:00 backupcenter240 - no drives up on storage unit <bjdb01-hcart-robot-tld-0>
03/11/2008 18:24:31 backupcenter240 - all drives are down for the specified robot number = 0, robot type = TLD and density = hcart
03/11/2008 18:24:31 backupcenter240 - no drives up on storage unit <unit_99>
03/11/2008 18:24:32 backupcenter240 - all drives are down for the specified robot number = 0, robot type = TLD and density = hcart
03/11/2008 18:24:32 backupcenter240 - no drives up on storage unit <unit_data>
03/11/2008 18:24:32 backupcenter240 data03 skipping backup of client data03, policy bjdb03-ora, schedule diff because it has exceeded the configured number of tries
 

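These messages suggest a problem with the robot (the tape library's robotic arm). If the robot really is at fault, that also explains why the two backup failures reported different errors. When a tape filled up, the robot tried to swap in a new tape and failed; the backup running at that moment could no longer write and reported insufficient space. Later backups had no usable tape to write to because of the robot fault, so they reported that communication with the NetBackup server had not been initialized.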
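
To see how far the fault reaches, the detailed log can be reduced to the robot that has no drives up and the storage units that depend on it. This is a small sketch under the assumption that the messages keep exactly the wording quoted above; the function name `affected_resources` is hypothetical and the sample is a subset of the log.

```python
import re

# A subset of the detailed log lines quoted above.
LOG = """\
03/11/2008 18:23:59 backupcenter240 - all drives are down for the specified robot number = 0, robot type = TLD and density = hcart
03/11/2008 18:23:59 backupcenter240 - no drives up on storage unit <backupcenter240-hcart-robot-tld-0>
03/11/2008 18:24:00 backupcenter240 - no drives up on storage unit <bjdb01-hcart-robot-tld-0>
03/11/2008 18:24:31 backupcenter240 - no drives up on storage unit <unit_99>
03/11/2008 18:24:32 backupcenter240 - no drives up on storage unit <unit_data>
"""

ROBOT_RE = re.compile(
    r"all drives are down for the specified robot number = (\d+), "
    r"robot type = (\w+) and density = (\w+)"
)
UNIT_RE = re.compile(r"no drives up on storage unit <([^>]+)>")


def affected_resources(text):
    """Return (unique (robot number, type, density) tuples, unique storage unit names)."""
    robots = sorted(set(ROBOT_RE.findall(text)))
    units = sorted(set(UNIT_RE.findall(text)))
    return robots, units


robots, units = affected_resources(LOG)
print(robots)  # [('0', 'TLD', 'hcart')]
print(units)
```

Every affected storage unit traces back to the single robot TLD(0), which is consistent with the hardware explanation given above.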

Next I checked the media report, whose summary shows:

Number of ACTIVE media that, as of now:
There are no ACTIVE media present in the media database
 

This further supports the diagnosis: because of the robot fault, usable tapes cannot be loaded into the drive, so the system has no available media.

Check the robot's status with tpconfig:

Index  DriveName        DrivePath      Type   Shared  Status
*****  *********        **********     ****   ******  ******
0      IBMULTRIUM-TD10  /dev/rmt/1cbn  hcart  Yes     DOWN
       TLD(0) Definition DRIVE=1
 

Currently defined robotics are:
TLD(0) robotic path = /dev/sg/c2t4l1,
volume database host = backupcenter240
 

The drive attached to robot TLD(0) is in the DOWN state, so the problem is now largely pinned down.
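
If this check has to be repeated, the drive table can be parsed instead of read by eye. The sketch below works on text that has already been captured from tpconfig and assumes the whitespace-separated column layout shown above; the function name `parse_drives` and the dictionary keys are hypothetical, and real output on other NetBackup versions may be laid out differently.

```python
import re

# Drive table as captured above (assumed to be whitespace-separated columns).
TPCONFIG_OUTPUT = """\
Index DriveName DrivePath Type Shared Status
***** ********* ********** **** ****** ******
0 IBMULTRIUM-TD10 /dev/rmt/1cbn hcart Yes DOWN
TLD(0) Definition DRIVE=1
"""

# A drive row starts with a numeric index and has exactly six columns.
ROW_RE = re.compile(r"^\s*(\d+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(Yes|No)\s+(\S+)\s*$")


def parse_drives(text):
    """Return one dict per drive row; header and robot lines are skipped."""
    drives = []
    for line in text.splitlines():
        match = ROW_RE.match(line)
        if not match:
            continue
        index, name, path, drive_type, shared, status = match.groups()
        drives.append({
            "index": int(index),
            "name": name,
            "path": path,
            "type": drive_type,
            "shared": shared == "Yes",
            "status": status,
        })
    return drives


not_up = [d for d in parse_drives(TPCONFIG_OUTPUT) if d["status"] != "UP"]
for drive in not_up:
    print(f"drive {drive['index']} ({drive['name']}) is {drive['status']}")
# drive 0 (IBMULTRIUM-TD10) is DOWN
```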

Try checking the robot with robtest:

bash-2.03# robtest
Configured robots with local control supporting test utilities:
TLD(0) robotic path = /dev/sg/c2t4l1
 

Robot Selection
---------------
1) TLD 0
2) none/quit
Enter choice: 1
 

Robot selected: TLD(0) robotic path = /dev/sg/c2t4l1  

Invoking robotic test utility:
/usr/openv/volmgr/bin/tldtest -r /dev/sg/c2t4l1 -d1 /dev/rmt/1cbn
 

Opening /dev/sg/c2t4l1
MODE_SENSE complete
Enter tld commands (? returns help information)
?
 

To exit the utility, type q or Q.  

init - Initialize element status
initrange <d#|s#|p#|t> [#]- Init element status range
allow - Allow media removal
prevent - Prevent media removal
extend - Extend media access port
retract - Retract media access port
mode - Mode sense
m <from> <to> - Move medium
pos <to> - Position to drive or slot
s [d|p|t|s [n]] [raw] - Read element status
inquiry - Display vendor and product ID
rezero - Rezero unit
inport - Ready inport (media access port)
debug - Toggle debug mode for this utility
test_ready - Send a TEST UNIT READY to the device
 

<from> <to> specifies drive (d#), slot (s#), media access port (p#),
or transport (t#)
<d#|s#|p#|t#> is drive #, slot #, media access port #, or transport #
[#] is number of elements for d, s, p, or t
NOTE - drive # is 1 - Number of drives
slot # is 1 - Number of slots
media access port # is 1 - Number of media access port elements
transport # is 1 - Number of transports
<type> = (d)rive, (s)lot, media access (p)ort, or (t)ransport
 

unload <drive> - Issue SCSI unload
<drive> = d1 or 1, d2 or 2, d3 or 3 ... d648 or 648
 

inquiry
Inquiry_data: STK L40 0213
test_ready
Unit is ready
q
 

Robot Selection
---------------
1) TLD 0
2) none/quit
Enter choice:
 

After issuing the test_ready command and waiting a while, I found that the status had returned to normal (UP):

Index  DriveName        DrivePath      Type   Shared  Status
*****  *********        **********     ****   ******  ******
0      IBMULTRIUM-TD10  /dev/rmt/1cbn  hcart  Yes     UP
       TLD(0) Definition DRIVE=1
 

Currently defined robotics are:
TLD(0) robotic path = /dev/sg/c2t4l1,
volume database host = backupcenter240
 

Now try a backup:

$ rman target /  

Recovery Manager: Release 9.2.0.4.0 - 64bit Production  

Copyright (c) 1995, 2002, Oracle Corporation. All rights reserved.  

connected to target database: BJDB01 (DBID=3255963758)  

RMAN> backup current controlfile;  

Starting backup at 11-MAR-08
using target database controlfile instead of recovery catalog
allocated channel: ORA_SBT_TAPE_1
channel ORA_SBT_TAPE_1: sid=19 devtype=SBT_TAPE
channel ORA_SBT_TAPE_1: VERITAS NetBackup for Oracle - Release 5.0GA (2003103006)
channel ORA_SBT_TAPE_1: starting full datafile backupset
channel ORA_SBT_TAPE_1: specifying datafile(s) in backupset
including current controlfile in backupset
channel ORA_SBT_TAPE_1: starting piece 1 at 11-MAR-08
channel ORA_SBT_TAPE_1: finished piece 1 at 11-MAR-08
piece handle=ttjb17ur_1_1 comment=API Version 2.0,MMS Version 5.0.0.0
channel ORA_SBT_TAPE_1: backup set complete, elapsed time: 00:04:56
Finished backup at 11-MAR-08
 

Starting Control File Autobackup at 11-MAR-08
piece handle=c-3255963758-20080311-00 comment=API Version 2.0,MMS Version 5.0.0.0
Finished Control File Autobackup at 11-MAR-08
 

The test backup finally succeeded.

Unfortunately, only small backups seemed to work. As soon as the backup was larger, the earlier error came back:

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on ch00 channel at 03/10/2008 05:14:15
ORA-19502: write error on file "bk_26552_1_648968690", blockno 664577 (blocksize=512)
ORA-27030: skgfwrt: sbtwrite2 returned error
ORA-19511: Error received from media manager layer, error text:
VxBSASendData: Failed with error:
Server Status: Communication with the server has not been iniatated or the server status has not been retrieved from the server.
 

The background log also showed a large number of I/O errors:

03/12/2008 09:42:51 backupcenter240 bjdb01 cannot write image to media id 000016, drive index 0, I/O error
03/12/2008 09:42:51 backupcenter240 bjdb01 FREEZING media id 000016, it has had at least 3 errors in the last 12 hour(s)
03/12/2008 09:43:08 backupcenter240 bjdb01 CLIENT bjdb01 POLICY oracle SCHED Default-Application-Backup EXIT STATUS 84 (media write error)
03/12/2008 09:43:08 backupcenter240 bjdb01 backup by oracle on client bjdb01: media write error
 

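Clearly this was no longer just a software problem. The vendor eventually confirmed that the tape library's read/write head was faulty, and replacing the part resolved the issue.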
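
The "DOWN'ing drive" and "FREEZING media" messages in the logs both cite the same rule: at least 3 errors in the last 12 hours. The sketch below illustrates that kind of sliding-window threshold. It is not NetBackup's actual implementation; the class name `ErrorWindow` is hypothetical, the default values are taken only from the wording of the log messages, and the sketch assumes errors are recorded in chronological order.

```python
from collections import deque
from datetime import datetime, timedelta


class ErrorWindow:
    """Track error timestamps and flag when too many fall inside a time window."""

    def __init__(self, max_errors=3, window=timedelta(hours=12)):
        self.max_errors = max_errors
        self.window = window
        self._times = deque()

    def record(self, when):
        """Record one error; return True if the threshold is now reached."""
        self._times.append(when)
        cutoff = when - self.window
        # Drop errors that are older than the window.
        while self._times and self._times[0] < cutoff:
            self._times.popleft()
        return len(self._times) >= self.max_errors


# Three write errors within 12 hours: the third one trips the threshold.
tracker = ErrorWindow()
first = datetime(2008, 3, 11, 2, 13, 34)
results = [
    tracker.record(first),
    tracker.record(first + timedelta(minutes=9)),
    tracker.record(first + timedelta(hours=7)),
]
print(results)  # [False, False, True]
```

A rule like this explains why a marginal read/write head can leave drives down and media frozen across several policies: a handful of write errors in one night is enough to take the hardware out of rotation until someone intervenes.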

That concludes this worked analysis of an NBU backup error. I hope it serves as a useful reference.
