溫馨提示×

您好,登錄后才能下訂單哦!

密碼登錄×
登錄注冊(cè)×
其他方式登錄
點(diǎn)擊 登錄注冊(cè) 即表示同意《億速云用戶服務(wù)條款》

Oracle Study之--DataGuard 最大保護(hù)模式故障(ORA-16198)

發(fā)布時(shí)間:2020-07-14 17:58:01 來(lái)源:網(wǎng)絡(luò) 閱讀:1605 作者:客居天涯 欄目:關(guān)系型數(shù)據(jù)庫(kù)

Oracle Study之--DataGuard 最大保護(hù)模式故障(ORA-16198)


系統(tǒng)環(huán)境:

    操作系統(tǒng):RedHat EL5

    Oracle:   Oracle 11gR2 (11.2.0.1.0)


故障現(xiàn)象:

   Physical Standby在從Maximum Performance轉(zhuǎn)換到Maximum Protection時(shí),出現(xiàn)以下故障:

10:13:06 SYS@ prod1>startup force mount;
ORACLE instance started.
Total System Global Area  418484224 bytes
Fixed Size                  1336932 bytes
Variable Size             281020828 bytes
Database Buffers          130023424 bytes
Redo Buffers                6103040 bytes
Database mounted.

10:13:30 SYS@ prod1>select name,protection_mode from v$database;
NAME      PROTECTION_MODE
--------- --------------------
PROD1     MAXIMUM PROTECTION

Open DataBase失?。?/span>
10:07:04 SYS@ prod1>alter database open;
alter database open
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 4612
Session ID: 1 Serial number: 5

查看告警日志:
alter database open
Thu Jun 11 10:07:10 2015
LGWR: STARTING ARCH PROCESSES
Thu Jun 11 10:07:10 2015
ARC0 started with pid=19, OS id=4614 
ARC0: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC0: STARTING ARCH PROCESSES
Thu Jun 11 10:07:10 2015
ARC1 started with pid=20, OS id=4616 
Thu Jun 11 10:07:10 2015
ARC2 started with pid=21, OS id=4618 
ARC1: Archival started
ARC2: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
ARC0: Becoming the 'no FAL' ARCH
ARC0: Becoming the 'no SRL' ARCH
ARC1: Becoming the heartbeat ARCH
LGWR: Primary database is in MAXIMUM PROTECTION mode
LGWR: Destination LOG_ARCHIVE_DEST_1 is not serviced by LGWR
Thu Jun 11 10:07:11 2015
NSS2 started with pid=18, OS id=4620 
Thu Jun 11 10:07:40 2015
ORA-16198: LGWR received timedout error from KSR
Errors in file /u01/app/oracle/diag/rdbms/bjdb/prod1/trace/prod1_lgwr_4565.trc:
ORA-16198: Timeout incurred on internal channel during remote archival
LGWR: Error 16198 verifying archivelog destination LOG_ARCHIVE_DEST_2
Destination LOG_ARCHIVE_DEST_2 is UNSYNCHRONIZED
LGWR: Continuing...
LGWR: Minimum of 1 applicable standby database required
Errors in file /u01/app/oracle/diag/rdbms/bjdb/prod1/trace/prod1_lgwr_4565.trc:
ORA-16072: a minimum of one standby database destination is required
Errors in file /u01/app/oracle/diag/rdbms/bjdb/prod1/trace/prod1_lgwr_4565.trc:
ORA-16072: a minimum of one standby database destination is required
LGWR (ospid: 4565): terminating the instance due to error 16072
Instance terminated by LGWR, pid = 4565

---從日志信息可以看出,所有的歸檔路徑都失敗,本地歸檔及遠(yuǎn)程歸檔均失??!

解決方法:

依據(jù)Oracle官方建議,修改net_timeout值:(主備庫(kù))


10:10:23 SYS@ prod1>alter system set log_archive_dest_2='service=shdb lgwr sync affirm VALID_FOR=(online_logfiles,primary_role) net_timeout=30 DB_UNIQUE_NAME=shdb';

10:11:25 SYS@ shdb>alter system set log_archive_dest_2='service=bjdb lgwr sync affirm VALID_FOR=(online_logfiles,primary_role) net_timeout=30 DB_UNIQUE_NAME=bjdb';

增加standby redo log:


主庫(kù):(在Maximum Performance下添加Standby redo logs)

10:20:35 SYS@ prod1>select group#,status ,bytes from v$log;
    GROUP# STATUS                BYTES
---------- ---------------- ----------
         1 INACTIVE           52428800
         2 CURRENT            52428800
         3 INACTIVE           52428800
10:20:54 SYS@ prod1>select member from v$logfile;
MEMBER
------------------------------------------------------------------------------------------------------------------------
/u01/app/oracle/oradata/prod1/redo03.log
/u01/app/oracle/oradata/prod1/redo02.log
/u01/app/oracle/oradata/prod1/redo01.log
/u01/app/oracle/oradata/prod1/std_redo01.log
/u01/app/oracle/oradata/prod1/std_redo02.log
6 rows selected.

10:21:03 SYS@ prod1>alter database add standby logfile 
10:21:25   2  '/u01/app/oracle/oradata/prod1/std_redo03.log' size 50m;
Database altered.

10:21:46 SYS@ prod1>alter database add standby logfile
10:21:51   2  '/u01/app/oracle/oradata/prod1/std_redo04.log' size 50m;
Database altered.

10:01:48 SYS@ shdb>select member from v$logfile;
MEMBER
------------------------------------------------------------------------------------------------------------------------
/u01/app/oracle/oradata/shdb/redo03.log
/u01/app/oracle/oradata/shdb/redo02.log
/u01/app/oracle/oradata/shdb/redo01.log
/disk2/arch_prod11_0_881851982.dbf
/u01/app/oracle/oradata/shdb/std_redo01.log
/u01/app/oracle/oradata/shdb/std_redo02.log
6 rows selected.

備庫(kù):

10:18:17 SYS@ shdb>alter database open;
Database altered.

10:20:21 SYS@ shdb>select group#,status ,bytes from v$log;
    GROUP# STATUS                BYTES
---------- ---------------- ----------
         1 CLEARING           52428800
         2 CLEARING           52428800
         3 CLEARING_CURRENT   52428800
         
10:20:45 SYS@ shdb>alter database add standby logfile 
10:22:41   2  '/u01/app/oracle/oradata/shdb/std_redo03.log' size 50m;
Database altered.

10:22:57 SYS@ shdb>alter database add standby logfile
10:23:02   2  '/u01/app/oracle/oradata/shdb/std_redo04.log' size 50m;
Database altered.

10:23:14 SYS@ shdb>col member for a50
10:23:23 SYS@ shdb>select group#,member from v$logfile;
    GROUP# MEMBER
---------- --------------------------------------------------
         3 /u01/app/oracle/oradata/shdb/redo03.log
         2 /u01/app/oracle/oradata/shdb/redo02.log
         1 /u01/app/oracle/oradata/shdb/redo01.log
         5 /u01/app/oracle/oradata/shdb/std_redo01.log
         6 /u01/app/oracle/oradata/shdb/std_redo02.log
         7 /u01/app/oracle/oradata/shdb/std_redo03.log
         8 /u01/app/oracle/oradata/shdb/std_redo04.log
8 rows selected.

-----經(jīng)過(guò)以上方式處理后,問(wèn)題依舊,在Maximum Protection模式下主庫(kù)依然不能被Open ;但在Maximum Availablity 和 Maximum Performance下主庫(kù)可以O(shè)pen 。出錯(cuò)原因依舊在探索。。。


參考文檔:


數(shù)據(jù)庫(kù)報(bào)ORA-16198故障的解決方法分析 

--------http://blog.itpub.net/28546804/viewspace-1260003/


1. 首先看官方文檔關(guān)于ORA-16198報(bào)錯(cuò)的說(shuō)明
.......................
報(bào)錯(cuò)可能原因是因?yàn)閚et_timeout設(shè)置低,在以前老版本默認(rèn)是10,建議更改為30
……………………………
The net_timeout attribute in the log_archive_dest_2 on the primary is
set too low so that
LNS couldn't finish sending redo block in 10 seconds in this example.
…………………………….
如果設(shè)置30還不行,請(qǐng)檢查磁盤(pán)的IO使用情況或者網(wǎng)絡(luò)傳輸情況
…………………………..
Note: If NET_TIMEOUT attribute has already been set to 30, and you still get ORA-16198, that means LNS couldn't finish sending redo block in 30 seconds.
The slowness may caused by:
1. Operating System. Please keep track of OS usage (like iostat).
2. Network. Please keep track network flow (like tcpdump).
……………………………
也有可能是BUG,受影響的版本為11.2.0.1或10.2.0.4,建議升級(jí)到11.2.0.2以上的版本
…………………………..
Bug 9259587  Multiple LGWR reconnect attempts in Data Guard MAXIMUM_AVAILABILITY
 This note gives a brief overview bug 9259587. 
Affects:
Product (Component) Oracle Server (Rdbms)
Range of versions believed to be affected Versions BELOW 12.1
Versions confirmed as being affected 11.2.0.1 10.2.0.4
Platforms affected Generic (all / most platforms affected)
Fixed:
This issue is fixed in 12.1 (Future Release) 11.2.0.2 (Server Patch Set)
Symptoms:
Related To:
Hang (Process Spins)
Active Dataguard (ADG)
Physical Standby Database / Dataguard
Description
…………………………………………………
發(fā)生的報(bào)錯(cuò),大概類似于下面的顯示
…………………………………………………
Rediscovery Notes:
 Alert log contains messages like:
  ORA-16198: LGWR received timedout error from KSR
  LGWR: Attempting destination LOG_ARCHIVE_DEST_2 network reconnect (16198)
  LGWR: Destination LOG_ARCHIVE_DEST_2 network reconnect abandoned
  Errors in file 
  /app/oracle/diag/rdbms/ora11g_dga/ora11g/trace/ora11g_lgwr_290838.trc:
  ORA-16198: Timeout incurred on internal channel during remote archival
  LGWR: Network asynch I/O wait error 16198 log 2 service 'ora11g_DGb'
  LGWR: Error 16198 disconnecting from destination LOG_ARCHIVE_DEST_2 standby 
  host 'ora11g_DGb'
  Destination LOG_ARCHIVE_DEST_2 is UNSYNCHRONIZED
  LGWR: Failed to archive log 2 thread 1 sequence 1422 (16198)
…………………………………………………
In a Data Guard configuration using LGWR SYNC transport on one or more LOG_ARCHIVE_DEST_n parameters, and using a protection mode of MAXIMUM_AVAILABILITY, then if the primary database becomes disconnected from the standby database, LGWR continues to attempt to reconnect to the standby database. It should instead avoid attempts to reconnect until an ARCH process has re-established communication with the standby database.
所以可以確定的是:
報(bào)這種錯(cuò)誤主要發(fā)生在DATAGUARD這種架構(gòu)上,原因就是主機(jī)的日志向備機(jī)傳輸時(shí)沒(méi)在規(guī)定時(shí)間完成,或無(wú)法向備機(jī)傳送日志,那么我們就下面主要的兩種故障原因來(lái)進(jìn)行說(shuō)明:
2. 參數(shù)設(shè)置過(guò)低導(dǎo)致的故障
可能由于設(shè)置的LOG_ARCHIVE_DEST_2的NET_TIMEOUT值過(guò)低,導(dǎo)致的日志無(wú)法在規(guī)定時(shí)間傳輸完成,建議設(shè)置成30。
查詢NET_TIMEOUT:
SQL> select DEST_NAME,NET_TIMEOUT FROM V$ARCHIVE_DEST;
DEST_NAME                 NET_TIMEOUT
-------------------------         -----------
LOG_ARCHIVE_DEST_1                  0
LOG_ARCHIVE_DEST_2                 30
……………輸出省略
查看LOG_ARCHIVE_DEST_2參數(shù):
SQL> show parameter log_archive_dest_2
值為'service=orcl_std reopen=120 lgwr sync valid_for=(online_logfiles,primary_role) db_unique_name=orcl_std'
我沒(méi)有設(shè)置NET_TIMEOUT參數(shù),默認(rèn)卻是30,因?yàn)槲业陌姹臼?1.2.0.3的。
如果你的參數(shù)不是30,請(qǐng)進(jìn)行修改,參考如下:
SQL>ALTER SYSTEM SET LOG_ARCHIVE_DEST_2='service=orcl_std reopen=120 lgwr sync net_timeout=30 valid_for=(online_logfiles,primary_role) db_unique_name=orcl_std';
然后觀察一下是否還報(bào)此類問(wèn)題。
3. 由于網(wǎng)絡(luò)不通暢或存儲(chǔ)IO繁忙等其他原因?qū)е碌墓收?/span>
如果是由于網(wǎng)絡(luò)不通暢和存儲(chǔ)繁忙的原因?qū)е碌膱?bào)錯(cuò),請(qǐng)用操作系統(tǒng)命令類似于,tcpdump或IOSTAT,VMSTAT來(lái)查看相關(guān)資源使用情況,或聯(lián)系網(wǎng)絡(luò),存儲(chǔ)管理員來(lái)協(xié)助分析。
如果以上都沒(méi)問(wèn)題,還有一種可能性就是你主機(jī)或備機(jī)單獨(dú)改sys密碼了,但是相關(guān)的備機(jī)或主機(jī)沒(méi)有同時(shí)改,造成主機(jī)向備機(jī)驗(yàn)證時(shí)失效也是很有可能的。
4. 數(shù)據(jù)庫(kù)的BUG
如果以上方法還沒(méi)有解決問(wèn)題,你也分析不出具體的原因,恰好你的數(shù)據(jù)庫(kù)版本是11.2.0.1或10.2.0.4,那么升級(jí)吧少年。。
5. 總結(jié)
考慮此類問(wèn)題,要從多角度分析,比如:參數(shù)值低,存儲(chǔ)使用情況,網(wǎng)絡(luò)傳輸情況,sys密碼改了,數(shù)據(jù)庫(kù)的BUG等。


向AI問(wèn)一下細(xì)節(jié)

免責(zé)聲明:本站發(fā)布的內(nèi)容(圖片、視頻和文字)以原創(chuàng)、轉(zhuǎn)載和分享為主,文章觀點(diǎn)不代表本網(wǎng)站立場(chǎng),如果涉及侵權(quán)請(qǐng)聯(lián)系站長(zhǎng)郵箱:is@yisu.com進(jìn)行舉報(bào),并提供相關(guān)證據(jù),一經(jīng)查實(shí),將立刻刪除涉嫌侵權(quán)內(nèi)容。

AI