Oracle Linux 6.7中 Oracle 11.2.0.4 RAC集群CRS異常處理方法是什么

發(fā)布時(shí)間：2021-11-05 15:58:17 來(lái)源：億速云閱讀：163 作者：iii 欄目：關(guān)系型數(shù)據(jù)庫(kù)

這篇文章主要講解了“Oracle Linux 6.7中 Oracle 11.2.0.4 RAC集群CRS異常處理方法是什么”，文中的講解內(nèi)容簡(jiǎn)單清晰，易于學(xué)習(xí)與理解，下面請(qǐng)大家跟著小編的思路慢慢深入，一起來(lái)研究和學(xué)習(xí)“Oracle Linux 6.7中 Oracle 11.2.0.4 RAC集群CRS異常處理方法是什么”吧！

最近一個(gè)月單位上的Oracle RAC集群CRS異常導(dǎo)致集群命令無(wú)法使用，執(zhí)行crsctl stat res -t命令出現(xiàn)如下錯(cuò)誤:

[grid@db1 ~]$ crsctl stat res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.
[grid@db2 ~]$ crsctl stat res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.

但是數(shù)據(jù)庫(kù)可以正常訪問(wèn)，業(yè)務(wù)系統(tǒng)也運(yùn)行正常

[root@db1 ~]# ps -ef | grep pmon
root       8024 204594  0 21:11 pts/0    00:00:00 grep pmon
grid      77120      1  0 Dec21 ?        00:04:21 asm_pmon_+ASM1
oracle    77790      1  0 Dec21 ?        00:05:18 ora_pmon_CAIWU1
oracle    77794      1  0 Dec21 ?        00:05:08 ora_pmon_dadb1
oracle    77848      1  0 Dec21 ?        00:05:39 ora_pmon_chdyl1
oracle    77910      1  0 Dec21 ?        00:07:47 ora_pmon_RLZY1
[root@db2 ~]# ps -ef | grep pmon
grid      27745      1  0 Dec21 ?        00:04:21 asm_pmon_+ASM2
oracle    28393      1  0 Dec21 ?        00:05:21 ora_pmon_dadb2
oracle    28569      1  0 Dec21 ?        00:04:58 ora_pmon_CAIWU2
oracle    28573      1  0 Dec21 ?        00:05:36 ora_pmon_chdyl2
oracle    28583      1  0 Dec21 ?        00:07:49 ora_pmon_RLZY2

查看ASM磁盤組的狀態(tài)，發(fā)現(xiàn)OCR磁盤組確實(shí)offline了

[grid@db1 ~]$ asmcmd lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  EXTERN  N         512   4096  1048576   3072000  2679522                0         2679522              0             N  ARCH/
MOUNTED  EXTERN  N         512   4096  1048576    204800   151138                0          151138              0             N  CWDATA/
MOUNTED  EXTERN  N         512   4096  1048576    512000   472546                0          472546              0             N  DADATA/
MOUNTED  EXTERN  N         512   4096  1048576   3072000   595334                0          595334              0             N  DATA/
MOUNTED  EXTERN  N         512   4096  1048576   1843200   609953                0          609953              0             N  SBDATA/
[grid@db2 ~]$ asmcmd lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  EXTERN  N         512   4096  1048576   3072000  2679522                0         2679522              0             N  ARCH/
MOUNTED  EXTERN  N         512   4096  1048576    204800   151138                0          151138              0             N  CWDATA/
MOUNTED  EXTERN  N         512   4096  1048576    512000   472546                0          472546              0             N  DADATA/
MOUNTED  EXTERN  N         512   4096  1048576   3072000   595334                0          595334              0             N  DATA/
MOUNTED  EXTERN  N         512   4096  1048576   1843200   609953                0          609953              0             N  SBDATA/

手工將crsdg上線，命令能夠執(zhí)行成功，但執(zhí)行crsctl stat res -t命令仍然報(bào)錯(cuò)。

[grid@db1 ~]$ sqlplus / as sysasm
SQL*Plus: Release 11.2.0.4.0 Production on Mon Dec 30 21:15:33 2019
Copyright (c) 1982, 2013, Oracle.  All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
SQL> alter diskgroup ocr mount;
Diskgroup altered.
SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
[grid@db1 ~]$ crsctl stat res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.
[grid@db2 ~]$ sqlplus / as sysasm
SQL*Plus: Release 11.2.0.4.0 Production on Mon Dec 30 21:15:05 2019
Copyright (c) 1982, 2013, Oracle.  All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
SQL> alter diskgroup ocr mount;
Diskgroup altered.
SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
[grid@db2 ~]$ crsctl stat res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.

檢查節(jié)點(diǎn)db1的alert_+ASM1.log有如下報(bào)錯(cuò)，說(shuō)是不能訪問(wèn)OCR磁盤組的相關(guān)磁盤而強(qiáng)制dismount了OCR磁盤，但使用dd命令測(cè)試是能夠訪問(wèn)的

Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_77212.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
WARNING: requested mirror side 1 of virtual extent 0 logical extent 0 offset 102400 is not allocated; I/O request failed
WARNING: requested mirror side 2 of virtual extent 0 logical extent 1 offset 102400 is not allocated; I/O request failed
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_77212.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
ORA-15078: ASM diskgroup was forcibly dismounted
Sat Dec 28 05:30:48 2019
SQL> alter diskgroup OCR check /* proxy */ 
ORA-15032: not all alterations performed
ORA-15001: diskgroup "OCR" does not exist or is not mounted
ERROR: alter diskgroup OCR check /* proxy */
NOTE: client exited [77184]
Sat Dec 28 05:30:49 2019
NOTE: [crsd.bin@db1 (TNS V1-V3) 35285] opening OCR file
Sat Dec 28 05:30:51 2019
NOTE: [crsd.bin@db1 (TNS V1-V3) 35305] opening OCR file
Sat Dec 28 05:30:53 2019
NOTE: [crsd.bin@db1 (TNS V1-V3) 35322] opening OCR file
Sat Dec 28 05:30:55 2019
NOTE: [crsd.bin@db1 (TNS V1-V3) 35346] opening OCR file
Sat Dec 28 05:30:57 2019
NOTE: [crsd.bin@db1 (TNS V1-V3) 35363] opening OCR file
Sat Dec 28 05:31:00 2019
NOTE: [crsd.bin@db1 (TNS V1-V3) 35459] opening OCR file
Sat Dec 28 05:31:02 2019
NOTE: [crsd.bin@db1 (TNS V1-V3) 35481] opening OCR file
Sat Dec 28 05:31:04 2019
NOTE: [crsd.bin@db1 (TNS V1-V3) 35520] opening OCR file
Sat Dec 28 05:31:06 2019
NOTE: [crsd.bin@db1 (TNS V1-V3) 35539] opening OCR file
Sat Dec 28 05:31:08 2019
NOTE: [crsd.bin@db1 (TNS V1-V3) 35557] opening OCR file
Sat Dec 28 21:00:10 2019
Warning: VKTM detected a time drift.
Time drifts can result in an unexpected behavior such as time-outs. Please check trace file for more details.

檢查錯(cuò)誤日志

[root@db1 ~]# more /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_77212.trc
Trace file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_77212.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
ORACLE_HOME = /u01/app/11.2.0/grid
System name:    Linux
Node name:      db1
Release:        3.8.13-68.3.4.el6uek.x86_64
Version:        #2 SMP Tue Jul 14 15:03:36 PDT 2015
Machine:        x86_64
Instance name: +ASM1
Redo thread mounted by this instance: 0 
Oracle process number: 24
Unix process pid: 77212, image: oracle@db1 (TNS V1-V3)
*** 2019-12-28 05:30:44.894
*** SESSION ID:(2929.3) 2019-12-28 05:30:44.894
*** CLIENT ID:() 2019-12-28 05:30:44.894
*** SERVICE NAME:() 2019-12-28 05:30:44.894
*** MODULE NAME:(crsd.bin@db1 (TNS V1-V3)) 2019-12-28 05:30:44.894
*** ACTION NAME:() 2019-12-28 05:30:44.894
 
Received ORADEBUG command (#1) 'CLEANUP_KFK_FD' from process 'Unix process pid: 35253, image: '
*** 2019-12-28 05:30:44.895
Finished processing ORADEBUG command (#1) 'CLEANUP_KFK_FD'
*** 2019-12-28 05:30:48.235
WARNING:failed xlate 1 
ORA-15078: ASM diskgroup was forcibly dismounted
ksfdrfms:Mirror Read file=+OCR.255.4294967295 fob=0x9b00e5d8 bufp=0x7f5dd012ba00 blkno=25 nbytes=4096
WARNING:failed xlate 1 
WARNING: requested mirror side 1 of virtual extent 0 logical extent 0 offset 102400 is not allocated; I/O request failed
ksfdrfms:Read failed from mirror side=1 logical extent number=0 dskno=65535
WARNING:failed xlate 1 
WARNING: requested mirror side 2 of virtual extent 0 logical extent 1 offset 102400 is not allocated; I/O request failed
ksfdrfms:Read failed from mirror side=2 logical extent number=1 dskno=65535
ORA-15078: ASM diskgroup was forcibly dismounted
ORA-15078: ASM diskgroup was forcibly dismounted

檢查節(jié)點(diǎn)db1的alertdb1.log有如下報(bào)錯(cuò),也是說(shuō)不能訪問(wèn)OCR磁盤組的相關(guān)磁盤

2019-12-28 05:30:48.468: 
[/u01/app/11.2.0/grid/bin/oraagent.bin(77466)]CRS-5822:Agent '/u01/app/11.2.0/grid/bin/oraagent_grid' disconnected from server. Details at (:CRSAGF00117:) {0:1:4} in /u01/app/11.2.0/grid/log/db1/agent/crsd/oraagent_grid/oraagent_grid.lo
g.
2019-12-28 05:30:48.468: 
[/u01/app/11.2.0/grid/bin/oraagent.bin(77684)]CRS-5822:Agent '/u01/app/11.2.0/grid/bin/oraagent_oracle' disconnected from server. Details at (:CRSAGF00117:) {0:7:332} in /u01/app/11.2.0/grid/log/db1/agent/crsd/oraagent_oracle/oraagent_o
racle.log.
2019-12-28 05:30:48.471: 
[/u01/app/11.2.0/grid/bin/orarootagent.bin(77482)]CRS-5822:Agent '/u01/app/11.2.0/grid/bin/orarootagent_root' disconnected from server. Details at (:CRSAGF00117:) {0:5:11497} in /u01/app/11.2.0/grid/log/db1/agent/crsd/orarootagent_root/
orarootagent_root.log.
2019-12-28 05:30:48.480: 
[ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:30:50.003: 
[crsd(35285)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:30:50.021: 
[crsd(35285)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:30:50.520: 
[ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:30:51.918: 
[crsd(35305)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:30:51.929: 
[crsd(35305)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:30:52.557: 
[ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:30:53.945: 
[crsd(35322)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:30:53.956: 
[crsd(35322)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:30:54.595: 
[ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:30:55.976: 
[crsd(35346)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:30:55.988: 
[crsd(35346)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:30:56.633: 
[ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:30:58.010: 
[crsd(35363)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:30:58.020: 
[crsd(35363)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:30:58.669: 
[ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:31:00.043: 
[crsd(35459)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:31:00.054: 
[crsd(35459)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:31:00.706: 
[ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:31:02.093: 
[crsd(35481)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:31:02.103: 
[crsd(35481)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:31:02.742: 
[ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:31:04.109: 
[crsd(35520)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:31:04.119: 
[crsd(35520)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:31:04.777: 
[ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:31:06.141: 
[crsd(35539)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:31:06.151: 
[crsd(35539)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:31:06.810: 
[ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:31:08.181: 
[crsd(35557)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:31:08.191: 
[crsd(35557)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db1/crsd/crsd.log.
2019-12-28 05:31:08.846: 
[ohasd(33022)]CRS-2765:Resource 'ora.crsd' has failed on server 'db1'.
2019-12-28 05:31:08.847: 
[ohasd(33022)]CRS-2771:Maximum restart attempts reached for resource 'ora.crsd'; will not restart.
2019-12-28 05:31:08.848: 
[ohasd(33022)]CRS-2769:Unable to failover resource 'ora.crsd'.

檢查節(jié)點(diǎn)db1的oraagent_grid.log文件有如下報(bào)錯(cuò)，顯示OCR磁盤組的狀態(tài)被改變?yōu)閛ffline了。

2019-12-28 05:30:16.531: [    AGFW][511039232]{1:30746:2} Agent received the message: AGENT_HB[Engine] ID 12293:113720
2019-12-28 05:30:37.808: [    AGFW][511039232]{1:30746:9373} Agent received the message: RESOURCE_STOP[ora.OCR.dg db1 1] ID 4099:113730
2019-12-28 05:30:37.808: [    AGFW][511039232]{1:30746:9373} Preparing STOP command for: ora.OCR.dg db1 1
2019-12-28 05:30:37.808: [    AGFW][511039232]{1:30746:9373} ora.OCR.dg db1 1 state changed from: ONLINE to: STOPPING
2019-12-28 05:30:37.809: [ora.OCR.dg][513140480]{1:30746:9373} [stop] (:CLSN00108:) clsn_agent::stop {
2019-12-28 05:30:37.810: [ora.OCR.dg][513140480]{1:30746:9373} [stop] DgpAgent::stop: enter { 
2019-12-28 05:30:37.810: [ora.OCR.dg][513140480]{1:30746:9373} [stop] getResAttrib: attrib name USR_ORA_OPI value true len 4
2019-12-28 05:30:37.810: [ora.OCR.dg][513140480]{1:30746:9373} [stop] Agent::flagUsrOraOpiIsSet(true) reason not dependency
2019-12-28 05:30:37.810: [ora.OCR.dg][513140480]{1:30746:9373} [stop] DgpAgent::stop: tha exit }
2019-12-28 05:30:37.810: [ora.OCR.dg][513140480]{1:30746:9373} [stop] DgpAgent::stopSingle status:2 }
2019-12-28 05:30:37.810: [ora.OCR.dg][513140480]{1:30746:9373} [stop] (:CLSN00108:) clsn_agent::stop }
2019-12-28 05:30:37.810: [    AGFW][513140480]{1:30746:9373} Command: stop for resource: ora.OCR.dg db1 1 completed with status: SUCCESS
2019-12-28 05:30:37.810: [ora.OCR.dg][513140480]{1:30746:9373} [check] CrsCmd::ClscrsCmdData::stat entity 1 statflag 33 useFilter 0
2019-12-28 05:30:37.811: [    AGFW][511039232]{1:30746:9373} Agent sending reply for: RESOURCE_STOP[ora.OCR.dg db1 1] ID 4099:113730
2019-12-28 05:30:37.838: [ora.OCR.dg][513140480]{1:30746:9373} [check] DgpAgent::runCheck: asm stat asmRet 0
2019-12-28 05:30:37.839: [ora.OCR.dg][513140480]{1:30746:9373} [check] DgpAgent::getConnxn connected
2019-12-28 05:30:37.844: [ora.OCR.dg][513140480]{1:30746:9373} [check] DgpAgent::queryDgStatus excp no data found
2019-12-28 05:30:37.844: [ora.OCR.dg][513140480]{1:30746:9373} [check] DgpAgent::queryDgStatus no data found in v$asm_diskgroup_stat
2019-12-28 05:30:37.844: [ora.OCR.dg][513140480]{1:30746:9373} [check] DgpAgent::queryDgStatus dgName OCR ret 1
2019-12-28 05:30:37.845: [    AGFW][511039232]{1:30746:9373} ora.OCR.dg db1 1 state changed from: STOPPING to: OFFLINE
2019-12-28 05:30:37.845: [    AGFW][511039232]{1:30746:9373} Agent sending last reply for: RESOURCE_STOP[ora.OCR.dg db1 1] ID 4099:113730
2019-12-28 05:30:43.889: [ora.asm][503641856]{1:30746:2} [check] CrsCmd::ClscrsCmdData::stat entity 1 statflag 33 useFilter 0
2019-12-28 05:30:43.920: [ora.asm][503641856]{1:30746:2} [check] AsmProxyAgent::check clsagfw_res_status 0
2019-12-28 05:30:48.465: [ CRSCOMM][521545472] IpcC: IPC client connection 6c to member 0 has been removed
2019-12-28 05:30:48.465: [CLSFRAME][521545472] Removing IPC Member:{Relative|Node:0|Process:0|Type:1}
2019-12-28 05:30:48.465: [CLSFRAME][521545472] Disconnected from CRSD:db1 process: {Relative|Node:0|Process:0|Type:1}
2019-12-28 05:30:48.474: [   AGENT][511039232]{0:1:4} {0:1:4} Created alert : (:CRSAGF00117:) :  Disconnected from server, Agent is shutting down.
2019-12-28 05:30:48.474: [    AGFW][511039232]{0:1:4} Agent is exiting with exit code: 1

檢查節(jié)點(diǎn)db2的alert_+ASM2.log有如下報(bào)錯(cuò)，出現(xiàn)了類似的"Waited 15 secs for write IO to PST disk 0 in group 1"信息，這說(shuō)明對(duì)OCR磁盤組執(zhí)行寫操作時(shí)超時(shí)了15秒.

Sat Dec 28 03:02:51 2019
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 3.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 3.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 5.
WARNING: Waited 15 secs for write IO to PST disk 1 in group 5.
WARNING: Waited 15 secs for write IO to PST disk 2 in group 5.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 5.
WARNING: Waited 15 secs for write IO to PST disk 1 in group 5.
WARNING: Waited 15 secs for write IO to PST disk 2 in group 5.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 6.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 6.
Sat Dec 28 03:02:51 2019
NOTE: process _b000_+asm2 (45488) initiating offline of disk 0.3916160907 (OCR1) with mask 0x7e in group 5
NOTE: process _b000_+asm2 (45488) initiating offline of disk 1.3916160906 (OCR2) with mask 0x7e in group 5
NOTE: process _b000_+asm2 (45488) initiating offline of disk 2.3916160905 (OCR3) with mask 0x7e in group 5
NOTE: checking PST: grp = 5
GMON checking disk modes for group 5 at 19 for pid 27, osid 45488
ERROR: no read quorum in group: required 2, found 0 disks
NOTE: checking PST for grp 5 done.
NOTE: initiating PST update: grp = 5, dsk = 0/0xe96bdf8b, mask = 0x6a, op = clear
NOTE: initiating PST update: grp = 5, dsk = 1/0xe96bdf8a, mask = 0x6a, op = clear
NOTE: initiating PST update: grp = 5, dsk = 2/0xe96bdf89, mask = 0x6a, op = clear
GMON updating disk modes for group 5 at 20 for pid 27, osid 45488
ERROR: no read quorum in group: required 2, found 0 disks
Sat Dec 28 03:02:51 2019
NOTE: cache dismounting (not clean) group 5/0x8F5B2F9F (OCR) 
NOTE: messaging CKPT to quiesce pins Unix process pid: 45490, image: oracle@db2 (B001)
Sat Dec 28 03:02:51 2019
NOTE: halting all I/Os to diskgroup 5 (OCR)
Sat Dec 28 03:02:52 2019
NOTE: LGWR doing non-clean dismount of group 5 (OCR)
NOTE: LGWR sync ABA=23.100 last written ABA 23.100
WARNING: Offline for disk OCR1 in mode 0x7f failed.
WARNING: Offline for disk OCR2 in mode 0x7f failed.
WARNING: Offline for disk OCR3 in mode 0x7f failed.
Sat Dec 28 03:02:52 2019
kjbdomdet send to inst 1
detach from dom 5, sending detach message to inst 1
Sat Dec 28 03:02:52 2019
List of instances:
 1 2
Dirty detach reconfiguration started (new ddet inc 1, cluster inc 36)
 Global Resource Directory partially frozen for dirty detach
* dirty detach - domain 5 invalid = TRUE 
 0 GCS resources traversed, 0 cancelled
Dirty Detach Reconfiguration complete
Sat Dec 28 03:02:52 2019
WARNING: dirty detached from domain 5
NOTE: cache dismounted group 5/0x8F5B2F9F (OCR) 
SQL> alter diskgroup OCR dismount force /* ASM SERVER:2405117855 */ 
Sat Dec 28 03:02:52 2019
NOTE: cache deleting context for group OCR 5/0x8f5b2f9f
GMON dismounting group 5 at 21 for pid 28, osid 45490
NOTE: Disk OCR1 in mode 0x7f marked for de-assignment
NOTE: Disk OCR2 in mode 0x7f marked for de-assignment
NOTE: Disk OCR3 in mode 0x7f marked for de-assignment
NOTE:Waiting for all pending writes to complete before de-registering: grpnum 5
Sat Dec 28 03:03:03 2019
WARNING: Waited 27 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 27 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 24 secs for write IO to PST disk 0 in group 2.
WARNING: Waited 24 secs for write IO to PST disk 0 in group 2.
WARNING: Waited 27 secs for write IO to PST disk 0 in group 3.
WARNING: Waited 27 secs for write IO to PST disk 0 in group 3.
WARNING: Waited 21 secs for write IO to PST disk 0 in group 4.
WARNING: Waited 21 secs for write IO to PST disk 0 in group 4.
WARNING: Waited 27 secs for write IO to PST disk 0 in group 6.
WARNING: Waited 27 secs for write IO to PST disk 0 in group 6.
Sat Dec 28 03:03:03 2019
ASM Health Checker found 1 new failures
Sat Dec 28 03:03:22 2019
SUCCESS: diskgroup OCR was dismounted
SUCCESS: alter diskgroup OCR dismount force /* ASM SERVER:2405117855 */
SUCCESS: ASM-initiated MANDATORY DISMOUNT of group OCR
Sat Dec 28 03:03:22 2019
NOTE: diskgroup resource ora.OCR.dg is offline
Sat Dec 28 03:03:22 2019
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Sat Dec 28 05:30:34 2019
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 6.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 6.
Sat Dec 28 05:30:37 2019
 Received dirty detach msg from inst 1 for dom 5
Sat Dec 28 05:30:37 2019
List of instances:
 1 2
Dirty detach reconfiguration started (new ddet inc 2, cluster inc 36)
 Global Resource Directory partially frozen for dirty detach
* dirty detach - domain 5 invalid = TRUE 
 0 GCS resources traversed, 0 cancelled
freeing rdom 5
Dirty Detach Reconfiguration complete
Sat Dec 28 05:30:37 2019
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
WARNING: requested mirror side 1 of virtual extent 5 logical extent 0 offset 704512 is not allocated; I/O request failed
WARNING: requested mirror side 2 of virtual extent 5 logical extent 1 offset 704512 is not allocated; I/O request failed
Errors in file /u01/app/grid/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27831.trc:
ORA-15078: ASM diskgroup was forcibly dismounted
ORA-15078: ASM diskgroup was forcibly dismounted
Sat Dec 28 05:30:37 2019
SQL> alter diskgroup OCR check /* proxy */ 
ORA-15032: not all alterations performed
ORA-15001: diskgroup "OCR" does not exist or is not mounted
ERROR: alter diskgroup OCR check /* proxy */
Sat Dec 28 05:30:44 2019
WARNING: Waited 20 secs for write IO to PST disk 0 in group 2.
WARNING: Waited 20 secs for write IO to PST disk 0 in group 2.
Sat Dec 28 05:30:48 2019
NOTE: client exited [27819]
Sat Dec 28 05:30:49 2019
NOTE: [crsd.bin@db2 (TNS V1-V3) 142641] opening OCR file
Sat Dec 28 05:30:51 2019
NOTE: [crsd.bin@db2 (TNS V1-V3) 142660] opening OCR file
Sat Dec 28 05:30:53 2019
NOTE: [crsd.bin@db2 (TNS V1-V3) 142678] opening OCR file
Sat Dec 28 05:30:55 2019
NOTE: [crsd.bin@db2 (TNS V1-V3) 142696] opening OCR file
Sat Dec 28 05:30:57 2019
NOTE: [crsd.bin@db2 (TNS V1-V3) 142723] opening OCR file
Sat Dec 28 05:30:59 2019
NOTE: [crsd.bin@db2 (TNS V1-V3) 142744] opening OCR file
Sat Dec 28 05:31:01 2019
NOTE: [crsd.bin@db2 (TNS V1-V3) 142773] opening OCR file
Sat Dec 28 05:31:03 2019
NOTE: [crsd.bin@db2 (TNS V1-V3) 142792] opening OCR file
Sat Dec 28 05:31:05 2019
NOTE: [crsd.bin@db2 (TNS V1-V3) 142806] opening OCR file
Sat Dec 28 05:31:07 2019
NOTE: [crsd.bin@db2 (TNS V1-V3) 142821] opening OCR file
Sat Dec 28 06:18:42 2019
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 2.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 2.

檢查節(jié)點(diǎn)db2的alertdb2.log有如下報(bào)錯(cuò),也是說(shuō)不能訪問(wèn)OCR磁盤組的相關(guān)磁盤，但使用dd命令測(cè)試是可以訪問(wèn)的。

2019-12-28 05:30:48.019: 
[/u01/app/11.2.0/grid/bin/oraagent.bin(28268)]CRS-5822:Agent '/u01/app/11.2.0/grid/bin/oraagent_oracle' disconnected from server. Details at (:CRSAGF00117:) {0:7:73} in /u01/app/11.2.0/grid/log/db2/agent/crsd/oraagent_oracle/oraagent_oracle.log.
2019-12-28 05:30:48.019: 
[/u01/app/11.2.0/grid/bin/scriptagent.bin(37953)]CRS-5822:Agent '/u01/app/11.2.0/grid/bin/scriptagent_grid' disconnected from server. Details at (:CRSAGF00117:) {0:9:8} in /u01/app/11.2.0/grid/log/db2/agent/crsd/scriptagent_grid/scriptagent_grid.log.
2019-12-28 05:30:48.020: 
[/u01/app/11.2.0/grid/bin/oraagent.bin(28009)]CRS-5822:Agent '/u01/app/11.2.0/grid/bin/oraagent_grid' disconnected from server. Details at (:CRSAGF00117:) {0:1:10} in /u01/app/11.2.0/grid/log/db2/agent/crsd/oraagent_grid/oraagent_grid.log.
2019-12-28 05:30:48.021: 
[/u01/app/11.2.0/grid/bin/orarootagent.bin(28025)]CRS-5822:Agent '/u01/app/11.2.0/grid/bin/orarootagent_root' disconnected from server. Details at (:CRSAGF00117:) {0:5:373} in /u01/app/11.2.0/grid/log/db2/agent/crsd/orarootagent_root/orarootagent_root.log.
2019-12-28 05:30:48.024: 
[ohasd(13034)]CRS-2765:Resource 'ora.crsd' has failed on server 'db2'.
2019-12-28 05:30:49.410: 
[crsd(142641)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:30:49.420: 
[crsd(142641)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:30:50.063: 
[ohasd(13034)]CRS-2765:Resource 'ora.crsd' has failed on server 'db2'.
2019-12-28 05:30:51.442: 
[crsd(142660)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:30:51.451: 
[crsd(142660)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:30:52.100: 
[ohasd(13034)]CRS-2765:Resource 'ora.crsd' has failed on server 'db2'.
2019-12-28 05:30:53.471: 
[crsd(142678)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:30:53.480: 
[crsd(142678)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:30:54.138: 
[ohasd(13034)]CRS-2765:Resource 'ora.crsd' has failed on server 'db2'.
2019-12-28 05:30:55.507: 
[crsd(142696)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:30:55.517: 
[crsd(142696)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:30:56.176: 
[ohasd(13034)]CRS-2765:Resource 'ora.crsd' has failed on server 'db2'.
2019-12-28 05:30:57.551: 
[crsd(142723)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:30:57.560: 
[crsd(142723)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:30:58.216: 
[ohasd(13034)]CRS-2765:Resource 'ora.crsd' has failed on server 'db2'.
2019-12-28 05:30:59.592: 
[crsd(142744)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:30:59.602: 
[crsd(142744)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:31:00.253: 
[ohasd(13034)]CRS-2765:Resource 'ora.crsd' has failed on server 'db2'.
2019-12-28 05:31:01.627: 
[crsd(142773)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:31:01.636: 
[crsd(142773)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:31:02.290: 
[ohasd(13034)]CRS-2765:Resource 'ora.crsd' has failed on server 'db2'.
2019-12-28 05:31:03.658: 
[crsd(142792)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:31:03.668: 
[crsd(142792)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:31:04.327: 
[ohasd(13034)]CRS-2765:Resource 'ora.crsd' has failed on server 'db2'.
2019-12-28 05:31:05.701: 
[crsd(142806)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:31:05.711: 
[crsd(142806)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:31:06.365: 
[ohasd(13034)]CRS-2765:Resource 'ora.crsd' has failed on server 'db2'.
2019-12-28 05:31:07.726: 
[crsd(142821)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:31:07.735: 
[crsd(142821)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
]. Details at (:CRSD00111:) in /u01/app/11.2.0/grid/log/db2/crsd/crsd.log.
2019-12-28 05:31:08.402: 
[ohasd(13034)]CRS-2765:Resource 'ora.crsd' has failed on server 'db2'.
2019-12-28 05:31:08.402: 
[ohasd(13034)]CRS-2771:Maximum restart attempts reached for resource 'ora.crsd'; will not restart.
2019-12-28 05:31:08.403: 
[ohasd(13034)]CRS-2769:Unable to failover resource 'ora.crsd'.

檢查節(jié)點(diǎn)db2的oraagent_grid.log文件有如下報(bào)錯(cuò):

2019-12-28 05:29:59.329: [    AGFW][3601811200]{2:6928:2} Agent received the message: AGENT_HB[Engine] ID 12293:273304
2019-12-28 05:30:17.162: [ora.LISTENER_SCAN2.lsnr][3592312576]{1:34166:403} [check] Utils:execCmd action = 3 flags = 38 ohome = (null) cmdname = lsnrctl. 
2019-12-28 05:30:17.267: [ora.LISTENER_SCAN2.lsnr][3592312576]{1:34166:403} [check] execCmd ret = 0
2019-12-28 05:30:17.267: [ora.LISTENER_SCAN2.lsnr][3592312576]{1:34166:403} [check] CrsCmd::ClscrsCmdData::stat entity 5 statflag 32 useFilter 1
2019-12-28 05:30:17.298: [ora.LISTENER_SCAN2.lsnr][3592312576]{1:34166:403} [check] ScanLsnrAgent::checkDependentVipResource: scanVipResource = ora.scan2.vip, statRet = 0
2019-12-28 05:30:17.881: [ora.LISTENER_SCAN3.lsnr][2950686464]{1:34166:403} [check] Utils:execCmd action = 3 flags = 38 ohome = (null) cmdname = lsnrctl. 
2019-12-28 05:30:17.986: [ora.LISTENER_SCAN3.lsnr][2950686464]{1:34166:403} [check] execCmd ret = 0
2019-12-28 05:30:17.987: [ora.LISTENER_SCAN3.lsnr][2950686464]{1:34166:403} [check] CrsCmd::ClscrsCmdData::stat entity 5 statflag 32 useFilter 1
2019-12-28 05:30:18.019: [ora.LISTENER_SCAN3.lsnr][2950686464]{1:34166:403} [check] ScanLsnrAgent::checkDependentVipResource: scanVipResource = ora.scan3.vip, statRet = 0
2019-12-28 05:30:27.292: [ora.asm][2950686464]{2:6928:2} [check] CrsCmd::ClscrsCmdData::stat entity 1 statflag 33 useFilter 0
2019-12-28 05:30:27.319: [ora.asm][2950686464]{2:6928:2} [check] AsmProxyAgent::check clsagfw_res_status 0
2019-12-28 05:30:34.522: [ora.ons][2950686464]{2:6928:2} [check] getOracleHomeAttrib: oracle_home = /u01/app/11.2.0/grid
2019-12-28 05:30:34.522: [ora.ons][2950686464]{2:6928:2} [check] Utils:execCmd action = 3 flags = 6 ohome = /u01/app/11.2.0/grid/opmn/ cmdname = onsctli. 
2019-12-28 05:30:34.627: [ora.ons][2950686464]{2:6928:2} [check] (:CLSN00010:)ons is running ...
2019-12-28 05:30:34.627: [ora.ons][2950686464]{2:6928:2} [check] (:CLSN00010:)
2019-12-28 05:30:34.628: [ora.ons][2950686464]{2:6928:2} [check] execCmd ret = 0
2019-12-28 05:30:37.858: [ USRTHRD][3575973632]{1:30748:9373} Processing the event CRS_RESOURCE_STATE_CHANGE
2019-12-28 05:30:38.652: [ora.LISTENER.lsnr][3594413824]{2:6928:2} [check] Utils:execCmd action = 3 flags = 38 ohome = (null) cmdname = lsnrctl. 
2019-12-28 05:30:38.757: [ora.LISTENER.lsnr][3594413824]{2:6928:2} [check] execCmd ret = 0
2019-12-28 05:30:48.017: [ CRSCOMM][3612317440] IpcC: IPC client connection 6c to member 0 has been removed
2019-12-28 05:30:48.017: [CLSFRAME][3612317440] Removing IPC Member:{Relative|Node:0|Process:0|Type:1}
2019-12-28 05:30:48.017: [CLSFRAME][3612317440] Disconnected from CRSD:db2 process: {Relative|Node:0|Process:0|Type:1}
2019-12-28 05:30:48.020: [   AGENT][3601811200]{0:1:10} {0:1:10} Created alert : (:CRSAGF00117:) :  Disconnected from server, Agent is shutting down.
2019-12-28 05:30:48.020: [    AGFW][3601811200]{0:1:10} Agent is exiting with exit code: 1

檢查2個(gè)節(jié)點(diǎn)的/var/log/messages日志，發(fā)現(xiàn)2個(gè)節(jié)點(diǎn)均有多路徑相關(guān)的錯(cuò)誤信息，但相關(guān)的磁盤是用于備份的，不是用于生產(chǎn)數(shù)據(jù)庫(kù)所使用的

Dec 30 05:25:31 db1 multipathd: backup2: sdcr - emc_clariion_checker: query command indicates error
Dec 30 05:25:31 db1 multipathd: checker failed path 69:240 in map backup2
Dec 30 05:25:31 db1 kernel: device-mapper: multipath: Failing path 69:240.
Dec 30 05:25:31 db1 multipathd: backup2: remaining active paths: 3
Dec 30 05:25:37 db1 multipathd: backup2: sdcr - emc_clariion_checker: Active path is healthy.
Dec 30 05:25:37 db1 multipathd: 69:240: reinstated
Dec 30 05:25:37 db1 multipathd: backup2: remaining active paths: 4
Dec 30 05:25:37 db1 kernel: sd 5:0:3:2: emc: ALUA failover mode detected
Dec 30 05:25:37 db1 kernel: sd 5:0:3:2: emc: at SP A Port 5 (owned, default SP A)
Dec 30 05:26:03 db1 kernel: qla2xxx [0000:05:00.1]-801c:5: Abort command issued nexus=5:3:4 --  1 2002.
Dec 30 06:03:35 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 06:15:23 db1 multipathd: backup3: sdcq - emc_clariion_checker: Read error for WWN 600601608b203300d563752524c1e611.  Sense data are 0x0/0x0/0x0.
Dec 30 06:15:23 db1 kernel: qla2xxx [0000:05:00.1]-801c:5: Abort command issued nexus=5:3:1 --  1 2002.
Dec 30 06:15:23 db1 kernel: device-mapper: multipath: Failing path 69:224.
Dec 30 06:15:23 db1 multipathd: checker failed path 69:224 in map backup3
Dec 30 06:15:23 db1 multipathd: backup3: remaining active paths: 3
Dec 30 06:15:28 db1 multipathd: backup3: sdcq - emc_clariion_checker: Active path is healthy.
Dec 30 06:15:28 db1 multipathd: 69:224: reinstated
Dec 30 06:15:28 db1 multipathd: backup3: remaining active paths: 4
Dec 30 06:15:28 db1 kernel: sd 5:0:3:1: emc: ALUA failover mode detected
Dec 30 06:15:28 db1 kernel: sd 5:0:3:1: emc: at SP A Port 5 (owned, default SP A)
Dec 30 06:59:29 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 07:53:22 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 07:55:11 db1 multipathd: sdct: couldn't get asymmetric access state
Dec 30 07:55:11 db1 multipathd: backup4: load table [0 2147483648 multipath 2 queue_if_no_path retain_attached_hw_handler 1 emc 2 1 round-robin 0 2 1 70:16 1 66:240 1 round-robin 0 2 1 8:64 1 67:80 1]
Dec 30 07:55:11 db1 kernel: sd 5:0:3:4: emc: ALUA failover mode detected
Dec 30 07:55:11 db1 kernel: sd 5:0:3:4: emc: at SP A Port 5 (owned, default SP A)
Dec 30 07:55:11 db1 kernel: sd 4:0:3:4: emc: ALUA failover mode detected
Dec 30 07:55:11 db1 kernel: sd 4:0:3:4: emc: at SP A Port 4 (owned, default SP A)
Dec 30 07:55:35 db1 multipathd: backup2: sdcr - emc_clariion_checker: Read error for WWN 600601608b203300d663752524c1e611.  Sense data are 0x0/0x0/0x0.
Dec 30 07:55:35 db1 multipathd: checker failed path 69:240 in map backup2
Dec 30 07:55:35 db1 multipathd: backup2: remaining active paths: 3
Dec 30 07:55:35 db1 kernel: device-mapper: multipath: Failing path 69:240.
Dec 30 07:55:40 db1 multipathd: backup2: sdcr - emc_clariion_checker: Active path is healthy.
Dec 30 07:55:40 db1 multipathd: 69:240: reinstated
Dec 30 07:55:40 db1 multipathd: backup2: remaining active paths: 4
Dec 30 07:55:40 db1 kernel: sd 5:0:3:2: emc: ALUA failover mode detected
Dec 30 07:55:40 db1 kernel: sd 5:0:3:2: emc: at SP A Port 5 (owned, default SP A)
Dec 30 08:39:47 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 08:43:36 db1 multipathd: mpathb: load table [0 20971520 multipath 2 queue_if_no_path retain_attached_hw_handler 1 emc 2 1 round-robin 0 2 1 69:208 1 66:176 1 round-robin 0 2 1 8:0 1 67:16 1]
Dec 30 08:43:36 db1 kernel: sd 5:0:3:0: emc: ALUA failover mode detected
Dec 30 08:43:36 db1 kernel: sd 5:0:3:0: emc: at SP A Port 5 (owned, default SP A)
Dec 30 08:43:36 db1 kernel: sd 4:0:3:0: emc: ALUA failover mode detected
Dec 30 08:43:36 db1 kernel: sd 4:0:3:0: emc: at SP A Port 4 (owned, default SP A)
Dec 30 09:24:04 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 10:13:09 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 11:06:07 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 12:07:36 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 13:08:58 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 14:00:19 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 14:52:20 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 15:40:45 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 16:34:38 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 17:09:56 db1 auditd[15975]: Audit daemon rotating log files
Dec 30 17:38:16 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 18:59:38 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
Dec 30 19:54:43 db1 CLSD: The clock on host db1 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.

經(jīng)查看集群相關(guān)日志可以確定，由于存儲(chǔ)磁盤出現(xiàn)IO問(wèn)題（或光線閃斷、或IO延遲），導(dǎo)致集群CRS異常宕機(jī)。但是，比較奇怪的是，雖然CSR掉線了，ASM實(shí)例和DB實(shí)例卻好好的，還可以正常使用。查詢oracle support發(fā)現(xiàn)一篇文章1581864.1?提到ASM CRS仲裁盤訪問(wèn)超時(shí)與隱藏參數(shù)_asm_hbeatiowait有關(guān)系，而ASM的隱藏參數(shù)_asm_hbeatiowait由于操作系統(tǒng)多路徑Multipath配置的polling_interval有關(guān)，具體的故障原因是操作系統(tǒng)盤的判斷訪問(wèn)超時(shí)遠(yuǎn)大于數(shù)據(jù)庫(kù)ASM仲裁盤訪問(wèn)超時(shí)，導(dǎo)致ORACLE RAC判定ASM中仲裁盤無(wú)法訪問(wèn)從而將仲裁盤強(qiáng)制Offline。解決的思路是：首先，確定操作系統(tǒng)polling_interval參數(shù)與數(shù)據(jù)庫(kù)ASM隱藏參數(shù)值_asm_hbeatiowait，將_asm_hbeatiowait的值調(diào)整到比polling_interval值大即可。
下面是具體的解決操作:
1、查看數(shù)據(jù)庫(kù)RAC ASM的_asm_hbeatiowait值（默認(rèn)是15秒）：

SQL> col ksppinm for a20
SQL> col ksppstvl for a40
SQL> col ksppdesc for a80
SQL> SELECT   ksppinm, ksppstvl, ksppdesc
  2     FROM   x$ksppi x, x$ksppcv y
  3    WHERE   x.indx = y.indx AND  ksppinm = '_asm_hbeatiowait';
KSPPINM              KSPPSTVL                                 KSPPDESC
-------------------- ---------------------------------------- --------------------------------------------------------------------------------
_asm_hbeatiowait     15                                       number of secs to wait for PST Async Hbeat IO return

2、查看操作存儲(chǔ)盤訪問(wèn)超時(shí)時(shí)間（Oracle Linux 6.7默認(rèn)是30秒）

[root@db1 ~]# cat /sys/block/sdb/device/timeout 
30
[root@db2 ~]# cat /sys/block/sdb/device/timeout 
30

3、將_asm_hbeatiowait 的值調(diào)整為45秒（該參數(shù)是靜態(tài)參數(shù)，需要重啟集群）

SQL>  alter system set "_asm_hbeatiowait"=45 scope=spfile sid='*';
System altered.

4.重啟集群

5.檢查集群狀態(tài)

[grid@db1 ~]$ crsctl stat res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS       
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.ARCH.dg
               ONLINE  ONLINE       db1                                          
               ONLINE  ONLINE       db2                                          
ora.CWDATA.dg
               ONLINE  ONLINE       db1                                          
               ONLINE  ONLINE       db2                                          
ora.DADATA.dg
               ONLINE  ONLINE       db1                                          
               ONLINE  ONLINE       db2                                          
ora.DATA.dg
               ONLINE  ONLINE       db1                                          
               ONLINE  ONLINE       db2                                          
ora.LISTENER.lsnr
               ONLINE  ONLINE       db1                                          
               ONLINE  ONLINE       db2                                          
ora.OCR.dg
               ONLINE  ONLINE       db1                                          
               ONLINE  ONLINE       db2                                          
ora.SBKDATA.dg
               ONLINE  ONLINE       db1                                          
               ONLINE  ONLINE       db2                                          
ora.asm
               ONLINE  ONLINE       db1                      Started             
               ONLINE  ONLINE       db2                      Started             
ora.gsd
               OFFLINE OFFLINE      db1                                          
               OFFLINE OFFLINE      db2                                          
ora.net1.network
               ONLINE  ONLINE       db1                                          
               ONLINE  ONLINE       db2                                          
ora.ons
               ONLINE  ONLINE       db1                                          
               ONLINE  ONLINE       db2                                          
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       db2                                          
ora.LISTENER_SCAN2.lsnr
      1        ONLINE  ONLINE       db1                                          
ora.LISTENER_SCAN3.lsnr
      1        ONLINE  ONLINE       db1                                          
ora.caiwu.db
      1        ONLINE  ONLINE       db1                      Open                
      2        ONLINE  ONLINE       db2                      Open                
ora.chdyl.db
      1        ONLINE  ONLINE       db1                      Open                
      2        ONLINE  ONLINE       db2                      Open                
ora.cvu
      1        ONLINE  ONLINE       db1                                          
ora.dadb.db
      1        ONLINE  ONLINE       db1                      Open                
      2        ONLINE  ONLINE       db2                      Open                
ora.db1.vip
      1        ONLINE  ONLINE       db1                                          
ora.db2.vip
      1        ONLINE  ONLINE       db2                                          
ora.oc4j
      1        ONLINE  ONLINE       db1                                          
ora.rlzy.db
      1        ONLINE  ONLINE       db1                      Open                
      2        ONLINE  ONLINE       db2                      Open                
ora.scan1.vip
      1        ONLINE  ONLINE       db2                                          
ora.scan2.vip
      1        ONLINE  ONLINE       db1                                          
ora.scan3.vip
      1        ONLINE  ONLINE       db1

處理完成。

感謝各位的閱讀，以上就是“Oracle Linux 6.7中 Oracle 11.2.0.4 RAC集群CRS異常處理方法是什么”的內(nèi)容了，經(jīng)過(guò)本文的學(xué)習(xí)后，相信大家對(duì)Oracle Linux 6.7中 Oracle 11.2.0.4 RAC集群CRS異常處理方法是什么這一問(wèn)題有了更深刻的體會(huì)，具體使用情況還需要大家實(shí)踐驗(yàn)證。這里是億速云，小編將為大家推送更多相關(guān)知識(shí)點(diǎn)的文章，歡迎關(guān)注！

向AI問(wèn)一下細(xì)節(jié)

Oracle Linux 6.7中 Oracle 11.2.0.4 RAC集群CRS異常處理方法是什么

猜你喜歡

最新資訊

相關(guān)推薦

相關(guān)標(biāo)簽