ORA-00445 / Linux ASLR

Published: 2020-07-07 03:02:54  Source: Internet  Author: snowhill  Category: Relational Databases

Node 1: the clusterware alert log shows that network heartbeats from node 2 were already being missed at 08:51:36.

2017-10-24 08:49:49.963 [CLSECHO(5883)]CRS-10001: 24-Oct-17 08:49 AFD-9204: false
2017-10-24 08:49:59.005 [CLSECHO(7786)]CRS-10001: 24-Oct-17 08:49 AFD-9204: false
2017-10-24 08:51:36.138 [OCSSD(271335)]CRS-1612: Network communication with node 12crac2 (2) missing for 50% of timeout interval.  Removal of this node from cluster in 14.050 seconds
2017-10-24 08:51:43.139 [OCSSD(271335)]CRS-1611: Network communication with node 12crac2 (2) missing for 75% of timeout interval.  Removal of this node from cluster in 7.050 seconds
2017-10-24 08:51:48.140 [OCSSD(271335)]CRS-1610: Network communication with node 12crac2 (2) missing for 90% of timeout interval.  Removal of this node from cluster in 2.050 seconds
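The three ocssd messages above each state how many seconds remain before node 2 is removed, so adding that value to the log timestamp gives the projected eviction time. The Python sketch below does this arithmetic. The helper name `projected_removal` is illustrative, and the 30-second CSS misscount used to estimate the last successful heartbeat is an assumption (it is not stated in the logs).

```python
import re
from datetime import datetime, timedelta

# The three ocssd messages from node 1's clusterware alert log.
LOG_LINES = [
    "2017-10-24 08:51:36.138 [OCSSD(271335)]CRS-1612: Network communication with node 12crac2 (2) missing for 50% of timeout interval.  Removal of this node from cluster in 14.050 seconds",
    "2017-10-24 08:51:43.139 [OCSSD(271335)]CRS-1611: Network communication with node 12crac2 (2) missing for 75% of timeout interval.  Removal of this node from cluster in 7.050 seconds",
    "2017-10-24 08:51:48.140 [OCSSD(271335)]CRS-1610: Network communication with node 12crac2 (2) missing for 90% of timeout interval.  Removal of this node from cluster in 2.050 seconds",
]

# Captures the timestamp and the "Removal ... in N seconds" value.
PATTERN = re.compile(
    r"^(\S+ \S+) \[OCSSD\(\d+\)\]CRS-161[0-2]: .* in (\d+\.\d+) seconds$"
)

# Assumption: CSS misscount is 30 seconds; check the real value on your cluster.
ASSUMED_MISSCOUNT_SECONDS = 30


def projected_removal(line: str) -> datetime:
    """Return log timestamp + remaining seconds for one CRS-161x message."""
    match = PATTERN.match(line)
    if match is None:
        raise ValueError("not a CRS-1610/1611/1612 message")
    logged_at = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
    return logged_at + timedelta(seconds=float(match.group(2)))


for line in LOG_LINES:
    removal = projected_removal(line)
    last_heartbeat = removal - timedelta(seconds=ASSUMED_MISSCOUNT_SECONDS)
    print("projected removal:", removal.time(),
          "| estimated last heartbeat:", last_heartbeat.time())
```

All three messages project the removal at about 08:51:50.19. Under the assumed misscount, the last heartbeat from node 2 arrived around 08:51:20, roughly 16 seconds before the first warning was logged.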

Node 2: at 08:51:07 the status check of resource policejx failed.

2017-10-24 08:49:49.896 [CLSECHO(78374)]CRS-10001: 24-Oct-17 08:49 AFD-9204: false
2017-10-24 08:49:58.902 [CLSECHO(79205)]CRS-10001: 24-Oct-17 08:49 AFD-9204: false
2017-10-24 08:51:07.952 [ORAAGENT(281402)]CRS-5011: Check of resource "policejx" failed: details at "(:CLSN00007:)" in "/u01/app/grid/diag/crs/12crac2/crs/trace/crsd_oraagent_oracle.trc"
2017-10-24 08:51:10.064 [ORAAGENT(83682)]CRS-8500: Oracle Clusterware ORAAGENT process is starting with operating system process ID 83682
2017-10-24 09:10:21.937 [OHASD(202039)]CRS-8500: Oracle Clusterware OHASD process is starting with operating system process ID 202039

crsd_oraagent_oracle.trc: at 08:51:08 the agent detected that its connection to the local instance was lost.

2017-10-24 08:51:07.955381 :   AGENT:3816797952: {0:13:2033} Agfw calling user exitCB, will exit on return
2017-10-24 08:51:07.955390 :   AGENT:3816797952: {0:13:2033} returned from user exitCB, exiting
2017-10-24 08:51:07.955455 :    AGFW:3816797952: {0:13:2033} Agent is exiting with exit code: 1
2017-10-24 08:51:08.006860 : USRTHRD:3812595456: {0:13:2} ORA-01092: ORACLE instance terminated. Disconnection forced
Process ID: 0
Session ID: 4739 Serial number: 3830

ohasd_oraagent_grid.trc: at 08:51:09 the instance was declared down.

2017-10-24 08:50:48.592645 :CLSDYNAM:1666320128: [ora.gipcd]{0:0:2} [check] ClsdmClient::sendMessage clsdmc_respget return: status=0, ecode=0
2017-10-24 08:51:09.396880 : USRTHRD:2702620416:  Usrco UsrcoEventForwarder::postMyEvent posting event "INSTANCE VERSION=1.0 service=policejx database=policejx instance=policejx_1 host=12crac2 status=down reason=FAILURE timestamp=2017-10-24 08:51:07 timezone=+08:00 db_domain= "
2017-10-24 08:51:09.407933 : USRTHRD:2702620416:  clsnUsrco: path=/u01/app/12.1.0/grid/racg/usrco/

crsd_oraagent_oracle.trc also shows that the agent could no longer connect to the instance:

2017-10-24 08:51:07.952430 :CLSDYNAM:3797776128: [ora.policejx.db]{0:13:2} [check] DbAgent:checkCbk shutdown reset s_PDBStatusMap
2017-10-24 08:51:07.952493 :CLSDYNAM:3797776128: [ora.policejx.db]{0:13:2} [check] InstAgent::checkState db/asm 2clsagfw_res_status 5 poolState 2
2017-10-24 08:51:07.952606 : USRTHRD:3797776128: {0:13:2} Gimh::destructor gimh_dest_query_ctx rc=0
2017-10-24 08:51:07.952802 : USRTHRD:3797776128: {0:13:2} Gimh::destructor gimh_dest_inst_ctx rc=0
2017-10-24 08:51:07.952827 :CLSDYNAM:3797776128: [ora.policejx.db]{0:13:2} [check] ConnectionPool::stopConnection
2017-10-24 08:51:07.952850 :CLSDYNAM:3797776128: [ora.policejx.db]{0:13:2} [check] ConnectionPool::removeConnection connection count 0

alert_policejx_1.log: a diagnostic dump requested by M000 appears at 08:41:09, system state dumps follow at 08:43:27, and at 08:43:44 MMON reports ORA-00445.

Auto-tuning: Starting background process GTXi
Tue Oct 24 08:41:09 2017
Dumping diagnostic data in directory=[cdmp_20171024084109], requested by (instance=2, osid=1165285 (M000)), summary=[incident=643254].
Tue Oct 24 08:43:27 2017
Tue Oct 24 08:43:27 2017
System State dumped to trace file /u01/app/oracle/diag/rdbms/policejx/policejx_1/trace/policejx_1_ora_784542.trc
System State dumped to trace file /u01/app/oracle/diag/rdbms/policejx/policejx_1/trace/policejx_1_ora_783891.trc
Tue Oct 24 08:43:27 2017
Warning: VKTM detected a time drift.
Time drifts can result in an unexpected behavior such as time-outs. Please check trace file for more details.
Tue Oct 24 08:43:35 2017
LMON (ospid: 646174) waits for event 'latch: enqueue hash chains' for 248 secs.
LMON (ospid: 646174) waits for latch 'enqueue hash chains' for 248 secs.
Tue Oct 24 08:43:44 2017
Errors in file /u01/app/oracle/diag/rdbms/policejx/policejx_1/trace/policejx_1_mmon_646328.trc  (incident=577523) (PDBNAME=CDB$ROOT):
ORA-00445: background process "m005" did not start after 120 seconds
Incident details in: /u01/app/oracle/diag/rdbms/policejx/policejx_1/incident/incdir_577523/policejx_1_mmon_646328_i577523.trc
Dumping diagnostic data in directory=[cdmp_20171024084803], requested by (instance=1, osid=646328 (MMON)), summary=[incident=577523].
Tue Oct 24 08:48:35 2017
Dumping diagnostic data in directory=[cdmp_20171024084313], requested by (instance=2, osid=1165285 (M000)), summary=[incident=643255].
Tue Oct 24 08:49:47 2017
Errors in file /u01/app/oracle/diag/rdbms/policejx/policejx_1/trace/policejx_1_ora_775205.trc  (incident=586003) (PDBNAME=POLICE):
ORA-00445: background process "PP9S" did not start after 120 seconds
Incident details in: /u01/app/oracle/diag/rdbms/policejx/policejx_1/incident/incdir_586003/policejx_1_ora_775205_i586003.trc
Tue Oct 24 08:49:52 2017
Dumping diagnostic data in directory=[cdmp_20171024084952], requested by (instance=1, osid=775205), summary=[incident=586003].
Tue Oct 24 08:51:06 2017
Dumping diagnostic data in directory=[cdmp_20171024085106], requested by (instance=2, osid=1175981 (M004)), summary=[incident=649702].
Tue Oct 24 08:51:07 2017
DRM FREEZE TIMEOUT: kjfzpdrmfrz: ospid 775215 not frozen.
 Process waiting on 'SQL*Net message from client', 62 secs since wait started.
 Parallel DRM freeze timeout (70 secs) exceeded, terminating the instance.
 See /u01/app/oracle/diag/rdbms/policejx/policejx_1/trace/policejx_1_rmv6_646880.trc.
USER (ospid: 646880): terminating the instance due to error 481
Tue Oct 24 08:51:07 2017
DRM FREEZE TIMEOUT: kjfzpdrmfrz: ospid 770029 not frozen.
 Process waiting on 'gc buffer busy acquire', 76 secs since wait started.
 Parallel DRM freeze timeout (70 secs) exceeded, terminating the instance.
 See /u01/app/oracle/diag/rdbms/policejx/policejx_1/trace/policejx_1_rmv1_646886.trc.
Tue Oct 24 08:51:07 2017
opiodr aborting process unknown ospid (281446) as a result of ORA-1092
Tue Oct 24 08:51:08 2017

policejx_1_mmon_646328.trc:

*** 2017-10-24 08:39:46.952
loadavg : 191.78 118.51 71.78
System user time: 0.17 sys time: 0.49 context switch: 88107
Memory (Avail / Total) = 658.31M / 2066865.29M
Swap (Avail / Total) = 127618.42M /  131072.00M
skgpgcmdout: read() for cmd /bin/ps -elf | /bin/egrep 'PID | 646150' | /bin/grep -v grep timed out after 15.000 seconds
skgpgcmdout: read() for cmd /bin/cat /proc/646150/task/646150/status timed out after 0.000 seconds
Short stack dump: 
current sql: <none>
Current Wait Stack:
 0: waiting for 'os thread creation'
    pname=0x4d303035, is_process=0x1, =0x0
    wait_id=815816 seq_num=29397 snap_id=1
    wait times: snap=1 min 21 sec, exc=1 min 21 sec, total=1 min 21 sec
    wait times: max=infinite, heur=1 min 21 sec
    wait counts: calls=0 os=0
    in_wait=1 iflags=0x5a0
    
    SO: 0xcd66cbfdd0, type: 4, owner: 0xccc59956a8, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
     proc=0xccc59956a8, name=session, file=ksu.h LINE:13957, pg=0 conuid=1
    (session) sid: 9559 ser: 50701 trans: (nil), creator: 0xccc59956a8
              flags: (0x8000051) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
              flags2: (0x40409) -/-/INC
              DID: 0002-0051-0000000C, short-term DID: 0002-0051-0000000D
              txn branch: (nil)
              con_id/con_uid/con_name: 1/1/CDB$ROOT
              con_logonuid: 1 con_logonid: 1
              edition#: 133              user#/name: 0/SYS
              oct: 0, prv: 0, sql: (nil), psql: 0xc1befc1bc0
              stats: 0xcaffe51108, PX stats: 0xcebc230
    ksuxds FALSE at location: 0
    service name: SYS$BACKGROUND
    Current Wait Stack:
      Not in wait; last wait ended 2.865306 sec ago 
    Wait State:
      fixed_waits=0 flags=0x21 boundary=(nil)/-1
    Session Wait History:
        elapsed time of 2.865336 sec since last wait
     0: waited for 'latch: enqueue hash chains'
        address=0xcd689799d0, number=0x24, tries=0x0
        wait_id=1009016 seq_num=27384 snap_id=1
        wait times: snap=17.802151 sec, exc=17.802151 sec, total=17.802151 sec
        wait times: max=infinite
        wait counts: calls=0 os=0
        occurred after 1 min 52 sec of elapsed time
     1: waited for 'oracle thread bootstrap'
        pname=0x305531, =0x0, =0x0
        wait_id=1009010 seq_num=27383 snap_id=5
        wait times: snap=0.000000 sec, exc=1 min 16 sec, total=3 min 8 sec
        wait times: max=2 min 0 sec
        wait counts: calls=92 os=92
        occurred after 0.000000 sec of elapsed time
     2: waited for 'latch free'
        address=0x60013540, number=0x75, tries=0x0
        wait_id=1009015 seq_num=27382 snap_id=1
        wait times: snap=36.770088 sec, exc=36.770088 sec, total=36.770088 sec
        wait times: max=infinite
        wait counts: calls=0 os=0
        occurred after 0.000000 sec of elapsed time
     3: waited for 'oracle thread bootstrap'
        pname=0x305531, =0x0, =0x0
        wait_id=1009010 seq_num=27381 snap_id=4
        wait times: snap=0.000147 sec, exc=1 min 16 sec, total=2 min 31 sec
        wait times: max=2 min 0 sec
        wait counts: calls=92 os=92
        occurred after 0.000000 sec of elapsed time
     4: waited for 'process diagnostic dump'
        =0x0, =0x0, =0x0
        wait_id=1009014 seq_num=27380 snap_id=1
        wait times: snap=31.578425 sec, exc=31.578425 sec, total=31.578425 sec
        wait times: max=30.000000 sec
        wait counts: calls=0 os=0
        occurred after 0.000000 sec of elapsed time
     5: waited for 'oracle thread bootstrap'
        pname=0x305531, =0x0, =0x0
        wait_id=1009010 seq_num=27379 snap_id=3
        wait times: snap=0.110586 sec, exc=1 min 16 sec, total=2 min 0 sec
        wait times: max=2 min 0 sec
        wait counts: calls=92 os=92
        occurred after 0.000000 sec of elapsed time
     6: waited for 'process diagnostic dump'
        =0x0, =0x0, =0x0
        wait_id=1009013 seq_num=27378 snap_id=1
        wait times: snap=22.622172 sec, exc=22.622172 sec, total=22.622172 sec
        wait times: max=30.000000 sec
        wait counts: calls=0 os=0
        occurred after 0.000000 sec of elapsed time
     7: waited for 'oracle thread bootstrap'
        pname=0x305531, =0x0, =0x0
        wait_id=1009010 seq_num=27377 snap_id=2
        wait times: snap=16.169919 sec, exc=1 min 16 sec, total=1 min 37 sec
        wait times: max=2 min 0 sec
        wait counts: calls=91 os=91
        occurred after 0.000000 sec of elapsed time
     8: waited for 'process diagnostic dump'
        =0x0, =0x0, =0x0
        wait_id=1009011 seq_num=27376 snap_id=2
        wait times: snap=0.009738 sec, exc=20.768290 sec, total=20.834894 sec
        wait times: max=30.000000 sec
        wait counts: calls=0 os=0
        occurred after 0.000000 sec of elapsed time
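Three facts can be read off the header of this trace: almost no physical memory was available, swap was already in use, and the `pname` argument of the 'os thread creation' wait encodes the name of the process MMON was trying to spawn. The Python sketch below extracts them from a few lines copied from the trace. The helper names are illustrative, and decoding `pname` as big-endian ASCII is an assumption, although it yields M005, which matches the "m005" in the ORA-00445 message in the alert log.

```python
import re

# Lines copied from policejx_1_mmon_646328.trc.
TRACE_EXCERPT = """\
loadavg : 191.78 118.51 71.78
Memory (Avail / Total) = 658.31M / 2066865.29M
Swap (Avail / Total) = 127618.42M /  131072.00M
    pname=0x4d303035, is_process=0x1, =0x0
"""


def avail_total(label: str, text: str) -> tuple:
    """Return (available, total) for a 'Memory' or 'Swap' line, in the trace's M units."""
    match = re.search(
        rf"{label} \(Avail / Total\) = ([\d.]+)M /\s+([\d.]+)M", text
    )
    return float(match.group(1)), float(match.group(2))


def decode_pname(value: int) -> str:
    """Interpret the integer as ASCII bytes, most significant byte first (assumed)."""
    length = max(1, (value.bit_length() + 7) // 8)
    return value.to_bytes(length, "big").decode("ascii")


mem_avail, mem_total = avail_total("Memory", TRACE_EXCERPT)
swap_avail, swap_total = avail_total("Swap", TRACE_EXCERPT)
pname_hex = re.search(r"pname=(0x[0-9a-f]+)", TRACE_EXCERPT).group(1)

print(f"available memory: {100 * mem_avail / mem_total:.3f}% of total")
print(f"swap in use: {swap_total - swap_avail:.2f}M")
print(f"process being spawned: {decode_pname(int(pname_hex, 16))}")
```

With well under 0.1% of physical memory available and a 1-minute load average above 190, a spawn that takes longer than the 120-second limit is plausible.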

AWR report:

             Snap Id  Snap Time           Sessions  Cursors/Session  Instances  CDB
Begin Snap:  1835     24-Oct-17 08:00:19  3611      19.2             2          YES
End Snap:    1836     24-Oct-17 09:00:56  4026      10.8             1          YES
Elapsed:              60.62 (mins)
DB Time:              12,323.11 (mins)

CPU load was around 80%.

OS load as reported by AWR:

Operating System Statistics - Detail

Snap Time        Load    %busy  %user  %sys  %idle  %iowait
24-Oct 10:00:20  26.19
24-Oct 11:00:02  66.72   12.96  11.71  1.18  87.04  0.23
24-Oct 12:00:17  454.69  88.22  86.37  1.72  11.78  0.06
24-Oct 13:00:29  384.12  97.53  95.75  1.61  2.47   0.01
24-Oct 14:00:14  33.43   74.91  73.03  1.72  25.09  0.09
24-Oct 15:00:01  153.89  33.61  32.07  1.44  66.39  0.15
24-Oct 16:00:19  128.44  62.83  61.10  1.62  37.17  0.11
24-Oct 17:00:20  313.98  84.20  82.17  1.88  15.80  0.09
24-Oct 18:00:14  158.36  80.30  78.88  1.29  19.70  0.10
The machine load is very high.

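When the problem occurred, physical memory on node 2 was exhausted and the system had started to swap, while CPU load also stayed high. Oracle could not spawn new processes, and a DRM operation ran at the same moment, which caused node 2 to restart. Recommended optimizations: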

1 Disable Linux ASLR (address space layout randomization)

ORA-00445: Background Process "xxxx" Did Not Start After 120 Seconds (Doc ID 1345364.1)

add/modify this parameter in /etc/sysctl.conf
kernel.randomize_va_space=0
kernel.exec-shield=0
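After the sysctl change is loaded, the values the running kernel actually uses can be read back from /proc/sys. The following is a minimal Python sketch of such a check. The helper names are illustrative, the interpretation of the values 0/1/2 follows the Linux kernel sysctl documentation rather than the Oracle note, and kernel.exec-shield is only reported when the kernel exposes it, since that is not guaranteed on every system.

```python
from pathlib import Path
from typing import Optional

# Meaning of kernel.randomize_va_space per the Linux kernel sysctl documentation.
ASLR_MODES = {
    "0": "disabled",
    "1": "partial (stack, mmap base, VDSO)",
    "2": "full (also randomizes the heap)",
}


def describe_aslr(raw_value: str) -> str:
    """Translate the raw sysctl value into a readable description."""
    value = raw_value.strip()
    return ASLR_MODES.get(value, f"unknown value {value!r}")


def read_sysctl(name: str, root: Path = Path("/proc/sys")) -> Optional[str]:
    """Read a sysctl such as 'kernel.randomize_va_space'; None if it does not exist."""
    path = root / name.replace(".", "/")
    return path.read_text().strip() if path.is_file() else None


if __name__ == "__main__":
    aslr = read_sysctl("kernel.randomize_va_space")
    if aslr is not None:
        print("kernel.randomize_va_space:", aslr, "->", describe_aslr(aslr))
    exec_shield = read_sysctl("kernel.exec-shield")
    print("kernel.exec-shield:",
          exec_shield if exec_shield is not None else "not present on this kernel")
```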

2 Disable transparent hugepages (Doc ID 1557478.1)
On RHEL 6.x, add the following to /etc/rc.local:

if test -f /sys/kernel/mm/transparent_hugepage/enabled; then
   echo never > /sys/kernel/mm/transparent_hugepage/enabled
fi
if test -f /sys/kernel/mm/transparent_hugepage/defrag; then
   echo never > /sys/kernel/mm/transparent_hugepage/defrag
fi

Check which processes still use anonymous huge pages:

grep -e AnonHugePages /proc/*/smaps | awk '{ if($2>4) print $0} ' | awk -F "/" '{print $0; system("ps -fp " $3)} '
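In addition to the per-process check, the setting itself can be read back from sysfs, where the kernel marks the active value with square brackets (for example `always madvise [never]`). The sketch below, in Python, parses that format. The function names are illustrative, and the directory is the one used in the rc.local snippet; some distributions may expose it under a different path.

```python
from pathlib import Path

# Directory used in the rc.local snippet; adjust if your kernel uses another path.
THP_DIR = Path("/sys/kernel/mm/transparent_hugepage")


def active_thp_setting(contents: str) -> str:
    """Return the bracketed (active) value, e.g. 'never' from 'always madvise [never]'."""
    for token in contents.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active value found in {contents!r}")


def thp_report(thp_dir: Path = THP_DIR) -> dict:
    """Map 'enabled' and 'defrag' to their active values, skipping missing files."""
    report = {}
    for name in ("enabled", "defrag"):
        path = thp_dir / name
        if path.is_file():
            report[name] = active_thp_setting(path.read_text())
    return report


if __name__ == "__main__":
    for name, value in thp_report().items():
        status = "OK" if value == "never" else "still active"
        print(f"{name}: {value} ({status})")
```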

3 Enable HugePages. On Linux 6.x the default memory page size is 4 KB; enabling 2 MB huge pages is recommended (Doc ID 361468.1).

Related parameters:

vi /etc/security/limits.conf

Set the memlock value (in KB) slightly smaller than installed RAM. For example, with 64 GB of RAM installed you may set:
*   soft   memlock    60397977
*   hard   memlock    60397977

Verify: ulimit -l

Enable: vm.nr_hugepages = SGA size / huge page size

Edit /etc/sysctl.conf and add vm.nr_hugepages=XXXXXX
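The two sizing rules above are simple arithmetic, sketched here in Python. The function names and the 100 GB SGA are hypothetical, used only to show the calculation. The 90% factor is inferred from the memlock example: 64 GB is 67108864 KB, and 90% of that is the 60397977 shown in the limits.conf lines.

```python
def nr_hugepages(sga_bytes: int, hugepage_kb: int = 2048) -> int:
    """Number of huge pages needed to hold the SGA, rounded up."""
    page_bytes = hugepage_kb * 1024
    return -(-sga_bytes // page_bytes)  # integer ceiling division


def memlock_kb(ram_gb: int, percent: int = 90) -> int:
    """memlock limit in KB as a percentage of installed RAM (90% is inferred)."""
    return ram_gb * 1024 * 1024 * percent // 100


# Hypothetical SGA size; substitute the real SGA of your instance(s).
example_sga_bytes = 100 * 1024**3

print("vm.nr_hugepages =", nr_hugepages(example_sga_bytes))  # 51200
print("memlock (KB)    =", memlock_kb(64))                   # 60397977
```

This only covers the arithmetic. On a host with several instances, the SGA sizes have to be summed, and the memlock limit must be at least as large as the memory reserved for huge pages.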

Reference calculation script: hugepages_settings.sh

Reference: http://www.oracle.com/technetwork/cn/articles/servers-storage-dev/hugepages-2099009-zhs.html

4 Disable DRM

_gc_policy_time=0

_gc_undo_policy=FALSE

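Both parameters are static; the instance must be restarted for them to take effect.

Alternatively, set the DRM thresholds to very large values so that remastering is rarely triggered: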

_gc_policy_limit=250

_gc_policy_minimum=10485760

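DRM steps:

1. Oracle quiesces all operations on the buffers that need remastering. Note: DRM is incremental. It works in windows, remastering one portion of the buffers at a time.
2. LMON notifies all instances to prepare for remastering.
3. The master information for the affected buffers is cleared on the old master instance.
4. The master information is transferred to the new master instance.
5. The current state of the resources is rebuilt on the new master instance.
6. The operation ends and releases all resources held by the preceding steps.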

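_gc_affinity_time: in minutes. Controls the interval over which DRM counts how often each instance accesses a buffer. The default is 10 minutes.

_gc_affinity_ratio: the minimum ratio (threshold) required before remastering takes place. The default is 50. In other words, if within the 10-minute window (_gc_policy_time) one instance accesses a database object more than 50 times as often as all other instances (note: 50 times as often, not 50 accesses), the buffers of that object are remastered.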
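The ratio rule can be made concrete with a small Python sketch. This illustrates the rule only as it is described in the text, not Oracle's internal algorithm. The function name is hypothetical, and comparing one instance against the sum of all other instances is an assumption about what "all other instances" means.

```python
from typing import Dict, Optional


def remaster_candidate(accesses: Dict[int, int], ratio: int = 50) -> Optional[int]:
    """Return the instance that dominates access to an object, or None.

    accesses maps instance number -> access count for one object during
    one evaluation window (10 minutes by default, per the text).
    """
    total = sum(accesses.values())
    for instance, count in accesses.items():
        others = total - count  # assumption: compare against the sum of the others
        if count > ratio * others:
            return instance
    return None


# Instance 1 accesses the object 51 times as often as instance 2: candidate.
print(remaster_candidate({1: 5100, 2: 100}))
# 60 accesses versus 50 is far from a 50x ratio: no remastering.
print(remaster_candidate({1: 60, 2: 50}))
```

The second call shows the point made in the note: the threshold is a multiple of the other instances' activity, not an absolute count of 50 accesses.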

Note: the parameter names differ between 10g and 11g. In 11g, _gc_affinity_limit was renamed to _gc_policy_limit, _gc_affinity_time to _gc_policy_time, and _gc_affinity_minimum to _gc_policy_minimum.

Diagnosing DRM:

Wait event "gcs drm freeze in enter server mode":

Script to Collect DRM Information (drmdiag.sql) (Doc ID 1492990.1)

-- NAME: DRMDIAG.SQL
-- ------------------------------------------------------------------------
-- AUTHOR: Michael Polaski - Oracle Support Services
-- ------------------------------------------------------------------------
-- PURPOSE:
-- This script is intended to provide a user friendly guide to troubleshoot
-- drm (dynamic resource remastering) waits. The script will create a file
-- called drmdiag_<timestamp>.out in your local directory.
set echo off
set feedback off
column timecol new_value timestamp
column spool_extension new_value suffix
select to_char(sysdate,'Mondd_hh24mi') timecol,
'.out' spool_extension from sys.dual;
column output new_value dbname
select value || '_' output
from v$parameter where name = 'db_name';
spool drmdiag_&&dbname&&timestamp&&suffix
set trim on
set trims on
set lines 140
set pages 100
set verify off
set feedback on
PROMPT DRMDIAG DATA FOR &&dbname&&timestamp
PROMPT Important parameters:
PROMPT
PROMPT _gc_policy_minimum (default is 1500). Increasing this would cause DRMs to happen less frequently.
PROMPT Use the "OBJECT_POLICY_STATISTICS" section later in this report to see how active various objects are.
PROMPT
PROMPT _gc_policy_time (default to 10 (minutes)). Amount of time to evaluate policy stats. Use the
PROMPT "OBJECT_POLICY_STATISTICS" section later in this report to see how active various objects are for the
PROMPT _gc_policy_time. Usually not necessary to change this parameter.
PROMPT
PROMPT _gc_read_mostly_locking (default is TRUE). Setting this to FALSE would disable read mostly related DRMs.
PROMPT
PROMPT gcs_server_processes (default is derived from CPU count/4). May need to increase this above the
PROMPT default to add LMS processes to complete the work during a DRM but the default is usually adequate.
PROMPT
PROMPT _gc_element_percent (default is 110). May need to apply the fix for bug 14791477 and increase this to
PROMPT 140 if running out of lock elements. Usually not necessary to change this parameter.
PROMPT
PROMPT GC Related parameters set in this instance:
show parameter gc
PROMPT
PROMPT CPU count on this instance:
show parameter cpu_count
PROMPT
PROMPT SGA INFO FOR &&dbname&&timestamp
PROMPT
PROMPT Larger buffer caches (above 100 gig) may increase the cost of DRMs significantly.
set lines 120
set pages 100
column component format a40 tru
column current_size format 99999999999999999
column min_size format 99999999999999999
column max_size format 99999999999999999
column user_specified_size format 99999999999999999
select component, current_size, min_size, max_size, user_specified_size
from v$sga_dynamic_components
where current_size > 0;
PROMPT
PROMPT ASH THRESHOLD...
PROMPT
PROMPT This will be the threshold in milliseconds for total drm freeze
PROMPT times. This will be used for the next queries to look for the worst
PROMPT 'drm freeze' minutes. Any minutes that have a total drm freeze
PROMPT time greater than the threshold will be analyzed further.
column threshold_in_ms new_value threshold format 999999999.999
select decode(min(threshold_in_ms),null,0,min(threshold_in_ms)) threshold_in_ms
from (select inst_id, to_char(sample_time,'Mondd_hh24mi') minute,
sum(time_waited)/1000 threshold_in_ms
from gv$active_session_history
where event like '%drm freeze%'
group by inst_id,to_char(sample_time,'Mondd_hh24mi')
order by 3 desc)
where rownum <= 10;
PROMPT
PROMPT ASH WORST MINUTES FOR DRM FREEZE WAITS:
PROMPT
PROMPT APPROACH: These are the minutes where the avg drm freeze time
PROMPT was the highest (in milliseconds).
column event format a30 tru
column program format a35 tru
column total_wait_time format 999999999999.999
column avg_time_waited format 999999999999.999
select to_char(sample_time,'Mondd_hh24mi') minute, inst_id, event,
sum(time_waited)/1000 TOTAL_WAIT_TIME , count(*) WAITS,
avg(time_waited)/1000 AVG_TIME_WAITED
from gv$active_session_history
where event like '%drm freeze%'
group by to_char(sample_time,'Mondd_hh24mi'), inst_id, event
having sum(time_waited)/1000 > &&threshold
order by 1,2;
PROMPT
PROMPT ASH DRM BACKGROUND PROCESS WAITS DURING WORST MINUTES:
PROMPT
PROMPT APPROACH: What are LMS and RMV doing when 'drm freeze' waits
PROMPT are happening? LMD and LMON info may also be relevant
column inst format 999
column minute format a12 tru
column event format a50 tru
column program format a55 wra
select to_char(sample_time,'Mondd_hh24mi') minute, inst_id inst,
sum(time_waited)/1000 TOTAL_WAIT_TIME , count(*) WAITS,
avg(time_waited)/1000 AVG_TIME_WAITED,
program, event
from gv$active_session_history
where to_char(sample_time,'Mondd_hh24mi') in (select to_char(sample_time,'Mondd_hh24mi')
from gv$active_session_history
where event like '%drm freeze%'
group by to_char(sample_time,'Mondd_hh24mi'), inst_id
having sum(time_waited)/1000 > &&threshold and sum(time_waited)/1000 > 0.5)
and (program like '%LMS%' or program like '%RMV%' or program like '%LMD%' or
program like '%LMON%' or event like '%drm freeze%')
group by to_char(sample_time,'Mondd_hh24mi'), inst_id, program, event
order by 1,2,3,5 desc, 4;
PROMPT
PROMPT POLICY HISTORY INFO:
PROMPT See if you can correlate policy history events with minutes of high
PROMPT wait time.
select * from gv$policy_history
order by event_date;
PROMPT
PROMPT DYNAMIC_REMASTER_STATS
PROMPT This shows where time is spent during DRM operations.
set heading off
set lines 60
select 'Instance: '||inst_id inst, 'Remaster Ops: '||remaster_ops rops,
'Remaster Time: '||remaster_time rtime, 'Remastered Objects: '||remastered_objects robjs,
'Quiesce Time: '||quiesce_time qtime, 'Freeze Time: '||freeze_time ftime,
'Cleanup Time: '||cleanup_time ctime, 'Replay Time: '||replay_time rptime,
'Fixwrite Time: '||fixwrite_time fwtime, 'Sync Time: '||sync_time stime,
'Resources Cleaned: '||resources_cleaned rclean,
'Replayed Locks Sent: '||replayed_locks_sent rlockss,
'Replayed Locks Received: '||replayed_locks_received rlocksr,
'Current Objects: '||current_objects
from gv$dynamic_remaster_stats
order by 1;
set lines 120
set heading on
PROMPT
PROMPT OBJECT_POLICY_STATISTICS:
PROMPT The sum of the last 3 columns (sopens,xopens,xfers) decides whether the object
PROMPT will be considered for DRM (_gc_policy_minimum). The duration of the stats
PROMPT are controlled by _gc_policy_time (default is 10 minutes).
select object,node,sopens,xopens,xfers from x$object_policy_statistics;
PROMPT
PROMPT ACTIVE OBJECTS (OBJECT_POLICY_STATISTICS)
PROMPT These are the objects that are above the default _gc_policy_minimum (1500).
select object, node, sopens+xopens+xfers activity
from x$object_policy_statistics
where sopens+xopens+xfers > 1500
order by 3 desc;
PROMPT LWM FOR LE FREELIST
PROMPT This number should never get near zero, if it does consider the fix for bug 14791477
PROMPT and/or increasing _gc_element_percent.
select sum(lwm) from x$kclfx;
PROMPT
PROMPT GCSPFMASTER INFO WITH OBJECT NAMES
column objname format a120 tru
select o.name || ' - '|| o.subname objname, o.type#, h.*
from v$gcspfmaster_info h, obj$ o where h.data_object_id=o.dataobj#
order by data_object_id;
PROMPT
PROMPT ASH DETAILS FOR WORST MINUTES:
PROMPT
PROMPT APPROACH: If you cannot determine the problem from the data
PROMPT above, you may need to look at the details of what each session
PROMPT is doing during each 'bad' snap. Most likely you will want to
PROMPT note the times of the high drm freeze waits, look at what
PROMPT LMS, RMV, LMD0, LMON is doing at those times, and go from there...
set lines 140
column program format a45 wra
column sample_time format a25 tru
column event format a30 tru
column time_waited format 999999.999
column p1 format a40 tru
column p2 format a40 tru
column p3 format a40 tru
select sample_time, inst_id inst, session_id, program, event, time_waited/1000 TIME_WAITED,
p1text||': '||p1 p1,p2text||': '||p2 p2,p3text||': '||p3 p3
from gv$active_session_history
where to_char(sample_time,'Mondd_hh24mi') in (select
to_char(sample_time,'Mondd_hh24mi')
from gv$active_session_history
where event like '%drm freeze%'
group by to_char(sample_time,'Mondd_hh24mi'), inst_id
having sum(time_waited)/1000 > &&threshold)
and time_waited > 0.5
order by 1,2,3,4,5;
spool off
PROMPT
PROMPT OUTPUT FILE IS: drmdiag_&&dbname&&timestamp&&suffix
PROMPT

Related document: Doc ID 390483.1

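5 Going forward, deploy fine-grained monitoring tools such as OSWatcher or nmon. When the problem occurred, the number of OS processes had already exceeded 9,900, so both the number of DB connections and the number of OS processes should be monitored.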
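As a stopgap until a full monitoring tool is in place, the OS process count can be sampled with a few lines of Python. This is a minimal sketch: the function names are illustrative, and the threshold of 8000 is a hypothetical value chosen to fire before the roughly 9,900 processes seen during the incident. It counts numeric entries under /proc, which correspond to processes, not individual threads.

```python
import os

# Hypothetical alert threshold, below the ~9900 processes seen during the incident.
PROCESS_ALERT_THRESHOLD = 8000


def count_processes(proc_root: str = "/proc") -> int:
    """Count numeric entries under proc_root; on Linux each one is a process ID."""
    return sum(1 for name in os.listdir(proc_root) if name.isdigit())


def check_process_count(count: int, threshold: int = PROCESS_ALERT_THRESHOLD) -> str:
    """Return an alert string when the count reaches the threshold."""
    if count >= threshold:
        return f"ALERT: {count} processes (threshold {threshold})"
    return f"ok: {count} processes"


if __name__ == "__main__":
    print(check_process_count(count_processes()))
```

Run from cron every minute or so, the output gives a coarse history of process growth that can be lined up against the alert log timestamps.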
