溫馨提示×

溫馨提示×

您好,登錄后才能下訂單哦!

密碼登錄×
登錄注冊×
其他方式登錄
點擊 登錄注冊 即表示同意《億速云用戶服務條款》

數據庫中如何利用systemstate dump trace文件尋找dg terminated原因

發(fā)布時間:2021-11-09 14:32:59 來源:億速云 閱讀:140 作者:小新 欄目:關系型數據庫

這篇文章主要介紹了數據庫中如何利用systemstate dump trace文件尋找dg terminated原因,具有一定借鑒價值,感興趣的朋友可以參考下,希望大家閱讀完這篇文章之后大有收獲,下面讓小編帶著大家一起了解一下。

收到告警,一臺dataguard宕機了。軟件版本為11.2.0.4.登錄上來首先查看alert文件.

Fri Jan 25   11:00:39 2019

System   State dumped to trace file /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_ora_42298.trc

Fri Jan 25   11:00:54 2019

Errors in file   /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_ora_42298.trc:

Fri Jan 25   11:00:58 2019

Errors in file   /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:

ORA-03170:   deadlocked on readable physical standby (undo segment 65535)

Fri Jan 25   11:01:19 2019

Errors in file   /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:

ORA-03170: deadlocked   on readable physical standby (undo segment 65535)

Fri Jan 25   11:01:39 2019

Errors in file   /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:

ORA-03170:   deadlocked on readable physical standby (undo segment 65535)

Fri Jan 25   11:01:59 2019

Errors in file   /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:

ORA-03170:   deadlocked on readable physical standby (undo segment 65535)

Fri Jan 25   11:02:19 2019

Errors in file /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:

ORA-03170:   deadlocked on readable physical standby (undo segment 65535)

Fri Jan 25   11:02:40 2019

Errors in file   /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:

ORA-03170:   deadlocked on readable physical standby (undo segment 65535)

Fri Jan 25   11:03:00 2019

Errors in file   /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:

ORA-03170:   deadlocked on readable physical standby (undo segment 65535)

Fri Jan 25   11:03:21 2019

Errors in file   /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:

ORA-03170:   deadlocked on readable physical standby (undo segment 65535)

Fri Jan 25   11:03:41 2019

Errors in file   /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:

ORA-03170:   deadlocked on readable physical standby (undo segment 65535)

Fri Jan 25   11:04:01 2019

Errors in file   /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:

ORA-03170:   deadlocked on readable physical standby (undo segment 65535)

Fri Jan 25   11:04:21 2019

Errors in file   /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:

ORA-03170:   deadlocked on readable physical standby (undo segment 65535)

Fri Jan 25   11:04:41 2019

Errors in file   /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:

ORA-03170:   deadlocked on readable physical standby (undo segment 65535)

Fri Jan 25   11:05:01 2019

Errors in file   /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:

ORA-03170:   deadlocked on readable physical standby (undo segment 65535)

Fri Jan 25   11:05:21 2019

Errors in file   /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:

ORA-03170:   deadlocked on readable physical standby (undo segment 65535)

Fri Jan 25   11:05:27 2019

RFS[3]: Selected   log 9 for thread 1 sequence 85314 dbid -1036715693 branch 899829907

Fri Jan 25   11:05:27 2019

Media Recovery   Waiting for thread 1 sequence 85314 (in transit)

Recovery of   Online Redo Log: Thread 1 Group 9 Seq 85314 Reading mem 0

  Mem# 0:   /san/data/oradata/EIN1/standby_redo03.log

Fri Jan 25   11:05:27 2019

Archived Log   entry 165577 added for thread 1 sequence 85313 ID 0xc2352453 dest 1:

Fri Jan 25 11:05:41   2019

Errors in file   /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:

ORA-03170:   deadlocked on readable physical standby (undo segment 65535)

Fri Jan 25   11:06:01 2019

Errors in file   /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:

ORA-03170:   deadlocked on readable physical standby (undo segment 65535)

Fri Jan 25   11:06:21 2019

Errors in file   /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:

ORA-03170:   deadlocked on readable physical standby (undo segment 65535)

Fri Jan 25   11:06:41 2019

Errors in file   /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_pmon_56104.trc:

ORA-03170:   deadlocked on readable physical standby (undo segment 65535)

Fri Jan 25   11:06:45 2019

Errors in file   /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_lgwr_56177.trc:

ORA-04020:   deadlock detected while trying to lock object SYS.ERAC

LGWR (ospid:   56177): terminating the instance due to error 4020

Fri Jan 25   11:06:46 2019

System state dump   requested by (instance=1, osid=56177 (LGWR)), summary=[abnormal instance   termination].

System State   dumped to trace file /u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_diag_56159_20190125110646.trc

 

可以看到持續(xù)大量的ORA-03170死鎖告警,11:06:46分的時候實例被終止。在貼出的alert日志開頭生成了一個trace文件,首先分析一下這個文件。

Vi 編輯/u01/app/oracle/diag/rdbms/eracpdg01/ERAC/trace/ERAC_ora_42298.trc文件,搜索關鍵字final,定位到下面這段:

      SO:   0xa1cccae30, type: 4, owner: 0xa0c7c8da8, flag: INIT/-/-/0x00 if: 0x3 c: 0x3

     proc=0xa0c7c8da8, name=session,   file=ksu.h LINE:12729, pg=0

    (session) sid: 2228 ser: 9 trans: (nil),   creator: 0xa0c7c8da8

              flags: (0x45) USR/- flags_idl:   (0x1) BSY/-/-/-/-/-

              flags2: (0x40009) -/-/INC

              DID: , short-term DID:

              txn branch: (nil)

              edition#: 100              oct: 3, prv: 0, sql:   0xa33cf5e08, psql: 0xa3e7cdec0, user: 627/CONVERT_R

    ksuxds FALSE at location: 0

    service name: SYS$USERS

    client details:

      O/S info: user: EM-H6VQXD2$, term:   unknown, ospid: 1234

      machine: EM-H6VQXD2 program: JDBC Thin   Client

      application name: JDBC Thin Client,   hash value=2546894660

    Current Wait Stack:

     0: waiting for 'library cache lock'

        handle address=0xa3ff9eb18, lock   address=0x50de690a0, 100*mode+namespace=0x1004a0002

        wait_id=945878 seq_num=28394   snap_id=1

        wait times: snap=2 min 53 sec, exc=2   min 53 sec, total=2 min 53 sec

        wait times: max=15 min 0 sec, heur=2   min 53 sec

        wait counts: calls=60 os=60

        in_wait=1 iflags=0x15a2

There is at least one session blocking this session.

      Dumping 1   direct blocker(s):

        inst:   1, sid: 970, ser: 1

      Dumping   final blocker:

        inst:   1, sid: 2059, ser: 26985

    Wait State:

        fixed_waits=0 flags=0x22 boundary=(nil)/-1

    Session   Wait History:

        elapsed   time of 0.000134 sec since current wait

     0: waited   for 'SQL*Net message from client'

        driver id=0x54435000, #bytes=0x1, =0x0

          wait_id=945877 seq_num=28393 snap_id=1

        wait   times: snap=3.470941 sec, exc=3.470941 sec, total=3.470941 sec

        wait   times: max=infinite

        wait   counts: calls=0 os=0

          occurred after 0.000001 sec of elapsed time

     1: waited   for 'SQL*Net message to client'

        driver   id=0x54435000, #bytes=0x1, =0x0

          wait_id=945876 seq_num=28392 snap_id=1

        wait   times: snap=0.000001 sec, exc=0.000001 sec, total=0.000001 sec

        wait   times: max=infinite

        wait   counts: calls=0 os=0

          occurred after 0.000048 sec of elapsed time

可以看到最源頭的blocker是sid: 2059, ser: 26985這個session。

這里的SO是0xa1cccae30, type: 4說明是會話信息, owner: 0xa0c7c8da8是這個SO(0xa1cccae30)的父節(jié)點。

搜索這個位置所在的PROCESS進程,發(fā)現正是LGWR進程

PROCESS 17: LGWR

然后直接搜索sid: 2059 ser: 26985

      SO: 0xa18befcd0, type:   4, owner: 0xa18786ed8, flag: INIT/-/-/0x00 if: 0x3 c: 0x3

     proc=0xa18786ed8, name=session,   file=ksu.h LINE:12729, pg=0

    (session) sid: 2059 ser: 26985 trans:   0xa1b361290, creator: 0xa18786ed8

              flags: (0x41) USR/- flags_idl:   (0x1) BSY/-/-/-/-/-

              flags2: (0x40009) -/-/INC

              DID: , short-term DID:

              txn branch: 0xa0f20f998

              edition#: 100              oct: 3, prv: 0, sql:   0xa345bb8d0, psql: 0xa2fe3c0a8, user: 510/READER

    ksuxds FALSE at location: 0

    service name: SYS$USERS

    client details:

      O/S info: user: Administrator, term:   ADMIN-PC, ospid: 8960:8832

      machine: WORKGROUP\ADMIN-PC program:   plsqldev.exe

      application name: PL/SQL Developer,   hash value=1190136663

      action name: SQL 窗口 - ISIN香港凈值.sql, hash value=506963986

    Current Wait Stack:

     0: waiting for 'SQL*Net message from dblink'

        driver id=0x28444553, #bytes=0x1,   =0x0

        wait_id=474 seq_num=475 snap_id=1

        wait times: snap=4 min 45 sec, exc=4   min 45 sec, total=4 min 45 sec

        wait times: max=infinite, heur=4 min   45 sec

        wait counts: calls=0 os=0

        in_wait=1 iflags=0x5a0

    There are 28 sessions blocked by this   session.

可以看到是reader用戶從一臺Windows的PL/SQL developer發(fā)起的一段sql文本導致的。而且這段sql使用了db_link。已經持續(xù)了4分45秒,接近5分鐘的時間。該會話阻塞了28個會話。

目前為止完整的阻塞情況可能并沒有全部得到,但是已經得到了LGWR進程被一個PL/SQL developer會話所阻塞的結論,在sql文本執(zhí)行了接近五分鐘的時候,實例被LGWR終止。

這里為什么是五分鐘呢?這是因為之前發(fā)生過的其他故障,那時修改了一個隱含參數_adg_parselock_timeout

SQL> col name   for a30

SQL> col value   for a20

SQL> col   description for a70

SQL> set line   150

SQL> select   a.ksppinm name,b.ksppstvl value,a.ksppdesc description

  2        from x$ksppi a,x$ksppcv b

  3       where a.inst_id = USERENV ('Instance')

  4         and b.inst_id = USERENV ('Instance')

  5         and a.indx = b.indx

  6         and upper(a.ksppinm) LIKE upper('%adg_parselock_timeout%')

  7         order by name;

 

NAME                           VALUE                DESCRIPTION

------------------------------   --------------------   ----------------------------------------------------------------------

_adg_parselock_timeout         30000                timeout for parselock get on   ADG in centiseconds

_adg_parselock_timeout_sleep   100                  sleep duration after a   parselock timeout on ADG in milliseconds

 

SQL> col name   for a30;

SQL> col value   for a10;

SQL> select *   from

  2    (select

  3    x.ksppinm name,

  4    y.ksppstvl value,

  5    y.ksppstdf isdefault,

  6    decode(bitand(y.ksppstvf,7),1,'MODIFIED',4,'SYSTEM_MOD','FALSE')   ismod,

  7    decode(bitand(y.ksppstvf,2),2,'TRUE','FALSE') isadj

  8    from

  9    sys.x$ksppi x,

 10    sys.x$ksppcv y

 11    where

 12    x.inst_id = userenv('Instance') and

 13    y.inst_id = userenv('Instance') and

 14    x.indx = y.indx

 15    order by

 16    translate(x.ksppinm, ' _', ' ')) T where T.name like   '%adg_parselock_timeout%';

 

NAME                           VALUE      ISDEFAULT                   ISMOD                          ISADJ

------------------------------   ---------- --------------------------- ------------------------------   ---------------

_adg_parselock_timeout         30000      FALSE                       FALSE                          FALSE

_adg_parselock_timeout_sleep   100          TRUE                        FALSE                          FALSE

_adg_parselock_timeout該值設置為了30000厘秒,約等于300秒,也就是五分鐘。

搜索MOS后確認為BUG

Bug 18515268 ORA-4020 in ADG Standby Database causing   instance crash by LGWR

Solution

Apply interim patch 18515268, if available for your platform and   Oracle version.

Fixed releases: 11.2.0.4.BP20 12.1.0.2 12.2.0.1

感謝你能夠認真閱讀完這篇文章,希望小編分享的“數據庫中如何利用systemstate dump trace文件尋找dg terminated原因”這篇文章對大家有幫助,同時也希望大家多多支持億速云,關注億速云行業(yè)資訊頻道,更多相關知識等著你來學習!

向AI問一下細節(jié)

免責聲明:本站發(fā)布的內容(圖片、視頻和文字)以原創(chuàng)、轉載和分享為主,文章觀點不代表本網站立場,如果涉及侵權請聯系站長郵箱:is@yisu.com進行舉報,并提供相關證據,一經查實,將立刻刪除涉嫌侵權內容。

AI