溫馨提示×

溫馨提示×

您好,登錄后才能下訂單哦!

密碼登錄×
登錄注冊×
其他方式登錄
點擊 登錄注冊 即表示同意《億速云用戶服務(wù)條款》

Oracle 11.2.0.3數(shù)據(jù)庫CJQ進程造成row cache lock等待事件影響job無法停止怎么辦

發(fā)布時間:2021-11-11 14:17:51 來源:億速云 閱讀:183 作者:小新 欄目:關(guān)系型數(shù)據(jù)庫

小編給大家分享一下Oracle 11.2.0.3數(shù)據(jù)庫CJQ進程造成row cache lock等待事件影響job無法停止怎么辦,希望大家閱讀完這篇文章之后都有所收獲,下面讓我們一起去探討吧!

一, 現(xiàn)象描述

某生產(chǎn)系統(tǒng)應(yīng)用人員反映自己發(fā)起的job異常,沒有正常運行,導(dǎo)致相關(guān)表無數(shù)據(jù),自己嘗試停止job但未成功,也無法重新發(fā)起job。

二, 問題分析

job的登錄模式一般為’DBMS_SCHEDULER’,通過PL/SQL工具登錄運行如下語句進行查看

select S.STATUS,

       S.INST_ID,

       S.SID,

       S.PADDR,

       S.SQL_ID,

       S.SQL_EXEC_START,

       S.ACTION,

       S.LAST_CALL_ET,

       S.BLOCKING_INSTANCE,

       S.BLOCKING_SESSION,

       S.EVENT,

       S.WAIT_CLASS

  from v$session S

 where module = 'DBMS_SCHEDULER';
Oracle 11.2.0.3數(shù)據(jù)庫CJQ進程造成row cache lock等待事件影響job無法停止怎么辦

查詢到一個被kill的job,狀態(tài)為killed,并未釋放,通過下面的語句查詢到spid后,通過登錄后臺kill -9 的方式殺掉了系統(tǒng)進程。
Select p.spid from v$session s,v$processes p where p.addr=s.paddr and s.sid=508
之后讓應(yīng)用再次發(fā)起job,但job依然無法執(zhí)行,通過后臺發(fā)現(xiàn)該job發(fā)生了等待事件row cache lock,等待會話為1209,P1值為11,通過下面的語句查看無返回結(jié)果,不能定位row cache lock發(fā)生在哪里

Select * from v$rowcache where cache#=11;

Oracle 11.2.0.3數(shù)據(jù)庫CJQ進程造成row cache lock等待事件影響job無法停止怎么辦

查詢1209會話發(fā)現(xiàn)為系統(tǒng)后臺進程CJQ0, CJQn和Jnnn均為為oracle作業(yè)隊列進程,后臺進程為什么會鎖正常的job呢?

Select * from v$session where sid=1209;

Oracle 11.2.0.3數(shù)據(jù)庫CJQ進程造成row cache lock等待事件影響job無法停止怎么辦

應(yīng)用人員通過PL/SQL提供的JOB任務(wù)發(fā)現(xiàn)存在如下異?,F(xiàn)象,

Oracle 11.2.0.3數(shù)據(jù)庫CJQ進程造成row cache lock等待事件影響job無法停止怎么辦

(上面的截圖缺少了任務(wù)CSH_BRANCH_D_1DAY,因為截取的是已經(jīng)處理后的圖像,其實在出現(xiàn)問的時候是有的,只不過忘了截取了)

當(dāng)前運行很多job,但沒有都沒有session_id,感覺像是僵尸job,既然無法通過會話殺掉job,
我們嘗試通過dbms_scheduler包進行停止,但無法停止,而且依然出現(xiàn)了row cache lock事件,依然被1209后臺進程鎖住。

Oracle 11.2.0.3數(shù)據(jù)庫CJQ進程造成row cache lock等待事件影響job無法停止怎么辦

Oracle 11.2.0.3數(shù)據(jù)庫CJQ進程造成row cache lock等待事件影響job無法停止怎么辦

Row cache lock無法通過P1定位到具體的鎖對象,后臺進程為什么會鎖住job呢,通過上面的結(jié)果初步分析是因為之前有很多僵死的job沒有被正常停止,導(dǎo)致再次發(fā)起同樣的job任務(wù)或?qū)υ搄ob進行處理就會出現(xiàn)鎖的情況,而且是被后臺進程CJQn鎖住。

SQL>Show parameter processes

通過查看參數(shù)job_queue_processes值為1000,aq_tm_processes為10000,并未達到峰值

進一步分析

登錄后臺通過DEBUG命令收集相關(guān)信息分析

用sysdba登錄到數(shù)據(jù)庫上:
$sqlplus / as sysdba
或者
$sqlplus -prelim / as sysdba <==當(dāng)數(shù)據(jù)庫已經(jīng)很慢或者hang到無法連接

以下為操作截屏,采用的hanganalyze 3,systemstate dump 258級別。

Oracle 11.2.0.3數(shù)據(jù)庫CJQ進程造成row cache lock等待事件影響job無法停止怎么辦

相關(guān)說明:

Level 的含義:

1-2:只有hanganalyze輸出,不dump任何進程
3:Level2+Dump出在IN_HANG狀態(tài)的進程
4:Level3+Dump出在等待鏈里面的blockers(狀態(tài)為LEAF/LEAF_NW/IGN_DMP)
5:Level4+Dump出所有在等待鏈中的進程(狀態(tài)為NLEAF)
Oracle官方建議不要超過level 3,一般level 3也能夠解決問題,超過level 3會給系統(tǒng)帶來額外負擔(dān)。

systemstate dump有多個級別:
2:  dump (不包括lock element)

10:  dump
11:  dump + global cache of RAC
256: short stack (函數(shù)堆棧)
258: 256+2   -->short stack +dump(不包括lock element)
266: 256+10 -->short stack+ dump
267: 256+11 -->short stack+ dump + global cache of RAC

level 11和 267會 dump global cache, 會生成較大的trace 文件,一般情況下不推薦。
    一般情況下,如果進程不是太多,推薦用266,因為這樣可以dump出來進程的函數(shù)堆棧,可以用來分析進程在執(zhí)行什么操作。
但是生成short stack比較耗時,如果進程非常多,比如2000個進程,那么可能耗時30分鐘以上。這種情況下,可以生成level 10 或者 level 258, level 258 比 level 10會多收集short short stack, 但比level 10少收集一些lock element data.

以下為hanganalyze的截取輸出,可以看到主要等待為row cache lock,session id:7/porcess id:128被session id:1209/process id: 49鎖住,和上面的分析一致。

Chains most likely to have caused the hang:

 [a] Chain 1 Signature: 'rdbms ipc message'<='row cache lock'

     Chain 1 Signature Hash: 0xfe4c0898

Chain 1:

-------------------------------------------------------------------------------

    Oracle session identified by:

    {

                instance: 2 (cmbd.cmbd2)

                   os id: 43516004

              process id: 128, oracle@jkptdb2

              session id: 7

        session serial #: 47563

    }

    is waiting for 'row cache lock' with wait info:

    {

                      p1: 'cache id'=0xb

                      p2: 'mode'=0x0

                      p3: 'request'=0x5

            time in wait: 0.077179 sec

      heur. time in wait: 16 min 18 sec

           timeout after: 2.922821 sec

                 wait id: 13467

                blocking: 0 sessions

            wait history:

              * time between current wait and wait #1: 0.000045 sec

              1.       event: 'row cache lock'

                 time waited: 3.000020 sec

                     wait id: 13466           p1: 'cache id'=0xb

                                              p2: 'mode'=0x0

                                              p3: 'request'=0x5

              * time between wait #1 and #2: 0.000030 sec

              2.       event: 'row cache lock'

                 time waited: 3.000016 sec

                     wait id: 13465           p1: 'cache id'=0xb

                                              p2: 'mode'=0x0

                                              p3: 'request'=0x5

              * time between wait #2 and #3: 0.000032 sec

              3.       event: 'row cache lock'

                 time waited: 3.000016 sec

                     wait id: 13464           p1: 'cache id'=0xb

                                              p2: 'mode'=0x0

                                              p3: 'request'=0x5

    }

    and is blocked by

 => Oracle session identified by:

    {

                instance: 2 (cmbd.cmbd2)

                   os id: 19792004

              process id: 49, oracle@jkptdb2 (CJQ0)

              session id: 1209

        session serial #: 9

    }

    which is waiting for 'rdbms ipc message' with wait info:

    {

                      p1: 'timeout'=0x14

            time in wait: 0.168064 sec

      heur. time in wait: 2.171697 sec

           timeout after: 0.031936 sec

                 wait id: 72756966

                blocking: 1 session

            wait history:

              * time between current wait and wait #1: 0.000350 sec

              1.       event: 'rdbms ipc message'

                 time waited: 0.200014 sec

                     wait id: 72756965        p1: 'timeout'=0x14

              * time between wait #1 and #2: 0.000304 sec

              2.       event: 'rdbms ipc message'

                 time waited: 0.200016 sec

                     wait id: 72756964        p1: 'timeout'=0x14

              * time between wait #2 and #3: 0.000307 sec

              3.       event: 'rdbms ipc message'

                 time waited: 0.200041 sec

                     wait id: 72756963        p1: 'timeout'=0x14

    }

Chain 1 Signature: 'rdbms ipc message'<='row cache lock'

Chain 1 Signature Hash: 0xfe4c0898

以下為systemstate dump(部分截取信息)輸出如下:

上鎖進程(session id:129):

PROCESS 49: CJQ0

下面的描述大致為:系統(tǒng)進程49,也就是session id 129鎖住一個會話是sid:7,等待事件為row cache lock,和上面的分析吻合。

    There are 1 sessions blocked by this session.

    Dumping one waiter:

      inst: 2, sid: 7, ser: 47563

      wait event: 'row cache lock'

        p1: 'cache id'=0xb

        p2: 'mode'=0x0

        p3: 'request'=0x5

      row_wait_obj#: 520, block#: 11728, row#: 0, file# 1

      min_blocked_time: 0 secs, waiter_cache_ver: 21656

下面可以看到CJQ0進程持有S鎖,而對象正是=CSH_BRANCH_D_1DAY_JOB,也就是我們上面打算停止的job名稱。

kssdmh:  so 0x7000001b029ced8 (type 75, "row cache enqueues") may not have children

      ----------------------------------------

      SO: 0x7000001b02dd868, type: 75, owner: 0x700000191e0d808, flag: INIT/-/-/0x00 if: 0x1 c: 0x1

       proc=0x7000001aa6dc268, name=row cache enqueues, file=kqr.h LINE:2064 ID:, pg=0

      row cache enqueue: count=2 session=7000001adb58460 object=7000001810e7238, mode=S

      savepoint=0x351e5fa5

      row cache parent object: address=7000001810e7238 cid=11(dc_objects)

      hash=af8ef272 typ=21 transaction=0 flags=00000002

      objectno=352585 ownerid=57 nsp=1

      name=CSH_BRANCH_D_1DAY_JOB      own=7000001810e7300[7000001b02dd910,7000001b02dd910] wat=7000001810e7310[7000001b2c1bf00,7000001b2c1bf00] mode=S

      status=VALID/-/-/-/-/-/-/-/-

      request=N release=FALSE flags=8

      instance lock=QI af8ef272 41d157e6

      set=0, complete=FALSE

      data=

      00000039 00154353 485f4252 414e4348 5f445f31 4441595f 4a4f4200 00000000

      00000000 01000000 00000000 00000000 00000000 00000000 00000000 00000000

      00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000

被鎖進程(session id:7)

PROCESS 128:

  ----------------------------------------

    Short stack dump:

ksedsts()+360<-ksdxfstk()+44<-ksdxcb()+3384<-sspuser()+116<-47dc<-sskgpwwait()+32<-skgpwwait()+180

<-ksliwat()+11032<-kslwaitctx()+180<-kqrigt()+1424<-kqrLockAndPinPo()+532<-kqrpre1()+864<-kqrpre()+28

<-jskaObjGet()+1024<-jskaJobRun()+2788<-jsiRunJob()+776<-jsaRunJob()+780<-spefcmpa()+600

<-spefmccallstd()+1340<-pextproc()+168<-peftrusted()+164<-psdexsp()+276<-rpiswu2()+480

<-psdextp()+632<-pefccal()+588<-pefcal()+260<-pevm_FCAL()+144<-pfrinstr_FCAL()+96

<-pfrrun_no_tool()+96<-pfrrun()+1032<-plsql_run()+716<-peicnt()+320<-kkxexe()+508

<-opiexe()+17168<-kpoal8()+1856<-opiodr()+720<-ttcpip()+1028<-opitsk()+1508<-opiino()+940

<-opiodr()+720<-opidrv()+1132<-sou2o()+136<-opimai_real()+608<-ssthrdmain()+268

<-main()+204<-__start()+112

當(dāng)前正在等待事件row cache lock

    Current Wait Stack:

     0: waiting for 'row cache lock'

        cache id=0xb, mode=0x0, request=0x5

        wait_id=13602 seq_num=13603 snap_id=1

        wait times: snap=0.003162 sec, exc=0.003162 sec, total=0.003162 sec

        wait times: max=3.000000 sec, heur=23 min 0 sec

        wait counts: calls=1 os=1

        in_wait=1 iflags=0x5a2

下面顯示了該進程在請求一個X鎖,請求的對象是正是CSH_BRANCH_D_1DAY_JOB和上面的分析是溫和的,CJQ進程對 CSH_BRANCH_D_1DAY_JOB持有一個S鎖,而當(dāng)前的進程需要對CSH_BRANCH_D_1DAY_JOB進程操作,需要一個X鎖。

SO: 0x7000001b2c1be58, type: 75, owner: 0x70000019c72b078, flag: INIT/-/-/0x00 if: 0x1 c: 0x1

         proc=0x7000001ab6cb410, name=row cache enqueues, file=kqr.h LINE:2064 ID:, pg=0

        row cache enqueue: count=1 session=7000001ab826ce0 object=7000001810e7238, request=X

        savepoint=0x893a

        row cache parent object: address=7000001810e7238 cid=11(dc_objects)

        hash=af8ef272 typ=21 transaction=0 flags=00000002

        objectno=352585 ownerid=57 nsp=1

        name=CSH_BRANCH_D_1DAY_JOB        own=7000001810e7300[7000001b02dd910,7000001b02dd910] wat=7000001810e7310[7000001b2c1bf00,7000001b2c1bf00] mode=S

        status=VALID/-/-/-/-/-/-/-/-

        request=N release=FALSE flags=8

        instance lock=QI af8ef272 41d157e6

        set=0, complete=FALSE

        data=

        00000039 00154353 485f4252 414e4348 5f445f31 4441595f 4a4f4200 00000000

        00000000 01000000 00000000 00000000 00000000 00000000 00000000 00000000

        011044c6 00000000 07000001 810e7300

下面記錄了操作語句:

begin sys.dbms_scheduler.run_job('CMBD.CSH_BRANCH_D_1DAY_JOB', false); end;

      ----------------------------------------

      SO: 0x70000015d5f35c8, type: 78, owner: 0x7000001ab826ce0, flag: INIT/-/-/0x00 if: 0x3 c: 0x3

       proc=0x7000001ab6cb410, name=LIBRARY OBJECT LOCK, file=kgl.h LINE:8548 ID:, pg=0

      LibraryObjectLock:  Address=70000015d5f35c8 Handle=7000000ba27f998 Mode=N CanBeBrokenCount=1 Incarnation=1 ExecutionCount=1         

        User=7000001ab826ce0 Session=7000001ab826ce0 ReferenceCount=1 Flags=CNB/[0001] SavepointNum=5a52c6da

      LibraryHandle:  Address=7000000ba27f998 Hash=a53cfcdd LockMode=N PinMode=0 LoadLockMode=0 Status=VALD

        ObjectName:Name=begin sys.dbms_scheduler.run_job('CMBD.CSH_BRANCH_D_1DAY_JOB', false); end;


三,問題定位

通過相關(guān)關(guān)鍵字row cache lock cjq在metlink搜索查看,最終定位到了一個BUG16994952 –Unable to unscheduled propagation due to CJQ self-deadlock/hang。通過BUG的描述發(fā)現(xiàn)和上面的分析吻合。

Bug 16994952  Unable to unschedule propagation due to CJQ self-deadlock / hang 

 This note gives a brief overview of bug 16994952.
 The content was last updated on: 23-AUG-2015
 Click here for details of each of the sections below.

Affects:

Product (Component)

Oracle Server (Rdbms)

Range of versions believed to be affected

(Not specified)

Versions confirmed as being affected

· 11.2.0.3 

Platforms affected

Generic (all / most platforms affected)

Fixed:

The fix for 16994952 is first included in 

· 12.1.0.1 (Base Release) 


Interim patches may be available for earlier versions - click here to check. 

Symptoms:

Related To:

· Deadlock 

· Hang (Process Hang) 

· Waits for "row cache lock"

· Stack is likely to include jskaObjGet

· Advanced Queuing 

· Job Queues 

Description

Unable to unschedule propagation due to CJQ self-deadlock / hang.

Rediscovery Information:

 This bug may be the cause if the CJQ0 process is stuck in a

 self-deadlock waiting on "row cache lock" for a dc_objects

 entry in X mode for an object that it already holds in S mode.

 The stack is likely to include jskaObjGet

Workaround:

 Kill the hung CJQ process

HOOKS "WAITEVENT:row cache lock" STACKHAS:jskaObjGet WAITEVENT:row cache lock STACKHAS:jskaObjGet LIKELYAFFECTS XAFFECTS_11.2.0.1 XAFFECTS_V11020001 AFFECTS=11.2.0.1 XAFFECTS_11.2.0.2 XAFFECTS_V11020002 AFFECTS=11.2.0.2 XAFFECTS_11.2.0.3 XAFFECTS_V11020003 AFFECTS=11.2.0.3 XAFFECTS_11.2.0.4 XAFFECTS_V11020004 AFFECTS=11.2.0.4 XPRODID_5 PRODUCT_ID=5 PRODID-5 RDBMS XCOMP_RDBMS COMPONENT=RDBMS TAG_AQ TAG_DEADLOCK TAG_HANG TAG_JOBQ AQ DEADLOCK HANG JOBQ FIXED_12.1.0.1 SOLVED_SR=2 SOLVED_SR_BAND_2

Please note: The above is a summary description only. Actual symptoms can vary. Matching to any symptoms here does not confirm that you are encountering this problem. For questions about this bug please consult Oracle Support.

References

Bug:16994952 (This link will only work for PUBLISHED bugs)
Note:245840.1 Information on the sections in this article

四,問題處理

Kill the hung CJQ process,殺掉CJQ進程

SQL>Select spid from v$session s,v$process p where p.addr=s.paddr and s.sid=129;

Oracle 11.2.0.3數(shù)據(jù)庫CJQ進程造成row cache lock等待事件影響job無法停止怎么辦

再次通過dbms_scheduler包進行停止,可以正常停止,依次停止僵死的job后,應(yīng)用可以再次正常發(fā)起job,恢復(fù)正常。

Oracle 11.2.0.3數(shù)據(jù)庫CJQ進程造成row cache lock等待事件影響job無法停止怎么辦
Oracle 11.2.0.3數(shù)據(jù)庫CJQ進程造成row cache lock等待事件影響job無法停止怎么辦

看完了這篇文章,相信你對“Oracle 11.2.0.3數(shù)據(jù)庫CJQ進程造成row cache lock等待事件影響job無法停止怎么辦”有了一定的了解,如果想了解更多相關(guān)知識,歡迎關(guān)注億速云行業(yè)資訊頻道,感謝各位的閱讀!

向AI問一下細節(jié)

免責(zé)聲明:本站發(fā)布的內(nèi)容(圖片、視頻和文字)以原創(chuàng)、轉(zhuǎn)載和分享為主,文章觀點不代表本網(wǎng)站立場,如果涉及侵權(quán)請聯(lián)系站長郵箱:is@yisu.com進行舉報,并提供相關(guān)證據(jù),一經(jīng)查實,將立刻刪除涉嫌侵權(quán)內(nèi)容。

AI