您好,登錄后才能下訂單哦!
內(nèi)容介紹
本文詳細(xì)介紹了在客戶現(xiàn)場一次因?yàn)?span>resmgr:cpu quantum等待事件而導(dǎo)致CPU大量被消耗,在確認(rèn)原因并將導(dǎo)致CPU沖高的Resource Manager資源管理器特性關(guān)閉后,后續(xù)導(dǎo)致了后臺(tái)alert日志中大量的自動(dòng)定時(shí)JOB沒有執(zhí)行并報(bào)錯(cuò):
ORA-12012: error on auto execute of job "SYS"."ORA$AT_OS_OPT_SY_255" ORA-29373: resource manager is not on |
客戶操作系統(tǒng):RHEL6
數(shù)據(jù)庫版本:11.2.0.3.11
以下我們將描述并說明Resource Manager資源管理器以及后續(xù)的解決處理排查等一系列的過程。
概念普及
Resource Manager資源管理器是oracle在10g推出,并在11g中得以完善的一項(xiàng)功能。資源管理器通過控制數(shù)據(jù)庫內(nèi)部的執(zhí)行調(diào)度任務(wù)控制資源在各個(gè)會(huì)話之間的分配。通過控制所要運(yùn)行的會(huì)話以及會(huì)話運(yùn)行的時(shí)間長度,數(shù)據(jù)庫資源管理器可以確保資源利用及分配和我們計(jì)劃中的配置一樣,有效利用資源,當(dāng)然,Resource Manager主要控制資源包括如下幾塊:
Oracle會(huì)話的CPU使用率
并行度
SQL語句操作執(zhí)行時(shí)間
會(huì)話空閑時(shí)間
活躍會(huì)話(session)數(shù)
UNDO管理
當(dāng)然,在真正的生產(chǎn)環(huán)境中,Resource Manager還沒有被大量的使用起來,但是在11g(11.1.0.6 to 11.1.0.7 and 11gR2)開始,oracle默認(rèn)開啟Resource Manager計(jì)劃,如下顯示oracle后臺(tái)日志:
Setting Resource Manager plan SCHEDULER[0x51B5]:DEFAULT_MAINTENANCE_PLAN via scheduler window Setting Resource Manager plan DEFAULT_MAINTENANCE_PLAN via parameter Thu Feb 05 22:00:03 2009 Begin automatic SQL Tuning Advisor run for special tuning task "SYS_AUTO_SQL_TUNING_TASK" Thu Feb 05 22:00:39 2009 End automatic SQL Tuning Advisor run for special tuning task "SYS_AUTO_SQL_TUNING_TASK" |
如下顯示在10g和11g中,oracle對(duì)Resource Manager采用的不同策略:
Subject |
10G |
11G |
Maintenance Window |
2 windows, WEEK and WEEKEND |
Each day has its own window |
Resource manager |
Not enabled per default |
Default resource plan specified |
故障詳解
客戶新系統(tǒng)上線,在上線后,發(fā)現(xiàn)數(shù)據(jù)庫出現(xiàn)大量的等待事件:resmgr:cpu quantum,這個(gè)等待事件是在11g中一個(gè)比較常見的等待事件,消耗大量CPU 系統(tǒng)性能變差,在和客戶溝通后,決定對(duì)該特性進(jìn)行關(guān)閉,以下為關(guān)閉該特性步驟:
ALTER SYSTEM SET “_resource_manager_always_on”=FALSE SCOPE=SPFILE SID='*';
#1,Change the active windows to use the null resource manager plan (or other nonrestrictive plan) using:
execute dbms_scheduler.set_attribute('WEEKNIGHT_WINDOW','RESOURCE_PLAN',''); execute dbms_scheduler.set_attribute('WEEKEND_WINDOW','RESOURCE_PLAN','');
#2,Since in 11g there are more Maintenance Windows, we should add them too:
execute dbms_scheduler.set_attribute('SATURDAY_WINDOW','RESOURCE_PLAN',''); execute dbms_scheduler.set_attribute('SUNDAY_WINDOW','RESOURCE_PLAN',''); execute dbms_scheduler.set_attribute('MONDAY_WINDOW','RESOURCE_PLAN',''); execute dbms_scheduler.set_attribute('TUESDAY_WINDOW','RESOURCE_PLAN',''); execute dbms_scheduler.set_attribute('WEDNESDAY_WINDOW','RESOURCE_PLAN',''); execute dbms_scheduler.set_attribute('THURSDAY_WINDOW','RESOURCE_PLAN',''); execute dbms_scheduler.set_attribute('FRIDAY_WINDOW','RESOURCE_PLAN','');
#3. Then, for each window_name (WINDOW_NAME from DBA_SCHEDULER_WINDOWS), run:
execute dbms_scheduler.set_attribute('<window name>','RESOURCE_PLAN',''); |
以上關(guān)掉Resource Manager后效果確實(shí)很明顯,系統(tǒng)運(yùn)行穩(wěn)定,CPU處于正常水平。但是運(yùn)行一段時(shí)間后后臺(tái)開始報(bào)錯(cuò):
ORA-12012: error on auto execute of job "SYS"."ORA$AT_SQ_SQL_SW_186" ORA-29373: resource manager is not on |
從報(bào)錯(cuò)上看,似乎和Resource Manager有一定的關(guān)系,從JOB的命名來看,這個(gè)JOB是"SQL Tuning Advisor Job", 用于診斷和監(jiān)控高負(fù)載的SQL, 并為ADDM提供SQL優(yōu)化建議的JOB,第一次,我們對(duì)該報(bào)錯(cuò)的JOB,也就是SQL Tuning Advisor進(jìn)行了關(guān)閉,關(guān)閉過程如下:
exec DBMS_AUTO_TASK_ADMIN.DISABLE('SQL TUNING ADVISOR',NULL, NULL); |
關(guān)閉完后又有相應(yīng)新的JOB開始報(bào)錯(cuò):
ORA-12012: error on auto execute of job "SYS"."ORA$AT_OS_OPT_SY_255" ORA-29373: resource manager is not on |
我們檢查后臺(tái)最近的JOB執(zhí)行情況:
SQL> select client_name,window_name,job_name,job_status from dba_autotask_job_history ;
CLIENT_NAME WINDOW_NAME JOB_NAME JOB_STATUS ----------------------------- ------------------------------------- -------- auto optimizer stats collection WEDNESDAY_WINDOW ORA$AT_OS_OPT_SY_102 FAILED auto optimizer stats collection FRIDAY_WINDOW ORA$AT_OS_OPT_SY_105 FAILED auto optimizer stats collection TUESDAY_WINDOW ORA$AT_OS_OPT_SY_99 FAILED auto optimizer stats collection MONDAY_WINDOW ORA$AT_OS_OPT_SY_96 FAILED |
在以上的腳本輸出中,我們發(fā)現(xiàn)大量的后臺(tái)JOB報(bào)錯(cuò),而以上JOB報(bào)錯(cuò)從告警日志上看,基本都是由于Resource Manager關(guān)閉造成。但是我們知道,Resource Manager的開啟關(guān)閉應(yīng)該不至于影響JOB的正常調(diào)度,而且在相同的數(shù)據(jù)庫版本下,我們在AIX上卻沒有發(fā)現(xiàn)問題,而在檢查過程中,我們發(fā)現(xiàn),雖然我們已經(jīng)關(guān)閉了Resource Manager,但是相關(guān)的DBRM進(jìn)程卻依舊存在,ORACLE的DBRM進(jìn)程進(jìn)程即為Resource Manager的管理進(jìn)程,在某種情況下,關(guān)閉了Resource Manager卻發(fā)現(xiàn)該進(jìn)程依舊存在,這本身就存在一定的問題。這從一定程度上給予我們一定的懷疑方向,可能存在resource并沒有完全關(guān)閉,而且從詳細(xì)的trace文件中的從call stack trace里來看, 當(dāng)前進(jìn)程也沒有沒有關(guān)于resource manager的函數(shù)調(diào)用,而是一直在向另外一個(gè)進(jìn)程post message。
所以,我們懷疑錯(cuò)誤很有可能是由于DBRM沒有正常關(guān)閉造成。
#ps -ef|grep dbrm oracle 34920 1 0 Apr18 ? 00:00:59 ora_dbrm_nfdb1 |
在對(duì)問題進(jìn)行了詳細(xì)的跟蹤及分析后,我們向ORACLE提交了SR,ORACLE反饋給我們第二個(gè)隱含參數(shù): _resource_manager_always_on設(shè)置為FALSE
我們嘗試對(duì)該參數(shù)進(jìn)行設(shè)置,并重啟了數(shù)據(jù)庫生效該參數(shù)后,DBRM進(jìn)程消失:
# ps -ef | grep dbrm root 4650 3647 0 21:58 pts/2 00:00:00 grep dbrm |
且相關(guān)的JOB執(zhí)行正常。為了確認(rèn)問題,我們分別進(jìn)行了3個(gè)實(shí)驗(yàn),分別為:
1. 設(shè)置“_resource_manager_always_on隱含參數(shù),關(guān)閉resource manager windows 調(diào)用計(jì)劃
2. 刪除隱含參數(shù),只設(shè)置resource manager windows 調(diào)用計(jì)劃
3. 添加2個(gè)隱含參數(shù),關(guān)閉resource manager windows 調(diào)用計(jì)劃
關(guān)閉數(shù)據(jù)庫,調(diào)整時(shí)間,設(shè)置_resource_manager_always_on隱含參數(shù),關(guān)閉windows 計(jì)劃
關(guān)閉數(shù)據(jù)庫: SQL> shutdown immediate Database closed. Database dismounted. ORACLE instance shut down.
調(diào)整時(shí)間: [root@rhel6 ~]# date -s 2014/05/14 [root@rhel6 ~]# date -s 21:57:00 [root@rhel6 ~]# clock -w [root@rhel6 ~]# date Wed May 14 21:57:35 CST 2014
添加隱含參數(shù),啟動(dòng)至open: SQL> ALTER SYSTEM SET "_resource_manager_always_on"=FALSE SCOPE=SPFILE; SQL> startup force(測試環(huán)境,直接force啟動(dòng),生產(chǎn)環(huán)境勿如此操作)
設(shè)置resource manager plan: execute dbms_scheduler.set_attribute('SATURDAY_WINDOW','RESOURCE_PLAN',''); execute dbms_scheduler.set_attribute('SUNDAY_WINDOW','RESOURCE_PLAN',''); execute dbms_scheduler.set_attribute('MONDAY_WINDOW','RESOURCE_PLAN',''); execute dbms_scheduler.set_attribute('TUESDAY_WINDOW','RESOURCE_PLAN',''); execute dbms_scheduler.set_attribute('WEDNESDAY_WINDOW','RESOURCE_PLAN',''); execute dbms_scheduler.set_attribute('THURSDAY_WINDOW','RESOURCE_PLAN',''); execute dbms_scheduler.set_attribute('FRIDAY_WINDOW','RESOURCE_PLAN',''); |
我們觀察22點(diǎn)的alert信息,確實(shí)開始報(bào)錯(cuò):
Wed May 14 22:00:03 2014 Errors in file /oracle/ora11g/base/diag/rdbms/ora11g/ora11g/trace/ora11g_j003_4452.trc: ORA-12012: error on auto execute of job "SYS"."ORA$AT_OS_OPT_SY_27" ORA-29373: resource manager is not on Wed May 14 22:00:03 2014 Errors in file /oracle/ora11g/base/diag/rdbms/ora11g/ora11g/trace/ora11g_j004_4454.trc: ORA-12012: error on auto execute of job "SYS"."ORA$AT_SA_SPC_SY_28" ORA-29373: resource manager is not on Wed May 14 22:00:03 2014 Errors in file /oracle/ora11g/base/diag/rdbms/ora11g/ora11g/trace/ora11g_j005_4456.trc: ORA-12012: error on auto execute of job "SYS"."ORA$AT_SQ_SQL_SW_29" ORA-29373: resource manager is not on Wed May 14 22:00:04 2014 XDB installed. XDB initialized. |
檢查DBRM進(jìn)程已經(jīng)存在:
[root@rhel6 ~]# ps -ef | grep dbrm ora11g 4346 1 0 21:57 ? 00:00:00 ora_dbrm_ora11g
|
檢查后臺(tái)JOB執(zhí)行記錄視圖:
SQL> select CLIENT_NAME,WINDOW_NAME,JOB_NAME,JOB_STATUS,JOB_START_TIME from DBA_AUTOTASK_JOB_HISTORY where CLIENT_NAME='auto optimizer stats collection' order by JOB_START_TIME desc;
CLIENT_NAME WINDOW_NAME JOB_NAME JOB_STATUS JOB_START_TIME ----------------------------------- ------------------------------ ------------------------------ ------------------------------ ---------------------------------------- auto optimizer stats collection WEDNESDAY_WINDOW ORA$AT_OS_OPT_SY_27 FAILED 14-MAY-14 10.00.03.233570 PM PRC |
模擬過程與當(dāng)前環(huán)境產(chǎn)生效果一致
接下來我們進(jìn)行第二個(gè)模擬:
關(guān)閉數(shù)據(jù)庫,調(diào)整時(shí)間,去除隱含參數(shù),關(guān)閉windows 計(jì)劃
關(guān)閉數(shù)據(jù)庫: SQL> shutdown immediate Database closed. Database dismounted. ORACLE instance shut down.
調(diào)整時(shí)間: [root@rhel6 ~]# date -s 2014/05/14 [root@rhel6 ~]# date -s 21:57:00 [root@rhel6 ~]# clock -w [root@rhel6 ~]# date Wed May 14 21:57:35 CST 2014
去除隱含參數(shù),啟動(dòng)至open: 隱含參數(shù)去除采用create pfile from spfile; 刪除spfile,編輯pfile文件,刪除隱含參數(shù),以pfile啟動(dòng)數(shù)據(jù)庫
設(shè)置resource manager plan:(由于前面已經(jīng)設(shè)置過,無需再設(shè)置) |
我們觀察22點(diǎn)的alert信息,發(fā)現(xiàn)沒有報(bào)錯(cuò)
Tue May 13 22:00:03 2014 Begin automatic SQL Tuning Advisor run for special tuning task "SYS_AUTO_SQL_TUNING_TASK" End automatic SQL Tuning Advisor run for special tuning task "SYS_AUTO_SQL_TUNING_TASK" Tue May 13 22:00:05 2014 XDB installed. XDB initialized.
|
檢查 DBRM進(jìn)程:
[root@rhel6 ~]# ps -ef | grep dbrm ora11g 3844 1 0 21:55 ? 00:00:00 ora_dbrm_ora11g |
說明:此時(shí)resource manager由于只是關(guān)閉了resource manager plan計(jì)劃,沒有真正關(guān)閉resource manager 因此該進(jìn)程依舊存在。
檢查后臺(tái)JOB執(zhí)行視圖信息:
SQL> select CLIENT_NAME,WINDOW_NAME,JOB_NAME,JOB_STATUS,JOB_START_TIME from DBA_AUTOTASK_JOB_HISTORY where CLIENT_NAME='auto optimizer stats collection' order by JOB_START_TIME desc;
CLIENT_NAME WINDOW_NAME JOB_NAME JOB_STATUS JOB_START_TIME ---------------------------------------- ------------------------------ ------------------------- -------------------- ---------------------------------------- auto optimizer stats collection TUESDAY_WINDOW ORA$AT_OS_OPT_SY_24 SUCCEEDED 13-MAY-14 10.00.02.102741 PM PRC |
說明在隱含參數(shù)除掉的情況下,JOB可以正常執(zhí)行,后臺(tái)沒有報(bào)錯(cuò)。
模擬隱含參數(shù)及resource manager plan均存在的情況
關(guān)閉數(shù)據(jù)庫,調(diào)整時(shí)間,添加隱含參數(shù),關(guān)閉windows 計(jì)劃
關(guān)閉數(shù)據(jù)庫: SQL> shutdown immediate Database closed. Database dismounted. ORACLE instance shut down. 調(diào)整時(shí)間: [root@rhel6 ~]# date -s 2014/05/15 [root@rhel6 ~]# date -s 21:57:00 [root@rhel6 ~]# clock -w [root@rhel6 ~]# date Thu May 15 21:57:35 CST 2014 添加隱含參數(shù),啟動(dòng)至open: SQL> alter system set "_resource_manager_always_off"=true scope=spfile; SQL> ALTER SYSTEM SET "_resource_manager_always_on"=FALSE SCOPE=SPFILE; SQL> startup force(測試環(huán)境,直接force啟動(dòng),生產(chǎn)環(huán)境勿如此操作) 設(shè)置resource manager plan:(由于前面已經(jīng)設(shè)置過,無需再設(shè)置) |
我們觀察22點(diǎn)的alert信息,發(fā)現(xiàn)沒有報(bào)錯(cuò)
Thu May 15 22:00:03 2014 Begin automatic SQL Tuning Advisor run for special tuning task "SYS_AUTO_SQL_TUNING_TASK" Thu May 15 22:00:05 2014 XDB installed. XDB initialized. End automatic SQL Tuning Advisor run for special tuning task "SYS_AUTO_SQL_TUNING_TASK" |
檢查 DBRM進(jìn)程:
[root@rhel6 ~]# ps -ef | grep dbrm root 4650 3647 0 21:58 pts/2 00:00:00 grep dbrm |
說明:此時(shí)DBRM進(jìn)程消失
檢查后臺(tái)JOB執(zhí)行視圖信息:
SQL> select CLIENT_NAME,WINDOW_NAME,JOB_NAME,JOB_STATUS,JOB_START_TIME from DBA_AUTOTASK_JOB_HISTORY where CLIENT_NAME='auto optimizer stats collection' order by JOB_START_TIME desc;
CLIENT_NAME WINDOW_NAME JOB_NAME JOB_STATUS JOB_START_TIME ----------------------------------- ------------------------------ ------------------------------ ------------------------------ ---------------------------------------- auto optimizer stats collection THURSDAY_WINDOW ORA$AT_OS_OPT_SY_30 SUCCEEDED 15-MAY-14 10.00.02.115232 PM PRC |
說明在隱含參數(shù)兩個(gè)都添加的情況下,完全屏蔽resource manager的的情況下,JOB可以正常執(zhí)行,后臺(tái)沒有報(bào)錯(cuò)。
技術(shù)結(jié)論
以上測試分析結(jié)果證明,后臺(tái)報(bào)錯(cuò)JOB執(zhí)行失敗原因應(yīng)該是DBRM進(jìn)程依舊活動(dòng),而DBRM進(jìn)程是管理Resource Manager當(dāng)去除"_resource_manager_always_off"=true及"_resource_manager_always_on"=FALSE
或者將兩個(gè)參數(shù)全部添加,均可避免該錯(cuò)誤,統(tǒng)計(jì)信息自動(dòng)收集也可以自動(dòng)執(zhí)行
在對(duì)以為問題進(jìn)行分析及確認(rèn)后,我們向ORACLE提交了相關(guān)的SR,最終,ORACLE對(duì)此,確認(rèn)相關(guān)的一個(gè)BUG,BUG號(hào)為:Bug 18748456 : AUTO TASK JOBS FAILED WITH ORA-29373 ERROR免責(zé)聲明:本站發(fā)布的內(nèi)容(圖片、視頻和文字)以原創(chuàng)、轉(zhuǎn)載和分享為主,文章觀點(diǎn)不代表本網(wǎng)站立場,如果涉及侵權(quán)請(qǐng)聯(lián)系站長郵箱:is@yisu.com進(jìn)行舉報(bào),并提供相關(guān)證據(jù),一經(jīng)查實(shí),將立刻刪除涉嫌侵權(quán)內(nèi)容。