[20120412]自治事務(wù)(AUTONOMOUS_TRANSACTION)與deadlock問(wèn)題.txt
[20120412]自治事務(wù)(AUTONOMOUS_TRANSACTION)與deadlock問(wèn)題.txt
生產(chǎn)系統(tǒng)rac遇到一個(gè)問(wèn)題,在alert*.log:
Sat Mar 17 11:43:31 2012
Global Enqueue Services Deadlock detected. More info in file
/u01/app/oracle/admin/orcl/udump/orcl2_ora_18076.trc.
Sat Mar 17 11:43:31 2012
Trace dumping is performing id=[cdmp_20120317114331]
Sat Mar 17 11:44:29 2012
System State dumped to trace file /u01/app/oracle/admin/orcl/bdump/orcl2_diag_20766.trc
在10.2.0.3以前,會(huì)轉(zhuǎn)儲(chǔ)很大的文件到進(jìn)程ora_diag_orcl*對(duì)應(yīng)的轉(zhuǎn)儲(chǔ)文件,查看對(duì)應(yīng)cdmp*目錄,發(fā)現(xiàn)自己
能力有限,一直沒(méi)有解決這個(gè)問(wèn)題。我不得不經(jīng)常檢查/u01的使用情況,避免磁盤空間不足。
google找了許多鏈接,提示大部分都是主外鍵的問(wèn)題。另外google這個(gè)鏈接:
http://studycow.itpub.net/post/37461/502772
轉(zhuǎn)載如下:
五.煩人的Bug 來(lái)之前就已經(jīng)知道,Oracle的安裝目錄/opt所在盤,時(shí)常被充滿,導(dǎo)致Oracle停止響應(yīng)。由于一直忙于處理性能問(wèn)題,把這個(gè)問(wèn)題放到了最后。之前的考慮是性能問(wèn)題解決了,可能這個(gè)問(wèn)題就自動(dòng)解決了;確實(shí)有些時(shí)候Oracle的一些bug在性能差的時(shí)候,更容易被觸發(fā)。自從調(diào)整性能后,這個(gè)問(wèn)題出現(xiàn)的頻率的降低了一些,但還是在出;正在思考的時(shí)候,發(fā)現(xiàn)/opt的使用率又已經(jīng)到90%了,檢查發(fā)現(xiàn)在安裝目錄的bdump目錄又新生成了一個(gè)4g左右的核心轉(zhuǎn)儲(chǔ)文件。正是這個(gè)核心轉(zhuǎn)儲(chǔ)文件,占用了大量的磁盤空間。檢查alertSID.log文件發(fā)現(xiàn)如下的內(nèi)容:Sun Jun 6 18:51:35 2010Global Enqueue Services Deadlock detected. More info in file/opt/oracle/product/admin/orcl/udump/orcl1_ora_26649.trc.Sun Jun 6 19:14:05 2010System State dumped to trace file /opt/oracle/product/admin/orcl/bdump/orcl1_diag_4783.trc....日志顯示是由于Oracle檢查到了一個(gè)ges的死鎖,于是做了系統(tǒng)級(jí)的dump操作,這種系統(tǒng)級(jí)的dump出來(lái)的內(nèi)容很多,導(dǎo)致了dump文件特別大。感覺(jué)象是遇到了bug,在metalink找到了現(xiàn)象類似的bug:6145177,bug中描述的現(xiàn)象與醫(yī)院有些相似,但不完全一樣,但打了補(bǔ)丁后,沒(méi)再出現(xiàn)這個(gè)問(wèn)題。這個(gè)問(wèn)題在10.2.0.4中已經(jīng)解決了,目前醫(yī)院使用的是10.2.0.3,Oracle已經(jīng)提供了單獨(dú)的Patch,為了避免大的版本升級(jí),就應(yīng)用了單獨(dú)的patch,整個(gè)過(guò)程比較順利。打了這個(gè)patch,一直用到第二天上午,再也沒(méi)有出現(xiàn)這個(gè)錯(cuò)誤 ,看來(lái)patch已經(jīng)起了作用。根據(jù)這個(gè)建議要升級(jí)到10.2.0.4.
我自己也檢查自己的alert文件,我發(fā)現(xiàn)大量出現(xiàn)是從2011/7/14號(hào)開(kāi)始,估計(jì)開(kāi)發(fā)人員改動(dòng)程序引起,問(wèn)了半天對(duì)方也不能確定是那個(gè)程序引起的問(wèn)題。
看來(lái)良好的記錄與文檔很重要!去年7月5號(hào)也有出現(xiàn),實(shí)際上個(gè)人感覺(jué)開(kāi)發(fā)人員可以回憶起來(lái)。
$ grep -B 1 '^Global Enqueue Services Deadlock detected' alert_orcl2.log | egrep '2011|2012' > /tmp/a1
Thu Jun 30 09:12:56 2011
Tue Jul 5 08:37:28 2011
Tue Jul 5 08:38:25 2011
Tue Jul 5 08:38:55 2011
Tue Jul 5 08:39:09 2011
Tue Jul 5 08:39:47 2011
Tue Jul 5 08:40:02 2011
Sat Jul 9 10:50:39 2011
Sat Jul 9 10:50:52 2011
Sun Jul 10 10:11:50 2011
Thu Jul 14 13:36:10 2011
Thu Jul 14 13:36:15 2011
Thu Jul 14 13:41:34 2011
Thu Jul 14 13:41:39 2011
Thu Jul 14 16:24:29 2011
Thu Jul 14 16:24:35 2011
Thu Jul 14 16:25:00 2011
Thu Jul 14 16:25:09 2011
Thu Jul 14 16:25:16 2011
Thu Jul 14 16:25:23 2011
Thu Jul 14 17:02:01 2011
Fri Jul 15 08:45:24 2011
Fri Jul 15 08:45:30 2011
Fri Jul 15 08:49:36 2011
Fri Jul 15 14:20:11 2011
.....
Wed Apr 11 11:46:39 2012
Wed Apr 11 19:53:17 2012
Thu Apr 12 07:47:45 2012
Thu Apr 12 14:26:26 2012
Thu Apr 12 14:26:46 2012
--從以上信息可以發(fā)現(xiàn),2011/6/30號(hào)前根本沒(méi)有出現(xiàn)這個(gè)問(wèn)題,編程出現(xiàn)問(wèn)題的可能性個(gè)人認(rèn)為情況很大。
2012年3月18號(hào)作了升級(jí)到10.2.0.4,我開(kāi)始一直關(guān)注以前的情況是否解決。檢查發(fā)現(xiàn)diag文件不再迅速增加,但是
Global Enqueue Services Deadlock detected依舊出現(xiàn),不過(guò)這回我可以定位問(wèn)題了。
Fri Apr 13 16:29:03 2012
Global Enqueue Services Deadlock detected. More info in file
/u01/app/oracle/admin/orcl/udump/orcl2_ora_3135.trc.
Fri Apr 13 16:29:03 2012
Trace dumping is performing id=[cdmp_20120413162903]
查看對(duì)用的orcl2_ora_3135.trc轉(zhuǎn)儲(chǔ)文件,發(fā)現(xiàn)里面有記錄的sql語(yǔ)句結(jié)合記錄里面的XID,最終確定是問(wèn)題,是開(kāi)發(fā)人員在一個(gè)存儲(chǔ)過(guò)程使用了自治事務(wù)引起的。
自己做了一個(gè)測(cè)試?yán)觼?lái)再現(xiàn)這個(gè)問(wèn)題:
1.測(cè)試環(huán)境:
SQL> select * from v$version ;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
PL/SQL Release 11.2.0.1.0 - Production
CORE 11.2.0.1.0 Production
TNS for Linux: Version 11.2.0.1.0 - Production
NLSRTL Version 11.2.0.1.0 - Production
create table t (id number,name varchar2(10));
CREATE UNIQUE INDEX I_T_I ON SCOTT.T(ID);
CREATE PROCEDURE test1( l_id number,l_name varchar2,flag VARCHAR2) AS
PRAGMA AUTONOMOUS_TRANSACTION;
BEGIN
commit ;
IF flag = 'INSERT' THEN
insert into t values(l_id,l_name);
END IF;
IF flag = 'UPDATE' THEN
update t set id=l_id,name=l_name where id=l_id;
END IF;
IF flag = 'DELETE' THEN
delete from t where id=l_id;
END IF;
commit;
dbms_output.put_line (flag);
END;
/
insert into t values(1,'a');
insert into t values(2,'b');
commit;
2.測(cè)試:
打開(kāi)會(huì)話1,執(zhí)行如下:
SQL> set SERVEROUT on
SQL> update t set name='aaa' where id=1;
SQL> exec test1(1,'A','UPDATE');
BEGIN test1(1,'A','UPDATE'); END;
*
ERROR at line 1:
ORA-00060: deadlock detected while waiting for resource
ORA-06512: at "SCOTT.TEST1", line 9
ORA-06512: at line 1
alert提示如下:
Mon Apr 16 10:37:38 2012
ORA-00060: Deadlock detected. More info in file /u01/app/oracle11g/diag/rdbms/test/test/trace/test_ora_13751.trc.
--而且11G下記錄更加詳細(xì),定位更加方便。
Deadlock graph:
---------Blocker(s)-------- ---------Waiter(s)---------
Resource Name process session holds waits process session holds waits
TX-00090008-00001193 26 132 X 26 132 X
session 132: DID 0001-001A-00000007 session 132: DID 0001-001A-00000007
Rows waited on:
Session 132: obj - rowid = 00015D00 - AAAV0AAAEAAAAIMAAA
(dictionary objn - 89344, file - 4, block - 524, slot - 0)
----- Information for the OTHER waiting sessions -----
----- End of information for the OTHER waiting sessions -----
Information for THIS session:
----- Current SQL Statement for this session (sql_id=2bbypw0dj45w2) -----
UPDATE T SET ID=:B1 ,NAME=:B2 WHERE ID=:B1
----- PL/SQL Stack -----
----- PL/SQL Call Stack -----
object line object
handle number name
0x95fcdac0 9 procedure SCOTT.TEST1
0x95f2c440 1 anonymous block
===================================================
--我的測(cè)試環(huán)境是單機(jī),如果在rac環(huán)境下,alert提示如下:
Thu Apr 12 15:56:49 2012
Global Enqueue Services Deadlock detected. More info in file
/u01/app/oracle/admin/orcl/udump/orcl1_ora_8070.trc.
Thu Apr 12 15:56:49 2012
Trace dumping is performing id=[cdmp_20120412155649]
3.再做一個(gè)另外的測(cè)試:
SQL> rollback ;
Rollback complete.
SQL> select * from t;
ID NAME
---------- ----------
1 a
2 b
SQL> insert into t values(3,'c');
1 row created.
SQL> exec test1(1,'A','UPDATE');
UPDATE
PL/SQL procedure successfully completed.
SQL> select * from t;
ID NAME
---------- ----------
1 A
2 b
3 c
--打開(kāi)另外的會(huì)話,執(zhí)行:
SQL> select * from t;
ID NAME
---------- ----------
1 A
2 b
--可以發(fā)現(xiàn)自治事務(wù)的特點(diǎn),id=1的記錄已經(jīng)顯示修改,而插入的記錄卻還沒(méi)有提交。
如果會(huì)話1這個(gè)時(shí)候rollback, 對(duì)id=1的修改依舊有效!
SQL> rollback ;
Rollback complete.
SQL> select * from t;
ID NAME
---------- ----------
1 A
2 b