Handling a GoldenGate Failure

Published: 2021-11-09 15:57:37 Source: 億速云 Views: 229 Author: 柒染 Category: Website Servers

This post describes how a GoldenGate failure was handled. It walks through the incident from a practitioner's point of view, and hopefully you will take something useful away from it.

Problem description:

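Our production GoldenGate (GG) deployment went live last Wednesday evening, April 19, configured on node 3 of the RAC cluster. Node 3 was restarted at that time, and through an oversight nobody noticed that the machine, which has 32 GB of memory, recognized only 24 GB after boot. After a few days of running, we found that once or twice a day the concurrency on node 3 spiked: the OS load average jumped from single digits to 50 or 60, and the database briefly appeared to hang. At exactly those moments the GG extract process was at the top of top, and the OS had only a few dozen KB of free memory left. We first suspected the NFS mounts, but testing showed nothing wrong there, so we decided on an emergency fix for the memory problem on node 3. The details follow: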
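The missing memory could have been caught right after the reboot with a simple comparison of detected against installed memory. The following is a minimal sketch, not something the original procedure used; the function names and the 2 GiB tolerance are assumptions chosen for illustration.

```shell
#!/bin/sh
# Convert a MemTotal value in kB (as printed in /proc/meminfo) to whole GiB.
mem_total_gib() {
    echo $(( $1 / 1024 / 1024 ))
}

# Succeed only if the detected memory is close to the installed amount.
# $1: MemTotal in kB
# $2: installed memory in GiB
# $3: tolerated shortfall in GiB (hypothetical default of 2, since the
#     kernel always reports somewhat less than the installed size)
mem_ok() {
    detected=$(mem_total_gib "$1")
    [ $(( $2 - detected )) -le "${3:-2}" ]
}

# Example use on a live host:
#   kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
#   mem_ok "$kb" 32 || echo "WARNING: only $(mem_total_gib "$kb") GiB detected"
```

With the numbers from this incident, a host that reports roughly 24 GiB against 32 GiB installed fails the check.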

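Between 6 p.m. and 9 p.m., after office hours, the website and the BOSS system are still fairly busy, so nothing was done in that window. At 9 p.m. I asked the operations staff to stop every Tomcat instance on node 3. I then stopped GG, unmounted NFS, shut down all database processes on node 3, and finally powered the machine off. The operations were as follows: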

GGSCI (rac3) 21> stop mgr

GGSCI (rac3) 21> stop extract xxxx

GGSCI (rac3) 21> stop dpump xxxx

While the processes were stopping, the error log showed the following:

2012-04-26 20:57:39  INFO    OGG-00497  Oracle GoldenGate Capture for Oracle, extksr1.prm:  Writing DDL operation to extract trail file.

2012-04-26 21:01:36  INFO    OGG-00987  Oracle GoldenGate Command Interpreter for Oracle:  GGSCI command (oracle): stop extksr1.

2012-04-26 21:01:38  INFO    OGG-01021  Oracle GoldenGate Capture for Oracle, extksr1.prm:  Command received from GGSCI: STOP.

2012-04-26 21:01:39  INFO    OGG-00991  Oracle GoldenGate Capture for Oracle, extksr1.prm:  EXTRACT EXTKSR1 stopped normally.

2012-04-26 21:01:41  INFO    OGG-00987  Oracle GoldenGate Command Interpreter for Oracle:  GGSCI command (oracle): stop dpksr1.

2012-04-26 21:01:43  INFO    OGG-01021  Oracle GoldenGate Capture for Oracle, dpksr1.prm:  Command received from GGSCI: STOP.

2012-04-26 21:01:43  INFO    OGG-00991  Oracle GoldenGate Capture for Oracle, dpksr1.prm:  EXTRACT DPKSR1 stopped normally.

2012-04-26 21:01:47  INFO    OGG-00987  Oracle GoldenGate Command Interpreter for Oracle:  GGSCI command (oracle): stop mgr.

2012-04-26 21:01:49  INFO    OGG-00963  Oracle GoldenGate Manager for Oracle, mgr.prm:  Command received from GGSCI on host 10.1.8.49 (STOP).

2012-04-26 21:01:49  WARNING OGG-00938  Oracle GoldenGate Manager for Oracle, mgr.prm:  Manager is stopping at user request.
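The log above records the order in which the processes went down: the extract first, then the data pump, and the Manager last. A small sketch of scripting that same order is shown here. It only prints GGSCI commands; piping them into ggsci is left to the caller, and GG_HOME is a hypothetical variable for the GoldenGate install directory. The trailing "!" on the Manager command skips the confirmation prompt, which would otherwise block a non-interactive session.

```shell
#!/bin/sh
# Print GGSCI commands that stop the given groups in order, then the Manager.
# Arguments: group names, e.g. the extract followed by its data pump.
gg_stop_commands() {
    for group in "$@"; do
        echo "stop $group"
    done
    # Stop the Manager last; "!" bypasses the confirmation prompt.
    echo "stop mgr !"
    # Show the final state so the result can be checked.
    echo "info all"
}

# Example use (GG_HOME is an assumption, not from the original post):
#   gg_stop_commands extksr1 dpksr1 | "$GG_HOME/ggsci"
```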

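Once the relevant processes had stopped, I unmounted NFS: the mounts from nodes 1 and 2 as well as the shared storage. The commands are simple and are omitted here. One thing worth mentioning is that unmounting the shared storage can fail with a resource-busy error; adding the -l option to umount gets around it. Also, after all GG processes on the source side have stopped, the GG process on the target side still shows as RUNNING, but its error log reports that the extract has stopped: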

2012-04-26 20:54:38  INFO    OGG-00484  Oracle GoldenGate Delivery for Oracle, repksr1.prm:  Executing DDL operation.

2012-04-26 20:54:38  INFO    OGG-00483  Oracle GoldenGate Delivery for Oracle, repksr1.prm:  DDL operation successful.

2012-04-26 20:54:38  INFO    OGG-01408  Oracle GoldenGate Delivery for Oracle, repksr1.prm:  Restoring current schema for DDL operation to [OGG].

2012-04-26 20:58:41  INFO    OGG-01735  Oracle GoldenGate Collector:  Synchronizing /home/oracle/ggs/trails/t1000239 to disk.

2012-04-26 20:58:41  INFO    OGG-01670  Oracle GoldenGate Collector:  Closing /home/oracle/ggs/trails/t1000239.

2012-04-26 20:58:41  INFO    OGG-01675  Oracle GoldenGate Collector:  Terminating because extract is stopped.
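The umount -l workaround mentioned earlier can be wrapped so that the lazy unmount is used only when a normal unmount fails. This is a sketch under assumptions: the function name is made up, and /share_disk is taken from the trail paths in the logs as a plausible mount point rather than a confirmed one. A lazy unmount detaches the filesystem immediately and finishes the cleanup once it is no longer busy, so it is worth confirming that nothing still writes to the mount.

```shell
#!/bin/sh
# The unmount command can be overridden, which also makes the function testable.
UMOUNT=${UMOUNT:-umount}

# Unmount $1 normally; fall back to a lazy unmount (-l) if that fails,
# for example because the mount point is reported as busy.
safe_umount() {
    mp=$1
    if $UMOUNT "$mp" 2>/dev/null; then
        echo "unmounted $mp"
    elif $UMOUNT -l "$mp"; then
        echo "lazily unmounted $mp"
    else
        echo "failed to unmount $mp" >&2
        return 1
    fi
}

# Example use (mount point is an assumption):
#   safe_umount /share_disk
```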

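After these steps I stopped the database processes and services on node 3 (omitted here), shut the machine down, and notified the colleague standing by in the data center, who then started working on the memory problem. About 30 minutes later the memory problem was fixed and the server was back up, and I started on the follow-up work: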

The first step was to start the portmap and nfs services on node 3 (omitted here).
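Before mounting anything, it can help to confirm which RPC services are actually registered with the portmapper. The sketch below reads rpcinfo -p output and prints any required service that is missing. The list of required names is an assumption: on Linux systems of this kind, status is typically registered by rpc.statd and nlockmgr by the kernel lock manager, and both are involved in NFS file locking.

```shell
#!/bin/sh
# Read `rpcinfo -p` output on stdin and print each required RPC service
# (given as arguments) that is not registered. Prints nothing if all exist.
missing_rpc_services() {
    awk -v want="$*" '
        # The service name is the last column of each rpcinfo line.
        { seen[$NF] = 1 }
        END {
            n = split(want, names, " ")
            for (i = 1; i <= n; i++)
                if (!(names[i] in seen)) print names[i]
        }'
}

# Example use:
#   rpcinfo -p | missing_rpc_services portmapper status nlockmgr
```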

Next I mounted nodes 1 and 2 and the shared storage. Starting the mgr process after that failed with the following errors:

2012-04-26 21:50:18  ERROR   OGG-01117  Oracle GoldenGate Command Interpreter for Oracle:  Received signal: Program interrupt (2).

2012-04-26 21:50:18  ERROR   OGG-01668  Oracle GoldenGate Command Interpreter for Oracle:  PROCESS ABENDING.

2012-04-26 21:51:43  INFO    OGG-00987  Oracle GoldenGate Command Interpreter for Oracle:  GGSCI command (oracle): start mgr.

2012-04-26 21:52:13  ERROR   OGG-01454  Oracle GoldenGate Manager for Oracle, mgr.prm:  Unable to lock file "/share_disk/ggs/dirpcs/MGR.pcm" (error 37, No locks available).

2012-04-26 21:52:13  ERROR   OGG-01668  Oracle GoldenGate Manager for Oracle, mgr.prm:  PROCESS ABENDING.

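The OGG-01454 error above means that the mgr process could not obtain a lock on a file on the shared storage, which blocks every subsequent operation. The fix is simple: start the nfslock service on node 3, then start the mgr process again. Once mgr was up, the extract process turned out to have abended, and the error log showed the following extract errors: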

2012-04-26 21:54:34  INFO    OGG-01026  Oracle GoldenGate Capture for Oracle, dpksr1.prm:  Rolling over remote file /home/oracle/ggs/trails/t1000240.

2012-04-26 21:54:34  INFO    OGG-01053  Oracle GoldenGate Capture for Oracle, dpksr1.prm:  Recovery completed for target file /home/oracle/ggs/trails/t1000240, at RBA 1022.

2012-04-26 21:54:34  INFO    OGG-01057  Oracle GoldenGate Capture for Oracle, dpksr1.prm:  Recovery completed for all targets.

2012-04-26 21:54:35  ERROR   OGG-00446  Oracle GoldenGate Capture for Oracle, extksr1.prm:  Could not find archived log for sequence 16857 thread 3 under alternative destinations. SQL <SELECT MAX(sequence#)  FROM v$log WHERE thread# = :ora_thread>. Last alternative log tried /arch/rac3/3_16857_744833311.dbf, error retrieving redo file name for sequence 16857, archived = 1, use_alternate = 0Not able to establish initial position for sequence 16857, rba 1529360.

2012-04-26 21:54:35  ERROR   OGG-01668  Oracle GoldenGate Capture for Oracle, extksr1.prm:  PROCESS ABENDING.

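The cause is simple. When node 3 was shut down, its VIP failed over to another node, so the archived logs that belonged to node 3 were written on other nodes. When GG tried to read node 3's archived logs, it could not find the required log in the expected directory and abended. With the cause clear, the fix was straightforward: copy node 3's archived logs over from the other nodes, then start the extract process again: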

2012-04-26 21:57:22  INFO    OGG-00993  Oracle GoldenGate Capture for Oracle, extksr1.prm:  EXTRACT EXTKSR1 started.

2012-04-26 21:57:22  INFO    OGG-01055  Oracle GoldenGate Capture for Oracle, extksr1.prm:  Recovery initialization completed for target file /share_disk/ggs/trails/s1000239, at RBA 24518902.

2012-04-26 21:57:22  INFO    OGG-01478  Oracle GoldenGate Capture for Oracle, extksr1.prm:  Output file /share_disk/ggs/trails/s1 is using format RELEASE 10.4/11.1.

2012-04-26 21:57:23  INFO    OGG-01517  Oracle GoldenGate Capture for Oracle, extksr1.prm:  Position of first record processed for Thread 1, Sequence 29645, RBA 18568720, SCN 18.122009990, Apr 26, 2012 9:01:24 PM.

2012-04-26 21:57:23  INFO    OGG-01517  Oracle GoldenGate Capture for Oracle, extksr1.prm:  Position of first record processed for Thread 2, Sequence 28161, RBA 12794496, SCN 18.122010368, Apr 26, 2012 9:01:32 PM.

2012-04-26 21:57:24  INFO    OGG-01026  Oracle GoldenGate Capture for Oracle, extksr1.prm:  Rolling over remote file /share_disk/ggs/trails/s1000239.

2012-04-26 21:57:24  INFO    OGG-01053  Oracle GoldenGate Capture for Oracle, extksr1.prm:  Recovery completed for target file /share_disk/ggs/trails/s1000240, at RBA 1019.

2012-04-26 21:57:24  INFO    OGG-01057  Oracle GoldenGate Capture for Oracle, extksr1.prm:  Recovery completed for all targets.
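The thread and sequence of the missing archived log can be read straight from the OGG-00446 message shown earlier, which saves hunting for the right file on the other nodes. The sketch below assumes the thread_sequence_resetlogs.dbf naming seen in that message (3_16857_744833311.dbf); the function names, the host rac1, and the directory /arch/rac1 in the usage comment are hypothetical.

```shell
#!/bin/sh
# Read error-log lines on stdin and print "thread sequence" for every
# OGG-00446 "Could not find archived log" message.
missing_archive() {
    sed -n 's/.*Could not find archived log for sequence \([0-9][0-9]*\) thread \([0-9][0-9]*\).*/\2 \1/p'
}

# Build an archived log file name from thread, sequence and resetlogs id,
# assuming the <thread>_<sequence>_<resetlogs>.dbf pattern from the error.
archive_name() {
    echo "${1}_${2}_${3}.dbf"
}

# Example use (host and source directory are assumptions):
#   f=$(archive_name 3 16857 744833311)
#   scp "oracle@rac1:/arch/rac1/$f" /arch/rac3/
```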

GG source side:

GGSCI (rac3) 20> info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     RUNNING                                           

EXTRACT     RUNNING     DPKSR1      00:00:00      00:00:00    

EXTRACT     RUNNING     EXTKSR1     00:00:00      00:00:04   

GG target side:

GGSCI (rptdb) 7> info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     RUNNING                                           

REPLICAT    RUNNING     REPKSR1     00:00:00      00:00:00

After watching for a while, I saw no further problems with either the main site or GG. The whole procedure took about an hour. We will keep monitoring over the coming week.
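For the follow-up monitoring, the info all output can be checked automatically instead of by eye. The following is a rough sketch, not part of the original procedure: it reads info all output and reports any Manager, Extract, or Replicat process whose status is not RUNNING. The function name and GG_HOME are assumptions.

```shell
#!/bin/sh
# Read `info all` output on stdin and print every process that is not
# RUNNING. Exit status is 0 if all processes are running, 1 otherwise.
gg_not_running() {
    awk '
        $1 == "MANAGER" || $1 == "EXTRACT" || $1 == "REPLICAT" {
            if ($2 != "RUNNING") {
                # The Manager line has no group name column.
                if ($1 == "MANAGER") print $1, $2
                else print $1, $3, $2
                bad = 1
            }
        }
        END { exit bad + 0 }'
}

# Example use (GG_HOME is an assumption):
#   echo "info all" | "$GG_HOME/ggsci" | gg_not_running || echo "GG needs attention"
```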

Noting this down for the record.

That is how this GoldenGate failure was handled. If you run into a similar problem, the analysis above may serve as a reference.
