Poor Standby Disk I/O Degrades Primary Performance

Published: 2020-07-27 18:34:30 Source: web Views: 1026 Author: hsbxxl Category: Databases

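1. I recently handled a case in which poor disk I/O on a standby degraded the performance of the primary.
The primary was mostly waiting on "log file switch completion". Analysis of an ASH dump showed that the real wait event was "LGWR-LNS wait on channel". This event generally narrows the problem down to network performance or standby I/O performance, and the customer was running in MAXIMUM AVAILABILITY protection mode.
I proposed two solutions:
(1). Replace the standby storage with better-performing storage.
(2). Change the protection mode to MAXIMUM PERFORMANCE and ship redo with LGWR ASYNC.
While we are here, it is worth reviewing the three standby protection modes and the redo transport settings each one allows.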

Item	Maximum protection	Maximum availability	Maximum performance
Redo write/transport process	LGWR	LGWR	LGWR or ARCH
Network transmission mode	SYNC	SYNC	SYNC or ASYNC
Disk write acknowledgment	AFFIRM	AFFIRM	AFFIRM or NOAFFIRM
Standby redo logs	Required	Required	Required with LGWR, not with ARCH
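
The table can also be read as a set of rules. The following is a minimal sketch that transcribes it into a lookup and checks a proposed destination against it. The names `ALLOWED`, `check_destination` and `needs_standby_redo_logs` are made up for illustration and are not part of any Oracle tool.

```python
# Allowed redo transport settings per protection mode, transcribed from the
# table above. All names here are illustrative, not an Oracle API.
ALLOWED = {
    "maximum protection": {
        "process": {"lgwr"}, "network": {"sync"}, "ack": {"affirm"},
    },
    "maximum availability": {
        "process": {"lgwr"}, "network": {"sync"}, "ack": {"affirm"},
    },
    "maximum performance": {
        "process": {"lgwr", "arch"},
        "network": {"sync", "async"},
        "ack": {"affirm", "noaffirm"},
    },
}


def check_destination(mode, process, network, ack):
    """Return the settings that the table does not allow for `mode`."""
    rules = ALLOWED[mode.lower()]
    chosen = {
        "process": process.lower(),
        "network": network.lower(),
        "ack": ack.lower(),
    }
    return [f"{key}={value}" for key, value in chosen.items()
            if value not in rules[key]]


def needs_standby_redo_logs(process):
    """Per the table, standby redo logs are needed when LGWR ships the redo."""
    return process.lower() == "lgwr"


# LGWR ASYNC NOAFFIRM is only acceptable in Maximum performance mode.
print(check_destination("Maximum availability", "LGWR", "ASYNC", "NOAFFIRM"))
print(check_destination("Maximum performance", "LGWR", "ASYNC", "NOAFFIRM"))
```

This is why the second solution has two parts: switching the destination to ASYNC only fits the table once the protection mode is Maximum performance.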

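The root cause is the standby's poor I/O. In MAXIMUM AVAILABILITY mode, redo is shipped with SYNC, which requires confirmation that the standby's disk write succeeded, so a slow standby drags down the primary.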

2. Here is what the documentation says about SYNC and ASYNC

http://docs.oracle.com/cd/B10501_01/server.920/a96653/log_arch_dest_param.htm#77394

SYNC=PARALLEL
SYNC=NOPARALLEL

The SYNC attribute specifies that network I/O is to be performed synchronously for the destination, which means that once the I/O is initiated, the archiving process waits for the I/O to complete before continuing. The SYNC attribute is one requirement for setting up a no-data-loss environment, because it ensures that the redo records were successfully transmitted to the standby site before continuing.

If the log writer process is defined to be the transmitter to multiple standby destinations that use the SYNC attribute, the user has the option of specifying SYNC=PARALLEL or SYNC=NOPARALLEL for each of those destinations.

- If SYNC=NOPARALLEL is used, the log writer process performs the network I/O to each destination in series. In other words, the log writer process initiates an I/O to the first destination and waits until it completes before initiating the I/O to the next destination. Specifying the SYNC=NOPARALLEL attribute is the same as specifying the ASYNC=0 attribute.

- If SYNC=PARALLEL is used, the network I/O is initiated asynchronously, so that I/O to multiple destinations can be initiated in parallel. However, once the I/O is initiated, the log writer process waits for each I/O operation to complete before continuing. This is, in effect, the same as performing multiple, synchronous I/O operations simultaneously. The use of SYNC=PARALLEL is likely to perform better than SYNC=NOPARALLEL.

Because the PARALLEL and NOPARALLEL qualifiers only make a difference if multiple destinations are involved, Oracle Corporation recommends that all destinations use the same value.
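
To make the difference between the two qualifiers concrete, here is a rough model, not a measurement: it assumes each SYNC destination has a fixed round-trip time, and the millisecond figures are invented. With NOPARALLEL the log writer waits for the destinations one after another, so the waits add up; with PARALLEL it waits for all of them at once, so the slowest one sets the total.

```python
def lgwr_wait_noparallel(latencies_ms):
    """Serial network I/O: the waits for each destination add up."""
    return sum(latencies_ms)


def lgwr_wait_parallel(latencies_ms):
    """Concurrent network I/O: the slowest destination decides the wait."""
    return max(latencies_ms, default=0)


# Hypothetical round-trip times for three SYNC destinations, one of them slow.
latencies = [2, 3, 40]
print(lgwr_wait_noparallel(latencies))  # 45
print(lgwr_wait_parallel(latencies))    # 40
```

Either way, a single slow destination dominates the wait; PARALLEL only removes the cost of the faster ones.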

ASYNC[=blocks]

The ASYNC attribute specifies that network I/O is to be performed asynchronously for the destination. Once the I/O is initiated, the log writer continues processing the next request without waiting for the I/O to complete and without checking the completion status of the I/O. Use of the ASYNC attribute allows standby environments to be maintained with little or no performance effect on the primary database. The optional block count determines the size of the SGA network buffer to be used. In general, the slower the network connection, the larger the block count should be. Also, specifying the ASYNC=0 attribute is the same as specifying the SYNC=NOPARALLEL attribute.

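A careful reading of the documentation boils down to the following:
SYNC: once the I/O has been initiated, the primary can continue only after the standby reports that the I/O completed successfully. If the standby's I/O is slow, the primary slows down with it.
ASYNC: no I/O confirmation is needed. Once the primary has initiated the I/O it moves on to the next task, so the speed of the standby's writes does not affect the primary.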
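
These two points can be sketched as a toy latency model. It is a simplification under stated assumptions: the local redo write and the shipment to the standby are assumed to overlap, the function name `commit_wait_ms` is hypothetical, and the numbers are invented rather than taken from the customer's system.

```python
def commit_wait_ms(local_write_ms, transport,
                   network_rtt_ms=0.0, standby_write_ms=0.0):
    """Estimate how long the primary waits before it can continue.

    Assumption: the local write and the remote shipment run concurrently,
    so with SYNC the slower of the two paths decides the wait.
    """
    if transport == "async":
        # The primary does not wait for the standby at all.
        return local_write_ms
    if transport == "sync":
        remote_ms = network_rtt_ms + standby_write_ms
        return max(local_write_ms, remote_ms)
    raise ValueError(f"unknown transport: {transport!r}")


# A fast primary (1 ms local write) paired with a saturated standby disk.
print(commit_wait_ms(1.0, "sync", network_rtt_ms=0.5, standby_write_ms=50.0))
print(commit_wait_ms(1.0, "async", network_rtt_ms=0.5, standby_write_ms=50.0))
```

In this model the SYNC wait tracks the standby's disk, while the ASYNC wait stays at the primary's own write time.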

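3. With these two concepts clear, let us go back to the customer's problem.
The customer has three standbys, and the standby server behind LOG_ARCHIVE_DEST_3 is underpowered. During the busier periods, the OSWatcher logs show that standby's I/O utilization at 100%.
That confirms the problem: there is a large performance gap between the standby server and the primary, and with LGWR SYNC transport the standby's I/O comes under heavy pressure.
On top of that, the primary has to wait for the standby to confirm that the redo transfer is complete before it can continue, so the primary's performance suffers badly.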

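4. In summary, the standby should not be much weaker than the primary; it should deliver at least 70~80% of the primary's performance. Otherwise the standby simply cannot take over the primary's workload after a switchover or failover.
A weak standby also affects the primary's performance during day-to-day redo transport.
At this point you may be wondering: isn't Maximum availability supposed to switch to Maximum performance automatically? So how can it still hurt performance?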

5. With that question in mind, let us start with the definition:

Maximum availability -- This protection mode provides the highest level of data protection that is possible without compromising the availability of the primary database. Like maximum protection mode, a transaction will not commit until the redo needed to recover that transaction is written to the local online redo log and to the standby redo log of at least one transactionally consistent standby database. Unlike maximum protection mode, the primary database does not shut down if a fault prevents it from writing its redo stream to a remote standby redo log. Instead, the primary database operates in maximum performance mode until the fault is corrected, and all gaps in redo log files are resolved. When all gaps are resolved, the primary database automatically resumes operating in maximum availability mode.
This mode ensures that no data loss will occur if the primary database fails, but only if a second fault does not prevent a complete set of redo data from being sent from the primary database to at least one standby database.

Put plainly: a commit waits until its redo is in both the local online redo log and the standby redo log of at least one standby. A fault on the standby side does not shut the primary down; the primary runs as if in Maximum performance mode until the fault is cleared and the redo gaps are resolved, then returns to Maximum availability on its own.

The no-data-loss guarantee holds only as long as a second fault has not already kept a complete set of redo from reaching at least one standby.

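In Maximum availability mode, as long as the connection to the standby is healthy, the primary behaves as it does in Maximum protection mode: a transaction commits only once its redo has been written on both the primary and the standby. If the standby loses contact with the primary, the primary automatically switches to running in Maximum performance mode, which keeps the primary as available as possible.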

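Did you notice? "If the standby loses contact with the primary": the words "loses contact" are what matter. In the case described here the connection was perfectly healthy. The standby's I/O was slow, but it never stopped responding altogether.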
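
The distinction can be sketched as a simple decision rule. This is an illustration of the reasoning only, not Oracle's actual logic: it assumes the primary gives up on a destination only when no acknowledgment arrives within some timeout. Oracle's LOG_ARCHIVE_DEST_n does have a NET_TIMEOUT attribute for that kind of limit, but the article does not say how the customer had it set, so the 30-second value below is a placeholder.

```python
def primary_behaviour(ack_ms, net_timeout_ms):
    """Return (state, wait_ms) for one redo shipment.

    ack_ms is the time the standby took to acknowledge, or None if it
    never answered. Both the rule and the names are illustrative.
    """
    if ack_ms is None or ack_ms > net_timeout_ms:
        # Contact is considered lost: stop waiting for this destination.
        return ("unsynchronized", net_timeout_ms)
    # The standby answered in time, however slowly: keep waiting for it.
    return ("synchronous", ack_ms)


TIMEOUT_MS = 30_000  # placeholder value, see the note above

print(primary_behaviour(800, TIMEOUT_MS))   # slow but alive
print(primary_behaviour(None, TIMEOUT_MS))  # no answer at all
```

A standby that answers in 800 ms never trips the rule, so in this model the primary stays synchronous and pays that wait on every shipment.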
