Oracle RAC Study Notes: RAC Ping in OPS and RAC Cache Fusion

Published: 2020-08-27 09:32:46  Source: Internet  Author: 客居天涯  Category: Relational databases


1. RAC Ping in OPS

2. Cache Fusion in RAC

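What is Cache Fusion?
Cache Fusion transfers blocks between the SGAs of the cluster nodes over the interconnect. It replaces the inefficient approach used by OPS, in which a block first had to be pushed to disk and then read back into the other instance's cache. When a block is read into the cache of an instance in a RAC environment, the block is assigned a lock resource (distinct from a row-level lock) so that other instances know the block is in use. If another instance later requests a copy of a block that is already in the first instance's cache, the block is shipped over the interconnect directly into the requesting instance's SGA. If a reader requests the block and the in-memory copy has been changed but the change is not yet committed, a CR (consistent read) copy is shipped instead. This means that, whenever possible, data blocks move between instance caches without being written back to disk, avoiding the extra I/O otherwise needed to keep multiple instance caches synchronized. Clearly, different instances can cache different data: when an instance needs a block it has never accessed before, it obtains it either from another instance through Cache Fusion or by reading it from disk.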
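To make the I/O difference concrete, here is a minimal Python sketch of one block moving between instances. It is a toy model, not Oracle code: the function name `simulate` and the request format are hypothetical, CR copies and lock messages are ignored, and it only counts disk reads, disk writes, and interconnect transfers under OPS-style pinging versus Cache Fusion.

```python
def simulate(requests, cache_fusion):
    """Count I/O for one block accessed by several instances.

    requests: list of (instance, op) tuples, op being "read" or "write".
    cache_fusion: False models OPS pinging (a dirty block is written to disk
    and re-read by the requester); True models Cache Fusion (the block is
    shipped from the holder's SGA over the interconnect).
    """
    holders = set()   # instances with a valid copy of the block
    dirty = False     # True if the latest version is not yet on disk
    io = {"disk_reads": 0, "disk_writes": 0, "interconnect_transfers": 0}
    for inst, op in requests:
        if inst not in holders:
            if not holders:
                io["disk_reads"] += 1              # nobody caches it yet
            elif cache_fusion:
                io["interconnect_transfers"] += 1  # SGA-to-SGA transfer
            else:
                if dirty:
                    io["disk_writes"] += 1         # holder pings block to disk
                    dirty = False
                io["disk_reads"] += 1              # requester re-reads it
            holders.add(inst)
        if op == "write":
            holders = {inst}   # a change invalidates the other copies
            dirty = True
    return io


reqs = [("A", "write"), ("B", "write"), ("A", "write"), ("B", "read")]
print(simulate(reqs, cache_fusion=False))
# {'disk_reads': 4, 'disk_writes': 3, 'interconnect_transfers': 0}
print(simulate(reqs, cache_fusion=True))
# {'disk_reads': 1, 'disk_writes': 0, 'interconnect_transfers': 3}
```

With this access pattern the pinging model needs a disk write and a disk read every time the dirty block changes hands, while the Cache Fusion model reads from disk once and then moves the block over the interconnect.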
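Several questions are worth thinking about here:
1. When no instance has read the block yet and the first instance reads it, how is it locked, and with what kind of lock? If another instance wants to read the same block at almost the same moment, how does Oracle arbitrate, letting one instance read it and the other obtain it from the first instance's cache?
2. If a block has already been read by another instance, how does the local instance find out that it exists?
3. If an instance changes the block, are the changes propagated to other instances? In other words, do the other instances learn about it and update their state?
4. If an instance wants to swap a block out of its cache while other instances also cache it (modified or unmodified, changed by this instance or by others), what happens? How do TRUNCATE TABLE, DROP TABLE, and similar operations differ from a single-instance database?
5. How should applications be designed so that RAC actually pays off, instead of introducing contention that weakens the system?
6. How locks are implemented in RAC. Locks are resources held in each instance's SGA and are normally used to control access to database blocks. Each instance normally holds, or controls, a number of locks associated with ranges of blocks. When an instance requests a block, it must obtain a lock for it, and the lock must come from the instance that currently controls it. In other words, locks are distributed across instances, and a particular lock has to be obtained from whichever instance owns it. These locks are not fixed to one instance, however: depending on how often they are requested, they can be moved to the instance that uses them most, which improves efficiency.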
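Point 6 can be illustrated with a toy lock directory in Python. This is a sketch under stated assumptions, not Oracle's mechanism: the class name, the per-instance request counter, and the `threshold` that triggers a move are all hypothetical; it only shows the idea that each block's lock has one controlling instance and that control migrates toward the instance requesting it most.

```python
from collections import Counter, defaultdict


class LockDirectory:
    """Toy model of distributed lock mastering with frequency-based moves.

    Each block's lock is controlled by one master instance. Requests are
    counted per instance; once another instance reaches `threshold` requests
    since the last move, mastership migrates to it. The threshold and all
    names are hypothetical; Oracle's real policy is internal.
    """

    def __init__(self, initial_masters, threshold=3):
        self.master = dict(initial_masters)    # block -> master instance
        self.threshold = threshold
        self.counts = defaultdict(Counter)     # block -> requests per instance

    def request(self, instance, block):
        """Record a lock request and return the instance it was sent to."""
        master = self.master[block]
        self.counts[block][instance] += 1
        top, n = self.counts[block].most_common(1)[0]
        if top != master and n >= self.threshold:
            self.master[block] = top           # move lock to busiest user
            self.counts[block].clear()
        return master


d = LockDirectory({"blk1": "inst1"})
for _ in range(3):
    d.request("inst2", "blk1")
print(d.master["blk1"])  # inst2
```

After three requests from `inst2`, control of `blk1` moves there, so later requests from `inst2` no longer need to go to another instance.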
These questions are easier to study together with two further concepts: the Global Cache Service and the Global Enqueue Service.

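Global Cache Service (GCS):

The global cache deals with data blocks. GCS maintains cache coherency across the global buffer cache: whenever an instance wants to modify a data block, it must obtain a global lock resource, which prevents another instance from modifying the same block at the same time. The modifying instance holds the current version of the block (including both committed and uncommitted transactions) as well as the past image of the block. If another instance also requests the block, GCS tracks which instance holds it, which version of the block that instance holds, and in what mode. The LMS process is a key component of GCS.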
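Global Enqueue Service (GES):

Global Enqueue Service (GES) tracks the status of all Oracle enqueuing mechanisms. It is mainly responsible for keeping the dictionary cache and the library cache consistent. The dictionary cache is the copy of data dictionary information held in an instance's SGA for fast access. Because this information lives in memory, a dictionary change made on one node (for example by DDL) must be propagated immediately to the dictionary caches on all nodes; GES handles this and removes any discrepancies between instances. For the same reason, library cache locks are taken out on database objects while SQL statements that reference those objects are parsed. These locks must be maintained across instances, and GES must ensure that instances requesting access to the same objects do not deadlock. The LMON, LCK, and LMD processes work together to provide GES. Apart from the maintenance and management of the data blocks themselves (handled by GCS), GES is the key service that coordinates all other resources between nodes in a RAC environment.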
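Detecting deadlocks between instances comes down to finding a cycle in a wait-for graph whose nodes may live on different instances. The Python sketch below is a generic depth-first cycle search, not Oracle's LMD algorithm; the function name and the node naming scheme ("inst1:s12" for session 12 on instance 1) are hypothetical.

```python
def find_deadlock(wait_for):
    """Return one deadlock cycle in a global wait-for graph, or None.

    wait_for maps each waiter to the holders it waits on, for example
    {"inst1:s12": {"inst2:s40"}}. Waiters and holders may belong to
    different instances, which is why the graph has to be assembled globally.
    """
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited / on current path / done
    color = {}
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for nxt in wait_for.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GREY:                             # back edge: cycle
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in list(wait_for):
        if color.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


g = {"inst1:s12": {"inst2:s40"}, "inst2:s40": {"inst1:s12"}}
print(find_deadlock(g))                          # ['inst1:s12', 'inst2:s40', 'inst1:s12']
print(find_deadlock({"a": {"b"}, "b": {"c"}}))   # None
```

Neither instance can see this cycle from its local waits alone, which is why deadlock detection has to be a global service.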
Viewing GCS and GES statistics:
SQL> set linesize 1000
SQL> select * from gv$sysstat where name like 'gcs %';

   INST_ID STATISTIC# NAME                                CLASS      VALUE    STAT_ID
---------- ---------- ------------------------------ ---------- ---------- ----------
         1         44 gcs messages sent                      32       5981 2765451804
         2         44 gcs messages sent                      32       3632 2765451804

SQL> select * from gv$sysstat where name like 'ges %';

   INST_ID STATISTIC# NAME                                CLASS      VALUE    STAT_ID
---------- ---------- ------------------------------ ---------- ---------- ----------
         1         45 ges messages sent                      32       3760 1145425433
         2         45 ges messages sent                      32       4447 1145425433

The output shows how many GCS and GES messages each instance has sent.
(If the database was not created with DBCA, run the CATCLUST.SQL script with SYSDBA privileges to create the RAC-related views and tables.)
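Interconnect configuration requirements in RAC:
How should the interconnect be configured so that it performs well?
1. Hardware: a gigabit network
2. Parameter settings
net.core.rmem_max: the maximum socket receive buffer size (on Linux, RAC interconnect traffic normally uses UDP, so this limit is not specific to TCP)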
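On Linux the current value can be read with `sysctl net.core.rmem_max` or from `/proc/sys/net/core/rmem_max`. The small Python helper below does the same check from a script. The helper names are hypothetical, and the target value is deliberately left as a parameter, since the recommended setting depends on the installation guide for your Oracle release and platform.

```python
from pathlib import Path

RMEM_MAX_PATH = "/proc/sys/net/core/rmem_max"  # Linux procfs entry


def read_rmem_max(path=RMEM_MAX_PATH):
    """Return the kernel's current net.core.rmem_max value in bytes."""
    return int(Path(path).read_text().strip())


def rmem_max_ok(target_bytes, path=RMEM_MAX_PATH):
    """True if the current limit is at least target_bytes.

    target_bytes should come from your release's installation guide;
    no particular value is assumed here.
    """
    return read_rmem_max(path) >= target_bytes


if __name__ == "__main__" and Path(RMEM_MAX_PATH).exists():
    print("net.core.rmem_max =", read_rmem_max())
```

The `path` parameter exists so the check can be pointed at a copy of the file, for example when testing on a machine that is not a cluster node.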

Cache Fusion

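Cache Fusion provides scalable block transfer: it ships block images between instances, tracks the current location and state of each resource, and keeps this resource information in a directory structure in each instance's SGA.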

GRD: Global Resource Directory

GES and GCS together maintain the Global Resource Directory (GRD). The GRD is like an in-memory database that contains details about all the blocks present in the cache. It knows where the latest version of a block is, what the block's mode is, what its role is (modes and roles are discussed shortly), and so on. Whenever a user asks for a data block, GCS gets all the information it needs from the GRD. The GRD is a distributed resource, meaning that each instance maintains part of it. This distributed nature is key to the fault tolerance of RAC. The GRD is stored in the SGA.

Typically, the GRD contains the following information, among other things:

（1）Data Block Address: the address of the data block being modified

（2）Location of most current version of data block

（3）Modes of data block

（4）Roles of data block

（5）SCN number of data block

（6）Image of the data block: either the current image or a past image
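A GRD record with the fields listed above, and the idea that each instance stores only the part of the GRD it masters, can be sketched in Python. The class names, field names, and the placement rule (DBA modulo the number of instances) are hypothetical; Oracle's actual in-memory layout and hashing are internal.

```python
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    NULL = "N"
    SHARED = "S"
    EXCLUSIVE = "X"


class Role(Enum):
    LOCAL = "local"
    GLOBAL = "global"


@dataclass
class GrdEntry:
    """One GRD record holding the fields listed above (hypothetical layout)."""
    dba: int                                      # (1) data block address
    current_holder: str                           # (2) where the latest version is
    modes: dict = field(default_factory=dict)     # (3) instance -> Mode
    role: Role = Role.LOCAL                       # (4) block role
    scn: int = 0                                  # (5) block SCN
    past_image_holders: list = field(default_factory=list)  # (6) instances with a PI


class DistributedGrd:
    """The GRD is split across instances: each stores only what it masters."""

    def __init__(self, instances):
        self.instances = list(instances)
        self.shards = {name: {} for name in self.instances}

    def master_of(self, dba):
        # Hypothetical placement rule: DBA modulo the instance count.
        return self.instances[dba % len(self.instances)]

    def put(self, entry):
        self.shards[self.master_of(entry.dba)][entry.dba] = entry

    def lookup(self, dba):
        """Ask the master instance's shard for the block; None if untracked."""
        return self.shards[self.master_of(dba)].get(dba)


grd = DistributedGrd(["A", "B", "C", "D"])
grd.put(GrdEntry(dba=7, current_holder="C", modes={"C": Mode.SHARED}, scn=1000))
print(grd.master_of(7))            # D
print(grd.lookup(7).modes["C"])    # Mode.SHARED
```

Because every instance holds only its own shard, losing one instance loses only the entries it mastered, which the surviving instances can rebuild.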

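The Global Resource Directory is managed by the Global Cache Service.
It records each resource's mode and role and the state of each block in each instance; it also distributes resource masters across the active nodes and redistributes them when necessary (for example, when an instance starts up or shuts down).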

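Global Cache Service MODE and ROLE:
1. Resource mode
null (N) (the default)
shared (S) (for queries)
exclusive (X) (the block contents can be changed; other instances hold the resource in null mode)
2. Resource role
local:
the initial role when a resource is first requested; only one instance can hold a dirty copy of the block
global:
when the block becomes dirty in more than one instance, the role changes from local to global, and the block can then be written to disk only under the coordination of the Global Cache Service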
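The mode rules above amount to a small compatibility matrix. The Python sketch below encodes it, together with the local/global role rule as stated in the text. It is an illustration under those assumptions; the names `COMPATIBLE`, `can_grant`, and `role` are hypothetical.

```python
# Mode compatibility between different instances for the same block.
# Keys are (mode held by another instance, mode requested).
COMPATIBLE = {
    ("N", "N"): True,  ("N", "S"): True,  ("N", "X"): True,
    ("S", "N"): True,  ("S", "S"): True,  ("S", "X"): False,
    ("X", "N"): True,  ("X", "S"): False, ("X", "X"): False,
}


def can_grant(requested, held_by_others):
    """Grant a mode only if it is compatible with every mode that other
    instances currently hold on the block."""
    return all(COMPATIBLE[(held, requested)] for held in held_by_others)


def role(instances_with_dirty_versions):
    """Local while at most one instance has a dirty version of the block;
    global once dirty versions exist in more than one instance."""
    return "global" if instances_with_dirty_versions > 1 else "local"


print(can_grant("S", ["S", "N"]))  # True: readers can share
print(can_grant("X", ["S"]))       # False: the S holder must drop to N first
print(role(2))                     # global
```

This is why an exclusive request forces every other holder down to null mode before it can be granted.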

Past Images

The Past Image concept was introduced in Oracle 9i to maintain data integrity. In an Oracle database, a block is typically not written to disk immediately after it is dirtied; this reduces excessive I/O. When the same dirty block is requested by another instance for write or read purposes, an image of the block is created in the owning instance and the block is then shipped to the requesting instance. This image copy of the block is called a Past Image (PI). In the event of a failure, Oracle can reconstruct the block by reading PIs. A block can also have more than one PI, depending on how many times it was requested while dirty.

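A past image of a block is different from a CR (consistent read) image. A CR image is built by applying undo data to a copy of the block to reconstruct it as of an earlier point in time, whereas a PI is a copy of the dirty current block kept for write-back and recovery.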

“Juggling” Data with Multiple Past Images

（1）Multiple Past Image versions of a data block may be kept by different instances

（2）Upon a checkpoint, only the current image is written to disk; Past Images are discarded

（3）In the event of a failure, current version of block can be reconstructed from PIs

（4）Since PIs are kept in memory, they aid in avoiding frequent disk writes

（5）This avoids “disk pinging” experienced with 8i OPS due to frequent writes to disk

（6）Data is “juggled” in memory, without touching down on the disk
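The juggling described in this list can be traced with a toy Python model of one block. The class and method names are hypothetical, versions are simplified to (instance, SCN) pairs, and recovery is reduced to picking the newest PI as the starting version onto which redo would be applied.

```python
class BlockVersions:
    """Toy model of one block juggled between instances (hypothetical names).

    Versions are (instance, scn) pairs. No disk write happens when the block
    moves; the sender simply keeps a past image (PI).
    """

    def __init__(self, first_writer, scn):
        self.current = (first_writer, scn)  # current image and its SCN
        self.past_images = []               # PIs kept in other instances' caches
        self.disk_scn = 0                   # SCN of the copy on disk
        self.disk_writes = 0

    def ship_for_write(self, requester, new_scn):
        """Holder keeps a PI and ships the current image to the requester,
        which then modifies it (producing new_scn)."""
        self.past_images.append(self.current)
        self.current = (requester, new_scn)

    def recovery_base(self):
        """If the current holder fails, the newest PI is the most recent
        version still in memory; redo would be applied on top of it."""
        return max(self.past_images, key=lambda pi: pi[1], default=None)

    def checkpoint(self):
        """Only the current image is written; the PIs can then be discarded."""
        self.disk_scn = self.current[1]
        self.disk_writes += 1
        self.past_images.clear()


b = BlockVersions("A", 100)
b.ship_for_write("B", 110)
b.ship_for_write("C", 120)
print(b.past_images, b.disk_writes)  # [('A', 100), ('B', 110)] 0
print(b.recovery_base())             # ('B', 110)
b.checkpoint()
print(b.disk_scn, b.past_images)     # 120 []
```

Two transfers cost no disk writes at all; the single checkpoint write replaces what would have been a ping for every transfer in 8i OPS.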

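Cache Fusion block transfers

Example: there are four nodes, A, B, C, and D.

1. Read with no transfer

Node C needs to read a block from a file on the shared disk, so it sends a request to the Global Cache Service. The request is directed to node D, the master of this block (every resource has a master). GCS grants the resource in Shared mode with a Local role, records its state in the directory (which is on node D), and notifies C, which changes its mode for the resource from Null to Shared. C then starts the I/O; C now holds the block in Shared mode, read from the disk file.
2. Read to write transfer
B also wants the block, and not only to read it: it wants to change its contents. B sends a request to the GCS on D (the block's master). GCS asks C to ship the block to B; C does so, and on receiving it B informs GCS. B can now modify the block.
3. Write to write transfer
A sends a request to the GCS on node D. GCS tells node B to relinquish its exclusive lock and ship the current image to A; if the request cannot be completed immediately, it is placed in the GCS queue. Before shipping the block to A, B must write its redo, forcing a log flush, and it changes its mode to Null. It then sends the block to A and tells A that the exclusive resource is available. When A receives the block image, it notifies GCS that it now holds the block in Exclusive mode. From this point on, B can no longer operate on the block, even though a copy of it (a past image) remains in B's buffer cache.
4. Write to read transfer
C wants to read the block. It first sends a request to D (the master), and GCS asks A to ship the block to C. A receives the request and carries it out; this may require A to write redo and flush the log before sending the block. A downgrades its Exclusive lock to Shared mode. C takes the SCN from the block it received from A and builds a resource assumption message, which GCS uses to update the Global Resource Directory.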
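The four transfers can be replayed with a toy state machine. This Python sketch is an illustration only: the class and method names are hypothetical, it tracks a single block, and it adds two simplifying assumptions the text does not state (shared holders drop to null mode when a writer takes the block, and every transfer of a dirty block counts as one log flush).

```python
class Gcs:
    """Toy GCS state for one block, kept by its master (node D in the text).

    mode maps instance -> "N", "S" or "X". Assumptions not in the text:
    shared holders drop to N when a writer takes the block, and every
    transfer of a dirty block costs one log flush.
    """

    def __init__(self, instances):
        self.mode = {i: "N" for i in instances}
        self.role = "local"
        self.past_images = set()
        self.log_flushes = 0

    def _holders(self, mode, requester):
        return [i for i, m in self.mode.items() if m == mode and i != requester]

    def request(self, requester, want):
        """want is "S" (read) or "X" (write); returns where the block came from."""
        x_holders = self._holders("X", requester)
        s_holders = self._holders("S", requester)
        if x_holders:
            holder = x_holders[0]
            self.log_flushes += 1            # redo flushed before shipping
            if want == "S":
                self.mode[holder] = "S"      # write-to-read: X downgraded to S
            else:
                self.mode[holder] = "N"      # write-to-write: holder keeps a PI
                self.past_images.add(holder)
                self.role = "global"         # dirty versions in two instances
            source = holder
        elif s_holders:
            source = s_holders[0]            # shipped from a reader's cache
            if want == "X":
                for i in s_holders:
                    self.mode[i] = "N"
        else:
            source = "disk"                  # read with no transfer
        self.mode[requester] = want
        return source


g = Gcs("ABCD")
print(g.request("C", "S"))  # disk  (1. read with no transfer)
print(g.request("B", "X"))  # C     (2. read to write transfer)
print(g.request("A", "X"))  # B     (3. write to write transfer)
print(g.request("C", "S"))  # A     (4. write to read transfer)
print(g.mode)               # {'A': 'S', 'B': 'N', 'C': 'S', 'D': 'N'}
```

Only the first step touches the disk; the role turns global at step 3, when B keeps a past image while A holds the current block.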

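Cache Fusion can be turned off by setting the parameter gc_files_to_locks (for the data files it covers). Access then works as in 8i OPS: before another node can access a data block, it must wait for the node holding the block to write it back to the data file.
Remastering of cache resources:
When a cache resource no longer needs to be mastered on one node, dynamic remastering can move it to a different node. GCS and GES both use dynamic remastering: resources are redistributed after a new instance joins the active set, and again after an instance leaves it.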
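Redistribution after a membership change can be sketched as a small rebalancing function in Python. The function name and the balancing rule are hypothetical, not Oracle's algorithm: resources whose master has left go to the least-loaded survivors, then resources move from the busiest to the idlest instance until loads differ by at most one, and every other resource keeps its master.

```python
def rebalance(masters, active):
    """Redistribute resource mastership after the active set changes.

    masters: resource -> master instance; active: instances now in the set.
    Hypothetical rule, for illustration only.
    """
    active = sorted(active)
    new = dict(masters)
    load = {inst: [] for inst in active}
    for res in sorted(new):
        if new[res] in load:
            load[new[res]].append(res)
    # 1. Resources whose master left go to the least-loaded instance.
    for res in sorted(r for r, m in new.items() if m not in load):
        inst = min(active, key=lambda i: len(load[i]))
        load[inst].append(res)
        new[res] = inst
    # 2. Move resources from the busiest to the idlest instance until loads
    #    differ by at most one; all other resources keep their master.
    while True:
        busiest = max(active, key=lambda i: len(load[i]))
        idlest = min(active, key=lambda i: len(load[i]))
        if len(load[busiest]) - len(load[idlest]) <= 1:
            break
        res = load[busiest].pop()
        load[idlest].append(res)
        new[res] = idlest
    return new


m = {"r1": "A", "r2": "A", "r3": "B", "r4": "B", "r5": "C", "r6": "C"}
print(rebalance(m, ["A", "B"]))            # C left: r5 -> A, r6 -> B
print(rebalance(m, ["A", "B", "C", "D"]))  # D joined: only r2 moves, to D
```

Keeping most assignments in place limits how much of the directory has to be rebuilt when an instance joins or leaves.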

Cache Fusion diagrams:

Read/Read Cache Fusion – GCS Processing


Write/Write Cache Fusion – GCS Processing


Blocks to Disk – GCS Processing


Online Instance Recovery Steps

The steps are as follows:

（1）Instance Failure detected by Cluster Manager and GCS

（2）Reconfiguration of GES resources (enqueues); global resource directory is frozen

（3）Reconfiguration of GCS resources; involves redistribution among surviving instances

（4）One of the surviving instances becomes the “recovering instance”

（5）The SMON process of the recovering instance starts the first pass, reading the failed instance's redo log thread

（6）SMON finds BWRs (block written records) in the redo and removes the corresponding blocks from consideration, since their PIs have already been written to disk

（7）SMON prepares a recovery set of the blocks that were modified by the failed instance but not written to disk

（8）Entries in the recovery list are sorted by first dirty SCN

（9）SMON informs each block’s master node to take ownership of the block for recovery

（10）Second pass of log read begins.

（11）Redo is applied to the data files.

（12）Global Resource Directory is unfrozen
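Steps (5) to (11) amount to a two-pass scan of the failed instance's redo. The Python sketch below models only those two passes, under simplifying assumptions: redo records are hypothetical (kind, dba, scn) tuples in SCN order, a BWR cancels earlier changes to its block, and a dictionary of block SCNs stands in for the data files. Freezing the GRD and claiming blocks from their masters are not modeled.

```python
def build_recovery_set(redo):
    """First pass over the failed instance's redo thread.

    redo: records in SCN order, each ("change", dba, scn) for a block change
    or ("bwr", dba, scn) for a block written record. A BWR means the block
    had reached disk, so earlier changes to it need no recovery.
    Returns the DBAs still needing recovery, sorted by first dirty SCN.
    """
    first_dirty = {}
    for kind, dba, scn in redo:
        if kind == "change":
            first_dirty.setdefault(dba, scn)
        elif kind == "bwr":
            first_dirty.pop(dba, None)
    return sorted(first_dirty, key=first_dirty.get)


def apply_redo(redo, recovery_set, block_scn):
    """Second pass: re-read the redo and apply changes to blocks in the set.

    block_scn: dba -> SCN of the block version on disk (a stand-in for the
    data files). A change is applied only if it is newer than that version.
    """
    targets = set(recovery_set)
    for kind, dba, scn in redo:
        if kind == "change" and dba in targets and scn > block_scn.get(dba, 0):
            block_scn[dba] = scn
    return block_scn


redo = [("change", 1, 100), ("change", 3, 101), ("change", 2, 105),
        ("bwr", 1, 110), ("change", 1, 120)]
rs = build_recovery_set(redo)
print(rs)                                           # [3, 2, 1]
print(apply_redo(redo, rs, {1: 110, 2: 90, 3: 95}))  # {1: 120, 2: 105, 3: 101}
```

Block 1 stays in the recovery set only because of the change after its BWR, so recovery starts it from SCN 120 rather than 100, which is the point of scanning for BWRs in the first pass.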
