PostgreSQL Source Code Walkthrough (112) - WAL#8 (XLogCtl Data Structures)

Published: 2020-08-15 05:08:36  Source: ITPUB Blog  Views: 235  Author: husthxd  Category: Relational Databases

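This section briefly introduces the XLOG data structures that are shared globally (across all processes): XLogCtlData and XLogCtlInsert. These two structs hold important state such as the REDO point and the locks that protect WAL insertion.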

I. Data Structures

XLogCtlInsert
Shared state used when inserting WAL records.

/*
 * Shared state data for WAL insertion.
 */
typedef struct XLogCtlInsert
{
    slock_t     insertpos_lck;  /* protects CurrBytePos and PrevBytePos */

    /*
     * CurrBytePos is the end of reserved WAL. The next record will be
     * inserted at that position. PrevBytePos is the start position of the
     * previously inserted (or rather, reserved) record - it is copied to the
     * prev-link of the next record. These are stored as "usable byte
     * positions" rather than XLogRecPtrs (see XLogBytePosToRecPtr()).
     */
    uint64      CurrBytePos;
    uint64      PrevBytePos;

    /*
     * Make sure the above heavily-contended spinlock and byte positions are
     * on their own cache line. In particular, the RedoRecPtr and full page
     * write variables below should be on a different cache line. They are
     * read on every WAL insertion, but updated rarely, and we don't want
     * those reads to steal the cache line containing Curr/PrevBytePos.
     */
    char        pad[PG_CACHE_LINE_SIZE];

    /*
     * fullPageWrites is the master copy used by all backends to determine
     * whether to write full-page to WAL, instead of using process-local one.
     * This is required because, when full_page_writes is changed by SIGHUP,
     * we must WAL-log it before it actually affects WAL-logging by backends.
     * Checkpointer sets at startup or after SIGHUP.
     *
     * To read these fields, you must hold an insertion lock. To modify them,
     * you must hold ALL the locks.
     */
    XLogRecPtr  RedoRecPtr;     /* current redo point for insertions */
    bool        forcePageWrites;    /* forcing full-page writes for PITR? */
    // are full-page writes enabled?
    bool        fullPageWrites;

    /*
     * exclusiveBackupState indicates the state of an exclusive backup (see
     * comments of ExclusiveBackupState for more details). nonExclusiveBackups
     * is a counter indicating the number of streaming base backups currently
     * in progress. forcePageWrites is set to true when either of these is
     * non-zero. lastBackupStart is the latest checkpoint redo location used
     * as a starting point for an online backup.
     */
    ExclusiveBackupState exclusiveBackupState;
    int         nonExclusiveBackups;
    XLogRecPtr  lastBackupStart;

    /*
     * WAL insertion locks.
     */
    WALInsertLockPadded *WALInsertLocks;
} XLogCtlInsert;
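
The comment on CurrBytePos and PrevBytePos says they are stored as "usable byte positions", that is, positions that do not count WAL page headers. Below is a minimal sketch of how such a position could be mapped back to a WAL location, modeled on the idea behind XLogBytePosToRecPtr(). The constants (8 kB WAL pages, 16 MB segments, 24-byte short and 40-byte long page headers) are assumed defaults, and the function and macro names are illustrative rather than PostgreSQL's own.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed build defaults; none of these values are stated in the article. */
#define XLOG_BLCKSZ        8192u                /* WAL page size */
#define WAL_SEGMENT_SIZE   (16u * 1024 * 1024)  /* WAL segment size */
#define SIZE_OF_SHORT_PHD  24u                  /* header of an ordinary page */
#define SIZE_OF_LONG_PHD   40u                  /* header of a segment's first page */

#define USABLE_BYTES_IN_PAGE (XLOG_BLCKSZ - SIZE_OF_SHORT_PHD)
#define USABLE_BYTES_IN_SEG \
    ((uint64_t) (WAL_SEGMENT_SIZE / XLOG_BLCKSZ) * USABLE_BYTES_IN_PAGE \
     - (SIZE_OF_LONG_PHD - SIZE_OF_SHORT_PHD))

/* Map a usable byte position to a WAL location by re-adding page headers. */
static uint64_t
byte_pos_to_rec_ptr(uint64_t bytepos)
{
    uint64_t fullsegs = bytepos / USABLE_BYTES_IN_SEG;
    uint64_t bytesleft = bytepos % USABLE_BYTES_IN_SEG;
    uint64_t seg_offset;

    if (bytesleft < XLOG_BLCKSZ - SIZE_OF_LONG_PHD)
    {
        /* Fits on the first page of the segment (long header). */
        seg_offset = bytesleft + SIZE_OF_LONG_PHD;
    }
    else
    {
        /* Skip the first page, then count full pages with short headers. */
        uint64_t fullpages;

        bytesleft -= XLOG_BLCKSZ - SIZE_OF_LONG_PHD;
        fullpages = bytesleft / USABLE_BYTES_IN_PAGE;
        bytesleft %= USABLE_BYTES_IN_PAGE;
        seg_offset = XLOG_BLCKSZ + fullpages * XLOG_BLCKSZ
            + bytesleft + SIZE_OF_SHORT_PHD;
    }

    assert(seg_offset < WAL_SEGMENT_SIZE);
    return fullsegs * WAL_SEGMENT_SIZE + seg_offset;
}
```

Under these assumptions, the CurrBytePos value printed by gdb in section II (5494680728) maps to 5510831008, which equals LogwrtRqst.Write in the same output, and PrevBytePos (5494680616) maps to 5510830896, which equals RedoRecPtr.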

XLogCtlData
Total shared-memory state for XLOG.

/*
 * Total shared-memory state for XLOG.
 */
typedef struct XLogCtlData
{
    XLogCtlInsert Insert;       // WAL insertion state

    /* Protected by info_lck: */
    XLogwrtRqst LogwrtRqst;
    XLogRecPtr  RedoRecPtr;     /* a recent copy of Insert->RedoRecPtr */
    uint32      ckptXidEpoch;   /* nextXID & epoch of latest checkpoint */
    TransactionId ckptXid;
    XLogRecPtr  asyncXactLSN;   /* LSN of newest async commit/abort */
    XLogRecPtr  replicationSlotMinLSN;  /* oldest LSN needed by any slot */
    XLogSegNo   lastRemovedSegNo;   /* latest removed/recycled XLOG segment */

    /* Fake LSN counter, for unlogged relations. Protected by ulsn_lck. */
    XLogRecPtr  unloggedLSN;
    slock_t     ulsn_lck;

    /* Time and LSN of last xlog segment switch. Protected by WALWriteLock. */
    pg_time_t   lastSegSwitchTime;
    XLogRecPtr  lastSegSwitchLSN;

    /*
     * Protected by info_lck and WALWriteLock (you must hold either lock to
     * read it, but both to update)
     */
    XLogwrtResult LogwrtResult;

    /*
     * Latest initialized page in the cache (last byte position + 1).
     *
     * To change the identity of a buffer (and InitializedUpTo), you need to
     * hold WALBufMappingLock.  To change the identity of a buffer that's
     * still dirty, the old page needs to be written out first, and for that
     * you need WALWriteLock, and you need to ensure that there are no
     * in-progress insertions to the page by calling
     * WaitXLogInsertionsToFinish().
     */
    XLogRecPtr  InitializedUpTo;

    /*
     * These values do not change after startup, although the pointed-to pages
     * and xlblocks values certainly do.  xlblock values are protected by
     * WALBufMappingLock.
     */
    char       *pages;          /* buffers for unwritten XLOG pages */
    XLogRecPtr *xlblocks;       /* 1st byte ptr-s + XLOG_BLCKSZ */
    int         XLogCacheBlck;  /* highest allocated xlog buffer index */

    /*
     * Shared copy of ThisTimeLineID. Does not change after end-of-recovery.
     * If we created a new timeline when the system was started up,
     * PrevTimeLineID is the old timeline's ID that we forked off from.
     * Otherwise it's equal to ThisTimeLineID.
     */
    TimeLineID  ThisTimeLineID;
    TimeLineID  PrevTimeLineID;

    /*
     * SharedRecoveryInProgress indicates if we're still in crash or archive
     * recovery.  Protected by info_lck.
     */
    bool        SharedRecoveryInProgress;

    /*
     * SharedHotStandbyActive indicates if we allow hot standby queries to be
     * run.  Protected by info_lck.
     */
    bool        SharedHotStandbyActive;

    /*
     * WalWriterSleeping indicates whether the WAL writer is currently in
     * low-power mode (and hence should be nudged if an async commit occurs).
     * Protected by info_lck.
     */
    bool        WalWriterSleeping;

    /*
     * recoveryWakeupLatch is used to wake up the startup process to continue
     * WAL replay, if it is waiting for WAL to arrive or failover trigger file
     * to appear.
     */
    Latch       recoveryWakeupLatch;

    /*
     * During recovery, we keep a copy of the latest checkpoint record here.
     * lastCheckPointRecPtr points to start of checkpoint record and
     * lastCheckPointEndPtr points to end+1 of checkpoint record.  Used by the
     * checkpointer when it wants to create a restartpoint.
     *
     * Protected by info_lck.
     */
    XLogRecPtr  lastCheckPointRecPtr;
    XLogRecPtr  lastCheckPointEndPtr;
    CheckPoint  lastCheckPoint;

    /*
     * lastReplayedEndRecPtr points to end+1 of the last record successfully
     * replayed. When we're currently replaying a record, ie. in a redo
     * function, replayEndRecPtr points to the end+1 of the record being
     * replayed, otherwise it's equal to lastReplayedEndRecPtr.
     */
    XLogRecPtr  lastReplayedEndRecPtr;
    TimeLineID  lastReplayedTLI;
    XLogRecPtr  replayEndRecPtr;
    TimeLineID  replayEndTLI;
    /* timestamp of last COMMIT/ABORT record replayed (or being replayed) */
    TimestampTz recoveryLastXTime;

    /*
     * timestamp of when we started replaying the current chunk of WAL data,
     * only relevant for replication or archive recovery
     */
    TimestampTz currentChunkStartTime;
    /* Are we requested to pause recovery? */
    bool        recoveryPause;

    /*
     * lastFpwDisableRecPtr points to the start of the last replayed
     * XLOG_FPW_CHANGE record that instructs full_page_writes is disabled.
     */
    XLogRecPtr  lastFpwDisableRecPtr;
    slock_t     info_lck;       /* locks shared variables shown above */
} XLogCtlData;

static XLogCtlData *XLogCtl = NULL;
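
The pages and xlblocks arrays form a ring of XLogCacheBlck + 1 WAL buffer pages. As a rough sketch, the slot for a WAL location can be found by dividing the location by the page size and taking the remainder modulo the number of buffers, which is the idea behind the XLogRecPtrToBufIdx macro in xlog.c. The 8 kB page size is an assumed default, and rec_ptr_to_buf_idx is a hypothetical name used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define XLOG_BLCKSZ 8192u   /* assumed default WAL page size */

/*
 * Return the index of the WAL buffer slot that would hold the page
 * containing recptr, given the highest allocated buffer index.
 */
static int
rec_ptr_to_buf_idx(uint64_t recptr, int xlog_cache_blck)
{
    uint64_t nbuffers;

    assert(xlog_cache_blck >= 0);
    nbuffers = (uint64_t) xlog_cache_blck + 1;

    /* Page number within the whole WAL stream, wrapped around the ring. */
    return (int) ((recptr / XLOG_BLCKSZ) % nbuffers);
}
```

With XLogCacheBlck = 2047, as in the gdb output in section II, the location 5510831008 falls into slot 964.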

II. Trace Analysis

Attach gdb to any backend process and print the global variable XLogCtl.

(gdb) p XLogCtl
$6 = (XLogCtlData *) 0x7f391e00ea80
(gdb) p *XLogCtl
$7 = {Insert = {insertpos_lck = 0 '\000', CurrBytePos = 5494680728, PrevBytePos = 5494680616, 
    pad = '\000' <repeats 127 times>, RedoRecPtr = 5510830896, forcePageWrites = false, fullPageWrites = true, 
    exclusiveBackupState = EXCLUSIVE_BACKUP_NONE, nonExclusiveBackups = 0, lastBackupStart = 0, 
    WALInsertLocks = 0x7f391e013100}, LogwrtRqst = {Write = 5510831008, Flush = 5510831008}, RedoRecPtr = 5510830896, 
  ckptXidEpoch = 0, ckptXid = 2036, asyncXactLSN = 5510830896, replicationSlotMinLSN = 0, lastRemovedSegNo = 0, 
  unloggedLSN = 1, ulsn_lck = 0 '\000', lastSegSwitchTime = 1545962218, lastSegSwitchLSN = 5507670464, LogwrtResult = {
    Write = 5510831008, Flush = 5510831008}, InitializedUpTo = 5527601152, pages = 0x7f391e014000 "\230\320\006", 
  xlblocks = 0x7f391e00f088, XLogCacheBlck = 2047, ThisTimeLineID = 1, PrevTimeLineID = 1, 
  archiveCleanupCommand = '\000' <repeats 1023 times>, SharedRecoveryInProgress = false, SharedHotStandbyActive = false, 
  WalWriterSleeping = true, recoveryWakeupLatch = {is_set = 0, is_shared = true, owner_pid = 0}, lastCheckPointRecPtr = 0, 
  lastCheckPointEndPtr = 0, lastCheckPoint = {redo = 0, ThisTimeLineID = 0, PrevTimeLineID = 0, fullPageWrites = false, 
    nextXidEpoch = 0, nextXid = 0, nextOid = 0, nextMulti = 0, nextMultiOffset = 0, oldestXid = 0, oldestXidDB = 0, 
    oldestMulti = 0, oldestMultiDB = 0, time = 0, oldestCommitTsXid = 0, newestCommitTsXid = 0, oldestActiveXid = 0}, 
  lastReplayedEndRecPtr = 0, lastReplayedTLI = 0, replayEndRecPtr = 0, replayEndTLI = 0, recoveryLastXTime = 0, 
  currentChunkStartTime = 0, recoveryPause = false, lastFpwDisableRecPtr = 0, info_lck = 0 '\000'}
(gdb) 

Where:
1. XLogCtl->Insert is a variable of the XLogCtlInsert struct type.
2. RedoRecPtr is 5510830896 -> 1/48789B30, which matches the REDO location recorded in the pg_control file.

[xdb@localhost ~]$ pg_controldata|grep REDO
Latest checkpoint's REDO location:    1/48789B30
Latest checkpoint's REDO WAL file:    000000010000000100000048

3. ThisTimeLineID and PrevTimeLineID are timeline IDs; both are 1.
The remaining fields can be read against the struct definitions above.
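
The conversion in item 2 can be reproduced with a few lines of C. This sketch splits the 64-bit value into its high and low 32-bit halves for the hi/lo notation and derives the WAL segment file name, assuming the default 16 MB segment size. format_lsn and wal_file_name are illustrative names, not PostgreSQL functions.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define WAL_SEGMENT_SIZE (16u * 1024 * 1024)    /* assumed default: 16 MB */

/* Format a 64-bit WAL location as "hi/lo" in hex, e.g. 1/48789B30. */
static void
format_lsn(uint64_t lsn, char *buf, size_t buflen)
{
    snprintf(buf, buflen, "%" PRIX32 "/%" PRIX32,
             (uint32_t) (lsn >> 32), (uint32_t) lsn);
}

/* Build the 24-character name of the WAL segment file containing lsn. */
static void
wal_file_name(uint32_t tli, uint64_t lsn, char *buf, size_t buflen)
{
    uint64_t segno = lsn / WAL_SEGMENT_SIZE;
    /* Number of segments per 4 GB "log id" (256 for 16 MB segments). */
    uint64_t segs_per_id = UINT64_C(0x100000000) / WAL_SEGMENT_SIZE;

    assert(buflen >= 25);
    snprintf(buf, buflen, "%08" PRIX32 "%08" PRIX32 "%08" PRIX32,
             tli,
             (uint32_t) (segno / segs_per_id),
             (uint32_t) (segno % segs_per_id));
    assert(strlen(buf) == 24);
}
```

For RedoRecPtr = 5510830896 on timeline 1 this yields 1/48789B30 and 000000010000000100000048, matching the pg_controldata output above.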

