This article walks through the heap_insert->XLogInsert call path in PostgreSQL: the data structures involved, the source of XLogInsert and XLogInsertRecord, and a gdb trace of a single INSERT.
Static variables
These are global within each backend process.
/*
 * An array of XLogRecData structs, to hold registered data.
 */
static XLogRecData *rdatas;
static int  num_rdatas;     /* entries currently used */
static int  max_rdatas;     /* allocated size */

/* whether XLogBeginInsert() has been called */
static bool begininsert_called = false;
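To make the role of these variables concrete, here is a minimal toy model of the "begin, register, reset" life cycle. It is only a sketch: the `Toy*`/`toy_*` names and the fixed capacity are hypothetical and are not PostgreSQL's actual implementation, which grows the array on demand and tracks more state.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for XLogRecData: one registered chunk of record data. */
typedef struct ToyRecData
{
    struct ToyRecData *next;
    const char *data;
    size_t      len;
} ToyRecData;

#define TOY_MAX_RDATAS 4                /* assumed fixed capacity for the sketch */

static ToyRecData toy_rdatas[TOY_MAX_RDATAS];
static int  toy_num_rdatas = 0;         /* entries currently used */
static bool toy_begininsert_called = false;

/* Plays the role of XLogBeginInsert(): open a new record under construction. */
static bool toy_begin_insert(void)
{
    if (toy_begininsert_called)
        return false;                   /* a record is already being built */
    toy_begininsert_called = true;
    return true;
}

/* Plays the role of XLogRegisterData(): append one chunk to the chain. */
static bool toy_register_data(const char *data, size_t len)
{
    ToyRecData *rd;

    if (!toy_begininsert_called || toy_num_rdatas >= TOY_MAX_RDATAS)
        return false;
    rd = &toy_rdatas[toy_num_rdatas];
    rd->next = NULL;
    rd->data = data;
    rd->len = len;
    if (toy_num_rdatas > 0)
        toy_rdatas[toy_num_rdatas - 1].next = rd;
    toy_num_rdatas++;
    return true;
}

/* Plays the role of XLogResetInsertion(): forget everything registered. */
static void toy_reset_insertion(void)
{
    toy_num_rdatas = 0;
    toy_begininsert_called = false;
}
```

Calling `toy_register_data()` before `toy_begin_insert()` fails, which is the same ordering rule that XLogInsert enforces with its `begininsert_called` check.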
Macros and type definitions
typedef char *Pointer;      /* generic pointer */
typedef Pointer Page;       /* a page */

#define XLOG_HEAP_INSERT    0x00

/*
 * Pointer to a location in the XLOG.  These pointers are 64 bits wide,
 * because we don't want them ever to overflow.
 */
typedef uint64 XLogRecPtr;

/*
 * Additional macros for access to page headers. (Beware multiple evaluation
 * of the arguments!)
 */
#define PageGetLSN(page) \
    PageXLogRecPtrGet(((PageHeader) (page))->pd_lsn)
#define PageSetLSN(page, lsn) \
    PageXLogRecPtrSet(((PageHeader) (page))->pd_lsn, lsn)

/* Buffer size required to store a compressed version of backup block image */
#define PGLZ_MAX_BLCKSZ PGLZ_MAX_OUTPUT(BLCKSZ)

/*
 * Fake spinlock implementation using semaphores --- slow and prone
 * to fall foul of kernel limits on number of semaphores, so don't use this
 * unless you must!  The subroutines appear in spin.c.
 */
typedef int slock_t;
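Since XLogRecPtr is just a 64-bit integer, gdb prints it in decimal, while PostgreSQL logs show LSNs as two hexadecimal halves. The small helper below (the name `format_lsn` is made up for this sketch) uses the same `%X/%X` split that the WAL_DEBUG code in XLogInsertRecord uses.

```c
#include <stdint.h>
#include <stdio.h>

typedef uint64_t XLogRecPtr;    /* same width as the typedef quoted above */

/* Format an LSN as high and low 32-bit halves in hex, e.g. "1/4288CA38". */
static int format_lsn(XLogRecPtr lsn, char *buf, size_t buflen)
{
    return snprintf(buf, buflen, "%X/%X",
                    (unsigned int) (lsn >> 32), (unsigned int) lsn);
}
```

For example, the value 5411228216 that appears later in the gdb session formats as `1/4288CA38`.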
XLogCtl
Total shared-memory state for XLOG.
/*
 * Total shared-memory state for XLOG.
 */
typedef struct XLogCtlData
{
    XLogCtlInsert Insert;       /* insertion control state */

    /* Protected by info_lck: */
    XLogwrtRqst LogwrtRqst;
    XLogRecPtr  RedoRecPtr;     /* a recent copy of Insert->RedoRecPtr */
    uint32      ckptXidEpoch;   /* nextXID & epoch of latest checkpoint */
    TransactionId ckptXid;
    XLogRecPtr  asyncXactLSN;   /* LSN of newest async commit/abort */
    XLogRecPtr  replicationSlotMinLSN;  /* oldest LSN needed by any slot */

    XLogSegNo   lastRemovedSegNo;   /* latest removed/recycled XLOG segment */

    /* Fake LSN counter, for unlogged relations. Protected by ulsn_lck. */
    XLogRecPtr  unloggedLSN;
    slock_t     ulsn_lck;

    /* Time and LSN of last xlog segment switch. Protected by WALWriteLock. */
    pg_time_t   lastSegSwitchTime;
    XLogRecPtr  lastSegSwitchLSN;

    /*
     * Protected by info_lck and WALWriteLock (you must hold either lock to
     * read it, but both to update)
     */
    XLogwrtResult LogwrtResult;

    /*
     * Latest initialized page in the cache (last byte position + 1).
     *
     * To change the identity of a buffer (and InitializedUpTo), you need to
     * hold WALBufMappingLock.  To change the identity of a buffer that's
     * still dirty, the old page needs to be written out first, and for that
     * you need WALWriteLock, and you need to ensure that there are no
     * in-progress insertions to the page by calling
     * WaitXLogInsertionsToFinish().
     */
    XLogRecPtr  InitializedUpTo;

    /*
     * These values do not change after startup, although the pointed-to pages
     * and xlblocks values certainly do.  xlblock values are protected by
     * WALBufMappingLock.
     */
    char       *pages;          /* buffers for unwritten XLOG pages */
    XLogRecPtr *xlblocks;       /* 1st byte ptr-s + XLOG_BLCKSZ */
    int         XLogCacheBlck;  /* highest allocated xlog buffer index */

    /*
     * Shared copy of ThisTimeLineID. Does not change after end-of-recovery.
     * If we created a new timeline when the system was started up,
     * PrevTimeLineID is the old timeline's ID that we forked off from.
     * Otherwise it's equal to ThisTimeLineID.
     */
    TimeLineID  ThisTimeLineID;
    TimeLineID  PrevTimeLineID;

    /*
     * SharedRecoveryInProgress indicates if we're still in crash or archive
     * recovery.  Protected by info_lck.
     */
    bool        SharedRecoveryInProgress;

    /*
     * SharedHotStandbyActive indicates if we allow hot standby queries to be
     * run.  Protected by info_lck.
     */
    bool        SharedHotStandbyActive;

    /*
     * WalWriterSleeping indicates whether the WAL writer is currently in
     * low-power mode (and hence should be nudged if an async commit occurs).
     * Protected by info_lck.
     */
    bool        WalWriterSleeping;

    /*
     * recoveryWakeupLatch is used to wake up the startup process to continue
     * WAL replay, if it is waiting for WAL to arrive or failover trigger file
     * to appear.
     */
    Latch       recoveryWakeupLatch;

    /*
     * During recovery, we keep a copy of the latest checkpoint record here.
     * lastCheckPointRecPtr points to start of checkpoint record and
     * lastCheckPointEndPtr points to end+1 of checkpoint record.  Used by the
     * checkpointer when it wants to create a restartpoint.
     *
     * Protected by info_lck.
     */
    XLogRecPtr  lastCheckPointRecPtr;
    XLogRecPtr  lastCheckPointEndPtr;
    CheckPoint  lastCheckPoint;

    /*
     * lastReplayedEndRecPtr points to end+1 of the last record successfully
     * replayed. When we're currently replaying a record, ie. in a redo
     * function, replayEndRecPtr points to the end+1 of the record being
     * replayed, otherwise it's equal to lastReplayedEndRecPtr.
     */
    XLogRecPtr  lastReplayedEndRecPtr;
    TimeLineID  lastReplayedTLI;
    XLogRecPtr  replayEndRecPtr;
    TimeLineID  replayEndTLI;
    /* timestamp of last COMMIT/ABORT record replayed (or being replayed) */
    TimestampTz recoveryLastXTime;

    /*
     * timestamp of when we started replaying the current chunk of WAL data,
     * only relevant for replication or archive recovery
     */
    TimestampTz currentChunkStartTime;
    /* Are we requested to pause recovery? */
    bool        recoveryPause;

    /*
     * lastFpwDisableRecPtr points to the start of the last replayed
     * XLOG_FPW_CHANGE record that instructs full_page_writes is disabled.
     */
    XLogRecPtr  lastFpwDisableRecPtr;

    slock_t     info_lck;       /* locks shared variables shown above */
} XLogCtlData;

static XLogCtlData *XLogCtl = NULL;
heap_insert
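Its main job is to insert a tuple into the heap; part of that work is writing the corresponding WAL (XLog) record.
See "PostgreSQL Source Code Walkthrough (104) - WAL#1 (Insert & WAL - heap_insert function #1)".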
XLogInsert
Inserts an XLOG record with the specified RMID and info bytes. The body of the record is the data and buffer references registered earlier with XLogRegister* calls.
/*
 * Insert an XLOG record having the specified RMID and info bytes, with the
 * body of the record being the data and buffer references registered earlier
 * with XLogRegister* calls.
 *
 * Returns XLOG pointer to end of record (beginning of next record).
 * This can be used as LSN for data pages affected by the logged action.
 * (LSN is the XLOG point up to which the XLOG must be flushed to disk
 * before the data page can be written out.  This implements the basic
 * WAL rule "write the log before the data".)
 */
XLogRecPtr
XLogInsert(RmgrId rmid, uint8 info)
{
    XLogRecPtr  EndPos;         /* uint64 */

    /* XLogBeginInsert() must have been called. */
    if (!begininsert_called)
        elog(ERROR, "XLogBeginInsert was not called");

    /*
     * The caller can set rmgr bits, XLR_SPECIAL_REL_UPDATE and
     * XLR_CHECK_CONSISTENCY; the rest are reserved for use by me.
     */
    if ((info & ~(XLR_RMGR_INFO_MASK |
                  XLR_SPECIAL_REL_UPDATE |
                  XLR_CHECK_CONSISTENCY)) != 0)
        elog(PANIC, "invalid xlog info mask %02X", info);

    TRACE_POSTGRESQL_WAL_INSERT(rmid, info);

    /*
     * In bootstrap mode, we don't actually log anything but XLOG resources;
     * return a phony record pointer.
     */
    if (IsBootstrapProcessingMode() && rmid != RM_XLOG_ID)
    {
        XLogResetInsertion();
        EndPos = SizeOfXLogLongPHD; /* start of 1st chkpt record */
        return EndPos;
    }

    do
    {
        XLogRecPtr  RedoRecPtr;
        bool        doPageWrites;
        XLogRecPtr  fpw_lsn;
        XLogRecData *rdt;

        /*
         * Get values needed to decide whether to do full-page writes. Since
         * we don't yet have an insertion lock, these could change under us,
         * but XLogInsertRecord will recheck them once it has a lock.
         */
        GetFullPageWriteInfo(&RedoRecPtr, &doPageWrites);

        rdt = XLogRecordAssemble(rmid, info, RedoRecPtr, doPageWrites,
                                 &fpw_lsn);

        /* curinsert_flags is a uint8 */
        EndPos = XLogInsertRecord(rdt, fpw_lsn, curinsert_flags);
    } while (EndPos == InvalidXLogRecPtr);

    XLogResetInsertion();

    return EndPos;
}
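The do/while loop at the end of XLogInsert is an optimistic retry: the record is assembled without a lock, and if XLogInsertRecord finds the assumptions stale it returns InvalidXLogRecPtr and the record is rebuilt. The sketch below shows only that control flow; `toy_insert_record`, `toy_insert` and `failures_left` are hypothetical stand-ins, not PostgreSQL functions.

```c
#include <stdint.h>

typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/* How many times the stand-in below should refuse the record. */
static int failures_left = 0;

/* Hypothetical stand-in for XLogInsertRecord(). */
static XLogRecPtr toy_insert_record(void)
{
    if (failures_left > 0)
    {
        failures_left--;
        return InvalidXLogRecPtr;   /* "caller must recompute and retry" */
    }
    return 5411228216ULL;           /* end position taken from the gdb trace */
}

/* Same loop shape as XLogInsert: reassemble and retry until accepted. */
static XLogRecPtr toy_insert(int *attempts)
{
    XLogRecPtr  EndPos;

    *attempts = 0;
    do
    {
        /* the real code re-reads RedoRecPtr/doPageWrites and reassembles here */
        (*attempts)++;
        EndPos = toy_insert_record();
    } while (EndPos == InvalidXLogRecPtr);

    return EndPos;
}
```

With `failures_left = 1`, `toy_insert` needs two attempts, which mirrors a record that had to be rebuilt once right after a checkpoint moved RedoRecPtr.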
XLogInsertRecord
Inserts an XLOG record represented by an already-constructed chain of data chunks.
/*
 * Insert an XLOG record represented by an already-constructed chain of data
 * chunks.  This is a low-level routine; to construct the WAL record header
 * and data, use the higher-level routines in xloginsert.c.
 *
 * If 'fpw_lsn' is valid, it is the oldest LSN among the pages that this
 * WAL record applies to, that were not included in the record as full page
 * images.  If fpw_lsn <= RedoRecPtr, the function does not perform the
 * insertion and returns InvalidXLogRecPtr.  The caller can then recalculate
 * which pages need a full-page image, and retry.  If fpw_lsn is invalid, the
 * record is always inserted.
 *
 * 'flags' gives more in-depth control on the record being inserted. See
 * XLogSetRecordFlags() for details.
 *
 * The first XLogRecData in the chain must be for the record header, and its
 * data must be MAXALIGNed.  XLogInsertRecord fills in the xl_prev and
 * xl_crc fields in the header, the rest of the header must already be filled
 * by the caller.
 *
 * Returns XLOG pointer to end of record (beginning of next record).
 * This can be used as LSN for data pages affected by the logged action.
 * (LSN is the XLOG point up to which the XLOG must be flushed to disk
 * before the data page can be written out.  This implements the basic
 * WAL rule "write the log before the data".)
 */
XLogRecPtr
XLogInsertRecord(XLogRecData *rdata,
                 XLogRecPtr fpw_lsn,
                 uint8 flags)
{
    XLogCtlInsert *Insert = &XLogCtl->Insert;   /* XLOG insertion control state */
    pg_crc32c   rdata_crc;      /* uint32 */
    bool        inserted;
    XLogRecord *rechdr = (XLogRecord *) rdata->data;
    uint8       info = rechdr->xl_info & ~XLR_INFO_MASK;
    bool        isLogSwitch = (rechdr->xl_rmid == RM_XLOG_ID &&
                               info == XLOG_SWITCH);
    XLogRecPtr  StartPos;
    XLogRecPtr  EndPos;
    bool        prevDoPageWrites = doPageWrites;

    /* we assume that all of the record header is in the first chunk */
    Assert(rdata->len >= SizeOfXLogRecord);

    /* cross-check on whether we should be here or not */
    if (!XLogInsertAllowed())
        elog(ERROR, "cannot make new WAL entries during recovery");

    /*----------
     *
     * We have now done all the preparatory work we can without holding a
     * lock or modifying shared state. From here on, inserting the new WAL
     * record to the shared WAL buffer cache is a two-step process:
     *
     * 1. Reserve the right amount of space from the WAL. The current head of
     *    reserved space is kept in Insert->CurrBytePos, and is protected by
     *    insertpos_lck.
     *
     * 2. Copy the record to the reserved WAL space. This involves finding the
     *    correct WAL buffer containing the reserved space, and copying the
     *    record in place. This can be done concurrently in multiple processes.
     *
     * To keep track of which insertions are still in-progress, each concurrent
     * inserter acquires an insertion lock. In addition to just indicating that
     * an insertion is in progress, the lock tells others how far the inserter
     * has progressed. There is a small fixed number of insertion locks,
     * determined by NUM_XLOGINSERT_LOCKS. When an inserter crosses a page
     * boundary, it updates the value stored in the lock to the how far it has
     * inserted, to allow the previous buffer to be flushed.
     *
     * Holding onto an insertion lock also protects RedoRecPtr and
     * fullPageWrites from changing until the insertion is finished.
     *
     * Step 2 can usually be done completely in parallel. If the required WAL
     * page is not initialized yet, you have to grab WALBufMappingLock to
     * initialize it, but the WAL writer tries to do that ahead of insertions
     * to avoid that from happening in the critical path.
     *
     *----------
     */
    START_CRIT_SECTION();
    if (isLogSwitch)
        WALInsertLockAcquireExclusive();
    else
        WALInsertLockAcquire();

    /*
     * Check to see if my copy of RedoRecPtr is out of date. If so, may have
     * to go back and have the caller recompute everything. This can only
     * happen just after a checkpoint, so it's better to be slow in this case
     * and fast otherwise.
     *
     * Also check to see if fullPageWrites or forcePageWrites was just turned
     * on; if we weren't already doing full-page writes then go back and
     * recompute.
     *
     * If we aren't doing full-page writes then RedoRecPtr doesn't actually
     * affect the contents of the XLOG record, so we'll update our local copy
     * but not force a recomputation.  (If doPageWrites was just turned off,
     * we could recompute the record without full pages, but we choose not to
     * bother.)
     */
    if (RedoRecPtr != Insert->RedoRecPtr)
    {
        Assert(RedoRecPtr < Insert->RedoRecPtr);
        RedoRecPtr = Insert->RedoRecPtr;
    }
    doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites);

    if (doPageWrites &&
        (!prevDoPageWrites ||
         (fpw_lsn != InvalidXLogRecPtr && fpw_lsn <= RedoRecPtr)))
    {
        /*
         * Oops, some buffer now needs to be backed up that the caller didn't
         * back up.  Start over.
         */
        WALInsertLockRelease();
        END_CRIT_SECTION();
        return InvalidXLogRecPtr;
    }

    /*
     * Reserve space for the record in the WAL. This also sets the xl_prev
     * pointer.
     */
    if (isLogSwitch)
        inserted = ReserveXLogSwitch(&StartPos, &EndPos, &rechdr->xl_prev);
    else
    {
        ReserveXLogInsertLocation(rechdr->xl_tot_len, &StartPos, &EndPos,
                                  &rechdr->xl_prev);
        inserted = true;
    }

    if (inserted)
    {
        /*
         * Now that xl_prev has been filled in, calculate CRC of the record
         * header.
         */
        rdata_crc = rechdr->xl_crc;
        COMP_CRC32C(rdata_crc, rechdr, offsetof(XLogRecord, xl_crc));
        FIN_CRC32C(rdata_crc);
        rechdr->xl_crc = rdata_crc;

        /*
         * All the record data, including the header, is now ready to be
         * inserted. Copy the record in the space reserved.
         */
        CopyXLogRecordToWAL(rechdr->xl_tot_len, isLogSwitch, rdata,
                            StartPos, EndPos);

        /*
         * Unless record is flagged as not important, update LSN of last
         * important record in the current slot. When holding all locks, just
         * update the first one.
         */
        if ((flags & XLOG_MARK_UNIMPORTANT) == 0)
        {
            int         lockno = holdingAllLocks ? 0 : MyLockNo;

            WALInsertLocks[lockno].l.lastImportantAt = StartPos;
        }
    }
    else
    {
        /*
         * This was an xlog-switch record, but the current insert location was
         * already exactly at the beginning of a segment, so there was no need
         * to do anything.
         */
    }

    /*
     * Done! Let others know that we're finished.
     */
    WALInsertLockRelease();

    MarkCurrentTransactionIdLoggedIfAny();

    END_CRIT_SECTION();

    /*
     * Update shared LogwrtRqst.Write, if we crossed page boundary.
     */
    if (StartPos / XLOG_BLCKSZ != EndPos / XLOG_BLCKSZ)
    {
        SpinLockAcquire(&XLogCtl->info_lck);
        /* advance global request to include new block(s) */
        if (XLogCtl->LogwrtRqst.Write < EndPos)
            XLogCtl->LogwrtRqst.Write = EndPos;
        /* update local result copy while I have the chance */
        LogwrtResult = XLogCtl->LogwrtResult;
        SpinLockRelease(&XLogCtl->info_lck);
    }

    /*
     * If this was an XLOG_SWITCH record, flush the record and the empty
     * padding space that fills the rest of the segment, and perform
     * end-of-segment actions (eg, notifying archiver).
     */
    if (isLogSwitch)
    {
        TRACE_POSTGRESQL_WAL_SWITCH();
        XLogFlush(EndPos);

        /*
         * Even though we reserved the rest of the segment for us, which is
         * reflected in EndPos, we return a pointer to just the end of the
         * xlog-switch record.
         */
        if (inserted)
        {
            EndPos = StartPos + SizeOfXLogRecord;
            if (StartPos / XLOG_BLCKSZ != EndPos / XLOG_BLCKSZ)
            {
                uint64      offset = XLogSegmentOffset(EndPos, wal_segment_size);

                if (offset == EndPos % XLOG_BLCKSZ)
                    EndPos += SizeOfXLogLongPHD;
                else
                    EndPos += SizeOfXLogShortPHD;
            }
        }
    }

#ifdef WAL_DEBUG                /* debug-only code */
    if (XLOG_DEBUG)
    {
        static XLogReaderState *debug_reader = NULL;
        StringInfoData buf;
        StringInfoData recordBuf;
        char       *errormsg = NULL;
        MemoryContext oldCxt;

        oldCxt = MemoryContextSwitchTo(walDebugCxt);

        initStringInfo(&buf);
        appendStringInfo(&buf, "INSERT @ %X/%X: ",
                         (uint32) (EndPos >> 32), (uint32) EndPos);

        /*
         * We have to piece together the WAL record data from the XLogRecData
         * entries, so that we can pass it to the rm_desc function as one
         * contiguous chunk.
         */
        initStringInfo(&recordBuf);
        for (; rdata != NULL; rdata = rdata->next)
            appendBinaryStringInfo(&recordBuf, rdata->data, rdata->len);

        if (!debug_reader)
            debug_reader = XLogReaderAllocate(wal_segment_size, NULL, NULL);

        if (!debug_reader)
        {
            appendStringInfoString(&buf, "error decoding record: out of memory");
        }
        else if (!DecodeXLogRecord(debug_reader, (XLogRecord *) recordBuf.data,
                                   &errormsg))
        {
            appendStringInfo(&buf, "error decoding record: %s",
                             errormsg ? errormsg : "no error message");
        }
        else
        {
            appendStringInfoString(&buf, " - ");
            xlog_outdesc(&buf, debug_reader);
        }
        elog(LOG, "%s", buf.data);

        pfree(buf.data);
        pfree(recordBuf.data);
        MemoryContextSwitchTo(oldCxt);
    }
#endif

    /*
     * Update our global variables
     */
    ProcLastRecPtr = StartPos;
    XactLastRecEnd = EndPos;

    return EndPos;
}
The test script is as follows:
insert into t_wal_partition(c1,c2,c3) VALUES(0,'HASH0','HAHS0');
Start gdb, set a breakpoint, and enter XLogInsert
(gdb) b XLogInsert
Breakpoint 1 at 0x5652d6: file xloginsert.c, line 420.
(gdb) c
Continuing.

Breakpoint 1, XLogInsert (rmid=10 '\n', info=0 '\000') at xloginsert.c:420
420     if (!begininsert_called)
XLogBeginInsert() must already have been called
420     if (!begininsert_called)
(gdb) n
The caller may set only the rmgr bits, XLR_SPECIAL_REL_UPDATE and XLR_CHECK_CONSISTENCY; all other info bits are reserved for XLogInsert itself
427     if ((info & ~(XLR_RMGR_INFO_MASK |
(gdb) n
432     TRACE_POSTGRESQL_WAL_INSERT(rmid, info);
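The info check at line 427 is a plain bit-mask test. The sketch below reproduces it in isolation. The three constant values are an assumption on my part (they are what I recall from access/xlogrecord.h for this PostgreSQL generation); verify them against your own source tree. `info_bits_allowed` is a made-up helper name.

```c
#include <stdbool.h>
#include <stdint.h>

/* ASSUMED values; check access/xlogrecord.h in your tree. */
#define XLR_RMGR_INFO_MASK      0xF0    /* high 4 bits belong to the rmgr */
#define XLR_SPECIAL_REL_UPDATE  0x01
#define XLR_CHECK_CONSISTENCY   0x02

/* True if info uses only the bits a caller is allowed to set. */
static bool info_bits_allowed(uint8_t info)
{
    return (info & ~(XLR_RMGR_INFO_MASK |
                     XLR_SPECIAL_REL_UPDATE |
                     XLR_CHECK_CONSISTENCY)) == 0;
}
```

In the trace, `info` is 0 (XLOG_HEAP_INSERT is 0x00), so the check passes and execution continues to line 432.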
Skip the bootstrap-mode branch and enter the loop
(gdb) n
438     if (IsBootstrapProcessingMode() && rmid != RM_XLOG_ID)
(gdb)
457         GetFullPageWriteInfo(&RedoRecPtr, &doPageWrites);
Get the values needed to decide whether to do full-page writes (the values printed before line 457 has executed are not yet meaningful)
(gdb) p *RedoRecPtr
$1 = 1166604425
(gdb) p doPageWrites
$2 = false
(gdb) n
459         rdt = XLogRecordAssemble(rmid, info, RedoRecPtr, doPageWrites,
(gdb) p RedoRecPtr
$3 = 5411227832
(gdb) p doPageWrites
$4 = true
Assemble the record and obtain rdt
(gdb) n
462         EndPos = XLogInsertRecord(rdt, fpw_lsn, curinsert_flags);
(gdb) p *rdt
$5 = {next = 0x2a911b8, data = 0x2a8f460 <incomplete sequence \322>, len = 51}
XLogInsertRecord -> step into the XLogInsertRecord function
fpw_lsn=0, flags=1 '\001'
(gdb) step
XLogInsertRecord (rdata=0xf9cc70 <hdr_rdt>, fpw_lsn=0, flags=1 '\001') at xlog.c:970
970     XLogCtlInsert *Insert = &XLogCtl->Insert;
XLogInsertRecord -> get the insertion control structure
(gdb) n
973     XLogRecord *rechdr = (XLogRecord *) rdata->data;
(gdb) p *Insert
$6 = {insertpos_lck = 0 '\000', CurrBytePos = 5395369608, PrevBytePos = 5395369552, pad = '\000' <repeats 127 times>, RedoRecPtr = 5411227832,
  forcePageWrites = false, fullPageWrites = true, exclusiveBackupState = EXCLUSIVE_BACKUP_NONE, nonExclusiveBackups = 0, lastBackupStart = 0,
  WALInsertLocks = 0x7fa2523d4100}
XLogInsertRecord -> initialize local variables
(gdb) n
974     uint8       info = rechdr->xl_info & ~XLR_INFO_MASK;
(gdb)
975     bool        isLogSwitch = (rechdr->xl_rmid == RM_XLOG_ID &&
(gdb)
979     bool        prevDoPageWrites = doPageWrites;
(gdb)
982     Assert(rdata->len >= SizeOfXLogRecord);
(gdb)
(gdb) p *rechdr
$7 = {xl_tot_len = 210, xl_xid = 1948, xl_prev = 0, xl_info = 0 '\000', xl_rmid = 10 '\n', xl_crc = 3212449170}
(gdb) p info
$8 = 0 '\000'
(gdb) p isLogSwitch
$9 = false
(gdb) p prevDoPageWrites
$10 = true
XLogInsertRecord -> run the sanity checks, start the critical section, and acquire a WAL insertion lock
(gdb) n
985     if (!XLogInsertAllowed())
(gdb)
1020        START_CRIT_SECTION();
(gdb)
1021        if (isLogSwitch)
(gdb)
1024            WALInsertLockAcquire();
(gdb)
1042        if (RedoRecPtr != Insert->RedoRecPtr)
(gdb)
XLogInsertRecord -> check whether the local RedoRecPtr is stale and refresh doPageWrites
(gdb) p RedoRecPtr
$11 = 5411227832
(gdb) p Insert->RedoRecPtr
$12 = 5411227832
(gdb) n
1047        doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites);
(gdb)
1049        if (doPageWrites &&
(gdb) p doPageWrites
$13 = true
(gdb) n
1050            (!prevDoPageWrites ||
(gdb)
1049        if (doPageWrites &&
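The condition at lines 1049-1050 decides whether the caller has to start over. Pulled out into a standalone predicate (the function name `caller_must_retry` is invented for this sketch), it looks like this:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/*
 * True when the record must be rebuilt: full-page writes are on now, and
 * either they were off when the record was assembled, or some page that was
 * logged without a full-page image has an LSN at or before the redo pointer.
 */
static bool caller_must_retry(bool doPageWrites, bool prevDoPageWrites,
                              XLogRecPtr fpw_lsn, XLogRecPtr RedoRecPtr)
{
    return doPageWrites &&
        (!prevDoPageWrites ||
         (fpw_lsn != InvalidXLogRecPtr && fpw_lsn <= RedoRecPtr));
}
```

With the traced values (doPageWrites = true, prevDoPageWrites = true, fpw_lsn = 0, RedoRecPtr = 5411227832) the predicate is false, so the insertion proceeds to the space reservation step.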
XLogInsertRecord -> reserve space for the record in the WAL; this also sets xl_prev
(gdb)
1050            (!prevDoPageWrites ||
(gdb)
1066        if (isLogSwitch)
(gdb)
1070            ReserveXLogInsertLocation(rechdr->xl_tot_len, &StartPos, &EndPos,
(gdb)
1072            inserted = true;
(gdb) p rechdr->xl_tot_len
$14 = 210
(gdb) p StartPos
$15 = 5411228000
(gdb) p EndPos
$16 = 5411228216
(gdb) p *rechdr->xl_prev
Cannot access memory at address 0x14288c928
(gdb) p rechdr->xl_prev
$17 = 5411227944
(gdb)
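Note that xl_tot_len is 210 while EndPos - StartPos is 216. That gap is consistent with the reserved size being rounded up to the platform's maximum alignment. The sketch below assumes 8-byte alignment (typical on 64-bit builds, but an assumption here) and a record that does not span a WAL page header; `maxalign` is a hypothetical helper standing in for PostgreSQL's MAXALIGN macro.

```c
#include <stdint.h>

#define MAXIMUM_ALIGNOF 8       /* ASSUMED: 8-byte maximum alignment */

/* Round len up to the next multiple of MAXIMUM_ALIGNOF. */
static uint64_t maxalign(uint64_t len)
{
    return (len + (MAXIMUM_ALIGNOF - 1)) & ~((uint64_t) (MAXIMUM_ALIGNOF - 1));
}

/* End position of a record that starts at start_pos and stays within one page. */
static uint64_t end_of_record(uint64_t start_pos, uint64_t tot_len)
{
    return start_pos + maxalign(tot_len);
}
```

Plugging in the traced values, `end_of_record(5411228000, 210)` gives 5411228216, matching the EndPos printed by gdb.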
XLogInsertRecord -> now that xl_prev has been filled in, compute the CRC of the record header
(gdb) n
1075        if (inserted)
(gdb)
1081            rdata_crc = rechdr->xl_crc;
(gdb)
1082            COMP_CRC32C(rdata_crc, rechdr, offsetof(XLogRecord, xl_crc));
(gdb)
1083            FIN_CRC32C(rdata_crc);
(gdb)
1084            rechdr->xl_crc = rdata_crc;
(gdb)
1090            CopyXLogRecordToWAL(rechdr->xl_tot_len, isLogSwitch, rdata,
(gdb) p rdata_crc
$18 = 2310972234
(gdb) p *rechdr
$19 = {xl_tot_len = 210, xl_xid = 1948, xl_prev = 5411227944, xl_info = 0 '\000', xl_rmid = 10 '\n', xl_crc = 2310972234}
(gdb)
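The CRC is finished in two stages: xl_crc already holds a partial CRC over the record payload when XLogInsertRecord is entered (my understanding is that XLogRecordAssemble computes it), and lines 1081-1084 extend it over the header bytes that precede xl_crc and then finalize it. The sketch below illustrates that incremental pattern with a simple bitwise CRC-32C; PostgreSQL itself uses faster table-driven or hardware implementations, and the `toy_record_crc` name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78. */
static uint32_t crc32c_update(uint32_t crc, const void *data, size_t len)
{
    const unsigned char *p = (const unsigned char *) data;

    while (len-- > 0)
    {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
}

/* Payload first, then the header bytes in front of the CRC field. */
static uint32_t toy_record_crc(const void *payload, size_t payload_len,
                               const void *header, size_t header_len_before_crc)
{
    uint32_t    crc = 0xFFFFFFFFu;                      /* initial value */

    crc = crc32c_update(crc, payload, payload_len);     /* done at assembly time */
    crc = crc32c_update(crc, header, header_len_before_crc); /* the COMP_CRC32C step */
    return crc ^ 0xFFFFFFFFu;                           /* the FIN_CRC32C step */
}
```

Deferring the header part is what lets xl_prev be filled in only after space has been reserved, without recomputing the CRC of the whole record.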
XLogInsertRecord -> all record data, including the header, is ready; copy the record into the reserved space.
Unless the record is flagged as unimportant, update the LSN of the last important record in the current slot.
(gdb) n
1098            if ((flags & XLOG_MARK_UNIMPORTANT) == 0)
(gdb)
1100                int         lockno = holdingAllLocks ? 0 : MyLockNo;
(gdb)
(gdb) n
1102                WALInsertLocks[lockno].l.lastImportantAt = StartPos;
(gdb)
1117        WALInsertLockRelease();
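The bookkeeping at lines 1098-1102 can be modelled in a few lines. This is a sketch under assumptions: the flag value 0x02 for XLOG_MARK_UNIMPORTANT is what I recall from access/xlog.h, the slot count of 8 is a guess at NUM_XLOGINSERT_LOCKS, and `toy_last_important_at` / `note_record` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

#define XLOG_MARK_UNIMPORTANT 0x02  /* ASSUMED value; check access/xlog.h */
#define TOY_NUM_INSERT_LOCKS  8     /* ASSUMED slot count for the sketch */

/* One "last important record" position per insertion-lock slot. */
static uint64_t toy_last_important_at[TOY_NUM_INSERT_LOCKS];

/* Record StartPos in the slot we hold, unless the record is unimportant. */
static void note_record(uint8_t flags, bool holdingAllLocks,
                        int myLockNo, uint64_t startPos)
{
    if ((flags & XLOG_MARK_UNIMPORTANT) == 0)
    {
        /* when every lock is held, slot 0 is updated by convention */
        int         lockno = holdingAllLocks ? 0 : myLockNo;

        toy_last_important_at[lockno] = startPos;
    }
}
```

In the trace, flags is 1, so the unimportant bit is clear and lastImportantAt is set to StartPos.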
XLogInsertRecord -> done; let other inserters know we have finished.
If the record crossed a page boundary, the shared LogwrtRqst.Write would be updated (not the case here).
(gdb)
1117        WALInsertLockRelease();
(gdb) n
1119        MarkCurrentTransactionIdLoggedIfAny();
(gdb)
1121        END_CRIT_SECTION();
(gdb)
1126        if (StartPos / XLOG_BLCKSZ != EndPos / XLOG_BLCKSZ)
(gdb)
1142        if (isLogSwitch)
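gdb jumps from line 1126 straight to 1142, so the record did not cross a WAL page boundary. The check is integer division by the page size; the sketch below assumes the default XLOG_BLCKSZ of 8192 bytes (a build-time setting, so an assumption here), and the helper names are invented.

```c
#include <stdbool.h>
#include <stdint.h>

#define XLOG_BLCKSZ 8192        /* ASSUMED: default WAL block size */

/* True if the two positions fall on different WAL pages. */
static bool crosses_page(uint64_t StartPos, uint64_t EndPos)
{
    return StartPos / XLOG_BLCKSZ != EndPos / XLOG_BLCKSZ;
}

/* Monotonic advance, as done for LogwrtRqst.Write under info_lck. */
static uint64_t advance_write_request(uint64_t current, uint64_t EndPos)
{
    return (current < EndPos) ? EndPos : current;
}
```

For the traced record, `crosses_page(5411228000, 5411228216)` is false, so the spinlock-protected update is skipped.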
XLogInsertRecord -> update the global variables and return
(gdb)
1220        ProcLastRecPtr = StartPos;
(gdb)
1221        XactLastRecEnd = EndPos;
(gdb)
1223        return EndPos;
(gdb)
1224    }
Back in XLogInsert: reset the insertion state, return EndPos, done
(gdb)
XLogInsert (rmid=10 '\n', info=0 '\000') at xloginsert.c:463
463     } while (EndPos == InvalidXLogRecPtr);
(gdb) n
465     XLogResetInsertion();
(gdb)
467     return EndPos;
(gdb)
468 }
(gdb) p EndPos
$20 = 5411228216
(gdb)
$21 = 5411228216
(gdb) n
heap_insert (relation=0x7fa280616228, tup=0x2b15440, cid=0, options=0, bistate=0x0) at heapam.c:2590
2590            PageSetLSN(page, recptr);
(gdb)
That concludes this analysis of heap_insert->XLogInsert in PostgreSQL. Thanks for reading.