This article looks at what the BufferAlloc function does in PostgreSQL. The question comes up regularly in day-to-day work, so the material below is organized into a practical walkthrough: first the data structures involved, then the function's logic, and finally a gdb session that steps through a real call. Hopefully it clears up the common points of confusion — follow along and try it yourself.
BufferDesc
Shared descriptor/state data for a single shared buffer.
/*
 * Flags for buffer descriptors
 *
 * Note: TAG_VALID essentially means that there is a buffer hashtable
 * entry associated with the buffer's tag.
 */
#define BM_LOCKED				(1U << 22)	/* buffer header is locked */
#define BM_DIRTY				(1U << 23)	/* data needs writing */
#define BM_VALID				(1U << 24)	/* data is valid */
#define BM_TAG_VALID			(1U << 25)	/* tag is assigned */
#define BM_IO_IN_PROGRESS		(1U << 26)	/* read or write in progress */
#define BM_IO_ERROR				(1U << 27)	/* previous I/O failed */
#define BM_JUST_DIRTIED			(1U << 28)	/* dirtied since write started */
#define BM_PIN_COUNT_WAITER		(1U << 29)	/* have waiter for sole pin */
#define BM_CHECKPOINT_NEEDED	(1U << 30)	/* must write for checkpoint */
#define BM_PERMANENT			(1U << 31)	/* permanent buffer (not unlogged,
											 * or init fork) */

/*
 * BufferDesc -- shared descriptor/state data for a single shared buffer.
 *
 * Note: Buffer header lock (BM_LOCKED flag) must be held to examine or change
 * the tag, state or wait_backend_pid fields.  In general, buffer header lock
 * is a spinlock which is combined with flags, refcount and usagecount into
 * single atomic variable.  This layout allow us to do some operations in a
 * single atomic operation, without actually acquiring and releasing spinlock;
 * for instance, increase or decrease refcount.  buf_id field never changes
 * after initialization, so does not need locking.  freeNext is protected by
 * the buffer_strategy_lock not buffer header lock.  The LWLock can take care
 * of itself.  The buffer header lock is *not* used to control access to the
 * data in the buffer!
 *
 * It's assumed that nobody changes the state field while buffer header lock
 * is held.  Thus buffer header lock holder can do complex updates of the
 * state variable in single write, simultaneously with lock release (cleaning
 * BM_LOCKED flag).  On the other hand, updating of state without holding
 * buffer header lock is restricted to CAS, which insure that BM_LOCKED flag
 * is not set.  Atomic increment/decrement, OR/AND etc. are not allowed.
 *
 * An exception is that if we have the buffer pinned, its tag can't change
 * underneath us, so we can examine the tag without locking the buffer header.
 * Also, in places we do one-time reads of the flags without bothering to
 * lock the buffer header; this is generally for situations where we don't
 * expect the flag bit being tested to be changing.
 *
 * We can't physically remove items from a disk page if another backend has
 * the buffer pinned.  Hence, a backend may need to wait for all other pins
 * to go away.  This is signaled by storing its own PID into
 * wait_backend_pid and setting flag bit BM_PIN_COUNT_WAITER.  At present,
 * there can be only one such waiter per buffer.
 *
 * We use this same struct for local buffer headers, but the locks are not
 * used and not all of the flag bits are useful either.  To avoid unnecessary
 * overhead, manipulations of the state field should be done without actual
 * atomic operations (i.e. only pg_atomic_read_u32() and
 * pg_atomic_unlocked_write_u32()).
 *
 * Be careful to avoid increasing the size of the struct when adding or
 * reordering members.  Keeping it below 64 bytes (the most common CPU
 * cache line size) is fairly important for performance.
 */
typedef struct BufferDesc
{
	BufferTag	tag;			/* ID of page contained in buffer */
	int			buf_id;			/* buffer's index number (from 0) */

	/* state of the tag, containing flags, refcount and usagecount */
	pg_atomic_uint32 state;

	int			wait_backend_pid;	/* backend PID of pin-count waiter */
	int			freeNext;		/* link in freelist chain */

	LWLock		content_lock;	/* to lock access to buffer contents */
} BufferDesc;
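Because flags, refcount and usagecount are all packed into the single atomic state word, one unlocked read yields a consistent snapshot of all three. The following minimal sketch (server-internal code; the helper name show_buffer_state is made up for illustration) shows how the packing macros from buf_internals.h are typically applied:

#include "postgres.h"
#include "storage/buf_internals.h"

/*
 * Sketch only: dump the packed state word of one shared buffer descriptor.
 * A single atomic read returns flags, refcount and usagecount together.
 */
static void
show_buffer_state(int buf_id)
{
	BufferDesc *buf = GetBufferDescriptor(buf_id);
	uint32		buf_state = pg_atomic_read_u32(&buf->state);

	elog(LOG, "buffer %d: refcount=%u usagecount=%u dirty=%d valid=%d",
		 buf->buf_id,
		 BUF_STATE_GET_REFCOUNT(buf_state),
		 BUF_STATE_GET_USAGECOUNT(buf_state),
		 (buf_state & BM_DIRTY) != 0,
		 (buf_state & BM_VALID) != 0);
}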
BufferTag
A buffer tag identifies which disk block the buffer contains.
/*
 * Buffer tag identifies which disk block the buffer contains.
 *
 * Note: the BufferTag data must be sufficient to determine where to write the
 * block, without reference to pg_class or pg_tablespace entries.  It's
 * possible that the backend flushing the buffer doesn't even believe the
 * relation is visible yet (its xact may have started before the xact that
 * created the rel).  The storage manager must be able to cope anyway.
 *
 * Note: if there's any pad bytes in the struct, INIT_BUFFERTAG will have
 * to be fixed to zero them, since this struct is used as a hash key.
 */
typedef struct buftag
{
	RelFileNode rnode;			/* physical relation identifier */
	ForkNumber	forkNum;
	BlockNumber blockNum;		/* blknum relative to begin of reln */
} BufferTag;
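The tag, its hash code, and the buffer-mapping partition lock are exactly the first things BufferAlloc computes. A minimal sketch of that sequence (server-internal code; the helper name partition_lock_for_block is invented here):

#include "postgres.h"
#include "storage/buf_internals.h"
#include "storage/smgr.h"

/*
 * Sketch: build a tag for a given fork/block of a relation, hash it, and
 * locate the mapping-partition lock (one of NUM_BUFFER_PARTITIONS locks)
 * that guards that hash slot -- mirroring the first lines of BufferAlloc.
 */
static LWLock *
partition_lock_for_block(SMgrRelation smgr, ForkNumber forkNum,
						 BlockNumber blockNum)
{
	BufferTag	tag;
	uint32		hash;

	INIT_BUFFERTAG(tag, smgr->smgr_rnode.node, forkNum, blockNum);
	hash = BufTableHashCode(&tag);

	return BufMappingPartitionLock(hash);
}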
SMgrRelation
smgr.c maintains a hash table of SMgrRelation objects, which are essentially cached file handles.
/*
 * smgr.c maintains a table of SMgrRelation objects, which are essentially
 * cached file handles.  An SMgrRelation is created (if not already present)
 * by smgropen(), and destroyed by smgrclose().  Note that neither of these
 * operations imply I/O, they just create or destroy a hashtable entry.
 * (But smgrclose() may release associated resources, such as OS-level file
 * descriptors.)
 *
 * An SMgrRelation may have an "owner", which is just a pointer to it from
 * somewhere else; smgr.c will clear this pointer if the SMgrRelation is
 * closed.  We use this to avoid dangling pointers from relcache to smgr
 * without having to make the smgr explicitly aware of relcache.  There
 * can't be more than one "owner" pointer per SMgrRelation, but that's
 * all we need.
 *
 * SMgrRelations that do not have an "owner" are considered to be transient,
 * and are deleted at end of transaction.
 */
typedef struct SMgrRelationData
{
	/* rnode is the hashtable lookup key, so it must be first! */
	RelFileNodeBackend smgr_rnode;	/* relation physical identifier */

	/* pointer to owning pointer, or NULL if none */
	struct SMgrRelationData **smgr_owner;

	/*
	 * These next three fields are not actually used or manipulated by smgr,
	 * except that they are reset to InvalidBlockNumber upon a cache flush
	 * event (in particular, upon truncation of the relation).  Higher levels
	 * store cached state here so that it will be reset when truncation
	 * happens.  In all three cases, InvalidBlockNumber means "unknown".
	 */
	BlockNumber smgr_targblock; /* current insertion target block */
	BlockNumber smgr_fsm_nblocks;	/* last known size of fsm fork */
	BlockNumber smgr_vm_nblocks;	/* last known size of vm fork */

	/* additional public fields may someday exist here */

	/*
	 * Fields below here are intended to be private to smgr.c and its
	 * submodules.  Do not touch them from elsewhere.
	 */
	int			smgr_which;		/* storage manager selector */

	/*
	 * for md.c; per-fork arrays of the number of open segments
	 * (md_num_open_segs) and the segments themselves (md_seg_fds).
	 */
	int			md_num_open_segs[MAX_FORKNUM + 1];
	struct _MdfdVec *md_seg_fds[MAX_FORKNUM + 1];

	/* if unowned, list link in list of all unowned SMgrRelations */
	struct SMgrRelationData *next_unowned_reln;
} SMgrRelationData;

typedef SMgrRelationData *SMgrRelation;
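As a usage sketch (server-internal code; main_fork_size is an invented helper, and the caller is assumed to supply a valid RelFileNode), the typical open/query/close cycle looks like this — note that smgropen() itself performs no I/O:

#include "postgres.h"
#include "storage/smgr.h"

/*
 * Sketch: open the cached smgr handle for a regular relation, ask the
 * storage manager for the size of its main fork, then close the handle
 * (which drops OS-level descriptors and the hashtable entry).
 */
static BlockNumber
main_fork_size(RelFileNode rnode)
{
	/* InvalidBackendId => a regular (non-temporary) relation */
	SMgrRelation reln = smgropen(rnode, InvalidBackendId);
	BlockNumber  nblocks = smgrnblocks(reln, MAIN_FORKNUM);

	smgrclose(reln);
	return nblocks;
}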
RelFileNodeBackend
A RelFileNodeBackend augments a relfilenode with a backend ID, providing all the information needed to locate the physical storage.
/*
 * Augmenting a relfilenode with the backend ID provides all the information
 * we need to locate the physical storage.  The backend ID is InvalidBackendId
 * for regular relations (those accessible to more than one backend), or the
 * owning backend's ID for backend-local relations.  Backend-local relations
 * are always transient and removed in case of a database crash; they are
 * never WAL-logged or fsync'd.
 */
typedef struct RelFileNodeBackend
{
	RelFileNode node;
	BackendId	backend;
} RelFileNodeBackend;
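A small hedged example (the helper describe_rnode is made up; it assumes the usual server headers) of how the backend field is interpreted and how the identifier maps to an on-disk path:

#include "postgres.h"
#include "common/relpath.h"
#include "storage/relfilenode.h"

/*
 * Sketch: a backend of InvalidBackendId marks a regular relation; any other
 * value marks a backend-local (temporary) relation.  relpathbackend() turns
 * the identifier into the relative on-disk path of a fork.
 */
static void
describe_rnode(RelFileNodeBackend rnode)
{
	bool		is_temp = RelFileNodeBackendIsTemp(rnode);
	char	   *path = relpathbackend(rnode.node, rnode.backend, MAIN_FORKNUM);

	elog(LOG, "%s relation, main fork at %s",
		 is_temp ? "temporary" : "regular", path);
	pfree(path);
}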
BufferAlloc is a subroutine of ReadBuffer that handles the lookup of a shared buffer. If the block is not already buffered, it selects a replacement victim and evicts the old page, but it does NOT read in the new page.
The main processing logic of the function is as follows:
1. Initialization: build the buffer tag and derive the hash value and partition lock ID from it.
2. Check whether the block is already in the buffer pool.
3. The buffer is found in the pool (buf_id >= 0):
3.1 Get the buffer descriptor and pin the buffer.
3.2 If PinBuffer returns false (the page is not yet valid), call StartBufferIO; if StartBufferIO returns true, set *foundPtr to false.
3.3 Return buf.
4. The buffer is not found in the pool (buf_id < 0):
4.1 Release newPartitionLock.
4.2 Loop until a suitable victim buffer is found:
4.2.1 Ensure, while the spinlock is not yet held, that there is a free refcount entry.
4.2.2 Select a victim buffer.
4.2.3 Copy the buffer flags into oldFlags.
4.2.4 Pin the buffer, then release the buffer header spinlock.
4.2.5 If the buffer is flagged BM_DIRTY, call FlushBuffer.
4.2.6 If the buffer is flagged BM_TAG_VALID, compute the old tag's hash code and partition lock ID and lock both the old and new partition locks;
otherwise only the new partition is needed: lock the new partition lock and clear the old partition lock and old hash value.
4.2.7 Try to insert a hash table entry for the buffer under its new tag.
4.2.8 On a collision (buf_id >= 0), handle it exactly as if the buffer had been found in the pool in the first place.
4.2.9 With no collision (buf_id < 0), lock the buffer header; if the buffer has not been re-dirtied and is pinned only by us, the victim is usable and the loop ends;
otherwise unlock the buffer header, delete the hash table entry, release the locks and go look for another buffer.
4.3 It is now safe to reset the buffer tag; afterwards unlock the buffer header, delete the old hash table entry and release the partition locks.
4.4 Call StartBufferIO; *foundPtr is set to false if this backend must perform the read, true otherwise.
4.5 Return buf.
/*
 * BufferAlloc -- subroutine for ReadBuffer.  Handles lookup of a shared
 *		buffer.  If no buffer exists already, selects a replacement
 *		victim and evicts the old page, but does NOT read in new page.
 *
 * "strategy" can be a buffer replacement strategy object, or NULL for
 * the default strategy.  The selected buffer's usage_count is advanced when
 * using the default strategy, but otherwise possibly not (see PinBuffer).
 *
 * The returned buffer is pinned and is already marked as holding the
 * desired page.  If it already did have the desired page, *foundPtr is
 * set true.  Otherwise, *foundPtr is set false and the buffer is marked
 * as IO_IN_PROGRESS; ReadBuffer will now need to do I/O to fill it.
 *
 * *foundPtr is actually redundant with the buffer's BM_VALID flag, but
 * we keep it for simplicity in ReadBuffer.
 *
 * No locks are held either at entry or exit.
 */
static BufferDesc *
BufferAlloc(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
			BlockNumber blockNum,
			BufferAccessStrategy strategy,
			bool *foundPtr)
{
	BufferTag	newTag;			/* identity of requested block */
	uint32		newHash;		/* hash value for newTag */
	LWLock	   *newPartitionLock;	/* buffer partition lock for it */
	BufferTag	oldTag;			/* previous identity of selected buffer */
	uint32		oldHash;		/* hash value for oldTag */
	LWLock	   *oldPartitionLock;	/* buffer partition lock for it */
	uint32		oldFlags;
	int			buf_id;
	BufferDesc *buf;
	bool		valid;
	uint32		buf_state;

	/* create a tag so we can lookup the buffer */
	INIT_BUFFERTAG(newTag, smgr->smgr_rnode.node, forkNum, blockNum);

	/* determine its hash code and partition lock ID */
	newHash = BufTableHashCode(&newTag);
	newPartitionLock = BufMappingPartitionLock(newHash);

	/* see if the block is in the buffer pool already */
	LWLockAcquire(newPartitionLock, LW_SHARED);
	buf_id = BufTableLookup(&newTag, newHash);
	if (buf_id >= 0)
	{
		/*
		 * Found it.  Now, pin the buffer so no one can steal it from the
		 * buffer pool, and check to see if the correct data has been loaded
		 * into the buffer.
		 */
		buf = GetBufferDescriptor(buf_id);

		valid = PinBuffer(buf, strategy);

		/* Can release the mapping lock as soon as we've pinned it */
		LWLockRelease(newPartitionLock);

		*foundPtr = true;

		if (!valid)
		{
			/*
			 * We can only get here if (a) someone else is still reading in
			 * the page, or (b) a previous read attempt failed.  We have to
			 * wait for any active read attempt to finish, and then set up our
			 * own read attempt if the page is still not BM_VALID.
			 * StartBufferIO does it all.
			 */
			if (StartBufferIO(buf, true))
			{
				/*
				 * If we get here, previous attempts to read the buffer must
				 * have failed ... but we shall bravely try again.
				 */
				*foundPtr = false;
			}
		}

		return buf;
	}

	/*
	 * Didn't find it in the buffer pool.  We'll have to initialize a new
	 * buffer.  Remember to unlock the mapping lock while doing the work.
	 */
	LWLockRelease(newPartitionLock);

	/* Loop here in case we have to try another victim buffer */
	for (;;)
	{
		/*
		 * Ensure, while the spinlock's not yet held, that there's a free
		 * refcount entry.
		 */
		ReservePrivateRefCountEntry();

		/*
		 * Select a victim buffer.  The buffer is returned with its header
		 * spinlock still held!
		 */
		buf = StrategyGetBuffer(strategy, &buf_state);

		Assert(BUF_STATE_GET_REFCOUNT(buf_state) == 0);

		/* Must copy buffer flags while we still hold the spinlock */
		oldFlags = buf_state & BUF_FLAG_MASK;

		/* Pin the buffer and then release the buffer spinlock */
		PinBuffer_Locked(buf);

		/*
		 * If the buffer was dirty, try to write it out.  There is a race
		 * condition here, in that someone might dirty it after we released it
		 * above, or even while we are writing it out (since our share-lock
		 * won't prevent hint-bit updates).  We will recheck the dirty bit
		 * after re-locking the buffer header.
		 */
		if (oldFlags & BM_DIRTY)
		{
			/*
			 * We need a share-lock on the buffer contents to write it out
			 * (else we might write invalid data, eg because someone else is
			 * compacting the page contents while we write).  We must use a
			 * conditional lock acquisition here to avoid deadlock.  Even
			 * though the buffer was not pinned (and therefore surely not
			 * locked) when StrategyGetBuffer returned it, someone else could
			 * have pinned and exclusive-locked it by the time we get here. If
			 * we try to get the lock unconditionally, we'd block waiting for
			 * them; if they later block waiting for us, deadlock ensues.
			 * (This has been observed to happen when two backends are both
			 * trying to split btree index pages, and the second one just
			 * happens to be trying to split the page the first one got from
			 * StrategyGetBuffer.)
			 */
			if (LWLockConditionalAcquire(BufferDescriptorGetContentLock(buf),
										 LW_SHARED))
			{
				/*
				 * If using a nondefault strategy, and writing the buffer
				 * would require a WAL flush, let the strategy decide whether
				 * to go ahead and write/reuse the buffer or to choose another
				 * victim.  We need lock to inspect the page LSN, so this
				 * can't be done inside StrategyGetBuffer.
				 */
				if (strategy != NULL)
				{
					XLogRecPtr	lsn;

					/* Read the LSN while holding buffer header lock */
					buf_state = LockBufHdr(buf);
					lsn = BufferGetLSN(buf);
					UnlockBufHdr(buf, buf_state);

					if (XLogNeedsFlush(lsn) &&
						StrategyRejectBuffer(strategy, buf))
					{
						/* Drop lock/pin and loop around for another buffer */
						LWLockRelease(BufferDescriptorGetContentLock(buf));
						UnpinBuffer(buf, true);
						continue;
					}
				}

				/* OK, do the I/O */
				TRACE_POSTGRESQL_BUFFER_WRITE_DIRTY_START(forkNum, blockNum,
														  smgr->smgr_rnode.node.spcNode,
														  smgr->smgr_rnode.node.dbNode,
														  smgr->smgr_rnode.node.relNode);

				FlushBuffer(buf, NULL);
				LWLockRelease(BufferDescriptorGetContentLock(buf));

				ScheduleBufferTagForWriteback(&BackendWritebackContext,
											  &buf->tag);

				TRACE_POSTGRESQL_BUFFER_WRITE_DIRTY_DONE(forkNum, blockNum,
														 smgr->smgr_rnode.node.spcNode,
														 smgr->smgr_rnode.node.dbNode,
														 smgr->smgr_rnode.node.relNode);
			}
			else
			{
				/*
				 * Someone else has locked the buffer, so give it up and loop
				 * back to get another one.
				 */
				UnpinBuffer(buf, true);
				continue;
			}
		}

		/*
		 * To change the association of a valid buffer, we'll need to have
		 * exclusive lock on both the old and new mapping partitions.
		 */
		if (oldFlags & BM_TAG_VALID)
		{
			/*
			 * Need to compute the old tag's hashcode and partition lock ID.
			 * XXX is it worth storing the hashcode in BufferDesc so we need
			 * not recompute it here?  Probably not.
			 */
			oldTag = buf->tag;
			oldHash = BufTableHashCode(&oldTag);
			oldPartitionLock = BufMappingPartitionLock(oldHash);

			/*
			 * Must lock the lower-numbered partition first to avoid
			 * deadlocks.
			 */
			if (oldPartitionLock < newPartitionLock)
			{
				LWLockAcquire(oldPartitionLock, LW_EXCLUSIVE);
				LWLockAcquire(newPartitionLock, LW_EXCLUSIVE);
			}
			else if (oldPartitionLock > newPartitionLock)
			{
				LWLockAcquire(newPartitionLock, LW_EXCLUSIVE);
				LWLockAcquire(oldPartitionLock, LW_EXCLUSIVE);
			}
			else
			{
				/* only one partition, only one lock */
				LWLockAcquire(newPartitionLock, LW_EXCLUSIVE);
			}
		}
		else
		{
			/* if it wasn't valid, we need only the new partition */
			LWLockAcquire(newPartitionLock, LW_EXCLUSIVE);
			/* remember we have no old-partition lock or tag */
			oldPartitionLock = NULL;
			/* this just keeps the compiler quiet about uninit variables */
			oldHash = 0;
		}

		/*
		 * Try to make a hashtable entry for the buffer under its new tag.
		 * This could fail because while we were writing someone else
		 * allocated another buffer for the same block we want to read in.
		 * Note that we have not yet removed the hashtable entry for the old
		 * tag.
		 */
		buf_id = BufTableInsert(&newTag, newHash, buf->buf_id);

		if (buf_id >= 0)
		{
			/*
			 * Got a collision. Someone has already done what we were about to
			 * do. We'll just handle this as if it were found in the buffer
			 * pool in the first place.  First, give up the buffer we were
			 * planning to use.
			 */
			UnpinBuffer(buf, true);

			/* Can give up that buffer's mapping partition lock now */
			if (oldPartitionLock != NULL &&
				oldPartitionLock != newPartitionLock)
				LWLockRelease(oldPartitionLock);

			/* remaining code should match code at top of routine */

			buf = GetBufferDescriptor(buf_id);

			valid = PinBuffer(buf, strategy);

			/* Can release the mapping lock as soon as we've pinned it */
			LWLockRelease(newPartitionLock);

			*foundPtr = true;

			if (!valid)
			{
				/*
				 * We can only get here if (a) someone else is still reading
				 * in the page, or (b) a previous read attempt failed.  We
				 * have to wait for any active read attempt to finish, and
				 * then set up our own read attempt if the page is still not
				 * BM_VALID.  StartBufferIO does it all.
				 */
				if (StartBufferIO(buf, true))
				{
					/*
					 * If we get here, previous attempts to read the buffer
					 * must have failed ... but we shall bravely try again.
					 */
					*foundPtr = false;
				}
			}

			return buf;
		}

		/*
		 * Need to lock the buffer header too in order to change its tag.
		 */
		buf_state = LockBufHdr(buf);

		/*
		 * Somebody could have pinned or re-dirtied the buffer while we were
		 * doing the I/O and making the new hashtable entry.  If so, we can't
		 * recycle this buffer; we must undo everything we've done and start
		 * over with a new victim buffer.
		 */
		oldFlags = buf_state & BUF_FLAG_MASK;
		if (BUF_STATE_GET_REFCOUNT(buf_state) == 1 && !(oldFlags & BM_DIRTY))
			break;

		UnlockBufHdr(buf, buf_state);
		BufTableDelete(&newTag, newHash);
		if (oldPartitionLock != NULL &&
			oldPartitionLock != newPartitionLock)
			LWLockRelease(oldPartitionLock);
		LWLockRelease(newPartitionLock);
		UnpinBuffer(buf, true);
	}

	/*
	 * Okay, it's finally safe to rename the buffer.
	 *
	 * Clearing BM_VALID here is necessary, clearing the dirtybits is just
	 * paranoia.  We also reset the usage_count since any recency of use of
	 * the old content is no longer relevant.  (The usage_count starts out at
	 * 1 so that the buffer can survive one clock-sweep pass.)
	 *
	 * Make sure BM_PERMANENT is set for buffers that must be written at every
	 * checkpoint.  Unlogged buffers only need to be written at shutdown
	 * checkpoints, except for their "init" forks, which need to be treated
	 * just like permanent relations.
	 */
	buf->tag = newTag;
	buf_state &= ~(BM_VALID | BM_DIRTY | BM_JUST_DIRTIED |
				   BM_CHECKPOINT_NEEDED | BM_IO_ERROR | BM_PERMANENT |
				   BUF_USAGECOUNT_MASK);
	if (relpersistence == RELPERSISTENCE_PERMANENT || forkNum == INIT_FORKNUM)
		buf_state |= BM_TAG_VALID | BM_PERMANENT | BUF_USAGECOUNT_ONE;
	else
		buf_state |= BM_TAG_VALID | BUF_USAGECOUNT_ONE;

	UnlockBufHdr(buf, buf_state);

	if (oldPartitionLock != NULL)
	{
		BufTableDelete(&oldTag, oldHash);
		if (oldPartitionLock != newPartitionLock)
			LWLockRelease(oldPartitionLock);
	}

	LWLockRelease(newPartitionLock);

	/*
	 * Buffer contents are currently invalid.  Try to get the io_in_progress
	 * lock.  If StartBufferIO returns false, then someone else managed to
	 * read it before we did, so there's nothing left for BufferAlloc() to do.
	 */
	if (StartBufferIO(buf, true))
		*foundPtr = false;
	else
		*foundPtr = true;

	return buf;
}
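For context, ordinary callers never invoke BufferAlloc directly; they go through ReadBuffer/ReadBufferExtended, which reach it via ReadBuffer_common (as the gdb session below shows). A minimal, hypothetical caller sketch (read_one_block and its locking pattern are illustrative, not taken from the article's test; passing NULL as the strategy selects the default replacement strategy mentioned in the function comment above):

#include "postgres.h"
#include "storage/bufmgr.h"
#include "utils/rel.h"

/*
 * Sketch: read one block of a relation through the buffer manager.
 * ReadBufferExtended pins the buffer; whether it was a cache hit or a
 * miss is decided inside BufferAlloc.
 */
static void
read_one_block(Relation rel, BlockNumber blkno)
{
	Buffer		buf;
	Page		page;

	buf = ReadBufferExtended(rel, MAIN_FORKNUM, blkno, RBM_NORMAL, NULL);

	/* content lock, not the buffer header lock */
	LockBuffer(buf, BUFFER_LOCK_SHARE);
	page = BufferGetPage(buf);
	/* ... inspect tuples on the page ... */
	LockBuffer(buf, BUFFER_LOCK_UNLOCK);

	ReleaseBuffer(buf);			/* drop the pin */
}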
Test script — query a table:
10:01:54 (xdb@[local]:5432)testdb=# select * from t1 limit 10;
Start gdb and set a breakpoint:
(gdb) b BufferAlloc
Breakpoint 1 at 0x8778ad: file bufmgr.c, line 1005.
(gdb) c
Continuing.

Breakpoint 1, BufferAlloc (smgr=0x2267430, relpersistence=112 'p', forkNum=MAIN_FORKNUM, blockNum=0, strategy=0x0,
    foundPtr=0x7ffcc97fb4f3) at bufmgr.c:1005
1005            INIT_BUFFERTAG(newTag, smgr->smgr_rnode.node, forkNum, blockNum);
(gdb)
Input parameters:
smgr - pointer to an SMgrRelationData struct
relpersistence - whether the relation is persistent
forkNum - fork type; MAIN_FORKNUM is the data file, besides which there are fsm/vm forks
blockNum - block number
strategy - buffer access strategy, NULL here
*foundPtr - output parameter
(gdb) p *smgr
$1 = {smgr_rnode = {node = {spcNode = 1663, dbNode = 16402, relNode = 51439}, backend = -1}, smgr_owner = 0x7f86133f3778,
  smgr_targblock = 4294967295, smgr_fsm_nblocks = 4294967295, smgr_vm_nblocks = 4294967295, smgr_which = 0,
  md_num_open_segs = {0, 0, 0, 0}, md_seg_fds = {0x0, 0x0, 0x0, 0x0}, next_unowned_reln = 0x0}
(gdb) p *smgr->smgr_owner
$2 = (struct SMgrRelationData *) 0x2267430
(gdb) p **smgr->smgr_owner
$3 = {smgr_rnode = {node = {spcNode = 1663, dbNode = 16402, relNode = 51439}, backend = -1}, smgr_owner = 0x7f86133f3778,
  smgr_targblock = 4294967295, smgr_fsm_nblocks = 4294967295, smgr_vm_nblocks = 4294967295, smgr_which = 0,
  md_num_open_segs = {0, 0, 0, 0}, md_seg_fds = {0x0, 0x0, 0x0, 0x0}, next_unowned_reln = 0x0}
(gdb)
1. Initialization: determine the hash value and partition lock ID from the tag.
(gdb) n
1008            newHash = BufTableHashCode(&newTag);
(gdb) p newTag
$4 = {rnode = {spcNode = 1663, dbNode = 16402, relNode = 51439}, forkNum = MAIN_FORKNUM, blockNum = 0}
(gdb) n
1009            newPartitionLock = BufMappingPartitionLock(newHash);
(gdb)
1012            LWLockAcquire(newPartitionLock, LW_SHARED);
(gdb)
1013            buf_id = BufTableLookup(&newTag, newHash);
(gdb) p newHash
$5 = 1398580903
(gdb) p newPartitionLock
$6 = (LWLock *) 0x7f85e5db9600
(gdb) p *newPartitionLock
$7 = {tranche = 59, state = {value = 536870913}, waiters = {head = 2147483647, tail = 2147483647}}
(gdb)
2. Check whether the block is already in the buffer pool.
(gdb) n
1014            if (buf_id >= 0)
(gdb) p buf_id
$8 = -1
4. The buffer is not found in the pool (buf_id < 0).
4.1 Release newPartitionLock.
4.2 Loop until a suitable victim buffer is found.
4.2.1 Ensure, while the spinlock is not yet held, that there is a free refcount entry --> ReservePrivateRefCountEntry
(gdb) n
1056            LWLockRelease(newPartitionLock);
(gdb)
1065            ReservePrivateRefCountEntry();
(gdb)
4.2.2 Select a victim buffer.
(gdb) n
1071            buf = StrategyGetBuffer(strategy, &buf_state);
(gdb) n
1073            Assert(BUF_STATE_GET_REFCOUNT(buf_state) == 0);
(gdb) p buf
$9 = (BufferDesc *) 0x7f85e705fd80
(gdb) p *buf
$10 = {tag = {rnode = {spcNode = 0, dbNode = 0, relNode = 0}, forkNum = InvalidForkNumber, blockNum = 4294967295},
  buf_id = 104, state = {value = 4194304}, wait_backend_pid = 0, freeNext = -2, content_lock = {tranche = 54, state = {
      value = 536870912}, waiters = {head = 2147483647, tail = 2147483647}}}
(gdb)
4.2.3 Copy the buffer flags into oldFlags.
(gdb) n 1076 oldFlags = buf_state & BUF_FLAG_MASK; (gdb)
4.2.4 Pin the buffer, then release the buffer header spinlock.
(gdb)
1079            PinBuffer_Locked(buf);
(gdb)
4.2.5 If the buffer is flagged BM_DIRTY, call FlushBuffer.
1088            if (oldFlags & BM_DIRTY)
(gdb)
4.2.6 If the buffer is flagged BM_TAG_VALID, compute the old tag's hash code and partition lock ID and lock both the old and new partition locks;
otherwise only the new partition is needed: lock the new partition lock and clear the old partition lock and old hash value.
(gdb)
1166            if (oldFlags & BM_TAG_VALID)
(gdb)
1200                    LWLockAcquire(newPartitionLock, LW_EXCLUSIVE);
(gdb)
1202                    oldPartitionLock = NULL;
(gdb)
1204                    oldHash = 0;
(gdb) p oldFlags
$11 = 4194304
(gdb)
4.2.7 Try to insert a hash table entry for the buffer under its new tag.
(gdb)
1214            buf_id = BufTableInsert(&newTag, newHash, buf->buf_id);
(gdb) n
1216            if (buf_id >= 0)
(gdb) p buf_id
$12 = -1
(gdb)
4.2.9 With no collision (buf_id < 0), lock the buffer header; since the buffer has not been re-dirtied and is pinned only by us, the victim is usable and the loop ends;
otherwise we would unlock the buffer header, delete the hash table entry, release the locks and look for another buffer.
(gdb) n
1267            buf_state = LockBufHdr(buf);
(gdb)
1275            oldFlags = buf_state & BUF_FLAG_MASK;
(gdb)
1276            if (BUF_STATE_GET_REFCOUNT(buf_state) == 1 && !(oldFlags & BM_DIRTY))
(gdb)
1277                    break;
(gdb)
4.3 It is now safe to reset the buffer tag; afterwards unlock the buffer header, delete the old hash table entry and release the partition locks.
1301            buf->tag = newTag;
(gdb)
1302            buf_state &= ~(BM_VALID | BM_DIRTY | BM_JUST_DIRTIED |
(gdb)
1305            if (relpersistence == RELPERSISTENCE_PERMANENT || forkNum == INIT_FORKNUM)
(gdb)
1306                    buf_state |= BM_TAG_VALID | BM_PERMANENT | BUF_USAGECOUNT_ONE;
(gdb)
1310            UnlockBufHdr(buf, buf_state);
(gdb)
1312            if (oldPartitionLock != NULL)
(gdb)
1319            LWLockRelease(newPartitionLock);
(gdb) p *buf
$13 = {tag = {rnode = {spcNode = 1663, dbNode = 16402, relNode = 51439}, forkNum = MAIN_FORKNUM, blockNum = 0},
  buf_id = 104, state = {value = 2181300225}, wait_backend_pid = 0, freeNext = -2, content_lock = {tranche = 54, state = {
      value = 536870912}, waiters = {head = 2147483647, tail = 2147483647}}}
(gdb)
4.4 Call StartBufferIO; *foundPtr is set to false because this backend must perform the read.
(gdb)
1326            if (StartBufferIO(buf, true))
(gdb) n
1327                    *foundPtr = false;
(gdb)
4.5 Return buf.
(gdb)
1331            return buf;
(gdb)
1332    }
(gdb)
Execution complete; control returns to the caller ReadBuffer_common:
(gdb)
ReadBuffer_common (smgr=0x2267430, relpersistence=112 'p', forkNum=MAIN_FORKNUM, blockNum=0, mode=RBM_NORMAL,
    strategy=0x0, hit=0x7ffcc97fb5eb) at bufmgr.c:747
747             if (found)
(gdb)
750                     pgBufferUsage.shared_blks_read++;
(gdb)
That concludes this walkthrough of what the BufferAlloc function does in PostgreSQL. Hopefully it has resolved your questions; combining the theory with hands-on debugging, as in the gdb session above, is the best way to make it stick — give it a try.