
PostgreSQL Source Code Reading (130) - MVCC#14 (the vacuum process - the lazy_scan_heap function)

Published: 2020-08-04 16:02:47  Source: ITPUB blog  Reads: 249  Author: husthxd  Category: Relational Databases

This article briefly introduces the processing flow of a manually executed VACUUM in PostgreSQL, focusing on the implementation of the call chain ExecVacuum->vacuum->vacuum_rel->heap_vacuum_rel->lazy_scan_heap. The lazy_scan_heap function scans an open heap relation and prunes each page in the heap.

I. Data Structures

Macro definitions
Vacuum and Analyze command options


/* ----------------------
 *      Vacuum and Analyze Statements
 *
 * Even though these are nominally two statements, it's convenient to use
 * just one node type for both.  Note that at least one of VACOPT_VACUUM
 * and VACOPT_ANALYZE must be set in options.
 * ----------------------
 */
typedef enum VacuumOption
{
    VACOPT_VACUUM = 1 << 0,     /* do VACUUM */
    VACOPT_ANALYZE = 1 << 1,    /* do ANALYZE */
    VACOPT_VERBOSE = 1 << 2,    /* print progress info */
    VACOPT_FREEZE = 1 << 3,     /* FREEZE option */
    VACOPT_FULL = 1 << 4,       /* FULL (non-concurrent) vacuum */
    VACOPT_SKIP_LOCKED = 1 << 5,    /* skip if cannot get lock */
    VACOPT_SKIPTOAST = 1 << 6,  /* don't process the TOAST table, if any */
    VACOPT_DISABLE_PAGE_SKIPPING = 1 << 7   /* don't skip any pages */
} VacuumOption;

xl_heap_freeze_tuple
This structure represents a 'freeze plan': the information needed about a single tuple being frozen during vacuum.


/*
 * This struct represents a 'freeze plan', which is what we need to know about
 * a single tuple being frozen during vacuum.
 */
/* 0x01 was XLH_FREEZE_XMIN */
#define     XLH_FREEZE_XVAC     0x02
#define     XLH_INVALID_XVAC    0x04
typedef struct xl_heap_freeze_tuple
{
    TransactionId xmax;
    OffsetNumber offset;
    uint16      t_infomask2;
    uint16      t_infomask;
    uint8       frzflags;
} xl_heap_freeze_tuple;

II. Source Code

lazy_scan_heap scans an open heap relation and prunes each page in the heap. Its work includes:
1. Truncating dead tuples to dead line pointers
2. Defragmenting pages
3. Setting commit status bits (see heap_page_prune)
4. Building lists of dead tuples and of pages with free space
5. Calculating statistics on the number of live tuples in the heap, and marking pages as all-visible where appropriate
6. Invoking index vacuuming and calling lazy_vacuum_heap to reclaim dead line pointers

Its processing flow is as follows:
1. Initialize the relevant variables
2. Get the total number of blocks (nblocks)
3. Initialize the statistics and related arrays (vacrelstats/frozen)
4. Compute the next block that cannot be skipped (next_unskippable_block)
5. Loop over every block:
5.1 If next_unskippable_block has been reached, compute the next unskippable block;
otherwise, if skipping_blocks is T and the page is not being force-checked, skip to the next block
5.2 If we are about to overrun the space available for dead-tuple TIDs, perform a round of vacuuming before tackling this page:
5.2.1 Loop over the index relations, calling lazy_vacuum_index on each
5.2.2 Call lazy_vacuum_heap to remove tuples from the heap relation
5.2.3 Reset the vacrelstats->num_dead_tuples counter to 0
5.2.4 Vacuum the FSM so that the newly freed space becomes visible on upper-level FSM pages
5.3 Read the buffer with ReadBufferExtended
5.4 If the buffer cleanup lock cannot be acquired:
A. If aggressive is F and the page is not being force-checked, move on to the next block;
B. If aggressive is T or the page is being force-checked, and no tuples need freezing, skip the block;
C. If aggressive is F (i.e. we are only force-checking the page), update the statistics and skip the block;
D. Otherwise, call LockBufferForCleanup to lock the buffer and fall through to normal processing
5.5 If the page is new, handle it (re-initialize it or mark the buffer dirty) and continue with the next block
5.6 If the page is empty, handle it (set the all-visible flag, etc.) and continue with the next block
5.7 Call heap_page_prune to prune all HOT-update chains in the page
5.8 Loop over the line pointers in the page:
5.8.1 If the line pointer is unused, continue with the next tuple
5.8.2 If the line pointer is a redirect, continue with the next tuple
5.8.3 If the line pointer is dead, call lazy_record_dead_tuple to record the tuple for deletion, update all_visible, and continue with the next tuple
5.8.4 Initialize the tuple variable
5.8.5 Call HeapTupleSatisfiesVacuum to determine the tuple's status, and set the relevant flags accordingly
5.8.6 If tupgone is T, record the tuple for deletion; otherwise call heap_prepare_freeze_tuple to decide whether the tuple needs freezing, recording its offset if it does
5.9 If the freeze count > 0, loop over the line pointers to be frozen and freeze them; write a WAL record if logging is required
5.10 If there are no indexes, vacuum the page right away; no second scan is needed
5.11 Synchronize the vm using the all_visible and all_visible_according_to_vm flags
5.12 Free the frozen array
5.13 Update the statistics
5.14 Vacuum the last batch of dead tuples
5.15 Vacuum the FSM
5.16 Finish up, updating the statistics for each index
5.17 Write to the server log


/*
 *  lazy_scan_heap() -- scan an open heap relation
 *
 *      This routine prunes each page in the heap, which will among other
 *      things truncate dead tuples to dead line pointers, defragment the
 *      page, and set commit status bits (see heap_page_prune).  It also builds
 *      lists of dead tuples and pages with free space, calculates statistics
 *      on the number of live tuples in the heap, and marks pages as
 *      all-visible if appropriate.  When done, or when we run low on space for
 *      dead-tuple TIDs, invoke vacuuming of indexes and call lazy_vacuum_heap
 *      to reclaim dead line pointers.
 *
 *      If there are no indexes then we can reclaim line pointers on the fly;
 *      dead line pointers need only be retained until all index pointers that
 *      reference them have been killed.
 */
static void
lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
               Relation *Irel, int nindexes, bool aggressive)
{
    BlockNumber nblocks,        /* number of blocks */
                blkno;          /* current block number */
    HeapTupleData tuple;
    char       *relname;
    TransactionId relfrozenxid = onerel->rd_rel->relfrozenxid;
    TransactionId relminmxid = onerel->rd_rel->relminmxid;
    BlockNumber empty_pages,    /* pages found empty */
                vacuumed_pages, /* pages vacuumed */
                next_fsm_block_to_vacuum;   /* next block to vacuum in the FSM */
    double      num_tuples,     /* total number of nonremovable tuples */
                live_tuples,    /* live tuples (reltuples estimate) */
                tups_vacuumed,  /* tuples cleaned up by vacuum */
                nkeep,          /* dead-but-not-removable tuples */
                nunused;        /* unused item pointers */
    IndexBulkDeleteResult **indstats;
    int         i;
    PGRUsage    ru0;
    Buffer      vmbuffer = InvalidBuffer;   /* visibility map buffer */
    BlockNumber next_unskippable_block;
    bool        skipping_blocks;    /* are we skipping blocks? */
    xl_heap_freeze_tuple *frozen;   /* array of tuples to freeze */
    StringInfoData buf;
    const int   initprog_index[] = {
        PROGRESS_VACUUM_PHASE,
        PROGRESS_VACUUM_TOTAL_HEAP_BLKS,
        PROGRESS_VACUUM_MAX_DEAD_TUPLES
    };
    int64       initprog_val[3];

    pg_rusage_init(&ru0);

    relname = RelationGetRelationName(onerel);
    /* log the operation */
    if (aggressive)
        ereport(elevel,
                (errmsg("aggressively vacuuming \"%s.%s\"",
                        get_namespace_name(RelationGetNamespace(onerel)),
                        relname)));
    else
        ereport(elevel,
                (errmsg("vacuuming \"%s.%s\"",
                        get_namespace_name(RelationGetNamespace(onerel)),
                        relname)));

    /* initialize counters */
    empty_pages = vacuumed_pages = 0;
    next_fsm_block_to_vacuum = (BlockNumber) 0;
    num_tuples = live_tuples = tups_vacuumed = nkeep = nunused = 0;
    indstats = (IndexBulkDeleteResult **)
        palloc0(nindexes * sizeof(IndexBulkDeleteResult *));

    /* get the total number of blocks in the relation */
    nblocks = RelationGetNumberOfBlocks(onerel);
    /* initialize statistics */
    vacrelstats->rel_pages = nblocks;
    vacrelstats->scanned_pages = 0;
    vacrelstats->tupcount_pages = 0;
    vacrelstats->nonempty_pages = 0;
    vacrelstats->latestRemovedXid = InvalidTransactionId;

    /* allocate space to track dead tuples for every block */
    lazy_space_alloc(vacrelstats, nblocks);
    /* allocate the frozen array */
    frozen = palloc(sizeof(xl_heap_freeze_tuple) * MaxHeapTuplesPerPage);

    /* Report that we're scanning the heap, advertising total # of blocks */
    initprog_val[0] = PROGRESS_VACUUM_PHASE_SCAN_HEAP;
    initprog_val[1] = nblocks;
    initprog_val[2] = vacrelstats->max_dead_tuples;
    pgstat_progress_update_multi_param(3, initprog_index, initprog_val);
    /*
     * Except when aggressive is set, we want to skip pages that are
     * all-visible according to the visibility map, but only when we can skip
     * at least SKIP_PAGES_THRESHOLD consecutive pages.  Since we're reading
     * sequentially, the OS should be doing readahead for us, so there's no
     * gain in skipping a page now and then; that's likely to disable
     * readahead and so be counterproductive. Also, skipping even a single
     * page means that we can't update relfrozenxid, so we only want to do it
     * if we can skip a goodly number of pages.
     *
     * When aggressive is set, we can't skip pages just because they are
     * all-visible, but we can still skip pages that are all-frozen, since
     * such pages do not need freezing and do not affect the value that we can
     * safely set for relfrozenxid or relminmxid.
     *
     * Before entering the main loop, establish the invariant that
     * next_unskippable_block is the next block number >= blkno that we can't
     * skip based on the visibility map, either all-visible for a regular scan
     * or all-frozen for an aggressive scan.  We set it to nblocks if there's
     * no such block.  We also set up the skipping_blocks flag correctly at
     * this stage.
     *
     * Note: The value returned by visibilitymap_get_status could be slightly
     * out-of-date, since we make this test before reading the corresponding
     * heap page or locking the buffer.  This is OK.  If we mistakenly think
     * that the page is all-visible or all-frozen when in fact the flag's just
     * been cleared, we might fail to vacuum the page.  It's easy to see that
     * skipping a page when aggressive is not set is not a very big deal; we
     * might leave some dead tuples lying around, but the next vacuum will
     * find them.  But even when aggressive *is* set, it's still OK if we miss
     * a page whose all-frozen marking has just been cleared.  Any new XIDs
     * just added to that page are necessarily newer than the GlobalXmin we
     * computed, so they'll have no effect on the value to which we can safely
     * set relfrozenxid.  A similar argument applies for MXIDs and relminmxid.
     *
     * We will scan the table's last page, at least to the extent of
     * determining whether it has tuples or not, even if it should be skipped
     * according to the above rules; except when we've already determined that
     * it's not worth trying to truncate the table.  This avoids having
     * lazy_truncate_heap() take access-exclusive lock on the table to attempt
     * a truncation that just fails immediately because there are tuples in
     * the last page.  This is worth avoiding mainly because such a lock must
     * be replayed on any hot standby, where it can be disruptive.
     */
    next_unskippable_block = 0;
    if ((options & VACOPT_DISABLE_PAGE_SKIPPING) == 0)
    {
        /* page skipping has not been disabled */
        while (next_unskippable_block < nblocks)
        {
            uint8       vmstatus;

            vmstatus = visibilitymap_get_status(onerel, next_unskippable_block,
                                                &vmbuffer);
            if (aggressive)
            {
                if ((vmstatus & VISIBILITYMAP_ALL_FROZEN) == 0)
                    break;      /* found a block that is not all-frozen */
            }
            else
            {
                if ((vmstatus & VISIBILITYMAP_ALL_VISIBLE) == 0)
                    break;      /* found a block that is not all-visible */
            }
            vacuum_delay_point();
            next_unskippable_block++;
        }
    }

    if (next_unskippable_block >= SKIP_PAGES_THRESHOLD)
        skipping_blocks = true;     /* run is long enough to be worth skipping */
    else
        skipping_blocks = false;
    for (blkno = 0; blkno < nblocks; blkno++)
    {
        /* process each block in turn */
        Buffer      buf;
        Page        page;
        OffsetNumber offnum,
                    maxoff;
        bool        tupgone,
                    hastup;
        int         prev_dead_count;    /* dead-tuple count before this page */
        int         nfrozen;            /* number of tuples to freeze */
        Size        freespace;
        bool        all_visible_according_to_vm = false;
        bool        all_visible;
        bool        all_frozen = true;  /* provided all_visible is also true */
        bool        has_dead_tuples;
        TransactionId visibility_cutoff_xid = InvalidTransactionId;

        /* see note above about forcing scanning of last page */
#define FORCE_CHECK_PAGE() \
        (blkno == nblocks - 1 && should_attempt_truncation(vacrelstats))

        pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
        if (blkno == next_unskippable_block)
        {
            /* Time to advance next_unskippable_block */
            next_unskippable_block++;
            /* look for the next block that cannot be skipped */
            if ((options & VACOPT_DISABLE_PAGE_SKIPPING) == 0)
            {
                while (next_unskippable_block < nblocks)
                {
                    uint8       vmskipflags;

                    vmskipflags = visibilitymap_get_status(onerel,
                                                           next_unskippable_block,
                                                           &vmbuffer);
                    if (aggressive)
                    {
                        if ((vmskipflags & VISIBILITYMAP_ALL_FROZEN) == 0)
                            break;
                    }
                    else
                    {
                        if ((vmskipflags & VISIBILITYMAP_ALL_VISIBLE) == 0)
                            break;
                    }
                    vacuum_delay_point();
                    next_unskippable_block++;
                }
            }

            /*
             * We know we can't skip the current block.  But set up
             * skipping_blocks to do the right thing at the following blocks.
             */
            if (next_unskippable_block - blkno > SKIP_PAGES_THRESHOLD)
                skipping_blocks = true;
            else
                skipping_blocks = false;

            /*
             * Normally, the fact that we can't skip this block must mean that
             * it's not all-visible.  But in an aggressive vacuum we know only
             * that it's not all-frozen, so it might still be all-visible.
             */
            if (aggressive && VM_ALL_VISIBLE(onerel, blkno, &vmbuffer))
                all_visible_according_to_vm = true;
        }
        else
        {
            /* next_unskippable_block has not been reached yet */
            /*
             * The current block is potentially skippable; if we've seen a
             * long enough run of skippable blocks to justify skipping it, and
             * we're not forced to check it, then go ahead and skip.
             * Otherwise, the page must be at least all-visible if not
             * all-frozen, so we can set all_visible_according_to_vm = true.
             */
            if (skipping_blocks && !FORCE_CHECK_PAGE())
            {
                /*
                 * Tricky, tricky.  If this is in aggressive vacuum, the page
                 * must have been all-frozen at the time we checked whether it
                 * was skippable, but it might not be any more.  We must be
                 * careful to count it as a skipped all-frozen page in that
                 * case, or else we'll think we can't update relfrozenxid and
                 * relminmxid.  If it's not an aggressive vacuum, we don't
                 * know whether it was all-frozen, so we have to recheck; but
                 * in this case an approximate answer is OK.
                 */
                if (aggressive || VM_ALL_FROZEN(onerel, blkno, &vmbuffer))
                    vacrelstats->frozenskipped_pages++;
                continue;       /* skip to the next block */
            }
            all_visible_according_to_vm = true;
        }

        vacuum_delay_point();
        /*
         * If we are close to overrunning the available space for dead-tuple
         * TIDs, pause and do a cycle of vacuuming before we tackle this page.
         */
        if ((vacrelstats->max_dead_tuples - vacrelstats->num_dead_tuples) < MaxHeapTuplesPerPage &&
            vacrelstats->num_dead_tuples > 0)
        {
            const int   hvp_index[] = {
                PROGRESS_VACUUM_PHASE,
                PROGRESS_VACUUM_NUM_INDEX_VACUUMS
            };
            int64       hvp_val[2];

            /*
             * Before beginning index vacuuming, we release any pin we may
             * hold on the visibility map page.  This isn't necessary for
             * correctness, but we do it anyway to avoid holding the pin
             * across a lengthy, unrelated operation.
             */
            if (BufferIsValid(vmbuffer))
            {
                ReleaseBuffer(vmbuffer);
                vmbuffer = InvalidBuffer;
            }

            /* Log cleanup info before we touch indexes */
            vacuum_log_cleanup_info(onerel, vacrelstats);

            /* Report that we are now vacuuming indexes */
            pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
                                         PROGRESS_VACUUM_PHASE_VACUUM_INDEX);

            /*
             * Remove index entries pointing at the tuples recorded in
             * vacrelstats->dead_tuples, updating runtime statistics.
             */
            for (i = 0; i < nindexes; i++)
                lazy_vacuum_index(Irel[i],
                                  &indstats[i],
                                  vacrelstats);

            /*
             * Report that we are now vacuuming the heap.  We also increase
             * the number of index scans here; note that by using
             * pgstat_progress_update_multi_param we can update both
             * parameters atomically.
             */
            hvp_val[0] = PROGRESS_VACUUM_PHASE_VACUUM_HEAP;
            hvp_val[1] = vacrelstats->num_index_scans + 1;
            pgstat_progress_update_multi_param(2, hvp_index, hvp_val);

            /* Remove tuples from heap */
            lazy_vacuum_heap(onerel, vacrelstats);

            /*
             * Forget the now-vacuumed tuples, and press on, but be careful
             * not to reset latestRemovedXid since we want that value to be
             * valid.
             */
            vacrelstats->num_dead_tuples = 0;
            vacrelstats->num_index_scans++;

            /*
             * Vacuum the Free Space Map to make newly-freed space visible on
             * upper-level FSM pages.  Note we have not yet processed blkno.
             */
            FreeSpaceMapVacuumRange(onerel, next_fsm_block_to_vacuum, blkno);
            next_fsm_block_to_vacuum = blkno;

            /* Report that we are once again scanning the heap */
            pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
                                         PROGRESS_VACUUM_PHASE_SCAN_HEAP);
        }
        /*
         * Pin the visibility map page in case we need to mark the page
         * all-visible.  In most cases this will be very cheap, because we'll
         * already have the correct page pinned anyway.  However, it's
         * possible that (a) next_unskippable_block is covered by a different
         * VM page than the current block or (b) we released our pin and did a
         * cycle of index vacuuming.
         */
        visibilitymap_pin(onerel, blkno, &vmbuffer);

        buf = ReadBufferExtended(onerel, MAIN_FORKNUM, blkno,
                                 RBM_NORMAL, vac_strategy);

        /* We need buffer cleanup lock so that we can prune HOT chains. */
        /* ConditionalLockBufferForCleanup is like LockBufferForCleanup,
         * except that it does not wait for the lock */
        if (!ConditionalLockBufferForCleanup(buf))
        {
            /* ----------- could not get the cleanup lock */
            /*
             * If we're not performing an aggressive scan to guard against XID
             * wraparound, and we don't want to forcibly check the page, then
             * it's OK to skip vacuuming pages we get a lock conflict on. They
             * will be dealt with in some future vacuum.
             */
            if (!aggressive && !FORCE_CHECK_PAGE())
            {
                /* not aggressive && not force-checking the page:
                 * release the buffer and count a skipped pinned page */
                ReleaseBuffer(buf);
                vacrelstats->pinskipped_pages++;
                continue;
            }

            /*
             * Read the page with share lock to see if any xids on it need to
             * be frozen.  If not we just skip the page, after updating our
             * scan statistics.  If there are some, we wait for cleanup lock.
             *
             * We could defer the lock request further by remembering the page
             * and coming back to it later, or we could even register
             * ourselves for multiple buffers and then service whichever one
             * is received first.  For now, this seems good enough.
             *
             * If we get here with aggressive false, then we're just forcibly
             * checking the page, and so we don't want to insist on getting
             * the lock; we only need to know if the page contains tuples, so
             * that we can update nonempty_pages correctly.  It's convenient
             * to use lazy_check_needs_freeze() for both situations, though.
             */
            LockBuffer(buf, BUFFER_LOCK_SHARE);
            /* lazy_check_needs_freeze scans the page for tuples that must be
             * frozen to avoid wraparound */
            if (!lazy_check_needs_freeze(buf, &hastup))
            {
                /* no tuples need freezing; skip this block */
                UnlockReleaseBuffer(buf);
                vacrelstats->scanned_pages++;
                vacrelstats->pinskipped_pages++;
                if (hastup)
                    vacrelstats->nonempty_pages = blkno + 1;
                continue;
            }
            if (!aggressive)
            {
                /*
                 * Here, we must not advance scanned_pages; that would amount
                 * to claiming that the page contains no freezable tuples.
                 */
                UnlockReleaseBuffer(buf);
                vacrelstats->pinskipped_pages++;
                if (hastup)
                    vacrelstats->nonempty_pages = blkno + 1;
                continue;
            }
            LockBuffer(buf, BUFFER_LOCK_UNLOCK);
            LockBufferForCleanup(buf);
            /* drop through to normal processing */
        }
        /* update statistics */
        vacrelstats->scanned_pages++;
        vacrelstats->tupcount_pages++;

        page = BufferGetPage(buf);

        if (PageIsNew(page))
        {
            /* -------------- newly initialized (all-zeroes) page */
            /*
             * An all-zeroes page could be left over if a backend extends the
             * relation but crashes before initializing the page. Reclaim such
             * pages for use.
             *
             * We have to be careful here because we could be looking at a
             * page that someone has just added to the relation and not yet
             * been able to initialize (see RelationGetBufferForTuple). To
             * protect against that, release the buffer lock, grab the
             * relation extension lock momentarily, and re-lock the buffer. If
             * the page is still uninitialized by then, it must be left over
             * from a crashed backend, and we can initialize it.
             *
             * We don't really need the relation lock when this is a new or
             * temp relation, but it's probably not worth the code space to
             * check that, since this surely isn't a critical path.
             *
             * Note: the comparable code in vacuum.c need not worry because
             * it's got exclusive lock on the whole relation.
             */
            LockBuffer(buf, BUFFER_LOCK_UNLOCK);
            LockRelationForExtension(onerel, ExclusiveLock);
            UnlockRelationForExtension(onerel, ExclusiveLock);
            LockBufferForCleanup(buf);
            /* re-check whether the page is still new */
            if (PageIsNew(page))
            {
                /* still new, so it is safe to (re)initialize it */
                ereport(WARNING,
                        (errmsg("relation \"%s\" page %u is uninitialized --- fixing",
                                relname, blkno)));
                PageInit(page, BufferGetPageSize(buf), 0);
                empty_pages++;
            }
            freespace = PageGetHeapFreeSpace(page);
            MarkBufferDirty(buf);
            UnlockReleaseBuffer(buf);

            RecordPageWithFreeSpace(onerel, blkno, freespace);
            /* next page */
            continue;
        }
        if (PageIsEmpty(page))
        {
            /* ----------------- empty page */
            empty_pages++;
            freespace = PageGetHeapFreeSpace(page);

            /* empty pages are always all-visible and all-frozen */
            if (!PageIsAllVisible(page))
            {
                START_CRIT_SECTION();

                /* mark buffer dirty before writing a WAL record */
                MarkBufferDirty(buf);

                /*
                 * It's possible that another backend has extended the heap,
                 * initialized the page, and then failed to WAL-log the page
                 * due to an ERROR.  Since heap extension is not WAL-logged,
                 * recovery might try to replay our record setting the page
                 * all-visible and find that the page isn't initialized, which
                 * will cause a PANIC.  To prevent that, check whether the
                 * page has been previously WAL-logged, and if not, do that
                 * now.
                 */
                if (RelationNeedsWAL(onerel) &&
                    PageGetLSN(page) == InvalidXLogRecPtr)
                    log_newpage_buffer(buf, true);

                PageSetAllVisible(page);
                visibilitymap_set(onerel, blkno, buf, InvalidXLogRecPtr,
                                  vmbuffer, InvalidTransactionId,
                                  VISIBILITYMAP_ALL_VISIBLE | VISIBILITYMAP_ALL_FROZEN);
                END_CRIT_SECTION();
            }

            UnlockReleaseBuffer(buf);
            RecordPageWithFreeSpace(onerel, blkno, freespace);
            /* next block */
            continue;
        }
        /*
         * Prune all HOT-update chains in this page.
         * 清理該page中的所有HOT-update鏈
         *
         * We count tuples removed by the pruning step as removed by VACUUM.
         * 計算通過VACUUM的清理步驟清楚的tuples數(shù)量.
         */
        tups_vacuumed += heap_page_prune(onerel, buf, OldestXmin, false,
                                         &vacrelstats->latestRemovedXid);
        /*
         * Now scan the page to collect vacuumable items and check for tuples
         * requiring freezing.
         * 現(xiàn)在掃描page,收集可清理(vacuumable)的條目,并檢查哪些元組需要凍結.
         */
        all_visible = true;
        has_dead_tuples = false;
        nfrozen = 0;
        hastup = false;
        prev_dead_count = vacrelstats->num_dead_tuples;
        maxoff = PageGetMaxOffsetNumber(page);//獲取最大偏移
        /*
         * Note: If you change anything in the loop below, also look at
         * heap_page_is_all_visible to see if that needs to be changed.
         * 注意:如果在下面的循環(huán)中修改了業(yè)務邏輯,
         *   需要檢查heap_page_is_all_visible判斷是否需要改變.
         */
        for (offnum = FirstOffsetNumber;
             offnum <= maxoff;
             offnum = OffsetNumberNext(offnum))
        {
            ItemId      itemid;
            itemid = PageGetItemId(page, offnum);
            /* Unused items require no processing, but we count 'em */
            //未使用的條目無需處理,但需要計數(shù).
            if (!ItemIdIsUsed(itemid))
            {
                //未被使用,跳過
                nunused += 1;
                continue;
            }
            /* Redirect items mustn't be touched */
            //重定向的條目不允許被修改.
            if (ItemIdIsRedirected(itemid))
            {
                //重定向的ITEM
                //該page不能被截斷
                hastup = true;  /* this page won't be truncatable */
                continue;
            }
            //設置行指針
            ItemPointerSet(&(tuple.t_self), blkno, offnum);
            /*
             * DEAD item pointers are to be vacuumed normally; but we don't
             * count them in tups_vacuumed, else we'd be double-counting (at
             * least in the common case where heap_page_prune() just freed up
             * a non-HOT tuple).
             * 廢棄的行指針將被正常vacuumed.
             * 但我們不需要通過tups_vacuumed變量計數(shù),否則會重復統(tǒng)計.
             * (起碼在通常情況下,heap_page_prune()會釋放non-HOT元組)
             */
            if (ItemIdIsDead(itemid))
            {
                //記錄需刪除的tuple
                //vacrelstats->dead_tuples[vacrelstats->num_dead_tuples] = *itemptr;
                //vacrelstats->num_dead_tuples++;
                lazy_record_dead_tuple(vacrelstats, &(tuple.t_self));
                all_visible = false;
                continue;
            }
            Assert(ItemIdIsNormal(itemid));
            //獲取數(shù)據(jù)
            tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
            tuple.t_len = ItemIdGetLength(itemid);
            tuple.t_tableOid = RelationGetRelid(onerel);
            tupgone = false;
            /*
             * The criteria for counting a tuple as live in this block need to
             * match what analyze.c's acquire_sample_rows() does, otherwise
             * VACUUM and ANALYZE may produce wildly different reltuples
             * values, e.g. when there are many recently-dead tuples.
             * 統(tǒng)計存活元組的計算策略需要與analyze.c中的acquire_sample_rows()邏輯匹配,
             *   否則的話,VACUUM/ANALYZE可能會產生差異很大的reltuples值,
             *   比如在出現(xiàn)非常多近期被廢棄的元組的情況下.
             *
             * The logic here is a bit simpler than acquire_sample_rows(), as
             * VACUUM can't run inside a transaction block, which makes some
             * cases impossible (e.g. in-progress insert from the same
             * transaction).
             * 這里的邏輯比acquire_sample_rows()函數(shù)要簡單一些,
             *   因為VACUUM不能在事務塊內執(zhí)行,這排除了一些情況(比如同一事務中正在進行的插入).
             */
            //為VACUUM確定元組的狀態(tài).
            //在這里,主要目的是一個元組是否可能對所有正在運行中的事務可見.
            switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
            {
                case HEAPTUPLE_DEAD:
                    /*
                     * Ordinarily, DEAD tuples would have been removed by
                     * heap_page_prune(), but it's possible that the tuple
                     * state changed since heap_page_prune() looked.  In
                     * particular an INSERT_IN_PROGRESS tuple could have
                     * changed to DEAD if the inserter aborted.  So this
                     * cannot be considered an error condition.
                     * 通常來說,廢棄的元組可能已通過heap_page_prune()函數(shù)清除,
                     *   但在heap_page_prune()搜索的過程中元組的狀態(tài)可能會出現(xiàn)變更.
                     * 特別是,如果插入程序中止,INSERT_IN_PROGRESS元組可能已經(jīng)變成DEAD。
                     * 所以這不能被認為是一個錯誤條件。
                     *
                     * If the tuple is HOT-updated then it must only be
                     * removed by a prune operation; so we keep it just as if
                     * it were RECENTLY_DEAD.  Also, if it's a heap-only
                     * tuple, we choose to keep it, because it'll be a lot
                     * cheaper to get rid of it in the next pruning pass than
                     * to treat it like an indexed tuple.
                     * 如果該tuple是HOT-updated,那么必須通過pruge操作清理.
                     *   因此元組狀態(tài)調整為RECENTLY_DEAD.
                     * 同時,如果這是一個HOT,我們選擇保留該tuple,
                     *   因為在下一次清理中刪除它要比現(xiàn)在像處理索引元組那樣處理它成本要低得多。
                     *
                     * If this were to happen for a tuple that actually needed
                     * to be deleted, we'd be in trouble, because it'd
                     * possibly leave a tuple below the relation's xmin
                     * horizon alive.  heap_prepare_freeze_tuple() is prepared
                     * to detect that case and abort the transaction,
                     * preventing corruption.
                     * 如果這種情況發(fā)生在實際上需要刪除的元組上,我們就有麻煩了,
                     *   因為它可能會讓一個低于關系xmin地平線的元組繼續(xù)存活.
                     * heap_prepare_freeze_tuple()函數(shù)用于檢測這種狀態(tài),并終止事務以避免出現(xiàn)崩潰.
                     */
                    if (HeapTupleIsHotUpdated(&tuple) ||
                        HeapTupleIsHeapOnly(&tuple))
                        nkeep += 1;
                    else
                        //可以刪除元組
                        tupgone = true; /* we can delete the tuple */
                    //存在dead tuple,設置all Visible標記為F
                    all_visible = false;
                    break;
                case HEAPTUPLE_LIVE:
                    /*
                     * Count it as live.  Not only is this natural, but it's
                     * also what acquire_sample_rows() does.
                     * 存活元組計數(shù).
                     * 這不僅很自然,而且acquire_sample_rows()也是這樣做的。
                     */
                    live_tuples += 1;
                    /*
                     * Is the tuple definitely visible to all transactions?
                     * 元組對所有事務肯定可見嗎?
                     *
                     * NB: Like with per-tuple hint bits, we can't set the
                     * PD_ALL_VISIBLE flag if the inserter committed
                     * asynchronously. See SetHintBits for more info. Check
                     * that the tuple is hinted xmin-committed because of
                     * that.
                     * 注意:與per-tuple hint bits類似,如果異步提交,那么不能設置PD_ALL_VISIBLE標記.
                     * 詳見SetHintBits函數(shù).
                     * 因此需要檢測該元組已標記為xmin-committed.
                     */
                    if (all_visible)
                    {
                        //all_visible = T
                        TransactionId xmin;
                        if (!HeapTupleHeaderXminCommitted(tuple.t_data))
                        {
                            //xmin not committed,設置為F
                            all_visible = false;
                            break;
                        }
                        /*
                         * The inserter definitely committed. But is it old
                         * enough that everyone sees it as committed?
                         * 插入器確實已經(jīng)提交
                         * 但已足夠老,其他進程都可以看到?
                         */
                        xmin = HeapTupleHeaderGetXmin(tuple.t_data);
                        if (!TransactionIdPrecedes(xmin, OldestXmin))
                        {
                            //xmin不早于OldestXmin,仍可能對某些事務不可見,設置為F
                            all_visible = false;
                            break;
                        }
                        /* Track newest xmin on page. */
                        //跟蹤page上最新的xmin
                        //if (int32) (xmin - visibility_cutoff_xid) > 0, return T
                        if (TransactionIdFollows(xmin, visibility_cutoff_xid))
                            visibility_cutoff_xid = xmin;
                    }
                    break;
                case HEAPTUPLE_RECENTLY_DEAD:
                    /*
                     * If tuple is recently deleted then we must not remove it
                     * from relation.
                     * 如元組是近期被刪除的,那么不能從relation中刪除這些元組.
                     */
                    nkeep += 1;
                    all_visible = false;
                    break;
                case HEAPTUPLE_INSERT_IN_PROGRESS:
                    /*
                     * This is an expected case during concurrent vacuum.
                     * 在并發(fā)vacuum期間這是可以預期的情況.
                     *
                     * We do not count these rows as live, because we expect
                     * the inserting transaction to update the counters at
                     * commit, and we assume that will happen only after we
                     * report our results.  This assumption is a bit shaky,
                     * but it is what acquire_sample_rows() does, so be
                     * consistent.
                     * 不能統(tǒng)計這些元組為存活元組,因為我們期望插入事務在提交時更新計數(shù)器,
                     *   同時我們假定只在報告了結果后才會發(fā)生.
                     * 這個假設有點不可靠,但acquire_sample_rows()就是這么做的,所以要保持一致。
                     */
                    all_visible = false;
                    break;
                case HEAPTUPLE_DELETE_IN_PROGRESS:
                    /* This is an expected case during concurrent vacuum */
                    //在同步期間,這種情況可以預期
                    all_visible = false;
                    /*
                     * Count such rows as live.  As above, we assume the
                     * deleting transaction will commit and update the
                     * counters after we report.
                     * 這些行視為存活行.
                     * 如上所述,我們假定刪除事務會提交并在我們報告后更新計數(shù)器.
                     */
                    live_tuples += 1;
                    break;
                default:
                    //沒有其他狀態(tài)了.
                    elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
                    break;
            }
            if (tupgone)
            {
                 //記錄需刪除的tuple
                //vacrelstats->dead_tuples[vacrelstats->num_dead_tuples] = *itemptr;
                //vacrelstats->num_dead_tuples++;
                lazy_record_dead_tuple(vacrelstats, &(tuple.t_self));
                HeapTupleHeaderAdvanceLatestRemovedXid(tuple.t_data,
                                                       &vacrelstats->latestRemovedXid);
                tups_vacuumed += 1;
                has_dead_tuples = true;
            }
            else
            {
                bool        tuple_totally_frozen;//所有都凍結標記
                num_tuples += 1;
                hastup = true;
                /*
                 * Each non-removable tuple must be checked to see if it needs
                 * freezing.  Note we already have exclusive buffer lock.
                 * 每一個未清理的tuple必須檢查看看是否需要凍結.
                 * 注意我們已經(jīng)持有了獨占緩沖鎖.
                 */
                if (heap_prepare_freeze_tuple(tuple.t_data,
                                              relfrozenxid, relminmxid,
                                              FreezeLimit, MultiXactCutoff,
                                              &frozen[nfrozen],
                                              &tuple_totally_frozen))
                    frozen[nfrozen++].offset = offnum;
                if (!tuple_totally_frozen)
                    all_frozen = false;
            }
        }                       /* scan along page */
        /*
         * If we froze any tuples, mark the buffer dirty, and write a WAL
         * record recording the changes.  We must log the changes to be
         * crash-safe against future truncation of CLOG.
         * 如果凍結了任何元組,標記緩沖區(qū)為臟,并寫入WAL Record記錄這些變更.
         * 必須記錄這些變更,以便在將來截斷CLOG后仍能保證崩潰安全.
         */
        if (nfrozen > 0)
        {
            //已凍結計數(shù)>0,執(zhí)行相關處理
            START_CRIT_SECTION();
            //標記緩沖為臟
            MarkBufferDirty(buf);
            /* execute collected freezes */
            //執(zhí)行凍結
            for (i = 0; i < nfrozen; i++)
            {
                ItemId      itemid;
                HeapTupleHeader htup;
                itemid = PageGetItemId(page, frozen[i].offset);
                htup = (HeapTupleHeader) PageGetItem(page, itemid);
                //執(zhí)行凍結
                heap_execute_freeze_tuple(htup, &frozen[i]);
            }
            /* Now WAL-log freezing if necessary */
            //如需要,記錄凍結日志
            if (RelationNeedsWAL(onerel))
            {
                XLogRecPtr  recptr;
                recptr = log_heap_freeze(onerel, buf, FreezeLimit,
                                         frozen, nfrozen);
                PageSetLSN(page, recptr);
            }
            END_CRIT_SECTION();
        }
        /*
         * If there are no indexes then we can vacuum the page right now
         * instead of doing a second scan.
         * 如果沒有索引,那么現(xiàn)在執(zhí)行vacuum page而不需要二次掃描.
         */
        if (nindexes == 0 &&
            vacrelstats->num_dead_tuples > 0)
        {
            //------------- 如無索引并且存在dead元組,執(zhí)行清理
            /* Remove tuples from heap */
            //清除元組
            lazy_vacuum_page(onerel, blkno, buf, 0, vacrelstats, &vmbuffer);
            has_dead_tuples = false;
            /*
             * Forget the now-vacuumed tuples, and press on, but be careful
             * not to reset latestRemovedXid since we want that value to be
             * valid.
             * 無需再關注現(xiàn)在已被vacuum的元組,繼續(xù),但要小心不要重置了latestRemovedXid,
             *   因為我們希望該值是有效的.
             */
            vacrelstats->num_dead_tuples = 0;//重置計數(shù)器
            vacuumed_pages++;//已完成的page+1
            /*
             * Periodically do incremental FSM vacuuming to make newly-freed
             * space visible on upper FSM pages.  Note: although we've cleaned
             * the current block, we haven't yet updated its FSM entry (that
             * happens further down), so passing end == blkno is correct.
             * 周期性的進行增量FSM vacuuming,以使新釋放的空間在上層FSM pages中可見.
             * 注意:雖然我們已經(jīng)清理了當前塊,但尚未更新它的FSM條目(在后面才會更新),
             *   因此傳入end == blkno是正確的.
             */
            if (blkno - next_fsm_block_to_vacuum >= VACUUM_FSM_EVERY_PAGES)
            {
                //批量處理
                FreeSpaceMapVacuumRange(onerel, next_fsm_block_to_vacuum,
                                        blkno);
                next_fsm_block_to_vacuum = blkno;
            }
        }
        //獲取空閑空間
        freespace = PageGetHeapFreeSpace(page);
        //以下if/else邏輯用于同步vm狀態(tài)
        /* mark page all-visible, if appropriate */
        //如OK,標記頁面為all-Visible
        if (all_visible && !all_visible_according_to_vm)
        {
            //
            uint8       flags = VISIBILITYMAP_ALL_VISIBLE;
            if (all_frozen)
                flags |= VISIBILITYMAP_ALL_FROZEN;
            /*
             * It should never be the case that the visibility map page is set
             * while the page-level bit is clear, but the reverse is allowed
             * (if checksums are not enabled).  Regardless, set the both bits
             * so that we get back in sync.
             * 不應出現(xiàn)page級標志已清除而VM位仍被設置的情況,但反過來是允許的
             *   (如果沒有啟用校驗和). 無論如何,把兩個標志位都設置好,使兩者重新同步.
             *
             * NB: If the heap page is all-visible but the VM bit is not set,
             * we don't need to dirty the heap page.  However, if checksums
             * are enabled, we do need to make sure that the heap page is
             * dirtied before passing it to visibilitymap_set(), because it
             * may be logged.  Given that this situation should only happen in
             * rare cases after a crash, it is not worth optimizing.
             * 注意:如果heap page是all-visible但VM沒有設置,我們不需要設置該page為臟page.
             * 但是,如果啟用了校驗位,
             *   我們確實需要確保heap page在傳遞給visibilitymap_set()函數(shù)前標記為臟,因為可能需要記錄日志.
             * 鑒于這種情況只會在崩潰后的罕見場景出現(xiàn),不值得為此優(yōu)化.
             */
            PageSetAllVisible(page);
            MarkBufferDirty(buf);
            visibilitymap_set(onerel, blkno, buf, InvalidXLogRecPtr,
                              vmbuffer, visibility_cutoff_xid, flags);
        }
        /*
         * As of PostgreSQL 9.2, the visibility map bit should never be set if
         * the page-level bit is clear.  However, it's possible that the bit
         * got cleared after we checked it and before we took the buffer
         * content lock, so we must recheck before jumping to the conclusion
         * that something bad has happened.
         * 從PostgreSQL 9.2開始,如果頁面級別位已清除,就不應該設置可見性映射位。
         * 但是,可能在我們檢查之后、獲得緩沖區(qū)內容鎖之前,該位被清除了,
         *   因此在斷定出了問題之前必須重新檢查.
         */
        else if (all_visible_according_to_vm && !PageIsAllVisible(page)
                 && VM_ALL_VISIBLE(onerel, blkno, &vmbuffer))
        {
            elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
                 relname, blkno);
            visibilitymap_clear(onerel, blkno, vmbuffer,
                                VISIBILITYMAP_VALID_BITS);
        }
        /*
         * It's possible for the value returned by GetOldestXmin() to move
         * backwards, so it's not wrong for us to see tuples that appear to
         * not be visible to everyone yet, while PD_ALL_VISIBLE is already
         * set. The real safe xmin value never moves backwards, but
         * GetOldestXmin() is conservative and sometimes returns a value
         * that's unnecessarily small, so if we see that contradiction it just
         * means that the tuples that we think are not visible to everyone yet
         * actually are, and the PD_ALL_VISIBLE flag is correct.
         * GetOldestXmin()返回的值有可能向后移動,因此即便PD_ALL_VISIBLE已經(jīng)設置,
         *   我們看到似乎尚未對所有事務可見的元組也并不算錯.
         * 真正安全的xmin值從不會向后移動,但GetOldestXmin()比較保守,
         *   有時會返回一個不必要的小值;因此如果看到這種矛盾,
         *   只是意味著那些我們以為尚未對所有事務可見的元組實際上已經(jīng)可見,
         *   PD_ALL_VISIBLE標記是正確的.
         *
         * There should never be dead tuples on a page with PD_ALL_VISIBLE
         * set, however.
         * 但是,在一個標記為PD_ALL_VISIBLE的page中,永遠不應出現(xiàn)dead tupls.
         */
        else if (PageIsAllVisible(page) && has_dead_tuples)
        {
            elog(WARNING, "page containing dead tuples is marked as all-visible in relation \"%s\" page %u",
                 relname, blkno);
            PageClearAllVisible(page);
            MarkBufferDirty(buf);
            visibilitymap_clear(onerel, blkno, vmbuffer,
                                VISIBILITYMAP_VALID_BITS);
        }
        /*
         * If the all-visible page is turned out to be all-frozen but not
         * marked, we should so mark it.  Note that all_frozen is only valid
         * if all_visible is true, so we must check both.
         * 如all-visible page已被凍結但未被標記,我們應該標記它.
         * 注意all_frozen只有在all_visible為T的情況下才是有效的,因此必須兩者都要檢查.
         */
        else if (all_visible_according_to_vm && all_visible && all_frozen &&
                 !VM_ALL_FROZEN(onerel, blkno, &vmbuffer))
        {
            /*
             * We can pass InvalidTransactionId as the cutoff XID here,
             * because setting the all-frozen bit doesn't cause recovery
             * conflicts.
             * 我們可以把InvalidTransactionId作為cutoff XID傳入,
             *   因為設置all-frozen位不會導致恢復沖突.
             */
            visibilitymap_set(onerel, blkno, buf, InvalidXLogRecPtr,
                              vmbuffer, InvalidTransactionId,
                              VISIBILITYMAP_ALL_FROZEN);
        }
        UnlockReleaseBuffer(buf);
        /* Remember the location of the last page with nonremovable tuples */
        //記錄最后一個含有不可清除元組的頁面位置.
        if (hastup)
            vacrelstats->nonempty_pages = blkno + 1;
        /*
         * If we remembered any tuples for deletion, then the page will be
         * visited again by lazy_vacuum_heap, which will compute and record
         * its post-compaction free space.  If not, then we're done with this
         * page, so remember its free space as-is.  (This path will always be
         * taken if there are no indexes.)
         * 如果我們記錄了任何待刪除的元組,lazy_vacuum_heap將再次訪問該頁,
         *   由它計算并記錄壓縮后的空閑空間.
         * 如果沒有,那么這個頁面已經(jīng)處理完畢,按當前狀態(tài)記錄其空閑空間即可.
         * (如果沒有索引,則始終采用此路徑。)
         */
        if (vacrelstats->num_dead_tuples == prev_dead_count)
            RecordPageWithFreeSpace(onerel, blkno, freespace);
    } //結束block循環(huán)
    /* report that everything is scanned and vacuumed */
    //報告所有數(shù)據(jù)已掃描并vacuumed.
    pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
    pfree(frozen);
    /* save stats for use later */
    //保存統(tǒng)計信息以備后用
    vacrelstats->tuples_deleted = tups_vacuumed;
    vacrelstats->new_dead_tuples = nkeep;
    /* now we can compute the new value for pg_class.reltuples */
    //現(xiàn)在可以為pg_class.reltuples設置新值了.
    vacrelstats->new_live_tuples = vac_estimate_reltuples(onerel,
                                                          nblocks,
                                                          vacrelstats->tupcount_pages,
                                                          live_tuples);
    /* also compute total number of surviving heap entries */
    //同時,計算存活的heap條目總數(shù)
    vacrelstats->new_rel_tuples =
        vacrelstats->new_live_tuples + vacrelstats->new_dead_tuples;
    /*
     * Release any remaining pin on visibility map page.
     * 釋放visibility map頁面上殘留的pin
     */
    if (BufferIsValid(vmbuffer))
    {
        ReleaseBuffer(vmbuffer);
        vmbuffer = InvalidBuffer;
    }
    /* If any tuples need to be deleted, perform final vacuum cycle */
    /* XXX put a threshold on min number of tuples here? */
    //如果仍有元組需要刪除,執(zhí)行最后的vacuum循環(huán).
    //在這里為元組的最小數(shù)目設置一個閾值?
    if (vacrelstats->num_dead_tuples > 0)
    {
        const int   hvp_index[] = {
            PROGRESS_VACUUM_PHASE,
            PROGRESS_VACUUM_NUM_INDEX_VACUUMS
        };
        int64       hvp_val[2];
        /* Log cleanup info before we touch indexes */
        //在訪問索引前記錄清理信息
        vacuum_log_cleanup_info(onerel, vacrelstats);
        /* Report that we are now vacuuming indexes */
        //報告我們正在vacumming索引
        pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
                                     PROGRESS_VACUUM_PHASE_VACUUM_INDEX);
        /* Remove index entries */
        //清理索引條目
        for (i = 0; i < nindexes; i++)
            lazy_vacuum_index(Irel[i],
                              &indstats[i],
                              vacrelstats);
        /* Report that we are now vacuuming the heap */
        //報告我們正在vacuuming heap
        hvp_val[0] = PROGRESS_VACUUM_PHASE_VACUUM_HEAP;
        hvp_val[1] = vacrelstats->num_index_scans + 1;
        pgstat_progress_update_multi_param(2, hvp_index, hvp_val);
        /* Remove tuples from heap */
        //清理元組
        pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
                                     PROGRESS_VACUUM_PHASE_VACUUM_HEAP);
        lazy_vacuum_heap(onerel, vacrelstats);
        vacrelstats->num_index_scans++;
    }
    /*
     * Vacuum the remainder of the Free Space Map.  We must do this whether or
     * not there were indexes.
     * vacuum FSM.
     * 不管是否存在索引,都必須如此處理.
     */
    if (blkno > next_fsm_block_to_vacuum)
        FreeSpaceMapVacuumRange(onerel, next_fsm_block_to_vacuum, blkno);
    /* report all blocks vacuumed; and that we're cleaning up */
    //報告所有blocks vacuumed,已完成清理.
    pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, blkno);
    pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
                                 PROGRESS_VACUUM_PHASE_INDEX_CLEANUP);
    /* Do post-vacuum cleanup and statistics update for each index */
    //執(zhí)行vacuum收尾工作,為每個索引更新統(tǒng)計信息
    for (i = 0; i < nindexes; i++)
        lazy_cleanup_index(Irel[i], indstats[i], vacrelstats);
    /* If no indexes, make log report that lazy_vacuum_heap would've made */
    //如無索引,寫日志
    if (vacuumed_pages)
        ereport(elevel,
                (errmsg("\"%s\": removed %.0f row versions in %u pages",
                        RelationGetRelationName(onerel),
                        tups_vacuumed, vacuumed_pages)));
    /*
     * This is pretty messy, but we split it up so that we can skip emitting
     * individual parts of the message when not applicable.
     * 一起寫日志會非?;靵y,但我們把它拆分了,因此我們可以跳過發(fā)送消息的各個部分.
     */
    initStringInfo(&buf);
    appendStringInfo(&buf,
                     _("%.0f dead row versions cannot be removed yet, oldest xmin: %u\n"),
                     nkeep, OldestXmin);
    appendStringInfo(&buf, _("There were %.0f unused item pointers.\n"),
                     nunused);
    appendStringInfo(&buf, ngettext("Skipped %u page due to buffer pins, ",
                                    "Skipped %u pages due to buffer pins, ",
                                    vacrelstats->pinskipped_pages),
                     vacrelstats->pinskipped_pages);
    appendStringInfo(&buf, ngettext("%u frozen page.\n",
                                    "%u frozen pages.\n",
                                    vacrelstats->frozenskipped_pages),
                     vacrelstats->frozenskipped_pages);
    appendStringInfo(&buf, ngettext("%u page is entirely empty.\n",
                                    "%u pages are entirely empty.\n",
                                    empty_pages),
                     empty_pages);
    appendStringInfo(&buf, _("%s."), pg_rusage_show(&ru0));
    ereport(elevel,
            (errmsg("\"%s\": found %.0f removable, %.0f nonremovable row versions in %u out of %u pages",
                    RelationGetRelationName(onerel),
                    tups_vacuumed, num_tuples,
                    vacrelstats->scanned_pages, nblocks),
             errdetail_internal("%s", buf.data)));
    pfree(buf.data);
}

三、跟蹤分析

測試腳本,執(zhí)行壓力測試的同時,執(zhí)行vacuum


-- session 1
pgbench -c 2 -C -f ./update.sql -j 1 -n -T 600 -U xdb testdb
-- session 2
17:52:59 (xdb@[local]:5432)testdb=# vacuum verbose t1;

啟動gdb,設置斷點


(gdb) b lazy_scan_heap
Breakpoint 1 at 0x6bc38a: file vacuumlazy.c, line 470.
(gdb) c
Continuing.
Breakpoint 1, lazy_scan_heap (onerel=0x7f224a197788, options=5, vacrelstats=0x296d7b8, Irel=0x296d8b0, nindexes=1, 
    aggressive=false) at vacuumlazy.c:470
470     TransactionId relfrozenxid = onerel->rd_rel->relfrozenxid;
(gdb)

輸入?yún)?shù)
1-relation


(gdb) p *onerel
$1 = {rd_node = {spcNode = 1663, dbNode = 16402, relNode = 50820}, rd_smgr = 0x2930270, rd_refcnt = 1, rd_backend = -1, 
  rd_islocaltemp = false, rd_isnailed = false, rd_isvalid = true, rd_indexvalid = 1 '\001', rd_statvalid = false, 
  rd_createSubid = 0, rd_newRelfilenodeSubid = 0, rd_rel = 0x7f224a197bb8, rd_att = 0x7f224a0d8050, rd_id = 50820, 
  rd_lockInfo = {lockRelId = {relId = 50820, dbId = 16402}}, rd_rules = 0x0, rd_rulescxt = 0x0, trigdesc = 0x0, 
  rd_rsdesc = 0x0, rd_fkeylist = 0x0, rd_fkeyvalid = false, rd_partkeycxt = 0x0, rd_partkey = 0x0, rd_pdcxt = 0x0, 
  rd_partdesc = 0x0, rd_partcheck = 0x0, rd_indexlist = 0x7f224a198fe8, rd_oidindex = 0, rd_pkindex = 0, 
  rd_replidindex = 0, rd_statlist = 0x0, rd_indexattr = 0x0, rd_projindexattr = 0x0, rd_keyattr = 0x0, rd_pkattr = 0x0, 
  rd_idattr = 0x0, rd_projidx = 0x0, rd_pubactions = 0x0, rd_options = 0x0, rd_index = 0x0, rd_indextuple = 0x0, 
  rd_amhandler = 0, rd_indexcxt = 0x0, rd_amroutine = 0x0, rd_opfamily = 0x0, rd_opcintype = 0x0, rd_support = 0x0, 
  rd_supportinfo = 0x0, rd_indoption = 0x0, rd_indexprs = 0x0, rd_indpred = 0x0, rd_exclops = 0x0, rd_exclprocs = 0x0, 
  rd_exclstrats = 0x0, rd_amcache = 0x0, rd_indcollation = 0x0, rd_fdwroutine = 0x0, rd_toastoid = 0, 
  pgstat_info = 0x2923e50}
(gdb)

2-options=5,即VACOPT_VACUUM | VACOPT_VERBOSE
3-vacrelstats


(gdb) p *vacrelstats
$2 = {hasindex = true, old_rel_pages = 75, rel_pages = 0, scanned_pages = 0, pinskipped_pages = 0, frozenskipped_pages = 0, 
  tupcount_pages = 0, old_live_tuples = 10000, new_rel_tuples = 0, new_live_tuples = 0, new_dead_tuples = 0, 
  pages_removed = 0, tuples_deleted = 0, nonempty_pages = 0, num_dead_tuples = 0, max_dead_tuples = 0, dead_tuples = 0x0, 
  num_index_scans = 0, latestRemovedXid = 0, lock_waiter_detected = false}
(gdb)

4-Irel


(gdb) p *Irel
$3 = (Relation) 0x7f224a198688
(gdb) p **Irel
$4 = {rd_node = {spcNode = 1663, dbNode = 16402, relNode = 50823}, rd_smgr = 0x29302e0, rd_refcnt = 1, rd_backend = -1, 
  rd_islocaltemp = false, rd_isnailed = false, rd_isvalid = true, rd_indexvalid = 0 '\000', rd_statvalid = false, 
  rd_createSubid = 0, rd_newRelfilenodeSubid = 0, rd_rel = 0x7f224a1988a0, rd_att = 0x7f224a1989b8, rd_id = 50823, 
  rd_lockInfo = {lockRelId = {relId = 50823, dbId = 16402}}, rd_rules = 0x0, rd_rulescxt = 0x0, trigdesc = 0x0, 
  rd_rsdesc = 0x0, rd_fkeylist = 0x0, rd_fkeyvalid = false, rd_partkeycxt = 0x0, rd_partkey = 0x0, rd_pdcxt = 0x0, 
  rd_partdesc = 0x0, rd_partcheck = 0x0, rd_indexlist = 0x0, rd_oidindex = 0, rd_pkindex = 0, rd_replidindex = 0, 
  rd_statlist = 0x0, rd_indexattr = 0x0, rd_projindexattr = 0x0, rd_keyattr = 0x0, rd_pkattr = 0x0, rd_idattr = 0x0, 
  rd_projidx = 0x0, rd_pubactions = 0x0, rd_options = 0x0, rd_index = 0x7f224a198d58, rd_indextuple = 0x7f224a198d20, 
  rd_amhandler = 330, rd_indexcxt = 0x28cb340, rd_amroutine = 0x28cb480, rd_opfamily = 0x28cb598, rd_opcintype = 0x28cb5b8, 
  rd_support = 0x28cb5d8, rd_supportinfo = 0x28cb600, rd_indoption = 0x28cb738, rd_indexprs = 0x0, rd_indpred = 0x0, 
  rd_exclops = 0x0, rd_exclprocs = 0x0, rd_exclstrats = 0x0, rd_amcache = 0x0, rd_indcollation = 0x28cb718, 
  rd_fdwroutine = 0x0, rd_toastoid = 0, pgstat_info = 0x2923ec8}
(gdb)

5-nindexes=1,存在一個索引
6-aggressive=false,無需執(zhí)行全表掃描
下面開始初始化相關變量


(gdb) n
471     TransactionId relminmxid = onerel->rd_rel->relminmxid;
(gdb) 
483     Buffer      vmbuffer = InvalidBuffer;
(gdb) 
488     const int   initprog_index[] = {
(gdb) 
495     pg_rusage_init(&ru0);
(gdb) 
497     relname = RelationGetRelationName(onerel);
(gdb) 
498     if (aggressive)
(gdb) 
504         ereport(elevel,
(gdb) 
509     empty_pages = vacuumed_pages = 0;
(gdb) 
510     next_fsm_block_to_vacuum = (BlockNumber) 0;
(gdb) 
511     num_tuples = live_tuples = tups_vacuumed = nkeep = nunused = 0;
(gdb) 
514         palloc0(nindexes * sizeof(IndexBulkDeleteResult *));
(gdb) 
513     indstats = (IndexBulkDeleteResult **)
(gdb) 
516     nblocks = RelationGetNumberOfBlocks(onerel);
(gdb) p relminmxid
$5 = 1
(gdb) p ru0
$6 = {tv = {tv_sec = 1548669429, tv_usec = 578779}, ru = {ru_utime = {tv_sec = 0, tv_usec = 29531}, ru_stime = {tv_sec = 0, 
      tv_usec = 51407}, {ru_maxrss = 7488, __ru_maxrss_word = 7488}, {ru_ixrss = 0, __ru_ixrss_word = 0}, {ru_idrss = 0, 
      __ru_idrss_word = 0}, {ru_isrss = 0, __ru_isrss_word = 0}, {ru_minflt = 1819, __ru_minflt_word = 1819}, {
      ru_majflt = 0, __ru_majflt_word = 0}, {ru_nswap = 0, __ru_nswap_word = 0}, {ru_inblock = 2664, 
      __ru_inblock_word = 2664}, {ru_oublock = 328, __ru_oublock_word = 328}, {ru_msgsnd = 0, __ru_msgsnd_word = 0}, {
      ru_msgrcv = 0, __ru_msgrcv_word = 0}, {ru_nsignals = 0, __ru_nsignals_word = 0}, {ru_nvcsw = 70, 
      __ru_nvcsw_word = 70}, {ru_nivcsw = 3, __ru_nivcsw_word = 3}}}
(gdb) p relname
$7 = 0x7f224a197bb8 "t1"
(gdb)

Get the total number of blocks.


(gdb) n
517     vacrelstats->rel_pages = nblocks;
(gdb) p nblocks
$8 = 75
(gdb)

Initialize the statistics and the related arrays.


(gdb) n
518     vacrelstats->scanned_pages = 0;
(gdb) 
519     vacrelstats->tupcount_pages = 0;
(gdb) 
520     vacrelstats->nonempty_pages = 0;
(gdb) 
521     vacrelstats->latestRemovedXid = InvalidTransactionId;
(gdb) 
523     lazy_space_alloc(vacrelstats, nblocks);
(gdb) 
524     frozen = palloc(sizeof(xl_heap_freeze_tuple) * MaxHeapTuplesPerPage);
(gdb) 
527     initprog_val[0] = PROGRESS_VACUUM_PHASE_SCAN_HEAP;
(gdb) 
528     initprog_val[1] = nblocks;
(gdb) 
529     initprog_val[2] = vacrelstats->max_dead_tuples;
(gdb) 
530     pgstat_progress_update_multi_param(3, initprog_index, initprog_val);
(gdb) p *vacrelstats
$9 = {hasindex = true, old_rel_pages = 75, rel_pages = 75, scanned_pages = 0, pinskipped_pages = 0, 
  frozenskipped_pages = 0, tupcount_pages = 0, old_live_tuples = 10000, new_rel_tuples = 0, new_live_tuples = 0, 
  new_dead_tuples = 0, pages_removed = 0, tuples_deleted = 0, nonempty_pages = 0, num_dead_tuples = 0, 
  max_dead_tuples = 21825, dead_tuples = 0x297e820, num_index_scans = 0, latestRemovedXid = 0, lock_waiter_detected = false}
(gdb)

Compute the next block that cannot be skipped.
Block 0 is not skippable either, and the skippable run is shorter than SKIP_PAGES_THRESHOLD (0 < 32), so the flag skipping_blocks is set to false.


(gdb) n
576     next_unskippable_block = 0;
(gdb) 
577     if ((options & VACOPT_DISABLE_PAGE_SKIPPING) == 0)
(gdb) 
579         while (next_unskippable_block < nblocks)
(gdb) 
583             vmstatus = visibilitymap_get_status(onerel, next_unskippable_block,
(gdb) 
585             if (aggressive)
(gdb) p vmstatus
$10 = 0 '\000'
(gdb) n
592                 if ((vmstatus & VISIBILITYMAP_ALL_VISIBLE) == 0)
(gdb) 
593                     break;
(gdb) 
600     if (next_unskippable_block >= SKIP_PAGES_THRESHOLD)
(gdb) p next_unskippable_block
$11 = 0
(gdb) p SKIP_PAGES_THRESHOLD
$12 = 32
(gdb) n
603         skipping_blocks = false;
(gdb)

Start iterating over each block.
Initialize the per-block variables.


(gdb) 
605     for (blkno = 0; blkno < nblocks; blkno++)
(gdb) 
616         bool        all_visible_according_to_vm = false;
(gdb) 
618         bool        all_frozen = true;  /* provided all_visible is also true */
(gdb) 
620         TransactionId visibility_cutoff_xid = InvalidTransactionId;
(gdb) 
626         pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
(gdb) 
628         if (blkno == next_unskippable_block)
(gdb)

blkno == next_unskippable_block, so compute the next unskippable block.


(gdb) p blkno
$13 = 0
(gdb) p next_unskippable_block
$14 = 0
(gdb) n
631             next_unskippable_block++;
(gdb) 
632             if ((options & VACOPT_DISABLE_PAGE_SKIPPING) == 0)
(gdb) 
634                 while (next_unskippable_block < nblocks)
(gdb) 
638                     vmskipflags = visibilitymap_get_status(onerel,
(gdb) 
641                     if (aggressive)
(gdb) p vmskipflags
$15 = 0 '\000'
(gdb) n
648                         if ((vmskipflags & VISIBILITYMAP_ALL_VISIBLE) == 0)
(gdb) 
649                             break;
(gdb) 
660             if (next_unskippable_block - blkno > SKIP_PAGES_THRESHOLD)
(gdb) p next_unskippable_block
$16 = 1
(gdb) n 
1047                        if (onerel->rd_rel->relhasoids &&
(gdb) 
1132                if (tupgone)
(gdb)

tupgone is false; check whether the tuple needs freezing (it does not).
Advance the offset and continue iterating over the tuples.


(gdb) p tupgone
$17 = false
(gdb) n
1144                    num_tuples += 1;
(gdb) 
1145                    hastup = true;
(gdb) 
1151                    if (heap_prepare_freeze_tuple(tuple.t_data,
(gdb) 
1154                                                  &frozen[nfrozen],
(gdb) p nfrozen
$18 = 0
(gdb) n
1151                    if (heap_prepare_freeze_tuple(tuple.t_data,
(gdb) 
1158                    if (!tuple_totally_frozen)
(gdb) 
1159                        all_frozen = false;
(gdb) 
958              offnum = OffsetNumberNext(offnum))
(gdb) 
956         for (offnum = FirstOffsetNumber;
(gdb)

This tuple is normal.


(gdb) p offnum
$19 = 3
(gdb) n
962             itemid = PageGetItemId(page, offnum);
(gdb) 
965             if (!ItemIdIsUsed(itemid))
(gdb) 
972             if (ItemIdIsRedirected(itemid))
(gdb) 
978             ItemPointerSet(&(tuple.t_self), blkno, offnum);
(gdb) 
986             if (ItemIdIsDead(itemid))
(gdb) 
993             Assert(ItemIdIsNormal(itemid));
(gdb) 
995             tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
(gdb) 
996             tuple.t_len = ItemIdGetLength(itemid);
(gdb) 
997             tuple.t_tableOid = RelationGetRelid(onerel);
(gdb) 
999             tupgone = false;
(gdb)

Call HeapTupleSatisfiesVacuum to determine the tuple's status; its main purpose is to decide whether the tuple is potentially visible to all currently running transactions.
This tuple is a live tuple.


1012                switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
(gdb) 
(gdb) n
1047                        if (onerel->rd_rel->relhasoids &&
(gdb) n
1056                        live_tuples += 1;
(gdb) 
1067                        if (all_visible)
(gdb) p all_visible
$20 = false

Break out of the loop (set a breakpoint at vacuumlazy.c:1168 and continue).


(gdb) b vacuumlazy.c:1168
Breakpoint 2 at 0x6bd4e7: file vacuumlazy.c, line 1168.
(gdb) c
Continuing.
Breakpoint 2, lazy_scan_heap (onerel=0x7f224a197788, options=5, vacrelstats=0x296d7b8, Irel=0x296d8b0, nindexes=1, 
    aggressive=false) at vacuumlazy.c:1168
1168            if (nfrozen > 0)
(gdb)

Update the statistics.


(gdb) n
1203            if (nindexes == 0 &&
(gdb) p nfrozen
$23 = 0
(gdb) n
1232            freespace = PageGetHeapFreeSpace(page);
(gdb) 
1235            if (all_visible && !all_visible_according_to_vm)
(gdb) 
1268            else if (all_visible_according_to_vm && !PageIsAllVisible(page)
(gdb) 
1290            else if (PageIsAllVisible(page) && has_dead_tuples)
(gdb) 
1305            else if (all_visible_according_to_vm && all_visible && all_frozen &&
(gdb) 
1318            UnlockReleaseBuffer(buf);
(gdb) 
1321            if (hastup)
(gdb) 
1322                vacrelstats->nonempty_pages = blkno + 1;
(gdb) p hastup
$24 = true
(gdb) n
1331            if (vacrelstats->num_dead_tuples == prev_dead_count)
(gdb) 
1332                RecordPageWithFreeSpace(onerel, blkno, freespace);

Continue with the next block.


(gdb) 
605     for (blkno = 0; blkno < nblocks; blkno++)
(gdb) p blkno
$25 = 0
(gdb) n
616         bool        all_visible_according_to_vm = false;
(gdb) p blkno
$26 = 1
(gdb)

Check whether (vacrelstats->max_dead_tuples - vacrelstats->num_dead_tuples) < MaxHeapTuplesPerPage && vacrelstats->num_dead_tuples > 0; the condition does not hold, so execution continues.


...
(gdb) 
701         vacuum_delay_point();
(gdb) 
707         if ((vacrelstats->max_dead_tuples - vacrelstats->num_dead_tuples) < MaxHeapTuplesPerPage &&
(gdb) p vacrelstats->max_dead_tuples
$27 = 21825
(gdb) p vacrelstats->num_dead_tuples
$28 = 0
(gdb) p MaxHeapTuplesPerPage
No symbol "__builtin_offsetof" in current context.
(gdb)

Read the buffer in extended mode (ReadBufferExtended).


(gdb) n
783         visibilitymap_pin(onerel, blkno, &vmbuffer);
(gdb) 
785         buf = ReadBufferExtended(onerel, MAIN_FORKNUM, blkno,
(gdb) 
789         if (!ConditionalLockBufferForCleanup(buf))
(gdb)

Acquire the buffer cleanup lock: success!
Call heap_page_prune to prune all the HOT-update chains on this page.


(gdb) n
847         vacrelstats->scanned_pages++;
(gdb) 
848         vacrelstats->tupcount_pages++;
(gdb) 
850         page = BufferGetPage(buf);
(gdb) 
852         if (PageIsNew(page))
(gdb) 
894         if (PageIsEmpty(page))
(gdb) 
938         tups_vacuumed += heap_page_prune(onerel, buf, OldestXmin, false,
(gdb) 
945         all_visible = true;
(gdb)

Iterate over the line pointers on the page.


956         for (offnum = FirstOffsetNumber;
(gdb) p maxoff
$29 = 291
(gdb) 
$30 = 291
(gdb) n
962             itemid = PageGetItemId(page, offnum);
(gdb) n
965             if (!ItemIdIsUsed(itemid))
(gdb) 
972             if (ItemIdIsRedirected(itemid))
(gdb) 
978             ItemPointerSet(&(tuple.t_self), blkno, offnum);
(gdb) 
986             if (ItemIdIsDead(itemid))
(gdb) 
993             Assert(ItemIdIsNormal(itemid));
(gdb) 
995             tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
(gdb) 
996             tuple.t_len = ItemIdGetLength(itemid);
(gdb) 
997             tuple.t_tableOid = RelationGetRelid(onerel);
(gdb) 
999             tupgone = false;
(gdb) 
1012                switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))
(gdb) 
1099                        nkeep += 1;
(gdb) 
1100                        all_visible = false;
(gdb) 
1101                        break;
(gdb) 
1132                if (tupgone)
(gdb) 
1144                    num_tuples += 1;

Break out of the loop (continue to the breakpoint at vacuumlazy.c:1168 again).


(gdb) c
Continuing.
Breakpoint 2, lazy_scan_heap (onerel=0x7f224a197788, options=5, vacrelstats=0x296d7b8, Irel=0x296d8b0, nindexes=1, 
    aggressive=false) at vacuumlazy.c:1168
1168            if (nfrozen > 0)
(gdb)

DONE!

IV. References

PG Source Code
