PostgreSQL Source Code Walkthrough (135) - MVCC #19 (the vacuum process: the heap_execute_freeze_tuple function)

Published: 2020-08-14 04:08:09  Source: ITPUB Blog  Author: husthxd  Category: Relational Databases

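This section briefly describes how PostgreSQL processes a manually issued VACUUM. It focuses on the implementation of heap_execute_freeze_tuple, which is reached through the call chain ExecVacuum->vacuum->vacuum_rel->heap_vacuum_rel->lazy_scan_heap->heap_execute_freeze_tuple. This function carries out the actual tuple freeze; the preparation has already been completed by the time it runs.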

I. Data Structures

Enum definition
Options for the Vacuum and Analyze commands


/* ----------------------
 *      Vacuum and Analyze Statements
 *
 * Even though these are nominally two statements, it's convenient to use
 * just one node type for both.  Note that at least one of VACOPT_VACUUM
 * and VACOPT_ANALYZE must be set in options.
 * ----------------------
 */
typedef enum VacuumOption
{
    VACOPT_VACUUM = 1 << 0,     /* do VACUUM */
    VACOPT_ANALYZE = 1 << 1,    /* do ANALYZE */
    VACOPT_VERBOSE = 1 << 2,    /* print progress info */
    VACOPT_FREEZE = 1 << 3,     /* FREEZE option */
    VACOPT_FULL = 1 << 4,       /* FULL (non-concurrent) vacuum */
    VACOPT_SKIP_LOCKED = 1 << 5,    /* skip if cannot get lock */
    VACOPT_SKIPTOAST = 1 << 6,  /* don't process the TOAST table, if any */
    VACOPT_DISABLE_PAGE_SKIPPING = 1 << 7   /* don't skip any pages */
} VacuumOption;

HeapTupleHeaderData
The heap tuple header. To avoid wasting space, the fields are laid out so that the struct needs no alignment padding.


/*
 * Heap tuple header.  To avoid wasting space, the fields should be
 * laid out in such a way as to avoid structure padding.
 *
 * Datums of composite types (row types) share the same general structure
 * as on-disk tuples, so that the same routines can be used to build and
 * examine them.  However the requirements are slightly different: a Datum
 * does not need any transaction visibility information, and it does need
 * a length word and some embedded type information.  We can achieve this
 * by overlaying the xmin/cmin/xmax/cmax/xvac fields of a heap tuple
 * with the fields needed in the Datum case.  Typically, all tuples built
 * in-memory will be initialized with the Datum fields; but when a tuple is
 * about to be inserted in a table, the transaction fields will be filled,
 * overwriting the datum fields.
 * In short: a composite-type (row-type) Datum reuses the tuple header layout.
 *   It needs no visibility information, so its length word and type fields
 *   overlay xmin/cmin/xmax/cmax/xvac until the tuple is about to be inserted
 *   into a table, at which point the transaction fields overwrite them.
 *
 * The overall structure of a heap tuple looks like:
 *          fixed fields (HeapTupleHeaderData struct)
 *          nulls bitmap (if HEAP_HASNULL is set in t_infomask)
 *          alignment padding (as needed to make user data MAXALIGN'd)
 *          object ID (if HEAP_HASOID_OLD is set in t_infomask, not created
 *          anymore)
 *          user data fields
 * In short, a heap tuple is laid out as: the fixed header, an optional nulls
 *   bitmap, alignment padding up to MAXALIGN, an optional legacy object ID
 *   (HEAP_HASOID_OLD; such tuples are no longer created), then the user data.
 *
 * We store five "virtual" fields Xmin, Cmin, Xmax, Cmax, and Xvac in three
 * physical fields.  Xmin and Xmax are always really stored, but Cmin, Cmax
 * and Xvac share a field.  This works because we know that Cmin and Cmax
 * are only interesting for the lifetime of the inserting and deleting
 * transaction respectively.  If a tuple is inserted and deleted in the same
 * transaction, we store a "combo" command id that can be mapped to the real
 * cmin and cmax, but only by use of local state within the originating
 * backend.  See combocid.c for more details.  Meanwhile, Xvac is only set by
 * old-style VACUUM FULL, which does not have any command sub-structure and so
 * does not need either Cmin or Cmax.  (This requires that old-style VACUUM
 * FULL never try to move a tuple whose Cmin or Cmax is still interesting,
 * ie, an insert-in-progress or delete-in-progress tuple.)
 * In short: five "virtual" fields live in three physical ones. Xmin and Xmax
 *   are always stored; Cmin, Cmax and Xvac share one field, which works
 *   because Cmin and Cmax only matter while the inserting or deleting
 *   transaction is alive. A tuple inserted and deleted in the same
 *   transaction gets a "combo" command ID that only the originating backend
 *   can map back to the real cmin and cmax. Xvac is set only by old-style
 *   VACUUM FULL, which must never move a tuple whose Cmin or Cmax still
 *   matters (an insert-in-progress or delete-in-progress tuple).
 *
 * A word about t_ctid: whenever a new tuple is stored on disk, its t_ctid
 * is initialized with its own TID (location).  If the tuple is ever updated,
 * its t_ctid is changed to point to the replacement version of the tuple.  Or
 * if the tuple is moved from one partition to another, due to an update of
 * the partition key, t_ctid is set to a special value to indicate that
 * (see ItemPointerSetMovedPartitions).  Thus, a tuple is the latest version
 * of its row iff XMAX is invalid or
 * t_ctid points to itself (in which case, if XMAX is valid, the tuple is
 * either locked or deleted).  One can follow the chain of t_ctid links
 * to find the newest version of the row, unless it was moved to a different
 * partition.  Beware however that VACUUM might
 * erase the pointed-to (newer) tuple before erasing the pointing (older)
 * tuple.  Hence, when following a t_ctid link, it is necessary to check
 * to see if the referenced slot is empty or contains an unrelated tuple.
 * Check that the referenced tuple has XMIN equal to the referencing tuple's
 * XMAX to verify that it is actually the descendant version and not an
 * unrelated tuple stored into a slot recently freed by VACUUM.  If either
 * check fails, one may assume that there is no live descendant version.
 * In short: a newly stored tuple's t_ctid points at the tuple itself. An
 *   update repoints it at the replacement version, and a move to another
 *   partition (caused by a partition-key update) sets a special value
 *   (see ItemPointerSetMovedPartitions). A tuple is therefore the latest
 *   version of its row iff XMAX is invalid or t_ctid points to itself
 *   (in which case a valid XMAX means the tuple is locked or deleted).
 *
 * t_ctid is sometimes used to store a speculative insertion token, instead
 * of a real TID.  A speculative token is set on a tuple that's being
 * inserted, until the inserter is sure that it wants to go ahead with the
 * insertion.  Hence a token should only be seen on a tuple with an XMAX
 * that's still in-progress, or invalid/aborted.  The token is replaced with
 * the tuple's real TID when the insertion is confirmed.  One should never
 * see a speculative insertion token while following a chain of t_ctid links,
 * because they are not used on updates, only insertions.
 * In short: while an insertion is still speculative, t_ctid holds a token
 *   instead of a real TID; the token is replaced by the tuple's real TID once
 *   the insertion is confirmed. Tokens are never seen while following a
 *   t_ctid chain, because they are used only for insertions, not updates.
 *
 * Following the fixed header fields, the nulls bitmap is stored (beginning
 * at t_bits).  The bitmap is *not* stored if t_infomask shows that there
 * are no nulls in the tuple.  If an OID field is present (as indicated by
 * t_infomask), then it is stored just before the user data, which begins at
 * the offset shown by t_hoff.  Note that t_hoff must be a multiple of
 * MAXALIGN.
 * In short: the nulls bitmap follows the fixed header (starting at t_bits)
 *   and is not stored when t_infomask shows the tuple has no nulls. An OID
 *   field, if present, sits just before the user data, which starts at the
 *   offset given by t_hoff. t_hoff must be a multiple of MAXALIGN.
 */
typedef struct HeapTupleFields
{
    TransactionId t_xmin;       /* inserting xact ID */
    TransactionId t_xmax;       /* deleting or locking xact ID */
    union
    {
        CommandId   t_cid;      /* inserting or deleting command ID, or both */
        TransactionId t_xvac;   /* old-style VACUUM FULL xact ID */
    }           t_field3;       /* union: t_cid and t_xvac share storage */
} HeapTupleFields;              /* transaction fields of the header */
typedef struct DatumTupleFields
{
    int32       datum_len_;     /* varlena header (do not touch directly!) */
    int32       datum_typmod;   /* -1, or identifier of a record type */
    Oid         datum_typeid;   /* composite type OID, or RECORDOID */
    /*
     * datum_typeid cannot be a domain over composite, only plain composite,
     * even if the datum is meant as a value of a domain-over-composite type.
     * This is in line with the general principle that CoerceToDomain does not
     * change the physical representation of the base type value.
     *
     * Note: field ordering is chosen with thought that Oid might someday
     * widen to 64 bits.
     */
} DatumTupleFields;
struct HeapTupleHeaderData
{
    union
    {
        HeapTupleFields t_heap;
        DatumTupleFields t_datum;
    }           t_choice;
    ItemPointerData t_ctid;     /* current TID of this or newer tuple (or a
                                 * speculative insertion token) */
    /* Fields below here must match MinimalTupleData! */
#define FIELDNO_HEAPTUPLEHEADERDATA_INFOMASK2 2
    uint16      t_infomask2;    /* number of attributes + various flags */
#define FIELDNO_HEAPTUPLEHEADERDATA_INFOMASK 3
    uint16      t_infomask;     /* various flag bits, see below */
#define FIELDNO_HEAPTUPLEHEADERDATA_HOFF 4
    uint8       t_hoff;         /* sizeof header incl. bitmap, padding */
    /* ^ - 23 bytes - ^ */
#define FIELDNO_HEAPTUPLEHEADERDATA_BITS 5
    bits8       t_bits[FLEXIBLE_ARRAY_MEMBER];  /* bitmap of NULLs */
    /* MORE DATA FOLLOWS AT END OF STRUCT */
};
typedef HeapTupleHeaderData* HeapTupleHeader;
/*
Expanded layout of the struct:
Field           Type            Length  Offset  Description
t_xmin          TransactionId   4 bytes 0       insert XID stamp
t_xmax          TransactionId   4 bytes 4       delete XID stamp
t_cid           CommandId       4 bytes 8       insert and/or delete CID stamp (overlays with t_xvac)
t_xvac          TransactionId   4 bytes 8       XID for VACUUM operation moving a row version
t_ctid          ItemPointerData 6 bytes 12      current TID of this or newer row version
t_infomask2     uint16          2 bytes 18      number of attributes, plus various flag bits
t_infomask      uint16          2 bytes 20      various flag bits
t_hoff          uint8           1 byte  22      offset to user data
Note: t_cid and t_xvac form a union and share the same storage.
*/
// Example: t_infomask = \x0802, which is 2050 in decimal and 100000000010 in binary
// t_infomask flag bits (binary value on the left)
               1 #define HEAP_HASNULL            0x0001  /* has null attribute(s) */
              10 #define HEAP_HASVARWIDTH        0x0002  /* has variable-width attribute(s) */
             100 #define HEAP_HASEXTERNAL        0x0004  /* has external stored attribute(s) */
            1000 #define HEAP_HASOID             0x0008  /* has an object-id field */
           10000 #define HEAP_XMAX_KEYSHR_LOCK   0x0010  /* xmax is a key-shared locker */
          100000 #define HEAP_COMBOCID           0x0020  /* t_cid is a combo cid */
         1000000 #define HEAP_XMAX_EXCL_LOCK     0x0040  /* xmax is exclusive locker */
        10000000 #define HEAP_XMAX_LOCK_ONLY     0x0080  /* xmax, if valid, is only a locker */
                    /* xmax is a shared locker */
                 #define HEAP_XMAX_SHR_LOCK  (HEAP_XMAX_EXCL_LOCK | HEAP_XMAX_KEYSHR_LOCK)
                 #define HEAP_LOCK_MASK  (HEAP_XMAX_SHR_LOCK | HEAP_XMAX_EXCL_LOCK | \
                          HEAP_XMAX_KEYSHR_LOCK)
       100000000 #define HEAP_XMIN_COMMITTED     0x0100  /* t_xmin committed */
      1000000000 #define HEAP_XMIN_INVALID       0x0200  /* t_xmin invalid/aborted */
                 #define HEAP_XMIN_FROZEN        (HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID)
     10000000000 #define HEAP_XMAX_COMMITTED     0x0400  /* t_xmax committed */
    100000000000 #define HEAP_XMAX_INVALID       0x0800  /* t_xmax invalid/aborted */
   1000000000000 #define HEAP_XMAX_IS_MULTI      0x1000  /* t_xmax is a MultiXactId */
  10000000000000 #define HEAP_UPDATED            0x2000  /* this is UPDATEd version of row */
 100000000000000 #define HEAP_MOVED_OFF          0x4000  /* moved to another place by pre-9.0
                                         * VACUUM FULL; kept for binary
                                         * upgrade support */
1000000000000000 #define HEAP_MOVED_IN           0x8000  /* moved from another place by pre-9.0
                                         * VACUUM FULL; kept for binary
                                         * upgrade support */
                 #define HEAP_MOVED (HEAP_MOVED_OFF | HEAP_MOVED_IN)
1111111111110000 #define HEAP_XACT_MASK          0xFFF0  /* visibility-related bits */
// \x0802 = binary 100000000010: bits 2 and 12 (counting from 1 at the low end) are set,
// meaning the tuple has variable-width attributes (HEAP_HASVARWIDTH) and its XMAX is invalid (HEAP_XMAX_INVALID)
/*
 * information stored in t_infomask2:
 */
#define HEAP_NATTS_MASK 0x07FF /* 11 bits for number of attributes */
/* bits 0x1800 are available */
#define HEAP_KEYS_UPDATED 0x2000 /* tuple was updated and key cols
 * modified, or tuple deleted */
#define HEAP_HOT_UPDATED 0x4000 /* tuple was HOT-updated */
#define HEAP_ONLY_TUPLE 0x8000 /* this is heap-only tuple */
#define HEAP2_XACT_MASK 0xE000 /* visibility-related bits */
// The same hexadecimal values shown in binary
     11111111111 #define HEAP_NATTS_MASK         0x07FF 
  10000000000000 #define HEAP_KEYS_UPDATED       0x2000  
 100000000000000 #define HEAP_HOT_UPDATED        0x4000  
1000000000000000 #define HEAP_ONLY_TUPLE         0x8000  
1110000000000000 #define HEAP2_XACT_MASK         0xE000 
1111111111111110 #define SpecTokenOffsetNumber       0xfffe
// The low 11 bits hold the number of attributes; a value of 3 there means the tuple has 3 attributes (columns)

xl_heap_freeze_tuple
xl_heap_freeze_tuple represents a 'freeze plan': the information needed to freeze a single tuple during vacuum.


/*
 * This struct represents a 'freeze plan', which is what we need to know about
 * a single tuple being frozen during vacuum.
 */
/* 0x01 was XLH_FREEZE_XMIN */
#define     XLH_FREEZE_XVAC     0x02
#define     XLH_INVALID_XVAC    0x04
typedef struct xl_heap_freeze_tuple
{
    TransactionId xmax;
    OffsetNumber offset;
    uint16      t_infomask2;
    uint16      t_infomask;
    uint8       frzflags;
} xl_heap_freeze_tuple;

II. Source Code Walkthrough

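heap_execute_freeze_tuple performs the actual tuple freeze (the preparation was done earlier). The logic is simple: it writes the planned xmax, sets xvac to FrozenTransactionId or InvalidTransactionId when the corresponding flag is set, and overwrites t_infomask and t_infomask2 with the planned values. The function does not touch xmin.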


/*
 * heap_execute_freeze_tuple
 *      Execute the prepared freezing of a tuple.
 * 
 * Caller is responsible for ensuring that no other backend can access the
 * storage underlying this tuple, either by holding an exclusive lock on the
 * buffer containing it (which is what lazy VACUUM does), or by having it be
 * in private storage (which is what CLUSTER and friends do).
 * In short: the caller must keep every other backend away from the tuple's
 *   storage, either by holding an exclusive lock on the buffer that contains
 *   it (as lazy VACUUM does) or by keeping it in private storage (as CLUSTER
 *   and similar commands do).
 *
 * Note: it might seem we could make the changes without exclusive lock, since
 * TransactionId read/write is assumed atomic anyway.  However there is a race
 * condition: someone who just fetched an old XID that we overwrite here could
 * conceivably not finish checking the XID against pg_xact before we finish
 * the VACUUM and perhaps truncate off the part of pg_xact he needs.  Getting
 * exclusive lock ensures no other backend is in process of checking the
 * tuple status.  Also, getting exclusive lock makes it safe to adjust the
 * infomask bits.
 * In short: atomic XID reads and writes are not enough, because of a race
 *   condition. A backend that has just fetched an old XID overwritten here
 *   might still be checking it against pg_xact when VACUUM finishes and
 *   truncates the part of pg_xact it needs. Taking the exclusive lock
 *   ensures no other backend is checking the tuple's status, and it also
 *   makes it safe to adjust the infomask bits.
 *
 * NB: All code in here must be safe to execute during crash recovery!
 */
void
heap_execute_freeze_tuple(HeapTupleHeader tuple, xl_heap_freeze_tuple *frz)
{
    HeapTupleHeaderSetXmax(tuple, frz->xmax);
    if (frz->frzflags & XLH_FREEZE_XVAC)
        HeapTupleHeaderSetXvac(tuple, FrozenTransactionId);
    if (frz->frzflags & XLH_INVALID_XVAC)
        HeapTupleHeaderSetXvac(tuple, InvalidTransactionId);
    tuple->t_infomask = frz->t_infomask;
    tuple->t_infomask2 = frz->t_infomask2;
}
// Set the tuple's xmax
#define HeapTupleHeaderSetXmax(tup, xid) \
( \
    (tup)->t_choice.t_heap.t_xmax = (xid) \
)
// Set the tuple's xvac (the Assert requires a HEAP_MOVED bit in t_infomask)
#define HeapTupleHeaderSetXvac(tup, xid) \
do { \
    Assert((tup)->t_infomask & HEAP_MOVED); \
    (tup)->t_choice.t_heap.t_field3.t_xvac = (xid); \
} while (0)

III. Trace Analysis

N/A

IV. References

PG Source Code
