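This section briefly walks through how PostgreSQL processes a manually issued VACUUM. It focuses on heap_execute_freeze_tuple, the last function in the call chain ExecVacuum->vacuum->vacuum_rel->heap_vacuum_rel->lazy_scan_heap->heap_execute_freeze_tuple. This function performs the actual tuple freeze; the preparation has already been done by the time it is called.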
Macro definitions
Vacuum and Analyze command options
/* ----------------------
* Vacuum and Analyze Statements
*
* Even though these are nominally two statements, it's convenient to use
* just one node type for both. Note that at least one of VACOPT_VACUUM
* and VACOPT_ANALYZE must be set in options.
* ----------------------
*/
typedef enum VacuumOption
{
VACOPT_VACUUM = 1 << 0, /* do VACUUM */
VACOPT_ANALYZE = 1 << 1, /* do ANALYZE */
VACOPT_VERBOSE = 1 << 2, /* print progress info */
VACOPT_FREEZE = 1 << 3, /* FREEZE option */
VACOPT_FULL = 1 << 4, /* FULL (non-concurrent) vacuum */
VACOPT_SKIP_LOCKED = 1 << 5, /* skip if cannot get lock */
VACOPT_SKIPTOAST = 1 << 6, /* don't process the TOAST table, if any */
VACOPT_DISABLE_PAGE_SKIPPING = 1 << 7 /* don't skip any pages */
} VacuumOption;
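Because each option is a distinct bit, a set of options is just an integer built with bitwise OR and tested with bitwise AND. The sketch below copies a subset of the enum values from the listing above; the helpers `vacopt_is_valid` and `vacopt_has` are hypothetical names introduced for illustration and are not PostgreSQL functions.

```c
#include <stdbool.h>

/* Values copied from the VacuumOption listing above (subset). */
typedef enum VacuumOption
{
    VACOPT_VACUUM = 1 << 0,     /* do VACUUM */
    VACOPT_ANALYZE = 1 << 1,    /* do ANALYZE */
    VACOPT_VERBOSE = 1 << 2,    /* print progress info */
    VACOPT_FREEZE = 1 << 3,     /* FREEZE option */
    VACOPT_FULL = 1 << 4        /* FULL (non-concurrent) vacuum */
} VacuumOption;

/* Hypothetical helper: at least one of VACUUM / ANALYZE must be set. */
static bool
vacopt_is_valid(int options)
{
    return (options & (VACOPT_VACUUM | VACOPT_ANALYZE)) != 0;
}

/* Hypothetical helper: test whether a single flag is present. */
static bool
vacopt_has(int options, VacuumOption flag)
{
    return (options & flag) != 0;
}
```

For instance, `VACOPT_VACUUM | VACOPT_FREEZE` passes `vacopt_is_valid`, while `VACOPT_VERBOSE` on its own does not.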
HeapTupleHeaderData
The heap tuple header. To avoid wasting space, its fields are laid out so that no structure padding is needed.
/*
* Heap tuple header. To avoid wasting space, the fields should be
* laid out in such a way as to avoid structure padding.
*
* Datums of composite types (row types) share the same general structure
* as on-disk tuples, so that the same routines can be used to build and
* examine them. However the requirements are slightly different: a Datum
* does not need any transaction visibility information, and it does need
* a length word and some embedded type information. We can achieve this
* by overlaying the xmin/cmin/xmax/cmax/xvac fields of a heap tuple
* with the fields needed in the Datum case. Typically, all tuples built
* in-memory will be initialized with the Datum fields; but when a tuple is
* about to be inserted in a table, the transaction fields will be filled,
* overwriting the datum fields.
*
* The overall structure of a heap tuple looks like:
* fixed fields (HeapTupleHeaderData struct)
* nulls bitmap (if HEAP_HASNULL is set in t_infomask)
* alignment padding (as needed to make user data MAXALIGN'd)
* object ID (if HEAP_HASOID_OLD is set in t_infomask, not created
* anymore)
* user data fields
*
* We store five "virtual" fields Xmin, Cmin, Xmax, Cmax, and Xvac in three
* physical fields. Xmin and Xmax are always really stored, but Cmin, Cmax
* and Xvac share a field. This works because we know that Cmin and Cmax
* are only interesting for the lifetime of the inserting and deleting
* transaction respectively. If a tuple is inserted and deleted in the same
* transaction, we store a "combo" command id that can be mapped to the real
* cmin and cmax, but only by use of local state within the originating
* backend. See combocid.c for more details. Meanwhile, Xvac is only set by
* old-style VACUUM FULL, which does not have any command sub-structure and so
* does not need either Cmin or Cmax. (This requires that old-style VACUUM
* FULL never try to move a tuple whose Cmin or Cmax is still interesting,
* ie, an insert-in-progress or delete-in-progress tuple.)
*
* A word about t_ctid: whenever a new tuple is stored on disk, its t_ctid
* is initialized with its own TID (location). If the tuple is ever updated,
* its t_ctid is changed to point to the replacement version of the tuple. Or
* if the tuple is moved from one partition to another, due to an update of
* the partition key, t_ctid is set to a special value to indicate that
* (see ItemPointerSetMovedPartitions). Thus, a tuple is the latest version
* of its row iff XMAX is invalid or
* t_ctid points to itself (in which case, if XMAX is valid, the tuple is
* either locked or deleted). One can follow the chain of t_ctid links
* to find the newest version of the row, unless it was moved to a different
* partition. Beware however that VACUUM might
* erase the pointed-to (newer) tuple before erasing the pointing (older)
* tuple. Hence, when following a t_ctid link, it is necessary to check
* to see if the referenced slot is empty or contains an unrelated tuple.
* Check that the referenced tuple has XMIN equal to the referencing tuple's
* XMAX to verify that it is actually the descendant version and not an
* unrelated tuple stored into a slot recently freed by VACUUM. If either
* check fails, one may assume that there is no live descendant version.
*
* t_ctid is sometimes used to store a speculative insertion token, instead
* of a real TID. A speculative token is set on a tuple that's being
* inserted, until the inserter is sure that it wants to go ahead with the
* insertion. Hence a token should only be seen on a tuple with an XMAX
* that's still in-progress, or invalid/aborted. The token is replaced with
* the tuple's real TID when the insertion is confirmed. One should never
* see a speculative insertion token while following a chain of t_ctid links,
* because they are not used on updates, only insertions.
*
* Following the fixed header fields, the nulls bitmap is stored (beginning
* at t_bits). The bitmap is *not* stored if t_infomask shows that there
* are no nulls in the tuple. If an OID field is present (as indicated by
* t_infomask), then it is stored just before the user data, which begins at
* the offset shown by t_hoff. Note that t_hoff must be a multiple of
* MAXALIGN.
*/
typedef struct HeapTupleFields
{
    TransactionId t_xmin;       /* inserting xact ID */
    TransactionId t_xmax;       /* deleting or locking xact ID */
    union
    {
        CommandId     t_cid;    /* inserting or deleting command ID, or both */
        TransactionId t_xvac;   /* old-style VACUUM FULL xact ID */
    } t_field3;                 /* union: t_cid and t_xvac share this field */
} HeapTupleFields;              /* transaction fields of the header */
typedef struct DatumTupleFields
{
    int32 datum_len_;           /* varlena header (do not touch directly!) */
    int32 datum_typmod;         /* -1, or identifier of a record type */
    Oid   datum_typeid;         /* composite type OID, or RECORDOID */
    /*
     * datum_typeid cannot be a domain over composite, only plain composite,
     * even if the datum is meant as a value of a domain-over-composite type.
     * This is in line with the general principle that CoerceToDomain does not
     * change the physical representation of the base type value.
     *
     * Note: field ordering is chosen with thought that Oid might someday
     * widen to 64 bits.
     */
} DatumTupleFields;
struct HeapTupleHeaderData
{
union
{
HeapTupleFields t_heap;
DatumTupleFields t_datum;
} t_choice;
ItemPointerData t_ctid; /* current TID of this or newer tuple (or a
* speculative insertion token) */
/* Fields below here must match MinimalTupleData! */
#define FIELDNO_HEAPTUPLEHEADERDATA_INFOMASK2 2
uint16 t_infomask2; /* number of attributes + various flags */
#define FIELDNO_HEAPTUPLEHEADERDATA_INFOMASK 3
uint16 t_infomask; /* various flag bits, see below */
#define FIELDNO_HEAPTUPLEHEADERDATA_HOFF 4
uint8 t_hoff; /* sizeof header incl. bitmap, padding */
/* ^ - 23 bytes - ^ */
#define FIELDNO_HEAPTUPLEHEADERDATA_BITS 5
bits8 t_bits[FLEXIBLE_ARRAY_MEMBER]; /* bitmap of NULLs */
/* MORE DATA FOLLOWS AT END OF STRUCT */
};
typedef HeapTupleHeaderData* HeapTupleHeader;
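The t_ctid note in the header comment describes how to walk an update chain: stop when XMAX is invalid or t_ctid points to itself, and before following a link confirm that the target slot is in use and that its XMIN equals the current tuple's XMAX. A minimal sketch of that rule over an in-memory array follows. `MockTuple` and `follow_ctid_chain` are hypothetical; an array index stands in for a TID, and infomask bits, locking, and speculative tokens are deliberately ignored.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

/* Hypothetical, simplified tuple version; the slot index plays the TID role. */
typedef struct MockTuple
{
    bool          used;     /* false = slot was emptied by VACUUM */
    TransactionId xmin;     /* inserting transaction */
    TransactionId xmax;     /* deleting/updating transaction, 0 if none */
    int           ctid;     /* index of this version or of its successor */
} MockTuple;

/* Return the index of the newest version reachable from 'start'. */
static int
follow_ctid_chain(const MockTuple *slots, int nslots, int start)
{
    int cur = start;
    int steps;

    /* Bounding the walk by nslots guards against cyclic mock data. */
    for (steps = 0; steps < nslots; steps++)
    {
        const MockTuple *tup = &slots[cur];
        int next = tup->ctid;

        /* Latest version: XMAX invalid, or t_ctid points to itself. */
        if (tup->xmax == InvalidTransactionId || next == cur)
            return cur;
        /* Referenced slot missing or empty: no live descendant. */
        if (next < 0 || next >= nslots || !slots[next].used)
            return cur;
        /* Slot reused by an unrelated tuple: XMIN must match our XMAX. */
        if (slots[next].xmin != tup->xmax)
            return cur;
        cur = next;
    }
    return cur;
}
```

The XMIN/XMAX comparison is what protects against the case the comment warns about, where VACUUM has already freed the newer slot and something unrelated now lives there.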
/*
Struct layout, field by field:
Field        Type             Length   Offset  Description
t_xmin       TransactionId    4 bytes  0       insert XID stamp
t_xmax       TransactionId    4 bytes  4       delete XID stamp
t_cid        CommandId        4 bytes  8       insert and/or delete CID stamp (overlays with t_xvac)
t_xvac       TransactionId    4 bytes  8       XID for VACUUM operation moving a row version
t_ctid       ItemPointerData  6 bytes  12      current TID of this or newer row version
t_infomask2  uint16           2 bytes  18      number of attributes, plus various flag bits
t_infomask   uint16           2 bytes  20      various flag bits
t_hoff       uint8            1 byte   22      offset to user data
Note: t_cid and t_xvac are members of one union and share the same storage.
*/
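The offsets in the table can be checked mechanically with `offsetof`. The sketch below uses a hypothetical stand-in, `MockHeapTupleHeader`, that mirrors only the heap-side fields (the Datum overlay is left out) and models ItemPointerData as three 16-bit words, which is an assumption made to keep the example self-contained. The compile-time assertions fail the build if any offset differs from the table.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;
typedef uint32_t CommandId;

/* Assumed stand-in for ItemPointerData: three 16-bit words, 6 bytes. */
typedef struct MockItemPointer
{
    uint16_t bi_hi;
    uint16_t bi_lo;
    uint16_t ip_posid;
} MockItemPointer;

/* Hypothetical mirror of the fixed header fields listed in the table. */
typedef struct MockHeapTupleHeader
{
    TransactionId t_xmin;
    TransactionId t_xmax;
    union
    {
        CommandId     t_cid;
        TransactionId t_xvac;
    } t_field3;                 /* t_cid and t_xvac overlay each other */
    MockItemPointer t_ctid;
    uint16_t t_infomask2;
    uint16_t t_infomask;
    uint8_t  t_hoff;
    uint8_t  t_bits[];          /* nulls bitmap starts right after */
} MockHeapTupleHeader;

_Static_assert(offsetof(MockHeapTupleHeader, t_xmin) == 0, "t_xmin");
_Static_assert(offsetof(MockHeapTupleHeader, t_xmax) == 4, "t_xmax");
_Static_assert(offsetof(MockHeapTupleHeader, t_field3) == 8, "t_cid/t_xvac");
_Static_assert(offsetof(MockHeapTupleHeader, t_ctid) == 12, "t_ctid");
_Static_assert(offsetof(MockHeapTupleHeader, t_infomask2) == 18, "t_infomask2");
_Static_assert(offsetof(MockHeapTupleHeader, t_infomask) == 20, "t_infomask");
_Static_assert(offsetof(MockHeapTupleHeader, t_hoff) == 22, "t_hoff");
_Static_assert(offsetof(MockHeapTupleHeader, t_bits) == 23, "23-byte fixed part");
```

Every field falls on its natural alignment boundary without the compiler inserting padding, which is the point of the field ordering described in the header comment.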
// Example: t_infomask = \x0802, which is 2050 in decimal and 100000000010 in binary
// t_infomask flag bits (the leading column shows each value in binary)
1 #define HEAP_HASNULL 0x0001 /* has null attribute(s) */
10 #define HEAP_HASVARWIDTH 0x0002 /* has variable-width attribute(s) */
100 #define HEAP_HASEXTERNAL 0x0004 /* has external stored attribute(s) */
1000 #define HEAP_HASOID 0x0008 /* has an object-id field */
10000 #define HEAP_XMAX_KEYSHR_LOCK 0x0010 /* xmax is a key-shared locker */
100000 #define HEAP_COMBOCID 0x0020 /* t_cid is a combo cid */
1000000 #define HEAP_XMAX_EXCL_LOCK 0x0040 /* xmax is exclusive locker */
10000000 #define HEAP_XMAX_LOCK_ONLY 0x0080 /* xmax, if valid, is only a locker */
/* xmax is a shared locker */
#define HEAP_XMAX_SHR_LOCK (HEAP_XMAX_EXCL_LOCK | HEAP_XMAX_KEYSHR_LOCK)
#define HEAP_LOCK_MASK (HEAP_XMAX_SHR_LOCK | HEAP_XMAX_EXCL_LOCK | \
HEAP_XMAX_KEYSHR_LOCK)
100000000 #define HEAP_XMIN_COMMITTED 0x0100 /* t_xmin committed */
1000000000 #define HEAP_XMIN_INVALID 0x0200 /* t_xmin invalid/aborted */
#define HEAP_XMIN_FROZEN (HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID)
10000000000 #define HEAP_XMAX_COMMITTED 0x0400 /* t_xmax committed */
100000000000 #define HEAP_XMAX_INVALID 0x0800 /* t_xmax invalid/aborted */
1000000000000 #define HEAP_XMAX_IS_MULTI 0x1000 /* t_xmax is a MultiXactId */
10000000000000 #define HEAP_UPDATED 0x2000 /* this is UPDATEd version of row */
100000000000000 #define HEAP_MOVED_OFF 0x4000 /* moved to another place by pre-9.0
* VACUUM FULL; kept for binary
* upgrade support */
1000000000000000 #define HEAP_MOVED_IN 0x8000 /* moved from another place by pre-9.0
* VACUUM FULL; kept for binary
* upgrade support */
#define HEAP_MOVED (HEAP_MOVED_OFF | HEAP_MOVED_IN)
1111111111110000 #define HEAP_XACT_MASK 0xFFF0 /* visibility-related bits */
// \x0802 is binary 100000000010: bits 2 and 12 (counting from 1 at the low end) are set,
// so the tuple has variable-width attributes (HEAP_HASVARWIDTH) and its XMAX is invalid (HEAP_XMAX_INVALID)
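The same decoding can be done in code by testing each bit in turn. This sketch reuses the flag names and values from the listing above; `describe_infomask` and `xmin_is_frozen` are hypothetical helpers, not PostgreSQL functions. Note that "frozen" is expressed as both XMIN bits set at once, so it has to be tested with an equality check against the combined mask.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define HEAP_XMIN_COMMITTED 0x0100
#define HEAP_XMIN_INVALID   0x0200
#define HEAP_XMIN_FROZEN    (HEAP_XMIN_COMMITTED | HEAP_XMIN_INVALID)

typedef struct FlagName
{
    uint16_t    bit;
    const char *name;
} FlagName;

/* Single-bit t_infomask flags, values as in the listing above. */
static const FlagName infomask_flags[] = {
    {0x0001, "HEAP_HASNULL"},        {0x0002, "HEAP_HASVARWIDTH"},
    {0x0004, "HEAP_HASEXTERNAL"},    {0x0008, "HEAP_HASOID"},
    {0x0010, "HEAP_XMAX_KEYSHR_LOCK"}, {0x0020, "HEAP_COMBOCID"},
    {0x0040, "HEAP_XMAX_EXCL_LOCK"}, {0x0080, "HEAP_XMAX_LOCK_ONLY"},
    {0x0100, "HEAP_XMIN_COMMITTED"}, {0x0200, "HEAP_XMIN_INVALID"},
    {0x0400, "HEAP_XMAX_COMMITTED"}, {0x0800, "HEAP_XMAX_INVALID"},
    {0x1000, "HEAP_XMAX_IS_MULTI"},  {0x2000, "HEAP_UPDATED"},
    {0x4000, "HEAP_MOVED_OFF"},      {0x8000, "HEAP_MOVED_IN"}
};

/* Hypothetical helper: both XMIN bits set together means "frozen". */
static bool
xmin_is_frozen(uint16_t infomask)
{
    return (infomask & HEAP_XMIN_FROZEN) == HEAP_XMIN_FROZEN;
}

/* Hypothetical helper: write the set flag names into buf, '|'-separated. */
static void
describe_infomask(uint16_t infomask, char *buf, size_t buflen)
{
    size_t used = 0;
    size_t i;

    if (buflen == 0)
        return;
    buf[0] = '\0';

    for (i = 0; i < sizeof(infomask_flags) / sizeof(infomask_flags[0]); i++)
    {
        int n;

        if ((infomask & infomask_flags[i].bit) == 0)
            continue;
        n = snprintf(buf + used, buflen - used, "%s%s",
                     used > 0 ? "|" : "", infomask_flags[i].name);
        if (n < 0 || (size_t) n >= buflen - used)
            break;              /* stop once the buffer is full */
        used += (size_t) n;
    }
}
```

Called with 0x0802 and a sufficiently large buffer, `describe_infomask` produces `HEAP_HASVARWIDTH|HEAP_XMAX_INVALID`, matching the manual decoding above.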
/*
* information stored in t_infomask2:
*/
#define HEAP_NATTS_MASK 0x07FF /* 11 bits for number of attributes */
/* bits 0x1800 are available */
#define HEAP_KEYS_UPDATED 0x2000 /* tuple was updated and key cols
* modified, or tuple deleted */
#define HEAP_HOT_UPDATED 0x4000 /* tuple was HOT-updated */
#define HEAP_ONLY_TUPLE 0x8000 /* this is heap-only tuple */
#define HEAP2_XACT_MASK 0xE000 /* visibility-related bits */
// The same hex values shown in binary
11111111111 #define HEAP_NATTS_MASK 0x07FF
10000000000000 #define HEAP_KEYS_UPDATED 0x2000
100000000000000 #define HEAP_HOT_UPDATED 0x4000
1000000000000000 #define HEAP_ONLY_TUPLE 0x8000
1110000000000000 #define HEAP2_XACT_MASK 0xE000
1111111111111110 #define SpecTokenOffsetNumber 0xfffe
// The low 11 bits hold the attribute count; a value of 3 means the tuple has 3 attributes (columns)
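Extracting the attribute count is a single mask operation, and the remaining high bits are tested individually. The mask and flag values below are copied from the listing; the helper names `infomask2_natts` and `infomask2_is_heap_only` are hypothetical and used only for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

#define HEAP_NATTS_MASK   0x07FF    /* 11 bits for number of attributes */
#define HEAP_KEYS_UPDATED 0x2000
#define HEAP_HOT_UPDATED  0x4000
#define HEAP_ONLY_TUPLE   0x8000

/* Hypothetical helper: attribute count stored in the low 11 bits. */
static int
infomask2_natts(uint16_t infomask2)
{
    return infomask2 & HEAP_NATTS_MASK;
}

/* Hypothetical helper: is this a heap-only (HOT) tuple? */
static bool
infomask2_is_heap_only(uint16_t infomask2)
{
    return (infomask2 & HEAP_ONLY_TUPLE) != 0;
}
```

With 11 bits the count tops out at 2047, and a value such as 0x8003 still reports 3 attributes because the flag bits are masked away.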
xl_heap_freeze_tuple
xl_heap_freeze_tuple represents a 'freeze plan': the information needed to freeze a single tuple during vacuum.
/*
* This struct represents a 'freeze plan', which is what we need to know about
* a single tuple being frozen during vacuum.
*/
/* 0x01 was XLH_FREEZE_XMIN */
#define XLH_FREEZE_XVAC 0x02
#define XLH_INVALID_XVAC 0x04
typedef struct xl_heap_freeze_tuple
{
TransactionId xmax;
OffsetNumber offset;
uint16 t_infomask2;
uint16 t_infomask;
uint8 frzflags;
} xl_heap_freeze_tuple;
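heap_execute_freeze_tuple performs the actual tuple freeze; the preparation has already been done. The logic is simple: it sets xmax, sets xvac to the frozen or invalid transaction ID when the corresponding freeze flag is set, and overwrites both infomask fields with the values prepared in the freeze plan.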
/*
* heap_execute_freeze_tuple
* Execute the prepared freezing of a tuple.
*
* Caller is responsible for ensuring that no other backend can access the
* storage underlying this tuple, either by holding an exclusive lock on the
* buffer containing it (which is what lazy VACUUM does), or by having it be
* in private storage (which is what CLUSTER and friends do).
*
* Note: it might seem we could make the changes without exclusive lock, since
* TransactionId read/write is assumed atomic anyway. However there is a race
* condition: someone who just fetched an old XID that we overwrite here could
* conceivably not finish checking the XID against pg_xact before we finish
* the VACUUM and perhaps truncate off the part of pg_xact he needs. Getting
* exclusive lock ensures no other backend is in process of checking the
* tuple status. Also, getting exclusive lock makes it safe to adjust the
* infomask bits.
*
* NB: All code in here must be safe to execute during crash recovery!
*/
void
heap_execute_freeze_tuple(HeapTupleHeader tuple, xl_heap_freeze_tuple *frz)
{
HeapTupleHeaderSetXmax(tuple, frz->xmax);
if (frz->frzflags & XLH_FREEZE_XVAC)
HeapTupleHeaderSetXvac(tuple, FrozenTransactionId);
if (frz->frzflags & XLH_INVALID_XVAC)
HeapTupleHeaderSetXvac(tuple, InvalidTransactionId);
tuple->t_infomask = frz->t_infomask;
tuple->t_infomask2 = frz->t_infomask2;
}
// Set the tuple's xmax
#define HeapTupleHeaderSetXmax(tup, xid) \
( \
(tup)->t_choice.t_heap.t_xmax = (xid) \
)
// Set the tuple's xvac; the Assert requires a HEAP_MOVED bit to be set in t_infomask
#define HeapTupleHeaderSetXvac(tup, xid) \
do { \
Assert((tup)->t_infomask & HEAP_MOVED); \
(tup)->t_choice.t_heap.t_field3.t_xvac = (xid); \
} while (0)
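To see the function's effect in isolation, here is a minimal sketch that applies a freeze plan to a simplified header. `MockTupleHeader` and `mock_execute_freeze_tuple` are hypothetical: the mock flattens the unions into plain fields, and the constants for InvalidTransactionId (0) and FrozenTransactionId (2) are written out so that the example compiles on its own. The statement order follows the listing above.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t TransactionId;
typedef uint16_t OffsetNumber;

#define InvalidTransactionId ((TransactionId) 0)
#define FrozenTransactionId  ((TransactionId) 2)

#define HEAP_XMIN_FROZEN  0x0300    /* XMIN_COMMITTED | XMIN_INVALID */
#define HEAP_XMAX_INVALID 0x0800
#define HEAP_MOVED        0xC000    /* MOVED_OFF | MOVED_IN */
#define XLH_FREEZE_XVAC   0x02
#define XLH_INVALID_XVAC  0x04

/* Hypothetical flattened header: unions replaced by plain fields. */
typedef struct MockTupleHeader
{
    TransactionId t_xmin;
    TransactionId t_xmax;
    TransactionId t_xvac;
    uint16_t      t_infomask2;
    uint16_t      t_infomask;
} MockTupleHeader;

typedef struct xl_heap_freeze_tuple
{
    TransactionId xmax;
    OffsetNumber  offset;
    uint16_t      t_infomask2;
    uint16_t      t_infomask;
    uint8_t       frzflags;
} xl_heap_freeze_tuple;

/* Apply a prepared freeze plan, mirroring the listing's statement order. */
static void
mock_execute_freeze_tuple(MockTupleHeader *tuple,
                          const xl_heap_freeze_tuple *frz)
{
    tuple->t_xmax = frz->xmax;

    if (frz->frzflags & XLH_FREEZE_XVAC)
    {
        assert(tuple->t_infomask & HEAP_MOVED);
        tuple->t_xvac = FrozenTransactionId;
    }
    if (frz->frzflags & XLH_INVALID_XVAC)
    {
        assert(tuple->t_infomask & HEAP_MOVED);
        tuple->t_xvac = InvalidTransactionId;
    }

    tuple->t_infomask = frz->t_infomask;
    tuple->t_infomask2 = frz->t_infomask2;
}
```

Note that the function never touches t_xmin. In this code path the frozen state of xmin is carried by the HEAP_XMIN_FROZEN bits in the infomask that the plan supplies, so the original inserting XID stays in the header.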
PG Source Code