溫馨提示×

您好,登錄后才能下訂單哦!

密碼登錄×
登錄注冊(cè)×
其他方式登錄
點(diǎn)擊 登錄注冊(cè) 即表示同意《億速云用戶服務(wù)條款》

PostgreSQL 源碼解讀(90)- 查詢語句#75(ExecHashJoin函數(shù)#1)

發(fā)布時(shí)間:2020-08-11 10:27:47 來源:ITPUB博客 閱讀:283 作者:husthxd 欄目:關(guān)系型數(shù)據(jù)庫

本節(jié)介紹了ExecProcNode的其中一個(gè)Real函數(shù)(ExecHashJoin)。ExecHashJoin函數(shù)實(shí)現(xiàn)了Hash Join算法。

一、數(shù)據(jù)結(jié)構(gòu)

Plan
所有計(jì)劃節(jié)點(diǎn)通過將Plan結(jié)構(gòu)作為第一個(gè)字段從Plan結(jié)構(gòu)“派生”。這確保了在將節(jié)點(diǎn)轉(zhuǎn)換為計(jì)劃節(jié)點(diǎn)時(shí),一切都能正常工作。(在執(zhí)行器中以通用方式傳遞時(shí),節(jié)點(diǎn)指針經(jīng)常被轉(zhuǎn)換為Plan *)

/* ----------------
 *      Plan node
 *
 * All plan nodes "derive" from the Plan structure by having the
 * Plan structure as the first field.  This ensures that everything works
 * when nodes are cast to Plan's.  (node pointers are frequently cast to Plan*
 * when passed around generically in the executor)
 * 所有計(jì)劃節(jié)點(diǎn)通過將Plan結(jié)構(gòu)作為第一個(gè)字段從Plan結(jié)構(gòu)“派生”。
 * 這確保了在將節(jié)點(diǎn)轉(zhuǎn)換為計(jì)劃節(jié)點(diǎn)時(shí),一切都能正常工作。
 * (在執(zhí)行器中以通用方式傳遞時(shí),節(jié)點(diǎn)指針經(jīng)常被轉(zhuǎn)換為Plan *)
 *
 * We never actually instantiate any Plan nodes; this is just the common
 * abstract superclass for all Plan-type nodes.
 * 從未實(shí)例化任何Plan節(jié)點(diǎn);這只是所有Plan-type節(jié)點(diǎn)的通用抽象超類。
 * ----------------
 */
typedef struct Plan
{
    NodeTag     type;//節(jié)點(diǎn)類型

    /*
     * 成本估算信息;estimated execution costs for plan (see costsize.c for more info)
     */
    Cost        startup_cost;   /* 啟動(dòng)成本;cost expended before fetching any tuples */
    Cost        total_cost;     /* 總成本;total cost (assuming all tuples fetched) */

    /*
     * 優(yōu)化器估算信息;planner's estimate of result size of this plan step
     */
    double      plan_rows;      /* 行數(shù);number of rows plan is expected to emit */
    int         plan_width;     /* 平均行大小(Byte為單位);average row width in bytes */

    /*
     * 并行執(zhí)行相關(guān)的信息;information needed for parallel query
     */
    bool        parallel_aware; /* 是否參與并行執(zhí)行邏輯?engage parallel-aware logic? */
    bool        parallel_safe;  /* 是否并行安全;OK to use as part of parallel plan? */

    /*
     * Plan類型節(jié)點(diǎn)通用的信息.Common structural data for all Plan types.
     */
    int         plan_node_id;   /* unique across entire final plan tree */
    List       *targetlist;     /* target list to be computed at this node */
    List       *qual;           /* implicitly-ANDed qual conditions */
    struct Plan *lefttree;      /* input plan tree(s) */
    struct Plan *righttree;
    List       *initPlan;       /* Init Plan nodes (un-correlated expr
                                 * subselects) */

    /*
     * Information for management of parameter-change-driven rescanning
     * parameter-change-driven重掃描的管理信息.
     * 
     * extParam includes the paramIDs of all external PARAM_EXEC params
     * affecting this plan node or its children.  setParam params from the
     * node's initPlans are not included, but their extParams are.
     *
     * allParam includes all the extParam paramIDs, plus the IDs of local
     * params that affect the node (i.e., the setParams of its initplans).
     * These are _all_ the PARAM_EXEC params that affect this node.
     */
    Bitmapset  *extParam;
    Bitmapset  *allParam;
} Plan;

JoinState
Hash/NestLoop/Merge Join的基類

/* ----------------
 *   JoinState information
 *
 *      Superclass for state nodes of join plans.
 *      Hash/NestLoop/Merge Join的基類
 * ----------------
 */
typedef struct JoinState
{
    PlanState   ps;//基類PlanState
    JoinType    jointype;//連接類型
    //在找到一個(gè)匹配inner tuple的時(shí)候,如需要跳轉(zhuǎn)到下一個(gè)outer tuple,則該值為T
    bool        single_match;   /* True if we should skip to next outer tuple
                                 * after finding one inner match */
    //連接條件表達(dá)式(除了ps.qual)
    ExprState  *joinqual;       /* JOIN quals (in addition to ps.qual) */
} JoinState;

HashJoinState
Hash Join運(yùn)行期狀態(tài)結(jié)構(gòu)體

/* these structs are defined in executor/hashjoin.h: */
typedef struct HashJoinTupleData *HashJoinTuple;
typedef struct HashJoinTableData *HashJoinTable;

typedef struct HashJoinState
{
    JoinState   js;             /* 基類;its first field is NodeTag */
    ExprState  *hashclauses;//hash連接條件
    List       *hj_OuterHashKeys;   /* 外表?xiàng)l件鏈表;list of ExprState nodes */
    List       *hj_InnerHashKeys;   /* 內(nèi)表連接條件;list of ExprState nodes */
    List       *hj_HashOperators;   /* 操作符OIDs鏈表;list of operator OIDs */
    HashJoinTable hj_HashTable;//Hash表
    uint32      hj_CurHashValue;//當(dāng)前的Hash值
    int         hj_CurBucketNo;//當(dāng)前的bucket編號(hào)
    int         hj_CurSkewBucketNo;//行傾斜bucket編號(hào)
    HashJoinTuple hj_CurTuple;//當(dāng)前元組
    TupleTableSlot *hj_OuterTupleSlot;//outer relation slot
    TupleTableSlot *hj_HashTupleSlot;//Hash tuple slot
    TupleTableSlot *hj_NullOuterTupleSlot;//用于外連接的outer虛擬slot
    TupleTableSlot *hj_NullInnerTupleSlot;//用于外連接的inner虛擬slot
    TupleTableSlot *hj_FirstOuterTupleSlot;//
    int         hj_JoinState;//JoinState狀態(tài)
    bool        hj_MatchedOuter;//是否匹配
    bool        hj_OuterNotEmpty;//outer relation是否為空
} HashJoinState;

二、源碼解讀

ExecHashJoin函數(shù)實(shí)現(xiàn)了Hash Join算法,實(shí)際實(shí)現(xiàn)的函數(shù)是ExecHashJoinImpl.
ExecHashJoinImpl函數(shù)把Hash Join劃分為多個(gè)階段/狀態(tài)(有限狀態(tài)機(jī)),保存在HashJoinState->hj_JoinState字段中,這些狀態(tài)分別是分別為HJ_BUILD_HASHTABLE/HJ_NEED_NEW_OUTER/HJ_SCAN_BUCKET/HJ_FILL_OUTER_TUPLE/HJ_FILL_INNER_TUPLES/HJ_NEED_NEW_BATCH.
HJ_BUILD_HASHTABLE:創(chuàng)建Hash表;
HJ_NEED_NEW_OUTER:掃描outer relation,計(jì)算外表連接鍵的hash值,把相匹配元組放在合適的bucket中;
HJ_SCAN_BUCKET:掃描bucket,匹配的tuple返回
HJ_FILL_OUTER_TUPLE:當(dāng)前outer relation元組已耗盡,因此檢查是否發(fā)出一個(gè)虛擬的外連接元組。
HJ_FILL_INNER_TUPLES:已完成一個(gè)批處理,但做的是右外連接/完全連接,填充虛擬連接元組
HJ_NEED_NEW_BATCH:開啟下一批次
注意:在work_mem不足以裝下Hash Table時(shí),分批執(zhí)行.每個(gè)批次執(zhí)行時(shí),會(huì)把outer relation與inner relation匹配(指hash值一樣)的tuple會(huì)存儲(chǔ)起來,放在合適的批次文件中(hashtable->outerBatchFile[batchno]),以避免多次的outer relation掃描.


#define HJ_FILL_INNER(hjstate)  ((hjstate)->hj_NullOuterTupleSlot != NULL)

/* ----------------------------------------------------------------
 *      ExecHashJoin
 *
 *      Parallel-oblivious version.
 *      Parallel-oblivious版本。
 * ----------------------------------------------------------------
 */
static TupleTableSlot *         /* 返回元組或者NULL;return: a tuple or NULL */
ExecHashJoin(PlanState *pstate)
{
    /*
     * On sufficiently smart compilers this should be inlined with the
     * parallel-aware branches removed.
     * 在足夠智能的編譯器上,應(yīng)該內(nèi)聯(lián)刪除并行感知分支。
     */
    return ExecHashJoinImpl(pstate, false);
}

/*
 * States of the ExecHashJoin state machine
 */
#define HJ_BUILD_HASHTABLE      1
#define HJ_NEED_NEW_OUTER       2
#define HJ_SCAN_BUCKET          3
#define HJ_FILL_OUTER_TUPLE     4
#define HJ_FILL_INNER_TUPLES    5
#define HJ_NEED_NEW_BATCH       6

/* Returns true if doing null-fill on outer relation */
#define HJ_FILL_OUTER(hjstate)  ((hjstate)->hj_NullInnerTupleSlot != NULL)
/* Returns true if doing null-fill on inner relation */
#define HJ_FILL_INNER(hjstate)  ((hjstate)->hj_NullOuterTupleSlot != NULL)

static TupleTableSlot *ExecHashJoinOuterGetTuple(PlanState *outerNode,
                          HashJoinState *hjstate,
                          uint32 *hashvalue);
static TupleTableSlot *ExecParallelHashJoinOuterGetTuple(PlanState *outerNode,
                                  HashJoinState *hjstate,
                                  uint32 *hashvalue);
static TupleTableSlot *ExecHashJoinGetSavedTuple(HashJoinState *hjstate,
                          BufFile *file,
                          uint32 *hashvalue,
                          TupleTableSlot *tupleSlot);
static bool ExecHashJoinNewBatch(HashJoinState *hjstate);
static bool ExecParallelHashJoinNewBatch(HashJoinState *hjstate);
static void ExecParallelHashJoinPartitionOuter(HashJoinState *node);

/* ----------------------------------------------------------------
 *      ExecHashJoinImpl
 *
 *      This function implements the Hybrid Hashjoin algorithm.  It is marked
 *      with an always-inline attribute so that ExecHashJoin() and
 *      ExecParallelHashJoin() can inline it.  Compilers that respect the
 *      attribute should create versions specialized for parallel == true and
 *      parallel == false with unnecessary branches removed.
 *      這個(gè)函數(shù)實(shí)現(xiàn)了混合Hash Join算法。
 *      它被標(biāo)記為一個(gè)always-inline的屬性(pg_attribute_always_inline),
 *        以便ExecHashJoin()和ExecParallelHashJoin()可以內(nèi)聯(lián)它。
 *      可以識(shí)別該屬性的編譯器應(yīng)該創(chuàng)建專門針對(duì)parallel == true和parallel == false的版本,去掉不必要的分支。
 *
 *      Note: the relation we build hash table on is the "inner"
 *            the other one is "outer".
 *      注意:在inner上創(chuàng)建hash表,另外一個(gè)參與連接的成為outer
 * ----------------------------------------------------------------
 */
static pg_attribute_always_inline TupleTableSlot *
ExecHashJoinImpl(PlanState *pstate, bool parallel)
{
    HashJoinState *node = castNode(HashJoinState, pstate);
    PlanState  *outerNode;
    HashState  *hashNode;
    ExprState  *joinqual;
    ExprState  *otherqual;
    ExprContext *econtext;
    HashJoinTable hashtable;
    TupleTableSlot *outerTupleSlot;
    uint32      hashvalue;
    int         batchno;
    ParallelHashJoinState *parallel_state;

    /*
     * get information from HashJoin node
     * 從HashJon Node中獲取信息
     */
    joinqual = node->js.joinqual;
    otherqual = node->js.ps.qual;
    hashNode = (HashState *) innerPlanState(node);
    outerNode = outerPlanState(node);
    hashtable = node->hj_HashTable;
    econtext = node->js.ps.ps_ExprContext;
    parallel_state = hashNode->parallel_state;

    /*
     * Reset per-tuple memory context to free any expression evaluation
     * storage allocated in the previous tuple cycle.
     * 重置每個(gè)元組內(nèi)存上下文以釋放在前一個(gè)元組處理周期中分配的所有表達(dá)式計(jì)算存儲(chǔ)。
     */
    ResetExprContext(econtext);

    /*
     * run the hash join state machine
     * 執(zhí)行hash join狀態(tài)機(jī)
     */
    for (;;)
    {
        /*
         * It's possible to iterate this loop many times before returning a
         * tuple, in some pathological cases such as needing to move much of
         * the current batch to a later batch.  So let's check for interrupts
         * each time through.
         * 在返回元組之前,可以多次迭代此循環(huán),在某些"變態(tài)"的情況下,
         *   例如需要將當(dāng)前批處理的大部分轉(zhuǎn)移到下一批處理。
         * 所以需要每次檢查中斷。
         */
        CHECK_FOR_INTERRUPTS();

        switch (node->hj_JoinState)
        {
            case HJ_BUILD_HASHTABLE://-->HJ_BUILD_HASHTABLE階段

                /*
                 * First time through: build hash table for inner relation.
                 * 第一次的處理邏輯:為inner relation建立hash表
                 */
                Assert(hashtable == NULL);

                /*
                 * If the outer relation is completely empty, and it's not
                 * right/full join, we can quit without building the hash
                 * table.  However, for an inner join it is only a win to
                 * check this when the outer relation's startup cost is less
                 * than the projected cost of building the hash table.
                 * Otherwise it's best to build the hash table first and see
                 * if the inner relation is empty.  (When it's a left join, we
                 * should always make this check, since we aren't going to be
                 * able to skip the join on the strength of an empty inner
                 * relation anyway.)
                 * 如果外部關(guān)系是空的,并且它不是右外/完全連接,可以在不構(gòu)建哈希表的情況下退出。
                 * 但是,對(duì)于內(nèi)連接,只有當(dāng)外部關(guān)系的啟動(dòng)成本小于構(gòu)建哈希表的預(yù)期成本時(shí),才需要檢查這一點(diǎn)。
                 * 否則,最好先構(gòu)建哈希表,看看內(nèi)部關(guān)系是否為空。
                * (當(dāng)它是左外連接時(shí),應(yīng)該始終進(jìn)行檢查,因?yàn)闊o論如何,都不能基于空的內(nèi)部關(guān)系跳過連接。)
                 *
                 * If we are rescanning the join, we make use of information
                 * gained on the previous scan: don't bother to try the
                 * prefetch if the previous scan found the outer relation
                 * nonempty. This is not 100% reliable since with new
                 * parameters the outer relation might yield different
                 * results, but it's a good heuristic.
                 * 如果需要重新掃描連接,將利用上次掃描結(jié)果中獲得的信息:
                 *   如果上次掃描發(fā)現(xiàn)外部關(guān)系非空,則不必嘗試預(yù)取。
                 * 但這不是100%可靠的,因?yàn)橛辛诵碌膮?shù),外部關(guān)系可能會(huì)產(chǎn)生不同的結(jié)果,但這是一個(gè)很好的啟發(fā)式。
                 *
                 * The only way to make the check is to try to fetch a tuple
                 * from the outer plan node.  If we succeed, we have to stash
                 * it away for later consumption by ExecHashJoinOuterGetTuple.
                 * 進(jìn)行檢查的唯一方法是從外部plan節(jié)點(diǎn)獲取一個(gè)元組。
                 * 如果成功了,就必須通過ExecHashJoinOuterGetTuple將其存儲(chǔ)起來,以便以后使用。
                 */
                if (HJ_FILL_INNER(node))
                {
                    /* no chance to not build the hash table */
                    //不構(gòu)建哈希表是不可能的了
                    node->hj_FirstOuterTupleSlot = NULL;
                }
                else if (parallel)
                {
                    /*
                     * The empty-outer optimization is not implemented for
                     * shared hash tables, because no one participant can
                     * determine that there are no outer tuples, and it's not
                     * yet clear that it's worth the synchronization overhead
                     * of reaching consensus to figure that out.  So we have
                     * to build the hash table.
                     * 對(duì)于共享哈希表,并沒有實(shí)現(xiàn)空外關(guān)系優(yōu)化,因?yàn)闆]有任何參與者可以確定沒有外部元組,
                     * 而且還不清楚是否值得為了解決這個(gè)問題而進(jìn)行同步開銷。
                     * 所以我們要建立哈希表。
                     */
                    node->hj_FirstOuterTupleSlot = NULL;
                }
                else if (HJ_FILL_OUTER(node) ||
                         (outerNode->plan->startup_cost < hashNode->ps.plan->total_cost &&
                          !node->hj_OuterNotEmpty))
                {
                    node->hj_FirstOuterTupleSlot = ExecProcNode(outerNode);
                    if (TupIsNull(node->hj_FirstOuterTupleSlot))
                    {
                        node->hj_OuterNotEmpty = false;
                        return NULL;
                    }
                    else
                        node->hj_OuterNotEmpty = true;
                }
                else
                    node->hj_FirstOuterTupleSlot = NULL;

                /*
                 * Create the hash table.  If using Parallel Hash, then
                 * whoever gets here first will create the hash table and any
                 * later arrivals will merely attach to it.
                 * 創(chuàng)建哈希表。
                 * 如果使用并行哈希,那么最先到達(dá)這里的worker將創(chuàng)建哈希表,之后到達(dá)的只會(huì)附加到它上面。
                 */
                hashtable = ExecHashTableCreate(hashNode,
                                                node->hj_HashOperators,
                                                HJ_FILL_INNER(node));
                node->hj_HashTable = hashtable;

                /*
                 * Execute the Hash node, to build the hash table.  If using
                 * Parallel Hash, then we'll try to help hashing unless we
                 * arrived too late.
                 * 執(zhí)行哈希節(jié)點(diǎn),以構(gòu)建哈希表。
                 * 如果使用并行哈希,那么將嘗試協(xié)助哈希運(yùn)算,除非太晚了。
                 */
                hashNode->hashtable = hashtable;
                (void) MultiExecProcNode((PlanState *) hashNode);

                /*
                 * If the inner relation is completely empty, and we're not
                 * doing a left outer join, we can quit without scanning the
                 * outer relation.
                 * 如果內(nèi)部關(guān)系是空的,并且沒有執(zhí)行左外連接,可以在不掃描外部關(guān)系的情況下退出。
                 */
                if (hashtable->totalTuples == 0 && !HJ_FILL_OUTER(node))
                    return NULL;

                /*
                 * need to remember whether nbatch has increased since we
                 * began scanning the outer relation
                 * 需要記住自開始掃描外部關(guān)系以來nbatch是否增加了
                 */
                hashtable->nbatch_outstart = hashtable->nbatch;

                /*
                 * Reset OuterNotEmpty for scan.  (It's OK if we fetched a
                 * tuple above, because ExecHashJoinOuterGetTuple will
                 * immediately set it again.)
                 * 掃描前重置OuterNotEmpty。
                 * (在其上獲取一個(gè)tuple是可以的,因?yàn)镋xecHashJoinOuterGetTuple將立即再次設(shè)置它。)
                 */
                node->hj_OuterNotEmpty = false;//重置OuterNotEmpty為F

                if (parallel)
                {
                    //啟用并行
                    Barrier    *build_barrier;

                    build_barrier = &parallel_state->build_barrier;
                    Assert(BarrierPhase(build_barrier) == PHJ_BUILD_HASHING_OUTER ||
                           BarrierPhase(build_barrier) == PHJ_BUILD_DONE);
                    if (BarrierPhase(build_barrier) == PHJ_BUILD_HASHING_OUTER)
                    {
                        /*
                         * If multi-batch, we need to hash the outer relation
                         * up front.
                         * 如果是多批處理,需要預(yù)先散列外部關(guān)系。
                         */
                        if (hashtable->nbatch > 1)
                            ExecParallelHashJoinPartitionOuter(node);
                        BarrierArriveAndWait(build_barrier,
                                             WAIT_EVENT_HASH_BUILD_HASHING_OUTER);
                    }
                    Assert(BarrierPhase(build_barrier) == PHJ_BUILD_DONE);

                    /* Each backend should now select a batch to work on. */
                    //每一個(gè)后臺(tái)worker需選擇批次
                    hashtable->curbatch = -1;
                    node->hj_JoinState = HJ_NEED_NEW_BATCH;

                    continue;//下一循環(huán)
                }
                else
                    //非并行執(zhí)行,設(shè)置hj_JoinState狀態(tài)
                    node->hj_JoinState = HJ_NEED_NEW_OUTER;

                /* FALL THRU */

            case HJ_NEED_NEW_OUTER://-->HJ_NEED_NEW_OUTER階段

                /*
                 * We don't have an outer tuple, try to get the next one
                 * 沒有外部元組,試著獲取下一個(gè)
                 */
                if (parallel)
                    outerTupleSlot =
                        ExecParallelHashJoinOuterGetTuple(outerNode, node,
                                                          &hashvalue);//并行執(zhí)行
                else
                    outerTupleSlot =
                        ExecHashJoinOuterGetTuple(outerNode, node, &hashvalue);//普通執(zhí)行

                if (TupIsNull(outerTupleSlot))
                {
                    //如outerTupleSlot為NULL
                    /* end of batch, or maybe whole join */
                    //完成此批數(shù)據(jù)處理,或者可能是全連接
                    if (HJ_FILL_INNER(node))//hj_NullOuterTupleSlot != NULL
                    {
                        /* set up to scan for unmatched inner tuples */
                        //不匹配的行,填充NULL(外連接)
                        ExecPrepHashTableForUnmatched(node);
                        node->hj_JoinState = HJ_FILL_INNER_TUPLES;
                    }
                    else
                        node->hj_JoinState = HJ_NEED_NEW_BATCH;//需要下一個(gè)批次
                    continue;
                }
                //設(shè)置變量
                econtext->ecxt_outertuple = outerTupleSlot;
                node->hj_MatchedOuter = false;

                /*
                 * Find the corresponding bucket for this tuple in the main
                 * hash table or skew hash table.
                 * 在主哈希表或斜哈希表中為這個(gè)元組找到對(duì)應(yīng)的bucket。
                 */
                node->hj_CurHashValue = hashvalue;
                //獲取Hash Bucket并處理此批次
                ExecHashGetBucketAndBatch(hashtable, hashvalue,
                                          &node->hj_CurBucketNo, &batchno);
                //Hash傾斜優(yōu)化(某個(gè)值的數(shù)據(jù)特別多)
                node->hj_CurSkewBucketNo = ExecHashGetSkewBucket(hashtable,
                                                                 hashvalue);
                node->hj_CurTuple = NULL;

                /*
                 * The tuple might not belong to the current batch (where
                 * "current batch" includes the skew buckets if any).
                 * 元組可能不屬于當(dāng)前批處理(其中“當(dāng)前批處理”包括傾斜桶-如果有的話)。
                 */
                if (batchno != hashtable->curbatch &&
                    node->hj_CurSkewBucketNo == INVALID_SKEW_BUCKET_NO)
                {
                    /*
                     * Need to postpone this outer tuple to a later batch.
                     * Save it in the corresponding outer-batch file.
                     * 需要將這個(gè)外部元組推遲到稍后的批處理。保存在相應(yīng)的外部批處理文件中。
                     * 也就是說,INNER和OUTER屬于此批次的數(shù)據(jù)都可能存儲(chǔ)在外存中
                     */
                    Assert(parallel_state == NULL);
                    Assert(batchno > hashtable->curbatch);
                    ExecHashJoinSaveTuple(ExecFetchSlotMinimalTuple(outerTupleSlot),
                                          hashvalue,
                                          &hashtable->outerBatchFile[batchno]);

                    /* Loop around, staying in HJ_NEED_NEW_OUTER state */
                    //循環(huán),保持HJ_NEED_NEW_OUTER狀態(tài)
                    continue;
                }

                /* OK, let's scan the bucket for matches */
                //已完成此階段,切換至HJ_SCAN_BUCKET狀態(tài)
                node->hj_JoinState = HJ_SCAN_BUCKET;

                /* FALL THRU */

            case HJ_SCAN_BUCKET://-->HJ_SCAN_BUCKET階段

                /*
                 * Scan the selected hash bucket for matches to current outer
                 * 掃描選定的散列桶,查找與當(dāng)前外部匹配的散列桶
                 */
                if (parallel)
                {
                    //并行處理
                    if (!ExecParallelScanHashBucket(node, econtext))
                    {
                        /* out of matches; check for possible outer-join fill */
                        // 無法匹配,檢查可能的外連接填充,狀態(tài)切換為HJ_FILL_OUTER_TUPLE
                        node->hj_JoinState = HJ_FILL_OUTER_TUPLE;
                        continue;
                    }
                }
                else
                {
                    //非并行執(zhí)行
                    if (!ExecScanHashBucket(node, econtext))
                    {
                        /* out of matches; check for possible outer-join fill */
                        node->hj_JoinState = HJ_FILL_OUTER_TUPLE;//同上
                        continue;
                    }
                }

                /*
                 * We've got a match, but still need to test non-hashed quals.
                 * ExecScanHashBucket already set up all the state needed to
                 * call ExecQual.
                 * 發(fā)現(xiàn)一個(gè)匹配,但仍然需要測試非散列的quals。
                 * ExecScanHashBucket已經(jīng)設(shè)置了調(diào)用ExecQual所需的所有狀態(tài)。
                 * 
                 * If we pass the qual, then save state for next call and have
                 * ExecProject form the projection, store it in the tuple
                 * table, and return the slot.
                 * 如果我們傳遞了qual,那么將狀態(tài)保存為下一次調(diào)用,
                 * 并讓ExecProject形成投影,將其存儲(chǔ)在tuple table中,并返回slot。
                 *
                 * Only the joinquals determine tuple match status, but all
                 * quals must pass to actually return the tuple.
                 * 只有連接條件joinquals確定元組匹配狀態(tài),但所有條件quals必須通過才能返回元組。
                 */
                if (joinqual == NULL || ExecQual(joinqual, econtext))
                {
                    node->hj_MatchedOuter = true;
                    HeapTupleHeaderSetMatch(HJTUPLE_MINTUPLE(node->hj_CurTuple));

                    /* In an antijoin, we never return a matched tuple */
                    //反連接,則不能返回匹配的元組
                    if (node->js.jointype == JOIN_ANTI)
                    {
                        node->hj_JoinState = HJ_NEED_NEW_OUTER;
                        continue;
                    }

                    /*
                     * If we only need to join to the first matching inner
                     * tuple, then consider returning this one, but after that
                     * continue with next outer tuple.
                     * 如果只需要連接到第一個(gè)匹配的內(nèi)表元組,那么可以考慮返回這個(gè)元組,
                     * 但是在此之后可以繼續(xù)使用下一個(gè)外表元組。
                     */
                    if (node->js.single_match)
                        node->hj_JoinState = HJ_NEED_NEW_OUTER;

                    if (otherqual == NULL || ExecQual(otherqual, econtext))
                        return ExecProject(node->js.ps.ps_ProjInfo);//執(zhí)行投影操作
                    else
                        InstrCountFiltered2(node, 1);//其他條件不匹配
                }
                else
                    InstrCountFiltered1(node, 1);//連接條件不匹配
                break;

            case HJ_FILL_OUTER_TUPLE://-->HJ_FILL_OUTER_TUPLE階段

                /*
                 * The current outer tuple has run out of matches, so check
                 * whether to emit a dummy outer-join tuple.  Whether we emit
                 * one or not, the next state is NEED_NEW_OUTER.
                 * 當(dāng)前外部元組已耗盡匹配項(xiàng),因此檢查是否發(fā)出一個(gè)虛擬的外連接元組。
                 * 不管是否發(fā)出一個(gè),下一個(gè)狀態(tài)是NEED_NEW_OUTER。
                 */
                node->hj_JoinState = HJ_NEED_NEW_OUTER;//切換狀態(tài)為HJ_NEED_NEW_OUTER

                if (!node->hj_MatchedOuter &&
                    HJ_FILL_OUTER(node))
                {
                    /*
                     * Generate a fake join tuple with nulls for the inner
                     * tuple, and return it if it passes the non-join quals.
                     * 為內(nèi)部元組生成一個(gè)帶有null的假連接元組,并在滿足非連接條件quals時(shí)返回它。
                     */
                    econtext->ecxt_innertuple = node->hj_NullInnerTupleSlot;

                    if (otherqual == NULL || ExecQual(otherqual, econtext))
                        return ExecProject(node->js.ps.ps_ProjInfo);//投影操作
                    else
                        InstrCountFiltered2(node, 1);
                }
                break;

            case HJ_FILL_INNER_TUPLES://-->HJ_FILL_INNER_TUPLES階段

                /*
                 * We have finished a batch, but we are doing right/full join,
                 * so any unmatched inner tuples in the hashtable have to be
                 * emitted before we continue to the next batch.
                 * 已經(jīng)完成了一個(gè)批處理,但是做的是右外/完全連接,
                     所以必須在繼續(xù)下一個(gè)批處理之前發(fā)出散列表中任何不匹配的內(nèi)部元組。
                 */
                if (!ExecScanHashTableForUnmatched(node, econtext))
                {
                    /* no more unmatched tuples */
                    //不存在更多不匹配的元組,切換狀態(tài)為HJ_NEED_NEW_BATCH(開始下一批次)
                    node->hj_JoinState = HJ_NEED_NEW_BATCH;
                    continue;
                }

                /*
                 * Generate a fake join tuple with nulls for the outer tuple,
                 * and return it if it passes the non-join quals.
                 * 為外表元組生成一個(gè)帶有null的假連接元組,并在滿足非連接條件quals時(shí)返回它。
                 */
                econtext->ecxt_outertuple = node->hj_NullOuterTupleSlot;

                if (otherqual == NULL || ExecQual(otherqual, econtext))
                    return ExecProject(node->js.ps.ps_ProjInfo);
                else
                    InstrCountFiltered2(node, 1);
                break;

            case HJ_NEED_NEW_BATCH://-->HJ_NEED_NEW_BATCH階段

                /*
                 * Try to advance to next batch.  Done if there are no more.
                 * 盡量提前到下一批。如果沒有了,就結(jié)束。
                 */
                if (parallel)
                {
                    //并行處理
                    if (!ExecParallelHashJoinNewBatch(node))
                        return NULL;    /* end of parallel-aware join */
                }
                else
                {
                    //非并行處理
                    if (!ExecHashJoinNewBatch(node))
                        return NULL;    /* end of parallel-oblivious join */
                }
                node->hj_JoinState = HJ_NEED_NEW_OUTER;//切換狀態(tài)
                break;

            default://非法的JoinState
                elog(ERROR, "unrecognized hashjoin state: %d",
                     (int) node->hj_JoinState);
        }
    }
}

三、跟蹤分析

測試腳本如下

testdb=# explain verbose select dw.*,grjf.grbh,grjf.xm,grjf.ny,grjf.je 
testdb-# from t_dwxx dw,lateral (select gr.grbh,gr.xm,jf.ny,jf.je 
testdb(#                         from t_grxx gr inner join t_jfxx jf 
testdb(#                                        on gr.dwbh = dw.dwbh 
testdb(#                                           and gr.grbh = jf.grbh) grjf
testdb-# order by dw.dwbh;
                                          QUERY PLAN                                           
-----------------------------------------------------------------------------------------------
 Sort  (cost=14828.83..15078.46 rows=99850 width=47)
   Output: dw.dwmc, dw.dwbh, dw.dwdz, gr.grbh, gr.xm, jf.ny, jf.je
   Sort Key: dw.dwbh
   ->  Hash Join  (cost=3176.00..6537.55 rows=99850 width=47)
         Output: dw.dwmc, dw.dwbh, dw.dwdz, gr.grbh, gr.xm, jf.ny, jf.je
         Hash Cond: ((gr.grbh)::text = (jf.grbh)::text)
         ->  Hash Join  (cost=289.00..2277.61 rows=99850 width=32)
               Output: dw.dwmc, dw.dwbh, dw.dwdz, gr.grbh, gr.xm
               Inner Unique: true
               Hash Cond: ((gr.dwbh)::text = (dw.dwbh)::text)
               ->  Seq Scan on public.t_grxx gr  (cost=0.00..1726.00 rows=100000 width=16)
                     Output: gr.dwbh, gr.grbh, gr.xm, gr.xb, gr.nl
               ->  Hash  (cost=164.00..164.00 rows=10000 width=20)
                     Output: dw.dwmc, dw.dwbh, dw.dwdz
                     ->  Seq Scan on public.t_dwxx dw  (cost=0.00..164.00 rows=10000 width=20)
                           Output: dw.dwmc, dw.dwbh, dw.dwdz
         ->  Hash  (cost=1637.00..1637.00 rows=100000 width=20)
               Output: jf.ny, jf.je, jf.grbh
               ->  Seq Scan on public.t_jfxx jf  (cost=0.00..1637.00 rows=100000 width=20)
                     Output: jf.ny, jf.je, jf.grbh
(20 rows)

啟動(dòng)gdb,設(shè)置斷點(diǎn),進(jìn)入ExecHashJoin

(gdb) b ExecHashJoin
Breakpoint 1 at 0x70292e: file nodeHashjoin.c, line 565.
(gdb) c
Continuing.

Breakpoint 1, ExecHashJoin (pstate=0x2ee1a88) at nodeHashjoin.c:565
565     return ExecHashJoinImpl(pstate, false);

繼續(xù)執(zhí)行,進(jìn)入第2個(gè)Hash Join,即t_grxx & t_dwxx的連接

(gdb) n

Breakpoint 1, ExecHashJoin (pstate=0x2ee1d98) at nodeHashjoin.c:565
565     return ExecHashJoinImpl(pstate, false);

查看輸入?yún)?shù),ExecProcNode=ExecProcNodeReal=ExecHashJoin

(gdb) p *pstate
$8 = {type = T_HashJoinState, plan = 0x2faaff8, state = 0x2ee1758, ExecProcNode = 0x70291d <ExecHashJoin>, 
  ExecProcNodeReal = 0x70291d <ExecHashJoin>, instrument = 0x0, worker_instrument = 0x0, worker_jit_instrument = 0x0, 
  qual = 0x0, lefttree = 0x2ee2070, righttree = 0x2ee2918, initPlan = 0x0, subPlan = 0x0, chgParam = 0x0, 
  ps_ResultTupleSlot = 0x2f20d98, ps_ExprContext = 0x2ee1fb0, ps_ProjInfo = 0x2ee3550, scandesc = 0x0}
(gdb) 

pstate的lefttree對(duì)應(yīng)的是SeqScan,righttree對(duì)應(yīng)的是Hash,即左樹(outer relation)為t_grxx的順序掃描運(yùn)算生成的relation,右樹(inner relation)為t_dwxx的順序掃描運(yùn)算生成的relation(在此relation上創(chuàng)建Hash Table)

(gdb) p *pstate->lefttree
$6 = {type = T_SeqScanState, plan = 0x2fa8ff0, state = 0x2ee1758, ExecProcNode = 0x6e4bde <ExecProcNodeFirst>, 
  ExecProcNodeReal = 0x71578d <ExecSeqScan>, instrument = 0x0, worker_instrument = 0x0, worker_jit_instrument = 0x0, 
  qual = 0x0, lefttree = 0x0, righttree = 0x0, initPlan = 0x0, subPlan = 0x0, chgParam = 0x0, 
  ps_ResultTupleSlot = 0x2ee27d8, ps_ExprContext = 0x2ee2188, ps_ProjInfo = 0x0, scandesc = 0x7f0710d02bd0}
(gdb) p *pstate->righttree
$9 = {type = T_HashState, plan = 0x2faaf60, state = 0x2ee1758, ExecProcNode = 0x6e4bde <ExecProcNodeFirst>, 
  ExecProcNodeReal = 0x6fc015 <ExecHash>, instrument = 0x0, worker_instrument = 0x0, worker_jit_instrument = 0x0, 
  qual = 0x0, lefttree = 0x2ee2af0, righttree = 0x0, initPlan = 0x0, subPlan = 0x0, chgParam = 0x0, 
  ps_ResultTupleSlot = 0x2ee3278, ps_ExprContext = 0x2ee2a30, ps_ProjInfo = 0x0, scandesc = 0x0}

進(jìn)入ExecHashJoinImpl函數(shù)

(gdb) step
ExecHashJoinImpl (pstate=0x2ee1d98, parallel=false) at nodeHashjoin.c:167
167     HashJoinState *node = castNode(HashJoinState, pstate);

賦值,查看HashJoinState等變量值

(gdb) n
182     joinqual = node->js.joinqual;
(gdb) n
183     otherqual = node->js.ps.qual;
(gdb) 
184     hashNode = (HashState *) innerPlanState(node);
(gdb) 
185     outerNode = outerPlanState(node);
(gdb) 
186     hashtable = node->hj_HashTable;
(gdb) 
187     econtext = node->js.ps.ps_ExprContext;
(gdb) 
188     parallel_state = hashNode->parallel_state;
(gdb) 
194     ResetExprContext(econtext);
(gdb) p *node
$10 = {js = {ps = {type = T_HashJoinState, plan = 0x2faaff8, state = 0x2ee1758, ExecProcNode = 0x70291d <ExecHashJoin>, 
      ExecProcNodeReal = 0x70291d <ExecHashJoin>, instrument = 0x0, worker_instrument = 0x0, worker_jit_instrument = 0x0, 
      qual = 0x0, lefttree = 0x2ee2070, righttree = 0x2ee2918, initPlan = 0x0, subPlan = 0x0, chgParam = 0x0, 
      ps_ResultTupleSlot = 0x2f20d98, ps_ExprContext = 0x2ee1fb0, ps_ProjInfo = 0x2ee3550, scandesc = 0x0}, 
    jointype = JOIN_INNER, single_match = true, joinqual = 0x0}, hashclauses = 0x2f21430, hj_OuterHashKeys = 0x2f22230, 
  hj_InnerHashKeys = 0x2f22740, hj_HashOperators = 0x2f227a0, hj_HashTable = 0x0, hj_CurHashValue = 0, hj_CurBucketNo = 0, 
  hj_CurSkewBucketNo = -1, hj_CurTuple = 0x0, hj_OuterTupleSlot = 0x2f212f0, hj_HashTupleSlot = 0x2ee3278, 
  hj_NullOuterTupleSlot = 0x0, hj_NullInnerTupleSlot = 0x0, hj_FirstOuterTupleSlot = 0x0, hj_JoinState = 1, 
  hj_MatchedOuter = false, hj_OuterNotEmpty = false}
(gdb) p *otherqual
Cannot access memory at address 0x0
(gdb) p *hashNode
$11 = {ps = {type = T_HashState, plan = 0x2faaf60, state = 0x2ee1758, ExecProcNode = 0x6e4bde <ExecProcNodeFirst>, 
    ExecProcNodeReal = 0x6fc015 <ExecHash>, instrument = 0x0, worker_instrument = 0x0, worker_jit_instrument = 0x0, 
    qual = 0x0, lefttree = 0x2ee2af0, righttree = 0x0, initPlan = 0x0, subPlan = 0x0, chgParam = 0x0, 
    ps_ResultTupleSlot = 0x2ee3278, ps_ExprContext = 0x2ee2a30, ps_ProjInfo = 0x0, scandesc = 0x0}, hashtable = 0x0, 
  hashkeys = 0x2f22740, shared_info = 0x0, hinstrument = 0x0, parallel_state = 0x0}
(gdb) p *hashtable
Cannot access memory at address 0x0
(gdb) p parallel_state
$12 = (ParallelHashJoinState *) 0x0
(gdb)   

進(jìn)入HJ_BUILD_HASHTABLE處理邏輯,創(chuàng)建Hash表

(gdb) p node->hj_JoinState
$13 = 1

HJ_BUILD_HASHTABLE->執(zhí)行相關(guān)判斷,本例為內(nèi)連接,因此不存在FILL_OUTER等情況

(gdb) n
216                 Assert(hashtable == NULL);
(gdb) 
241                 if (HJ_FILL_INNER(node))
(gdb) 
246                 else if (parallel)
(gdb) 
258                 else if (HJ_FILL_OUTER(node) ||
(gdb) 
259                          (outerNode->plan->startup_cost < hashNode->ps.plan->total_cost &&
(gdb) 

HJ_BUILD_HASHTABLE->outer node的啟動(dòng)成本低于創(chuàng)建Hash表的總成本而且outer relation為空(初始化node->hj_OuterNotEmpty為false),那么嘗試獲取outer relation的第一個(gè)元組,如為NULL,則可快速返回NULL,否則設(shè)置node->hj_OuterNotEmpty標(biāo)記為T

258                 else if (HJ_FILL_OUTER(node) ||
(gdb) 
260                           !node->hj_OuterNotEmpty))
(gdb) 
259                          (outerNode->plan->startup_cost < hashNode->ps.plan->total_cost &&
(gdb) 
262                     node->hj_FirstOuterTupleSlot = ExecProcNode(outerNode);
(gdb) 
263                     if (TupIsNull(node->hj_FirstOuterTupleSlot))
(gdb) 
269                         node->hj_OuterNotEmpty = true;

HJ_BUILD_HASHTABLE->創(chuàng)建Hash Table

(gdb) n
263                     if (TupIsNull(node->hj_FirstOuterTupleSlot))
(gdb) 
281                                                 HJ_FILL_INNER(node));
(gdb) 
279                 hashtable = ExecHashTableCreate(hashNode,
(gdb) 

HJ_BUILD_HASHTABLE->Hash Table(HashJoinTable結(jié)構(gòu)體)的內(nèi)存結(jié)構(gòu)
bucket數(shù)量為16384(16K),取對(duì)數(shù)結(jié)果為14(即log2_nbuckets/log2_nbuckets_optimal的結(jié)果值)
skewEnabled為F,沒有啟用傾斜優(yōu)化

(gdb) p *hashtable
$14 = {nbuckets = 16384, log2_nbuckets = 14, nbuckets_original = 16384, nbuckets_optimal = 16384, 
  log2_nbuckets_optimal = 14, buckets = {unshared = 0x2fb1260, shared = 0x2fb1260}, keepNulls = false, skewEnabled = false, 
  skewBucket = 0x0, skewBucketLen = 0, nSkewBuckets = 0, skewBucketNums = 0x0, nbatch = 1, curbatch = 0, 
  nbatch_original = 1, nbatch_outstart = 1, growEnabled = true, totalTuples = 0, partialTuples = 0, skewTuples = 0, 
  innerBatchFile = 0x0, outerBatchFile = 0x0, outer_hashfunctions = 0x3053b68, inner_hashfunctions = 0x3053bc0, 
  hashStrict = 0x3053c18, spaceUsed = 0, spaceAllowed = 16777216, spacePeak = 0, spaceUsedSkew = 0, 
  spaceAllowedSkew = 335544, hashCxt = 0x3053a50, batchCxt = 0x2f8b170, chunks = 0x0, current_chunk = 0x0, area = 0x0, 
  parallel_state = 0x0, batches = 0x0, current_chunk_shared = 9187201950435737471}

HJ_BUILD_HASHTABLE->使用的Hash函數(shù)

(gdb) p *hashtable->inner_hashfunctions
$15 = {fn_addr = 0x4c8a0a <hashtext>, fn_oid = 400, fn_nargs = 1, fn_strict = true, fn_retset = false, fn_stats = 2 '\002', 
  fn_extra = 0x0, fn_mcxt = 0x3053a50, fn_expr = 0x0}
(gdb) p *hashtable->outer_hashfunctions
$16 = {fn_addr = 0x4c8a0a <hashtext>, fn_oid = 400, fn_nargs = 1, fn_strict = true, fn_retset = false, fn_stats = 2 '\002', 
  fn_extra = 0x0, fn_mcxt = 0x3053a50, fn_expr = 0x0}

HJ_BUILD_HASHTABLE->賦值,并執(zhí)行此Hash Node節(jié)點(diǎn),結(jié)果總元組數(shù)為10000

(gdb) n
289                 hashNode->hashtable = hashtable;
(gdb) 
290                 (void) MultiExecProcNode((PlanState *) hashNode);
(gdb) 
297                 if (hashtable->totalTuples == 0 && !HJ_FILL_OUTER(node))
(gdb) p hashtable->totalTuples 
$18 = 10000

HJ_BUILD_HASHTABLE->批次數(shù)為1,只需要執(zhí)行1個(gè)批次即可

(gdb) n
304                 hashtable->nbatch_outstart = hashtable->nbatch;
(gdb) p hashtable->nbatch
$19 = 1

HJ_BUILD_HASHTABLE->重置OuterNotEmpty為F

(gdb) n
311                 node->hj_OuterNotEmpty = false;
(gdb) 
313                 if (parallel)

HJ_BUILD_HASHTABLE->非并行執(zhí)行,切換狀態(tài)為HJ_NEED_NEW_OUTER

(gdb) 
313                 if (parallel)
(gdb) n
340                     node->hj_JoinState = HJ_NEED_NEW_OUTER;

HJ_NEED_NEW_OUTER->獲取(執(zhí)行ExecHashJoinOuterGetTuple)下一個(gè)outer relation的一個(gè)元組

349                 if (parallel)
(gdb) n
354                     outerTupleSlot =
(gdb) 
357                 if (TupIsNull(outerTupleSlot))
(gdb) p *outerTupleSlot
$20 = {type = T_TupleTableSlot, tts_isempty = false, tts_shouldFree = false, tts_shouldFreeMin = false, tts_slow = true, 
  tts_tuple = 0x2f88300, tts_tupleDescriptor = 0x7f0710d02bd0, tts_mcxt = 0x2ee1640, tts_buffer = 507, tts_nvalid = 1, 
  tts_values = 0x2ee22a8, tts_isnull = 0x2ee22d0, tts_mintuple = 0x0, tts_minhdr = {t_len = 0, t_self = {ip_blkid = {
        bi_hi = 0, bi_lo = 0}, ip_posid = 0}, t_tableOid = 0, t_data = 0x0}, tts_off = 2, tts_fixedTupleDescriptor = true}

HJ_NEED_NEW_OUTER->設(shè)置相關(guān)變量

(gdb) n
371                 econtext->ecxt_outertuple = outerTupleSlot;
(gdb) 
372                 node->hj_MatchedOuter = false;
(gdb) 
378                 node->hj_CurHashValue = hashvalue;
(gdb) 
379                 ExecHashGetBucketAndBatch(hashtable, hashvalue,
(gdb) p hashvalue
$21 = 2324234220
(gdb) n
381                 node->hj_CurSkewBucketNo = ExecHashGetSkewBucket(hashtable,
(gdb) 
383                 node->hj_CurTuple = NULL;
(gdb) p *node
$22 = {js = {ps = {type = T_HashJoinState, plan = 0x2faaff8, state = 0x2ee1758, ExecProcNode = 0x70291d <ExecHashJoin>, 
      ExecProcNodeReal = 0x70291d <ExecHashJoin>, instrument = 0x0, worker_instrument = 0x0, worker_jit_instrument = 0x0, 
      qual = 0x0, lefttree = 0x2ee2070, righttree = 0x2ee2918, initPlan = 0x0, subPlan = 0x0, chgParam = 0x0, 
      ps_ResultTupleSlot = 0x2f20d98, ps_ExprContext = 0x2ee1fb0, ps_ProjInfo = 0x2ee3550, scandesc = 0x0}, 
    jointype = JOIN_INNER, single_match = true, joinqual = 0x0}, hashclauses = 0x2f21430, hj_OuterHashKeys = 0x2f22230, 
  hj_InnerHashKeys = 0x2f22740, hj_HashOperators = 0x2f227a0, hj_HashTable = 0x2f88ee8, hj_CurHashValue = 2324234220, 
  hj_CurBucketNo = 16364, hj_CurSkewBucketNo = -1, hj_CurTuple = 0x0, hj_OuterTupleSlot = 0x2f212f0, 
  hj_HashTupleSlot = 0x2ee3278, hj_NullOuterTupleSlot = 0x0, hj_NullInnerTupleSlot = 0x0, hj_FirstOuterTupleSlot = 0x0, 
  hj_JoinState = 2, hj_MatchedOuter = false, hj_OuterNotEmpty = true}
(gdb) p *econtext
$25 = {type = T_ExprContext, ecxt_scantuple = 0x0, ecxt_innertuple = 0x0, ecxt_outertuple = 0x2ee2248, 
  ecxt_per_query_memory = 0x2ee1640, ecxt_per_tuple_memory = 0x2f710c0, ecxt_param_exec_vals = 0x0, 
  ecxt_param_list_info = 0x0, ecxt_aggvalues = 0x0, ecxt_aggnulls = 0x0, caseValue_datum = 0, caseValue_isNull = true, 
  domainValue_datum = 0, domainValue_isNull = true, ecxt_estate = 0x2ee1758, ecxt_callbacks = 0x0}
(gdb) p *node->hj_HashTupleSlot
$26 = {type = T_TupleTableSlot, tts_isempty = true, tts_shouldFree = false, tts_shouldFreeMin = false, tts_slow = false, 
  tts_tuple = 0x0, tts_tupleDescriptor = 0x2ee3060, tts_mcxt = 0x2ee1640, tts_buffer = 0, tts_nvalid = 0, 
  tts_values = 0x2ee32d8, tts_isnull = 0x2ee32f0, tts_mintuple = 0x0, tts_minhdr = {t_len = 0, t_self = {ip_blkid = {
        bi_hi = 0, bi_lo = 0}, ip_posid = 0}, t_tableOid = 0, t_data = 0x0}, tts_off = 0, tts_fixedTupleDescriptor = true}  

HJ_NEED_NEW_OUTER->切換狀態(tài)為HJ_SCAN_BUCKET,開始掃描Hash Table

(gdb) n
407                 node->hj_JoinState = HJ_SCAN_BUCKET;
(gdb) 

HJ_SCAN_BUCKET->不匹配,切換狀態(tài)為HJ_FILL_OUTER_TUPLE

(gdb) 
416                 if (parallel)
(gdb) n
427                     if (!ExecScanHashBucket(node, econtext))
(gdb) 
430                         node->hj_JoinState = HJ_FILL_OUTER_TUPLE;
(gdb) 
431                         continue;
(gdb) 

HJ_FILL_OUTER_TUPLE->切換狀態(tài)為HJ_NEED_NEW_OUTER
不管是否獲得/發(fā)出一個(gè)元組,下一個(gè)狀態(tài)是NEED_NEW_OUTER

209         switch (node->hj_JoinState)
(gdb) 
483                 node->hj_JoinState = HJ_NEED_NEW_OUTER;

HJ_FILL_OUTER_TUPLE->由于不是外連接,無需FILL,回到HJ_NEED_NEW_OUTER處理邏輯

(gdb) n
485                 if (!node->hj_MatchedOuter &&
(gdb) 
486                     HJ_FILL_OUTER(node))
(gdb) 
485                 if (!node->hj_MatchedOuter &&
(gdb) 
549     }
(gdb) 

HJ_SCAN_BUCKET->在SCAN_BUCKET成功掃描的位置設(shè)置斷點(diǎn)

(gdb) b nodeHashjoin.c:441
Breakpoint 3 at 0x7025c3: file nodeHashjoin.c, line 441.
(gdb) c
Continuing.
Breakpoint 3, ExecHashJoinImpl (pstate=0x2ee1d98, parallel=false) at nodeHashjoin.c:447
447                 if (joinqual == NULL || ExecQual(joinqual, econtext))

HJ_SCAN_BUCKET->存在匹配的元組,設(shè)置相關(guān)標(biāo)記

(gdb) n
449                     node->hj_MatchedOuter = true;
(gdb) 
450                     HeapTupleHeaderSetMatch(HJTUPLE_MINTUPLE(node->hj_CurTuple));
(gdb) 
453                     if (node->js.jointype == JOIN_ANTI)
(gdb) n
464                     if (node->js.single_match)
(gdb) 
465                         node->hj_JoinState = HJ_NEED_NEW_OUTER;
(gdb) 

HJ_SCAN_BUCKET->執(zhí)行投影操作并返回

467                     if (otherqual == NULL || ExecQual(otherqual, econtext))
(gdb) 
468                         return ExecProject(node->js.ps.ps_ProjInfo);
(gdb) 

總的來說,Hash Join的實(shí)現(xiàn)是創(chuàng)建inner relation的Hash Table,然后獲取outer relation的元組,如匹配則執(zhí)行投影操作返回相應(yīng)的元組,除了創(chuàng)建HT外,其他步驟不斷的變換狀態(tài)執(zhí)行,直至滿足Portal要求的元組數(shù)量為止.

四、參考資料

Hash Joins: Past, Present and Future/PGCon 2017
A Look at How Postgres Executes a Tiny Join - Part 1
A Look at How Postgres Executes a Tiny Join - Part 2
Assignment 2 Symmetric Hash Join

向AI問一下細(xì)節(jié)

免責(zé)聲明:本站發(fā)布的內(nèi)容(圖片、視頻和文字)以原創(chuàng)、轉(zhuǎn)載和分享為主,文章觀點(diǎn)不代表本網(wǎng)站立場,如果涉及侵權(quán)請(qǐng)聯(lián)系站長郵箱:is@yisu.com進(jìn)行舉報(bào),并提供相關(guān)證據(jù),一經(jīng)查實(shí),將立刻刪除涉嫌侵權(quán)內(nèi)容。

AI