溫馨提示×

您好,登錄后才能下訂單哦!

密碼登錄×
登錄注冊(cè)×
其他方式登錄
點(diǎn)擊 登錄注冊(cè) 即表示同意《億速云用戶服務(wù)條款》

PostgreSQL中query_planner主計(jì)劃函數(shù)make_one_rel的實(shí)現(xiàn)邏輯

發(fā)布時(shí)間:2021-11-11 09:35:27 來(lái)源:億速云 閱讀:141 作者:小新 欄目:關(guān)系型數(shù)據(jù)庫(kù)

這篇文章主要介紹PostgreSQL中query_planner主計(jì)劃函數(shù)make_one_rel的實(shí)現(xiàn)邏輯,文中介紹的非常詳細(xì),具有一定的參考價(jià)值,感興趣的小伙伴們一定要看完!

一、Paths and Join Pairs

Paths and Join Pairs
訪問(wèn)路徑和連接對(duì)

During the planning/optimizing process, we build "Path" trees representing
the different ways of doing a query.  We select the cheapest Path that
generates the desired relation and turn it into a Plan to pass to the
executor.  (There is pretty nearly a one-to-one correspondence between the
Path and Plan trees, but Path nodes omit info that won't be needed during
planning, and include info needed for planning that won't be needed by the
executor.)

在計(jì)劃/優(yōu)化過(guò)程中,通過(guò)構(gòu)建"Path"樹(shù)表示執(zhí)行查詢的不同方法,在這些方法中,選擇生成所需Relation的最低成本路徑(Path結(jié)構(gòu)體)并轉(zhuǎn)換為Plan結(jié)構(gòu)體傳遞給執(zhí)行器.(Path和Plan樹(shù)兩者之間幾乎存在一對(duì)一的對(duì)應(yīng)關(guān)系,但Path節(jié)點(diǎn)省略了在計(jì)劃期間不需要但包含了在執(zhí)行期間不需要而在計(jì)劃期間需要的信息)

The optimizer builds a RelOptInfo structure for each base relation used in
the query.  Base rels are either primitive tables, or subquery subselects
that are planned via a separate recursive invocation of the planner.  A
RelOptInfo is also built for each join relation that is considered during
planning.  A join rel is simply a combination of base rels.  There is only
one join RelOptInfo for any given set of baserels --- for example, the join
{A B C} is represented by the same RelOptInfo no matter whether we build it
by joining A and B first and then adding C, or joining B and C first and
then adding A, etc.  These different means of building the joinrel are represented as Paths.  For each RelOptInfo we build a list of Paths that
represent plausible ways to implement the scan or join of that relation.
Once we've considered all the plausible Paths for a rel, we select the one
that is cheapest according to the planner's cost estimates.  The final plan
is derived from the cheapest Path for the RelOptInfo that includes all the
base rels of the query.

優(yōu)化器為查詢中使用到的每一個(gè)基礎(chǔ)關(guān)系(base relation)創(chuàng)建RelOptInfo結(jié)構(gòu)體.基礎(chǔ)關(guān)系(base relation)包括原始表(primitive table),子查詢(通過(guò)獨(dú)立的計(jì)劃器遞歸調(diào)用生成計(jì)劃)以及參與連接的Relation.連接生成的關(guān)系(join rel)是基礎(chǔ)關(guān)系的組合(可以理解為通過(guò)連接運(yùn)算生成的新關(guān)系).對(duì)于給定的基礎(chǔ)關(guān)系集合,只有一個(gè)連接的RelOptInfo結(jié)構(gòu)體生成,而這個(gè)RelOptInfo如何生成(比如基礎(chǔ)關(guān)系集合{A B C},A和B可以先連接然后與C連接,當(dāng)然也可以B和C先連接然后在與A連接)并不關(guān)心.這些構(gòu)建join rel的方法通過(guò)Paths表示.對(duì)每一個(gè)RelOptInfo,優(yōu)化器會(huì)構(gòu)建Paths鏈表來(lái)表示這些"貌似有理的"實(shí)現(xiàn)掃描或者連接的方法.對(duì)于這些所有可能的路徑,優(yōu)化器會(huì)根據(jù)成本估算選擇成本最低的訪問(wèn)路徑.最后的執(zhí)行計(jì)劃從RelOptInfo(最終生成的新關(guān)系)成本最低的訪問(wèn)路徑產(chǎn)生.

Possible Paths for a primitive table relation include plain old sequential
scan, plus index scans for any indexes that exist on the table, plus bitmap
index scans using one or more indexes.  Specialized RTE types, such as
function RTEs, may have only one possible Path.

訪問(wèn)原始表的可能路徑包括普通的順序掃描/基于索引的索引掃描/位圖索引掃描.對(duì)于某些特殊的RTE類型,比如函數(shù)RTEs,可能只有一種可能的路徑.

Joins always occur using two RelOptInfos.  One is outer, the other inner.
Outers drive lookups of values in the inner.  In a nested loop, lookups of
values in the inner occur by scanning the inner path once per outer tuple
to find each matching inner row.  In a mergejoin, inner and outer rows are
ordered, and are accessed in order, so only one scan is required to perform
the entire join: both inner and outer paths are scanned in-sync.  (There's
not a lot of difference between inner and outer in a mergejoin...)  In a
hashjoin, the inner is scanned first and all its rows are entered in a
hashtable, then the outer is scanned and for each row we lookup the join
key in the hashtable.

連接通常出現(xiàn)在兩個(gè)RelOptInfo之間,俗稱外部表和內(nèi)部表,其中外部表又稱驅(qū)動(dòng)表.在Nested Loop連接方式,對(duì)于外部的每一個(gè)元組,都會(huì)訪問(wèn)內(nèi)部表以掃描滿足條件的數(shù)據(jù)行.Merge Join連接方式,外部表和內(nèi)部表的元組已排序,順序訪問(wèn)外部表和內(nèi)部表的每一個(gè)元組即可,這種方式只需要同步掃描一次.Hash Join連接方式,首先會(huì)掃描內(nèi)部表并建立HashTable,然后掃描外部表,對(duì)于外部表的每一行掃描哈希表找出匹配行.

A Path for a join relation is actually a tree structure, with the topmost
Path node representing the last-applied join method.  It has left and right
subpaths that represent the scan or join methods used for the two input
relations.

join rel的訪問(wèn)路徑實(shí)際上是一種樹(shù)狀結(jié)構(gòu),最頂層的路徑節(jié)點(diǎn)表示最好應(yīng)用的連接方法.這顆樹(shù)有左右兩個(gè)子路徑(subpath)用以表示兩個(gè)relations的掃描或連接方法.

二、Join Tree Construction

Join Tree Construction
連接樹(shù)的構(gòu)造

The optimizer generates optimal query plans by doing a more-or-less
exhaustive search through the ways of executing the query.  The best Path
tree is found by a recursive process:

優(yōu)化器盡可能的通過(guò)窮盡搜索的方法生成最優(yōu)的查詢執(zhí)行計(jì)劃,最優(yōu)的訪問(wèn)路徑樹(shù)通過(guò)以下遞歸過(guò)程實(shí)現(xiàn):

1).Take each base relation in the query, and make a RelOptInfo structure
for it.  Find each potentially useful way of accessing the relation,
including sequential and index scans, and make Paths representing those
ways.  All the Paths made for a given relation are placed in its
RelOptInfo.pathlist.  (Actually, we discard Paths that are obviously
inferior alternatives before they ever get into the pathlist --- what
ends up in the pathlist is the cheapest way of generating each potentially
useful sort ordering and parameterization of the relation.)  Also create a
RelOptInfo.joininfo list including all the join clauses that involve this
relation.  For example, the WHERE clause "tab1.col1 = tab2.col1" generates
entries in both tab1 and tab2's joininfo lists.

1).為查詢中每個(gè)基礎(chǔ)關(guān)系生成RelOptInfo結(jié)構(gòu)體.為每個(gè)基礎(chǔ)關(guān)系生成順序掃描或索引掃描訪問(wèn)路徑.這些生成的訪問(wèn)路徑存儲(chǔ)在RelOptInfo.pathlist鏈表中(實(shí)際上,在此過(guò)程中優(yōu)化器已經(jīng)拋棄了明顯不合理的訪問(wèn)路徑,在pathlist中的路徑是生成排序路徑和參數(shù)化Relation的最可能路徑).在此期間,會(huì)生成RelOptInfo.joininfo鏈表,用于保存與此Relation相關(guān)的索引的連接語(yǔ)句(join clauses).比如,WHERE語(yǔ)句"tab1.col1 = tab2.col1"在tab1和tab2的joininfo鏈表中會(huì)產(chǎn)生相應(yīng)的數(shù)據(jù)結(jié)構(gòu)(entries).

If we have only a single base relation in the query, we are done.
Otherwise we have to figure out how to join the base relations into a
single join relation.

如果查詢中只有一個(gè)基礎(chǔ)關(guān)系,優(yōu)化器已完成所有工作,否則的話,優(yōu)化器需要得出如何連接基礎(chǔ)關(guān)系,從而得到一個(gè)新關(guān)系(通過(guò)join連接而來(lái)).

2).Normally, any explicit JOIN clauses are "flattened" so that we just
have a list of relations to join.  However, FULL OUTER JOIN clauses are
never flattened, and other kinds of JOIN might not be either, if the
flattening process is stopped by join_collapse_limit or from_collapse_limit
restrictions.  Therefore, we end up with a planning problem that contains
lists of relations to be joined in any order, where any individual item
might be a sub-list that has to be joined together before we can consider
joining it to its siblings.  We process these sub-problems recursively,
bottom up.  Note that the join list structure constrains the possible join
orders, but it doesn't constrain the join implementation method at each
join (nestloop, merge, hash), nor does it say which rel is considered outer
or inner at each join.  We consider all these possibilities in building
Paths. We generate a Path for each feasible join method, and select the
cheapest Path.

2)通常來(lái)說(shuō),顯式的JOIN語(yǔ)句已被扁平化(flattened)處理,優(yōu)化器可以直接根據(jù)關(guān)系鏈表進(jìn)行連接.但是,全外連接(FULL OUTER JOIN)以及某些類型的JOIN無(wú)法扁平化(比如由于join_collapse_limit或from_collapse_limit這些約束條件).這里會(huì)遇到這么一個(gè)問(wèn)題:嘗試以任意順序進(jìn)行連接的關(guān)系鏈表,鏈表中的子鏈表必須在兩兩連接之前進(jìn)行連接.優(yōu)化器會(huì)同自底向上的方式遞歸處理這些子問(wèn)題.注意:連接鏈表限制了連接順序,但沒(méi)有限制連接的實(shí)現(xiàn)方法或者那個(gè)關(guān)系是內(nèi)表或外表,這些問(wèn)題在生成訪問(wèn)路徑時(shí)解決.優(yōu)化器會(huì)為每個(gè)可行的連接方式生成訪問(wèn)路徑,并選擇其中成本最低的那個(gè).

For each planning problem, therefore, we will have a list of relations
that are either base rels or joinrels constructed per sub-join-lists.
We can join these rels together in any order the planner sees fit.
The standard (non-GEQO) planner does this as follows:

對(duì)于每一個(gè)計(jì)劃過(guò)程中出現(xiàn)的問(wèn)題,優(yōu)化器把每一個(gè)構(gòu)成子連接鏈表(sub-join-list)的
基礎(chǔ)關(guān)系或連接關(guān)系存儲(chǔ)在一個(gè)鏈表中.優(yōu)化器會(huì)根據(jù)看起來(lái)合適的順序連接這些關(guān)系.標(biāo)準(zhǔn)的(非遺傳算法,即動(dòng)態(tài)規(guī)劃算法)計(jì)劃器執(zhí)行以下操作:

Consider joining each RelOptInfo to each other RelOptInfo for which there
is a usable joinclause, and generate a Path for each possible join method
for each such pair.  (If we have a RelOptInfo with no join clauses, we have
no choice but to generate a clauseless Cartesian-product join; so we
consider joining that rel to each other available rel.  But in the presence
of join clauses we will only consider joins that use available join
clauses.  Note that join-order restrictions induced by outer joins and
IN/EXISTS clauses are also checked, to ensure that we find a workable join
order in cases where those restrictions force a clauseless join to be done.)

對(duì)于每一個(gè)可用的連接條件,考慮使用兩兩連接的方式,為每一對(duì)RelOptInfo的連接生成訪問(wèn)路徑.如果沒(méi)有連接條件,那么會(huì)使用笛卡爾連接,因此會(huì)優(yōu)先考慮連接其他可用的Relation.

If we only had two relations in the list, we are done: we just pick
the cheapest path for the join RelOptInfo.  If we had more than two, we now
need to consider ways of joining join RelOptInfos to each other to make
join RelOptInfos that represent more than two list items.

如果鏈表中只有2個(gè)Relations,優(yōu)化器已完成所有工作,為參與連接的RelOptInfo挑選最優(yōu)的訪問(wèn)路徑即可.否則的話(>3個(gè)Relations),優(yōu)化器需要考慮如何兩兩進(jìn)行連接.

The join tree is constructed using a "dynamic programming" algorithm:
in the first pass (already described) we consider ways to create join rels
representing exactly two list items.  The second pass considers ways
to make join rels that represent exactly three list items; the next pass,
four items, etc.  The last pass considers how to make the final join
relation that includes all list items --- obviously there can be only one
join rel at this top level, whereas there can be more than one join rel
at lower levels.  At each level we use joins that follow available join
clauses, if possible, just as described for the first level.

連接樹(shù)通過(guò)"動(dòng)態(tài)規(guī)劃"算法構(gòu)造:
在第一輪(先前已描述)中,優(yōu)化器已完成兩個(gè)Relations的連接方式;第二輪,優(yōu)化器考慮如何創(chuàng)建三個(gè)Relations的join rels;下一輪,四個(gè)Relations,以此類推.最后一輪,考慮如何構(gòu)造包含所有Relations的join rel.顯然,在最高層,只有一個(gè)join rel,而在低層則可能會(huì)有多個(gè)join rel.在每一個(gè)層次上,如前所述,如果可以的話,優(yōu)化器會(huì)按照可用的連接條件進(jìn)行連接.

For example:
SELECT  *
FROM    tab1, tab2, tab3, tab4
WHERE   tab1.col = tab2.col AND
tab2.col = tab3.col AND
tab3.col = tab4.col

Tables 1, 2, 3, and 4 are joined as:
{1 2},{2 3},{3 4}
{1 2 3},{2 3 4}
{1 2 3 4}
(other possibilities will be excluded for lack of join clauses)

SELECT  *
FROM    tab1, tab2, tab3, tab4
WHERE   tab1.col = tab2.col AND
tab1.col = tab3.col AND
tab1.col = tab4.col

Tables 1, 2, 3, and 4 are joined as:
{1 2},{1 3},{1 4}
{1 2 3},{1 3 4},{1 2 4}
{1 2 3 4}

We consider left-handed plans (the outer rel of an upper join is a joinrel,
but the inner is always a single list item); right-handed plans (outer rel
is always a single item); and bushy plans (both inner and outer can be
joins themselves).  For example, when building {1 2 3 4} we consider
joining {1 2 3} to {4} (left-handed), {4} to {1 2 3} (right-handed), and
{1 2} to {3 4} (bushy), among other choices.  Although the jointree
scanning code produces these potential join combinations one at a time,
all the ways to produce the same set of joined base rels will share the
same RelOptInfo, so the paths produced from different join combinations
that produce equivalent joinrels will compete in add_path().

下面來(lái)看看left-handed計(jì)劃,bushy計(jì)劃和right-handed計(jì)劃.比如,在構(gòu)建{1 2 3 4}4個(gè)關(guān)系的連接時(shí),在眾多的選擇中存在以下方式,left-handed:{1 2 3} 連接 {4},right-handed:{4}連接{1 2 3}和bushy:{1 2}連接{3 4}.雖然掃描連接樹(shù)時(shí)一次產(chǎn)生這些潛在的連接組合,但是所有產(chǎn)生相同連接base rels集合的方法會(huì)共享相同的RelOptInfo的數(shù)據(jù)結(jié)構(gòu),因此這些不同的連接組合在生成等價(jià)的join rel時(shí)會(huì)調(diào)用add_path方法時(shí)相互PK.

The dynamic-programming approach has an important property that's not
immediately obvious: we will finish constructing all paths for a given
relation before we construct any paths for relations containing that rel.
This means that we can reliably identify the "cheapest path" for each rel
before higher-level relations need to know that.  Also, we can safely
discard a path when we find that another path for the same rel is better,
without worrying that maybe there is already a reference to that path in
some higher-level join path.  Without this, memory management for paths
would be much more complicated.

動(dòng)態(tài)規(guī)劃方法有一個(gè)重要的特性(自底向上):那就是在構(gòu)建高層RelOptInfo的訪問(wèn)路徑前,下層的RelOptInfo的訪問(wèn)路徑已明確,而且優(yōu)化器確保該訪問(wèn)路徑是成本最低的.

Once we have built the final join rel, we use either the cheapest path
for it or the cheapest path with the desired ordering (if that's cheaper
than applying a sort to the cheapest other path).

一旦完成了最終結(jié)果join rel的構(gòu)建,存在兩條路徑:成本最低或者按排序的要求最低

If the query contains one-sided outer joins (LEFT or RIGHT joins), or
IN or EXISTS WHERE clauses that were converted to semijoins or antijoins,
then some of the possible join orders may be illegal.  These are excluded
by having join_is_legal consult a side list of such "special" joins to see
whether a proposed join is illegal.  (The same consultation allows it to
see which join style should be applied for a valid join, ie, JOIN_INNER,
JOIN_LEFT, etc.)

如果查詢語(yǔ)句存在外連接或者轉(zhuǎn)換為半連接或反連接的IN或EXISTS語(yǔ)句,那么有些可能的連接順序是非法的.優(yōu)化器通過(guò)額外的方法進(jìn)行處理.

以上是“PostgreSQL中query_planner主計(jì)劃函數(shù)make_one_rel的實(shí)現(xiàn)邏輯”這篇文章的所有內(nèi)容,感謝各位的閱讀!希望分享的內(nèi)容對(duì)大家有幫助,更多相關(guān)知識(shí),歡迎關(guān)注億速云行業(yè)資訊頻道!

向AI問(wèn)一下細(xì)節(jié)

免責(zé)聲明:本站發(fā)布的內(nèi)容(圖片、視頻和文字)以原創(chuàng)、轉(zhuǎn)載和分享為主,文章觀點(diǎn)不代表本網(wǎng)站立場(chǎng),如果涉及侵權(quán)請(qǐng)聯(lián)系站長(zhǎng)郵箱:is@yisu.com進(jìn)行舉報(bào),并提供相關(guān)證據(jù),一經(jīng)查實(shí),將立刻刪除涉嫌侵權(quán)內(nèi)容。

AI