溫馨提示×

溫馨提示×

您好，登錄后才能下訂單哦！

密碼登錄×

忘記密碼？

登錄注冊(cè)×

獲取短信驗(yàn)證碼

其他方式登錄

點(diǎn)擊登錄注冊(cè) 即表示同意《億速云用戶服務(wù)條款》

用戶登錄×

賬戶密碼登錄

請(qǐng)使用微信掃描上方二維碼

使用幫助

請(qǐng)求超時(shí)！

請(qǐng)點(diǎn)擊重新獲取二維碼

怎么在PostgreSQL中利用遞歸優(yōu)化求稀疏列唯一值

發(fā)布時(shí)間：2021-01-20 14:08:01 來源：億速云閱讀：133 作者：Leah 欄目：開發(fā)技術(shù)

怎么在PostgreSQL中利用遞歸優(yōu)化求稀疏列唯一值？相信很多沒有經(jīng)驗(yàn)的人對(duì)此束手無策，為此本文總結(jié)了問題出現(xiàn)的原因和解決方法，通過這篇文章希望你能解決這個(gè)問題。

創(chuàng)建測(cè)試表

bill=# create table t_sex (sex char(1), otherinfo text);
CREATE TABLE
bill=# insert into t_sex select 'm', generate_series(1,10000000)||'this is test';
INSERT 0 10000000
bill=# insert into t_sex select 'w', generate_series(1,10000000)||'this is test';
INSERT 0 10000000

查詢：
可以看到下面的查詢速度很慢。

bill=# select count(distinct sex) from t_sex;
 count
-------
   2
(1 row)

Time: 8803.505 ms (00:08.804)
bill=# select sex from t_sex t group by sex;
 sex
-----
 m
 w
(2 rows)

Time: 1026.464 ms (00:01.026)

那么我們對(duì)該字段加上索引又是什么情況呢？

速度依然沒有明顯

bill=# create index idx_sex_1 on t_sex(sex);
CREATE INDEX
bill=# select count(distinct sex) from t_sex;
 count
-------
   2
(1 row)

Time: 8502.460 ms (00:08.502)
bill=# select sex from t_sex t group by sex;
 sex
-----
 m
 w
(2 rows)

Time: 572.353 ms

的變化，可以看到執(zhí)行計(jì)劃已經(jīng)使用Index Only Scan了。

bill=# explain select count(distinct sex) from t_sex;
                     QUERY PLAN
----------------------------------------------------------------------------------------------
 Aggregate (cost=371996.44..371996.45 rows=1 width=8)
  -> Index Only Scan using idx_sex_1 on t_sex (cost=0.44..321996.44 rows=20000000 width=2)
(2 rows)

同樣的SQL我們看看在Oracle中性能如何？

創(chuàng)建測(cè)試表：

SQL> create table t_sex (sex char(1), otherinfo varchar2(100));

Table created.

SQL> insert into t_sex select 'm', rownum||'this is test' from dual connect by level <=10000000;

10000000 rows created.

SQL> commit;

Commit complete.

SQL> insert into t_sex select 'w', rownum||'this is test' from dual connect by level <=10000000;

10000000 rows created.

SQL> commit;

Commit complete.

性能測(cè)試：

SQL> set lines 1000 pages 2000
SQL> set autotrace on
SQL> set timing on

SQL> select count(distinct sex) from t_sex;

COUNT(DISTINCTSEX)
------------------
         2

Elapsed: 00:00:01.58

Execution Plan
----------------------------------------------------------
Plan hash value: 3915432945

----------------------------------------------------------------------------
| Id | Operation     | Name | Rows | Bytes | Cost (%CPU)| Time   |
----------------------------------------------------------------------------
|  0 | SELECT STATEMENT  |    |   1 |   3 | 20132  (1)| 00:00:01 |
|  1 | SORT GROUP BY   |    |   1 |   3 |      |     |
|  2 |  TABLE ACCESS FULL| T_SEX |  14M|  42M| 20132  (1)| 00:00:01 |
----------------------------------------------------------------------------

Note
-----
  - dynamic statistics used: dynamic sampling (level=2)


Statistics
----------------------------------------------------------
     0 recursive calls
     0 db block gets
   74074 consistent gets
     0 physical reads
     0 redo size
    552 bytes sent via SQL*Net to client
    608 bytes received via SQL*Net from client
     2 SQL*Net roundtrips to/from client
     1 sorts (memory)
     0 sorts (disk)
     1 rows processed

SQL> select sex from t_sex t group by sex;

SE
--
m
w

Elapsed: 00:00:01.08

Execution Plan
----------------------------------------------------------
Plan hash value: 3915432945

----------------------------------------------------------------------------
| Id | Operation     | Name | Rows | Bytes | Cost (%CPU)| Time   |
----------------------------------------------------------------------------
|  0 | SELECT STATEMENT  |    |  14M|  42M| 20558  (3)| 00:00:01 |
|  1 | SORT GROUP BY   |    |  14M|  42M| 20558  (3)| 00:00:01 |
|  2 |  TABLE ACCESS FULL| T_SEX |  14M|  42M| 20132  (1)| 00:00:01 |
----------------------------------------------------------------------------

Note
-----
  - dynamic statistics used: dynamic sampling (level=2)


Statistics
----------------------------------------------------------
     0 recursive calls
     0 db block gets
   74074 consistent gets
     0 physical reads
     0 redo size
    589 bytes sent via SQL*Net to client
    608 bytes received via SQL*Net from client
     2 SQL*Net roundtrips to/from client
     1 sorts (memory)
     0 sorts (disk)
     2 rows processed

可以看到Oracle的性能即使不加索引也明顯比PostgreSQL中要好。
那么我們?cè)赑ostgreSQL中是不是沒辦法繼續(xù)優(yōu)化了呢？這種情況我們利用pg中的遞歸語句結(jié)合索引可以大幅提升性能。

SQL改寫：

bill=# with recursive tmp as (
bill(#  (
bill(#   select min(t.sex) as sex from t_sex t where t.sex is not null
bill(#  )
bill(#  union all
bill(#  (
bill(#   select (select min(t.sex) from t_sex t where t.sex > s.sex and t.sex is not null)
bill(#    from tmp s where s.sex is not null
bill(#  )
bill(# )
bill-# select count(distinct sex) from tmp;
 count
-------
   2
(1 row)

Time: 2.711 ms

查看執(zhí)行計(jì)劃：

bill=# explain with recursive tmp as (
bill(#  (
bill(#   select min(t.sex) as sex from t_sex t where t.sex is not null
bill(#  )
bill(#  union all
bill(#  (
bill(#   select (select min(t.sex) from t_sex t where t.sex > s.sex and t.sex is not null)
bill(#    from tmp s where s.sex is not null
bill(#  )
bill(# )
bill-# select count(distinct sex) from tmp;
                           QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Aggregate (cost=53.62..53.63 rows=1 width=8)
  CTE tmp
   -> Recursive Union (cost=0.46..51.35 rows=101 width=32)
      -> Result (cost=0.46..0.47 rows=1 width=32)
         InitPlan 3 (returns $1)
          -> Limit (cost=0.44..0.46 rows=1 width=2)
             -> Index Only Scan using idx_sex_1 on t_sex t (cost=0.44..371996.44 rows=20000000 width=2)
                Index Cond: (sex IS NOT NULL)
      -> WorkTable Scan on tmp s (cost=0.00..4.89 rows=10 width=32)
         Filter: (sex IS NOT NULL)
  -> CTE Scan on tmp (cost=0.00..2.02 rows=101 width=32)
(11 rows)

Time: 1.371 ms

可以看到執(zhí)行時(shí)間從原先的8000ms降低到了2ms，提升了幾千倍！

甚至對(duì)比Oracle，性能也是提升了很多。

但是需要注意的是：這種寫法僅僅是針對(duì)稀疏列，換成數(shù)據(jù)分布廣泛的字段，顯然性能是下降的, 所以使用遞歸SQL不適合數(shù)據(jù)分布廣泛的字段的group by或者count(distinct)操作。

看完上述內(nèi)容，你們掌握怎么在PostgreSQL中利用遞歸優(yōu)化求稀疏列唯一值的方法了嗎？如果還想學(xué)到更多技能或想了解更多相關(guān)內(nèi)容，歡迎關(guān)注億速云行業(yè)資訊頻道，感謝各位的閱讀！

向AI問一下細(xì)節(jié)

推薦閱讀：

免責(zé)聲明：本站發(fā)布的內(nèi)容（圖片、視頻和文字）以原創(chuàng)、轉(zhuǎn)載和分享為主，文章觀點(diǎn)不代表本網(wǎng)站立場(chǎng)，如果涉及侵權(quán)請(qǐng)聯(lián)系站長(zhǎng)郵箱：is@yisu.com進(jìn)行舉報(bào)，并提供相關(guān)證據(jù)，一經(jīng)查實(shí)，將立刻刪除涉嫌侵權(quán)內(nèi)容。

上一篇新聞：
使用node腳手架如何搭建一個(gè)服務(wù)器
下一篇新聞：
微信小程序快速搭建步驟以及流程

猜你喜歡

AI
助
手

產(chǎn)品服務(wù)

地區(qū)劃分

專題活動(dòng)

幫助支持

關(guān)于我們

售后咨詢

7*24小時(shí)在線電話：400-100-2938

7*24小時(shí)在線 QQ：800811969

關(guān)注億速云

億速云公眾號(hào)

手機(jī)網(wǎng)站二維碼

<nobr id="7sl36"></nobr>