您好,登錄后才能下訂單哦!
怎么在PostgreSQL中利用遞歸優(yōu)化求稀疏列唯一值?相信很多沒有經(jīng)驗(yàn)的人對(duì)此束手無策,為此本文總結(jié)了問題出現(xiàn)的原因和解決方法,通過這篇文章希望你能解決這個(gè)問題。
創(chuàng)建測(cè)試表
bill=# create table t_sex (sex char(1), otherinfo text); CREATE TABLE bill=# insert into t_sex select 'm', generate_series(1,10000000)||'this is test'; INSERT 0 10000000 bill=# insert into t_sex select 'w', generate_series(1,10000000)||'this is test'; INSERT 0 10000000
查詢:
可以看到下面的查詢速度很慢。
bill=# select count(distinct sex) from t_sex; count ------- 2 (1 row) Time: 8803.505 ms (00:08.804) bill=# select sex from t_sex t group by sex; sex ----- m w (2 rows) Time: 1026.464 ms (00:01.026)
那么我們對(duì)該字段加上索引又是什么情況呢?
速度依然沒有明顯
bill=# create index idx_sex_1 on t_sex(sex); CREATE INDEX bill=# select count(distinct sex) from t_sex; count ------- 2 (1 row) Time: 8502.460 ms (00:08.502) bill=# select sex from t_sex t group by sex; sex ----- m w (2 rows) Time: 572.353 ms
的變化,可以看到執(zhí)行計(jì)劃已經(jīng)使用Index Only Scan了。
bill=# explain select count(distinct sex) from t_sex; QUERY PLAN ---------------------------------------------------------------------------------------------- Aggregate (cost=371996.44..371996.45 rows=1 width=8) -> Index Only Scan using idx_sex_1 on t_sex (cost=0.44..321996.44 rows=20000000 width=2) (2 rows)
同樣的SQL我們看看在Oracle中性能如何?
創(chuàng)建測(cè)試表:
SQL> create table t_sex (sex char(1), otherinfo varchar2(100)); Table created. SQL> insert into t_sex select 'm', rownum||'this is test' from dual connect by level <=10000000; 10000000 rows created. SQL> commit; Commit complete. SQL> insert into t_sex select 'w', rownum||'this is test' from dual connect by level <=10000000; 10000000 rows created. SQL> commit; Commit complete.
性能測(cè)試:
SQL> set lines 1000 pages 2000 SQL> set autotrace on SQL> set timing on SQL> select count(distinct sex) from t_sex; COUNT(DISTINCTSEX) ------------------ 2 Elapsed: 00:00:01.58 Execution Plan ---------------------------------------------------------- Plan hash value: 3915432945 ---------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 3 | 20132 (1)| 00:00:01 | | 1 | SORT GROUP BY | | 1 | 3 | | | | 2 | TABLE ACCESS FULL| T_SEX | 14M| 42M| 20132 (1)| 00:00:01 | ---------------------------------------------------------------------------- Note ----- - dynamic statistics used: dynamic sampling (level=2) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 74074 consistent gets 0 physical reads 0 redo size 552 bytes sent via SQL*Net to client 608 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 1 sorts (memory) 0 sorts (disk) 1 rows processed SQL> select sex from t_sex t group by sex; SE -- m w Elapsed: 00:00:01.08 Execution Plan ---------------------------------------------------------- Plan hash value: 3915432945 ---------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 14M| 42M| 20558 (3)| 00:00:01 | | 1 | SORT GROUP BY | | 14M| 42M| 20558 (3)| 00:00:01 | | 2 | TABLE ACCESS FULL| T_SEX | 14M| 42M| 20132 (1)| 00:00:01 | ---------------------------------------------------------------------------- Note ----- - dynamic statistics used: dynamic sampling (level=2) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 74074 consistent gets 0 physical reads 0 redo size 589 bytes sent via SQL*Net to client 608 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 1 sorts (memory) 0 sorts (disk) 2 rows processed
可以看到Oracle的性能即使不加索引也明顯比PostgreSQL中要好。
那么我們?cè)赑ostgreSQL中是不是沒辦法繼續(xù)優(yōu)化了呢?這種情況我們利用pg中的遞歸語句結(jié)合索引可以大幅提升性能。
SQL改寫:
bill=# with recursive tmp as ( bill(# ( bill(# select min(t.sex) as sex from t_sex t where t.sex is not null bill(# ) bill(# union all bill(# ( bill(# select (select min(t.sex) from t_sex t where t.sex > s.sex and t.sex is not null) bill(# from tmp s where s.sex is not null bill(# ) bill(# ) bill-# select count(distinct sex) from tmp; count ------- 2 (1 row) Time: 2.711 ms
查看執(zhí)行計(jì)劃:
bill=# explain with recursive tmp as ( bill(# ( bill(# select min(t.sex) as sex from t_sex t where t.sex is not null bill(# ) bill(# union all bill(# ( bill(# select (select min(t.sex) from t_sex t where t.sex > s.sex and t.sex is not null) bill(# from tmp s where s.sex is not null bill(# ) bill(# ) bill-# select count(distinct sex) from tmp; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------- Aggregate (cost=53.62..53.63 rows=1 width=8) CTE tmp -> Recursive Union (cost=0.46..51.35 rows=101 width=32) -> Result (cost=0.46..0.47 rows=1 width=32) InitPlan 3 (returns $1) -> Limit (cost=0.44..0.46 rows=1 width=2) -> Index Only Scan using idx_sex_1 on t_sex t (cost=0.44..371996.44 rows=20000000 width=2) Index Cond: (sex IS NOT NULL) -> WorkTable Scan on tmp s (cost=0.00..4.89 rows=10 width=32) Filter: (sex IS NOT NULL) -> CTE Scan on tmp (cost=0.00..2.02 rows=101 width=32) (11 rows) Time: 1.371 ms
可以看到執(zhí)行時(shí)間從原先的8000ms降低到了2ms,提升了幾千倍!
甚至對(duì)比Oracle,性能也是提升了很多。
但是需要注意的是:這種寫法僅僅是針對(duì)稀疏列,換成數(shù)據(jù)分布廣泛的字段,顯然性能是下降的, 所以使用遞歸SQL不適合數(shù)據(jù)分布廣泛的字段的group by或者count(distinct)操作。
看完上述內(nèi)容,你們掌握怎么在PostgreSQL中利用遞歸優(yōu)化求稀疏列唯一值的方法了嗎?如果還想學(xué)到更多技能或想了解更多相關(guān)內(nèi)容,歡迎關(guān)注億速云行業(yè)資訊頻道,感謝各位的閱讀!
免責(zé)聲明:本站發(fā)布的內(nèi)容(圖片、視頻和文字)以原創(chuàng)、轉(zhuǎn)載和分享為主,文章觀點(diǎn)不代表本網(wǎng)站立場(chǎng),如果涉及侵權(quán)請(qǐng)聯(lián)系站長(zhǎng)郵箱:is@yisu.com進(jìn)行舉報(bào),并提供相關(guān)證據(jù),一經(jīng)查實(shí),將立刻刪除涉嫌侵權(quán)內(nèi)容。