DISTICT的一次質疑

發(fā)布時間：2020-06-27 22:46:03 來源：網(wǎng)絡閱讀：786 作者：onlinekof2001 欄目：關系型數(shù)據(jù)庫

一直以來DISTINCT的用法都為DBA所知，就是用來濾重。我們也沒有必要質疑DISTINCT的濾重準確性。

但今天突然不知道怎么想的，就想質疑一下。DISTINCT真的能保證過濾的對象沒有重復記錄嗎？

SQL> create table test_distinct as select rownum id from all_objects where rownum < 50000;
insert into test_distinct select rownum id from all_objects where rownum < 50000;

這表中有重復的數(shù)據(jù)，數(shù)據(jù)插入順序1~4999，然后再重復一次

SQL> select count(id) from (select distinct id from test_distinct) group by id having count(id) > 1;
no rows selected

可以看到，通過這條語句發(fā)現(xiàn)確實沒有重復的行。那么進一步思考如果不用distinct如何實現(xiàn)這個效果呢。我想起了ROWID這個東西，以下兩句的效果相同。

select count(distinct phoneno) from CUSTPHONE

select count(phoneno) from CUSTPHONE where rowid in( select min(rowid) from CUSTPHONE group by phoneno)

我想了解一下DISTINCT語句實際在Oracle中是如何操作的，通過10046事件和tkprof工具獲取跟蹤的信息。

SQL ID: 8vtyapcbqkbwf
Plan Hash: 2372476266
select distinct id
from
 test_distinct where rownum < 100

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.02       0.02          0        138          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        8      0.00       0.00          0          4          0          99
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total       10      0.02       0.02          0        142          0          99
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: SYS
Rows     Row Source Operation
-------  ---------------------------------------------------
     99  HASH UNIQUE (cr=4 pr=0 pw=0 time=0 us cost=528 size=1287 card=99)
     99   COUNT STOPKEY (cr=4 pr=0 pw=0 time=196 us)
     99    TABLE ACCESS FULL TEST_DISTINCT (cr=4 pr=0 pw=0 time=98 us cost=44 size=1318174 card=101398)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                       8        0.00          0.00
  asynch descriptor resize                        2        0.00          0.00
  SQL*Net message from client                     8       59.43         59.43

可以看到執(zhí)行計劃中DISINCT是通過HASH UNIQUE算法來實現(xiàn)的。同時ROWNUM虛列使用的是COUNT算法，STOPKEY說明我給ROWNUM虛列加了限定條件100，當?shù)竭_這個限定條件時，該語句查詢結束。

那么到這我該怎么理解HASH UNIQUE算法的目的呢？我在網(wǎng)上查看了相關信息，發(fā)現(xiàn)真有人做了實驗實驗幫助我們加上對該算法的印象。在10G2R以前，Oracle對DISTINCT使用的是sort unique這種操作方式因為涉及到排序，是非常影響語句的執(zhí)行效率的。因此10G2R之后的版本，Oracle改進了算法。

SQL> select distinct id from test_distinct where rownum < 100;
99 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2372476266
---------------------------------------------------------------------------------------------
| Id  | Operation           | Name          | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |               |    99 |  1287 |       |   528   (1)| 00:00:07 |
|   1 |  HASH UNIQUE        |               |    99 |  1287 |  2000K|   528   (1)| 00:00:07 |
|*  2 |   COUNT STOPKEY     |               |       |       |       |            |          |
|   3 |    TABLE ACCESS FULL| TEST_DISTINCT |   101K|  1287K|       |    44   (3)| 00:00:01 |
---------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(ROWNUM<100)
Note
-----
   - dynamic sampling used for this statement (level=2)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          4  consistent gets
          8  physical reads
          0  redo size
       2134  bytes sent via SQL*Net to client
        589  bytes received via SQL*Net from client
          8  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         99  rows processed

SQL> alter system flush buffer_cache

SQL> select id from test_distinct where rownum < 100 group by id;
99 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 521476922
-------------------------------------------------------------------------------------
| Id  | Operation           | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |               |    99 |  1287 |    47   (9)| 00:00:01 |
|   1 |  HASH GROUP BY      |               |    99 |  1287 |    47   (9)| 00:00:01 |
|*  2 |   COUNT STOPKEY     |               |       |       |            |          |
|   3 |    TABLE ACCESS FULL| TEST_DISTINCT |   101K|  1287K|    44   (3)| 00:00:01 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(ROWNUM<100)
Note
-----
   - dynamic sampling used for this statement (level=2)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          4  consistent gets
          8  physical reads
          0  redo size
       2134  bytes sent via SQL*Net to client
        589  bytes received via SQL*Net from client
          8  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         99  rows processed

通過網(wǎng)上這組實驗可以看到DISTINCT和GROUP BY分別使用了HASH UNIQUE和HASH GROUP BY算法。而兩者執(zhí)行效果和結果都相同?；谝?guī)則的DISTINCT和GROUP BY的查詢的執(zhí)行計劃如下

SQL> select /*+ rule*/ distinct id from test_distinct where rownum < 100;
99 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 3449293992
---------------------------------------------
| Id  | Operation           | Name          |
---------------------------------------------
|   0 | SELECT STATEMENT    |               |
|   1 |  SORT UNIQUE        |               |
|*  2 |   COUNT STOPKEY     |               |
|   3 |    TABLE ACCESS FULL| TEST_DISTINCT |
---------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(ROWNUM<100)
Note
-----
   - rule based optimizer used (consider using cbo)

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          4  consistent gets
          8  physical reads
          0  redo size
       2134  bytes sent via SQL*Net to client
        589  bytes received via SQL*Net from client
          8  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
         99  rows processed

 
SQL> select /*+ rule*/ id from test_distinct where rownum < 100 group by id;
99 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 351786816
---------------------------------------------
| Id  | Operation           | Name          |
---------------------------------------------
|   0 | SELECT STATEMENT    |               |
|   1 |  SORT GROUP BY      |               |
|*  2 |   COUNT STOPKEY     |               |
|   3 |    TABLE ACCESS FULL| TEST_DISTINCT |
---------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(ROWNUM<100)
Note
-----
   - rule based optimizer used (consider using cbo)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          4  consistent gets
          8  physical reads
          0  redo size
       2134  bytes sent via SQL*Net to client
        589  bytes received via SQL*Net from client
          8  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
         99  rows processed

因為表的數(shù)據(jù)量教啥，基于rule的算法和通過hash的算法區(qū)別并不明顯。但重點在于了解Oracle的不同算法的可能。

另外，ORACLE除了提供DISTINCT以外，還提供了UNIQUE來過濾重復的數(shù)據(jù)。官方文檔給出的解釋。

http://docs.oracle.com/cd/E11882_01/server.112/e41084/statements_10002.htm#SQLRF55272

DISTINCT | UNIQUE

Specify DISTINCT or UNIQUE if you want the database to return only one copy of each set of duplicate rows selected. These two keywords are synonymous. Duplicate rows are those with matching values for each expression in the select list.

到這里我已經(jīng)理解了DISTINCT的作用，同時還學習了其它新的知識。非常棒！

向AI問一下細節(jié)

DISTICT的一次質疑

猜你喜歡

最新資訊

相關推薦

相關標簽