<center id="wmkhs"></center>

<kbd id="wmkhs"><abbr id="wmkhs"></abbr></kbd>

溫馨提示×

溫馨提示×

您好，登錄后才能下訂單哦！

密碼登錄×

忘記密碼？

登錄注冊×

獲取短信驗證碼

其他方式登錄

點擊登錄注冊即表示同意《億速云用戶服務(wù)條款》

用戶登錄×

賬戶密碼登錄

請使用微信掃描上方二維碼

使用幫助

請求超時！

請點擊重新獲取二維碼

騰訊數(shù)據(jù)庫診斷大賽題目回顧與分析

發(fā)布時間：2020-07-21 12:37:33 來源：網(wǎng)絡(luò) 閱讀：3490 作者：暮回_zz 欄目：MySQL數(shù)據(jù)庫

只是從個人角度分析了下這次比賽的題目，涉及到一些個人經(jīng)驗，學(xué)習(xí)交流而已～

貼上官方的git鏈接：
https://github.com/DBbrain/Diagnosis

初賽

data

order 表的數(shù)據(jù)量 2000
CREATE TABLE `order` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `name` varchar ( 32) COLLATE utf8_bin NOT NULL,
  `creator` varchar(24) COLLATE utf8_bin NOT NULL,
  `price` varchar(64) COLLATE utf8_bin NOT NULL,
  `create_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `status` tinyint(1) NOT NULL,
  PRIMARY KEY (`id`)
);

order_item 表的數(shù)據(jù)量 499760
CREATE TABLE `order_item` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `name` varchar(32) COLLATE utf8_bin NOT NULL,
  `parent` bigint(20) NOT NULL,
  `status` int(11) NOT NULL,
  `type` varchar(12) COLLATE utf8_bin NOT NULL DEFAULT '0',
  `quantity` int(11) NOT NULL DEFAULT '1',
  `update_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`)
);

1. select

分析

SELECT * FROM `order` o INNER JOIN order_item i  ON i.parent = o.id 
        ORDER BY o.status ASC, i.update_time DESC LIMIT  0, 20;

mysql> explain SELECT * FROM `order` o INNER JOIN order_item i 
    ->     ON i.parent = o.id  ORDER BY o.status ASC, i.update_time DESC LIMIT  0, 20;
+----+-------------+-------+------------+--------+---------------+---------+---------+---------------------------------+--------+----------+---------------------------------+
| id      | select_type | table   | partitions   | type     | possible_keys | key        | key_len   | ref                                         | rows      | filtered    | Extra                                  |
+----+-------------+-------+------------+--------+---------------+---------+---------+---------------------------------+--------+----------+---------------------------------+
|  1 | SIMPLE      | i     | NULL       | ALL    | NULL          | NULL    | NULL    | NULL                            | 497839 |   100.00 | Using temporary; Using filesort |
|  1 | SIMPLE      | o     | NULL       | eq_ref | PRIMARY       | PRIMARY | 8       | sql_optimization_match.i.parent |      1 |   100.00 | NULL                            |
+----+-------------+-------+------------+--------+---------------+---------+---------+---------------------------------+--------+----------+---------------------------------+

兩張表 inner join，根據(jù)執(zhí)行計劃，驅(qū)動表選擇 order_item ，同時由于排序字段來自兩張表，且方向不一致，造成了寫臨時表，由于排序字段不能使用到索引，因此造成了外排序。

分析 join 字段

    order_item.parent = order.id

mysql> select count(distinct(parent)) from order_item;  // 區(qū)分度較低，忽略
+-------------------------+
| count(distinct(parent)) |
+-------------------------+
|                     300     |
+-------------------------+
mysql> select count(distinct(id)) from `order`; // 區(qū)分度較高，但 id 上已有主鍵索引
+---------------------+
| count(distinct(id))  |
+---------------------+
|                2000    |
+---------------------+
1 row in set (0.00 sec)

order_item.parent 沒有索引；
order.id 有索引。

分析聚合字段

ORDER BY o.status ASC, i.update_time DESC

mysql> select count(distinct(status)) from `order`; //區(qū)分度很低
+-------------------------+
| count(distinct(status)) |
+-------------------------+
|                       2 |
+-------------------------+
mysql> select count(distinct(update_time)) from `order_item`;   // 區(qū)分度一般
+------------------------------+
| count(distinct(update_time)) |
+------------------------------+
|                        32768 |
+------------------------------+

兩張表的 order by 字段排序方式不一樣，可能需要用到外排序，同時原表在排序字段上沒有索引。

優(yōu)化

根據(jù)之前的查詢計劃可以看到，是對 order_item 進(jìn)行了一次全表掃，之后再進(jìn)行外排序。由于 sql 語義中需要對兩列進(jìn)行排序，因此，可以通過其他的方式，減少外排序的數(shù)據(jù)量，從而降低時耗。

order 表中的排序字段 status 僅有兩個不同值，嘗試去掉 status 排序字段之后，速度明顯提高，此時，order by 中的 update_time 字段可以嘗試增加索引，區(qū)分度也滿足要求；

status 僅有兩列，可以使用 union all 來代替，避免 order by 中不同的表不同的排序順序?qū)е聼o法使用索引。

這里是官方給出的建議（應(yīng)該不是ML自動改寫的），不過這種改寫sql的方式有一定局限性，適用場景受限，如果 status 類型不是tinyint(1)，且以后如果會增加新的類型，sql需要不斷改寫。

可以嘗試推動業(yè)務(wù)改造，在重新優(yōu)化索引。

另外，sql改寫之后，給出的索引建議是增加聯(lián)合索引（update_time,parent），上面分析可以看到，parent的區(qū)分度較低，這里增加聯(lián)合索引或者只給 update_time 增加索引，性能相差不多。

# sql 改寫
SELECT o.*,i.* FROM  (
    ( SELECT o.id, i.id item_id FROM  `order_1` o 
            INNER JOIN order_item i ON i.parent =o.id
            WHERE  o.status = 0
            ORDER  BY i.update_time DESC LIMIT  0, 20)
    UNION ALL
    (SELECT o.id, i.id item_id FROM  `order_1` o
            INNER JOIN order_item i ON i.parent =o.id
            WHERE  o.status = 1
            ORDER  BY i.update_time DESC LIMIT  0, 20)
    ) tmp
    INNER JOIN `order_1` o ON tmp.id = o.id
    INNER JOIN order_item i ON tmp.item_id = i.id
    ORDER  BY o.status ASC,
    i.update_time DESC
    LIMIT  0, 20

# 增加索引
alter table order_item add index `item_idx_1` (`update_time`,`parent`);

# 執(zhí)行計劃
+----+-------------+------------+------------+--------+---------------+------------+---------+---------------------------------+------+----------+---------------------------------+order_item i ON tmp.ite 
| id | select_type | table      | partitions | type   | possible_keys | key        | key_len | ref                             | rows | filtered | Extra                           |
+----+-------------+------------+------------+--------+---------------+------------+---------+---------------------------------+------+----------+---------------------------------+
|  1 | PRIMARY     | <derived2> | NULL       | ALL    | NULL          | NULL       | NULL    | NULL                            |   40 |   100.00 | Using temporary; Using filesort |
|  1 | PRIMARY     | o          | NULL       | eq_ref | PRIMARY       | PRIMARY    | 8       | tmp.id                          |    1 |   100.00 | NULL                            |
|  1 | PRIMARY     | i          | NULL       | eq_ref | PRIMARY       | PRIMARY    | 8       | tmp.item_id                     |    1 |   100.00 | NULL                            |
|  2 | DERIVED     | i          | NULL       | index  | NULL          | item_idx_1 | 12      | NULL                            |   20 |   100.00 | Using index                     |
|  2 | DERIVED     | o          | NULL       | eq_ref | PRIMARY       | PRIMARY    | 8       | sql_optimization_match.i.parent |    1 |    10.00 | Using where                     |
|  3 | UNION       | i          | NULL       | index  | NULL          | item_idx_1 | 12      | NULL                            |   20 |   100.00 | Using index                     |
|  3 | UNION       | o          | NULL       | eq_ref | PRIMARY       | PRIMARY    | 8       | sql_optimization_match.i.parent |    1 |    10.00 | Using where                     |
+----+-------------+------------+------------+--------+---------------+------------+---------+---------------------------------+------+----------+---------------------------------+

總結(jié)

1. 對區(qū)分度極低的字段如果有排序、范圍比較等操作，可以轉(zhuǎn)換為 union all；
2. 對排序字段，嘗試使用索引避免 filesort，如果不可避免，在 filesort 之前嘗試減少排序的數(shù)據(jù)量；

2. update

分析

update `order` set create_time = now()
    where id in (select parent from order_item where type = 2 );

# 執(zhí)行計劃
mysql> explain update `order_1` set create_time = now() where id in (select parent from order_item where type = 2 );
+----+--------------------+------------+------------+-------+---------------+---------+---------+------+--------+----------+-------------+
| id | select_type        | table      | partitions | type  | possible_keys | key     | key_len | ref  | rows   | filtered | Extra       |
+----+--------------------+------------+------------+-------+---------------+---------+---------+------+--------+----------+-------------+
|  1 | UPDATE             | order_1    | NULL       | index | NULL          | PRIMARY | 8       | NULL |   2000 |   100.00 | Using where |
|  2 | DEPENDENT SUBQUERY | order_item | NULL       | ALL   | NULL          | NULL    | NULL    | NULL | 496836 |     1.00 | Using where |
+----+--------------------+------------+------------+-------+---------------+---------+---------+------+--------+----------+-------------+

update 的條件是 in 子查詢的方式，explain 中注意到 select type 為 DEPENDENT SUBQUERY ，表示先做外查詢，外查詢匹配到的行數(shù)為N，那么接下來會進(jìn)行N次子查詢，效率極低。

對子查詢通常的做法是轉(zhuǎn)換為連表查詢 join。

優(yōu)化

1.最簡單的子查詢->join 的優(yōu)化操作

update `order` o  inner join (select parent from `order_item` where type = 2) tmp on o.id = tmp.parent  set create_time = now();

mysql> explain update `order` o  inner join (select parent from `order_item` where type = 2) tmp on o.id = tmp.parent  set create_time = now() \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: order_item
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 497839
     filtered: 10.00
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: UPDATE
        table: o
   partitions: NULL
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 8
          ref: sql_optimization_match.order_item.parent
         rows: 1
     filtered: 100.00
        Extra: NULL
2 rows in set (0.00 sec)

轉(zhuǎn)換后的速度相比之前已經(jīng)快了很多，沒有了dependence subquery，不過還是秒級，驅(qū)動表選擇了 order_item 表，但是基本是一次全表掃，意味著要使用 49w 行的數(shù)據(jù)和 order 表進(jìn)行 join，開銷還是很大。

看最原始的慢update，修改的只有 order 表，條件的話只需要 id 在對 order_item 的子查詢范圍內(nèi)即可，重復(fù)的 parent 對于 update 毫無意義，因此，可以對parent字段進(jìn)行一次聚合（group by），由于子查詢中有 order_item.type = 2 的條件，因此，可以對 type 字段同時進(jìn)行聚合。

由于 order_item 僅有主鍵索引，對 order_item 表的等值判斷條件和聚合操作使用索引最佳，因此，可以建立聯(lián)合索引，索引順序優(yōu)先等值操作。

額外注意一點，我們要創(chuàng)建索引，索引字段類型和 sql 中的等值類型是否一致。

1. 優(yōu)化連表查詢

增加索引：
    alter table `order_item` add index idx_1(type,parent);

sql 優(yōu)化：
    update `order` o inner join (
        select parent from `order_item` 
                where type = '2' group by type, parent) i 
        on o.id = i.parent set create_time = now();

mysql> explain update `order` o inner join (    select  parent from `order_item` where type = '2' group by type, parent ) i on o.id = i.parent set create_time = now()\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: <derived2>
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 571
     filtered: 100.00
        Extra: NULL
*************************** 2. row ***************************
           id: 1
  select_type: UPDATE
        table: o
   partitions: NULL
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 8
          ref: i.parent
         rows: 1
     filtered: 100.00
        Extra: NULL
*************************** 3. row ***************************
           id: 2
  select_type: DERIVED
        table: order_item
   partitions: NULL
         type: range
possible_keys: idx_1
          key: idx_1
      key_len: 46
          ref: NULL
         rows: 571
     filtered: 100.00
        Extra: Using where; Using index for group-by

優(yōu)化之后，使用了 order_item 表做了驅(qū)動表，同時這里也使用到了上面建立的索引 idx_1，之后生成的臨時表和 order 表進(jìn)行 join。由于 group by 的原因，order_item 生成的結(jié)果集數(shù)量更少，因此被選為了驅(qū)動表。

另外需要注意的是在 group by 這里使用了兩列，這個是為了使用 idx_1 索引（盡管 groiup by parent 和 group by type,parent 的返回結(jié)果行數(shù)都是一樣的，但是執(zhí)行計劃還是有很大差距）

優(yōu)化之后的執(zhí)行時間在毫秒級。

總結(jié)

1. 驅(qū)動表的選擇，始終是小表驅(qū)動大表，驅(qū)動表會走全表掃，所以通常索引都是在被驅(qū)動表上增加；
2. 如果執(zhí)行計劃中出現(xiàn)了 DEPENDENT SUBQUERY，一定會對 sql 的執(zhí)行效率有影響（同時 DEPENDENT SUBQUERY 還會潛在地造成一定程度的鎖放大）， in + 子查詢 方式很容易引起，可以將子查詢優(yōu)化為 join 操作；
3. 對于 join 連表查詢，進(jìn)行連表的數(shù)據(jù)越少，執(zhí)行效率就越高，因此，在不改變sql語義的前提下，盡量使參加 join 的數(shù)據(jù)量減少;
4. 關(guān)于索引順序： 等值條件 > group by  > order by
5. 注意索引字段類型和 sql 中的的判斷條件中的數(shù)據(jù)類型是否一致。

決賽

data

區(qū)分度的計算過程省略，這里直接給出區(qū)分度好壞

由于某些表的行數(shù)較多，區(qū)分度的計算使用的是統(tǒng)計前5000行中 distinct 的值（生成環(huán)境中也可以這樣做，可以降低計算區(qū)分度帶來的額外開銷），極端情況下部分小表可能會造成誤判，但行數(shù)極少的表加索引的意義也不是很大。

# customer 數(shù)據(jù)量 1,200,000
CREATE TABLE `customer` (
  `custkey` int(11) NOT NULL,       // 區(qū)分度 OK
  `name` varchar(25) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,  // 區(qū)分度 OK
  `address` varchar(40) NOT NULL,
  `nationkey` int(11) NOT NULL,     // 區(qū)分度較低
  `phone` char(15) NOT NULL,        // 區(qū)分度 OK
  `acctbal` decimal(15,2) NOT NULL,
  `mktsegment` char(10) NOT NULL,   // 區(qū)分度較低
  `comment` varchar(117) NOT NULL,
  PRIMARY KEY (`custkey`),
  KEY `idx_nationkey` (`nationkey`)
);

# nation 數(shù)據(jù)量 25
CREATE TABLE `nation` (
  `nationkey` int(11) NOT NULL,     // 區(qū)分度 OK
  `name` char(25) NOT NULL,
  `regionkey` int(11) NOT NULL,
  `comment` varchar(152) DEFAULT NULL,
  PRIMARY KEY (`nationkey`),
  KEY `idx_4_0` (`name`)
);

# orders 數(shù)據(jù)量12,000,000
CREATE TABLE `orders` (
  `orderkey` int(11) NOT NULL,
  `custkey` int(11) NOT NULL,       // 區(qū)分度 OK
  `orderstatus` varchar(1) NOT NULL,
  `totalprice` decimal(15,2) NOT NULL,      // 區(qū)分度 OK
  `orderdate` date NOT NULL,
  `orderpriority` char(15) NOT NULL,
  `clerk` char(15) NOT NULL,                // 區(qū)分度 OK
  `shippriority` int(11) NOT NULL,
  `comment` varchar(79) NOT NULL,
  PRIMARY KEY (`orderkey`)
);

# region 數(shù)據(jù)量 5
CREATE TABLE `region` (
  `regionkey` int(11) NOT NULL,
  `name` varchar(25) NOT NULL,
  `comment` varchar(152) DEFAULT NULL,
  PRIMARY KEY (`regionkey`)
);

1. select

分析

select c.custkey, c.phone, sum(o.totalprice) totalprice
    from nation n 
    inner join customer c on c.nationkey = n.nationkey
    inner join orders o on o.clerk = c.name
    where n.name = "CHINA" and c.mktsegment = "HOUSEHOLD" and c.phone like "28-520%"
    group by c.custkey, c.phone;

# 執(zhí)行計劃
+----+-------------+-------+------------+------+---------------+------+---------+------+----------+----------+----------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows     | filtered | Extra                                              |
+----+-------------+-------+------------+------+---------------+------+---------+------+----------+----------+----------------------------------------------------+
|  1 | SIMPLE      | n     | NULL       | ALL  | PRIMARY       | NULL | NULL    | NULL |       25 |    10.00 | Using where; Using temporary; Using filesort       |
|  1 | SIMPLE      | c     | NULL       | ALL  | NULL          | NULL | NULL    | NULL |  1189853 |     0.11 | Using where; Using join buffer (Block Nested Loop) |
|  1 | SIMPLE      | o     | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 10963843 |    10.00 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+-------+------------+------+---------------+------+---------+------+----------+----------+----------------------------------------------------+

三張表【customer c】；【nation n】；【orders o】

customer

    where條件：
             c.mktsegment = "HOUSEHOLD"：區(qū)分度較低，放棄
        ?  c.phone like "28-520%"：區(qū)分度較好，考慮添加索引
    聚合條件：
             group by c.custkey： 區(qū)分度較好，但是已經(jīng)是主鍵，放棄
        ?  c.phone：同where，考慮添加
    join 條件：
            c.nationkey = n.nationkey：區(qū)分度較低，放棄
        ? o.clerk = c.name：區(qū)分度較高，考慮添加索引

    advice：
        add index `dx_1_0`(name);
        add index `idx_1_1` (phone);

nation

    數(shù)據(jù)量 25，不考慮添加索引

    nation 表可以考慮增加索引 add index `idx_1_0`(name);  但意義不大

orders

    join 條件：
        ? o.clerk = c.name：區(qū)分度較高，考慮添加索引

    advice：
        add index `idx_1_0` (clerk)

優(yōu)化

按上述分析增加三條索引后，執(zhí)行計劃如下

+----+-------------+-------+------------+--------+----------------+---------+---------+-------------------+----------+----------+---------------------------------------------------------------------+
| id | select_type | table | partitions | type   | possible_keys  | key     | key_len | ref               | rows     | filtered | Extra                                                               |
+----+-------------+-------+------------+--------+----------------+---------+---------+-------------------+----------+----------+---------------------------------------------------------------------+
|  1 | SIMPLE      | c     | NULL       | range  | dx_1_0,idx_1_1 | idx_1_1 | 45      | NULL              |       46 |    10.00 | Using index condition; Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | n     | NULL       | eq_ref | PRIMARY        | PRIMARY | 4       | dbaas.c.nationkey |        1 |    10.00 | Using where                                                         |
|  1 | SIMPLE      | o     | NULL       | ALL    | idx_1_0        | NULL    | NULL    | NULL              | 10963843 |    10.00 | Range checked for each record (index map: 0x2)                      |
+----+-------------+-------+------------+--------+----------------+---------+---------+-------------------+----------+----------+---------------------------------------------------------------------+

總結(jié)

1. 在 inner join 的情況下，我們無法判斷出驅(qū)動表，因此，我們會選擇在合適的字段上都添加索引；
2. 在 sql 中的條件類型較多時，選擇把等值條件和聚合條件添加為組合索引，join 條件單獨增加索引；
3. 如果數(shù)據(jù)量過少，增加索引意義不大，可以不考慮；
4. dbrain 給出的是組合索引，兩者相比，性能基本一致；

2. select

分析

select * from (
    select custkey, orderdate, sum(totalprice) as totalprice
        from orders group by custkey, orderdate
    ) o
    where orderdate = "2019-08-01"

# 執(zhí)行計劃
+----+-------------+------------+------------+------+---------------+-------------+---------+-------+----------+----------+---------------------------------+
| id | select_type | table      | partitions | type | possible_keys | key         | key_len | ref   | rows     | filtered | Extra                           |
+----+-------------+------------+------------+------+---------------+-------------+---------+-------+----------+----------+---------------------------------+
|  1 | PRIMARY     | <derived2> | NULL       | ref  | <auto_key0>   | <auto_key0> | 3       | const |       10 |   100.00 | NULL                            |
|  2 | DERIVED     | orders     | NULL       | ALL  | NULL          | NULL        | NULL    | NULL  | 10963843 |   100.00 | Using temporary; Using filesort |
+----+-------------+------------+------------+------+---------------+-------------+---------+-------+----------+----------+---------------------------------+

僅涉及到一張表，group by 用到了 filesort，sql 看上去并不復(fù)雜，但是卻產(chǎn)生了驅(qū)動表。

查看sql，發(fā)現(xiàn)select * from (子查詢)，多余的嵌套，可以考慮去掉，sql 可以改寫為

 select custkey, orderdate, sum(totalprice) as totalprice
        from orders where orderdate = "2019-08-01" group by custkey, orderdate;

索引分析

    where 條件：
        ? orderdate = "2019-08-01"：區(qū)分度較高，考慮增加索引

    聚合條件：
        ?  group by custkey, orderdate：兩個字段區(qū)分區(qū)都較高，考慮增加索引

    advice：
        等值條件 優(yōu)先于 聚合條件
        add index `idx_2_0` (orderdate, custkey)

優(yōu)化

使用優(yōu)化后的 sql，增加聯(lián)合索引，執(zhí)行計劃為
+----+-------------+--------+------------+------+---------------+---------+---------+-------+------+----------+-----------------------+
| id | select_type | table  | partitions | type | possible_keys | key     | key_len | ref   | rows | filtered | Extra                 |
+----+-------------+--------+------------+------+---------------+---------+---------+-------+------+----------+-----------------------+
|  1 | SIMPLE      | orders | NULL       | ref  | idx_2_0       | idx_2_0 | 3       | const |    1 |   100.00 | Using index condition |
+----+-------------+--------+------------+------+---------------+---------+---------+-------+------+----------+-----------------------+

如果是增加兩個單獨的索引，
    add index `idx_2_1` (custkey);
    add index `idx_2_2` (orderdate);
執(zhí)行計劃為
+----+-------------+--------+------------+------+---------------+---------+---------+-------+------+----------+--------------------------------------------------------+
| id | select_type | table  | partitions | type | possible_keys | key     | key_len | ref   | rows | filtered | Extra                                                  |
+----+-------------+--------+------------+------+---------------+---------+---------+-------+------+----------+--------------------------------------------------------+
|  1 | SIMPLE      | orders | NULL       | ref  | idx_2_2       | idx_2_2 | 3       | const |    1 |   100.00 | Using index condition; Using temporary; Using filesort |
+----+-------------+--------+------------+------+---------------+---------+---------+-------+------+----------+--------------------------------------------------------+

使用到了 filesort，只有custkey可以使用索引，因此建議聯(lián)合索引。

結(jié)論

1. 單張表的sql如果執(zhí)行計劃出現(xiàn) filesort 等需要關(guān)注，頻繁嵌套的子查詢，會對性能有一定影響，可以考慮 sql 重寫；
2. 關(guān)于加索引，等值條件要優(yōu)先于聚合、join等條件；

3. select

分析

select c.custkey, sum(o.totalprice) totalprice from customer c
        left join orders o on o.custkey = c.custkey
        where c.phone like "33-64%" and c.name like concat("Customer#00003", "%")
        group by c.custkey

在已經(jīng)有前兩條 sql 增加的索引前提下，執(zhí)行計劃為
+----+-------------+-------+------------+-------+------------------------+---------+---------+------+----------+----------+---------------------------------------------------------------------+
| id | select_type | table | partitions | type  | possible_keys          | key     | key_len | ref  | rows     | filtered | Extra                                                               |
+----+-------------+-------+------------+-------+------------------------+---------+---------+------+----------+----------+---------------------------------------------------------------------+
|  1 | SIMPLE      | c     | NULL       | range | PRIMARY,dx_1_0,idx_1_1 | idx_1_1 | 45      | NULL |      552 |     1.63 | Using index condition; Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | o     | NULL       | ALL   | NULL                   | NULL    | NULL    | NULL | 10963843 |   100.00 | Using where; Using join buffer (Block Nested Loop)                  |
+----+-------------+-------+------------+-------+------------------------+---------+---------+------+----------+----------+---------------------------------------------------------------------+

customer 表已經(jīng)使用到了索引，是否需要增加其它索引稍后分析；
order 表是走了全表掃，掃描 12,000,000 行數(shù)據(jù)，可能缺少索引；

兩張表【customer c】；【order o】

customer

where條件：
         c.phone like "33-64%"：第一條 select 已經(jīng)添加過索引
         c.name like concat("Customer#00003", "%")：第一條 select 已經(jīng)添加過索引
聚合條件：
         group by c.custkey： 區(qū)分度較好，但是已經(jīng)是主鍵，放棄
join 條件：
         o.custkey = c.custkey：區(qū)分度較好，但是已經(jīng)是主鍵，放棄

advice：
    無建議

order

join 條件：
        o.custkey = c.custkey：區(qū)分度較高，考慮增加索引

advice：
        add index `idx_3_0` (custkey)

優(yōu)化

增加索引之后的執(zhí)行計劃為：

+----+-------------+-------+------------+-------+------------------------+---------+---------+-----------------+------+----------+---------------------------------------------------------------------+
| id | select_type | table | partitions | type  | possible_keys          | key     | key_len | ref             | rows | filtered | Extra                                                               |
+----+-------------+-------+------------+-------+------------------------+---------+---------+-----------------+------+----------+---------------------------------------------------------------------+
|  1 | SIMPLE      | c     | NULL       | range | PRIMARY,dx_1_0,idx_1_1 | idx_1_1 | 45      | NULL            |  552 |     1.63 | Using index condition; Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | o     | NULL       | ref   | idx_3_0                | idx_3_0 | 4       | dbaas.c.custkey |   13 |   100.00 | NULL                                                                |
+----+-------------+-------+------------+-------+------------------------+---------+---------+-----------------+------+----------+---------------------------------------------------------------------+

增加索引之后。掃描 order 表的行數(shù)已經(jīng)大大減少，執(zhí)行效率也提升很高

總結(jié)

1. 關(guān)于 【using where】, 【using index】, 【using index condition】; 【Using where &&Using index】的區(qū)別（為什么總結(jié)這個呢，我這邊的話是建立了兩個庫，數(shù)據(jù)和基本的表結(jié)構(gòu)是一致的，但是其中一個庫中表的索引是按照我自己分析的情況加上去的，另一個庫是官方給出的建議，發(fā)現(xiàn)在執(zhí)行效率都很高的情況下，兩者執(zhí)行計劃的 extra 內(nèi)容有所區(qū)別，本想 google 解決，但是看了排名前三的博客，兩篇的內(nèi)容是一樣的，和第三篇的解釋完全不同，自己嘗試了下，這里給出結(jié)論，最后附上測試流程）：

4. select

分析

select c.custkey, c.phone from nation n
        inner join customer c on c.nationkey = n.nationkey
        where n.name = "CHINA" and exists (
            select 1 from orders o where o.custkey = c.custkey and o.orderdate = "1998-08-11");

在上面已有的索引前提下，執(zhí)行計劃為

+----+--------------------+-------+------------+------+-----------------+---------+---------+-----------------------+---------+----------+----------------------------------------------------+
| id | select_type        | table | partitions | type | possible_keys   | key     | key_len | ref                   | rows    | filtered | Extra                                              |
+----+--------------------+-------+------------+------+-----------------+---------+---------+-----------------------+---------+----------+----------------------------------------------------+
|  1 | PRIMARY            | n     | NULL       | ref  | PRIMARY,idx_1_0 | idx_1_0 | 75      | const                 |       1 |   100.00 | Using index                                        |
|  1 | PRIMARY            | c     | NULL       | ALL  | NULL            | NULL    | NULL    | NULL                  | 1189853 |    10.00 | Using where; Using join buffer (Block Nested Loop) |
|  2 | DEPENDENT SUBQUERY | o     | NULL       | ref  | idx_2_0,idx_3_0 | idx_2_0 | 7       | const,dbaas.c.custkey |       1 |   100.00 | Using index                                        |
+----+--------------------+-------+------------+------+-----------------+---------+---------+-----------------------+---------+----------+----------------------------------------------------+

看到了 DEPENDENT SUBQUERY，在 in/exists +子查詢的條件下。經(jīng)常會出現(xiàn)，有什么危害上面有解釋，出現(xiàn)了這個東西，就要想辦法改寫 sql。既然是 exists + 子查詢，那么優(yōu)化策略就是改寫為 join。

最通俗的改寫方式：先全部 inner join，最后加 where 條件

    select c.custkey, c.phone from nation n
        inner join customer c on c.nationkey = n.nationkey
        inner join orders o on o.custkey = c.custkey
      where n.name = "CHINA" and o.orderdate = "1998-08-11";

官方給出的 sql 比較復(fù)雜，但做的事情差不多，多考慮了一點試圖使用 group by 來減少 join 的數(shù)據(jù)量，給出官方答案，這里不多解釋它【不過這里去掉 group by 會更好】

SELECT `t1`.`custkey`, `t1`.`phone` FROM 
        ( SELECT * FROM `dbaas`.`nation` AS `t` WHERE `t`.`name` = 'CHINA' ) AS `t0`
    INNER JOIN `dbaas`.`customer` AS `t1` 
        ON `t0`.`nationkey` = `t1`.`nationkey`
    INNER JOIN (
        SELECT `t2`.`custkey` FROM `dbaas`.`orders` AS `t2` 
            WHERE `t2`.`orderdate` = '1998-08-11' GROUP BY `t2`.`custkey` ) AS `t5` 
        ON `t1`.`custkey` = `t5`.`custkey`

索引建議的話這里就沒有太多了，條件字段已經(jīng)都有了相應(yīng)的索引。

優(yōu)化

優(yōu)化后的執(zhí)行計劃如下：

+----+-------------+-------+------------+--------+-----------------+---------+---------+-----------------+------+----------+-------------+
| id | select_type | table | partitions | type   | possible_keys   | key     | key_len | ref             | rows | filtered | Extra       |
+----+-------------+-------+------------+--------+-----------------+---------+---------+-----------------+------+----------+-------------+
|  1 | SIMPLE      | n     | NULL       | ref    | PRIMARY,idx_1_0 | idx_1_0 | 75      | const           |    1 |   100.00 | Using index |
|  1 | SIMPLE      | o     | NULL       | ref    | idx_2_0,idx_3_0 | idx_2_0 | 3       | const           |   20 |   100.00 | Using index |
|  1 | SIMPLE      | c     | NULL       | eq_ref | PRIMARY         | PRIMARY | 4       | dbaas.o.custkey |    1 |    10.00 | Using where |
+----+-------------+-------+------------+--------+-----------------+---------+---------+-----------------+------+----------+-------------+

可以看到兩次join的驅(qū)動表分別選擇了n和o，ref 也是 const，性能要比 DEPENDENT SUBQUERY 這種要好太多了

總結(jié)

1. 并不是所有的復(fù)雜 join 都要使用 group by，和數(shù)據(jù)分布有關(guān)，如果 group by 并不能顯著降低 join 行數(shù)的話， 沒有必要；

胡思亂想

mysql 的查詢優(yōu)化器相對來說是一個比較復(fù)雜的邏輯，期待它可以更好工作的前提是sql的寫法要合理，同時也要有恰當(dāng)?shù)乃饕?/p>

我們對 sql 的優(yōu)化，通常是先去考慮優(yōu)化sql，再根據(jù)優(yōu)化后的 sql 增加所需索引。（在實際數(shù)據(jù)庫開發(fā)過程中，尤其是 2B 的服務(wù)提供端，我們會優(yōu)先在不需要業(yè)務(wù)改動的情況下增加所需索引嘗試解決慢查詢的問題，如果增加索引不能解決問題，那么就需要業(yè)務(wù)進(jìn)行相應(yīng)改造）

首先，關(guān)于sql改寫，這個要考慮的比較多，因為mysql的優(yōu)化器、執(zhí)行器做了太多的事情，靠AI可以自動改寫優(yōu)化的的話，不敢想象。。。（DBA又要有一波人下崗了）人工的話根據(jù)經(jīng)驗吧，根據(jù)執(zhí)行計劃中的異常點去考慮改進(jìn)，比如子查詢改寫為 join 等，像預(yù)賽題目中關(guān)于order by status 改寫為 union all 的做法，確實有一定的效果，但是并不是一個通用的方法，這里就太靈活了；
其次，相比之下，在已知sql和表結(jié)構(gòu)的情況下，依靠AI給出索引建議還是更加讓人有真實感，索引的話有一些通用的規(guī)則，網(wǎng)上介紹的很多了，自己瞎寫了點，看看就好

1. 找出所有條件字段，計算字段區(qū)分度，區(qū)分度很低的字段沒有必要加索引，數(shù)據(jù)量很少的字段一樣，加上意義也不大；
2. 條件的話優(yōu)先級  等值 > 聚合（group/order by） > join ，同一優(yōu)先級根據(jù)區(qū)分度建立聯(lián)合索引；
3. 在聚合條件下，如果聚合后的行數(shù)太多，回表量太大的情況下，mysql 可能不會使用這些索引；
4. 驅(qū)動表不需要考慮什么索引，驅(qū)動表的數(shù)據(jù)必然都在join的結(jié)果集中，對于 inner join 這種無法確定驅(qū)動表的，可以考慮在兩張表的合適字段上都添加索引

繼續(xù)扯。。。

現(xiàn)在大家都在往云上遷業(yè)務(wù)，云上的db智能診斷難以避免是以后的剛需。AI 可以做到哪一步，不清楚，以后的事情誰知道呢。

向AI問一下細(xì)節(jié)

推薦閱讀：

免責(zé)聲明：本站發(fā)布的內(nèi)容（圖片、視頻和文字）以原創(chuàng)、轉(zhuǎn)載和分享為主，文章觀點不代表本網(wǎng)站立場，如果涉及侵權(quán)請聯(lián)系站長郵箱：is@yisu.com進(jìn)行舉報，并提供相關(guān)證據(jù)，一經(jīng)查實，將立刻刪除涉嫌侵權(quán)內(nèi)容。

上一篇新聞：
Restful: Spring Boot with Mongodb
下一篇新聞：
iOS 對象的歸檔

猜你喜歡

AI
助
手

產(chǎn)品服務(wù)

地區(qū)劃分

專題活動

幫助支持

關(guān)于我們

售后咨詢

7*24小時在線電話：400-100-2938

7*24小時在線 QQ：800811969

關(guān)注億速云

億速云公眾號

手機(jī)網(wǎng)站二維碼

<table id="g05ir"></table>

<strong id="g05ir"><big id="g05ir"></big></strong>