溫馨提示×

溫馨提示×

您好,登錄后才能下訂單哦!

密碼登錄×
登錄注冊×
其他方式登錄
點擊 登錄注冊 即表示同意《億速云用戶服務(wù)條款》

騰訊數(shù)據(jù)庫診斷大賽題目回顧與分析

發(fā)布時間:2020-07-21 12:37:33 來源:網(wǎng)絡(luò) 閱讀:3490 作者:暮回_zz 欄目:MySQL數(shù)據(jù)庫

只是從個人角度分析了下這次比賽的題目,涉及到一些個人經(jīng)驗,學(xué)習(xí)交流而已~

貼上官方的git鏈接:
https://github.com/DBbrain/Diagnosis

初賽

data

order 表的數(shù)據(jù)量 2000
CREATE TABLE `order` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `name` varchar ( 32) COLLATE utf8_bin NOT NULL,
  `creator` varchar(24) COLLATE utf8_bin NOT NULL,
  `price` varchar(64) COLLATE utf8_bin NOT NULL,
  `create_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `status` tinyint(1) NOT NULL,
  PRIMARY KEY (`id`)
);

order_item 表的數(shù)據(jù)量 499760
CREATE TABLE `order_item` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `name` varchar(32) COLLATE utf8_bin NOT NULL,
  `parent` bigint(20) NOT NULL,
  `status` int(11) NOT NULL,
  `type` varchar(12) COLLATE utf8_bin NOT NULL DEFAULT '0',
  `quantity` int(11) NOT NULL DEFAULT '1',
  `update_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`)
);

1. select

分析

SELECT * FROM `order` o INNER JOIN order_item i  ON i.parent = o.id 
        ORDER BY o.status ASC, i.update_time DESC LIMIT  0, 20;

mysql> explain SELECT * FROM `order` o INNER JOIN order_item i 
    ->     ON i.parent = o.id  ORDER BY o.status ASC, i.update_time DESC LIMIT  0, 20;
+----+-------------+-------+------------+--------+---------------+---------+---------+---------------------------------+--------+----------+---------------------------------+
| id      | select_type | table   | partitions   | type     | possible_keys | key        | key_len   | ref                                         | rows      | filtered    | Extra                                  |
+----+-------------+-------+------------+--------+---------------+---------+---------+---------------------------------+--------+----------+---------------------------------+
|  1 | SIMPLE      | i     | NULL       | ALL    | NULL          | NULL    | NULL    | NULL                            | 497839 |   100.00 | Using temporary; Using filesort |
|  1 | SIMPLE      | o     | NULL       | eq_ref | PRIMARY       | PRIMARY | 8       | sql_optimization_match.i.parent |      1 |   100.00 | NULL                            |
+----+-------------+-------+------------+--------+---------------+---------+---------+---------------------------------+--------+----------+---------------------------------+

兩張表 inner join,根據(jù)執(zhí)行計劃,驅(qū)動表選擇 order_item ,同時由于排序字段來自兩張表,且方向不一致,造成了寫臨時表,由于排序字段不能使用到索引,因此造成了外排序。

  • 分析 join 字段
    order_item.parent = order.id

mysql> select count(distinct(parent)) from order_item;  // 區(qū)分度較低,忽略
+-------------------------+
| count(distinct(parent)) |
+-------------------------+
|                     300     |
+-------------------------+
mysql> select count(distinct(id)) from `order`; // 區(qū)分度較高,但 id 上已有主鍵索引
+---------------------+
| count(distinct(id))  |
+---------------------+
|                2000    |
+---------------------+
1 row in set (0.00 sec)

order_item.parent 沒有索引;
order.id 有索引。

  • 分析聚合字段
ORDER BY o.status ASC, i.update_time DESC

mysql> select count(distinct(status)) from `order`; //區(qū)分度很低
+-------------------------+
| count(distinct(status)) |
+-------------------------+
|                       2 |
+-------------------------+
mysql> select count(distinct(update_time)) from `order_item`;   // 區(qū)分度一般
+------------------------------+
| count(distinct(update_time)) |
+------------------------------+
|                        32768 |
+------------------------------+

兩張表的 order by 字段排序方式不一樣,可能需要用到外排序,同時原表在排序字段上沒有索引。

優(yōu)化

根據(jù)之前的查詢計劃可以看到,是對 order_item 進(jìn)行了一次全表掃,之后再進(jìn)行外排序。由于 sql 語義中需要對兩列進(jìn)行排序,因此,可以通過其他的方式,減少外排序的數(shù)據(jù)量,從而降低時耗。

order 表中的排序字段 status 僅有兩個不同值,嘗試去掉 status 排序字段之后,速度明顯提高,此時,order by 中的 update_time 字段可以嘗試增加索引,區(qū)分度也滿足要求;

status 僅有兩列,可以使用 union all 來代替,避免 order by 中不同的表不同的排序順序?qū)е聼o法使用索引。

這里是官方給出的建議(應(yīng)該不是ML自動改寫的),不過這種改寫sql的方式有一定局限性,適用場景受限,如果 status 類型不是tinyint(1),且以后如果會增加新的類型,sql需要不斷改寫。

可以嘗試推動業(yè)務(wù)改造,在重新優(yōu)化索引。

另外,sql改寫之后,給出的索引建議是增加聯(lián)合索引(update_time,parent),上面分析可以看到,parent的區(qū)分度較低,這里增加聯(lián)合索引或者只給 update_time 增加索引,性能相差不多。

# sql 改寫
SELECT o.*,i.* FROM  (
    ( SELECT o.id, i.id item_id FROM  `order_1` o 
            INNER JOIN order_item i ON i.parent =o.id
            WHERE  o.status = 0
            ORDER  BY i.update_time DESC LIMIT  0, 20)
    UNION ALL
    (SELECT o.id, i.id item_id FROM  `order_1` o
            INNER JOIN order_item i ON i.parent =o.id
            WHERE  o.status = 1
            ORDER  BY i.update_time DESC LIMIT  0, 20)
    ) tmp
    INNER JOIN `order_1` o ON tmp.id = o.id
    INNER JOIN order_item i ON tmp.item_id = i.id
    ORDER  BY o.status ASC,
    i.update_time DESC
    LIMIT  0, 20

# 增加索引
alter table order_item add index `item_idx_1` (`update_time`,`parent`);

# 執(zhí)行計劃
+----+-------------+------------+------------+--------+---------------+------------+---------+---------------------------------+------+----------+---------------------------------+order_item i ON tmp.ite 
| id | select_type | table      | partitions | type   | possible_keys | key        | key_len | ref                             | rows | filtered | Extra                           |
+----+-------------+------------+------------+--------+---------------+------------+---------+---------------------------------+------+----------+---------------------------------+
|  1 | PRIMARY     | <derived2> | NULL       | ALL    | NULL          | NULL       | NULL    | NULL                            |   40 |   100.00 | Using temporary; Using filesort |
|  1 | PRIMARY     | o          | NULL       | eq_ref | PRIMARY       | PRIMARY    | 8       | tmp.id                          |    1 |   100.00 | NULL                            |
|  1 | PRIMARY     | i          | NULL       | eq_ref | PRIMARY       | PRIMARY    | 8       | tmp.item_id                     |    1 |   100.00 | NULL                            |
|  2 | DERIVED     | i          | NULL       | index  | NULL          | item_idx_1 | 12      | NULL                            |   20 |   100.00 | Using index                     |
|  2 | DERIVED     | o          | NULL       | eq_ref | PRIMARY       | PRIMARY    | 8       | sql_optimization_match.i.parent |    1 |    10.00 | Using where                     |
|  3 | UNION       | i          | NULL       | index  | NULL          | item_idx_1 | 12      | NULL                            |   20 |   100.00 | Using index                     |
|  3 | UNION       | o          | NULL       | eq_ref | PRIMARY       | PRIMARY    | 8       | sql_optimization_match.i.parent |    1 |    10.00 | Using where                     |
+----+-------------+------------+------------+--------+---------------+------------+---------+---------------------------------+------+----------+---------------------------------+

總結(jié)

1. 對區(qū)分度極低的字段如果有排序、范圍比較等操作,可以轉(zhuǎn)換為 union all;
2. 對排序字段,嘗試使用索引避免 filesort,如果不可避免,在 filesort 之前嘗試減少排序的數(shù)據(jù)量;

2. update

分析

update `order` set create_time = now()
    where id in (select parent from order_item where type = 2 );

# 執(zhí)行計劃
mysql> explain update `order_1` set create_time = now() where id in (select parent from order_item where type = 2 );
+----+--------------------+------------+------------+-------+---------------+---------+---------+------+--------+----------+-------------+
| id | select_type        | table      | partitions | type  | possible_keys | key     | key_len | ref  | rows   | filtered | Extra       |
+----+--------------------+------------+------------+-------+---------------+---------+---------+------+--------+----------+-------------+
|  1 | UPDATE             | order_1    | NULL       | index | NULL          | PRIMARY | 8       | NULL |   2000 |   100.00 | Using where |
|  2 | DEPENDENT SUBQUERY | order_item | NULL       | ALL   | NULL          | NULL    | NULL    | NULL | 496836 |     1.00 | Using where |
+----+--------------------+------------+------------+-------+---------------+---------+---------+------+--------+----------+-------------+

update 的條件是 in 子查詢的方式,explain 中注意到 select type 為 DEPENDENT SUBQUERY ,表示先做外查詢,外查詢匹配到的行數(shù)為N,那么接下來會進(jìn)行N次子查詢,效率極低。

對子查詢通常的做法是轉(zhuǎn)換為連表查詢 join。

優(yōu)化

  • 1.最簡單的 子查詢->join 的優(yōu)化操作
update `order` o  inner join (select parent from `order_item` where type = 2) tmp on o.id = tmp.parent  set create_time = now();

mysql> explain update `order` o  inner join (select parent from `order_item` where type = 2) tmp on o.id = tmp.parent  set create_time = now() \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: order_item
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 497839
     filtered: 10.00
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: UPDATE
        table: o
   partitions: NULL
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 8
          ref: sql_optimization_match.order_item.parent
         rows: 1
     filtered: 100.00
        Extra: NULL
2 rows in set (0.00 sec)

轉(zhuǎn)換后的速度相比之前已經(jīng)快了很多,沒有了dependence subquery,不過還是秒級,驅(qū)動表選擇了 order_item 表, 但是基本是一次全表掃,意味著要使用 49w 行的數(shù)據(jù)和 order 表進(jìn)行 join,開銷還是很大。

看最原始的慢update,修改的只有 order 表,條件的話只需要 id 在對 order_item 的子查詢范圍內(nèi)即可,重復(fù)的 parent 對于 update 毫無意義,因此,可以對parent字段進(jìn)行一次聚合(group by),由于子查詢中有 order_item.type = 2 的條件,因此,可以對 type 字段同時進(jìn)行聚合。

由于 order_item 僅有主鍵索引,對 order_item 表的等值判斷條件和聚合操作使用索引最佳,因此,可以建立聯(lián)合索引,索引順序優(yōu)先等值操作。

額外注意一點,我們要創(chuàng)建索引,索引字段類型和 sql 中的等值類型是否一致。

    1. 優(yōu)化連表查詢
增加索引:
    alter table `order_item` add index idx_1(type,parent);

sql 優(yōu)化:
    update `order` o inner join (
        select parent from `order_item` 
                where type = '2' group by type, parent) i 
        on o.id = i.parent set create_time = now();

mysql> explain update `order` o inner join (    select  parent from `order_item` where type = '2' group by type, parent ) i on o.id = i.parent set create_time = now()\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: <derived2>
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 571
     filtered: 100.00
        Extra: NULL
*************************** 2. row ***************************
           id: 1
  select_type: UPDATE
        table: o
   partitions: NULL
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 8
          ref: i.parent
         rows: 1
     filtered: 100.00
        Extra: NULL
*************************** 3. row ***************************
           id: 2
  select_type: DERIVED
        table: order_item
   partitions: NULL
         type: range
possible_keys: idx_1
          key: idx_1
      key_len: 46
          ref: NULL
         rows: 571
     filtered: 100.00
        Extra: Using where; Using index for group-by

優(yōu)化之后,使用了 order_item 表做了驅(qū)動表,同時這里也使用到了上面建立的索引 idx_1,之后生成的臨時表和 order 表進(jìn)行 join。由于 group by 的原因,order_item 生成的結(jié)果集數(shù)量更少,因此被選為了驅(qū)動表。

另外需要注意的是在 group by 這里使用了兩列,這個是為了使用 idx_1 索引(盡管 groiup by parent 和 group by type,parent 的返回結(jié)果行數(shù)都是一樣的,但是執(zhí)行計劃還是有很大差距)

優(yōu)化之后的執(zhí)行時間在毫秒級。

總結(jié)

1. 驅(qū)動表的選擇,始終是小表驅(qū)動大表,驅(qū)動表會走全表掃,所以通常索引都是在被驅(qū)動表上增加;
2. 如果執(zhí)行計劃中出現(xiàn)了 DEPENDENT SUBQUERY,一定會對 sql 的執(zhí)行效率有影響(同時 DEPENDENT SUBQUERY 還會潛在地造成一定程度的鎖放大), in + 子查詢 方式很容易引起,可以將子查詢優(yōu)化為 join 操作;
3. 對于 join 連表查詢,進(jìn)行連表的數(shù)據(jù)越少,執(zhí)行效率就越高,因此,在不改變sql語義的前提下,盡量使參加 join 的數(shù)據(jù)量減少;
4. 關(guān)于索引順序: 等值條件 > group by  > order by
5. 注意索引字段類型和 sql 中的的判斷條件中的數(shù)據(jù)類型是否一致。

決賽

data

區(qū)分度的計算過程省略,這里直接給出區(qū)分度好壞

由于某些表的行數(shù)較多,區(qū)分度的計算使用的是統(tǒng)計前5000行中 distinct 的值(生成環(huán)境中也可以這樣做,可以降低計算區(qū)分度帶來的額外開銷),極端情況下部分小表可能會造成誤判,但行數(shù)極少的表加索引的意義也不是很大。

# customer 數(shù)據(jù)量 1,200,000
CREATE TABLE `customer` (
  `custkey` int(11) NOT NULL,       // 區(qū)分度 OK
  `name` varchar(25) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,  // 區(qū)分度 OK
  `address` varchar(40) NOT NULL,
  `nationkey` int(11) NOT NULL,     // 區(qū)分度較低
  `phone` char(15) NOT NULL,        // 區(qū)分度 OK
  `acctbal` decimal(15,2) NOT NULL,
  `mktsegment` char(10) NOT NULL,   // 區(qū)分度較低
  `comment` varchar(117) NOT NULL,
  PRIMARY KEY (`custkey`),
  KEY `idx_nationkey` (`nationkey`)
);

# nation 數(shù)據(jù)量 25
CREATE TABLE `nation` (
  `nationkey` int(11) NOT NULL,     // 區(qū)分度 OK
  `name` char(25) NOT NULL,
  `regionkey` int(11) NOT NULL,
  `comment` varchar(152) DEFAULT NULL,
  PRIMARY KEY (`nationkey`),
  KEY `idx_4_0` (`name`)
);

# orders 數(shù)據(jù)量12,000,000
CREATE TABLE `orders` (
  `orderkey` int(11) NOT NULL,
  `custkey` int(11) NOT NULL,       // 區(qū)分度 OK
  `orderstatus` varchar(1) NOT NULL,
  `totalprice` decimal(15,2) NOT NULL,      // 區(qū)分度 OK
  `orderdate` date NOT NULL,
  `orderpriority` char(15) NOT NULL,
  `clerk` char(15) NOT NULL,                // 區(qū)分度 OK
  `shippriority` int(11) NOT NULL,
  `comment` varchar(79) NOT NULL,
  PRIMARY KEY (`orderkey`)
);

# region 數(shù)據(jù)量 5
CREATE TABLE `region` (
  `regionkey` int(11) NOT NULL,
  `name` varchar(25) NOT NULL,
  `comment` varchar(152) DEFAULT NULL,
  PRIMARY KEY (`regionkey`)
);

1. select

分析

select c.custkey, c.phone, sum(o.totalprice) totalprice
    from nation n 
    inner join customer c on c.nationkey = n.nationkey
    inner join orders o on o.clerk = c.name
    where n.name = "CHINA" and c.mktsegment = "HOUSEHOLD" and c.phone like "28-520%"
    group by c.custkey, c.phone;

# 執(zhí)行計劃
+----+-------------+-------+------------+------+---------------+------+---------+------+----------+----------+----------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows     | filtered | Extra                                              |
+----+-------------+-------+------------+------+---------------+------+---------+------+----------+----------+----------------------------------------------------+
|  1 | SIMPLE      | n     | NULL       | ALL  | PRIMARY       | NULL | NULL    | NULL |       25 |    10.00 | Using where; Using temporary; Using filesort       |
|  1 | SIMPLE      | c     | NULL       | ALL  | NULL          | NULL | NULL    | NULL |  1189853 |     0.11 | Using where; Using join buffer (Block Nested Loop) |
|  1 | SIMPLE      | o     | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 10963843 |    10.00 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+-------+------------+------+---------------+------+---------+------+----------+----------+----------------------------------------------------+

三張表 【customer c】;【nation n】;【orders o】

  • customer
    where條件:
             c.mktsegment = "HOUSEHOLD":區(qū)分度較低,放棄
        ?  c.phone like "28-520%":區(qū)分度較好,考慮添加索引
    聚合條件:
             group by c.custkey: 區(qū)分度較好,但是已經(jīng)是主鍵,放棄
        ?  c.phone:同where,考慮添加
    join 條件:
            c.nationkey = n.nationkey:區(qū)分度較低,放棄
        ? o.clerk = c.name:區(qū)分度較高,考慮添加索引

    advice:
        add index `dx_1_0`(name);
        add index `idx_1_1` (phone);
  • nation
    數(shù)據(jù)量 25,不考慮添加索引

    nation 表可以考慮增加索引 add index `idx_1_0`(name);  但意義不大
  • orders
    join 條件:
        ? o.clerk = c.name:區(qū)分度較高,考慮添加索引

    advice:
        add index `idx_1_0` (clerk)

優(yōu)化

按上述分析增加三條索引后,執(zhí)行計劃如下

+----+-------------+-------+------------+--------+----------------+---------+---------+-------------------+----------+----------+---------------------------------------------------------------------+
| id | select_type | table | partitions | type   | possible_keys  | key     | key_len | ref               | rows     | filtered | Extra                                                               |
+----+-------------+-------+------------+--------+----------------+---------+---------+-------------------+----------+----------+---------------------------------------------------------------------+
|  1 | SIMPLE      | c     | NULL       | range  | dx_1_0,idx_1_1 | idx_1_1 | 45      | NULL              |       46 |    10.00 | Using index condition; Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | n     | NULL       | eq_ref | PRIMARY        | PRIMARY | 4       | dbaas.c.nationkey |        1 |    10.00 | Using where                                                         |
|  1 | SIMPLE      | o     | NULL       | ALL    | idx_1_0        | NULL    | NULL    | NULL              | 10963843 |    10.00 | Range checked for each record (index map: 0x2)                      |
+----+-------------+-------+------------+--------+----------------+---------+---------+-------------------+----------+----------+---------------------------------------------------------------------+

總結(jié)

1. 在 inner join 的情況下,我們無法判斷出驅(qū)動表,因此,我們會選擇在合適的字段上都添加索引;
2. 在 sql 中的條件類型較多時,選擇把等值條件和聚合條件添加為組合索引,join 條件單獨增加索引;
3. 如果數(shù)據(jù)量過少,增加索引意義不大,可以不考慮;
4. dbrain 給出的是組合索引,兩者相比,性能基本一致;

2. select

分析

select * from (
    select custkey, orderdate, sum(totalprice) as totalprice
        from orders group by custkey, orderdate
    ) o
    where orderdate = "2019-08-01"

# 執(zhí)行計劃
+----+-------------+------------+------------+------+---------------+-------------+---------+-------+----------+----------+---------------------------------+
| id | select_type | table      | partitions | type | possible_keys | key         | key_len | ref   | rows     | filtered | Extra                           |
+----+-------------+------------+------------+------+---------------+-------------+---------+-------+----------+----------+---------------------------------+
|  1 | PRIMARY     | <derived2> | NULL       | ref  | <auto_key0>   | <auto_key0> | 3       | const |       10 |   100.00 | NULL                            |
|  2 | DERIVED     | orders     | NULL       | ALL  | NULL          | NULL        | NULL    | NULL  | 10963843 |   100.00 | Using temporary; Using filesort |
+----+-------------+------------+------------+------+---------------+-------------+---------+-------+----------+----------+---------------------------------+

僅涉及到一張表,group by 用到了 filesort,sql 看上去并不復(fù)雜,但是卻產(chǎn)生了驅(qū)動表。

查看sql,發(fā)現(xiàn)select * from (子查詢),多余的嵌套,可以考慮去掉,sql 可以改寫為

 select custkey, orderdate, sum(totalprice) as totalprice
        from orders where orderdate = "2019-08-01" group by custkey, orderdate;

索引分析

    where 條件:
        ? orderdate = "2019-08-01":區(qū)分度較高,考慮增加索引

    聚合條件:
        ?  group by custkey, orderdate:兩個字段區(qū)分區(qū)都較高,考慮增加索引

    advice:
        等值條件 優(yōu)先于 聚合條件
        add index `idx_2_0` (orderdate, custkey)

優(yōu)化

使用優(yōu)化后的 sql,增加聯(lián)合索引,執(zhí)行計劃為
+----+-------------+--------+------------+------+---------------+---------+---------+-------+------+----------+-----------------------+
| id | select_type | table  | partitions | type | possible_keys | key     | key_len | ref   | rows | filtered | Extra                 |
+----+-------------+--------+------------+------+---------------+---------+---------+-------+------+----------+-----------------------+
|  1 | SIMPLE      | orders | NULL       | ref  | idx_2_0       | idx_2_0 | 3       | const |    1 |   100.00 | Using index condition |
+----+-------------+--------+------------+------+---------------+---------+---------+-------+------+----------+-----------------------+

如果是增加兩個單獨的索引,
    add index `idx_2_1` (custkey);
    add index `idx_2_2` (orderdate);
執(zhí)行計劃為
+----+-------------+--------+------------+------+---------------+---------+---------+-------+------+----------+--------------------------------------------------------+
| id | select_type | table  | partitions | type | possible_keys | key     | key_len | ref   | rows | filtered | Extra                                                  |
+----+-------------+--------+------------+------+---------------+---------+---------+-------+------+----------+--------------------------------------------------------+
|  1 | SIMPLE      | orders | NULL       | ref  | idx_2_2       | idx_2_2 | 3       | const |    1 |   100.00 | Using index condition; Using temporary; Using filesort |
+----+-------------+--------+------------+------+---------------+---------+---------+-------+------+----------+--------------------------------------------------------+

使用到了 filesort,只有custkey可以使用索引,因此建議聯(lián)合索引。

結(jié)論

1. 單張表的sql如果執(zhí)行計劃出現(xiàn) filesort 等需要關(guān)注,頻繁嵌套的子查詢,會對性能有一定影響,可以考慮 sql 重寫;
2. 關(guān)于加索引,等值條件要優(yōu)先于聚合、join等條件;

3. select

分析

select c.custkey, sum(o.totalprice) totalprice from customer c
        left join orders o on o.custkey = c.custkey
        where c.phone like "33-64%" and c.name like concat("Customer#00003", "%")
        group by c.custkey

在已經(jīng)有前兩條 sql 增加的索引前提下,執(zhí)行計劃為
+----+-------------+-------+------------+-------+------------------------+---------+---------+------+----------+----------+---------------------------------------------------------------------+
| id | select_type | table | partitions | type  | possible_keys          | key     | key_len | ref  | rows     | filtered | Extra                                                               |
+----+-------------+-------+------------+-------+------------------------+---------+---------+------+----------+----------+---------------------------------------------------------------------+
|  1 | SIMPLE      | c     | NULL       | range | PRIMARY,dx_1_0,idx_1_1 | idx_1_1 | 45      | NULL |      552 |     1.63 | Using index condition; Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | o     | NULL       | ALL   | NULL                   | NULL    | NULL    | NULL | 10963843 |   100.00 | Using where; Using join buffer (Block Nested Loop)                  |
+----+-------------+-------+------------+-------+------------------------+---------+---------+------+----------+----------+---------------------------------------------------------------------+

customer 表已經(jīng)使用到了索引,是否需要增加其它索引稍后分析;
order 表是走了全表掃,掃描 12,000,000 行數(shù)據(jù),可能缺少索引;

兩張表 【customer c】;【order o】

  • customer

    where條件:
             c.phone like "33-64%":第一條 select 已經(jīng)添加過索引
             c.name like concat("Customer#00003", "%"):第一條 select 已經(jīng)添加過索引
    聚合條件:
             group by c.custkey: 區(qū)分度較好,但是已經(jīng)是主鍵,放棄
    join 條件:
             o.custkey = c.custkey:區(qū)分度較好,但是已經(jīng)是主鍵,放棄
    
    advice:
        無建議
  • order

    join 條件:
            o.custkey = c.custkey:區(qū)分度較高,考慮增加索引
    
    advice:
            add index `idx_3_0` (custkey)

優(yōu)化

增加索引之后的執(zhí)行計劃為:

+----+-------------+-------+------------+-------+------------------------+---------+---------+-----------------+------+----------+---------------------------------------------------------------------+
| id | select_type | table | partitions | type  | possible_keys          | key     | key_len | ref             | rows | filtered | Extra                                                               |
+----+-------------+-------+------------+-------+------------------------+---------+---------+-----------------+------+----------+---------------------------------------------------------------------+
|  1 | SIMPLE      | c     | NULL       | range | PRIMARY,dx_1_0,idx_1_1 | idx_1_1 | 45      | NULL            |  552 |     1.63 | Using index condition; Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | o     | NULL       | ref   | idx_3_0                | idx_3_0 | 4       | dbaas.c.custkey |   13 |   100.00 | NULL                                                                |
+----+-------------+-------+------------+-------+------------------------+---------+---------+-----------------+------+----------+---------------------------------------------------------------------+

增加索引之后。掃描 order 表的行數(shù)已經(jīng)大大減少,執(zhí)行效率也提升很高

總結(jié)

1. 關(guān)于 【using where】, 【using index】, 【using index condition】; 【Using where &&Using index】的區(qū)別(為什么總結(jié)這個呢,我這邊的話是建立了兩個庫,數(shù)據(jù)和基本的表結(jié)構(gòu)是一致的,但是其中一個庫中表的索引是按照我自己分析的情況加上去的,另一個庫是官方給出的建議,發(fā)現(xiàn)在執(zhí)行效率都很高的情況下,兩者執(zhí)行計劃的 extra 內(nèi)容有所區(qū)別,本想 google 解決,但是看了排名前三的博客,兩篇的內(nèi)容是一樣的,和第三篇的解釋完全不同,自己嘗試了下,這里給出結(jié)論,最后附上測試流程):

4. select

分析

select c.custkey, c.phone from nation n
        inner join customer c on c.nationkey = n.nationkey
        where n.name = "CHINA" and exists (
            select 1 from orders o where o.custkey = c.custkey and o.orderdate = "1998-08-11");

在上面已有的索引前提下,執(zhí)行計劃為

+----+--------------------+-------+------------+------+-----------------+---------+---------+-----------------------+---------+----------+----------------------------------------------------+
| id | select_type        | table | partitions | type | possible_keys   | key     | key_len | ref                   | rows    | filtered | Extra                                              |
+----+--------------------+-------+------------+------+-----------------+---------+---------+-----------------------+---------+----------+----------------------------------------------------+
|  1 | PRIMARY            | n     | NULL       | ref  | PRIMARY,idx_1_0 | idx_1_0 | 75      | const                 |       1 |   100.00 | Using index                                        |
|  1 | PRIMARY            | c     | NULL       | ALL  | NULL            | NULL    | NULL    | NULL                  | 1189853 |    10.00 | Using where; Using join buffer (Block Nested Loop) |
|  2 | DEPENDENT SUBQUERY | o     | NULL       | ref  | idx_2_0,idx_3_0 | idx_2_0 | 7       | const,dbaas.c.custkey |       1 |   100.00 | Using index                                        |
+----+--------------------+-------+------------+------+-----------------+---------+---------+-----------------------+---------+----------+----------------------------------------------------+

看到了 DEPENDENT SUBQUERY,在 in/exists +子查詢的條件下。經(jīng)常會出現(xiàn),有什么危害上面有解釋,出現(xiàn)了這個東西,就要想辦法改寫 sql。既然是 exists + 子查詢,那么優(yōu)化策略就是改寫為 join。

最通俗的改寫方式:先全部 inner join,最后加 where 條件

    select c.custkey, c.phone from nation n
        inner join customer c on c.nationkey = n.nationkey
        inner join orders o on o.custkey = c.custkey
      where n.name = "CHINA" and o.orderdate = "1998-08-11";

官方給出的 sql 比較復(fù)雜,但做的事情差不多,多考慮了一點試圖使用 group by 來減少 join 的數(shù)據(jù)量,給出官方答案,這里不多解釋它【不過這里去掉 group by 會更好】

SELECT `t1`.`custkey`, `t1`.`phone` FROM 
        ( SELECT * FROM `dbaas`.`nation` AS `t` WHERE `t`.`name` = 'CHINA' ) AS `t0`
    INNER JOIN `dbaas`.`customer` AS `t1` 
        ON `t0`.`nationkey` = `t1`.`nationkey`
    INNER JOIN (
        SELECT `t2`.`custkey` FROM `dbaas`.`orders` AS `t2` 
            WHERE `t2`.`orderdate` = '1998-08-11' GROUP BY `t2`.`custkey` ) AS `t5` 
        ON `t1`.`custkey` = `t5`.`custkey`

索引建議的話這里就沒有太多了,條件字段已經(jīng)都有了相應(yīng)的索引。

優(yōu)化

優(yōu)化后的執(zhí)行計劃如下:

+----+-------------+-------+------------+--------+-----------------+---------+---------+-----------------+------+----------+-------------+
| id | select_type | table | partitions | type   | possible_keys   | key     | key_len | ref             | rows | filtered | Extra       |
+----+-------------+-------+------------+--------+-----------------+---------+---------+-----------------+------+----------+-------------+
|  1 | SIMPLE      | n     | NULL       | ref    | PRIMARY,idx_1_0 | idx_1_0 | 75      | const           |    1 |   100.00 | Using index |
|  1 | SIMPLE      | o     | NULL       | ref    | idx_2_0,idx_3_0 | idx_2_0 | 3       | const           |   20 |   100.00 | Using index |
|  1 | SIMPLE      | c     | NULL       | eq_ref | PRIMARY         | PRIMARY | 4       | dbaas.o.custkey |    1 |    10.00 | Using where |
+----+-------------+-------+------------+--------+-----------------+---------+---------+-----------------+------+----------+-------------+

可以看到兩次join的驅(qū)動表分別選擇了n和o,ref 也是 const,性能要比 DEPENDENT SUBQUERY 這種要好太多了

總結(jié)

1. 并不是所有的復(fù)雜 join 都要使用 group by,和數(shù)據(jù)分布有關(guān),如果 group by 并不能顯著降低 join 行數(shù)的話, 沒有必要;

胡思亂想

mysql 的查詢優(yōu)化器相對來說是一個比較復(fù)雜的邏輯,期待它可以更好工作的前提是sql的寫法要合理,同時也要有恰當(dāng)?shù)乃饕?/p>

我們對 sql 的優(yōu)化,通常是先去考慮優(yōu)化sql,再根據(jù)優(yōu)化后的 sql 增加所需索引。(在實際數(shù)據(jù)庫開發(fā)過程中,尤其是 2B 的服務(wù)提供端,我們會優(yōu)先在不需要業(yè)務(wù)改動的情況下增加所需索引嘗試解決慢查詢的問題,如果增加索引不能解決問題,那么就需要業(yè)務(wù)進(jìn)行相應(yīng)改造)

首先,關(guān)于sql改寫,這個要考慮的比較多,因為mysql的優(yōu)化器、執(zhí)行器做了太多的事情,靠AI可以自動改寫優(yōu)化的的話,不敢想象。。。(DBA又要有一波人下崗了)人工的話根據(jù)經(jīng)驗吧,根據(jù)執(zhí)行計劃中的異常點去考慮改進(jìn),比如子查詢改寫為 join 等,像預(yù)賽題目中關(guān)于order by status 改寫為 union all 的做法,確實有一定的效果,但是并不是一個通用的方法,這里就太靈活了;
其次,相比之下,在已知sql和表結(jié)構(gòu)的情況下,依靠AI給出索引建議還是更加讓人有真實感,索引的話有一些通用的規(guī)則,網(wǎng)上介紹的很多了,自己瞎寫了點,看看就好

1. 找出所有條件字段,計算字段區(qū)分度,區(qū)分度很低的字段沒有必要加索引,數(shù)據(jù)量很少的字段一樣,加上意義也不大;
2. 條件的話優(yōu)先級  等值 > 聚合(group/order by) > join ,同一優(yōu)先級根據(jù)區(qū)分度建立聯(lián)合索引;
3. 在聚合條件下,如果聚合后的行數(shù)太多,回表量太大的情況下,mysql 可能不會使用這些索引;
4. 驅(qū)動表不需要考慮什么索引,驅(qū)動表的數(shù)據(jù)必然都在join的結(jié)果集中,對于 inner join 這種無法確定驅(qū)動表的,可以考慮在兩張表的合適字段上都添加索引

繼續(xù)扯。。。

現(xiàn)在大家都在往云上遷業(yè)務(wù),云上的db智能診斷難以避免是以后的剛需。AI 可以做到哪一步,不清楚,以后的事情誰知道呢。

向AI問一下細(xì)節(jié)

免責(zé)聲明:本站發(fā)布的內(nèi)容(圖片、視頻和文字)以原創(chuàng)、轉(zhuǎn)載和分享為主,文章觀點不代表本網(wǎng)站立場,如果涉及侵權(quán)請聯(lián)系站長郵箱:is@yisu.com進(jìn)行舉報,并提供相關(guān)證據(jù),一經(jīng)查實,將立刻刪除涉嫌侵權(quán)內(nèi)容。

AI