mysql 大事物commit慢造成全庫堵塞問題
原創(chuàng)轉載請注明出處
本文使用引擎INNODB版本
MYSQL5.7.13
sync_binlog = 1
innodb_flush_log_at_trx_commit = 1
也就是雙1設置,
本文使用通過對mysql 進行trace,gdb 自制工具infobin,以及
select * from sys.session where command<>'SLEEP' \G完成。
這里只是涉及binlog的寫入問題和innodb層無關
1、問題由來
在線上環(huán)境一次同事刪除了1000W的數(shù)據(jù),在commit的時候,整個數(shù)據(jù)庫
大約有幾十秒的全部hang住的情況,本文就是為了深入剖析這個問題。
2、可能需要的知識
http://blog.itpub.net/7728585/viewspace-2133589/ MYSQL 什么時候同步binlog文件以及它的臨時文件在哪里?
http://blog.itpub.net/7728585/viewspace-2133188/ 解析MYSQL BINLOG 二進制格式(1)--準備工作
http://blog.itpub.net/7728585/viewspace-2133189/ 解析MYSQL BINLOG 二進制格式(2)--FORMAT_DESCRIPTION_EVENT
http://blog.itpub.net/7728585/viewspace-2133321/ 解析MYSQL BINLOG 二進制格式(3)--QUERY_EVENT
http://blog.itpub.net/7728585/viewspace-2133429/ 解析MYSQL BINLOG 二進制格式(4)--TABLE_MAP_EVENT
http://blog.itpub.net/7728585/viewspace-2133463/ 解析MYSQL BINLOG 二進制格式(5)--WRITE_ROW_EVENT
http://blog.itpub.net/7728585/viewspace-2133469/ 解析MYSQL BINLOG 二進制格式(6)--UPDATE_ROW_EVENT/DELETE_ROW_EVENT
http://blog.itpub.net/7728585/viewspace-2133502/ 解析MYSQL BINLOG 二進制格式(7)--Xid_log_event/XID_EVENT
http://blog.itpub.net/7728585/viewspace-2133506/ 解析MYSQL BINLOG二進制格式(8)--GTID_LOG_EVENT/ANONYMOUS_GTID_LOG_EVENT及其他
http://blog.itpub.net/7728585/viewspace-2133534/ 解析MYSQL BINLOG二進制格式(9)--infobin解析binlog幫助文檔
http://blog.itpub.net/7728585/viewspace-2133537/ 解析MYSQL BINLOG二進制格式(10)--問題解答
3、簡單說明commit的時候做了什么以及為什么要上鎖
在<>一文中有如下的圖
其實這幅圖中我已經(jīng)寫了需要一把鎖,但是沒仔細說,這把鎖影響的是全部當前數(shù)據(jù)庫的需要寫binlog的全部語句,我們已經(jīng)知道對于大事物(超過)的commit期間需要將臨時
文件的中的binlog event 全部拷貝到binlog中,并且是一個事物的event必須是連續(xù)的,也就是一個事物的event中不能包含其他事物的event那么這期間需要一把mutex鎖,為什
么要這把鎖,這是在多線程/進程并發(fā)對同一個文件進行修改,也就是這里的binlog文件修改的,一種保護臨界資源的方法,如果沒有這把鎖就會出現(xiàn)binlog中的文件出現(xiàn)錯亂的情況。
注意這個鎖是mysql層的而非innodb層。
4、模擬方法
對表test 進行刪除不提交,觀察臨時文件大小。
關于生成臨時文件的原理參考:
(http://blog.itpub.net/7728585/viewspace-2133589/ MYSQL 什么時候同步binlog文件以及它的臨時文件在哪里?)
對刪除delete test表commit期間馬上對testsort和testsort2表插入一條數(shù)據(jù),
這個時候可以觀察到這兩個插入,會被堵塞知道delete test的事物提交完成
才完成。
mysql> select count(*) from test;
+----------+
| count(*) |
+----------+
| 2621440 |
+----------+
1 row in set (3.14 sec)
及如下:
可以看到一個簡單的insert居然花了10多秒,commit也是做了14秒。原因就在于需要將MLq9INFu這個臨時文件同步在binlog這個過程
其他需要寫入binlog的會話是堵塞的。
刪除期間會看到臨時文件不斷增大如下:
[root@testmy ~]# lsof |grep /tmp/ML
mysqld 2095 mysql 84u REG 8,3 328560640 14680133 /tmp/MLq9INFu (deleted)
[root@testmy ~]# lsof |grep /tmp/ML
mysqld 2095 mysql 84u REG 8,3 346337280 14680133 /tmp/MLq9INFu (deleted)
[root@testmy ~]# lsof |grep /tmp/ML
mysqld 2095 mysql 84u REG 8,3 367812608 14680133 /tmp/MLq9INFu (deleted)
到了commit后這個文件變?yōu)?,但是任然存在因為這個會話線程還在:
[root@testmy ~]# lsof |grep /tmp/ML
mysqld 2095 mysql 84u REG 8,3 0 14680133 /tmp/MLq9INFu (deleted)
5、使用infobin 對binlog文件中進行驗證
使用./infobin test.000211>log.log 進行分析
(可以使用mysqlbinlog但是不直觀
工具可以參考
http://blog.itpub.net/7728585/viewspace-2133534/ 解析MYSQL BINLOG二進制格式(9)--infobin解析binlog幫助文檔
獲得)
-----delete from test; 的binlog event
>Gtid Event:Pos:368030739(0X15efb413) N_pos:368030804(0X15efb454) Time:1487197597 Event_size:65(bytes)
Gtid:4a6f2a67-5d87-11e6-a6bd-0c29a879a3:1158184
-->Query Event:Pos:368030804(0X15efb454) N_Pos:368030876(0X15efb49c) Time:1487197355 Event_size:72(bytes)
Exe_time:0 Use_db:test Statment(35b-trun):BEGIN /*!Trx begin!*/ Gno:1158184
---->Map Event:Pos368030876(0X15efb49c) N_pos:368031047(0X15efb547) Time:1487197355 Event_size:171(bytes)
TABLE_ID:129 DB_NAME:test TABLE_NAME:test Gno:1158184
------>Delete Event:Pos:368031047(0X15efb547) N_pos:368039196(0X15efd51c) Time:1487197355 Event_size:8149(bytes)
Dml on table: test.test table_id:129 Gno:1158184
....(中間省略)
------>Delete Event:Pos:736052116(0X2bdf4394) N_pos:736054135(0X2bdf4b77) Time:1487197355 Event_size:2019(bytes)
Dml on table: test.test table_id:129 Gno:1158184
>Xid Event:Pos:736054135(0X2bdf4b77) N_Pos:736054166(0X2bdf4b96) Time:1487197597 Event_size:31(bytes)
COMMIT; /*!Trx end*/ Gno:1158184
-----insert into testsort values(1);的binlog
>Gtid Event:Pos:736054166(0X2bdf4b96) N_pos:736054231(0X2bdf4bd7) Time:1487197599 Event_size:65(bytes)
Gtid:4a6f2a67-5d87-11e6-a6bd-0c29a879a3:1158185
-->Query Event:Pos:736054231(0X2bdf4bd7) N_Pos:736054303(0X2bdf4c1f) Time:1487197599 Event_size:72(bytes)
Exe_time:0 Use_db:test Statment(35b-trun):BEGIN /*!Trx begin!*/ Gno:1158185
---->Map Event:Pos736054303(0X2bdf4c1f) N_pos:736054358(0X2bdf4c56) Time:1487197599 Event_size:55(bytes)
TABLE_ID:187 DB_NAME:test TABLE_NAME:testsort Gno:1158185
------>Insert Event:Pos:736054358(0X2bdf4c56) N_pos:736054410(0X2bdf4c8a) Time:1487197599 Event_size:52(bytes)
Dml on table: test.testsort table_id:187 Gno:1158185
>Xid Event:Pos:736054410(0X2bdf4c8a) N_Pos:736054441(0X2bdf4ca9) Time:1487197599 Event_size:31(bytes)
COMMIT; /*!Trx end*/ Gno:1158185
------insert into testsort2 values(1);的binlog
>Gtid Event:Pos:736054441(0X2bdf4ca9) N_pos:736054506(0X2bdf4cea) Time:1487197600 Event_size:65(bytes)
Gtid:4a6f2a67-5d87-11e6-a6bd-0c29a879a3:1158186
-->Query Event:Pos:736054506(0X2bdf4cea) N_Pos:736054578(0X2bdf4d32) Time:1487197600 Event_size:72(bytes)
Exe_time:0 Use_db:test Statment(35b-trun):BEGIN /*!Trx begin!*/ Gno:1158186
---->Map Event:Pos736054578(0X2bdf4d32) N_pos:736054634(0X2bdf4d6a) Time:1487197600 Event_size:56(bytes)
TABLE_ID:188 DB_NAME:test TABLE_NAME:testsort2 Gno:1158186
------>Insert Event:Pos:736054634(0X2bdf4d6a) N_pos:736054686(0X2bdf4d9e) Time:1487197600 Event_size:52(bytes)
Dml on table: test.testsort2 table_id:188 Gno:1158186
>Xid Event:Pos:736054686(0X2bdf4d9e) N_Pos:736054717(0X2bdf4dbd) Time:1487197600 Event_size:31(bytes)
COMMIT; /*!Trx end*/ Gno:1158186
可以看到確實insert into testsort values(1)和insert into testsort2 values(1);的event確實在delete后才進入
binlog file
6、分析等待點。
那么到底在等待什么呢?這是另外一次測試的sys.session輸出
mysql> select * from sys.session where command<>'SLEEP' \G
*************************** 1. row ***************************
thd_id: 28
conn_id: 3
user: root@localhost
db: test
command: Query
state: starting
time: 13
current_statement: commit
statement_latency: 12.55 s
progress: NULL
lock_latency: 0 ps
rows_examined: 0
rows_sent: 0
rows_affected: 0
tmp_tables: 0
tmp_disk_tables: 0
full_scan: NO
last_statement: NULL
last_statement_latency: NULL
current_memory: 179.02 KiB
last_wait: wait/io/file/sql/binlog
last_wait_latency: Still Waiting
source: binlog.cc:8469
trx_latency: 3.22 m
trx_state: ACTIVE
trx_autocommit: NO
pid: 5821
program_name: mysql
*************************** 2. row ***************************
thd_id: 3031
conn_id: 3006
user: root@localhost
db: test
command: Query
state: query end
time: 12
current_statement: insert into testsort3(id1,id2, ... OR((RAND()*100000)),'gaopeng')
statement_latency: 11.82 s
progress: NULL
lock_latency: 29.18 ms
rows_examined: 0
rows_sent: 0
rows_affected: 0
tmp_tables: 0
tmp_disk_tables: 0
full_scan: NO
last_statement: NULL
last_statement_latency: NULL
current_memory: 556.53 KiB
last_wait: wait/synch/cond/sql/MYSQL_BIN_LOG::COND_done
last_wait_latency: Still Waiting
source: binlog.cc:1894
trx_latency: 12.97 s
trx_state: ACTIVE
trx_autocommit: YES
pid: 15312
program_name: mysql
不難看出current_statement: commit 等待是
last_wait: wait/io/file/sql/binlog
last_wait_latency: Still Waiting
source: binlog.cc:8469
而被堵塞的insert等待是
last_wait: wait/synch/cond/sql/MYSQL_BIN_LOG::COND_done
last_wait_latency: Still Waiting
source: binlog.cc:1894
這是commit 等待的binlog.cc:8469的內(nèi)容
/**
On *pure non-transactional* workloads there is a small window
in time where a concurrent rotate might be able to close
the file before the sync is actually done. In that case,
ignore the bad file descriptor errors.
Transactional workloads (InnoDB) are not affected since the
the rotation will not happen until all transactions have
committed to the storage engine, thence decreased the XID
counters.(重點)
TODO: fix this properly even for non-transactional storage
engines.
*/
if (DBUG_EVALUATE_IF("simulate_error_during_sync_binlog_file", 1,
mysql_file_sync(log_file.file,
MYF(MY_WME | MY_IGNORE_BADFD))))
從源碼注釋上可以看出原因再來看看他掉用int my_sync(File fd, myf my_flags)
既然加入文件描述符fd,應該就是在寫binlog file
下面是insert等待的source: binlog.cc:1894的內(nèi)容
mysql_cond_wait(&m_cond_done, &m_lock_done);
這個地方就是拿不到這把鎖,他是Stage_manager::enroll_for調(diào)用的函數(shù)
而 Stage_manager源碼注釋是:
/**
Class for maintaining the commit stages for binary log group commit.
*/
而enroll_for方法的描述為:
/**
Enroll a set of sessions for a stage.
This will queue the session thread for writing and flushing.
If the thread being queued is assigned as stage leader, it will
return immediately.
If wait_if_follower is true the thread is not the stage leader,
the thread will be wait for the queue to be processed by the
leader before it returns.(重點)
In DBUG-ON version the follower marks is preempt status as ready.
@param stage Stage identifier for the queue to append to.
@param first Queue to append.
@param stage_mutex
Pointer to the currently held stage mutex, or NULL if
we're not in a stage.
@retval true Thread is stage leader.
@retval false Thread was not stage leader and processing has been done.
*/
那么確實commit后binary log確實有一個寫入的策略,會根據(jù)是否為leader進行寫入,
如果不是leader則等到leader全部寫完才進行寫入,多個等待寫入binlog的會話計入
一個隊列中,這貌似就是并行了。
我使用GDB將斷點打到
Stage_manager::enroll_for
MYSQL_BIN_LOG::sync_binlog_file
發(fā)現(xiàn)整個流程如下:
A會話COMMIT-->拿到鎖-->進行binlog寫-->commit完成
B會話COMMIT-->等待鎖--------------------------->拿到鎖-->進行binlog寫-->commit完成
7、如何避免
沒有什么好的辦法如果使用innodb雙1的情況下,只能每次修改少量的數(shù)據(jù),增加修改次數(shù)
前面也說了binlog生成量大約為你修改數(shù)據(jù)量的2/3如果update是4/3那么就要評估你的I/O
性能,如果順序?qū)懭朊棵雽懭霝閄需要修改的數(shù)據(jù)量大約為Y
那么
S(秒)=(Y*(2/3))/(X/2)
控制S到一個你認為可以接受的范圍內(nèi)。其實可以通過copy來試試你的順序的I/O性能。
比如我們寫這樣一個存儲過程:
DELIMITER //
CREATE PROCEDURE d_move(in beid int,in cunt int,in rws int,in maxid int)
begin
declare num int;
declare nowid int;
set num = 1;
set nowid = beid;
while num <= cunt do
insert into test_vest_hist select * from test_vest where id>=nowid and id<nowid+rws and="" id<nowid+rws="" id<maxid;="" ="" select="" concat(now(),'="" ',row_count(),'="" insert!'); delete from test_vest where id>=nowid and id<nowid+rws and="" id<maxid;="" ="" select="" concat(now(),'="" ',row_count(),'="" delete!'); select concat(now(),' ',num,' times delete finsh!');
set nowid = nowid+rws;
set num = num+1;
select sleep(2);
end while;
end //
將rws控制到一個可以接受的范圍比如5W,當然最好進行sleep(),這樣給innodb引擎一個刷入臟數(shù)據(jù)的時間減少壓力。
如果sync_log不設置為1,這個我做了trace確實就不會在commit的進行binlog寫入了,但是這是不安全的,commit結束
后日志不落地是不安全的。
作者微信: