溫馨提示×

溫馨提示×

您好,登錄后才能下訂單哦!

密碼登錄×
登錄注冊×
其他方式登錄
點擊 登錄注冊 即表示同意《億速云用戶服務條款》

sql_slave_skip_counter,你真的用對了嗎?

發(fā)布時間:2020-06-28 13:41:10 來源:網絡 閱讀:1115 作者:insist_way 欄目:MySQL數(shù)據(jù)庫

寫在前面:

??? 最近一段時間都在做傳統(tǒng)主從復制相關的測試,思考了很多上線主從復制架構后,可能會發(fā)生的問題,然后針對性設置了這些故障,再然后思考如何在保證業(yè)務可用,或者對業(yè)務沖擊盡可能的小的前提下,進行故障的恢復,也算是小有所得,現(xiàn)在呢,筆者就主從復制故障時,需要跳過故障點時所用到的 sql_slave_skip_counte 變量進行記錄描述


sql_slave_skip_counter 介紹:

摘自MySQL官方的解釋(強烈建議閱讀英文原文。中文版,是筆者自己的理解,只能說仁者見仁)

SET GLOBAL sql_slave_skip_counter Syntax:
??????? SET GLOBAL sql_slave_skip_counter = N
This statement skips the next N events from the master. This is useful for recovering from replication stops caused by a statement.
??????? 跳過N個events。注意:以event為單位,而不是以事務為單位,只有在由單條語句組成的事務時,兩者才等價。
??????? 如:一個事務由多個EVENT組成,BEGIN;INSERT;UPDATE;DELETE;COMMOIT; 這種情況下,兩者絕不相等
This statement is valid only when the slave threads are not running. Otherwise, it produces an error.

When using this statement, it is important to understand that the binary log is actually organized as a sequence of groups known as event groups. Each event group consists of a sequence of events.

For transactional tables, an event group corresponds to a transaction.
??? ????對于事務表,一個event group對應一個事務
or nontransactional tables, an event group corresponds to a single SQL statement.
??????? 對于非事務表,一個event group對應一條SQL
When you use SET GLOBAL sql_slave_skip_counter to skip events and the result is in the middle of a group, the slave continues to skip events until it reaches the end of the group. Execution then starts with the next event group
??? ????當你跳過event的時候,如果N的值,處于event group之中,那么slave會繼續(xù)跳過event,直至跳過這個event group,從下一個event group開始


對于事務表使用sql_slave_skip_counter的情況:


1、跳過1032復制錯誤(update/delete error)

跳過由單條SQL組成的事務:


在Slave主機上人為的刪除兩條數(shù)據(jù):

DELETE FROM `edusoho_e`.`t1` WHERE `id` = '9';
DELETE FROM `edusoho_e`.`t1` WHERE `id` = '11';


而Master在變更上述兩條記錄的時候會報錯,導致復制中斷:

INSERT INTO `edusoho_e`.`t1` (`xname`, `address`, `hobby`) VALUES ('孫權', '吳國', '妹妹');
UPDATE `edusoho_e`.`t1` SET xname='游戲' WHERE id=7;
UPDATE `edusoho_e`.`t1` SET age=40 WHERE id=11;????#報錯
DELETE FROM `edusoho_e`.`t1` WHERE age=40;????????????#報錯
INSERT INTO `edusoho_e`.`t1` (`xname`, `address`, `hobby`) VALUES ('曹丕', '魏國', '甄姬');
DELETE FROM `edusoho_e`.`t1` WHERE id=1;
UPDATE `edusoho_e`.`t1` SET hobby='Games' WHERE id=3;?


在Slave查看主從復制狀態(tài)時,就會發(fā)現(xiàn)報錯信息:

mysql> show slave status\G;
*************************** 1. row ***************************
Read_Master_Log_Pos: 2176
Exec_Master_Log_Pos: 874
Last_Errno: 1032
Last_Error: Could not execute Update_rows event on table edusoho_e.t1; Can't find record in 't1', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000002, end_log_pos 1127
Slave_IO_Running: Yes
Slave_SQL_Running: No


在Master主機上查看position做了什么操作:

mysql> show binlog events in 'mysql-bin.000002' from 874;
+------------------+------+-------------+-----------+-------------+---------------------------------+
| Log_name???????? | Pos? | Event_type? | Server_id | End_log_pos | Info??????????????????????????? |
+------------------+------+-------------+-----------+-------------+---------------------------------+
| mysql-bin.000002 |? 874 | Query?????? |???????? 2 |???????? 956 | BEGIN?????????????????????????? |
| mysql-bin.000002 |? 956 | Table_map?? |???????? 2 |??????? 1017 | table_id: 213 (edusoho_e.t1)??? |
| mysql-bin.000002 | 1017 | Update_rows |???????? 2 |??????? 1127 | table_id: 213 flags: STMT_END_F |
| mysql-bin.000002 | 1127 | Xid???????? |???????? 2 |??????? 1158 | COMMIT /* xid=437 */??????????? |
| mysql-bin.000002 | 1158 | Query?????? |???????? 2 |??????? 1240 | BEGIN?????????????????????????? |
| mysql-bin.000002 | 1240 | Table_map?? |???????? 2 |??????? 1301 | table_id: 213 (edusoho_e.t1)??? |
| mysql-bin.000002 | 1301 | Delete_rows |???????? 2 |??????? 1407 | table_id: 213 flags: STMT_END_F |
| mysql-bin.000002 | 1407 | Xid???????? |???????? 2 |??????? 1438 | COMMIT /* xid=446 */??????????? |
| mysql-bin.000002 | 1438 | Query?????? |???????? 2 |??????? 1520 | BEGIN?????????????????????????? |
| mysql-bin.000002 | 1520 | Table_map?? |???????? 2 |??????? 1581 | table_id: 213 (edusoho_e.t1)??? |
| mysql-bin.000002 | 1581 | Write_rows? |???????? 2 |??????? 1644 | table_id: 213 flags: STMT_END_F |
| mysql-bin.000002 | 1644 | Xid???????? |???????? 2 |??????? 1675 | COMMIT /* xid=455 */??????????? |
| mysql-bin.000002 | 1675 | Query?????? |???????? 2 |??????? 1757 | BEGIN?????????????????????????? |
| mysql-bin.000002 | 1757 | Table_map?? |???????? 2 |??????? 1818 | table_id: 213 (edusoho_e.t1)??? |
| mysql-bin.000002 | 1818 | Delete_rows |???????? 2 |??????? 1880 | table_id: 213 flags: STMT_END_F |
| mysql-bin.000002 | 1880 | Xid???????? |???????? 2 |??????? 1911 | COMMIT /* xid=464 */??????????? |
| mysql-bin.000002 | 1911 | Query?????? |???????? 2 |??????? 1993 | BEGIN?????????????????????????? |
| mysql-bin.000002 | 1993 | Table_map?? |???????? 2 |??????? 2054 | table_id: 213 (edusoho_e.t1)??? |
| mysql-bin.000002 | 2054 | Update_rows |???????? 2 |??????? 2145 | table_id: 213 flags: STMT_END_F |
| mysql-bin.000002 | 2145 | Xid???????? |???????? 2 |??????? 2176 | COMMIT /* xid=473 */??????????? |
+------------------+------+-------------+-----------+-------------+---------------------------------+


在Slave跳過第一個Update_rows event復制報錯:

mysql> set global sql_slave_skip_counter=1;
mysql> start slave sql_thread;
mysql> show slave status\G;
Slave_IO_Running: Yes
Slave_SQL_Running: No
Exec_Master_Log_Pos: 1158
Last_SQL_Errno: 1032
Last_SQL_Error: Could not execute Delete_rows event on table edusoho_e.t1; Can't find record in 't1', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000002, end_log_pos 1407


成功跳過第一個events group


在Slave繼續(xù)跳過第二個Delete_rows event復制報錯:

mysql> set global sql_slave_skip_counter=1;
mysql> start slave sql_thread;
mysql> show slave status\G;
*************************** 1. row ***************************
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Last_SQL_Errno: 0
Last_SQL_Error:


成功跳過第二個events group


注意:

雖然主從復制出現(xiàn)的故障成功跳過了,但只是暫時恢復了正常的主從復制狀態(tài),需要盡快的對Slave缺失的數(shù)據(jù)進行補齊,不然Master對Slave不存在的數(shù)據(jù)做的變更,仍然會重復導致主從復制故障,筆者覺得如果你的數(shù)據(jù)量差異不是太大的話,可以考慮使用pt-table-checksum和pt-table-sync工具進行恢復,如果你的數(shù)據(jù)量很大且數(shù)據(jù)差異很多,還是建議重做Slave較好,因為使用工具會鎖表,會對線上業(yè)務造成一定的影響,具體情況,請自行考量。


跳過由多條SQL(event)組成的事務:

在Slave主機上人為的刪除一條數(shù)據(jù):

DELETE FROM `edusoho_e`.`t1` WHERE `id` = '7';


在Master主機上產生一個由多條SQL組成的事務:

BEGIN;
DELETE FROM `edusoho_e`.`t1` WHERE `id` = '7';
INSERT INTO `edusoho_e`.`t1` (`xname`, `address`, `hobby`) VALUES ('懶死', '不知道', '吃了睡睡了吃');
COMMIT;


因為Slave主機上已經刪除id=7的數(shù)據(jù),在Slave查看主從復制狀態(tài)時,就會發(fā)現(xiàn)報錯信息:

mysql> show slave status\G;
*************************** 1. row ***************************
Read_Master_Log_Pos: 7219
Exec_Master_Log_Pos: 6840
Slave_IO_Running: Yes
Slave_SQL_Running: No
Last_Errno: 1032
Last_Error: Could not execute Delete_rows event on table edusoho_e.t1; Can't find record in 't1', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000002, end_log_pos 7049


在Master主機上查看position做了什么操作:

mysql> show binlog events in 'mysql-bin.000002' from 6840;
+------------------+------+-------------+-----------+-------------+---------------------------------+
| Log_name???????? | Pos? | Event_type? | Server_id | End_log_pos | Info??????????????????????????? |
+------------------+------+-------------+-----------+-------------+---------------------------------+
| mysql-bin.000002 | 6840 | Query?????? |???????? 2 |??????? 6922 | BEGIN?????????????????????????? |
| mysql-bin.000002 | 6922 | Table_map?? |???????? 2 |??????? 6983 | table_id: 213 (edusoho_e.t1)??? |
| mysql-bin.000002 | 6983 | Delete_rows |???????? 2 |??????? 7049 | table_id: 213 flags: STMT_END_F |
| mysql-bin.000002 | 7049 | Table_map?? |???????? 2 |??????? 7110 | table_id: 213 (edusoho_e.t1)??? |
| mysql-bin.000002 | 7110 | Write_rows? |???????? 2 |??????? 7188 | table_id: 213 flags: STMT_END_F |
| mysql-bin.000002 | 7188 | Xid???????? |???????? 2 |??????? 7219 | COMMIT /* xid=825 */??????????? |
+------------------+------+-------------+-----------+-------------+---------------------------------+


可以看到,這個事務是由兩個SQL(event)組成的


如果使用 sql_slave_skip_counter=N 跳過由多條SQL組成的事務會怎樣呢?

mysql> set global sql_slave_skip_counter=1;
mysql> start slave sql_thread;
mysql> show slave status\G;
*************************** 1. row ***************************
Read_Master_Log_Pos: 7219
Exec_Master_Log_Pos: 7219
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Last_Errno: 0
Last_Error:


發(fā)現(xiàn)問題沒有,在使用sql_slave_skip_counter跳過由多條SQL(event)組成的事務時,從在Master上執(zhí)行的 show binlog events 可以看到,如果只是跳過出報錯SQL語句,那么 Exec_Master_Log_Pos 值應該為7110,但是現(xiàn)在為7219,說明將整個event group跳過了,但是7110的SQL數(shù)據(jù)是我們需要的,所以,和單條SQL組成的事務一樣,主從復制狀態(tài)雖然恢復,但是數(shù)據(jù)仍處于不一致狀態(tài),要抓緊時間補齊數(shù)據(jù)或重做Slave


2、由多條SQL(event)組成的事務時,僅跳過一個event,而不是一個event group:

在Slave主機上人為的刪除一條數(shù)據(jù):

DELETE FROM `edusoho_e`.`t1` WHERE `id` = '17';


在Master主機上產生一個由多條SQL組成的事務:

BEGIN;
DELETE FROM `edusoho_e`.`t1` WHERE `id` = '17';
INSERT INTO `edusoho_e`.`t1` (`xname`, `address`, `hobby`) VALUES ('我是誰', '不知道', '吃了睡睡了吃');
COMMIT;


因為Slave主機上已經刪除id=17的數(shù)據(jù),在Slave查看主從復制狀態(tài)時,就會發(fā)現(xiàn)報錯信息:

Exec_Master_Log_Pos: 120

Slave_IO_Running: Yes
Slave_SQL_Running: No

Last_Errno: 1032
Last_Error: Could not execute Delete_rows event on table edusoho_e.t1; Can't find record in 't1', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000004, end_log_pos 341


在Master主機上查看position做了什么操作:

mysqlbinlog -v --base64-output=decode --start-position=120 mysql-bin.000004
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 120
#190507 13:52:05 server id 2? end_log_pos 202 CRC32 0x0ca0c280 ?? ?Query?? ?thread_id=3?? ?exec_time=0?? ?error_code=0
SET TIMESTAMP=1557208325/*!*/;
SET @@session.pseudo_thread_id=3/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=1073741824/*!*/;
SET @@session.auto_increment_increment=2, @@session.auto_increment_offset=1/*!*/;
/*!\C utf8 *//*!*/;
SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=33/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
BEGIN
/*!*/;
# at 202
#190507 13:52:05 server id 2? end_log_pos 263 CRC32 0x20d2e89d ?? ?Table_map: `edusoho_e`.`t1` mapped to number 216
# at 263
#190507 13:52:05 server id 2? end_log_pos 341 CRC32 0xbec6fd45 ?? ?Delete_rows: table id 216 flags: STMT_END_F
### DELETE FROM `edusoho_e`.`t1`
### WHERE
###?? @1=17
###?? @2='懶死'
###?? @3='不知道'
###?? @4=1
###?? @5='吃了睡睡了吃'
###?? @6=18

# at 341
#190507 13:52:05 server id 2? end_log_pos 402 CRC32 0xa37bc5c9 ?? ?Table_map: `edusoho_e`.`t1` mapped to number 216
# at 402
#190507 13:52:05 server id 2? end_log_pos 483 CRC32 0x0d774707 ?? ?Write_rows: table id 216 flags: STMT_END_F
### INSERT INTO `edusoho_e`.`t1`
### SET
###?? @1=21
###?? @2='我是誰'
###?? @3='不知道'
###?? @4=1
###?? @5='吃了睡睡了吃'
###?? @6=18
# at 483
#190507 13:52:05 server id 2? end_log_pos 514 CRC32 0x8c333b30 ?? ?Xid = 411
COMMIT/*!*/;
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;

可以看到,綠色的部分就是我們需要跳過的,而第二個event是需要我們保留的

這個時候,就需要用到slave_exec_mode這個變量了,至于slave_exec_mode詳細介紹,還是請參考MySQL官網資料


mysql> set global slave_exec_mode='IDEMPOTENT';
mysql> start slave sql_thread;
mysql> show slave status\G;
*************************** 1. row ***************************

Slave_IO_Running: Yes
Slave_SQL_Running: Yes

Exec_Master_Log_Pos: 514


去Slave上edusoho_e.t1表上查看,數(shù)據(jù)id=21的數(shù)據(jù)已經過去了,此時,數(shù)據(jù)處于一致性狀態(tài)


3、跳過主鍵沖突1062錯誤(Duplicate entry):

在Slave主鍵上先插入一條id值:

INSERT INTO `edusoho_e`.`t1` (`id`,`xname`, `address`, `hobby`, `age`) VALUES (19,'小玩子', '明朝', '皇后', '25');


因為Slave已經占用了Master要自動產生的主鍵值id=19,所以Slave主機會報錯:

INSERT INTO `edusoho_e`.`t1` (`id`,`xname`, `address`, `hobby`, `age`) VALUES (19,'朱棣', '明朝', '皇帝', '36');?


查看Slave主從復制狀態(tài)發(fā)現(xiàn)已經發(fā)生了主從復制報錯:

mysql> show slave status\G;
*************************** 1. row ***************************
Slave_IO_Running: Yes
Slave_SQL_Running: No
Last_Errno: 1062
Last_Error: Could not execute Write_rows event on table edusoho_e.t1; Duplicate entry '19' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log mysql-bin.000002, end_log_pos 7425
Exec_Master_Log_Pos: 7219


查看Master binlog:

mysql> show binlog events in 'mysql-bin.000002' from 7219;
+------------------+------+------------+-----------+-------------+---------------------------------+
| Log_name???????? | Pos? | Event_type | Server_id | End_log_pos | Info??????????????????????????? |
+------------------+------+------------+-----------+-------------+---------------------------------+
| mysql-bin.000002 | 7219 | Query????? |???????? 2 |??????? 7301 | BEGIN?????????????????????????? |
| mysql-bin.000002 | 7301 | Table_map? |???????? 2 |??????? 7362 | table_id: 213 (edusoho_e.t1)??? |
| mysql-bin.000002 | 7362 | Write_rows |???????? 2 |??????? 7425 | table_id: 213 flags: STMT_END_F |
| mysql-bin.000002 | 7425 | Xid??????? |???????? 2 |??????? 7456 | COMMIT /* xid=893 */??????????? |
+------------------+------+------------+-----------+-------------+---------------------------------+


思考:

因為Slave這條數(shù)據(jù)已經存在,如果在Slave主機上把這條數(shù)據(jù)刪除了,Slave會不會直接同步過來?(答案是:不會。需要重啟Slave thread):

DELETE FROM `edusoho_e`.`t1` WHERE `id` = '19';
mysql> stop slave;
mysql> start slave user='repliter' password='123456';

驗證的時候,發(fā)現(xiàn)數(shù)據(jù)已經同步過去了


題外:

以上是筆者對于單條SQL組成的事務、多條SQL組成的事務,及在這些單/多條SQL組成的事務下,人為設置的1032和1062復制錯誤和解決方法,還有sql_slave_skip_counter和slave_exec_mode各自的用法和跳過的范圍,當然了,筆者呢,做的只是線上應用前的部署測試,并沒有經過任何的實戰(zhàn)檢測。一方面,僅為廣大同行做個參考;另一方面,記錄筆者自己的心得和針對問題解決的思路做個總結,當問題真正發(fā)生的時候,有個方向可以進行參考,而不至于手忙腳亂,不知所措,所以,對其中有誤之處和理解不到位的地方,望請下方留言指正,不勝感激!

還有,筆者做的,只是針對事務表,做的sql_slave_skip_counter和slave_exec_mode測試,對于非事務表,sql_slave_skip_counter和slave_exec_mode用途會稍有不同,請自行百度吧





向AI問一下細節(jié)

免責聲明:本站發(fā)布的內容(圖片、視頻和文字)以原創(chuàng)、轉載和分享為主,文章觀點不代表本網站立場,如果涉及侵權請聯(lián)系站長郵箱:is@yisu.com進行舉報,并提供相關證據(jù),一經查實,將立刻刪除涉嫌侵權內容。

AI