您好,登錄后才能下訂單哦!
本文主要給大家介紹MySQL主從復(fù)制出現(xiàn)報(bào)錯(cuò)Errno1205解決方案,文章內(nèi)容都是筆者用心摘選和編輯的,具有一定的針對性,對大家的參考意義還是比較大的,下面跟筆者一起了解下MySQL主從復(fù)制出現(xiàn)報(bào)錯(cuò)Errno1205解決方案吧。
nagios報(bào)警如下大致意思就是
14:39分 vip發(fā)生了漂移
**** Nagios ***** Notification Type: PROBLEM Service: check_ha_mysqld Host: prod-bjuc-mysql1-vip Address: 192.168.87.74 State: CRITICAL Date/Time: Wed Mar 28 14:39:27 CST 2018 Additional Info: CRITICAL: HA Failover from 192.168.87.72 to 192.168.87.123, 192.168.87.123 result:PROCS OK: 1 processes with command name /usr/sbin/mysqld, args VIP=192.168.87.74, 192.168.87.72:PROCS OK: 1 processes with command name /usr/sbin/mysqld
14:46分主從復(fù)制中斷
***** Nagios ***** Notification Type: PROBLEM Service: check_mysql_health_slave-sql-running Host: bjuc-mysql1 Address: 192.168.87.72 State: CRITICAL Date/Time: Wed Mar 28 14:46:06 CST 2018 Additional Info: CRITICAL - Slave sql is not running
登錄數(shù)據(jù)庫云服務(wù)器查看主從復(fù)制的狀態(tài),發(fā)現(xiàn)報(bào)錯(cuò)1205錯(cuò)
mysql> show slave status\G *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 192.168.87.123 Master_User: zhu Master_Port: 3306 Connect_Retry: 60 Master_Log_File: mysql-bin.000481 Read_Master_Log_Pos: 218802975 Relay_Log_File: mysql-relay-bin.000428 Relay_Log_Pos: 2428110 Relay_Master_Log_File: mysql-bin.000481 Slave_IO_Running: Yes Slave_SQL_Running: No Replicate_Do_DB: Replicate_Ignore_DB: performance_schema,information_schema,test,mysql Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: %.% Replicate_Wild_Ignore_Table: Last_Errno: 1205 Last_Error: Error 'Lock wait timeout exceeded; try restarting transaction' on query. Default database: 'statusnet'. Query: 'UPDATE notice_status_new set valid = 0 WHERE conversation = 384667 AND user_id IN(4337475,4337529,4337471,4337365)' Skip_Counter: 0 Exec_Master_Log_Pos: 182088298 Relay_Log_Space: 39142986 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: NULL Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 1205 Last_SQL_Error: Error 'Lock wait timeout exceeded; try restarting transaction' on query. Default database: 'statusnet'. Query: 'UPDATE notice_status_new set valid = 0 WHERE conversation = 384667 AND user_id IN(4337475,4337529,4337471,4337365)' Replicate_Ignore_Server_Ids: Master_Server_Id: 6 1 row in set (0.00 sec)
查看錯(cuò)誤日志有一下報(bào)錯(cuò)信息:
錯(cuò)誤日志中給出了報(bào)錯(cuò)原因和解決方法
tail -70 error.log
180328 14:36:05 [ERROR] Slave SQL: Error 'Lock wait timeout exceeded; try restarting transaction' on query. Default database: 'statusnet'. Query: 'UPDATE notice_status_new set valid = 0 WHERE conversation = 384667 AND user_id IN(4337475,4337529,4337471,4337365)', Error_code: 1205 180328 14:36:21 [ERROR] Slave SQL: Error 'Lock wait timeout exceeded; try restarting transaction' on query. Default database: 'statusnet'. Query: 'UPDATE notice_status_new set valid = 0 WHERE conversation = 384667 AND user_id IN(4337475,4337529,4337471,4337365)', Error_code: 1205 180328 14:36:38 [ERROR] Slave SQL: Error 'Lock wait timeout exceeded; try restarting transaction' on query. Default database: 'statusnet'. Query: 'UPDATE notice_status_new set valid = 0 WHERE conversation = 384667 AND user_id IN(4337475,4337529,4337471,4337365)', Error_code: 1205 180328 14:36:56 [ERROR] Slave SQL: Error 'Lock wait timeout exceeded; try restarting transaction' on query. Default database: 'statusnet'. Query: 'UPDATE notice_status_new set valid = 0 WHERE conversation = 384667 AND user_id IN(4337475,4337529,4337471,4337365)', Error_code: 1205 180328 14:37:15 [ERROR] Slave SQL: Error 'Lock wait timeout exceeded; try restarting transaction' on query. Default database: 'statusnet'. Query: 'UPDATE notice_status_new set valid = 0 WHERE conversation = 384667 AND user_id IN(4337475,4337529,4337471,4337365)', Error_code: 1205 180328 14:37:35 [ERROR] Slave SQL: Error 'Lock wait timeout exceeded; try restarting transaction' on query. Default database: 'statusnet'. Query: 'UPDATE notice_status_new set valid = 0 WHERE conversation = 384667 AND user_id IN(4337475,4337529,4337471,4337365)', Error_code: 1205 180328 14:37:56 [ERROR] Slave SQL: Error 'Lock wait timeout exceeded; try restarting transaction' on query. Default database: 'statusnet'. Query: 'UPDATE notice_status_new set valid = 0 WHERE conversation = 384667 AND user_id IN(4337475,4337529,4337471,4337365)', Error_code: 1205 180328 14:38:17 [ERROR] Slave SQL: Error 'Lock wait timeout exceeded; try restarting transaction' on query. Default database: 'statusnet'. Query: 'UPDATE notice_status_new set valid = 0 WHERE conversation = 384667 AND user_id IN(4337475,4337529,4337471,4337365)', Error_code: 1205 180328 14:38:38 [ERROR] Slave SQL: Error 'Lock wait timeout exceeded; try restarting transaction' on query. Default database: 'statusnet'. Query: 'UPDATE notice_status_new set valid = 0 WHERE conversation = 384667 AND user_id IN(4337475,4337529,4337471,4337365)', Error_code: 1205 180328 14:38:59 [ERROR] Slave SQL: Error 'Lock wait timeout exceeded; try restarting transaction' on query. Default database: 'statusnet'. Query: 'UPDATE notice_status_new set valid = 0 WHERE conversation = 384667 AND user_id IN(4337475,4337529,4337471,4337365)', Error_code: 1205 180328 14:39:20 [ERROR] Slave SQL: Error 'Lock wait timeout exceeded; try restarting transaction' on query. Default database: 'statusnet'. Query: 'UPDATE notice_status_new set valid = 0 WHERE conversation = 384667 AND user_id IN(4337475,4337529,4337471,4337365)', Error_code: 1205 180328 14:39:20 [ERROR] Slave SQL thread retried transaction 10 time(s) in vain, giving up. Consider raising the value of the slave_transaction_retries variable. 180328 14:39:20 [Warning] Slave: Lock wait timeout exceeded; try restarting transaction Error_code: 1205 180328 14:39:20 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000481' position 182088298
查看參數(shù)結(jié)合著兩個(gè)參數(shù)可以知道主從復(fù)制報(bào)錯(cuò)的原因?yàn)椋?/p>
180328 14:39:20 [ERROR] Slave SQL thread retried transaction 10 time(s) in vain, giving up. Consider raising the value of the slave_transaction_retries variable.
180328 14:39:20 [Warning] Slave: Lock wait timeout exceeded; try restarting transaction Error_code: 1205
180328 14:39:20 [ERROR] Error running query, slave SQL thread aborted,Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000481' position 182088298
因?yàn)楸韓otice_status_new產(chǎn)生了鎖,阻塞了這條update語句
參數(shù)innodb_lock_wait_time 設(shè)置的為15秒,單個(gè)事務(wù)在等待15秒后開始報(bào)1025錯(cuò):因?yàn)殒i執(zhí)行超時(shí)并重啟事務(wù)
參數(shù)slave_transaction_retries 設(shè)置的為10次,如果事務(wù)重試次數(shù)超過10次,復(fù)制中斷。(但是錯(cuò)誤日志中嘗試了11次)
mysql> show variables like '%innodb_lock_wait_timeout%'; +--------------------------+-------+ | Variable_name | Value | +--------------------------+-------+ | innodb_lock_wait_timeout | 15 | +--------------------------+-------+ 1 row in set (0.01 sec) mysql> show variables like '%slave_transaction_retries%'; +---------------------------+-------+ | Variable_name | Value | +---------------------------+-------+ | slave_transaction_retries | 10 | +---------------------------+-------+ 1 row in set (0.00 sec)
解決方法:重啟slave后主從復(fù)制恢復(fù),并開始追日志
mysql> stop slave; Query OK, 0 rows affected (0.02 sec) mysql> start slave; Query OK, 0 rows affected (0.00 sec) mysql> show slave status\G *************************** 1. row *************************** Slave_IO_Running: Yes Slave_SQL_Running: Yes 1 row in set (0.00 sec)
vip的問題等到晚上在切回來,
---------------------------------我是分割線--------------------------
這次問題雖然解決了,但是發(fā)現(xiàn)問題我們一定要找到根音,這樣問題才能算是閉環(huán)。
為什么會產(chǎn)生這個(gè)鎖呢,因?yàn)橛涗涙i信息的表都是實(shí)時(shí)的,現(xiàn)在查不出當(dāng)時(shí)的鎖信息,但是從存儲引擎的運(yùn)行狀態(tài)中還是能發(fā)現(xiàn)一些問題,數(shù)據(jù)庫出現(xiàn)了死鎖??!顯然在update這張表一定會等待的!!
發(fā)現(xiàn)了一條奇葩SQL,被鎖的那張表也是notice_status_new
UPDATE notice_status_new SET count = count + 1, push_count = push_count + 1 WHERE conversation = 2609759 AND user_id != 7384662 AND user_id IN(7359498)
show engine innodb status\G 截取死鎖的這部分 ------------------------ LATEST DETECTED DEADLOCK ------------------------ 180328 14:33:52 *** (1) TRANSACTION: TRANSACTION 1B285E9F9, ACTIVE 0 sec starting index read mysql tables in use 1, locked 1 LOCK WAIT 4 lock struct(s), heap size 1248, 3 row lock(s), undo log entries 1 MySQL thread id 3972036, OS thread handle 0x7fc0c3464700, query id 2525495433 192.168.87.42 dstatusnet Updating UPDATE notice_status_new SET count = count + 1, push_count = push_count + 1 WHERE conversation = 2609759 AND user_id != 7384662 AND user_id IN(7359498) *** (1) WAITING FOR THIS LOCK TO BE GRANTED: RECORD LOCKS space id 3535 page no 65360 n bits 448 index `conv_user` of table `statusnet`.`notice_status_new` trx id 1B285E9F9 lock_mode X locks rec but not gap waiting Record lock, heap no 241 PHYSICAL RECORD: n_fields 3; compact format; info bits 0 0: len 8; hex 000000000027d25f; asc ' _;; 1: len 4; hex 80704c0a; asc pL ;; 2: len 4; hex 80b1a0fe; asc ;; *** (2) TRANSACTION: TRANSACTION 1B285E9FD, ACTIVE 0 sec starting index read, thread declared inside InnoDB 500 mysql tables in use 1, locked 1 4 lock struct(s), heap size 1248, 3 row lock(s), undo log entries 1 MySQL thread id 3867908, OS thread handle 0x7fbfc6d4f700, query id 2525495437 192.168.87.42 dstatusnet Updating UPDATE notice_status_new SET count = count + 1, push_count = push_count + 1 WHERE conversation = 2609759 AND user_id != 7359498 AND user_id IN(7384662) *** (2) HOLDS THE LOCK(S): RECORD LOCKS space id 3535 page no 65360 n bits 448 index `conv_user` of table `statusnet`.`notice_status_new` trx id 1B285E9FD lock_mode X locks rec but not gap Record lock, heap no 241 PHYSICAL RECORD: n_fields 3; compact format; info bits 0 0: len 8; hex 000000000027d25f; asc ' _;; 1: len 4; hex 80704c0a; asc pL ;; 2: len 4; hex 80b1a0fe; asc ;; *** (2) WAITING FOR THIS LOCK TO BE GRANTED: RECORD LOCKS space id 3535 page no 65360 n bits 448 index `conv_user` of table `statusnet`.`notice_status_new` trx id 1B285E9FD lock_mode X locks rec but not gap waiting Record lock, heap no 242 PHYSICAL RECORD: n_fields 3; compact format; info bits 0 0: len 8; hex 000000000027d25f; asc ' _;; 1: len 4; hex 8070ae56; asc p V;; 2: len 4; hex 80b1a0fc; asc ;; *** WE ROLL BACK TRANSACTION (2)
問題找到了,找對該業(yè)務(wù)的開發(fā),溝通后決定在周六上線新版本,徹底解決問題。
可以寫一個(gè)實(shí)時(shí)監(jiān)控鎖的腳本,把內(nèi)容輸出打印,這樣方便以后定位問題!
-----------------------------------------------
后面想了想可能是由于vip漂移導(dǎo)致的主從復(fù)制報(bào)錯(cuò)
線上架構(gòu)為MM+heartbeat
vip漂移的原因待定,vip漂移后,數(shù)據(jù)庫開始在192.168.87.123云服務(wù)器上寫入數(shù)據(jù)庫,72數(shù)據(jù)庫作為備份庫,但是還會同步123數(shù)據(jù)庫的數(shù)據(jù),由于192.168.87.72數(shù)據(jù)庫表notice_status_new還存在死鎖,所以在slave SQL thread應(yīng)用來自123云服務(wù)器的中繼日志時(shí),執(zhí)行到該SQL時(shí),發(fā)生了鎖等待,等待15秒后超時(shí)并重新嘗試該事務(wù),嘗試10后,還是在鎖表狀態(tài),slave SQL thread 線程停止工作,主從同步中斷。
----------------------------------------
3月29日和開發(fā)復(fù)盤
因?yàn)閳?bào)警時(shí)間不是很準(zhǔn)確,實(shí)際時(shí)間還是以heartbeat日志和mysql日志為準(zhǔn),通過查看日志確定是heartbeat首先發(fā)生漂移,然后才是主從復(fù)制報(bào)錯(cuò)
主要是討論的問題有兩個(gè)
① Heartbeat為什么會發(fā)生漂移
通過查看heartbeat日志,72和123存在心跳檢測,在72ping123的時(shí)候發(fā)生了丟包,
在123ping72的時(shí)候正常,72節(jié)點(diǎn)以為自己死掉?在14:35:26發(fā)生vip漂移,從72漂移到123。
Mar 28 14:35:26 bjuc-mysql1 pengine: [14176]: notice: LogActions: Move VIP_192_168_87_74 (Started bjuc-mysql1 -> prod-uc-yewu-db-slave2)
② 為什么會出現(xiàn)死鎖
從存儲引擎的運(yùn)行狀態(tài)中查看,如果這兩條SQL的請求同時(shí)到達(dá)數(shù)據(jù)庫,因?yàn)閕nnodb支持到行鎖的粒度,第二條SQL會鎖住很多行,如果和第一條SQL中鎖住有重復(fù)的,就會發(fā)生相互等待的狀況,所以出現(xiàn)死鎖。
這兩條SQL的where條件如下:
user_id IN(6032694,5824664,4344233,5450840,6032678,4951596,6032668,4344453,6038786,4344347,6032676,5754768);
user_id != 6185996
這兩個(gè)條件可能發(fā)生邏輯上的沖突,然后導(dǎo)致死鎖
看完以上關(guān)于MySQL主從復(fù)制出現(xiàn)報(bào)錯(cuò)Errno1205解決方案,很多讀者朋友肯定多少有一定的了解,如需獲取更多的行業(yè)知識信息 ,可以持續(xù)關(guān)注我們的行業(yè)資訊欄目的。
免責(zé)聲明:本站發(fā)布的內(nèi)容(圖片、視頻和文字)以原創(chuàng)、轉(zhuǎn)載和分享為主,文章觀點(diǎn)不代表本網(wǎng)站立場,如果涉及侵權(quán)請聯(lián)系站長郵箱:is@yisu.com進(jìn)行舉報(bào),并提供相關(guān)證據(jù),一經(jīng)查實(shí),將立刻刪除涉嫌侵權(quán)內(nèi)容。