您好,登錄后才能下訂單哦!
最近我們遇到一個(gè)MySQL的問題,分析后很有代表意義,特地寫出來供大家參考。
出現(xiàn)問題是,數(shù)據(jù)庫先是被置為只讀,然后過了一段時(shí)間,MySQL直接Crash掉了
發(fā)生Crash時(shí)MySQL的error日志中打印了以下內(nèi)容:
- ----------
- SEMAPHORES
- ----------
- OS WAIT ARRAY INFO: reservation count 1246555
- --Thread 140363572082432 has waited at row0upd.cc line 2354 for 253.00 seconds the semaphore:
- X-lock (wait_ex) on RW-latch at 0x7fa949340740 created in file buf0buf.cc line 1069
- a writer (thread id 140363572082432) has reserved it in mode wait exclusive
- number of readers 1, waiters flag 1, lock_word: ffffffffffffffff
- Last time read locked in file btr0sea.cc line 931
- Last time write locked in file /export/home/pb2/build/sb_0-17068951-1447697721.44/mysql-5.6.28/storage/innobase/row/row0upd.cc line 2354
根據(jù)日志中我們可以看到,線程140363572082432要對(duì)記錄上一個(gè)X鎖,但是等待0x7fa949340740線程的RW-latch的釋放
我們?cè)谙蛳驴床樵兊饺缦滦畔?涉及到用戶信息 謂詞就用xxx代替):
- 173 lock struct(s), heap size 30248, 7925 row lock(s), undo log entries 7924
- MySQL thread id 5709783, OS thread handle 0x7fa8f0da7700, query id 92213034 10.23.163.54 citicqyh updating
- update TB_DEPARTMENT_INFO set TOTAL_USER=1 where ID='ac84f17e-82d3-4519-a1da-0d5a5a835d44'
- ---TRANSACTION 53065242, ACTIVE 313 sec fetching rows, thread declared inside InnoDB 2081
- mysql tables in use 2, locked 0
- MySQL thread id 5428690, OS thread handle 0x7fa8f0136700, query id 92213061 10.23.163.55 citicqyh Sending data
- SELECT COUNT(DISTINCT r.user_id) FROM TB_DEPARTMENT_INFO d left join tb_qy_user_department_ref r on r.department_id = d.id WHERE d.org_id = 'xxx' AND (d.dept_full_name LIKE 'xxx%' or d.dept_full_name = 'xxx
- Trx read view will not see trx with id >= xxx, sees < xxx
- ......還有很多select語句省略
根據(jù)上面信息我們?nèi)?shù)據(jù)庫中查看了這些select語句,發(fā)現(xiàn)執(zhí)行計(jì)劃都是全表掃描。
首先數(shù)據(jù)庫變成了只讀,最后數(shù)據(jù)庫crash了,crash輸出的信息如下:
- ----------------------------
- END OF INNODB MONITOR OUTPUT
- ============================
- InnoDB: ###### Diagnostic info printed to the standard error stream
- InnoDB: Error: semaphore wait has lasted > 600 seconds
- InnoDB: We intentionally crash the server, because it appears to be hung.
- 7fa8f9580700 InnoDB: Assertion failure in thread 140363714529024 in file srv0srv.cc line 1754
- InnoDB: We intentionally generate a memory trap.
- InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
- InnoDB: If you get repeated assertion failures or crashes, even
- InnoDB: immediately after the mysqld startup, there may be
- InnoDB: corruption in the InnoDB tablespace. Please refer to
- InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
- InnoDB: about forcing recovery.
InnoDB: Error: semaphore wait has lasted > 600 seconds 提示600秒沒有響應(yīng) 數(shù)據(jù)庫選擇了Crash 強(qiáng)制重啟
從報(bào)錯(cuò)信息來看:
這里首先需要補(bǔ)充一下Latch的概念:Latch在MySQL中是用于保護(hù)高速緩沖區(qū)中共享數(shù)據(jù)的,舉個(gè)例子:
當(dāng)我們執(zhí)行select時(shí),數(shù)據(jù)是緩存在buffer pool中的,多個(gè)線程并發(fā)訪問或者修改這個(gè)數(shù)據(jù)必然需要一個(gè)并發(fā)控制機(jī)制,這個(gè)就是Latch
大家知道,數(shù)據(jù)庫要訪問的數(shù)據(jù)都必須先存在緩存中,而緩存一般比磁盤空間要小,數(shù)據(jù)緩存使用hash表來記錄數(shù)據(jù)頁是否在內(nèi)存中。在Oracle中的并發(fā)控制比較精細(xì):首先會(huì)對(duì)hash桶加latch,并根據(jù)hash桶查找對(duì)應(yīng)的數(shù)據(jù)并加上pin,然后釋放Latch。而MySQL相對(duì)沒有控制得這么精細(xì),對(duì)應(yīng)的RW-Latch在errlog中說的很清楚,該RW-Latch是在buf0buf.cc的1069行創(chuàng)建的
RW-latch at 0x7fa949340740 created in file buf0buf.cc line 1069
對(duì)應(yīng)的代碼摘錄如下:
- rw_lock_create(PFS_NOT_INSTRUMENTED, &block->lock, SYNC_LEVEL_VARYING);
跟蹤源碼,知道這個(gè)Latch是MySQL在數(shù)據(jù)庫啟動(dòng),初始化 innodb_buffer_pool時(shí),將Latch創(chuàng)建好的。對(duì)應(yīng)的函數(shù)調(diào)用過·程:
- buf_pool_init_instance()->buf_chunk_init()->buf_block_init()
正是由于這個(gè)RW-Latch被長(zhǎng)時(shí)間占用了,其他的線程一直競(jìng)爭(zhēng)不到,才導(dǎo)致了這個(gè)問題
這類問題的發(fā)生多數(shù)都是因?yàn)镾QL寫的不好,在表上面進(jìn)行了大量的全表掃描占用了大量的Latch,解決方案就是避免SQL長(zhǎng)時(shí)間占用latch:
免責(zé)聲明:本站發(fā)布的內(nèi)容(圖片、視頻和文字)以原創(chuàng)、轉(zhuǎn)載和分享為主,文章觀點(diǎn)不代表本網(wǎng)站立場(chǎng),如果涉及侵權(quán)請(qǐng)聯(lián)系站長(zhǎng)郵箱:is@yisu.com進(jìn)行舉報(bào),并提供相關(guān)證據(jù),一經(jīng)查實(shí),將立刻刪除涉嫌侵權(quán)內(nèi)容。