您好,登錄后才能下訂單哦!
下文給大家?guī)?lái)關(guān)于keepalived+MHA應(yīng)該如何實(shí)現(xiàn)mysql主從高可用集群,感興趣的話就一起來(lái)看看這篇文章吧,相信看完keepalived+MHA應(yīng)該如何實(shí)現(xiàn)mysql主從高可用集群對(duì)大家多少有點(diǎn)幫助吧。
一 原理分析
1 MHA簡(jiǎn)介:
MHA(Master High Availability)目前在MySQL高可用方面是一個(gè)相對(duì)成熟的解決方案,它由日本DeNA公司youshimaton(現(xiàn)就職于Facebook公司)開發(fā),是一套優(yōu)秀的作為MySQL高可用性環(huán)境下故障切換和主從提升的高可用軟件。在MySQL故障切換過(guò)程中,MHA能做到在0~30秒之內(nèi)自動(dòng)完成數(shù)據(jù)庫(kù)的故障切換操作,并且在進(jìn)行故障切換的過(guò)程中,MHA能在最大程度上保證數(shù)據(jù)的一致性,以達(dá)到真正意義上的高可用。
2 MHA組成:
該軟件由兩部分組成:MHA Manager(管理節(jié)點(diǎn))和MHA Node(數(shù)據(jù)節(jié)點(diǎn))。MHA Manager可以單獨(dú)部署在一臺(tái)獨(dú)立的機(jī)器上管理多個(gè)master-slave集群,也可以部署在一臺(tái)slave節(jié)點(diǎn)上。MHA Node運(yùn)行在每臺(tái)MySQL云服務(wù)器上,MHA Manager會(huì)定時(shí)探測(cè)集群中的master節(jié)點(diǎn),當(dāng)master出現(xiàn)故障時(shí),它可以自動(dòng)將最新數(shù)據(jù)的slave提升為新的master,然后將所有其他的slave重新指向新的master。整個(gè)故障轉(zhuǎn)移過(guò)程對(duì)應(yīng)用程序完全透明。
Manager工具包主要包括以下幾個(gè)工具:
masterha_check_ssh 檢查MHA的SSH配置狀況 masterha_check_repl 檢查MySQL復(fù)制狀況 masterha_manger 啟動(dòng)MHA masterha_check_status 檢測(cè)當(dāng)前MHA運(yùn)行狀態(tài) masterha_master_monitor 檢測(cè)master是否宕機(jī) masterha_master_switch 控制故障轉(zhuǎn)移(自動(dòng)或者手動(dòng)) masterha_conf_host 添加或刪除配置的server信息
Node工具包(這些工具通常由MHA Manager的腳本觸發(fā),無(wú)需人為操作)主要包括以下幾個(gè)工具:
save_binary_logs 保存和復(fù)制master的二進(jìn)制日志 apply_diff_relay_logs 識(shí)別差異的中繼日志事件并將其差異的事件應(yīng)用于其他的slave filter_mysqlbinlog 去除不必要的ROLLBACK事件(MHA已不再使用這個(gè)工具) purge_relay_logs 清除中繼日志(不會(huì)阻塞SQL線程)
3 MHA工作原理:
在MHA自動(dòng)故障切換過(guò)程中,MHA試圖從宕機(jī)的主云服務(wù)器上保存二進(jìn)制日志,最大程度的保證數(shù)據(jù)的不丟失,但這并不總是可行的。例如,如果主云服務(wù)器硬件故障或無(wú)法通過(guò)ssh訪問(wèn),MHA沒(méi)法保存二進(jìn)制日志,只進(jìn)行故障轉(zhuǎn)移而丟失了最新的數(shù)據(jù)。使用MySQL 5.5的半同步復(fù)制,可以大大降低數(shù)據(jù)丟失的風(fēng)險(xiǎn)。MHA可以與半同步復(fù)制結(jié)合起來(lái)。如果只有一個(gè)slave已經(jīng)收到了最新的二進(jìn)制日志,MHA可以將最新的二進(jìn)制日志應(yīng)用于其他所有的slave云服務(wù)器上,因此可以保證所有節(jié)點(diǎn)的數(shù)據(jù)一致性。
目前MHA主要支持一主多從的架構(gòu),要搭建MHA,要求一個(gè)復(fù)制集群中必須最少有三臺(tái)數(shù)據(jù)庫(kù)云服務(wù)器,一主二從,即一臺(tái)充當(dāng)master,一臺(tái)充當(dāng)備用master,另外一臺(tái)充當(dāng)從庫(kù),因?yàn)橹辽傩枰_(tái)云服務(wù)器,出于機(jī)器成本的考慮,淘寶也在該基礎(chǔ)上進(jìn)行了改造,目前淘寶TMHA已經(jīng)支持一主一從。我們自己使用其實(shí)也可以使用1主1從,但是master主機(jī)宕機(jī)后無(wú)法切換,以及無(wú)法補(bǔ)全binlog。master的mysqld進(jìn)程crash后,還是可以切換成功,以及補(bǔ)全binlog的。其結(jié)構(gòu)如下:
官方地址:https://code.google.com/p/mysql-master-ha/
二 實(shí)驗(yàn)環(huán)境準(zhǔn)備
1 系統(tǒng)版本
統(tǒng)一版本,統(tǒng)一規(guī)范,這是將來(lái)能夠自動(dòng)戶運(yùn)維的前提。
[root@vin ~]# cat /etc/redhat-release CentOS Linux release 7.3.1611 (Core)
2 內(nèi)核參數(shù)
[root@vin ~]# uname -r 3.10.0-514.el7.x86_64
3 主機(jī)配置參數(shù):準(zhǔn)備4臺(tái)干凈的主機(jī)node{1,2,3,4}
互相能夠解析主機(jī)名,由于節(jié)點(diǎn)上配置文件,很多都是大體相同的,只需要修改一份讓后使用for循環(huán)復(fù)制給其他節(jié)點(diǎn)即可,簡(jiǎn)單方便,所以這里實(shí)現(xiàn)主機(jī)名的認(rèn)證。
角色 ip地址 主機(jī)名 server_id 類型 MHA-Manager 172.18.253.73 node1 - 監(jiān)控復(fù)制組 Master 172.18.250.27 node2 1 寫入 Candicate master 172.18.253.160 node3 2 讀 Slave 172.18.254.15 node4 3 讀
4 實(shí)現(xiàn)互相能夠解析主機(jī)名
[root@vin ~]# cat /etc/hosts 172.18.253.73 node1 172.18.250.27 node2 172.18.253.160 node3 172.18.254.15 node4
5 實(shí)現(xiàn)主機(jī)間的互相無(wú)密鑰通信
由于使用MHA時(shí),Manager需要驗(yàn)證各個(gè)節(jié)點(diǎn)之間的ssh連通性,所以我們?cè)谶@里需要實(shí)現(xiàn)給節(jié)點(diǎn)之間的無(wú)密鑰通信,這里采用了一個(gè)簡(jiǎn)單的方法,那就是在某個(gè)節(jié)點(diǎn)上生成ssh密鑰對(duì),實(shí)現(xiàn)本主機(jī)的認(rèn)證,然后將認(rèn)證文件以及公私鑰都復(fù)制到其他節(jié)點(diǎn),這樣就不需要,每個(gè)節(jié)點(diǎn)都去創(chuàng)建密鑰對(duì),在實(shí)現(xiàn)認(rèn)證了。
[root@vin ~]# ssh-keygen -t rsa -P '' [root@vin ~]# ssh-copy-id -i ./id_rsa.pub node1: [root@vin ~]# for i in {2..4};do scp id_rsa{,.pub} authorized_keys root@node$i:/root/.ssh/;done
三 實(shí)現(xiàn)主從復(fù)制集群
1 Master配置:
修改配置文件
[root@vin ~]# cat /etc/my.cnf.d/server.cnf [server] server_id = 1 # 提供給主節(jié)點(diǎn)一個(gè)server號(hào)碼,可以是任意的整數(shù) log_bin = master-log # 啟用二進(jìn)制日志 relay_log = relay-log # 啟用中繼日志,因?yàn)橹鲗?lái)也會(huì)成為從 innodb_file_per_table = ON # 每個(gè)數(shù)據(jù)表存儲(chǔ)為單個(gè)文件 skip_name_resolve = ON # 關(guān)閉主機(jī)名解析,有助于提升性能 max_connections = 5000 # 最大并發(fā)連接數(shù)
創(chuàng)建具有復(fù)制功能的用戶與用于Manager節(jié)點(diǎn)管理的用戶;
[root@vin ~]# mysql MariaDB [(none)]> show master status\G; *************************** 1. row *************************** File: master-log.000003 Position: 245 Binlog_Do_DB: Binlog_Ignore_DB: MariaDB [(none)]> grant replication slave,replication client on *.* to -> 'vinsent'@'172.18.%.%' identified by 'vinsent'; # 這是一個(gè)語(yǔ)句 MariaDB [(none)]> grant ALL on *.* to 'MhaAdmin'@'172.18.%.%' identified by 'MhaPass'; MariaDB [(none)]> flush privileges;
說(shuō)明:我們應(yīng)該先查看主節(jié)點(diǎn)正在使用的日志文件及對(duì)應(yīng)的POSITION,再創(chuàng)建用戶,以便于從節(jié)點(diǎn)能夠同步擁有這些用戶;創(chuàng)建Manager節(jié)點(diǎn)用于管理的用戶時(shí)需要注意的是,用戶名中的主機(jī)范圍必須能夠囊括其他節(jié)點(diǎn)的地址。
2 Slave{1,2}配置:
兩個(gè)從節(jié)點(diǎn)的配置相同;修改配置文件,以支持主從復(fù)制功能;
[root@vin ~]# cat /etc/my.cnf.d/server.cnf [server] server_id = 2 log_bin = master-log relay_log = relay-log relay_log_purge = OFF # 關(guān)閉中繼日志裁剪功能 read_only = ON # 由于是從節(jié)點(diǎn),故設(shè)置為只讀模式 innodb_file_per_table = ON skip_name_resolve = ON max_connections = 5000
連接至主節(jié)點(diǎn),實(shí)現(xiàn)同步;
[root@vin ~]# mysql MariaDB [(none)]> change master to master_host='172.18.250.27',master_user='vinsent',master_password='vinsent',master_log_file='master-log.000003',master_log_pos=245; MariaDB [(none)]> start slave; MariaDB [(none)]> show slave status\G; *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 172.18.250.27 Master_User: vinsent Master_Port: 3306 Connect_Retry: 60 Master_Log_File: master-log.000003 Read_Master_Log_Pos: 637 Relay_Log_File: relay-log.000002 Relay_Log_Pos: 922 Relay_Master_Log_File: master-log.000003 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 637 Relay_Log_Space: 1210 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: 0 Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 1
說(shuō)明:查看從節(jié)點(diǎn)狀態(tài),確保"Slave_IO_Running","Slave_SQL_Running"的值為"YES",即從節(jié)點(diǎn)正常工作,并且"Last_IO_Errno","Last_SQL_Errno"中沒(méi)有錯(cuò)誤信息提示,出現(xiàn)錯(cuò)誤,一般就是連接性錯(cuò)誤,這說(shuō)明要么用戶創(chuàng)建的有問(wèn)題,要么主從節(jié)點(diǎn)的數(shù)據(jù)不同步,請(qǐng)確保兩者數(shù)據(jù)一致。
測(cè)試一下從節(jié)點(diǎn)是否將主節(jié)點(diǎn)的數(shù)據(jù)同步至本地:
MariaDB [(none)]> select user from mysql.user; +----------+ | user | +----------+ | root | | MhaAdmin | | vinsent | | root | | | | root | | | | root | +----------+
四 安裝MHA包
除了源碼包,MHA官方也提供了rpm格式的程序包,其下載地址為http://code.google.com/p/mysql-master/wiki/Downloads?tm=2。CentOS 7 系統(tǒng)可直接使用適用于el6的程序包,另外,MHA Manager和MHA NODe程序包的版本并不強(qiáng)制要求一致。
1 Manager節(jié)點(diǎn)
[root@vin ~]# ls mha4mysql-manager-0.56-0.el6.noarch.rpm mha4mysql-node-0.56-0.el6.noarch.rpm
主節(jié)點(diǎn)需要安裝mha4mysql-manager管理包,以及node幾點(diǎn)包,
2 Master && SLave{1,2}節(jié)點(diǎn)
從節(jié)點(diǎn)只需要安裝mode包即可
[root@vin ~]# ls mha4mysql-node-0.56-0.el6.noarch.rpm [root@vin ~]# yum install /root/*.rpm
3 檢測(cè)各節(jié)點(diǎn)之間ssh可用性
出現(xiàn)下面的結(jié)果則說(shuō)明ssh聯(lián)通性無(wú)誤
[root@vin ~]# masterha_check_ssh --conf=/etc/masterha/app1.cnf ... - [info] All SSH connection tests passed successfully.
4 檢測(cè)管理的mysql主從復(fù)制集群的連接配置參數(shù)是否滿足
[root@vin ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf ... Mon Nov 13 22:11:30 2017 - [info] Slaves settings check done. Mon Nov 13 22:11:30 2017 - [info] 172.18.250.27(172.18.250.27:3306) (current master) +--172.18.253.160(172.18.253.160:3306) +--172.18.254.15(172.18.254.15:3306) ... MySQL Replication Health is OK.
如果配置參數(shù)滿足要求,那么你將看到這個(gè)集群的主從節(jié)點(diǎn),如上實(shí)例。
5 啟動(dòng)MHA
Manager節(jié)點(diǎn):
[root@vin ~]# masterha_manager --conf=/etc/masterha/app1.cnf Mon Nov 13 22:16:17 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping. # 沒(méi)有默認(rèn)配置文件 Mon Nov 13 22:16:17 2017 - [info] Reading application default configuration from /etc/masterha/app1.cnf.. Mon Nov 13 22:16:17 2017 - [info] Reading server configuration from /etc/masterha/app1.cnf..
說(shuō)明:MHA默認(rèn)是工作在前臺(tái)的,要想將它防止至后臺(tái)運(yùn)行,可使用下面的命令:
[root@vin ~]# nohup masterha_manager --conf=/etc/masterha/app1.cnf > \ /data/masterha/app1/managerha/manager.log 2&>1 &
成功啟動(dòng)之后,查看一下Master節(jié)點(diǎn)的狀態(tài);
[root@vin ~]# masterha_check_status --conf=/etc/masterha/app1.cnf app1 (pid:4090) is running(0:PING_OK), master:172.18.250.27
說(shuō)明:如果未成功啟動(dòng),這里的命令將不能夠正確執(zhí)行;提示:"app1 is stopped(2:NOT_RUNNING)."
六 配置keepalived
設(shè)置為用戶提供服務(wù)的地址為"172.18.14.55/16",通過(guò)keepalived實(shí)現(xiàn)VIP在Mysql復(fù)制集群中浮動(dòng)。
1 安裝keepalived
使用默認(rèn)yum安裝即可;在Mysql復(fù)制集群的所有主機(jī)上都安裝配置keepalived;
[root@vin ~]# yum install keepalived -y
2修改keepalived配置文件實(shí)現(xiàn)keepalived集群
Master:
[root@vin ~]# vim /etc/keepalived/keepalived.conf ! Configuration File for keepalived global_defs { notification_email { root@localhost } notification_email_from kadmin@localhost smtp_server 127.0.0.1 smtp_connect_timeout 30 router_id LVS_DEVEL route_mcast_group4 224.14.0.14 # 廣播地址 } vrrp_script chk_mysql { script "killall -0 mysql" # 監(jiān)控mysql健康性腳本 insterval 1 weight -10 } vrrp_instance VI_1 { # keepalived實(shí)例 state BACKUP interface ens33 virtual_router_id 66 priority 98 # keepalived節(jié)點(diǎn)優(yōu)先級(jí) advert_int 1 authentication { auth_type PASS auth_pass 1111 } virtual_ipaddress { 172.18.14.55/16 # 面向客戶端的地址 } track_script { chk_mysql } }
Slave{1,2}:復(fù)主節(jié)點(diǎn)的配置文件至Slave節(jié)點(diǎn):
[root@vin ~]# for i in {3,4};do scp /etc/keepalived/keepalived.conf \ root@node$i:/etc/keepalived/ ;done
說(shuō)明:復(fù)制過(guò)去并不能直接使用,由于keepalived通過(guò)優(yōu)先級(jí)機(jī)制來(lái)決定VIP工作在哪臺(tái)主機(jī),所以將兩個(gè)從節(jié)點(diǎn)的優(yōu)先級(jí)調(diào)節(jié)至比主節(jié)點(diǎn)上keepalived的優(yōu)先級(jí)低,且互相不同。
有心人可能發(fā)現(xiàn)問(wèn)題了,怎么沒(méi)有修改VRRP實(shí)例的狀態(tài)"state BACKUP";上面云服務(wù)器的keepalived都設(shè)置為了BACKUP模式,在keepalived中2種模式,分別是master->backup模式和backup->backup模式。這兩種模式有很大區(qū)別。在master->backup模式下,一旦主節(jié)點(diǎn)宕機(jī),虛擬ip會(huì)自動(dòng)漂移到從節(jié)點(diǎn),當(dāng)主節(jié)點(diǎn)修復(fù)后,keepalived啟動(dòng)后,還會(huì)把虛擬ip搶占過(guò)來(lái),即使設(shè)置了非搶占模式(nopreempt)搶占ip的動(dòng)作也會(huì)發(fā)生。在backup->backup模式下,當(dāng)主節(jié)點(diǎn)故障后虛擬ip會(huì)自動(dòng)漂移到從節(jié)點(diǎn)上,當(dāng)原主節(jié)點(diǎn)恢復(fù)后,并不會(huì)搶占新主的虛擬ip,即使是優(yōu)先級(jí)高于從庫(kù)的優(yōu)先級(jí)別,也不會(huì)發(fā)生搶占。為了減少ip漂移次數(shù),通常是把修復(fù)好的主庫(kù)當(dāng)做新的備庫(kù)。
七 故障出現(xiàn)
模擬故障發(fā)生,我們手動(dòng)"down"掉了主節(jié)點(diǎn),生產(chǎn)中可能有各種原因?qū)е鹿收系某霈F(xiàn),這里為了最好的模擬辦法,當(dāng)然是關(guān)停服務(wù)了。
1 Master
[root@vin ~]# systemctl stop mariadb
2 在MHA節(jié)點(diǎn)上查看MHA的狀態(tài)
[root@vin ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf .... Mon Nov 13 22:36:37 2017 - [info] MHA::MasterMonitor version 0.56. Mon Nov 13 22:36:37 2017 - [info] GTID failover mode = 0 Mon Nov 13 22:36:37 2017 - [info] Dead Servers: # 指明故障的節(jié)點(diǎn) Mon Nov 13 22:36:37 2017 - [info] 172.18.250.27(172.18.250.27:3306) Mon Nov 13 22:36:37 2017 - [info] Alive Servers: Mon Nov 13 22:36:37 2017 - [info] 172.18.253.160(172.18.253.160:3306) Mon Nov 13 22:36:37 2017 - [info] 172.18.254.15(172.18.254.15:3306) Mon Nov 13 22:36:37 2017 - [info] Alive Slaves: Mon Nov 13 22:36:37 2017 - [info] 172.18.254.15(172.18.254.15:3306) # 從節(jié)點(diǎn)由兩個(gè)轉(zhuǎn)為了一個(gè),另為一個(gè)升級(jí)為主節(jié)點(diǎn) ....
3 在從節(jié)點(diǎn)進(jìn)行測(cè)試看主節(jié)點(diǎn)是否正確切換
Slave1:
MariaDB [(none)]> show slave status; # 查看從節(jié)點(diǎn)狀態(tài)為空,說(shuō)明已非從節(jié)點(diǎn) Empty set (0.00 sec) MariaDB [(none)]> show master status; # 再查看master狀態(tài),已正確切換 +-------------------+----------+--------------+------------------+ | File | Position | Binlog_Do_DB | Binlog_Ignore_DB | +-------------------+----------+--------------+------------------+ | master-log.000003 | 245 | | | +-------------------+----------+--------------+------------------+
Slave2:
MariaDB [(none)]> show slave status\G; *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 172.18.253.160 # 從節(jié)點(diǎn)"Slave2"已經(jīng)將主節(jié)點(diǎn)指向了新的主節(jié)點(diǎn) Master_User: vinsent Master_Port: 3306 Connect_Retry: 60 Master_Log_File: master-log.000003 ...
4 查看keepalived地址綁定情況:
Master:
[root@vin ~]# ip a | grep ens33 2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000 inet 172.18.250.27/16 brd 172.18.255.255 scope global dynamic ens33
Slave1:
[root@vin ~]# ip a | grep ens33 2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000 inet 172.18.250.160/16 brd 172.18.255.255 scope global dynamic ens33 inet 172.18.14.55/16 scope global secondary ens33 # 地址正確漂移至從節(jié)點(diǎn)Slave1
Slave2:
[root@vin ~]# ip a | grep ens33 2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000 inet 172.18.254.15/16 brd 172.18.255.255 scope global dynamic ens33
八 故障恢復(fù)
為了滿足集群要求,應(yīng)當(dāng)立即將故障的主節(jié)點(diǎn)修復(fù)上線。由于Mysql復(fù)制集群的主節(jié)點(diǎn)已然切換,那么故障的原主節(jié)點(diǎn)上線之后只能為從節(jié)點(diǎn),所以應(yīng)當(dāng)修改其配置文件滿足從節(jié)點(diǎn)的要求。
1 Master節(jié)點(diǎn)
[root@vin ~]# vim /etc/my.cnf.d/server.cnf # 添加如下兩項(xiàng) [server] relay_log_purge = OFF read_only = ON
啟動(dòng)服務(wù),并連接Mysql;并連接至新的主節(jié)點(diǎn)做主從同步;這里值得注意的是:如果你的主節(jié)點(diǎn)是在運(yùn)行過(guò)程中故障宕機(jī)來(lái)了,那么你要做的就不僅僅是修改配置,啟動(dòng)服務(wù)了。修改配置文件之后,應(yīng)當(dāng)對(duì)新主做一個(gè)完全備份,將新主節(jié)點(diǎn)的數(shù)據(jù)恢復(fù)至本機(jī),然后在連接至新的主節(jié)點(diǎn)做復(fù)制同步(本實(shí)驗(yàn)沒(méi)有太多的數(shù)據(jù),故直接上線)。
[root@vin ~]# systemctl start mariadb [root@vin ~]# mysql MariaDB [(none)]> change master to master_host='172.18.253.160',master_user='vinsent',master_password='vinsent',master_log_file='master-log.000003',master_log_pos=245; MariaDB [(none)]> start slave; MariaDB [(none)]> show slave status\G *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 172.18.253.160 Master_User: vinsent Master_Port: 3306 ...
2 Manager:
切換至MHA上并查看集群狀態(tài);
[root@vin ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf ... Mon Nov 13 22:54:53 2017 - [info] GTID failover mode = 0 Mon Nov 13 22:54:53 2017 - [info] Dead Servers: Mon Nov 13 22:54:53 2017 - [info] Alive Servers: Mon Nov 13 22:54:53 2017 - [info] 172.18.250.27(172.18.250.27:3306) # 由于沒(méi)有在配置文件中指明誰(shuí)是主,故這里只能看到所有工作的主機(jī) Mon Nov 13 22:54:53 2017 - [info] 172.18.253.160(172.18.253.160:3306) Mon Nov 13 22:54:53 2017 - [info] 172.18.254.15(172.18.254.15:3306) Mon Nov 13 22:54:53 2017 - [info] Alive Slaves: Mon Nov 13 22:54:53 2017 - [info] 172.18.250.27(172.18.250.27:3306) ... MySQL Replication Health is OK.
啟動(dòng)MHA Manger監(jiān)控,查看集群里面現(xiàn)在誰(shuí)是master;
[root@vin ~]# masterha_check_status --conf=/etc/masterha/app1.cnf app1 is stopped(2:NOT_RUNNING).
??怎么回事,明明已經(jīng)正確啟動(dòng),為何此處顯示為“stopped”;趕緊去官網(wǎng)一查發(fā)現(xiàn):"Currently MHA Manager process does not run as a daemon. If failover completed successfully or the master process was killed by accident, the manager stops working. To run as a daemon, daemontool. or any external daemon program can be used. Here is an example to run from daemontools.",原來(lái)如此。
九 總結(jié)
通過(guò)查日志觀察切換過(guò)程分析MHA切換過(guò)程:
[root@vin masterha]# cat manager.log Mon Nov 13 22:36:03 2017 - [info] MHA::MasterMonitor version 0.56. Mon Nov 13 22:36:04 2017 - [info] GTID failover mode = 0 Mon Nov 13 22:36:04 2017 - [info] Dead Servers: Mon Nov 13 22:36:04 2017 - [info] 172.18.250.27(172.18.250.27:3306) Mon Nov 13 22:36:04 2017 - [info] Alive Servers: Mon Nov 13 22:36:04 2017 - [info] 172.18.253.160(172.18.253.160:3306) Mon Nov 13 22:36:04 2017 - [info] 172.18.254.15(172.18.254.15:3306) Mon Nov 13 22:36:04 2017 - [info] Alive Slaves: Mon Nov 13 22:36:04 2017 - [info] 172.18.254.15(172.18.254.15:3306) Version=5.5.52-MariaDB (oldest major version between slaves) log-bin:enabled Mon Nov 13 22:36:04 2017 - [info] Replicating from 172.18.253.160(172.18.253.160:3306) Mon Nov 13 22:36:04 2017 - [info] Primary candidate for the new Master (candidate_master is set) Mon Nov 13 22:36:04 2017 - [info] Current Alive Master: 172.18.253.160(172.18.253.160:3306) Mon Nov 13 22:36:04 2017 - [info] Checking slave configurations.. Mon Nov 13 22:36:04 2017 - [warning] relay_log_purge=0 is not set on slave 172.18.254.15(172.18.254.15:3306). Mon Nov 13 22:36:04 2017 - [info] Checking replication filtering settings.. Mon Nov 13 22:36:04 2017 - [info] binlog_do_db= , binlog_ignore_db= Mon Nov 13 22:36:04 2017 - [info] Replication filtering check ok. Mon Nov 13 22:36:04 2017 - [info] GTID (with auto-pos) is not supported Mon Nov 13 22:36:04 2017 - [info] Starting SSH connection tests.. Mon Nov 13 22:36:05 2017 - [info] All SSH connection tests passed successfully. Mon Nov 13 22:36:05 2017 - [info] Checking MHA Node version.. Mon Nov 13 22:36:06 2017 - [info] Version check ok. Mon Nov 13 22:36:06 2017 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln492] Server 172.18.250.27(172.18.250.27:3306) is dead, but must be alive! Check server settings. Mon Nov 13 22:36:06 2017 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln424] Error happened on checking configurations. at /usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm line 399. Mon Nov 13 22:36:06 2017 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln523] Error happened on monitoring servers. Mon Nov 13 22:36:06 2017 - [info] Got exit code 1 (Not master dead). Mon Nov 13 22:36:13 2017 - [info] MHA::MasterMonitor version 0.56. Mon Nov 13 22:36:13 2017 - [info] GTID failover mode = 0 Mon Nov 13 22:36:13 2017 - [info] Dead Servers: Mon Nov 13 22:36:13 2017 - [info] 172.18.250.27(172.18.250.27:3306) Mon Nov 13 22:36:13 2017 - [info] Alive Servers: Mon Nov 13 22:36:13 2017 - [info] 172.18.253.160(172.18.253.160:3306) Mon Nov 13 22:36:13 2017 - [info] 172.18.254.15(172.18.254.15:3306) Mon Nov 13 22:36:13 2017 - [info] Alive Slaves: Mon Nov 13 22:36:13 2017 - [info] 172.18.254.15(172.18.254.15:3306) Version=5.5.52-MariaDB (oldest major version between slaves) log-bin:enabled Mon Nov 13 22:36:13 2017 - [info] Replicating from 172.18.253.160(172.18.253.160:3306) Mon Nov 13 22:36:13 2017 - [info] Primary candidate for the new Master (candidate_master is set) Mon Nov 13 22:36:13 2017 - [info] Current Alive Master: 172.18.253.160(172.18.253.160:3306) Mon Nov 13 22:36:13 2017 - [info] Checking slave configurations.. Mon Nov 13 22:36:13 2017 - [warning] relay_log_purge=0 is not set on slave 172.18.254.15(172.18.254.15:3306). Mon Nov 13 22:36:13 2017 - [info] Checking replication filtering settings.. Mon Nov 13 22:36:13 2017 - [info] binlog_do_db= , binlog_ignore_db= Mon Nov 13 22:36:13 2017 - [info] Replication filtering check ok. Mon Nov 13 22:36:13 2017 - [info] GTID (with auto-pos) is not supported Mon Nov 13 22:36:13 2017 - [info] Starting SSH connection tests.. Mon Nov 13 22:36:15 2017 - [info] All SSH connection tests passed successfully. Mon Nov 13 22:36:15 2017 - [info] Checking MHA Node version.. Mon Nov 13 22:36:15 2017 - [info] Version check ok. Mon Nov 13 22:36:15 2017 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln492] Server 172.18.250.27(172.18.250.27:3306) is dead, but must be alive! Check server settings. Mon Nov 13 22:36:15 2017 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln424] Error happened on checking configurations. at /usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm line 399. Mon Nov 13 22:36:15 2017 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln523] Error happened on monitoring servers. Mon Nov 13 22:36:15 2017 - [info] Got exit code 1 (Not master dead).
從上面的輸出可以看出整個(gè)MHA的切換過(guò)程,共包括以下的步驟:
配置文件檢查階段,這個(gè)階段會(huì)檢查整個(gè)集群配置文件配置
宕機(jī)的master處理,這個(gè)階段包括虛擬ip摘除操作,主機(jī)關(guān)機(jī)操作,這是MHA管理keepalived的時(shí)候,我們這里是通過(guò)keepalived的腳本實(shí)現(xiàn)mysql狀態(tài)監(jiān)控的。MHA也有管理keepalived的腳本,有需要的可以自行研究。
復(fù)制dead maste和最新slave相差的relay log,并保存到MHA Manger具體的目錄下
識(shí)別含有最新更新的slave
應(yīng)用從master保存的二進(jìn)制日志事件(binlog events)
提升一個(gè)slave為新的master進(jìn)行復(fù)制
使其他的slave連接新的master進(jìn)行復(fù)制
目前高可用方案可以一定程度上實(shí)現(xiàn)數(shù)據(jù)庫(kù)的高可用,比如MMM,heartbeat+drbd,Cluster等。還有percona的Galera Cluster等。這些高可用軟件各有優(yōu)劣。在進(jìn)行高可用方案選擇時(shí),主要是看業(yè)務(wù)還有對(duì)數(shù)據(jù)一致性方面的要求。最后出于對(duì)數(shù)據(jù)庫(kù)的高可用和數(shù)據(jù)一致性的要求,推薦使用MHA架構(gòu)。
看了以上關(guān)于keepalived+MHA應(yīng)該如何實(shí)現(xiàn)mysql主從高可用集群詳細(xì)內(nèi)容,是否有所收獲。如果想要了解更多相關(guān),可以繼續(xù)關(guān)注我們的行業(yè)資訊板塊。
免責(zé)聲明:本站發(fā)布的內(nèi)容(圖片、視頻和文字)以原創(chuàng)、轉(zhuǎn)載和分享為主,文章觀點(diǎn)不代表本網(wǎng)站立場(chǎng),如果涉及侵權(quán)請(qǐng)聯(lián)系站長(zhǎng)郵箱:is@yisu.com進(jìn)行舉報(bào),并提供相關(guān)證據(jù),一經(jīng)查實(shí),將立刻刪除涉嫌侵權(quán)內(nèi)容。