I recently tested RHCS on Red Hat 6.5, building a two-node HA cluster. This post shares the configuration and testing process: node configuration, cluster management server configuration, cluster creation and configuration, and cluster testing.
I. Test Environment
Hostname | OS | IP Address | Cluster IP | Installed Packages
HAmanager | RedHat 6.5 | 192.168.10.150 | - | luci, iSCSI target (for the quorum disk)
node1 | RedHat 6.5 | 192.168.10.104 | 192.168.10.103 | High Availability, httpd
node2 | RedHat 6.5 | 192.168.10.105 | 192.168.10.103 | High Availability, httpd
二、節(jié)點(diǎn)配置
1、在三臺機(jī)分別配置hosts互相解析
[root@HAmanager ~]# cat /etc/hosts
192.168.10.104 node1 node1.localdomain
192.168.10.105 node2 node2.localdomain
192.168.10.150 HAmanager HAmanager.localdomain
[root@node1 ~]# cat /etc/hosts
192.168.10.104 node1 node1.localdomain
192.168.10.105 node2 node2.localdomain
192.168.10.150 HAmanager HAmanager.localdomain
[root@node2 ~]# cat /etc/hosts
192.168.10.104 node1 node1.localdomain
192.168.10.105 node2 node2.localdomain
192.168.10.150 HAmanager HAmanager.localdomain
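The three /etc/hosts files above are identical. A quick sanity check can confirm each short hostname appears exactly once; this sketch works on a temp copy rather than the live /etc/hosts:

```shell
# Write the shared entries to a temp file (on the real machines these
# lines go into /etc/hosts itself).
hosts_file=$(mktemp)
cat > "$hosts_file" <<'EOF'
192.168.10.104 node1 node1.localdomain
192.168.10.105 node2 node2.localdomain
192.168.10.150 HAmanager HAmanager.localdomain
EOF

# Each short name should match exactly one line.
n1=$(grep -c ' node1 ' "$hosts_file")
n2=$(grep -c ' node2 ' "$hosts_file")
nm=$(grep -c ' HAmanager ' "$hosts_file")
echo "node1=$n1 node2=$n2 HAmanager=$nm"
```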
2. Configure SSH key trust between the machines
[root@HAmanager ~]# ssh-keygen -t rsa
[root@HAmanager ~]# ssh-copy-id -i node1
[root@node1 ~]# ssh-keygen -t rsa
[root@node1 ~]# ssh-copy-id -i node2
[root@node2 ~]# ssh-keygen -t rsa
[root@node2 ~]# ssh-copy-id -i node1
3、兩個節(jié)點(diǎn)關(guān)閉NetworkManager和acpid服務(wù)
[root@node1 ~]# service NetworkManager stop
[root@node1 ~]# chkconfig NetworkManager off
[root@node1 ~]# service acpid stop
[root@node1 ~]# chkconfig acpid off
[root@node2 ~]# service NetworkManager stop
[root@node2 ~]# chkconfig NetworkManager off
[root@node2 ~]# service acpid stop
[root@node2 ~]# chkconfig acpid off
4、兩個節(jié)點(diǎn)配置本地yum源
[root@node1 ~]# cat/etc/yum.repos.d/rhel6.5.repo
[Server]
name=base
baseurl=file:///mnt/
enabled=1
gpgcheck=0
[HighAvailability]
name=base
baseurl=file:///mnt/HighAvailability
enabled=1
gpgcheck=0
[root@node2 ~]# cat /etc/yum.repos.d/rhel6.5.repo
[Server]
name=base
baseurl=file:///mnt/
enabled=1
gpgcheck=0
[HighAvailability]
name=base
baseurl=file:///mnt/HighAvailability
enabled=1
gpgcheck=0
5、兩個節(jié)點(diǎn)分別安裝群集軟件包
[root@node1 ~]# yum groupinstall 'High Availability' –y
Installed:
  ccs.x86_64 0:0.16.2-69.el6                  cman.x86_64 0:3.0.12.1-59.el6
  omping.x86_64 0:0.0.4-1.el6                 rgmanager.x86_64 0:3.0.12.1-19.el6
Dependency Installed:
  cifs-utils.x86_64 0:4.8.1-19.el6            clusterlib.x86_64 0:3.0.12.1-59.el6
  corosync.x86_64 0:1.4.1-17.el6              corosynclib.x86_64 0:1.4.1-17.el6
  cyrus-sasl-md5.x86_64 0:2.1.23-13.el6_3.1   fence-agents.x86_64 0:3.1.5-35.el6
  fence-virt.x86_64 0:0.2.3-15.el6            gnutls-utils.x86_64 0:2.8.5-10.el6_4.2
  ipmitool.x86_64 0:1.8.11-16.el6             keyutils.x86_64 0:1.4-4.el6
  libevent.x86_64 0:1.4.13-4.el6              libgssglue.x86_64 0:0.1-11.el6
  libibverbs.x86_64 0:1.1.7-1.el6             librdmacm.x86_64 0:1.0.17-1.el6
  libtirpc.x86_64 0:0.2.1-6.el6_4             libvirt-client.x86_64 0:0.10.2-29.el6
  lm_sensors-libs.x86_64 0:3.1.1-17.el6       modcluster.x86_64 0:0.16.2-28.el6
  nc.x86_64 0:1.84-22.el6                     net-snmp-libs.x86_64 1:5.5-49.el6
  net-snmp-utils.x86_64 1:5.5-49.el6          nfs-utils.x86_64 1:1.2.3-39.el6
  nfs-utils-lib.x86_64 0:1.1.5-6.el6          numactl.x86_64 0:2.0.7-8.el6
  oddjob.x86_64 0:0.30-5.el6                  openais.x86_64 0:1.1.1-7.el6
  openaislib.x86_64 0:1.1.1-7.el6             perl-Net-Telnet.noarch 0:3.03-11.el6
  pexpect.noarch 0:2.3-6.el6                  python-suds.noarch 0:0.4.1-3.el6
  quota.x86_64 1:3.17-20.el6                  resource-agents.x86_64 0:3.9.2-40.el6
  ricci.x86_64 0:0.16.2-69.el6                rpcbind.x86_64 0:0.2.0-11.el6
  sg3_utils.x86_64 0:1.28-5.el6               tcp_wrappers.x86_64 0:7.6-57.el6
  telnet.x86_64 1:0.17-47.el6_3.1             yajl.x86_64 0:1.0.7-3.el6
Complete!
[root@node2 ~]# yum groupinstall 'High Availability' -y
Installed:
  ccs.x86_64 0:0.16.2-69.el6                  cman.x86_64 0:3.0.12.1-59.el6
  omping.x86_64 0:0.0.4-1.el6                 rgmanager.x86_64 0:3.0.12.1-19.el6
Dependency Installed:
  cifs-utils.x86_64 0:4.8.1-19.el6            clusterlib.x86_64 0:3.0.12.1-59.el6
  corosync.x86_64 0:1.4.1-17.el6              corosynclib.x86_64 0:1.4.1-17.el6
  cyrus-sasl-md5.x86_64 0:2.1.23-13.el6_3.1   fence-agents.x86_64 0:3.1.5-35.el6
  fence-virt.x86_64 0:0.2.3-15.el6            gnutls-utils.x86_64 0:2.8.5-10.el6_4.2
  ipmitool.x86_64 0:1.8.11-16.el6             keyutils.x86_64 0:1.4-4.el6
  libevent.x86_64 0:1.4.13-4.el6              libgssglue.x86_64 0:0.1-11.el6
  libibverbs.x86_64 0:1.1.7-1.el6             librdmacm.x86_64 0:1.0.17-1.el6
  libtirpc.x86_64 0:0.2.1-6.el6_4             libvirt-client.x86_64 0:0.10.2-29.el6
  lm_sensors-libs.x86_64 0:3.1.1-17.el6       modcluster.x86_64 0:0.16.2-28.el6
  nc.x86_64 0:1.84-22.el6                     net-snmp-libs.x86_64 1:5.5-49.el6
  net-snmp-utils.x86_64 1:5.5-49.el6          nfs-utils.x86_64 1:1.2.3-39.el6
  nfs-utils-lib.x86_64 0:1.1.5-6.el6          numactl.x86_64 0:2.0.7-8.el6
  oddjob.x86_64 0:0.30-5.el6                  openais.x86_64 0:1.1.1-7.el6
  openaislib.x86_64 0:1.1.1-7.el6             perl-Net-Telnet.noarch 0:3.03-11.el6
  pexpect.noarch 0:2.3-6.el6                  python-suds.noarch 0:0.4.1-3.el6
  quota.x86_64 1:3.17-20.el6                  resource-agents.x86_64 0:3.9.2-40.el6
  ricci.x86_64 0:0.16.2-69.el6                rpcbind.x86_64 0:0.2.0-11.el6
  sg3_utils.x86_64 0:1.28-5.el6               tcp_wrappers.x86_64 0:7.6-57.el6
  telnet.x86_64 1:0.17-47.el6_3.1             yajl.x86_64 0:1.0.7-3.el6
Complete!
6、兩個節(jié)點(diǎn)分別啟動群集服務(wù)
[root@node1 ~]# service ricci start
[root@node1 ~]# chkconfig ricci on
[root@node1 ~]# chkconfig cman on
[root@node1 ~]# chkconfig rgmanager on
[root@node2 ~]# service ricci start
[root@node2 ~]# chkconfig ricci on
[root@node2 ~]# chkconfig cman on
[root@node2 ~]# chkconfig rgmanager on
7、兩個節(jié)點(diǎn)分別配置ricci密碼
[root@node1 ~]# passwd ricci
New password:
BAD PASSWORD: it is too short
BAD PASSWORD: is too simple
Retype new password:
passwd: all authentication tokens updated successfully.
[root@node2 ~]# passwd ricci
New password:
BAD PASSWORD: it is too short
BAD PASSWORD: is too simple
Retype new password:
passwd: all authentication tokens updated successfully.
8、兩個節(jié)點(diǎn)分別安裝httpd服務(wù),方便后面測試應(yīng)用的高可用性
[root@node1 ~]# yum -y install httpd
[root@node1 ~]# echo "This is Node1" > /var/www/html/index.html
[root@node2 ~]# yum -y install httpd
[root@node2 ~]# echo "This is Node2" > /var/www/html/index.html
III. Cluster Management Server Configuration
1. Install the luci package on the cluster management server
[root@HAmanager ~]# yum -y install luci
Installed:
luci.x86_64 0:0.26.0-48.el6
Dependency Installed:
TurboGears2.noarch 0:2.0.3-4.el6
python-babel.noarch 0:0.9.4-5.1.el6
python-beaker.noarch 0:1.3.1-7.el6
python-cheetah.x86_64 0:2.4.1-1.el6
python-decorator.noarch 0:3.0.1-3.1.el6
python-decoratortools.noarch 0:1.7-4.1.el6
python-formencode.noarch 0:1.2.2-2.1.el6
python-genshi.x86_64 0:0.5.1-7.1.el6
python-mako.noarch 0:0.3.4-1.el6
python-markdown.noarch 0:2.0.1-3.1.el6
python-markupsafe.x86_64 0:0.9.2-4.el6
python-myghty.noarch 0:1.1-11.el6
python-nose.noarch 0:0.10.4-3.1.el6
python-paste.noarch 0:1.7.4-2.el6
python-paste-deploy.noarch 0:1.3.3-2.1.el6
python-paste-script.noarch 0:1.7.3-5.el6_3
python-peak-rules.noarch 0:0.5a1.dev-9.2582.1.el6
python-peak-util-addons.noarch 0:0.6-4.1.el6
python-peak-util-assembler.noarch 0:0.5.1-1.el6
python-peak-util-extremes.noarch 0:1.1-4.1.el6
python-peak-util-symbols.noarch 0:1.0-4.1.el6
python-prioritized-methods.noarch 0:0.2.1-5.1.el6
python-pygments.noarch 0:1.1.1-1.el6
python-pylons.noarch 0:0.9.7-2.el6
python-repoze-tm2.noarch 0:1.0-0.5.a4.el6
python-repoze-what.noarch 0:1.0.8-6.el6
python-repoze-what-pylons.noarch 0:1.0-4.el6
python-repoze-who.noarch 0:1.0.18-1.el6
python-repoze-who-friendlyform.noarch 0:1.0-0.3.b3.el6
python-repoze-who-testutil.noarch 0:1.0-0.4.rc1.el6
python-routes.noarch 0:1.10.3-2.el6
python-setuptools.noarch 0:0.6.10-3.el6
python-sqlalchemy.noarch 0:0.5.5-3.el6_2
python-tempita.noarch 0:0.4-2.el6
python-toscawidgets.noarch 0:0.9.8-1.el6
python-transaction.noarch 0:1.0.1-1.el6
python-turbojson.noarch 0:1.2.1-8.1.el6
python-weberror.noarch 0:0.10.2-2.el6
python-webflash.noarch 0:0.1-0.2.a9.el6
python-webhelpers.noarch 0:0.6.4-4.el6
python-webob.noarch 0:0.9.6.1-3.el6
python-webtest.noarch 0:1.2-2.el6
python-zope-filesystem.x86_64 0:1-5.el6
python-zope-interface.x86_64 0:3.5.2-2.1.el6
python-zope-sqlalchemy.noarch 0:0.4-3.el6
Complete!
[root@HAmanager ~]#
2. Start the luci service
[root@HAmanager ~]# service luci start
Adding following auto-detected host IDs (IP addresses/domain names), corresponding to `HAmanager.localdomain' address, to the configuration of self-managed certificate `/var/lib/luci/etc/cacert.config' (you can change them by editing `/var/lib/luci/etc/cacert.config', removing the generated certificate `/var/lib/luci/certs/host.pem' and restarting luci):
(none suitable found, you can still do it manually as mentioned above)
Generating a 2048 bit RSA private key
writing new private key to '/var/lib/luci/certs/host.pem'
Starting saslauthd: [ OK ]
Start luci...
Point your web browser to https://HAmanager.localdomain:8084 (or equivalent) to access luci
[root@HAmanager ~]# chkconfig luci on
Jiang Jianlong's tech blog: http://jiangjianlong.blog.51cto.com/3735273/1931499
三、創(chuàng)建和配置群集
1、使用瀏覽器訪問HA的web管理界面https://192.168.10.150:8084
2、創(chuàng)建群集并添加節(jié)點(diǎn)至群集中
3、添加vCenter為fence設(shè)備
4、查找節(jié)點(diǎn)的虛擬機(jī)UUID
[root@node1 ~]# fence_vmware_soap -a 192.168.10.91 -z -l administrator@vsphere.local -p P@ssw0rd -o list
node1,564df192-7755-9cd6-8a8b-45d6d74eabbb
node2,564df4ed-cda1-6383-bbf5-f99807416184
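When scripting the fence configuration it helps to pull a single VM's UUID out of that list. A hypothetical helper (`uuid_for` is an illustrative name, not part of the fence agent), parsing the captured output above rather than querying the live vCenter:

```shell
# Captured output of: fence_vmware_soap ... -o list
list_output='node1,564df192-7755-9cd6-8a8b-45d6d74eabbb
node2,564df4ed-cda1-6383-bbf5-f99807416184'

# Print the UUID column for the named VM.
uuid_for() {
    printf '%s\n' "$list_output" | awk -F, -v vm="$1" '$1 == vm { print $2 }'
}

node1_uuid=$(uuid_for node1)
echo "$node1_uuid"
```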
5、兩個節(jié)點(diǎn)添加fence方法和實(shí)例
6、查看fence設(shè)備狀態(tài)
[root@node1 ~]# fence_vmware_soap -a 192.168.10.91 -z -l administrator@vsphere.local -p P@ssw0rd -o status
Status: ON
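Fence agents report the power state as a `Status:` line; a small wrapper can turn that into something scriptable. This sketch matches against the captured line above instead of making a live vCenter call:

```shell
# Captured from fence_vmware_soap -o status
status_line='Status: ON'

case "$status_line" in
    *ON)  state=powered-on ;;
    *OFF) state=powered-off ;;
    *)    state=unknown ;;
esac
echo "$state"
```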
7. Test the fence device
[root@node2 ~]# fence_check
fence_check run at Tue May 23 09:41:30 CST 2017 pid: 3455
Testing node1.localdomain method 1: success
Testing node2.localdomain method 1: success
8、創(chuàng)建故障域
9、添加群集資源,分別添加IP地址和腳本為群集資源
10、創(chuàng)建群集服務(wù)組并添加已有的資源
11、配置仲裁盤,在HAmanager服務(wù)器安裝iSCSI target服務(wù)并創(chuàng)建一塊100M的共享磁盤給兩個節(jié)點(diǎn)
[root@HAmanager ~]#yum install scsi-target-utils -y
[root@HAmanager ~]#dd if=/dev/zero of=/iSCSIdisk/100m.img bs=1M seek=100 count=0
[root@HAmanager ~]#vi /etc/tgt/targets.conf
<target iqn.2016-08.disk.rh7:disk100m>
backing-store /iSCSIdisk/100m.img
initiator-address 192.168.10.104 #for node1
initiator-address 192.168.10.105 #for node2
</target>
[root@HAmanager ~]# service tgtd start
[root@HAmanager ~]# chkconfig tgtd on
[root@HAmanager ~]# tgt-admin --show
Target 1: iqn.2016-08.disk.rh7:disk100m
System information:
Driver: iscsi
State: ready
I_T nexus information:
LUN information:
LUN: 0
Type: controller
SCSI ID: IET 00010000
SCSI SN: beaf10
Size: 0 MB, Block size: 1
Online: Yes
Removable media: No
Prevent removal: No
Readonly: No
Backing store type: null
Backing store path: None
Backing store flags:
LUN: 1
Type: disk
SCSI ID: IET 00010001
SCSI SN: beaf11
Size: 105 MB, Block size: 512
Online: Yes
Removable media: No
Prevent removal: No
Readonly: No
Backing store type: rdwr
Backing store path: /iSCSIdisk/100m.img
Backing store flags:
Account information:
ACL information:
192.168.10.104
192.168.10.105
[root@HAmanager ~]#
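The `dd ... seek=100 count=0` line in step 11 creates the backing file sparsely: it seeks 100 MiB past the start and writes nothing, so the file gets a 104857600-byte apparent size while allocating almost no blocks. Demonstrated on a throwaway temp file:

```shell
img=$(mktemp)
# Seek 100 MiB into the file and write zero blocks: a sparse 100 MB file.
dd if=/dev/zero of="$img" bs=1M seek=100 count=0 2>/dev/null

apparent=$(stat -c '%s' "$img")      # apparent size in bytes
allocated=$(du -k "$img" | cut -f1)  # blocks actually allocated, in KiB
echo "apparent=$apparent allocated=${allocated}KiB"
rm -f "$img"
```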
12、兩個節(jié)點(diǎn)安裝iscsi-initiator-utils并登錄iscsi目標(biāo)
[root@node1 ~]# yum install iscsi-initiator-utils
[root@node1 ~]# chkconfig iscsid on
[root@node1 ~]# iscsiadm -m discovery -t sendtargets -p 192.168.10.150
[root@node1 ~]# iscsiadm -m node
[root@node1 ~]# iscsiadm -m node -T iqn.2016-08.disk.rh7:disk100m --login
[root@node2 ~]# yum install iscsi-initiator-utils
[root@node2 ~]# chkconfig iscsid on
[root@node2 ~]# iscsiadm -m discovery -t sendtargets -p 192.168.10.150
[root@node2 ~]# iscsiadm -m node
[root@node2 ~]# iscsiadm -m node -T iqn.2016-08.disk.rh7:disk100m --login
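After a successful `--login`, a new ~100 MB disk (here /dev/sdb) appears on each node, e.g. in /proc/partitions. A sketch of checking its size, parsing a captured sample instead of the live file (the sda line is an illustrative placeholder):

```shell
# Captured /proc/partitions after the iSCSI login (sizes in 1 KiB blocks).
partitions='major minor  #blocks  name
   8        0   20971520 sda
   8       16     102400 sdb'

# Pull the block count for sdb; 102400 KiB = 100 MiB.
sdb_blocks=$(printf '%s\n' "$partitions" | awk '$4 == "sdb" { print $3 }')
echo "sdb is ${sdb_blocks} KiB"
```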
13、在節(jié)點(diǎn)一將共享磁盤/dev/sdb創(chuàng)建分區(qū)sdb1
[root@node1 ~]# fdisk /dev/sdb
然后創(chuàng)建成sdb1
[root@node1 ~]# partprobe /dev/sdb1
14、在節(jié)點(diǎn)一將sdb1創(chuàng)建成仲裁盤
[root@node1 ~]# mkqdisk -c /dev/sdb1 -l testqdisk
mkqdisk v3.0.12.1
Writing new quorum disk label 'testqdisk' to /dev/sdb1.
WARNING: About to destroy all data on /dev/sdb1; proceed [N/y] ? y
Initializing status block for node 1...
Initializing status block for node 2...
Initializing status block for node 3...
Initializing status block for node 4...
Initializing status block for node 5...
Initializing status block for node 6...
Initializing status block for node 7...
Initializing status block for node 8...
Initializing status block for node 9...
Initializing status block for node 10...
Initializing status block for node 11...
Initializing status block for node 12...
Initializing status block for node 13...
Initializing status block for node 14...
Initializing status block for node 15...
Initializing status block for node 16...
[root@node1 ~]#
[root@node1 ~]# mkqdisk -L
mkqdisk v3.0.12.1
/dev/block/8:17:
/dev/disk/by-id/scsi-1IET_00010001-part1:
/dev/disk/by-path/ip-192.168.10.150:3260-iscsi-iqn.2016-08.disk.rh7:disk100m-lun-1-part1:
/dev/sdb1:
Magic: eb7a62c2
Label: testqdisk
Created: Mon May 22 22:52:01 2017
Host: node1.localdomain
Kernel Sector Size: 512
Recorded Sector Size: 512
[root@node1 ~]#
15、在節(jié)點(diǎn)二查看仲裁盤,也正常識別
[root@node2 ~]# partprobe /dev/sdb1
[root@node2 ~]# mkqdisk -L
mkqdisk v3.0.12.1
/dev/block/8:17:
/dev/disk/by-id/scsi-1IET_00010001-part1:
/dev/disk/by-path/ip-192.168.10.150:3260-iscsi-iqn.2016-08.disk.rh7:disk100m-lun-1-part1:
/dev/sdb1:
Magic: eb7a62c2
Label: testqdisk
Created: Mon May 22 22:52:01 2017
Host: node1.localdomain
Kernel Sector Size: 512
Recorded Sector Size: 512
16. Configure the cluster to use the quorum disk
17. Restart the cluster so the quorum disk takes effect
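With two nodes at one vote each, giving the quorum disk one vote makes expected_votes 3 and quorum 2, so a lone surviving node (its own vote plus the qdisk vote) stays quorate instead of losing the cluster. Configured through luci, /etc/cluster/cluster.conf gains a quorumd stanza roughly like the sketch below; the attribute values and the ping heuristic are illustrative assumptions, not taken from the original setup:

```xml
<!-- sketch: quorum-disk stanza in /etc/cluster/cluster.conf -->
<quorumd interval="1" tko="10" votes="1" label="testqdisk">
    <!-- heuristic: a node that cannot reach the gateway should lose -->
    <heuristic program="ping -c1 -w1 192.168.10.1" score="1" interval="2" tko="3"/>
</quorumd>
```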
[root@node1 ~]# ccs -h node1 --stopall
node1 password:
Stopped node2.localdomain
Stopped node1.localdomain
[root@node1 ~]# ccs -h node1 --startall
Started node2.localdomain
Started node1.localdomain
[root@node1 ~]#
18. Check the cluster status
[root@node1 ~]# clustat
Cluster Status for TestCluster2 @ Mon May 22 23:48:27 2017
Member Status: Quorate
Member Name ID Status
------ ---- ---- ------
node1.localdomain 1 Online, Local, rgmanager
node2.localdomain 2 Online, rgmanager
/dev/block/8:17 0 Online, Quorum Disk
Service Name Owner (Last) State
------- ---- ----- ------ -----
service:TestServGrp node1.localdomain started
[root@node1 ~]#
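For scripted monitoring it can help to extract the current owner of the service group from clustat's output. A hypothetical helper, run here against a captured sample rather than a live cluster:

```shell
# Captured service section of `clustat` (sample).
clustat_out=' Service Name              Owner (Last)               State
 ------- ----              ----- ------               -----
 service:TestServGrp       node1.localdomain          started'

# The owner is the second column of the service line.
owner=$(printf '%s\n' "$clustat_out" | awk '$1 == "service:TestServGrp" { print $2 }')
echo "TestServGrp is running on $owner"
```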
19、查看群集節(jié)點(diǎn)狀態(tài)
[root@node1 ~]# ccs_tool lsnode
Cluster name: icpl_cluster, config_version: 21
Nodename Votes Nodeid Fencetype
node1.localdomain 1 1 vcenter_fence
node2.localdomain 1 2 vcenter_fence
20、查看群集節(jié)點(diǎn)同步狀態(tài)
[root@node1 ~]# ccs -h node1 --checkconf
All nodes in sync.
21. Access the web service via the cluster IP
四、群集故障轉(zhuǎn)移測試
1、關(guān)閉主節(jié)點(diǎn),故障自動轉(zhuǎn)移功能正常
[root@node1 ~]#poweroff
[root@node1 ~]#tail–f /var/log/messages
May 23 10:29:26 node1 modclusterd: shutdown succeeded
May 23 10:29:26 node1 rgmanager[2125]: Shutting down
May 23 10:29:26 node1 rgmanager[2125]: Shutting down
May 23 10:29:26 node1 rgmanager[2125]: Stopping service service:TestServGrp
May 23 10:29:27 node1 rgmanager[2125]: [ip] Removing IPv4 address 192.168.10.103/24 from eth0
May 23 10:29:36 node1 rgmanager[2125]: Service service:TestServGrp is stopped
May 23 10:29:36 node1 rgmanager[2125]: Disconnecting from CMAN
May 23 10:29:52 node1 rgmanager[2125]: Exiting
May 23 10:29:53 node1 ricci: shutdown succeeded
May 23 10:29:54 node1 oddjobd: oddjobd shutdown succeeded
May 23 10:29:54 node1 saslauthd[2315]: server_exit : master exited: 2315
[root@node2 ~]# tail -f /var/log/messages
May 23 10:29:45 node2 rgmanager[2130]: Member 1 shutting down
May 23 10:29:45 node2 rgmanager[2130]: Starting stopped service service:TestServGrp
May 23 10:29:45 node2 rgmanager[5688]: [ip] Adding IPv4 address 192.168.10.103/24 to eth0
May 23 10:29:49 node2 rgmanager[2130]: Service service:TestServGrp started
May 23 10:30:06 node2 qdiskd[1480]: Node 1 shutdown
May 23 10:30:06 node2 corosync[1437]: [QUORUM] Members[1]: 2
May 23 10:30:06 node2 corosync[1437]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
May 23 10:30:06 node2 corosync[1437]: [CPG   ] chosen downlist: sender r(0) ip(192.168.10.105) ; members(old:2 left:1)
May 23 10:30:06 node2 corosync[1437]: [MAIN  ] Completed service synchronization, ready to provide service.
May 23 10:30:06 node2 kernel: dlm: closing connection to node 1
May 23 10:30:06 node2 qdiskd[1480]: Assuming master role
[root@node2 ~]# clustat
Cluster Status for TestCluster2 @ Mon May 22 23:48:27 2017
Member Status: Quorate
Member Name ID Status
------ ---- ---- ------
node1.localdomain 1 Offline
node2.localdomain 2 Online, Local, rgmanager
/dev/block/8:17 0 Online, Quorum Disk
Service Name Owner (Last) State
------- ---- ----- ------ -----
service:TestServGrp node2.localdomain started
[root@node2 ~]#
2、停掉主節(jié)點(diǎn)應(yīng)用服務(wù),故障自動轉(zhuǎn)移功能正常
[root@node2 ~]# /etc/init.d/httpd stop
[root@node2 ~]#tail–f /var/log/messages
May 23 11:14:02 node2 rgmanager[11264]: [script] Executing /etc/init.d/httpd status
May 23 11:14:02 node2 rgmanager[11289]: [script] script:icpl: status of /etc/init.d/httpd failed (returned 3)
May 23 11:14:02 node2 rgmanager[2127]: status on script "httpd" returned 1 (generic error)
May 23 11:14:02 node2 rgmanager[2127]: Stopping service service:TestServGrp
May 23 11:14:03 node2 rgmanager[11320]: [script] Executing /etc/init.d/httpd stop
May 23 11:14:03 node2 rgmanager[11384]: [ip] Removing IPv4 address 192.168.10.103/24 from eth0
May 23 11:14:08 node2 ricci[11416]: Executing '/usr/bin/virsh nodeinfo'
May 23 11:14:08 node2 ricci[11418]: Executing '/usr/libexec/ricci/ricci-worker -f /var/lib/ricci/queue/2116732044'
May 23 11:14:09 node2 ricci[11422]: Executing '/usr/libexec/ricci/ricci-worker -f /var/lib/ricci/queue/1193918332'
May 23 11:14:13 node2 rgmanager[2127]: Service service:TestServGrp is recovering
May 23 11:14:17 node2 rgmanager[2127]: Service service:TestServGrp is now running on member 1
[root@node1 ~]# tail -f /var/log/messages
May 23 11:14:20 node1 rgmanager[2130]: Recovering failed service service:TestServGrp
May 23 11:14:20 node1 rgmanager[13006]: [ip] Adding IPv4 address 192.168.10.103/24 to eth0
May 23 11:14:24 node1 rgmanager[13092]: [script] Executing /etc/init.d/httpd start
May 23 11:14:24 node1 rgmanager[2130]: Service service:TestServGrp started
May 23 11:14:58 node1 rgmanager[13280]: [script] Executing /etc/init.d/httpd status
[root@node1 ~]# clustat
Cluster Status for TestCluster2 @ Mon May 22 23:48:27 2017
Member Status: Quorate
Member Name ID Status
------ ---- ---- ------
node1.localdomain 1 Online, Local, rgmanager
node2.localdomain 2 Online, rgmanager
/dev/block/8:17 0 Online, Quorum Disk
Service Name Owner (Last) State
------- ---- ----- ------ -----
service:TestServGrp node1.localdomain started
[root@node1 ~]#
3、停掉主節(jié)點(diǎn)網(wǎng)絡(luò)服務(wù),故障自動轉(zhuǎn)移功能正常
[root@node1 ~]#service network stop
[root@node2 ~]#tail–f /var/log/messages
May 23 22:11:16 node2 qdiskd[1480]: Assuming master role
May 23 22:11:17 node2 qdiskd[1480]: Writing eviction notice for node 1
May 23 22:11:17 node2 corosync[1437]: [TOTEM ] A processor failed, forming new configuration.
May 23 22:11:18 node2 qdiskd[1480]: Node 1 evicted
May 23 22:11:19 node2 corosync[1437]: [QUORUM] Members[1]: 2
May 23 22:11:19 node2 corosync[1437]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
May 23 22:11:19 node2 corosync[1437]: [CPG ] chosen downlist: sender r(0) ip(192.168.10.105) ; members(old:2 left:1)
May 23 22:11:19 node2 corosync[1437]: [MAIN ] Completed service synchronization, ready to provide service.
May 23 22:11:19 node2 kernel: dlm: closing connection to node 1
May 23 22:11:19 node2 rgmanager[2131]: State change: node1.localdomain DOWN
May 23 22:11:19 node2 fenced[1652]: fencing node1.localdomain
May 23 22:11:58 node2 fenced[1652]: fence node1.localdomain success
May 23 22:11:59 node2 rgmanager[2131]: Taking over service service:TestServGrp from down member node1.localdomain
May 23 22:11:59 node2 rgmanager[6145]: [ip] Adding IPv4 address 192.168.10.103/24 to eth0
May 23 22:12:03 node2 rgmanager[6234]: [script] Executing /etc/init.d/httpd start
May 23 22:12:03 node2 rgmanager[2131]: Service service:TestServGrp started
May 23 22:12:35 node2 corosync[1437]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
May 23 22:12:35 node2 corosync[1437]: [QUORUM] Members[2]: 1 2
May 23 22:12:35 node2 corosync[1437]: [QUORUM] Members[2]: 1 2
May 23 22:12:35 node2 corosync[1437]: [CPG ] chosen downlist: sender r(0) ip(192.168.10.105) ; members(old:1 left:0)
May 23 22:12:35 node2 corosync[1437]: [MAIN ] Completed service synchronization, ready to provide service.
May 23 22:12:41 node2 rgmanager[6425]: [script] Executing /etc/init.d/httpd status
May 23 22:12:43 node2 qdiskd[1480]: Node 1 shutdown
May 23 22:12:55 node2 kernel: dlm: got connection from 1
May 23 22:13:08 node2 rgmanager[2131]: State change: node1.localdomain UP
[root@node2 ~]# clustat
Cluster Status for TestCluster2 @ Mon May 22 23:48:27 2017
Member Status: Quorate
Member Name ID Status
------ ---- ---- ------
node1.localdomain 1 Online, rgmanager
node2.localdomain 2 Online, Local, rgmanager
/dev/block/8:17 0 Online, Quorum Disk
Service Name Owner (Last) State
------- ---- ----- ------ -----
service:TestServGrp node2.localdomain started
[root@node2 ~]#
Appendix: RHCS Glossary
1. CMAN (Cluster Manager)
Manages cluster membership and tracks the running state of each member.
2. DLM (Distributed Lock Manager)
Every node runs a DLM daemon. When a user operates on a piece of metadata, the other nodes are notified and may only read that metadata.
3. CCS (Cluster Configuration System)
Manages and synchronizes the cluster configuration file. Every node runs the CCS daemon; when it detects a change to /etc/cluster/cluster.conf, it immediately propagates the change to the other nodes.
4. Fence device
How it works: when the active host fails, the standby invokes the fence device to reboot the failed host. Once the fence operation succeeds, the fence device reports back to the standby, which then takes over the failed host's services and resources.
5. Conga cluster management software
Conga consists of two parts: luci and ricci. luci is a service that runs on the cluster management server, while ricci runs on each cluster node (luci can also be installed on a node). Cluster management and configuration happen through communication between these two services, via Conga's web interface, which is used to manage the RHCS cluster.
6. High-availability service management (rgmanager)
Monitors node services and provides failover: when a service on one node fails, it is moved to another healthy node.
免責(zé)聲明:本站發(fā)布的內(nèi)容(圖片、視頻和文字)以原創(chuàng)、轉(zhuǎn)載和分享為主,文章觀點(diǎn)不代表本網(wǎng)站立場,如果涉及侵權(quán)請聯(lián)系站長郵箱:is@yisu.com進(jìn)行舉報(bào),并提供相關(guān)證據(jù),一經(jīng)查實(shí),將立刻刪除涉嫌侵權(quán)內(nèi)容。