This article describes how to check the health of a RabbitMQ service: first the status of each node, then the status of the cluster, and finally the management page.
Log in to each RabbitMQ node and run:

rabbitmqctl status

A healthy node returns output like the following:

# Status of node rabbit@devxyz ...
# [{pid,13505},
#  {running_applications,
#   [{rabbitmq_management,"RabbitMQ Management Console","3.6.5"},
#    {rabbitmq_management_agent,"RabbitMQ Management Agent","3.6.5"},
#    {rabbit,"RabbitMQ","3.6.5"},
#    {os_mon,"CPO CXC 138 46","2.4"},
#    {rabbitmq_web_dispatch,"RabbitMQ Web Dispatcher","3.6.5"},
#    {webmachine,"webmachine","1.10.3"},
#    {mochiweb,"MochiMedia Web Server","2.13.1"},
#    {amqp_client,"RabbitMQ AMQP Client","3.6.5"},
#    {rabbit_common,[],"3.6.5"},
#    {mnesia,"MNESIA CXC 138 12","4.13.4"},
#    {compiler,"ERTS CXC 138 10","6.0.3"},
#    {ssl,"Erlang/OTP SSL application","7.3.3.1"},
#    {ranch,"Socket acceptor pool for TCP protocols.","1.2.1"},
#    {public_key,"Public key infrastructure","1.1.1"},
#    {xmerl,"XML parser","1.3.10"},
#    {inets,"INETS CXC 138 49","6.2.4"},
#    {asn1,"The Erlang ASN1 compiler version 4.0.2","4.0.2"},
#    {crypto,"CRYPTO","3.6.3"},
#    {syntax_tools,"Syntax tools","1.7"},
#    {sasl,"SASL CXC 138 11","2.7"},
#    {stdlib,"ERTS CXC 138 10","2.8"},
#    {kernel,"ERTS CXC 138 10","4.2"}]},
#  {os,{unix,linux}},
#  {erlang_version,
#   "Erlang/OTP 18 [erts-7.3.1.2] [source] [64-bit] [smp:8:8] [async-threads:128] [hipe] [kernel-poll:true]\n"},
#  {memory,
#   [{total,119288000},
#    {connection_readers,491304},
#    {connection_writers,33944},
#    {connection_channels,115312},
#    {connection_other,563312},
#    {queue_procs,510368},
#    {queue_slave_procs,0},
#    {plugins,1254560},
#    {other_proc,18328184},
#    {mnesia,160320},
#    {mgmt_db,2527968},
#    {msg_index,66840},
#    {other_ets,1641160},
#    {binary,55247472},
#    {code,27655723},
#    {atom,992409},
#    {other_system,9699124}]},
#  {alarms,[]},
#  {listeners,[{clustering,25672,"::"},{amqp,5672,"::"}]},
#  {vm_memory_high_watermark,0.4},
#  {vm_memory_limit,6663295795},
#  {disk_free_limit,50000000},
#  {disk_free,53003800576},
#  {file_descriptors,
#   [{total_limit,1948},
#    {total_used,23},
#    {sockets_limit,1751},
#    {sockets_used,21}]},
#  {processes,[{limit,1048576},{used,498}]},
#  {run_queue,0},
#  {uptime,47953},
#  {kernel,{net_ticktime,60}}]

This is the state of the RabbitMQ service on the node. The node is healthy when the output shows the service running, contains no nodedown or error text, and running_applications lists rabbit together with rabbitmq_management and similar applications (if plugins such as rabbitmq_management are enabled).

1.1 If the node is running but rabbitmq_management is missing from running_applications, the output looks like this:

# Status of node rabbit@devxyz ...
# [{pid,13505},
#  {running_applications,[{compiler,"ERTS CXC 138 10","6.0.3"},
#                         {ssl,"Erlang/OTP SSL application","7.3.3.1"},
#                         {ranch,"Socket acceptor pool for TCP protocols.",
#                                "1.2.1"},
#                         {public_key,"Public key infrastructure","1.1.1"},
#                         {xmerl,"XML parser","1.3.10"},
#                         {inets,"INETS CXC 138 49","6.2.4"},
#                         {asn1,"The Erlang ASN1 compiler version 4.0.2",
#                               "4.0.2"},
#                         {crypto,"CRYPTO","3.6.3"},
#                         {syntax_tools,"Syntax tools","1.7"},
#                         {sasl,"SASL CXC 138 11","2.7"},
#                         {stdlib,"ERTS CXC 138 10","2.8"},
#                         {kernel,"ERTS CXC 138 10","4.2"}]},
#  {os,{unix,linux}},
#  {erlang_version,"Erlang/OTP 18 [erts-7.3.1.2] [source] [64-bit] [smp:8:8] [async-threads:128] [hipe] [kernel-poll:true]\n"},
#  {memory,[{total,58267544},
#           {connection_readers,0},
#           {connection_writers,0},
#           {connection_channels,0},
#           {connection_other,0},
#           {queue_procs,0},
#           {queue_slave_procs,0},
#           {plugins,0},
#           {other_proc,18771312},
#           {mnesia,0},
#           {mgmt_db,0},
#           {msg_index,0},
#           {other_ets,1218464},
#           {binary,29984},
#           {code,27655723},
#           {atom,992409},
#           {other_system,9599652}]},
#  {alarms,[]},
#  {listeners,[]},
#  {processes,[{limit,1048576},{used,73}]},
#  {run_queue,0},
#  {uptime,48363},
#  {kernel,{net_ticktime,60}}]

Here the rabbit application itself is also absent from the list and there are no listeners, which means the RabbitMQ application has not been started; only the underlying base services are up. Start the application:

rabbitmqctl start_app

Output:

# Starting node rabbit@devxyz ...

Then run rabbitmqctl status again to verify the service state.

1.2 If the output contains an error, for example:

# Status of node rabbit@devxyz ...
# Error: unable to connect to node rabbit@devxyz: nodedown
#
# DIAGNOSTICS
# ===========
#
# attempted to contact: [rabbit@devxyz]
#
# rabbit@devxyz:
#   * connected to epmd (port 4369) on devxyz
#   * epmd reports: node 'rabbit' not running at all
#                   no other nodes on devxyz
#   * suggestion: start the node
#
# current node details:
# - node name: 'rabbitmq-cli-07@devxyz'
# - home dir: /var/lib/rabbitmq
# - cookie hash: duuNopvOx1ChRdjrRHPo+A==

then not even the base services of RabbitMQ are running. First try to start the node:

rabbitmq-server -detached

Output:

# Warning: PID file not written; -detached was passed.

Then start the application:

rabbitmqctl start_app

Output:

# Starting node rabbit@devxyz ...

Verify with rabbitmqctl status. If the node still does not reach a healthy state, diagnose the problem from the error messages and act accordingly.
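The three outcomes of the node check above (healthy, application stopped, node down) can be sketched as a small shell filter. This is only an illustration, not part of the original procedure: the function name classify_status is hypothetical, and the matching assumes the 3.6.x Erlang-term output format shown in this article, where a running broker lists {rabbit,"RabbitMQ",...} under running_applications.

```shell
# Classify the text output of `rabbitmqctl status` read from stdin.
# Prints one of: down, app_stopped, ok.
# Assumption: output uses the 3.6.x Erlang-term format shown in this article.
classify_status() {
  _status_out=$(cat)
  if printf '%s\n' "$_status_out" | grep -Eq 'nodedown|Error:'; then
    # Case 1.2: the node itself cannot be reached.
    echo down
  elif printf '%s\n' "$_status_out" | grep -Fq '{rabbit,"RabbitMQ"'; then
    # The rabbit application is listed under running_applications.
    echo ok
  else
    # Case 1.1: the Erlang node answers, but the rabbit application is not running.
    echo app_stopped
  fi
}

# Usage on a node (stderr is merged so the nodedown error is captured too):
#   rabbitmqctl status 2>&1 | classify_status
```

A result of app_stopped corresponds to running rabbitmqctl start_app, and down corresponds to starting the node with rabbitmq-server -detached first.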
Log in to any live RabbitMQ node and run:

rabbitmqctl cluster_status

Output:

# Cluster status of node rabbit@HYRBT001 ...
# [{nodes,[{disc,[rabbit@HYRBT001,rabbit@HYRBT002,rabbit@HYRBT003]}]},
#  {running_nodes,[rabbit@HYRBT003,rabbit@HYRBT002,rabbit@HYRBT001]},
#  {cluster_name,<<"HYRBT001">>},
#  {partitions,[]},
#  {alarms,[{rabbit@HYRBT003,[]},{rabbit@HYRBT002,[]},{rabbit@HYRBT001,[]}]}]

This is the state of the cluster. In a healthy cluster:

nodes: lists every RabbitMQ node
running_nodes: lists every RabbitMQ node
cluster_name: shows the cluster name
partitions: is empty
alarms: the [] after each node is empty

2.1 If nodes does not list every RabbitMQ node, some node has not joined the cluster. For example:

# Cluster status of node rabbit@HYCTL001 ...
# [{nodes,[{disc,[rabbit@HYCTL001,rabbit@HYCTL002]}]},
#  {running_nodes,[rabbit@HYCTL002,rabbit@HYCTL001]},
#  {cluster_name,<<"rabbit@HYCTL001">>},
#  {partitions,[]},
#  {alarms,[{rabbit@HYCTL002,[]},{rabbit@HYCTL001,[]}]}]

There should be three nodes, so log in to node 3, the one that has not joined.

First verify connectivity between this node and the nodes already in the cluster, using ping.
Then verify that .erlang.cookie is identical on all nodes. The file is located under /var/lib/rabbitmq/. If it differs, copy the content from a node in the cluster to this node.
Once both checks pass, check whether the RabbitMQ service is running on this node, as described in step 1 (the per-node status check).

When the service is healthy, join the cluster with the following commands. Note that reset wipes the node's local RabbitMQ data.

rabbitmqctl stop_app

Output:

# Stopping node rabbit@HYCTL003 ...

rabbitmqctl reset

Output:

# Resetting node rabbit@HYCTL003 ...

rabbitmqctl join_cluster rabbit@CLUSTER_NODE_NAME

where CLUSTER_NODE_NAME is the name of a node already in the cluster. Output:

# Clustering node rabbit@HYCTL003 with rabbit@HYCTL001 ...

rabbitmqctl start_app

Output:

# Starting node rabbit@HYCTL003 ...

Run rabbitmqctl cluster_status to verify that the node now appears under nodes, running_nodes, and alarms:

# Cluster status of node rabbit@HYCTL003 ...
# [{nodes,[{disc,[rabbit@HYCTL001,rabbit@HYCTL002,rabbit@HYCTL003]}]},
#  {running_nodes,[rabbit@HYCTL001,rabbit@HYCTL002,rabbit@HYCTL003]},
#  {cluster_name,<<"rabbit@HYCTL001">>},
#  {partitions,[]},
#  {alarms,[{rabbit@HYCTL001,[]},{rabbit@HYCTL002,[]},{rabbit@HYCTL003,[]}]}]

2.2 If running_nodes does not list every node, the RabbitMQ service on some nodes is not healthy. For example:

# Cluster status of node rabbit@HYCTL001 ...
# [{nodes,[{disc,[rabbit@HYCTL001,rabbit@HYCTL002,rabbit@HYCTL003]}]},
#  {running_nodes,[rabbit@HYCTL002,rabbit@HYCTL001]},
#  {cluster_name,<<"rabbit@HYCTL001">>},
#  {partitions,[]},
#  {alarms,[{rabbit@HYCTL002,[]},{rabbit@HYCTL001,[]}]}]

Node 3 is not running. Log in to node 3 and handle it as described in step 1. When done, verify with rabbitmqctl cluster_status:

# Cluster status of node rabbit@HYCTL003 ...
# [{nodes,[{disc,[rabbit@HYCTL001,rabbit@HYCTL002,rabbit@HYCTL003]}]},
#  {running_nodes,[rabbit@HYCTL001,rabbit@HYCTL002,rabbit@HYCTL003]},
#  {cluster_name,<<"rabbit@HYCTL001">>},
#  {partitions,[]},
#  {alarms,[{rabbit@HYCTL001,[]},{rabbit@HYCTL002,[]},{rabbit@HYCTL003,[]}]}]

2.3 If partitions contains nodes, a split-brain (network partition) has occurred, usually because a network problem disrupted communication between nodes, and the cluster is in an abnormal state. For example:

# Cluster status of node rabbit@HYCTL001 ...
# [{nodes,[{disc,[rabbit@HYCTL001,rabbit@HYCTL002,rabbit@HYCTL003]}]},
#  {running_nodes,[rabbit@HYCTL001,rabbit@HYCTL002,rabbit@HYCTL003]},
#  {cluster_name,<<"rabbit@HYCTL001">>},
#  {partitions,[{rabbit@HYCTL001,rabbit@HYCTL002,[rabbit@HYCTL001]},
#               {rabbit@HYCTL003,[rabbit@HYCTL001,rabbit@HYCTL002]}]},
#  {alarms,[{rabbit@HYCTL001,[]},{rabbit@HYCTL002,[]},{rabbit@HYCTL003,[]}]}]

Choose one primary node to keep, then restart the RabbitMQ service on the nodes in the other partitions. There are two ways to determine the primary node.

2.3.1 If haproxy load-balances the RabbitMQ cluster and is configured in active/backup mode, the primary node can be read from the haproxy configuration file. Log in to one of the controller nodes and view the file:

cat /etc/haproxy/conf.d/100-rabbitmq.cfg

Output:

# listen rabbitmq
#   bind 192.168.0.10:5672
#   balance roundrobin
#   mode tcp
#   option tcpka
#   timeout client 48h
#   timeout server 48h
#   server HYCTL001 192.168.0.11:5673 check inter 5000 rise 2 fall 3
#   server HYCTL002 192.168.0.12:5673 backup check inter 5000 rise 2 fall 3
#   server HYCTL003 192.168.0.13:5673 backup check inter 5000 rise 2 fall 3

Servers marked backup are backup nodes; the server without backup is the primary. In this environment HYCTL001 is the primary node and handles the traffic.

Once the primary is known, log in to each of the other RabbitMQ nodes and restart the RabbitMQ service:

rabbitmqctl stop

Output:

# Stopping and halting node rabbit@HYCTL003 ...

rabbitmq-server -detached

Output:

# Warning: PID file not written; -detached was passed.

rabbitmqctl start_app

Output:

# Starting node rabbit@HYCTL003 ...

Check the node with rabbitmqctl status and the cluster with rabbitmqctl cluster_status. If other partitioned nodes remain, the tuple in partitions that contains the primary node gains the node that was just restarted, and that node disappears from the other tuples. Once every partitioned node has been handled, partitions is empty:

# Cluster status of node rabbit@HYCTL003 ...
# [{nodes,[{disc,[rabbit@HYCTL001,rabbit@HYCTL002,rabbit@HYCTL003]}]},
#  {running_nodes,[rabbit@HYCTL001,rabbit@HYCTL002,rabbit@HYCTL003]},
#  {cluster_name,<<"rabbit@HYCTL001">>},
#  {partitions,[]},
#  {alarms,[{rabbit@HYCTL001,[]},{rabbit@HYCTL002,[]},{rabbit@HYCTL003,[]}]}]

2.3.2 If active/backup mode is not configured, find the node that currently holds the most connections and treat it as the primary. On any RabbitMQ node, run the following, replacing HYCTL001 with each node name in turn:

rabbitmqctl list_connections pid | grep HYCTL001 | wc -l

Count the connections for every node name, choose the partition tuple with the most connections as the primary one, and restart the RabbitMQ service on the nodes in the other tuples. The restart steps are the same as in 2.3.1.

2.4 If alarms contains entries for a node, that node is using too much memory or disk. For example:

# Cluster status of node rabbit@HYCTL003 ...
# [{nodes,[{disc,[rabbit@HYCTL001,rabbit@HYCTL002,rabbit@HYCTL003]}]},
#  {running_nodes,[rabbit@HYCTL001,rabbit@HYCTL002,rabbit@HYCTL003]},
#  {cluster_name,<<"rabbit@HYCTL001">>},
#  {partitions,[]},
#  {alarms,[{rabbit@HYCTL001,[]},
#           {rabbit@HYCTL002,[]},
#           {rabbit@HYCTL003,[disk,memory]}]}]

Both the memory and the disk alarm have fired on node 3, which indicates that a large number of messages have piled up there. Likely causes are a failing backend service that should be consuming the messages, or a stale queue that keeps receiving messages while no consumer reads from it.

The alarm thresholds are configurable. The current values appear in the output of rabbitmqctl status:

# {vm_memory_high_watermark,0.4},      memory threshold (fraction of RAM)
# {vm_memory_limit,81016840192},       memory limit (bytes)
# {disk_free_limit,50000000},          free-disk limit (bytes)
# {disk_free,553529729024},            free disk space (bytes)
# {file_descriptors,
#  [{total_limit,102300},
#   {total_used,2040},
#   {sockets_limit,92068},
#   {sockets_used,2038}]},              file descriptors and sockets: usage and limits
# {processes,[{limit,1048576},{used,31681}]},   processes: usage and limit

The total number of backlogged messages can be obtained with:

rabbitmqctl list_queues messages_ready | awk 'NR>=2{print }'| awk '{sum+=$1}END{print sum}'

The memory occupied by the backlog:

rabbitmqctl list_queues message_bytes_ram | awk 'NR>=2{print }'| awk '{sum+=$1}END{print sum}'

The disk space occupied by the backlog (bytes of persistent messages):

rabbitmqctl list_queues message_bytes_persistent | awk 'NR>=2{print }'| awk '{sum+=$1}END{print sum}'

When messages pile up, first find out which queues hold the most:

rabbitmqctl list_queues message_bytes_ram name | awk 'NR>=2{print }'|sort -rn|less

This lists the queues from largest to smallest by message bytes held in RAM, together with their names. Use the queue names to track down the services involved on the various nodes. If a service is in an abnormal state, fix the service. If the queue is stale (created by a service that was used earlier but is no longer in use), delete it. Deleting a queue is done from the RabbitMQ management page, which is covered later in this article.

2.5 Check whether RabbitMQ queues or connections are in flow control

When consumers process messages far more slowly than producers publish them, RabbitMQ applies flow control automatically, so that messages do not pile up excessively and the delay between publishing and consuming does not grow too long.

Flow control can be checked from the command line. Log in to any RabbitMQ node and run:

rabbitmqctl list_queues name state | grep flow

Any output means the listed queues are under flow control; check the producing and consuming processes of those queues as described in 2.4.

rabbitmqctl list_connections name state|grep flow

Any output means the listed connections are under flow control. In that case some queues are under flow control as well; check the producing and consuming processes of those queues as described in 2.4.
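Two of the cluster checks above, an empty partitions list and the summed backlog, lend themselves to small stdin filters. The sketch below is an illustration under stated assumptions: the function names has_partitions and sum_first_column are hypothetical, the partition check assumes the 3.6.x Erlang-term format of cluster_status shown in this article, and the sum assumes that the first line of list_queues output is the "Listing queues ..." header, as the NR>=2 filter in the original commands implies.

```shell
# Succeeds (exit status 0) when the cluster_status text on stdin reports a
# non-empty partitions list. Whitespace is removed first so that a list
# wrapped over several lines is still recognised.
has_partitions() {
  tr -d '[:space:]' | grep -Fq '{partitions,[{'
}

# Sums the first column of list_queues output read from stdin, skipping
# the header line. Prints 0 when there are no queue rows.
sum_first_column() {
  awk 'NR >= 2 { sum += $1 } END { print sum + 0 }'
}

# Usage on a node:
#   rabbitmqctl cluster_status | has_partitions && echo "partition detected"
#   rabbitmqctl list_queues messages_ready | sum_first_column
```

Unlike the two-stage awk pipeline in 2.4, sum_first_column prints 0 rather than an empty line when no queues exist, which is easier to compare against a threshold in a monitoring script.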
The RabbitMQ management page requires the rabbitmq_management plugin. Log in to any RabbitMQ node and first check whether the plugin is enabled:

rabbitmq-plugins list -v -E |grep -A5 rabbitmq_management

Output:

# [E*] rabbitmq_management
#      Version: 3.6.5
#      Dependencies: [rabbitmq_web_dispatch,amqp_client,
#                     rabbitmq_management_agent]
#      Description: RabbitMQ Management Console

This shows that the rabbitmq_management plugin is enabled. If it is not, enable it with:

rabbitmq-plugins enable rabbitmq_management

Output:

# The following plugins have been enabled:
#   mochiweb
#   webmachine
#   rabbitmq_web_dispatch
#   amqp_client
#   rabbitmq_management_agent
#   rabbitmq_management
#
# Applying plugin configuration to rabbit@devxyz... started 6 plugins.

The rabbitmq_management plugin listens on port 15672 by default, so after enabling it, open that port in the firewall and save the rule:

iptables -I INPUT -p tcp --dport 15672 -j ACCEPT
service iptables save

Then open the management page at this node's ip:15672 and enter a user name and password. User names can be listed with rabbitmqctl list_users; the password is the one set when the user was created. Log in as a user other than guest, because by default guest can only connect from localhost.

After logging in you can see the state of the whole RabbitMQ cluster and of each node, whether a split-brain exists, whether any alarms have fired, the current message backlog, and so on.

To delete a queue, click the Queues tab, enter the queue name in the Filter box, click the queue to open it, and scroll to the bottom of the page. In the Delete / purge section, Delete removes the queue and Purge empties it.
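The plugin check above can also be scripted instead of read by eye. The sketch below is an illustration only: the function name management_enabled is hypothetical, and it assumes that rabbitmq-plugins list prints each plugin on a line beginning with a marker such as [E*], as in the 3.6.5 output shown in this article.

```shell
# Succeeds (exit status 0) when the `rabbitmq-plugins list` text on stdin
# shows rabbitmq_management as explicitly enabled (marker starting with E).
# The trailing alternation keeps rabbitmq_management_agent from matching.
management_enabled() {
  grep -Eq '^\[E.\] rabbitmq_management([[:space:]]|$)'
}

# Usage on a node:
#   if rabbitmq-plugins list -E | management_enabled; then
#     echo "management plugin enabled"
#   else
#     echo "run: rabbitmq-plugins enable rabbitmq_management"
#   fi
```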