HDFS User Manual: The balancer (Data Balancing) Command Explained

Published: 2020-08-05 17:54:33  Source: Internet  Views: 1289  Author: 馬吉輝 (Ma Jihui)  Category: Big Data

Monday, 2019/1/21


2.3.1. balancer
Runs a cluster balancing utility. An administrator can stop the rebalancing process by pressing Ctrl+C.
Why data balancing is needed
Causes of HDFS data imbalance
1. The disks on a DataNode machine reach saturation.
2. Nodes are added to or removed from the cluster.
Effects of data imbalance
1. Map tasks may be assigned to machines that do not hold the relevant data, so computation cannot run locally and network bandwidth is consumed instead.
2. When some DataNodes are completely full, new blocks can only be placed on the nodes that still have free space, which reduces the opportunity for parallel reads.
Requirements for the balancing process
1. Balancing must not reduce the number of blocks or lose block replicas.
2. An administrator can abort the balancing process.
3. The amount of data moved at a time should be controllable, so that balancing does not congest the network.
4. The NameNode must not become overloaded because of the balancing service.
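How automatic data balancing works
Because of the balancing algorithm it uses, data balancing is an iterative, repeating process. The goal of each iteration is to reduce the data load on heavily loaded machines, so balancing makes the fullest possible use of the network bandwidth available to it.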

Data balancing interaction diagram (figure)

The steps are as follows:

1. The data balancing service (Rebalancing Server) first asks the NameNode to generate a DataNode data distribution report, obtaining the disk usage of every DataNode.
2. The Rebalancing Server aggregates the distribution of the data that needs to move and computes a concrete block migration plan. The plan ensures that blocks take the shortest path within the network.
3. The block migration task starts: the Proxy Source Data Node copies a block that needs to be moved.
4. The copied block is written to the target DataNode.
5. The original block is deleted.
6. The target DataNode confirms to the Proxy Source Data Node that the block migration is complete.
7. The Proxy Source Data Node confirms to the Rebalancing Server that this block migration is complete. The process then repeats until the cluster meets the balance criterion.
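
The iterative loop in steps 1 to 7 can be illustrated with a toy simulation. This is a simplified sketch and not the Balancer's actual algorithm: the `Node` class, the `rebalance` function, the 128 MB block size, the node names, and the greedy fullest-to-emptiest pairing are all assumptions made for illustration. The real balancer also has to respect rack topology, replica placement, and bandwidth limits.

```python
from dataclasses import dataclass

BLOCK = 128 * 1024 * 1024  # assumed block size in bytes, for illustration only


@dataclass
class Node:
    name: str
    capacity: int  # bytes
    used: int      # bytes

    @property
    def utilization(self) -> float:
        """Used space as a percentage of capacity."""
        return 100.0 * self.used / self.capacity


def rebalance(nodes, threshold=10.0, max_iterations=10_000):
    """Greedy toy model: move one block per iteration from the fullest
    node to the emptiest, until every node is within `threshold`
    percentage points of the cluster-wide utilization."""
    total_capacity = sum(n.capacity for n in nodes)
    moves = 0
    for _ in range(max_iterations):
        average = 100.0 * sum(n.used for n in nodes) / total_capacity
        if all(abs(n.utilization - average) <= threshold for n in nodes):
            break  # the cluster meets the balance criterion
        source = max(nodes, key=lambda n: n.utilization)
        target = min(nodes, key=lambda n: n.utilization)
        # Add the block to the target first, then remove it from the source,
        # mirroring steps 4 and 5 so no replica is lost mid-move.
        target.used += BLOCK
        source.used -= BLOCK
        moves += 1
    return moves


# Hypothetical three-node cluster: one nearly full, one freshly added.
cluster = [
    Node("node-a", 100 * 1024**3, 90 * 1024**3),
    Node("node-b", 100 * 1024**3, 40 * 1024**3),
    Node("node-c", 100 * 1024**3, 5 * 1024**3),
]
moves = rebalance(cluster)
for n in cluster:
    print(f"{n.name}: {n.utilization:.2f}%")
print("blocks moved:", moves)
```

The total amount of stored data is unchanged by the loop; only its placement changes, which matches requirement 1 above.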

Hands-on walkthrough
1. Switch to the hdfs user

[root@hadoop-master ~]# su - hdfs
2. Check the current data distribution
[hdfs@hadoop-master ~]$ hdfs dfsadmin -report > /tmp/bq
[hdfs@hadoop-master ~]$ cat /tmp/bq 
Configured Capacity: 273287419086 (254.52 GB)
Present Capacity: 209643254756 (195.25 GB)
DFS Remaining: 199579415524 (185.87 GB)
DFS Used: 10063839232 (9.37 GB)
DFS Used%: 4.80%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0

-------------------------------------------------
Live datanodes (3):

Name: 192.168.0.117:50010 (hadoop-node01)
Hostname: hadoop-node01
Rack: /default
Decommission Status : Normal
Configured Capacity: 91095806362 (84.84 GB)
DFS Used: 3354603520 (3.12 GB)
Non DFS Used: 12246245786 (11.41 GB)
DFS Remaining: 69809631564 (65.02 GB)
DFS Used%: 3.68%
DFS Remaining%: 76.63%
Configured Cache Capacity: 4294967296 (4 GB)
Cache Used: 0 (0 B)
Cache Remaining: 4294967296 (4 GB)
Cache Used%: 0.00%
Cache Remaining%: 100.00%
Xceivers: 10
Last contact: Mon Jan 21 10:45:24 CST 2019

Name: 192.168.0.118:50010 (hadoop-master)
Hostname: hadoop-master
Rack: /default
Decommission Status : Normal
Configured Capacity: 91095806362 (84.84 GB)
DFS Used: 3354632192 (3.12 GB)
Non DFS Used: 29517959578 (27.49 GB)
DFS Remaining: 52537889100 (48.93 GB)
DFS Used%: 3.68%
DFS Remaining%: 57.67%
Configured Cache Capacity: 4294967296 (4 GB)
Cache Used: 0 (0 B)
Cache Remaining: 4294967296 (4 GB)
Cache Used%: 0.00%
Cache Remaining%: 100.00%
Xceivers: 10
Last contact: Mon Jan 21 10:45:24 CST 2019

Name: 192.168.0.121:50010 (hadoop-node02)
Hostname: hadoop-node02
Rack: /default
Decommission Status : Normal
Configured Capacity: 91095806362 (84.84 GB)
DFS Used: 3354603520 (3.12 GB)
Non DFS Used: 4823982490 (4.49 GB)
DFS Remaining: 77231894860 (71.93 GB)
DFS Used%: 3.68%
DFS Remaining%: 84.78%
Configured Cache Capacity: 4294967296 (4 GB)
Cache Used: 0 (0 B)
Cache Remaining: 4294967296 (4 GB)
Cache Used%: 0.00%
Cache Remaining%: 100.00%
Xceivers: 10
Last contact: Mon Jan 21 10:45:24 CST 2019
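
The per-node figures in a report like the one above can be extracted with a few lines of Python. This is a minimal sketch: the `parse_datanodes` helper is hypothetical, the sample text is an abbreviated copy of the report above, and the field layout is assumed to match what this Hadoop version prints.

```python
import re

# Abbreviated copy of the `hdfs dfsadmin -report` output shown above.
SAMPLE_REPORT = """\
Live datanodes (3):

Name: 192.168.0.117:50010 (hadoop-node01)
Hostname: hadoop-node01
Configured Capacity: 91095806362 (84.84 GB)
DFS Used: 3354603520 (3.12 GB)
DFS Remaining: 69809631564 (65.02 GB)

Name: 192.168.0.118:50010 (hadoop-master)
Hostname: hadoop-master
Configured Capacity: 91095806362 (84.84 GB)
DFS Used: 3354632192 (3.12 GB)
DFS Remaining: 52537889100 (48.93 GB)

Name: 192.168.0.121:50010 (hadoop-node02)
Hostname: hadoop-node02
Configured Capacity: 91095806362 (84.84 GB)
DFS Used: 3354603520 (3.12 GB)
DFS Remaining: 77231894860 (71.93 GB)
"""

# Anchored at line start, so "Non DFS Used" and "DFS Used%" do not match.
FIELD = re.compile(
    r"^(Hostname|Configured Capacity|DFS Used|DFS Remaining): (\S+)", re.M
)


def parse_datanodes(report: str) -> dict:
    """Return {hostname: {field: bytes}} for each datanode stanza."""
    nodes = {}
    # Stanzas are separated by blank lines; datanode stanzas start with "Name:".
    for stanza in re.split(r"\n\s*\n", report):
        if not stanza.lstrip().startswith("Name:"):
            continue
        fields = dict(FIELD.findall(stanza))
        host = fields.pop("Hostname")
        nodes[host] = {key: int(value) for key, value in fields.items()}
    return nodes


nodes = parse_datanodes(SAMPLE_REPORT)
for host, f in nodes.items():
    pct = 100.0 * f["DFS Used"] / f["Configured Capacity"]
    print(f"{host}: {pct:.2f}% of configured capacity used")
```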

3. Run the balancer command
[hdfs@hadoop-master ~]$ hdfs balancer
19/01/21 10:49:19 INFO balancer.Balancer: namenodes  = [hdfs://vg-cdh-test]
19/01/21 10:49:19 INFO balancer.Balancer: parameters = Balancer.Parameters [BalancingPolicy.Node, threshold = 10.0, max idle iteration = 5, number of nodes to be excluded = 0, number of nodes to be included = 0, run during upgrade = false]
Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
19/01/21 10:49:21 INFO net.NetworkTopology: Adding a new node: /default/192.168.0.117:50010
19/01/21 10:49:21 INFO net.NetworkTopology: Adding a new node: /default/192.168.0.118:50010
19/01/21 10:49:21 INFO net.NetworkTopology: Adding a new node: /default/192.168.0.121:50010
19/01/21 10:49:21 INFO balancer.Balancer: 0 over-utilized: []
19/01/21 10:49:21 INFO balancer.Balancer: 0 underutilized: []
The cluster is balanced. Exiting...
2019-1-21 10:49:21                0                  0 B                 0 B               -1 B
2019-1-21 10:49:21       Balancing took 2.738 seconds
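
Why the balancer found nothing to do can be checked by hand. The sketch below applies a simplified version of the threshold test (threshold = 10.0, as printed in the parameters line above) to the DFS Used and Configured Capacity figures from the report in step 2. The `classify` function is a hypothetical illustration, not the Balancer's own code; it assumes a node counts as over- or under-utilized when its utilization differs from the cluster-wide utilization by more than the threshold.

```python
THRESHOLD = 10.0  # percentage points, from the balancer's printed parameters

# (DFS Used, Configured Capacity) in bytes, copied from the report in step 2.
NODES = {
    "hadoop-node01": (3354603520, 91095806362),
    "hadoop-master": (3354632192, 91095806362),
    "hadoop-node02": (3354603520, 91095806362),
}


def classify(nodes, threshold):
    """Label each node relative to the cluster-wide utilization."""
    total_used = sum(used for used, _ in nodes.values())
    total_capacity = sum(cap for _, cap in nodes.values())
    average = 100.0 * total_used / total_capacity
    labels = {}
    for name, (used, cap) in nodes.items():
        utilization = 100.0 * used / cap
        if utilization > average + threshold:
            labels[name] = "over-utilized"
        elif utilization < average - threshold:
            labels[name] = "under-utilized"
        else:
            labels[name] = "within threshold"
    return average, labels


average, labels = classify(NODES, THRESHOLD)
print(f"cluster utilization: {average:.2f}%")
for name, label in labels.items():
    print(f"{name}: {label}")
```

All three nodes sit at roughly the same utilization, so none falls outside the 10-point band. That is consistent with the empty over-utilized and underutilized lists in the log.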
4. Check the data distribution after balancing
[hdfs@hadoop-master ~]$ hdfs dfsadmin -report > /tmp/bh
[hdfs@hadoop-master ~]$ cat /tmp/bh
Configured Capacity: 273287419086 (254.52 GB)
Present Capacity: 209660106924 (195.26 GB)
DFS Remaining: 199596266468 (185.89 GB)
DFS Used: 10063840456 (9.37 GB)
DFS Used%: 4.80%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0

-------------------------------------------------
Live datanodes (3):

Name: 192.168.0.117:50010 (hadoop-node01)
Hostname: hadoop-node01
Rack: /default
Decommission Status : Normal
Configured Capacity: 91095806362 (84.84 GB)
DFS Used: 3354603928 (3.12 GB)
Non DFS Used: 12246663170 (11.41 GB)
DFS Remaining: 69809213772 (65.01 GB)
DFS Used%: 3.68%
DFS Remaining%: 76.63%
Configured Cache Capacity: 4294967296 (4 GB)
Cache Used: 0 (0 B)
Cache Remaining: 4294967296 (4 GB)
Cache Used%: 0.00%
Cache Remaining%: 100.00%
Xceivers: 10
Last contact: Mon Jan 21 10:50:10 CST 2019

Name: 192.168.0.118:50010 (hadoop-master)
Hostname: hadoop-master
Rack: /default
Decommission Status : Normal
Configured Capacity: 91095806362 (84.84 GB)
DFS Used: 3354632600 (3.12 GB)
Non DFS Used: 29501419522 (27.48 GB)
DFS Remaining: 52554428748 (48.95 GB)
DFS Used%: 3.68%
DFS Remaining%: 57.69%
Configured Cache Capacity: 4294967296 (4 GB)
Cache Used: 0 (0 B)
Cache Remaining: 4294967296 (4 GB)
Cache Used%: 0.00%
Cache Remaining%: 100.00%
Xceivers: 10
Last contact: Mon Jan 21 10:50:10 CST 2019

Name: 192.168.0.121:50010 (hadoop-node02)
Hostname: hadoop-node02
Rack: /default
Decommission Status : Normal
Configured Capacity: 91095806362 (84.84 GB)
DFS Used: 3354603928 (3.12 GB)
Non DFS Used: 4823252994 (4.49 GB)
DFS Remaining: 77232623948 (71.93 GB)
DFS Used%: 3.68%
DFS Remaining%: 84.78%
Configured Cache Capacity: 4294967296 (4 GB)
Cache Used: 0 (0 B)
Cache Remaining: 4294967296 (4 GB)
Cache Used%: 0.00%
Cache Remaining%: 100.00%
Xceivers: 10
Last contact: Mon Jan 21 10:50:10 CST 2019
————————————————————————————————————————————————————————————————————————————————
5. Compare the reports from before and after balancing
[hdfs@hadoop-master ~]$ diff /tmp/bq /tmp/bh
2,4c2,4
< Present Capacity: 209643254756 (195.25 GB)
< DFS Remaining: 199579415524 (185.87 GB)
< DFS Used: 10063839232 (9.37 GB)
---
> Present Capacity: 209660106924 (195.26 GB)
> DFS Remaining: 199596266468 (185.89 GB)
> DFS Used: 10063840456 (9.37 GB)
19,21c19,21
< DFS Used: 3354603520 (3.12 GB)
< Non DFS Used: 12246245786 (11.41 GB)
< DFS Remaining: 69809631564 (65.02 GB)
---
> DFS Used: 3354603928 (3.12 GB)
> Non DFS Used: 12246663170 (11.41 GB)
> DFS Remaining: 69809213772 (65.01 GB)
30c30
< Last contact: Mon Jan 21 10:45:24 CST 2019
---
> Last contact: Mon Jan 21 10:50:10 CST 2019
38,40c38,40
< DFS Used: 3354632192 (3.12 GB)
< Non DFS Used: 29517959578 (27.49 GB)
< DFS Remaining: 52537889100 (48.93 GB)
---
> DFS Used: 3354632600 (3.12 GB)
> Non DFS Used: 29501419522 (27.48 GB)
> DFS Remaining: 52554428748 (48.95 GB)
42c42
< DFS Remaining%: 57.67%
---
> DFS Remaining%: 57.69%
49c49
< Last contact: Mon Jan 21 10:45:24 CST 2019
---
> Last contact: Mon Jan 21 10:50:10 CST 2019
57,59c57,59
< DFS Used: 3354603520 (3.12 GB)
< Non DFS Used: 4823982490 (4.49 GB)
< DFS Remaining: 77231894860 (71.93 GB)
---
> DFS Used: 3354603928 (3.12 GB)
> Non DFS Used: 4823252994 (4.49 GB)
> DFS Remaining: 77232623948 (71.93 GB)
68c68
< Last contact: Mon Jan 21 10:45:24 CST 2019
---
> Last contact: Mon Jan 21 10:50:10 CST 2019
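
The diff is easier to read as deltas. The short calculation below uses only the DFS Used values shown in the diff; the variable names are illustrative. Each DataNode's usage changed by the same few hundred bytes, which suggests that no blocks were moved and is consistent with the balancer exiting immediately.

```python
# DFS Used per DataNode in bytes, copied from the diff above.
before = {
    "hadoop-node01": 3354603520,
    "hadoop-master": 3354632192,
    "hadoop-node02": 3354603520,
}
after = {
    "hadoop-node01": 3354603928,
    "hadoop-master": 3354632600,
    "hadoop-node02": 3354603928,
}

# Per-node change in DFS Used between the two reports.
deltas = {host: after[host] - before[host] for host in before}
# Change in the cluster-wide DFS Used line.
total_delta = 10063840456 - 10063839232

for host, delta in deltas.items():
    print(f"{host}: {delta:+d} bytes")
print(f"cluster: {total_delta:+d} bytes")
```

Every node grew by 408 bytes and the cluster total by 1224 bytes, which is exactly three times 408.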
The actual procedure used in production is as follows:
hdfs dfsadmin -fs hdfs://uhadoop-mzwc2w-master2:8020 -setBalancerBandwidth 3145728000
[hadoop@uhadoop-mzwc2w-master1 ~]$ hdfs dfsadmin -fs hdfs://uhadoop-mzwc2w-master2:8020 -setBalancerBandwidth 3145728000
Balancer bandwidth is set to 3145728000
[hadoop@uhadoop-mzwc2w-master1 ~]$ hdfs dfsadmin -fs hdfs://uhadoop-mzwc2w-master1:8020 -setBalancerBandwidth 3145728000
Balancer bandwidth is set to 3145728000
// This must be run against both NameNodes, master1 and master2

In production, on Monday 2019/7/22, I ran the balancer on both master1 and master2
[hadoop@uhadoop-mzwc2w-master1 majihui0718]$ nohup hdfs balancer > balancer.log & // this is how it is run in production
[hadoop@uhadoop-mzwc2w-master2 majihui0722]$ pwd
/home/hadoop/majihui0722
[hadoop@uhadoop-mzwc2w-master2 majihui0722]$ ll
total 596
-rw-r--r-- 1 hadoop wheel 609794 Jul 22 14:11 balancer.log
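
When the balancer runs in the background like this, its progress has to be read from the log. The sketch below parses progress rows of the form printed under the `Time Stamp  Iteration#  Bytes Already Moved ...` header earlier in this article. The `parse_progress` and `to_bytes` helpers are hypothetical, and the timestamp format, column spacing, and unit suffixes are assumptions based on that single sample row; other Hadoop versions or locales may print them differently.

```python
import re

# Assumed unit suffixes with 1024-based multipliers.
UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}
SIZE = re.compile(r"^(-?\d+(?:\.\d+)?) (B|KB|MB|GB|TB)$")


def to_bytes(text: str) -> float:
    """Convert a size such as '1.5 GB' or '-1 B' to a number of bytes."""
    match = SIZE.match(text)
    if match is None:
        raise ValueError(f"not a size: {text!r}")
    return float(match.group(1)) * UNITS[match.group(2)]


def parse_progress(line: str):
    """Return a dict for a progress row, or None for any other log line."""
    # Columns are separated by runs of two or more spaces.
    cols = re.split(r"\s{2,}", line.strip())
    if len(cols) != 5 or not cols[1].isdigit():
        return None
    try:
        moved, left, moving = (to_bytes(c) for c in cols[2:])
    except ValueError:
        return None
    return {
        "timestamp": cols[0],
        "iteration": int(cols[1]),
        "bytes_moved": moved,
        "bytes_left": left,
        "bytes_being_moved": moving,
    }


# Sample rows copied from the balancer output earlier in this article.
row = "2019-1-21 10:49:21                0                  0 B                 0 B               -1 B"
info_line = "19/01/21 10:49:21 INFO balancer.Balancer: 0 over-utilized: []"
print(parse_progress(row))
print(parse_progress(info_line))
```

A loop over `balancer.log` that keeps the last non-None result would give the most recent "Bytes Left To Move" figure.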

We are on a 1000M network and allot 300M of bandwidth to DataNode data balancing.
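
The value passed to `-setBalancerBandwidth` is interpreted as bytes per second for each DataNode, so the arithmetic is worth checking. Taken literally, 3145728000 works out to 3000 MiB/s rather than 300M. The helper names below are made up for illustration, and because the text does not say whether "300M" means megabits or megabytes per second, both conversions are shown.

```python
MIB = 1024 * 1024  # bytes in one mebibyte


def mib_per_second(bytes_per_second: int) -> float:
    """Express a bytes-per-second limit in MiB/s."""
    return bytes_per_second / MIB


def mbit_to_bytes_per_second(megabits: int) -> int:
    """Convert a rate in megabits per second (10^6 bits) to bytes per second."""
    return megabits * 1_000_000 // 8


def mib_to_bytes_per_second(mebibytes: int) -> int:
    """Convert a rate in MiB/s to bytes per second."""
    return mebibytes * MIB


print(mib_per_second(3145728000))     # 3000.0
print(mbit_to_bytes_per_second(300))  # 37500000
print(mib_to_bytes_per_second(300))   # 314572800
```

If the intent was 300 MiB/s, the value would be 314572800; if it was 300 Mbit/s, it would be 37500000.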

References
https://www.cnblogs.com/qingyunzong/p/8535995.html
HDFS balance strategy explained: https://www.jianshu.com/p/f7c1cd476601
