Steps to Install Hadoop in Fully Distributed Mode
Hadoop deployment modes
Standalone mode: simple to install, needs almost no configuration, but is suitable only for debugging
Pseudo-distributed mode: runs all five daemons (namenode, datanode, jobtracker, tasktracker, secondary namenode) on a single node to simulate the different nodes of a distributed cluster
Fully distributed mode: a normal Hadoop cluster, made up of multiple nodes each with its own role
Installation environment
Virtualization platform: VMware 2
Operating system: Oracle Linux 5.6
Software versions: hadoop-0.20.2, jdk-6u18
Cluster layout: 3 nodes; one master node (gc) and two slave nodes (rac1, rac2)
Installation steps
1. Download Hadoop and the JDK
For example: hadoop-0.20.2
2. Configure the hosts file
Edit /etc/hosts on every node (gc, rac1, rac2) so that each host can resolve the others' hostnames to IP addresses.
[root@gc ~]$ cat /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
192.168.2.101 rac1.localdomain rac1
192.168.2.102 rac2.localdomain rac2
192.168.2.100 gc.localdomain gc
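It is worth confirming that every cluster entry actually made it into the file on each node before moving on. A minimal sketch (the check_hosts helper and the /tmp test copy are illustrative, not part of the original setup):

```shell
# Hypothetical check: confirm each cluster hostname maps to the expected IP
# in a hosts file. The node list mirrors the /etc/hosts contents above.
check_hosts() {
  local file="$1"
  for entry in "192.168.2.100 gc" "192.168.2.101 rac1" "192.168.2.102 rac2"; do
    local ip="${entry% *}" host="${entry#* }"
    if grep -qE "^${ip}[[:space:]].*\b${host}\b" "$file"; then
      echo "ok: ${host} -> ${ip}"
    else
      echo "missing: ${host}"
    fi
  done
}

# Example run against a copy of the hosts file:
cat > /tmp/hosts.test <<'EOF'
127.0.0.1 localhost.localdomain localhost
192.168.2.101 rac1.localdomain rac1
192.168.2.102 rac2.localdomain rac2
192.168.2.100 gc.localdomain gc
EOF
check_hosts /tmp/hosts.test
```

On the real nodes the same function would be pointed at /etc/hosts itself.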
3. Create the Hadoop run user
Create the run user on every node:
[root@gc ~]# groupadd hadoop
[root@gc ~]# useradd -g hadoop grid --be sure to specify the group here, otherwise SSH trust may fail to establish later
[root@gc ~]# id grid
uid=501(grid) gid=54326(hadoop) groups=54326(hadoop)
[root@gc ~]# passwd grid
Changing password for user grid.
New UNIX password:
BAD PASSWORD: it is too short
Retype new UNIX password:
passwd: all authentication tokens updated successfully.
4. Set up passwordless SSH login
Note: log in as the Hadoop run user and work in that user's home directory.
Run the same commands on every node:
[hadoop@gc ~]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Created directory '/home/hadoop/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
54:80:fd:77:6b:87:97:ce:0f:32:34:43:d1:d2:c2:0d hadoop@gc.localdomain
[hadoop@gc ~]$ cd .ssh
[hadoop@gc .ssh]$ ls
id_rsa id_rsa.pub
Append each node's public key to the authorized_keys file of every other node; after that the nodes can ssh to one another without a password.
The whole exchange can be driven from a single node (gc):
[hadoop@gc .ssh]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[hadoop@gc .ssh]$ ssh rac1 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
The authenticity of host 'rac1 (192.168.2.101)' can't be established.
RSA key fingerprint is 19:48:e0:0a:37:e1:2a:d5:ba:c8:7e:1b:37:c6:2f:0e.
Are you sure you want to continue connecting (yes/no) yes
Warning: Permanently added 'rac1,192.168.2.101' (RSA) to the list of known hosts.
hadoop@rac1's password:
[hadoop@gc .ssh]$ ssh rac2 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
The authenticity of host 'rac2 (192.168.2.102)' can't be established.
RSA key fingerprint is 19:48:e0:0a:37:e1:2a:d5:ba:c8:7e:1b:37:c6:2f:0e.
Are you sure you want to continue connecting (yes/no) yes
Warning: Permanently added 'rac2,192.168.2.102' (RSA) to the list of known hosts.
hadoop@rac2's password:
[hadoop@gc .ssh]$ scp ~/.ssh/authorized_keys rac1:~/.ssh/authorized_keys
hadoop@rac1's password:
authorized_keys 100% 1213 1.2KB/s 00:00
[hadoop@gc .ssh]$ scp ~/.ssh/authorized_keys rac2:~/.ssh/authorized_keys
hadoop@rac2's password:
authorized_keys 100% 1213 1.2KB/s 00:00
[hadoop@gc .ssh]$ ll
total 16
-rw-rw-r-- 1 hadoop hadoop 1213 10-30 09:18 authorized_keys
-rw------- 1 hadoop hadoop 1675 10-30 09:05 id_rsa
-rw-r--r-- 1 hadoop hadoop 403 10-30 09:05 id_rsa.pub
--test the connections
[grid@gc .ssh]$ ssh rac1 date
Sun Nov 18 01:35:39 CST 2012
[grid@gc .ssh]$ ssh rac2 date
Tue Oct 30 09:52:46 CST 2012
--Note: this is the same procedure as establishing SSH user equivalence when configuring Oracle RAC.
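The append-and-copy dance above amounts to merging every node's public key into one authorized_keys file and pushing it out (on newer systems ssh-copy-id automates the per-host append). A sketch of just the merge step, using stand-in key material in a hypothetical /tmp working directory:

```shell
# Stand-in public keys for illustration; on the real cluster these would be
# each node's ~/.ssh/id_rsa.pub, fetched via scp or ssh as in the transcript.
mkdir -p /tmp/keys.test && cd /tmp/keys.test
echo "ssh-rsa AAAA...key1 grid@gc"   > gc.pub
echo "ssh-rsa AAAA...key2 grid@rac1" > rac1.pub
echo "ssh-rsa AAAA...key3 grid@rac2" > rac2.pub

# Merge every node's key into one file, then tighten permissions:
# sshd refuses an authorized_keys file that is group- or world-writable.
cat gc.pub rac1.pub rac2.pub > authorized_keys
chmod 600 authorized_keys
wc -l < authorized_keys   # one line per node
```

The merged file would then be scp'd to ~/.ssh/authorized_keys on every node, exactly as in the transcript above.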
5. Extract the Hadoop archive
--extract and edit the configuration on one node first
[grid@gc ~]$ ll
total 43580
-rw-r--r-- 1 grid hadoop 44575568 2012-11-19 hadoop-0.20.2.tar.gz
[grid@gc ~]$ tar xzvf /home/grid/hadoop-0.20.2.tar.gz
[grid@gc ~]$ ll
total 43584
drwxr-xr-x 12 grid hadoop 4096 2010-02-19 hadoop-0.20.2
-rw-r--r-- 1 grid hadoop 44575568 2012-11-19 hadoop-0.20.2.tar.gz
--install the JDK on every node
[root@gc ~]# ./jdk-6u18-linux-x64-rpm.bin
6. Edit the Hadoop configuration files
* Configure hadoop-env.sh
[root@gc conf]# pwd
/root/hadoop-0.20.2/conf
--set the JDK installation path
[root@gc conf]# vi hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.6.0_18
* Configure the namenode: edit the site files
--edit core-site.xml
[grid@gc conf]$ vi core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://192.168.2.100:9000</value> --note: in fully distributed mode this must be the IP address; the same applies below
</property>
</configuration>
Note: fs.default.name is the NameNode's IP address and port.
--edit hdfs-site.xml
[grid@gc conf]$ vi hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.data.dir</name>
<value>/home/grid/hadoop-0.20.2/data</value> --note: this directory must already exist and be writable
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
</configuration>
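Because dfs.data.dir must exist and be writable before the DataNode will start, it pays to check it up front. A minimal sketch using a stand-in path (/tmp/hadoop-data-test is illustrative; the real path on each node is /home/grid/hadoop-0.20.2/data):

```shell
# Create the data directory and verify the run user can write to it.
data_dir=/tmp/hadoop-data-test
mkdir -p "$data_dir"
if [ -d "$data_dir" ] && [ -w "$data_dir" ]; then
  echo "data dir ok: $data_dir"
else
  echo "data dir missing or not writable: $data_dir" >&2
fi
# On the real cluster the same mkdir would be run on every node, e.g.:
#   for h in gc rac1 rac2; do ssh $h mkdir -p /home/grid/hadoop-0.20.2/data; done
```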
--edit mapred-site.xml
[grid@gc conf]$ vi mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>192.168.2.100:9001</value>
</property>
</configuration>
* Configure the masters and slaves files
[grid@gc conf]$ vi masters
gc
[grid@gc conf]$ vi slaves
rac1
rac2
* Copy Hadoop to the other nodes
--copy the configured hadoop directory from gc to each of the other nodes
--note: the site files all point at the master's IP (192.168.2.100), so they can be copied to the slaves unchanged
[grid@gc ~]$ scp -r hadoop-0.20.2 rac1:/home/grid/
[grid@gc ~]$ scp -r hadoop-0.20.2 rac2:/home/grid/
7. Format the namenode
--format the NameNode from the master node (this only needs to be done once, on gc)
[grid@rac2 bin]$ pwd
/home/grid/hadoop-0.20.2/bin
[grid@gc bin]$ ./hadoop namenode -format
12/10/31 08:03:31 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = gc.localdomain/192.168.2.100
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 0.20.2
STARTUP_MSG: build = ; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
12/10/31 08:03:31 INFO namenode.FSNamesystem: fsOwner=grid,hadoop
12/10/31 08:03:31 INFO namenode.FSNamesystem: supergroup=supergroup
12/10/31 08:03:31 INFO namenode.FSNamesystem: isPermissionEnabled=true
12/10/31 08:03:32 INFO common.Storage: Image file of size 94 saved in 0 seconds.
12/10/31 08:03:32 INFO common.Storage: Storage directory /tmp/hadoop-grid/dfs/name has been successfully formatted.
12/10/31 08:03:32 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at gc.localdomain/192.168.2.100
************************************************************/
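One detail worth noticing in the format log above: the image was written to /tmp/hadoop-grid/dfs/name, the built-in default, and /tmp is typically cleared on reboot. A hedged addition (not part of the original walkthrough) would pin the name directory to a persistent path in hdfs-site.xml on the master before formatting:

```xml
<property>
<name>dfs.name.dir</name>
<value>/home/grid/hadoop-0.20.2/name</value>
</property>
```

As with dfs.data.dir, the directory must exist and be writable by the run user, and the NameNode would need re-formatting if this is changed afterwards.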
8. Start Hadoop
--start the Hadoop daemons from the master node
[grid@gc bin]$ pwd
/home/grid/hadoop-0.20.2/bin
[grid@gc bin]$ ./start-all.sh
starting namenode, logging to /home/grid/hadoop-0.20.2/bin/../logs/hadoop-grid-namenode-gc.localdomain.out
rac2: starting datanode, logging to /home/grid/hadoop-0.20.2/bin/../logs/hadoop-grid-datanode-rac2.localdomain.out
rac1: starting datanode, logging to /home/grid/hadoop-0.20.2/bin/../logs/hadoop-grid-datanode-rac1.localdomain.out
The authenticity of host 'gc (192.168.2.100)' can't be established.
RSA key fingerprint is 8e:47:42:44:bd:e2:28:64:10:40:8e:b5:72:f9:6c:82.
Are you sure you want to continue connecting (yes/no) yes
gc: Warning: Permanently added 'gc,192.168.2.100' (RSA) to the list of known hosts.
gc: starting secondarynamenode, logging to /home/grid/hadoop-0.20.2/bin/../logs/hadoop-grid-secondarynamenode-gc.localdomain.out
starting jobtracker, logging to /home/grid/hadoop-0.20.2/bin/../logs/hadoop-grid-jobtracker-gc.localdomain.out
rac2: starting tasktracker, logging to /home/grid/hadoop-0.20.2/bin/../logs/hadoop-grid-tasktracker-rac2.localdomain.out
rac1: starting tasktracker, logging to /home/grid/hadoop-0.20.2/bin/../logs/hadoop-grid-tasktracker-rac1.localdomain.out
9. Use jps to verify that the daemons started successfully
--check the processes on the master node
[grid@gc bin]$ /usr/java/jdk1.6.0_18/bin/jps
27462 NameNode
29012 Jps
27672 JobTracker
27607 SecondaryNameNode
--check the processes on the slave nodes
[grid@rac1 conf]$ /usr/java/jdk1.6.0_18/bin/jps
16722 Jps
16672 TaskTracker
16577 DataNode
[grid@rac2 conf]$ /usr/java/jdk1.6.0_18/bin/jps
31451 DataNode
31547 TaskTracker
31608 Jps
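The jps listings above can also be checked mechanically. A small sketch that scans a captured jps listing for the daemons expected on the master (the sample text mirrors the output above; on a live cluster the listing would come from running jps locally, or ssh <node> jps for the slaves):

```shell
# Expected master-side daemons; slaves would expect "DataNode TaskTracker".
expected="NameNode JobTracker SecondaryNameNode"
jps_out="27462 NameNode
29012 Jps
27672 JobTracker
27607 SecondaryNameNode"

# Report each expected daemon as running or missing.
for d in $expected; do
  if echo "$jps_out" | grep -qw "$d"; then
    echo "running: $d"
  else
    echo "MISSING: $d"
  fi
done
```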
10. Problems encountered during installation
1) SSH trust could not be established
If the user is created without specifying a group, as below, passwordless SSH trust cannot be established:
[root@gc ~]# useradd grid
[root@gc ~]# passwd grid
Solution:
Create a dedicated group and specify it when creating the user.
[root@gc ~]# groupadd hadoop
[root@gc ~]# useradd -g hadoop grid
[root@gc ~]# id grid
uid=501(grid) gid=54326(hadoop) groups=54326(hadoop)
[root@gc ~]# passwd grid
2) After starting Hadoop, the slave nodes have no datanode process
Symptom:
After starting Hadoop on the master node, the master's processes are normal, but the slave nodes have no datanode process.
--the master node is normal
[grid@gc bin]$ /usr/java/jdk1.6.0_18/bin/jps
29843 Jps
29703 JobTracker
29634 SecondaryNameNode
29485 NameNode
--checking the two slave nodes again, there is still no datanode process
[grid@rac1 bin]$ /usr/java/jdk1.6.0_18/bin/jps
5528 Jps
3213 TaskTracker
[grid@rac2 bin]$ /usr/java/jdk1.6.0_18/bin/jps
30518 TaskTracker
30623 Jps
Cause:
--going back over the startup output on the master, then checking the datanode log on the slave nodes:
[grid@rac2 logs]$ pwd
/home/grid/hadoop-0.20.2/logs
[grid@rac1 logs]$ more hadoop-grid-datanode-rac1.localdomain.log
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG: host = rac1.localdomain/192.168.2.101
STARTUP_MSG: args = []
STARTUP_MSG: version = 0.20.2
STARTUP_MSG: build = ; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
2012-11-18 07:43:33,513 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Invalid directory in dfs.data.dir: can not create directory: /usr/hadoop-0.20.2/data
2012-11-18 07:43:33,513 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: All directories in dfs.data.dir are invalid.
2012-11-18 07:43:33,571 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at rac1.localdomain/192.168.2.101
************************************************************/
--the data directory configured in hdfs-site.xml had never been created
Solution:
Create the HDFS data directory on every node, matching the hdfs-site.xml setting.
[grid@gc ~]$ mkdir -p /home/grid/hadoop-0.20.2/data
[grid@gc conf]$ vi hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.data.dir</name>
<value>/home/grid/hadoop-0.20.2/data</value> --note: this directory must already exist and be writable
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
</configuration>
--restart Hadoop; the slave processes now come up normally
[grid@gc bin]$ ./stop-all.sh
[grid@gc bin]$ ./start-all.sh