Step 1: Install the Hadoop Cluster
1. Prepare the installation media
Enterprise-R5-U4-Server-x86_64-dvd.iso
hadoop-1.1.1.tar.gz
jdk-6u26-linux-x64-rpm.bin
2. Create virtual machines for the five nodes
192.168.0.202 hd202 #NameNode
192.168.0.203 hd203 #SecondaryNameNode
192.168.0.204 hd204 #DataNode
192.168.0.205 hd205 #DataNode
192.168.0.206 hd206 #DataNode
During VM installation, make sure the sshd service is installed. If disk space allows, install as complete a set of system packages as possible.
3. Install the JDK on all five nodes (as root)
[root@hd202 ~]# mkdir /usr/java
[root@hd202 ~]# mv jdk-6u26-linux-x64-rpm.bin /usr/java
[root@hd202 ~]# cd /usr/java
[root@hd202 java]# chmod 744 jdk-6u26-linux-x64-rpm.bin
[root@hd202 java]# ./jdk-6u26-linux-x64-rpm.bin
[root@hd202 java]# ln -s jdk1.6.0_26 default
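To confirm the JDK is usable, a quick check (the version string shown is illustrative):
[root@hd202 java]# /usr/java/default/bin/java -version
java version "1.6.0_26"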
4. Create the Hadoop administrative user (on all five VMs)
[root@hd202 ~]# useradd cbcloud # with no group created beforehand, useradd puts the new user in a group of the same name, i.e. cbcloud:cbcloud
[root@hd202 ~]# passwd cbcloud # set the password for cbcloud; 111111 will do for a test environment
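If you would rather script the password step across the five nodes, passwd on RHEL-family systems accepts --stdin; a sketch for a test environment only:
[root@hd202 ~]# echo 111111 | passwd --stdin cbcloud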
5. Edit /etc/hosts (as root, on all five VMs)
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
192.168.0.202 hd202
192.168.0.203 hd203
192.168.0.204 hd204
192.168.0.205 hd205
192.168.0.206 hd206
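A quick sanity check that every name resolves, sketched as a loop that can be run on any node:
[root@hd202 ~]# for h in hd202 hd203 hd204 hd205 hd206; do ping -c 1 $h > /dev/null && echo "$h ok"; done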
6. Edit /etc/sysconfig/network (as root, on all five VMs)
NETWORKING=yes
NETWORKING_IPV6=no
HOSTNAME=hd202 # hostname (change to hd203 on 192.168.0.203, and so on; each of the five machines gets its own name)
GATEWAY=192.168.0.1
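The HOSTNAME setting only takes effect at boot; to apply it immediately without a reboot, run the matching command on each node (hd203 shown as an example):
[root@hd203 ~]# hostname hd203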
7. Configure SSH user equivalence among the five machines (log in as the cbcloud user created earlier)
[cbcloud@hd202 ~]$ mkdir .ssh
[cbcloud@hd202 ~]$ chmod 700 .ssh
[cbcloud@hd202 ~]$ ssh-keygen -t rsa
[cbcloud@hd202 ~]$ ssh-keygen -t dsa
[cbcloud@hd202 ~]$ cat ~/.ssh/id_rsa.pub > ~/.ssh/authorized_keys
[cbcloud@hd202 ~]$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys # note the append (>>), so the RSA key is not overwritten
[cbcloud@hd203 ~]$ mkdir .ssh
[cbcloud@hd203 ~]$ chmod 700 .ssh
[cbcloud@hd203 ~]$ ssh-keygen -t rsa
[cbcloud@hd203 ~]$ ssh-keygen -t dsa
[cbcloud@hd203 ~]$ cat ~/.ssh/id_rsa.pub > ~/.ssh/authorized_keys
[cbcloud@hd203 ~]$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
[cbcloud@hd204 ~]$ mkdir .ssh
[cbcloud@hd204 ~]$ chmod 700 .ssh
[cbcloud@hd204 ~]$ ssh-keygen -t rsa
[cbcloud@hd204 ~]$ ssh-keygen -t dsa
[cbcloud@hd204 ~]$ cat ~/.ssh/id_rsa.pub > ~/.ssh/authorized_keys
[cbcloud@hd204 ~]$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
[cbcloud@hd205 ~]$ mkdir .ssh
[cbcloud@hd205 ~]$ chmod 700 .ssh
[cbcloud@hd205 ~]$ ssh-keygen -t rsa
[cbcloud@hd205 ~]$ ssh-keygen -t dsa
[cbcloud@hd205 ~]$ cat ~/.ssh/id_rsa.pub > ~/.ssh/authorized_keys
[cbcloud@hd205 ~]$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
[cbcloud@hd206 ~]$ mkdir .ssh
[cbcloud@hd206 ~]$ chmod 700 .ssh
[cbcloud@hd206 ~]$ ssh-keygen -t rsa
[cbcloud@hd206 ~]$ ssh-keygen -t dsa
[cbcloud@hd206 ~]$ cat ~/.ssh/id_rsa.pub > ~/.ssh/authorized_keys
[cbcloud@hd206 ~]$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
[cbcloud@hd202 ~]$ cd .ssh
[cbcloud@hd202 .ssh]$ scp authorized_keys cbcloud@hd203:/home/cbcloud/.ssh/authorized_keys2 # copy hd202's authorized_keys to /home/cbcloud/.ssh/ on hd203, renamed authorized_keys2
[cbcloud@hd203 ~]$ cd .ssh
[cbcloud@hd203 .ssh]$ cat authorized_keys2 >> authorized_keys # i.e. append the contents of hd202's authorized_keys to hd203's authorized_keys
Then copy the merged authorized_keys to hd204 and merge it with hd204's file, and so on. Once the keys from all five nodes have been merged into a single authorized_keys file, copy that file, now containing all five nodes' public keys, over the authorized_keys on the other four nodes; a scripted sketch of this merge-and-distribute step follows.
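A sketch of the merge-and-distribute step as two loops run from hd202, assuming password authentication still works at this point (equivalence is not yet in place):
[cbcloud@hd202 ~]$ for h in hd203 hd204 hd205 hd206; do ssh cbcloud@$h cat /home/cbcloud/.ssh/authorized_keys >> ~/.ssh/authorized_keys; done
[cbcloud@hd202 ~]$ for h in hd203 hd204 hd205 hd206; do scp ~/.ssh/authorized_keys cbcloud@$h:/home/cbcloud/.ssh/; done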
Note: authorized_keys must have mode 644, otherwise user equivalence will not work.
Run the following on all five nodes:
[cbcloud@hd202 ~]$ cd .ssh
[cbcloud@hd202 .ssh]$ chmod 644 authorized_keys
8. Install the Hadoop cluster
8.1 Create the directories (run the following on all five VMs, as root)
[root@hd202 ~]# mkdir /home/cbcloud/hdtmp
[root@hd202 ~]# mkdir /home/cbcloud/hddata
[root@hd202 ~]# mkdir /home/cbcloud/hdconf
[root@hd202 ~]# chown -R cbcloud:cbcloud /home/cbcloud/hdtmp
[root@hd202 ~]# chown -R cbcloud:cbcloud /home/cbcloud/hddata
[root@hd202 ~]# chown -R cbcloud:cbcloud /home/cbcloud/hdconf
[root@hd202 ~]# chmod -R 755 /home/cbcloud/hddata # important: hddata is where the DataNodes store their data, and Hadoop strictly requires this directory to have mode 755; with any other permissions the DataNode will later fail to start with a permissions error
8.2 Extract hadoop-1.1.1.tar.gz into /home/cbcloud (on hd202 only)
[root@hd202 ~]# mv hadoop-1.1.1.tar.gz /home/cbcloud
[root@hd202 ~]# cd /home/cbcloud
[root@hd202 cbcloud]# tar -xzvf hadoop-1.1.1.tar.gz
[root@hd202 cbcloud]# mv hadoop-1.1.1 hadoop
[root@hd202 cbcloud]# chown -R cbcloud.cbcloud hadoop/
8.3 Configure system environment variables in /etc/profile (on all five VMs, as root)
[root@hd202 ~]# vi /etc/profile
Append the following at the end of the file:
export JAVA_HOME=/usr/java/default
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export PATH=$JAVA_HOME/bin:$JAVA_HOME/lib:$JAVA_HOME/jre/bin:$PATH:$HOME/bin
export HADOOP_HOME=/home/cbcloud/hadoop
export HADOOP_DEV_HOME=/home/cbcloud/hadoop
export HADOOP_COMMON_HOME=/home/cbcloud/hadoop
export HADOOP_HDFS_HOME=/home/cbcloud/hadoop
export HADOOP_CONF_DIR=/home/cbcloud/hdconf
export HADOOP_HOME_WARN_SUPPRESS=1
export PATH=$PATH:$HADOOP_HOME/bin
export CLASSPATH=$CLASSPATH:$HADOOP_HOME/lib
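Reload the profile so the variables take effect in the current shell, then spot-check one of them:
[root@hd202 ~]# source /etc/profile
[root@hd202 ~]# echo $HADOOP_HOME
/home/cbcloud/hadoop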
8.4 Configure user environment variables
[cbcloud@hd202 ~]$ vi .bash_profile
Append the following at the end of the file:
export JAVA_HOME=/usr/java/default
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export PATH=$JAVA_HOME/bin:$JAVA_HOME/lib:$JAVA_HOME/jre/bin:$PATH:$HOME/bin
export HADOOP_HOME=/home/cbcloud/hadoop
export HADOOP_DEV_HOME=/home/cbcloud/hadoop
export HADOOP_COMMON_HOME=/home/cbcloud/hadoop
export HADOOP_HDFS_HOME=/home/cbcloud/hadoop
export HADOOP_CONF_DIR=/home/cbcloud/hdconf
export HADOOP_HOME_WARN_SUPPRESS=1
export PATH=$PATH:$HADOOP_HOME/bin
export CLASSPATH=$CLASSPATH:$HADOOP_HOME/lib
8.5 Edit the Hadoop configuration files (as cbcloud, on hd202 only)
[cbcloud@hd202 ~]$ cp $HADOOP_HOME/conf/* $HADOOP_CONF_DIR/
# As the HADOOP_CONF_DIR variable set above indicates, Hadoop now reads its configuration from /home/cbcloud/hdconf, so every configuration file under /home/cbcloud/hadoop/conf must be copied there first.
8.5.1 Edit core-site.xml
[cbcloud@hd202 ~]$ cd /home/cbcloud/hdconf
[cbcloud@hd202 hdconf]$ vi core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://hd202:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/cbcloud/hdtmp</value>
</property>
</configuration>
8.5.2 Edit hdfs-site.xml
[cbcloud@hd202 hdconf]$ vi hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.data.dir</name>
<value>/home/cbcloud/hddata</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
</configuration>
8.5.3 Edit mapred-site.xml
[cbcloud@hd202 hdconf]$ vi mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>hd202:9001</value>
</property>
</configuration>
8.5.4 Edit masters
[cbcloud@hd202 hdconf]$ vi masters
Add the following:
hd203 # hd203 is the SecondaryNameNode, so only hd203 needs to be listed here; hd202 is not
8.5.5 Edit slaves
[cbcloud@hd202 hdconf]$ vi slaves
Add the following:
hd204
hd205
hd206
8.6 Copy the /home/cbcloud/hadoop and /home/cbcloud/hdconf directories to the other four VMs
[cbcloud@hd202 hdconf]$ scp -r /home/cbcloud/hadoop hd203:/home/cbcloud # thanks to the user equivalence configured earlier, this no longer prompts for a password
[cbcloud@hd202 hdconf]$ scp -r /home/cbcloud/hadoop hd204:/home/cbcloud
[cbcloud@hd202 hdconf]$ scp -r /home/cbcloud/hadoop hd205:/home/cbcloud
[cbcloud@hd202 hdconf]$ scp -r /home/cbcloud/hadoop hd206:/home/cbcloud
[cbcloud@hd202 hdconf]$ scp -r /home/cbcloud/hdconf hd203:/home/cbcloud
[cbcloud@hd202 hdconf]$ scp -r /home/cbcloud/hdconf hd204:/home/cbcloud
[cbcloud@hd202 hdconf]$ scp -r /home/cbcloud/hdconf hd205:/home/cbcloud
[cbcloud@hd202 hdconf]$ scp -r /home/cbcloud/hdconf hd206:/home/cbcloud
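Equivalently, the eight scp commands above can be collapsed into a loop (a convenience sketch):
[cbcloud@hd202 hdconf]$ for h in hd203 hd204 hd205 hd206; do scp -r /home/cbcloud/hadoop /home/cbcloud/hdconf $h:/home/cbcloud; done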
8.7 Format the namespace on the NameNode (hd202)
[cbcloud@hd202 ~]$ cd $HADOOP_HOME/bin
[cbcloud@hd202 bin]$ hadoop namenode -format
If the console output contains no ERROR-type messages, the namespace format command succeeded.
8.8 Start Hadoop
[cbcloud@hd202 ~]$ cd $HADOOP_HOME/bin
[cbcloud@hd202 bin]$ ./start-dfs.sh
starting namenode, logging to /home/cbcloud/hadoop/libexec/../logs/hadoop-cbcloud-namenode-hd202.out
hd204: starting datanode, logging to /home/cbcloud/hadoop/libexec/../logs/hadoop-cbcloud-datanode-hd204.out
hd205: starting datanode, logging to /home/cbcloud/hadoop/libexec/../logs/hadoop-cbcloud-datanode-hd205.out
hd206: starting datanode, logging to /home/cbcloud/hadoop/libexec/../logs/hadoop-cbcloud-datanode-hd206.out
hd203: starting secondarynamenode, logging to /home/cbcloud/hadoop/libexec/../logs/hadoop-cbcloud-secondarynamenode-hd203.out
8.9 Start MapReduce
[cbcloud@hd202 bin]$ ./start-mapred.sh
starting jobtracker, logging to /home/cbcloud/hadoop/libexec/../logs/hadoop-cbcloud-jobtracker-hd202.out
hd204: starting tasktracker, logging to /home/cbcloud/hadoop/libexec/../logs/hadoop-cbcloud-tasktracker-hd204.out
hd205: starting tasktracker, logging to /home/cbcloud/hadoop/libexec/../logs/hadoop-cbcloud-tasktracker-hd205.out
hd206: starting tasktracker, logging to /home/cbcloud/hadoop/libexec/../logs/hadoop-cbcloud-tasktracker-hd206.out
8.10 Check the processes
[cbcloud@hd202 bin]$ jps
4335 JobTracker
4460 Jps
4153 NameNode
[cbcloud@hd203 hdconf]$ jps
1142 Jps
1078 SecondaryNameNode
[cbcloud@hd204 hdconf]$ jps
1783 Jps
1575 DataNode
1706 TaskTracker
[cbcloud@hd205 hdconf]$ jps
1669 Jps
1461 DataNode
1590 TaskTracker
[cbcloud@hd206 hdconf]$ jps
1494 DataNode
1614 TaskTracker
1694 Jps
8.11 Check the cluster status
[cbcloud@hd202 bin]$ hadoop dfsadmin -report
Configured Capacity: 27702829056 (25.8 GB)
Present Capacity: 13044953088 (12.15 GB)
DFS Remaining: 13044830208 (12.15 GB)
DFS Used: 122880 (120 KB)
DFS Used%: 0%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 3 (3 total, 0 dead)
Name: 192.168.0.205:50010
Decommission Status : Normal
Configured Capacity: 9234276352 (8.6 GB)
DFS Used: 40960 (40 KB)
Non DFS Used: 4885942272 (4.55 GB)
DFS Remaining: 4348293120(4.05 GB)
DFS Used%: 0%
DFS Remaining%: 47.09%
Last contact: Wed Jan 30 18:02:17 CST 2013
Name: 192.168.0.206:50010
Decommission Status : Normal
Configured Capacity: 9234276352 (8.6 GB)
DFS Used: 40960 (40 KB)
Non DFS Used: 4885946368 (4.55 GB)
DFS Remaining: 4348289024(4.05 GB)
DFS Used%: 0%
DFS Remaining%: 47.09%
Last contact: Wed Jan 30 18:02:17 CST 2013
Name: 192.168.0.204:50010
Decommission Status : Normal
Configured Capacity: 9234276352 (8.6 GB)
DFS Used: 40960 (40 KB)
Non DFS Used: 4885987328 (4.55 GB)
DFS Remaining: 4348248064(4.05 GB)
DFS Used%: 0%
DFS Remaining%: 47.09%
Last contact: Wed Jan 30 18:02:17 CST 2013
Note: an error of the form "INFO ipc.Client: Retrying connect to server" means core-site.xml has not taken effect. Stop and restart Hadoop, then format the NameNode again.
Also, the firewall must be turned off every time a VM boots; a sketch follows.
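On Oracle/RHEL 5 the firewall is iptables; a sketch for stopping it for the current boot and disabling it across reboots (appropriate for a test environment only):
[root@hd202 ~]# service iptables stop
[root@hd202 ~]# chkconfig iptables off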
8.12 View Hadoop status in a web browser
http://192.168.0.202:50070 shows the HDFS/NameNode status
8.13 View job status in a web browser
http://192.168.0.202:50030 shows the JobTracker and job status
9. List the directories in the HDFS filesystem
[cbcloud@hd202 logs]$ hadoop dfs -ls
ls: Cannot access .: No such file or directory.
The error above occurs because the user's default HDFS working directory (/user/cbcloud) does not exist yet.
Run hadoop fs -ls / instead:
[cbcloud@hd202 logs]$ hadoop fs -ls /
Found 1 items
drwxr-xr-x - cbcloud supergroup 0 2013-01-30 15:52 /home
One entry is listed.
Run hadoop fs -mkdir hello # hello is the name of the directory to create
[cbcloud@hd202 logs]$ hadoop fs -mkdir hello
[cbcloud@hd202 logs]$ hadoop fs -ls
Found 1 items
drwxr-xr-x - cbcloud supergroup 0 2013-01-30 21:16 /user/cbcloud/hello
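Note that relative HDFS paths resolve under /user/<username>, so for cbcloud the following two listings are equivalent (shown for illustration):
[cbcloud@hd202 logs]$ hadoop fs -ls hello
[cbcloud@hd202 logs]$ hadoop fs -ls /user/cbcloud/hello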
10. HDFS usage test
[cbcloud@hd202 logs]$ hadoop dfs -rmr hello
Deleted hdfs://hd202:9000/user/cbcloud/hello # removes the directory created earlier
[cbcloud@hd202 logs]$ hadoop dfs -mkdir input
[cbcloud@hd202 logs]$ hadoop dfs -ls
Found 1 items
drwxr-xr-x - cbcloud supergroup 0 2013-01-30 21:18 /user/cbcloud/input
11. Run the wordcount example bundled with Hadoop
11.1 Create the data files
On the 192.168.0.202 VM, create two files, input1 and input2
[cbcloud@hd202 hadoop]$ echo "Hello Hadoop in input1" > input1
[cbcloud@hd202 hadoop]$ echo "Hello Hadoop in input2" > input2
11.2 Publish the data files to the Hadoop cluster
1. Create an input directory in HDFS
[cbcloud@hd202 hadoop]$ hadoop dfs -mkdir input
2. Copy input1 and input2 into the HDFS input directory
[cbcloud@hd202 hadoop]$ hadoop dfs -copyFromLocal /home/cbcloud/hadoop/input* input
3. Check that the files were copied into the input directory successfully
[cbcloud@hd202 hadoop]$ hadoop dfs -ls input
Found 2 items
-rw-r--r-- 3 cbcloud supergroup 23 2013-01-30 21:28 /user/cbcloud/input/input1
-rw-r--r-- 3 cbcloud supergroup 23 2013-01-30 21:28 /user/cbcloud/input/input2
11.3 Run the wordcount program # make sure there is no output directory in HDFS beforehand, then check the result
[cbcloud@hd202 hadoop]$ hadoop jar hadoop-examples-1.1.1.jar wordcount input output
13/01/30 21:33:05 INFO input.FileInputFormat: Total input paths to process : 2
13/01/30 21:33:05 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/01/30 21:33:05 WARN snappy.LoadSnappy: Snappy native library not loaded
13/01/30 21:33:07 INFO mapred.JobClient: Running job: job_201301302110_0001
13/01/30 21:33:08 INFO mapred.JobClient: map 0% reduce 0%
13/01/30 21:33:32 INFO mapred.JobClient: map 50% reduce 0%
13/01/30 21:33:33 INFO mapred.JobClient: map 100% reduce 0%
13/01/30 21:33:46 INFO mapred.JobClient: map 100% reduce 100%
13/01/30 21:33:53 INFO mapred.JobClient: Job complete: job_201301302110_0001
13/01/30 21:33:53 INFO mapred.JobClient: Counters: 29
13/01/30 21:33:53 INFO mapred.JobClient: Job Counters
13/01/30 21:33:53 INFO mapred.JobClient: Launched reduce tasks=1
13/01/30 21:33:53 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=29766
13/01/30 21:33:53 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/01/30 21:33:53 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/01/30 21:33:53 INFO mapred.JobClient: Launched map tasks=2
13/01/30 21:33:53 INFO mapred.JobClient: Data-local map tasks=2
13/01/30 21:33:53 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=13784
13/01/30 21:33:53 INFO mapred.JobClient: File Output Format Counters
13/01/30 21:33:53 INFO mapred.JobClient: Bytes Written=40
13/01/30 21:33:53 INFO mapred.JobClient: FileSystemCounters
13/01/30 21:33:53 INFO mapred.JobClient: FILE_BYTES_READ=100
13/01/30 21:33:53 INFO mapred.JobClient: HDFS_BYTES_READ=262
13/01/30 21:33:53 INFO mapred.JobClient: FILE_BYTES_WRITTEN=71911
13/01/30 21:33:53 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=40
13/01/30 21:33:53 INFO mapred.JobClient: File Input Format Counters
13/01/30 21:33:53 INFO mapred.JobClient: Bytes Read=46
13/01/30 21:33:53 INFO mapred.JobClient: Map-Reduce Framework
13/01/30 21:33:53 INFO mapred.JobClient: Map output materialized bytes=106
13/01/30 21:33:53 INFO mapred.JobClient: Map input records=2
13/01/30 21:33:53 INFO mapred.JobClient: Reduce shuffle bytes=106
13/01/30 21:33:53 INFO mapred.JobClient: Spilled Records=16
13/01/30 21:33:53 INFO mapred.JobClient: Map output bytes=78
13/01/30 21:33:53 INFO mapred.JobClient: CPU time spent (ms)=5500
13/01/30 21:33:53 INFO mapred.JobClient: Total committed heap usage (bytes)=336928768
13/01/30 21:33:53 INFO mapred.JobClient: Combine input records=8
13/01/30 21:33:53 INFO mapred.JobClient: SPLIT_RAW_BYTES=216
13/01/30 21:33:53 INFO mapred.JobClient: Reduce input records=8
13/01/30 21:33:53 INFO mapred.JobClient: Reduce input groups=5
13/01/30 21:33:53 INFO mapred.JobClient: Combine output records=8
13/01/30 21:33:53 INFO mapred.JobClient: Physical memory (bytes) snapshot=417046528
13/01/30 21:33:53 INFO mapred.JobClient: Reduce output records=5
13/01/30 21:33:53 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1612316672
13/01/30 21:33:53 INFO mapred.JobClient: Map output records=8
[cbcloud@hd202 hadoop]$ hadoop dfs -ls output
Found 2 items
-rw-r--r-- 3 cbcloud supergroup 0 2013-01-30 21:33 /user/cbcloud/output/_SUCCESS
-rw-r--r-- 3 cbcloud supergroup 40 2013-01-30 21:33 /user/cbcloud/output/part-r-00000
[cbcloud@hd202 hadoop]$ hadoop dfs -cat output/part-r-00000
Hadoop 2
Hello 2
in 2
input1 1
input2 1
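To pull the result back to the local filesystem, -getmerge concatenates all part files into one local file (the local path here is an arbitrary example):
[cbcloud@hd202 hadoop]$ hadoop fs -getmerge output /tmp/wordcount.txt
[cbcloud@hd202 hadoop]$ cat /tmp/wordcount.txt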
Step 2: Set Up the ZooKeeper Cluster
The Hadoop 1.1.1 cluster installation notes above described in detail how to build a Hadoop cluster on Oracle Linux 5.4 64-bit. Building on that, we now install ZooKeeper and HBase.
1. Install ZooKeeper (on hd202)
1.1 Prepare the installation media: zookeeper-3.4.5.tar.gz
1.2 As cbcloud, upload the archive to /home/cbcloud/ on the hd202 VM
1.3 Extract zookeeper-3.4.5.tar.gz
[cbcloud@hd202 ~]$ tar zxvf zookeeper-3.4.5.tar.gz
1.4 Create a data directory on hd204, hd205, and hd206
[cbcloud@hd204 ~]$ mkdir /home/cbcloud/zookeeperdata
[cbcloud@hd205 ~]$ mkdir /home/cbcloud/zookeeperdata
[cbcloud@hd206 ~]$ mkdir /home/cbcloud/zookeeperdata
1.5 On hd202, run the following
[cbcloud@hd202 ~]$ mv zookeeper-3.4.5 zookeeper
[cbcloud@hd202 ~]$ cd zookeeper/conf
[cbcloud@hd202 conf]$ mv zoo_sample.cfg zoo.cfg
[cbcloud@hd202 conf]$ vi zoo.cfg
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/home/cbcloud/zookeeperdata
# the port at which the clients will connect
clientPort=2181
server.1=hd204:2888:3888
server.2=hd205:2888:3888
server.3=hd206:2888:3888
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
1.6 Copy the zookeeper directory to hd204, hd205, and hd206
[cbcloud@hd202 ~]$ scp -r zookeeper hd204:/home/cbcloud/
[cbcloud@hd202 ~]$ scp -r zookeeper hd205:/home/cbcloud/
[cbcloud@hd202 ~]$ scp -r zookeeper hd206:/home/cbcloud/
1.7 On hd204, hd205, and hd206, create a myid file under /home/cbcloud/zookeeperdata containing 1, 2, and 3 respectively (a scripted alternative is sketched after these steps)
[cbcloud@hd204 ~]$ cd zookeeperdata
[cbcloud@hd204 zookeeperdata]$ touch myid
[cbcloud@hd204 zookeeperdata]$ vi myid
Add the following:
1 # matches the id in server.1=hd204:2888:3888 in zoo.cfg
[cbcloud@hd205 ~]$ cd zookeeperdata
[cbcloud@hd205 zookeeperdata]$ touch myid
[cbcloud@hd205 zookeeperdata]$ vi myid
Add the following:
2 # matches the id in server.2=hd205:2888:3888 in zoo.cfg
[cbcloud@hd206 ~]$ cd zookeeperdata
[cbcloud@hd206 zookeeperdata]$ touch myid
[cbcloud@hd206 zookeeperdata]$ vi myid
Add the following:
3 # matches the id in server.3=hd206:2888:3888 in zoo.cfg
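The three myid files can also be written remotely from hd202 in a single loop, a sketch relying on the user equivalence configured earlier:
[cbcloud@hd202 ~]$ i=1; for h in hd204 hd205 hd206; do ssh $h "echo $i > /home/cbcloud/zookeeperdata/myid"; i=$((i+1)); done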
1.8 Start ZooKeeper: on hd204, hd205, and hd206, run zkServer.sh start from /home/cbcloud/zookeeper/bin
[cbcloud@hd204 ~]$ cd zookeeper
[cbcloud@hd204 zookeeper]$ cd bin
[cbcloud@hd204 bin]$ ./zkServer.sh start
JMX enabled by default
Using config: /home/cbcloud/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[cbcloud@hd205 ~]$ cd zookeeper
[cbcloud@hd205 zookeeper]$ cd bin
[cbcloud@hd205 bin]$ ./zkServer.sh start
JMX enabled by default
Using config: /home/cbcloud/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[cbcloud@hd206 ~]$ cd zookeeper
[cbcloud@hd206 zookeeper]$ cd bin
[cbcloud@hd206 bin]$ ./zkServer.sh start
JMX enabled by default
Using config: /home/cbcloud/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
1.9 Check the ZooKeeper process status
[cbcloud@hd204 bin]$ ./zkServer.sh status
JMX enabled by default
Using config: /home/cbcloud/zookeeper/bin/../conf/zoo.cfg
Mode: follower # hd204 is currently running as a follower
[cbcloud@hd205 bin]$ ./zkServer.sh status
JMX enabled by default
Using config: /home/cbcloud/zookeeper/bin/../conf/zoo.cfg
Mode: leader # hd205 is currently running as the leader
[cbcloud@hd206 bin]$ ./zkServer.sh status
JMX enabled by default
Using config: /home/cbcloud/zookeeper/bin/../conf/zoo.cfg
Mode: follower # hd206 is currently running as a follower
2. View detailed ZooKeeper process status
[cbcloud@hd204 bin]$ echo stat |nc localhost 2181
Zookeeper version: 3.4.5-1392090, built on 09/30/2012 17:52 GMT
Clients:
/127.0.0.1:41205[0](queued=0,recved=1,sent=0)
Latency min/avg/max: 0/0/0
Received: 2
Sent: 1
Connections: 1
Outstanding: 0
Zxid: 0x0
Mode: follower
Node count: 4
[cbcloud@hd205 bin]$ echo stat |nc localhost 2181
Zookeeper version: 3.4.5-1392090, built on 09/30/2012 17:52 GMT
Clients:
/127.0.0.1:38712[0](queued=0,recved=1,sent=0)
Latency min/avg/max: 0/0/0
Received: 2
Sent: 1
Connections: 1
Outstanding: 0
Zxid: 0x100000000
Mode: leader
Node count: 4
[cbcloud@hd206 bin]$ echo stat |nc localhost 2181
Zookeeper version: 3.4.5-1392090, built on 09/30/2012 17:52 GMT
Clients:
/127.0.0.1:39268[0](queued=0,recved=1,sent=0)
Latency min/avg/max: 0/0/0
Received: 2
Sent: 1
Connections: 1
Outstanding: 0
Zxid: 0x100000000
Mode: follower
Node count: 4
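Besides stat, ZooKeeper answers several other four-letter commands over the client port; ruok is a convenient liveness probe (a healthy server replies imok):
[cbcloud@hd204 bin]$ echo ruok | nc localhost 2181
imok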
Step 3: Set Up the HBase Cluster
1. Prepare the installation media: hbase-0.94.4.tar.gz
2. As cbcloud, upload the archive to /home/cbcloud/ on the hd202 VM
3. Log in to hd202 as cbcloud and extract hbase-0.94.4.tar.gz
[cbcloud@hd202 ~]$ tar zxvf hbase-0.94.4.tar.gz
[cbcloud@hd202 ~]$ mv hbase-0.94.4 hbase
4. Create the HBase configuration directory hbconf on all five VMs (as cbcloud)
[cbcloud@hd202 ~]$ mkdir /home/cbcloud/hbconf
[cbcloud@hd203 ~]$ mkdir /home/cbcloud/hbconf
[cbcloud@hd204 ~]$ mkdir /home/cbcloud/hbconf
[cbcloud@hd205 ~]$ mkdir /home/cbcloud/hbconf
[cbcloud@hd206 ~]$ mkdir /home/cbcloud/hbconf
5. Configure system environment variables (as root)
[root@hd202 ~]# vi /etc/profile
Append the following at the end of the file:
export HBASE_CONF_DIR=/home/cbcloud/hbconf
export HBASE_HOME=/home/cbcloud/hbase
[root@hd203 ~]# vi /etc/profile
Append the following at the end of the file:
export HBASE_CONF_DIR=/home/cbcloud/hbconf
export HBASE_HOME=/home/cbcloud/hbase
[root@hd204 ~]# vi /etc/profile
Append the following at the end of the file:
export HBASE_CONF_DIR=/home/cbcloud/hbconf
export HBASE_HOME=/home/cbcloud/hbase
[root@hd205 ~]# vi /etc/profile
Append the following at the end of the file:
export HBASE_CONF_DIR=/home/cbcloud/hbconf
export HBASE_HOME=/home/cbcloud/hbase
[root@hd206 ~]# vi /etc/profile
Append the following at the end of the file:
export HBASE_CONF_DIR=/home/cbcloud/hbconf
export HBASE_HOME=/home/cbcloud/hbase
6. Configure user environment variables (as cbcloud)
[cbcloud@hd202 ~]$ vi .bash_profile
Append the following at the end of the file:
export HBASE_CONF_DIR=/home/cbcloud/hbconf
export HBASE_HOME=/home/cbcloud/hbase
[cbcloud@hd203 ~]$ vi .bash_profile
Append the following at the end of the file:
export HBASE_CONF_DIR=/home/cbcloud/hbconf
export HBASE_HOME=/home/cbcloud/hbase
[cbcloud@hd204 ~]$ vi .bash_profile
Append the following at the end of the file:
export HBASE_CONF_DIR=/home/cbcloud/hbconf
export HBASE_HOME=/home/cbcloud/hbase
[cbcloud@hd205 ~]$ vi .bash_profile
Append the following at the end of the file:
export HBASE_CONF_DIR=/home/cbcloud/hbconf
export HBASE_HOME=/home/cbcloud/hbase
[cbcloud@hd206 ~]$ vi .bash_profile
Append the following at the end of the file:
export HBASE_CONF_DIR=/home/cbcloud/hbconf
export HBASE_HOME=/home/cbcloud/hbase
7. Copy all files from the conf subdirectory of $HBASE_HOME into $HBASE_CONF_DIR (on hd202 only)
[cbcloud@hd202 ~]$ cp /home/cbcloud/hbase/conf/* /home/cbcloud/hbconf/
8. Edit hbase-env.sh in $HBASE_CONF_DIR (on hd202 only)
Find the line export HBASE_OPTS="-XX:+UseConcMarkSweepGC", comment it out, then add the following:
export HBASE_OPTS="$HBASE_OPTS -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC"
export JAVA_HOME=/usr/java/default
export HBASE_HOME=/home/cbcloud/hbase
export HADOOP_HOME=/home/cbcloud/hadoop
export HBASE_MANAGES_ZK=true # have HBase manage the ZooKeeper processes itself
9. Edit hbase-site.xml in $HBASE_CONF_DIR (on hd202 only)
Add the following:
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://hd202:9000/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.master</name>
<value>hd202:60000</value>
</property>
<property>
<name>hbase.master.port</name>
<value>60000</value>
<description>The port master should bind to.</description>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>hd204,hd205,hd206</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/cbcloud/zookeeperdata</value>
</property>
</configuration>
10. Edit the regionservers file
Delete localhost, then add the following:
hd204
hd205
hd206
11. Copy the $HBASE_HOME and $HBASE_CONF_DIR directories to the other four VMs
[cbcloud@hd202 ~]$ scp -r hbase hd203:/home/cbcloud/
[cbcloud@hd202 ~]$ scp -r hbase hd204:/home/cbcloud/
[cbcloud@hd202 ~]$ scp -r hbase hd205:/home/cbcloud/
[cbcloud@hd202 ~]$ scp -r hbase hd206:/home/cbcloud/
[cbcloud@hd202 ~]$ scp -r hbconf hd203:/home/cbcloud/
[cbcloud@hd202 ~]$ scp -r hbconf hd204:/home/cbcloud/
[cbcloud@hd202 ~]$ scp -r hbconf hd205:/home/cbcloud/
[cbcloud@hd202 ~]$ scp -r hbconf hd206:/home/cbcloud/
12. Start HBase
[cbcloud@hd202 ~]$ cd hbase
[cbcloud@hd202 hbase]$ cd bin
[cbcloud@hd202 bin]$ ./start-hbase.sh # start HBase from the master node
starting master, logging to /home/cbcloud/hbase/logs/hbase-cbcloud-master-hd202.out
hd204: starting regionserver, logging to /home/cbcloud/hbase/logs/hbase-cbcloud-regionserver-hd204.out
hd205: starting regionserver, logging to /home/cbcloud/hbase/logs/hbase-cbcloud-regionserver-hd205.out
hd206: starting regionserver, logging to /home/cbcloud/hbase/logs/hbase-cbcloud-regionserver-hd206.out
[cbcloud@hd202 bin]$ jps
3779 JobTracker
4529 HMaster
4736 Jps
3633 NameNode
[cbcloud@hd203 ~]$ cd hbase
[cbcloud@hd203 hbase]$ cd bin
[cbcloud@hd203 bin]$ ./hbase-daemon.sh start master # start a backup HMaster on the SecondaryNameNode
starting master, logging to /home/cbcloud/hbase/logs/hbase-cbcloud-master-hd203.out
[cbcloud@hd203 bin]$ jps
3815 Jps
3618 SecondaryNameNode
3722 HMaster
[cbcloud@hd204 hbconf]$ jps
3690 TaskTracker
3614 DataNode
4252 Jps
3845 QuorumPeerMain
4124 HRegionServer
[cbcloud@hd205 hbconf]$ jps
3826 QuorumPeerMain
3612 DataNode
3688 TaskTracker
4085 HRegionServer
4256 Jps
[cbcloud@hd206 ~]$ jps
3825 QuorumPeerMain
3693 TaskTracker
4091 HRegionServer
4279 Jps
3617 DataNode
13. View the HMaster status in the web UI: http://192.168.0.202:60010
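A quick functional smoke test can be run from the HBase shell; the table and column-family names below are arbitrary examples:
[cbcloud@hd202 bin]$ ./hbase shell
hbase(main):001:0> status
hbase(main):002:0> create 'testtable', 'cf'
hbase(main):003:0> list
hbase(main):004:0> disable 'testtable'
hbase(main):005:0> drop 'testtable'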
14. Shutting down HBase
Step 1: Stop the HMaster service on the SecondaryNameNode
[cbcloud@hd203 ~]$ cd hbase
[cbcloud@hd203 hbase]$ cd bin
[cbcloud@hd203 bin]$ ./hbase-daemon.sh stop master
stopping master.
[cbcloud@hd203 bin]$ jps
4437 Jps
3618 SecondaryNameNode
Step 2: Stop the HMaster service on the NameNode
[cbcloud@hd202 ~]$ cd hbase
[cbcloud@hd202 hbase]$ cd bin
[cbcloud@hd202 bin]$ ./stop-hbase.sh
stopping hbase...................
[cbcloud@hd202 bin]$ jps
5620 Jps
3779 JobTracker
3633 NameNode
Step 3: Stop the ZooKeeper service
[cbcloud@hd204 ~]$ cd zookeeper/bin
[cbcloud@hd204 bin]$ ./zkServer.sh stop
JMX enabled by default
Using config: /home/cbcloud/zookeeper/bin/../conf/zoo.cfg
Stopping zookeeper ... STOPPED
[cbcloud@hd204 bin]$ jps
3690 TaskTracker
3614 DataNode
4988 Jps
[cbcloud@hd205 hbconf]$ cd ..
[cbcloud@hd205 ~]$ cd zookeeper/bin
[cbcloud@hd205 bin]$ ./zkServer.sh stop
JMX enabled by default
Using config: /home/cbcloud/zookeeper/bin/../conf/zoo.cfg
Stopping zookeeper ... STOPPED
[cbcloud@hd205 bin]$ jps
3612 DataNode
3688 TaskTracker
4920 Jps
[cbcloud@hd206 ~]$ cd zookeeper
[cbcloud@hd206 zookeeper]$ cd bin
[cbcloud@hd206 bin]$ ./zkServer.sh stop
JMX enabled by default
Using config: /home/cbcloud/zookeeper/bin/../conf/zoo.cfg
Stopping zookeeper ... STOPPED
[cbcloud@hd206 bin]$ jps
4931 Jps
3693 TaskTracker
3617 DataNode
Step 4: Stop Hadoop
[cbcloud@hd202 bin]$ ./stop-all.sh
stopping jobtracker
hd205: stopping tasktracker
hd204: stopping tasktracker
hd206: stopping tasktracker
stopping namenode
hd205: stopping datanode
hd206: stopping datanode
hd204: stopping datanode
hd203: stopping secondarynamenode
15. Starting HBase follows the strict reverse of the shutdown order above (a wrapper sketch follows this list)
Step 1: Start Hadoop
Step 2: Start ZooKeeper on each DataNode
Step 3: Start the HMaster on the NameNode
Step 4: Start the HMaster on the SecondaryNameNode
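A minimal wrapper sketch implementing that order from hd202, using the paths and host names configured above and the cbcloud user equivalence (start-cluster.sh is a hypothetical helper, not part of any distribution):
#!/bin/bash
# start-cluster.sh - run as cbcloud on hd202
$HADOOP_HOME/bin/start-all.sh                  # step 1: HDFS + MapReduce
for h in hd204 hd205 hd206; do                 # step 2: ZooKeeper on the DataNodes
  ssh $h /home/cbcloud/zookeeper/bin/zkServer.sh start
done
$HBASE_HOME/bin/start-hbase.sh                 # step 3: HMaster on the NameNode (plus regionservers)
ssh hd203 /home/cbcloud/hbase/bin/hbase-daemon.sh start master   # step 4: backup HMaster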