Steps for Migrating Data to Hadoop

Published: 2021-08-20 20:08:47  Source: Yisu Cloud  Author: chen  Category: Development Technology


Recently I have been using Flume and Sqoop to migrate non-relational data (logs) and relational data (MySQL) to HDFS. This is a brief record of the process, written up as a summary.
I. Using Flume
Use Flume to import web server log data into HDFS.
Steps
1 On the elephant node, install Flume first:
sudo yum install --assumeyes flume-ng
2 Create the configuration file
vi /etc/hadoop/conf/flume-conf.properties

tail1.sources = src1
tail1.channels = ch2
tail1.sinks = sink1
tail1.sources.src1.type = exec
tail1.sources.src1.command = tail -F /tmp/access_log
tail1.sources.src1.channels = ch2
tail1.channels.ch2.type = memory
tail1.channels.ch2.capacity = 500
tail1.sinks.sink1.type = avro
tail1.sinks.sink1.hostname = localhost
tail1.sinks.sink1.port = 6000
tail1.sinks.sink1.batch-size = 1
tail1.sinks.sink1.channel = ch2
## collector1: receives Avro events on port 6000 and writes them to HDFS
collector1.sources = src1
collector1.channels = ch2
collector1.sinks = sink1
collector1.sources.src1.type = avro
collector1.sources.src1.bind = localhost
collector1.sources.src1.port = 6000
collector1.sources.src1.channels = ch2
collector1.channels.ch2.type = memory
collector1.channels.ch2.capacity = 500
collector1.sinks.sink1.type = hdfs
collector1.sinks.sink1.hdfs.path = flume/collector1
collector1.sinks.sink1.hdfs.filePrefix = access_log
collector1.sinks.sink1.channel = ch2

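How the configuration is structured:
The file defines two agents. tail1 reads the log through an exec source (tail -F /tmp/access_log), buffers the events in a memory channel, and forwards them through an avro sink to localhost:6000. collector1 listens on that port with an avro source, buffers the events in its own memory channel, and writes them to HDFS through an hdfs sink. The Avro hop is an RPC relay between the two agents, not a file saved on local disk. The HDFS output path is flume/collector1, relative to the user's HDFS home directory.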
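Since both agents live in one file and are wired together only by channel names, a typo in a channel name is easy to miss. The following is a minimal sketch of a sanity check, assuming a flat properties file like the one above. The function name check_flume_agent is hypothetical and not part of Flume; it only looks at the channels/channel keys and ignores everything else.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Flume): print which channel each source
# and sink of one agent is attached to, and return non-zero if any of them
# names a channel that the agent does not declare.
check_flume_agent() {
  local conf_file=$1 agent=$2
  awk -v agent="$agent" '
    # Skip comment lines and lines without a key = value pair.
    /^[ \t]*#/ || !/=/ { next }
    {
      key = $0; sub(/[ \t]*=.*/, "", key); sub(/^[ \t]+/, "", key)
      val = $0; sub(/^[^=]*=[ \t]*/, "", val); sub(/[ \t]+$/, "", val)
      n = split(key, part, ".")
      if (part[1] != agent) next
      if (n == 2 && part[2] == "channels") {
        # <agent>.channels = ch1 ch2 ...  (declared channels)
        m = split(val, names, /[ \t]+/)
        for (i = 1; i <= m; i++) declared[names[i]] = 1
      } else if (n == 4 && (part[4] == "channels" || part[4] == "channel")) {
        # <agent>.sources.<name>.channels / <agent>.sinks.<name>.channel
        uses[part[2] "." part[3]] = val
      }
    }
    END {
      status = 0
      for (c in uses) {
        m = split(uses[c], names, /[ \t]+/)
        for (i = 1; i <= m; i++) {
          if (names[i] in declared) print c " -> " names[i]
          else { print c " -> " names[i] " (undeclared channel)"; status = 1 }
        }
      }
      exit status
    }
  ' "$conf_file"
}
```

For example, check_flume_agent /etc/hadoop/conf/flume-conf.properties tail1 should list sources.src1 and sinks.sink1, both attached to ch2. The output order is not fixed, so pipe it through sort if that matters.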

3 Create the target directory on HDFS
hadoop fs -mkdir -p flume/collector1

4 Generate a large volume of simulated log data in the log file
$ accesslog-gen.bash /tmp/access_log
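The accesslog-gen.bash script is not shown in this article. If it is not available on your machine, something like the following can stand in for it. This is only a sketch: the function name gen_access_log and the line layout are made up for illustration and are not what the original script produces.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for accesslog-gen.bash: append COUNT synthetic
# access-log lines to FILE. The line layout is invented for illustration.
gen_access_log() {
  local file=$1 count=${2:-1000} i ts
  # One timestamp for the whole batch keeps the loop free of subprocesses.
  ts=$(date '+%d/%b/%Y:%H:%M:%S %z')
  for ((i = 1; i <= count; i++)); do
    printf '10.0.0.%d - - [%s] "GET /page/%d HTTP/1.1" 200 %d\n' \
      "$((i % 254 + 1))" "$ts" "$i" "$((512 + i))"
  done >> "$file"
}
```

Calling gen_access_log /tmp/access_log 10000 appends to the file rather than overwriting it, so a running tail -F (and therefore the tail1 agent) picks up the new lines.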
5 Start the log collector (collector1) first, so that the Avro port is listening before anything is sent to it
flume-ng agent --conf /etc/hadoop/conf/ \
--conf-file /etc/hadoop/conf/flume-conf.properties \
--name collector1
6 Start the log tailing agent (tail1)
$ flume-ng agent \
--conf-file /etc/hadoop/conf/flume-conf.properties \
--name tail1

II. Using Sqoop
Use Sqoop to import table data from MySQL into HDFS.
1 Install Sqoop
sudo yum install --assumeyes sqoop
2 Make the MySQL JDBC driver visible to Sqoop by creating a symbolic link to the driver jar in Sqoop's lib directory, /usr/lib/sqoop/lib:
$ sudo ln -s /usr/share/java/mysql-connector-java.jar /usr/lib/sqoop/lib/
3 Import the data
Show the available Sqoop commands:
sqoop help
Use Sqoop to list the databases in MySQL:
sqoop list-databases \
--connect jdbc:mysql://localhost \
--username training --password training
Then list the tables in the movielens database:
sqoop list-tables \
--connect jdbc:mysql://localhost/movielens \
--username training --password training
Import the movie table into HDFS, with the fields of each row separated by tabs:
sqoop import \
--connect jdbc:mysql://localhost/movielens \
--table movie --fields-terminated-by '\t' \
--username training --password training
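Passing --password on the command line leaves the password in the shell history and the process list. Sqoop also accepts -P (prompt on the console) and --password-file. The following is a sketch of a small wrapper around the same import using --password-file, assuming your Sqoop version supports that option. The function name sqoop_import_table, the DRY_RUN switch, and the password file path are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: build the sqoop import command for one table,
# reading the password from a file instead of the command line.
# With DRY_RUN=1 the command is only printed, not executed.
sqoop_import_table() {
  local db=$1 table=$2 password_file=$3
  local cmd=(sqoop import
    --connect "jdbc:mysql://localhost/${db}"
    --table "$table"
    --fields-terminated-by '\t'
    --username training
    --password-file "$password_file")
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, DRY_RUN=1 sqoop_import_table movielens movie /user/training/.mysql-pass prints the command without running it. The password file should be readable only by its owner. Some Sqoop versions read the file verbatim, so a trailing newline may end up as part of the password.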
4 Verify
hadoop fs -ls movie
hadoop fs -tail movie/part-m-00000
The output shows that the table data is now stored as files in HDFS.
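Beyond eyeballing the tail of one part file, it can be useful to confirm that every imported row has the expected number of tab-separated fields. A minimal sketch follows; the function name check_tsv_fields is hypothetical, and the number of columns in the movie table is not stated in this article, so it is left as a parameter.

```shell
#!/usr/bin/env bash
# Hypothetical check: read tab-separated rows on stdin, report the row count
# and how many rows do not have the expected number of fields.
# Returns non-zero if at least one row is malformed.
check_tsv_fields() {
  local expected=$1
  awk -F '\t' -v want="$expected" '
    NF != want { bad++ }
    END { printf "rows=%d bad=%d\n", NR, bad; exit (bad > 0) }
  '
}
```

A typical call would be hadoop fs -cat 'movie/part-m-*' | check_tsv_fields N, where N is the column count of the table. A column value that itself contains a tab would be counted as a malformed row.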

