This article walks through how to run spark-shell on Hadoop YARN. The steps are short and laid out in order, so follow along below to set everything up and clear up the usual points of confusion.
1. Spark architecture diagram

   ![Spark architecture](https://cache.yisu.com/upload/information/20210522/355/683134.png)

2. Download and install Scala

   a. Official archive: http://www.scala-lang.org/files/archive/
   b. Pick a version, copy its link, and download it with wget:
          wget http://www.scala-lang.org/files/archive/scala-2.11.6.tgz
   c. Extract it and move it into /usr/local:
          tar xvf scala-2.11.6.tgz
          sudo mv scala-2.11.6 /usr/local/scala        # move scala to /usr/local
   d. Set the environment variables:
          sudo gedit ~/.bashrc
          export SCALA_HOME=/usr/local/scala
          export PATH=$PATH:$SCALA_HOME/bin
          source ~/.bashrc                             # apply the changes
   e. Start scala:
          hduser@master:~$ scala

3. Install Spark

   a. Official download page: http://spark.apache.org/downloads.html
   b. Choose version 1.4, "Pre-built for Hadoop 2.6 and later", copy the link, and download it with wget
   c. wget http://d3kbcqa49mib13.cloudfront.net/spark-1.4.0-bin-hadoop2.6.tgz
   d. Extract it and move it to /usr/local/spark/
   e. Edit the environment variables
   f. sudo gedit ~/.bashrc
          export SPARK_HOME=/usr/local/spark
          export PATH=$PATH:$SPARK_HOME/bin
   g. source ~/.bashrc                                 # apply the changes

4. Start the interactive spark-shell

          hduser@master:~$ spark-shell

5. Start Hadoop (start-all.sh)

6. Run spark-shell locally

   a. spark-shell --master local[4]
   b. Read a local file:
          val textFile = sc.textFile("file:/usr/local/spark/README.md")
          textFile.count

7. Run spark-shell on Hadoop YARN

          SPARK_JAR=/usr/local/spark/lib/spark-assembly-1.4.0-hadoop2.6.0.jar HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop MASTER=yarn-client /usr/local/spark/bin/spark-shell

   The pieces of that command:

          SPARK_JAR=/usr/local/spark/lib/spark-assembly-1.4.0-hadoop2.6.0.jar   # path to the Spark assembly jar
          HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop                          # Hadoop configuration directory
          MASTER=yarn-client                                                    # run in yarn-client mode
          /usr/local/spark/bin/spark-shell                                      # full path of the spark-shell to run

8. Build the Spark Standalone cluster environment

   a. Copy the template file, then edit it:
          cp /usr/local/spark/conf/spark-env.sh.template /usr/local/spark/conf/spark-env.sh
   b. Configure spark-env.sh
   c. sudo gedit /usr/local/spark/conf/spark-env.sh
          export SPARK_MASTER_IP=master          # IP (hostname) of the master
          export SPARK_WORKER_CORES=1            # CPU cores used by each worker
          export SPARK_WORKER_MEMORY=600m        # memory used by each worker
          export SPARK_WORKER_INSTANCES=1        # number of worker instances per node
          # Watch your memory: with Hadoop + Spark running across several VMs, 8 GB is not enough,
          # and resources lose quite a bit of efficiency when passed through virtualization.
   d. SSH to data1 and data2 and create the spark directory:
          sudo mkdir /usr/local/spark
          sudo chown hduser:hduser /usr/local/spark
          # run the two commands above on both data1 and data2
   e. Copy the master's spark directory to data1 and data2:
          sudo scp -r /usr/local/spark hduser@data1:/usr/local
          sudo scp -r /usr/local/spark hduser@data2:/usr/local
   f. Edit the slaves file:
          sudo gedit /usr/local/spark/conf/slaves
          data1
          data2

9. Run spark-shell on Spark Standalone

   a. Start the Spark Standalone cluster:
          /usr/local/spark/sbin/start-all.sh
   b. Run:
          spark-shell --master spark://master:7077
   c. View the Spark Standalone Web UI at http://master:8080/
   d. Stop the Spark Standalone cluster:
          /usr/local/spark/sbin/stop-all.sh
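Once the shell is up, a small word-count job is a convenient end-to-end check that the executors can actually read from HDFS and run tasks. The sketch below is only an illustration: the address hdfs://master:9000 and the path /user/hduser/wordcount/README.md are assumptions, and need to match fs.defaultFS in your core-site.xml and a file you have uploaded yourself (for example with hdfs dfs -put).

```scala
// Paste into spark-shell; `sc` (the SparkContext) is already provided by the shell.
// NOTE: the HDFS URL and path below are placeholders for this sketch;
// point them at your own NameNode address and an existing file.
val textFile = sc.textFile("hdfs://master:9000/user/hduser/wordcount/README.md")

// Classic word count with the RDD API available in Spark 1.4.
val counts = textFile
  .flatMap(line => line.split(" "))   // split each line into words
  .map(word => (word, 1))             // pair each word with a count of 1
  .reduceByKey(_ + _)                 // sum the counts per word

// Trigger the job and print the ten most frequent words.
counts.sortBy(_._2, ascending = false).take(10).foreach(println)
```

The same lines work whether spark-shell was started with MASTER=yarn-client, with --master spark://master:7077, or with --master local[4]; only where the tasks execute changes.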
Command reference (the shell history for the session above):

    scala
    jps
    wget http://d3kbcqa49mib13.cloudfront.net/spark-1.4.0-bin-hadoop2.6.tgz
    ping www.baidu.com
    ssh data3
    ssh data2
    ssh data1
    jps
    start-all.sh
    jps
    spark-shell
    spark-shell --master local[4]
    SPARK_JAR=/usr/local/spark/lib/spark-assembly-1.4.0-hadoop2.6.0.jar HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop MASTER=yarn-client /usr/local/spark/bin/spark-shell
    ssh data2
    ssh data1
    cd /usr/local/hadoop/etc/hadoop/
    ll
    sudo gedit masters
    sudo gedit slaves
    sudo gedit /etc/hosts
    sudo gedit hdfs-site.xml
    sudo rm -rf /usr/local/hadoop/hadoop_data/hdfs
    mkdir -p /usr/local/hadoop/hadoop_data/hdfs/namenode
    sudo chown -R hduser:hduser /usr/local/hadoop
    hadoop namenode -format
    start-all.sh
    jps
    spark-shell
    SPARK_JAR=/usr/local/spark/lib/spark-assembly-1.4.0-hadoop2.6.0.jar HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop MASTER=yarn-client /usr/local/spark/bin/spark-shell
    ssh data1
    ssh data2
    ssh data1
    start-all.sh
    jps
    cp /usr/local/spark/conf/spark-env.sh.template /usr/local/spark/conf/spark-env.sh
    sudo gedit /usr/local/spark/conf/spark-env.sh
    sudo scp -r /usr/local/spark hduser@data1:/usr/local
    sudo scp -r /usr/local/spark hduser@data2:/usr/local
    sudo gedit /usr/local/spark/conf/slaves
    /usr/local/spark/sbin/start-all.sh
    spark-shell --master spark://master:7077
    /usr/local/spark/sbin/stop-all.sh
    jps
    stop-all.sh
    history
That covers everything in "how to run spark-shell on Hadoop YARN". Thanks for reading! Hopefully you now have a working picture of the setup; to learn more, follow the Yisu Cloud industry news channel.