
How to Deploy a Spark Application

Published: 2021-12-16 16:51:57  Source: Yisu Cloud  Author: iii  Category: Cloud computing

This article walks through deploying Spark applications: building Spark from source, generating a deployment package, setting up a Spark Standalone cluster (including high availability), and submitting jobs with spark-shell and spark-submit.

Spark application deployment modes
local
Spark Standalone
Hadoop YARN
Apache Mesos
Amazon EC2
Spark Standalone cluster deployment
Two variants: standalone and standalone HA
Building Spark from source
Building with SBT
SPARK_HADOOP_VERSION=2.2.0 SPARK_YARN=true sbt/sbt assembly
Building with Maven
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package
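Generating a deployment package with make-distribution.sh
--hadoop VERSION: Hadoop version to build against; if omitted, Hadoop 1.0.4 is used
--with-yarn: enable Hadoop YARN support; YARN is not supported if omitted
--with-hive: enable Hive support in Spark SQL; Hive is not supported if omitted
--skip-tachyon: do not bundle the Tachyon in-memory file system
--tgz: package the result as a .tgz file; if omitted, no tgz file is generated, only the dist/ directory
--name NAME: combined with --tgz, produces a package named spark-$VERSION-bin-$NAME.tgz; if omitted, NAME is the Hadoop version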
Generating deployment packages
Package with YARN support for Hadoop 2.2.0:
./make-distribution.sh --hadoop 2.2.0 --with-yarn --tgz
Package with YARN and Hive support:
./make-distribution.sh --hadoop 2.2.0 --with-yarn --with-hive --tgz
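The naming rule for --name and --tgz can be captured in a small helper that predicts which tarball to look for after the build. This is a sketch that only encodes the rule stated above (NAME falls back to the Hadoop version); the function name dist_tgz_name and the version numbers in the example calls are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: predict the tarball name produced by
# make-distribution.sh --tgz, following the rule described above.
# Usage: dist_tgz_name SPARK_VERSION HADOOP_VERSION [NAME]
dist_tgz_name() {
  spark_version=$1
  hadoop_version=$2
  # When --name is not given, NAME defaults to the Hadoop version.
  name=${3:-$hadoop_version}
  printf 'spark-%s-bin-%s.tgz\n' "$spark_version" "$name"
}

dist_tgz_name 1.0.0 2.2.0        # prints spark-1.0.0-bin-2.2.0.tgz
dist_tgz_name 1.0.0 2.2.0 hive   # prints spark-1.0.0-bin-hive.tgz
```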


The steps below use the prebuilt spark-1.4.0-bin-hadoop2.6 package, whose assembly jar is:
[root@localhost lib]# ls /root/soft/spark-1.4.0-bin-hadoop2.6/lib/spark-assembly-1.4.0-hadoop2.6.0.jar
/root/soft/spark-1.4.0-bin-hadoop2.6/lib/spark-assembly-1.4.0-hadoop2.6.0.jar

List the worker (slave) nodes in conf/slaves; for a pseudo-distributed setup this is just localhost:
[root@localhost conf]# vi slaves
localhost
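Scripts that act on every node (copying configuration, checking Java) need the host list from conf/slaves. The sketch below reads it while ignoring blank lines and # comments, which we believe mirrors how Spark's sbin/slaves.sh filters the file; the function name read_workers is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print one worker host per line from a conf/slaves file.
read_workers() {
  slaves_file=$1
  # Drop '#' comments, trim surrounding whitespace, then drop empty lines.
  sed -e 's/#.*$//' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' "$slaves_file" |
    grep -v '^$'
}
```

For the pseudo-distributed file above, read_workers /root/soft/spark-1.4.0-bin-hadoop2.6/conf/slaves prints only localhost.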

[root@localhost conf]# cp spark-env.sh.template spark-env.sh
[root@localhost conf]# vi spark-env.sh
Edit conf/spark-env.sh as follows, then copy it to every node:
export SPARK_MASTER_IP=localhost
export SPARK_MASTER_PORT=7077
export SPARK_WORKER_CORES=1
export SPARK_WORKER_INSTANCES=1
export SPARK_WORKER_MEMORY=1g
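Writing spark-env.sh by hand on every node is error-prone, so here is a sketch that writes the settings above into a conf directory and prints (without running) one scp command per worker. It also writes a JAVA_HOME line because the worker launch scripts need Java on each node; the JDK path, the function names, and the choice of scp for distribution are all assumptions.

```shell
#!/bin/sh
# Hypothetical helper: write the spark-env.sh settings shown above.
# Usage: write_spark_env CONF_DIR JAVA_HOME_PATH
write_spark_env() {
  conf_dir=$1
  java_home=$2
  cat > "$conf_dir/spark-env.sh" <<EOF
export JAVA_HOME=$java_home
export SPARK_MASTER_IP=localhost
export SPARK_MASTER_PORT=7077
export SPARK_WORKER_CORES=1
export SPARK_WORKER_INSTANCES=1
export SPARK_WORKER_MEMORY=1g
EOF
}

# Hypothetical helper: read host names on stdin and print (not execute)
# the scp command that would copy spark-env.sh to the same path on each host.
print_sync_cmds() {
  conf_dir=$1
  while read -r host; do
    [ -n "$host" ] || continue
    echo "scp $conf_dir/spark-env.sh $host:$conf_dir/"
  done
}
```

Printing the commands first keeps the sketch a dry run: review the output, then pipe it to sh once it looks right.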

[root@localhost conf]# ../sbin/start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /root/soft/spark-1.4.0-bin-hadoop2.6/sbin/../logs/spark-root-org.apache.spark.deploy.master.Master-1-localhost.localdomain.out
localhost: starting org.apache.spark.deploy.worker.Worker, logging to /root/soft/spark-1.4.0-bin-hadoop2.6/sbin/../logs/spark-root-org.apache.spark.deploy.worker.Worker-1-localhost.localdomain.out
localhost: failed to launch org.apache.spark.deploy.worker.Worker:
localhost:   JAVA_HOME is not set
localhost: full log in /root/soft/spark-1.4.0-bin-hadoop2.6/sbin/../logs/spark-root-org.apache.spark.deploy.worker.Worker-1-localhost.localdomain.out
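The worker failed to launch because JAVA_HOME is not set on that node. Add an export JAVA_HOME line pointing at your JDK installation to conf/spark-env.sh and run start-all.sh again. Then open the master web UI at http://192.168.141.10:8080/.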
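A pre-flight check catches the "JAVA_HOME is not set" failure before start-all.sh reaches the workers. This is a sketch, assuming it is run on each node (for example after sourcing conf/spark-env.sh); the function name check_java_home is hypothetical.

```shell
#!/bin/sh
# Hypothetical pre-flight check: fail if JAVA_HOME is unset or has no java binary.
check_java_home() {
  if [ -z "${JAVA_HOME:-}" ]; then
    echo "JAVA_HOME is not set" >&2
    return 1
  fi
  if [ ! -x "$JAVA_HOME/bin/java" ]; then
    echo "no executable java under $JAVA_HOME/bin" >&2
    return 1
  fi
  echo "JAVA_HOME=$JAVA_HOME"
}
```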


Connect spark-shell to the master:
[root@localhost conf]# ../bin/spark-shell --master spark://localhost:7077

Refresh http://192.168.141.10:8080/; the running application now appears with an application ID.

Spark Standalone HA deployment
File-system-based HA
spark.deploy.recoveryMode: set to FILESYSTEM
spark.deploy.recoveryDirectory: directory in which Spark saves recovery state
Set both through SPARK_DAEMON_JAVA_OPTS in spark-env.sh:
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=FILESYSTEM -Dspark.deploy.recoveryDirectory=$dir"
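The FILESYSTEM option string can be generated so that the recovery directory is guaranteed to exist before the master starts. A sketch, assuming the master can write to the chosen directory; the function name and the /tmp/spark-recovery path in the usage note are illustrative only.

```shell
#!/bin/sh
# Hypothetical helper: create the recovery directory and print the
# FILESYSTEM recovery options for SPARK_DAEMON_JAVA_OPTS.
fs_recovery_opts() {
  dir=$1
  mkdir -p "$dir" || return 1
  printf '%s\n' "-Dspark.deploy.recoveryMode=FILESYSTEM -Dspark.deploy.recoveryDirectory=$dir"
}
```

One way to use it: echo "export SPARK_DAEMON_JAVA_OPTS=\"$(fs_recovery_opts /tmp/spark-recovery)\"" >> conf/spark-env.sh.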
ZooKeeper-based HA
spark.deploy.recoveryMode: set to ZOOKEEPER
spark.deploy.zookeeper.url: ZooKeeper URL (comma-separated host:port list)
spark.deploy.zookeeper.dir: ZooKeeper directory in which recovery state is saved; defaults to /spark
Set these through SPARK_DAEMON_JAVA_OPTS in spark-env.sh:
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=hadoop1:2181,hadoop2:2181 -Dspark.deploy.zookeeper.dir=$DIR"
Run sbin/start-all.sh on one machine,
then run sbin/start-master.sh on another machine to start a standby master.
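With several ZooKeeper hosts, the comma-separated URL is easy to mistype. This sketch builds the ZOOKEEPER option string from a directory and a list of hosts, assuming every ZooKeeper server listens on port 2181 as in the example above; the function name and host names are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: print ZOOKEEPER recovery options.
# Usage: zk_recovery_opts ZK_DIR HOST [HOST...]
zk_recovery_opts() {
  zk_dir=$1
  shift
  url=""
  for host in "$@"; do
    # Join host:2181 entries with commas.
    url="${url:+$url,}$host:2181"
  done
  printf '%s\n' "-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=$url -Dspark.deploy.zookeeper.dir=$zk_dir"
}
```

Once both masters are running, clients can list every master so they fail over automatically, for example spark-shell --master spark://hadoop1:7077,hadoop2:7077.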

[root@localhost ~]# jps
4609 Jps
4416 SparkSubmit
4079 Master
4291 SparkSubmit

Passwordless SSH (the start scripts launch workers over SSH):
[root@localhost ~]# ssh-keygen -t rsa -P ''

[root@localhost ~]# cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[root@localhost ~]# chmod 600 ~/.ssh/authorized_keys
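Running the cat >> step twice appends the same key twice. A sketch of an idempotent variant that appends the public key only when it is missing and tightens permissions; the function name authorize_key is hypothetical, and for remote nodes the standard ssh-copy-id tool does the equivalent over SSH.

```shell
#!/bin/sh
# Hypothetical helper: add a public key to authorized_keys exactly once.
# Usage: authorize_key PUBKEY_FILE AUTHORIZED_KEYS_FILE
authorize_key() {
  pub=$1
  auth=$2
  ssh_dir=$(dirname "$auth")
  mkdir -p "$ssh_dir"
  chmod 700 "$ssh_dir"
  touch "$auth"
  # Append only if the exact key line is not already present.
  if ! grep -qxF -f "$pub" "$auth"; then
    cat "$pub" >> "$auth"
  fi
  chmod 600 "$auth"
}
```

Usage: authorize_key ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys.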

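Start spark-shell with 2 GB of memory per executor (the executor memory must fit within SPARK_WORKER_MEMORY on the workers, otherwise no executor can be launched there):
[root@localhost conf]# ../bin/spark-shell --master spark://localhost:7077 --executor-memory 2g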
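Comparing --executor-memory with SPARK_WORKER_MEMORY before submitting avoids an application that waits forever for resources. This sketch converts memory strings with k/m/g/t suffixes to MiB and checks that one fits in the other; it deliberately rejects bare numbers, and the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: convert a memory string such as 512m or 2g to MiB.
mem_to_mb() {
  value=$(printf '%s' "$1" | tr 'A-Z' 'a-z')
  num=${value%[kmgt]}
  case $num in
    ''|*[!0-9]*) echo "unsupported memory string: $1" >&2; return 1 ;;
  esac
  case $value in
    *k) echo $((num / 1024)) ;;
    *m) echo "$num" ;;
    *g) echo $((num * 1024)) ;;
    *t) echo $((num * 1024 * 1024)) ;;
    *) echo "missing unit suffix: $1" >&2; return 1 ;;
  esac
}

# Succeeds if EXECUTOR_MEM fits within WORKER_MEM.
# Usage: executor_fits EXECUTOR_MEM WORKER_MEM
executor_fits() {
  exec_mb=$(mem_to_mb "$1") || return 1
  worker_mb=$(mem_to_mb "$2") || return 1
  [ "$exec_mb" -le "$worker_mb" ]
}
```

With the spark-env.sh shown earlier (SPARK_WORKER_MEMORY=1g), executor_fits 2g 1g fails, which signals that the worker memory has to be raised first.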

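Spark tools
Interactive tool: spark-shell
Application deployment tool: spark-submit
Options:
--master MASTER_URL: spark://host:port, mesos://host:port, yarn, or local
--deploy-mode DEPLOY_MODE: where the driver runs; client runs it on the local machine, cluster runs it inside the cluster
--class CLASS_NAME: main class to run from the application package
--name NAME: application name
--jars JARS: comma-separated list of local jars to include on the driver and executor classpaths
--py-files PY_FILES: comma-separated list of .zip, .egg, or .py files to place on the PYTHONPATH of Python applications (use --files FILES for files that should be placed in each executor's working directory)
--properties-file FILE: file from which to load application properties; defaults to conf/spark-defaults.conf
--driver-memory MEM: driver memory; default 512m
--driver-java-options: extra Java options for the driver
--driver-library-path: library path for the driver
--driver-class-path: extra classpath entries for the driver
--executor-memory MEM: memory per executor; default 1G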
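The options above combine into a single spark-submit call. This sketch assembles one from a master URL, deploy mode, main class, and jar, and prints it instead of running it when DRY_RUN=1. The function name, the APP_NAME and EXECUTOR_MEMORY variables, the default SPARK_HOME (taken from the paths used earlier), and the class and jar names in the usage note are all assumptions.

```shell
#!/bin/sh
# Hypothetical wrapper around spark-submit.
# Usage: submit_app MASTER DEPLOY_MODE CLASS JAR [APP_ARGS...]
submit_app() {
  master=$1 deploy_mode=$2 class=$3 jar=$4
  shift 4
  # Rebuild the positional parameters as the full command line;
  # "$@" still holds the application arguments at this point.
  set -- "${SPARK_HOME:-/root/soft/spark-1.4.0-bin-hadoop2.6}/bin/spark-submit" \
    --master "$master" \
    --deploy-mode "$deploy_mode" \
    --class "$class" \
    --name "${APP_NAME:-$class}" \
    --executor-memory "${EXECUTOR_MEMORY:-1g}" \
    "$jar" "$@"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

Example: DRY_RUN=1 submit_app spark://localhost:7077 client com.example.WordCount /tmp/app.jar input.txt prints the command so it can be reviewed before it is run.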
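Example: start HDFS, then count the words in a file on HDFS from spark-shell. Transformations are lazy, so the final action is what actually runs the job:
[root@localhost sbin]# sh start-dfs.sh
scala> val rdd = sc.textFile("hdfs://localhost.localdomain:9000/20140824/test-data.csv")
scala> val rdd2 = rdd.flatMap(_.split(" ")).map(x => (x, 1)).reduceByKey(_ + _)
scala> rdd2.take(10).foreach(println)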
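To sanity-check the Spark result on a small sample, the same word count can be computed locally with awk. This is an approximation, not Spark's logic: awk splits on runs of whitespace and ignores empty tokens, so it matches split(" ") only when words are separated by single spaces and there are no blank lines. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical local cross-check: print "word<TAB>count" lines, sorted by word.
local_word_count() {
  awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
       END { for (w in count) print w "\t" count[w] }' "$1" | sort
}
```

A small sample can be pulled from HDFS with hdfs dfs -cat /20140824/test-data.csv | head -n 1000 > sample.csv and compared against the same lines processed in spark-shell.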
