This article covers how to set up and test a Spark environment. These are steps many people get stuck on in practice, so read carefully and follow along.
Official recommendation:
Spark runs on Java 6+, Python 2.6+ and R 3.1+. For the Scala API, Spark 1.4.0 uses Scala 2.10. You will need to use a compatible Scala version (2.10.x).
Scala 2.11.x requires downloading a separately built Spark support package.
Local environment:
ubuntu14.04 + jdk1.8 + python2.7 + scala2.10.5 + hadoop2.6.0 + spark1.4.0
Download Scala from: http://www.scala-lang.org/download/2.10.5.html#Other_resources
Upload the Scala archive and extract it.
Configure the environment variables: open /etc/profile with vim and add the following:
export JAVA_HOME=/usr/local/java/jdk1.8.0_45
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export HADOOP_HOME=/home/nob/opt/hadoop-2.6.0
export SCALA_HOME=/home/nob/opt/scala-2.10.5
export SPARK_HOME=/home/nob/opt/spark-1.4.0-bin-hadoop2.6
export PATH=${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${SCALA_HOME}/bin:${SPARK_HOME}/bin:$PATH
After running source /etc/profile, typing scala -version prints the version information.
Download Spark and extract it to: /home/nob/opt/spark-1.4.0-bin-hadoop2.6
Configure the runtime environment by editing spark-env.sh:
nob@nobubuntu:~/opt/spark-1.4.0-bin-hadoop2.6$ vim conf/spark-env.sh

export JAVA_HOME=/usr/local/java/jdk1.8.0_45
export SCALA_HOME=/home/nob/opt/scala-2.10.5
export HADOOP_HOME=/home/nob/opt/hadoop-2.6.0
export HADOOP_CONF_DIR=/home/nob/opt/hadoop-2.6.0/etc/hadoop
export SPARK_MASTER_IP=nobubuntu
export SPARK_WORKER_MEMORY=512M
SPARK_MASTER_IP is the IP address or hostname of the master node.
nob@nobubuntu:~/opt/spark-1.4.0-bin-hadoop2.6$ sbin/start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /data/server/spark-1.4.0-bin-hadoop2.6/sbin/../logs/spark-nob-org.apache.spark.deploy.master.Master-1-nobubuntu.out
nobubuntu: org.apache.spark.deploy.worker.Worker running as process 10297.  Stop it first.
nob@nobubuntu:~/opt/spark-1.4.0-bin-hadoop2.6$ jps
8706 DataNode
9062 ResourceManager
10775 Jps
9192 NodeManager
10569 Master
10297 Worker
8572 NameNode
8911 SecondaryNameNode
nob@nobubuntu:~/opt/spark-1.4.0-bin-hadoop2.6$
jps shows the Master and Worker processes, and visiting http://nobubuntu:8080/ displays detailed runtime information.
To use the PySpark shell, run the following from the extracted Spark directory:
bin/pyspark
At the prompt, enter the following commands in order:
>>> lines = sc.textFile("README.md")
>>> lines.count()
>>> lines.first()
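To make concrete what these two actions return, here is a minimal plain-Python stand-in (the sample lines below are invented, not the real README.md): count() returns the number of elements in the RDD, and first() returns its first element.

```python
# Hypothetical stand-in for the RDD produced by sc.textFile("README.md"):
# a plain Python list of lines (the contents here are made up).
lines = [
    "# Apache Spark",
    "",
    "Spark is a fast and general cluster computing system.",
]

# lines.count() on an RDD corresponds to len() here; lines.first() to [0].
print(len(lines))   # number of lines
print(lines[0])     # first line
```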
Running the commands above, I found that the shell prints far too many log messages, so I wanted to lower the log level. To do this, I created a new file log4j.properties in the conf directory as a copy of log4j.properties.template, and changed the line
log4j.rootCategory=INFO, console
to
log4j.rootCategory=WARN, console
Then, reopening the shell, there is far less debug output.
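The copy-the-template-and-edit step can also be scripted. The sketch below uses a temporary directory as a stand-in for Spark's conf/ directory (on this machine the real path would be /home/nob/opt/spark-1.4.0-bin-hadoop2.6/conf); point conf_dir at your own conf/ directory to use it for real.

```python
import os
import tempfile

# Stand-in for Spark's conf/ directory; replace with your real conf/ path.
conf_dir = tempfile.mkdtemp()

# Write a minimal stand-in template containing the line we want to change.
template = os.path.join(conf_dir, "log4j.properties.template")
with open(template, "w") as f:
    f.write("log4j.rootCategory=INFO, console\n")

# Copy the template to log4j.properties, lowering the root level to WARN.
with open(template) as f:
    text = f.read().replace("log4j.rootCategory=INFO, console",
                            "log4j.rootCategory=WARN, console")
target = os.path.join(conf_dir, "log4j.properties")
with open(target, "w") as f:
    f.write(text)

print(open(target).read().strip())
```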
To open the Scala version of the shell, run:
bin/spark-shell

scala> val lines = sc.textFile("README.md")
scala> lines.count()
scala> lines.first()
Here is a standalone application, demonstrated in Python (though you could just as easily use Scala or Java), adapted from the official documentation:
"""SimpleApp.py""" from pyspark import SparkContext logFile = "YOUR_SPARK_HOME/README.md" # Should be some file on your system sc = SparkContext("local", "Simple App") logData = sc.textFile(logFile).cache() numAs = logData.filter(lambda s: 'a' in s).count() numBs = logData.filter(lambda s: 'b' in s).count() print "Lines with a: %i, lines with b: %i" % (numAs, numBs)
Use bin/spark-submit to run the script above:
# Use spark-submit to run your application
$ YOUR_SPARK_HOME/bin/spark-submit --master local[4] SimpleApp.py
...
Lines with a: 46, Lines with b: 23
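The two filter() predicates in SimpleApp.py are ordinary Python lambdas, so they can be sanity-checked locally without a cluster. The sample lines below are invented, so the counts will not match the README.md output above:

```python
# The same predicates SimpleApp.py passes to filter().
has_a = lambda s: 'a' in s
has_b = lambda s: 'b' in s

# Invented stand-in for the lines of README.md.
sample = ["apache spark", "big data", "hello"]

# logData.filter(pred).count() on an RDD corresponds to counting matches here.
num_as = sum(1 for s in sample if has_a(s))
num_bs = sum(1 for s in sample if has_b(s))
print("Lines with a: %i, lines with b: %i" % (num_as, num_bs))
```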