Spark Basics and the Job Submission Workflow

Published: 2020-04-11 09:11:35  Source: Internet  Author: ChinaUnicom110  Category: Big Data

cd /usr/local/spark
./bin/spark-submit --class org.apache.spark.examples.SparkPi ./examples/jars/spark-examples_2.11-2.1.0.jar 10000
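
For context, SparkPi estimates Pi by Monte Carlo sampling, and the trailing argument (10000 here) is the number of slices the sampling is split into. The following is a minimal plain-Scala sketch of that idea without Spark, not the bundled example's source; the object name PiSketch and the samples and seed parameters are assumptions made for illustration.

```scala
import scala.util.Random

// Illustrative sketch only: a single-machine version of the Monte Carlo
// estimate that SparkPi distributes across slices. Names are hypothetical.
object PiSketch {
  def estimatePi(samples: Int, seed: Long): Double = {
    val rng = new Random(seed)
    // Count random points in the square [-1, 1] x [-1, 1]
    // that fall inside the unit circle.
    val inside = (1 to samples).count { _ =>
      val x = rng.nextDouble() * 2 - 1
      val y = rng.nextDouble() * 2 - 1
      x * x + y * y <= 1
    }
    // The circle-to-square area ratio is Pi / 4.
    4.0 * inside / samples
  }

  def main(args: Array[String]): Unit =
    println(estimatePi(200000, 42L))
}
```

Pasted into spark-shell or a Scala REPL, PiSketch.main(Array()) prints one estimate; more samples give a tighter estimate, which is why the spark-submit run spreads the work over many slices.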

Create a text file hello.txt under /root.
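
The article does not say what hello.txt contains. As a hedged sketch, the file could also be written from Scala (for example from the shell started in the next step) instead of a text editor; the three sample lines and the object name HelloFileSketch are assumptions, not the author's data.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, Paths}

// Illustrative sketch: write a small sample file for the word count.
// The sample text is hypothetical; any space-separated words will do.
object HelloFileSketch {
  def writeSample(path: Path): Path = {
    val text = "hello world\nhello spark\nhello hadoop\n"
    // Creates the file, or overwrites it if it already exists.
    Files.write(path, text.getBytes(StandardCharsets.UTF_8))
  }

  def main(args: Array[String]): Unit = {
    // Defaults to the path used in this article; pass another path to override.
    val target = if (args.nonEmpty) Paths.get(args(0)) else Paths.get("/root/hello.txt")
    println(writeSample(target))
  }
}
```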
./bin/spark-shell
Open a second terminal and run jps to list the Java processes; you will see a SparkSubmit process (spark-shell is launched through spark-submit).
sc
sc.textFile("/root/hello.txt")
val lineRDD = sc.textFile("/root/hello.txt")
lineRDD.foreach(println)
Check the Spark web UI (by default on port 4040 of the machine running spark-shell) to see the job that just ran.
val wordRDD = lineRDD.flatMap(line => line.split(" "))
wordRDD.collect
val wordCountRDD = wordRDD.map(word => (word,1))
wordCountRDD.collect
val resultRDD = wordCountRDD.reduceByKey((x,y)=>x+y)
resultRDD.collect
val orderedRDD = resultRDD.sortByKey(false)
orderedRDD.collect
orderedRDD.saveAsTextFile("/root/result")
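Inspect the result (saveAsTextFile writes a directory /root/result containing part-* files).
Shorthand (sortByKey() with no argument sorts ascending by word): sc.textFile("/root/hello.txt").flatMap(_.split(" ")).map((_,1)).reduceByKey(_+_).sortByKey().collect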
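
To make the intermediate results of the steps above easier to follow, here is a minimal sketch that mirrors the same flatMap, map, and reduceByKey stages with plain Scala collections, so it runs without a cluster. It is a local stand-in, not Spark's implementation; the object name WordCountSketch, its helper names, and the sample lines are assumptions for illustration.

```scala
// Illustrative sketch: the word-count stages on ordinary Scala collections.
object WordCountSketch {
  // flatMap -> map -> "reduceByKey" (groupBy plus sum stands in for it here).
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(line => line.split(" "))   // one element per word
      .map(word => (word, 1))             // pair each word with a count of 1
      .groupBy(_._1)                      // collect the pairs for each word
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) }

  // Analogue of sortByKey(false): descending by the key, i.e. by the word.
  def byWordDesc(counts: Map[String, Int]): Seq[(String, Int)] =
    counts.toSeq.sortBy(_._1)(Ordering[String].reverse)

  // Descending by count instead, with ties broken alphabetically.
  def byCountDesc(counts: Map[String, Int]): Seq[(String, Int)] =
    counts.toSeq.sortBy { case (word, n) => (-n, word) }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample input standing in for hello.txt.
    val sample = Seq("hello world", "hello spark", "hello hadoop")
    val counts = wordCount(sample)
    println(byWordDesc(counts).mkString(", "))
    println(byCountDesc(counts).mkString(", "))
  }
}
```

The two sort helpers highlight a detail of the walkthrough: sortByKey(false) orders the pairs by word, not by frequency. If ordering by frequency is what you want on the RDD itself, resultRDD.sortBy(_._2, false) is one way to get it.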

start-dfs.sh
Run in spark-shell: sc.textFile("hdfs://192.168.56.100:9000/hello.txt").flatMap(_.split(" ")).map((_,1)).reduceByKey(_+_).sortByKey().collect (you can replace the IP with the hostname master after adding it to /etc/hosts)
sc.textFile("hdfs://192.168.56.100:9000/hello.txt").flatMap(_.split(" ")).map((_,1)).reduceByKey(_+_).sortByKey().saveAsTextFile("hdfs://192.168.56.100:9000/output1")
