How to Compile Hive on Spark

Published: 2021-12-08 11:43:11 | Source: 億速云 | Views: 171 | Author: 小新 | Category: Cloud Computing

This article covers how to compile Hive on Spark. I found it quite practical, so I am sharing it here for reference.

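Background

Hive on Spark means Hive running on Spark: queries use the Spark execution engine instead of MapReduce, the same idea as Hive on Tez.
Starting with Hive 1.1, Hive on Spark has been part of the Hive codebase, and it lives on the spark branch.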

Downloading the source

git clone https://github.com/apache/hive.git hive_on_spark

Building

 cd hive_on_spark/
 git branch -r
  origin/HEAD -> origin/master
  origin/HIVE-4115
  origin/HIVE-8065
  origin/beeline-cli
  origin/branch-0.10
  origin/branch-0.11
  origin/branch-0.12
  origin/branch-0.13
  origin/branch-0.14
  origin/branch-0.2
  origin/branch-0.3
  origin/branch-0.4
  origin/branch-0.5
  origin/branch-0.6
  origin/branch-0.7
  origin/branch-0.8
  origin/branch-0.8-r2
  origin/branch-0.9
  origin/branch-1
  origin/branch-1.0
  origin/branch-1.0.1
  origin/branch-1.1
  origin/branch-1.1.1
  origin/branch-1.2
  origin/cbo
  origin/hbase-metastore
  origin/llap
  origin/master
  origin/maven
  origin/next
  origin/parquet
  origin/ptf-windowing
  origin/release-1.1
  origin/spark
  origin/spark-new
  origin/spark2
  origin/tez
  origin/vectorization

 git checkout origin/spark
 git branch
* (detached from origin/spark)
  master
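
Checking out origin/spark directly leaves the working tree on a detached HEAD, as the git branch output above shows. If a named local branch is preferred, a minimal sketch could look like the following. The function name checkout_tracking_branch is hypothetical and not part of the original steps.

```shell
# Hypothetical helper, not part of the original article.
# Creates a local branch that starts at origin/<branch> and switches to it,
# so the checkout is no longer a detached HEAD.
checkout_tracking_branch() {
  local branch="$1"
  git checkout -b "$branch" "origin/$branch"
}

# Usage, from inside the hive_on_spark checkout:
#   checkout_tracking_branch spark
```

The build itself works the same either way; a local branch mainly helps if you plan to commit the pom.xml changes.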

Edit $HIVE_ON_SPARK/pom.xml.
Change the Spark version to 1.4.1:

 <spark.version>1.4.1</spark.version>

Change the Hadoop version to 2.3.0-cdh6.1.0:

<hadoop-23.version>2.3.0-cdh6.1.0</hadoop-23.version>
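
The two pom.xml edits can also be scripted instead of made by hand. The sketch below assumes each property sits on a single line as `<name>value</name>`; the helper name set_pom_property is hypothetical.

```shell
# Hypothetical helper: set_pom_property <pom-file> <property-name> <new-value>
# Replaces the text between <name> and </name> wherever that property appears.
# Assumes the opening and closing tags are on the same line.
set_pom_property() {
  local pom="$1" name="$2" value="$3"
  # -i.bak keeps a backup copy and is accepted by both GNU and BSD sed.
  sed -i.bak "s|<${name}>[^<]*</${name}>|<${name}>${value}</${name}>|" "$pom"
}

# Usage, from the root of the source checkout:
#   set_pom_property pom.xml spark.version 1.4.1
#   set_pom_property pom.xml hadoop-23.version 2.3.0-cdh6.1.0
```

The backup file pom.xml.bak makes it easy to diff the result against the original before building.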

Build command

export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
mvn clean package -Phadoop-2 -DskipTests
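
The -XX:MaxPermSize flag in MAVEN_OPTS only has an effect on Java 7 and earlier; to my understanding, later JVMs no longer have a permanent generation and either ignore or reject the flag. A minimal sketch for choosing the options by Java major version, using a hypothetical helper maven_opts_for_java, might be:

```shell
# Hypothetical helper: print MAVEN_OPTS suited to a given Java major version.
# The memory sizes are the ones used in the build command above.
maven_opts_for_java() {
  local major="$1"
  if [ "$major" -le 7 ]; then
    # Java 7 and earlier still have a permanent generation.
    printf '%s\n' "-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
  else
    # Assumption: MaxPermSize is dropped for Java 8 and later.
    printf '%s\n' "-Xmx2g -XX:ReservedCodeCacheSize=512m"
  fi
}

# Usage:
#   export MAVEN_OPTS="$(maven_opts_for_java 7)"
#   mvn clean package -Phadoop-2 -DskipTests
```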

Ways to add the Spark dependency to Hive

spark home:/home/cluster/apps/spark/spark-1.4.1
hive home:/home/cluster/apps/hive_on_spark

1. Set the property 'spark.home' to point to the Spark installation:

hive> set spark.home=/home/cluster/apps/spark/spark-1.4.1;

2. Define the SPARK_HOME environment variable before starting Hive CLI/HiveServer2:

export SPARK_HOME=/home/cluster/apps/spark/spark-1.4.1

3. Set the spark-assembly jar on the Hive auxpath:

hive --auxpath /home/cluster/apps/spark/spark-1.4.1/lib/spark-assembly-*.jar

4. Add the spark-assembly jar for the current user session:

hive> add jar /home/cluster/apps/spark/spark-1.4.1/lib/spark-assembly-*.jar;

5. Link the spark-assembly jar to $HIVE_HOME/lib.
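
The last option is given without a command. One way to do it, sketched here with a hypothetical helper link_spark_assembly and assuming the jar lives under lib/ in the Spark installation as in the paths above, is a symbolic link:

```shell
# Hypothetical helper: link_spark_assembly <spark-home> <hive-home>
# Symlinks every spark-assembly-*.jar from <spark-home>/lib into <hive-home>/lib.
link_spark_assembly() {
  local spark_home="$1" hive_home="$2" jar
  for jar in "$spark_home"/lib/spark-assembly-*.jar; do
    # If the glob matched nothing, the loop sees the literal pattern; report and stop.
    if [ ! -e "$jar" ]; then
      echo "no spark-assembly jar found under $spark_home/lib" >&2
      return 1
    fi
    ln -sf "$jar" "$hive_home/lib/"
  done
}

# Usage with the directories from this article:
#   link_spark_assembly /home/cluster/apps/spark/spark-1.4.1 /home/cluster/apps/hive_on_spark
```

A symlink, rather than a copy, keeps Hive pointing at whatever assembly jar the Spark installation currently provides.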

An error that may occur when starting Hive:

[ERROR] Terminal initialization failed; falling back to unsupported
java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected
        at jline.TerminalFactory.create(TerminalFactory.java:101)
        at jline.TerminalFactory.get(TerminalFactory.java:158)
        at jline.console.ConsoleReader.<init>(ConsoleReader.java:229)
        at jline.console.ConsoleReader.<init>(ConsoleReader.java:221)
        at jline.console.ConsoleReader.<init>(ConsoleReader.java:209)
        at org.apache.hadoop.hive.cli.CliDriver.getConsoleReader(CliDriver.java:773)
        at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:715)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

Exception in thread "main" java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected

Fix: export HADOOP_USER_CLASSPATH_FIRST=true
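
As far as I know, this error comes from an older jline jar on the Hadoop classpath shadowing the newer jline that Hive uses, which is why putting the user classpath first helps. To check whether two versions are present, a small sketch (the helper name find_jline_jars is hypothetical) is:

```shell
# Hypothetical helper: list every jline jar under the given directories.
# Usage: find_jline_jars <dir> [<dir> ...]
find_jline_jars() {
  # Errors from unreadable directories are suppressed; output is sorted for stable reading.
  find "$@" -name 'jline-*.jar' 2>/dev/null | sort
}

# Usage, assuming HADOOP_HOME and HIVE_HOME are set in your environment:
#   find_jline_jars "$HADOOP_HOME" "$HIVE_HOME"
```

If the listing shows different jline versions under the Hadoop and Hive directories, the classpath ordering set by the export above decides which one is loaded.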

For fixes to errors in other scenarios, see: https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started

The spark.eventLog.dir parameter must be set, for example:

set spark.eventLog.dir=hdfs://master:8020/directory
Otherwise queries keep failing with an error saying that a directory such as /tmp/spark-event does not exist.
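
The directory named in spark.eventLog.dir has to exist before the first query runs. A minimal sketch for creating it on HDFS, assuming the hdfs command is on the PATH and using a hypothetical helper name ensure_event_log_dir:

```shell
# Hypothetical helper: create the Spark event log directory on HDFS if it is missing.
# Usage: ensure_event_log_dir hdfs://master:8020/directory
ensure_event_log_dir() {
  local dir="$1"
  # `hdfs dfs -test -d` exits 0 only when the path exists and is a directory.
  if ! hdfs dfs -test -d "$dir"; then
    hdfs dfs -mkdir -p "$dir"
  fi
}
```

Run it as a user with write permission on the parent path; the helper does nothing when the directory is already there.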

After starting Hive, set the execution engine to Spark:

hive> set hive.execution.engine=spark;

Set the Spark run mode:

hive> set spark.master=spark://master:7077;

Or, for YARN: spark.master=yarn
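
Typing these set statements in every session gets tedious. One option is to keep them in an initialization file and pass it to the Hive CLI with -i. The sketch below writes such a file; the helper name write_spark_init_file and the output file name are hypothetical.

```shell
# Hypothetical helper: write_spark_init_file <output-file> <spark-master-url> <event-log-dir>
# Writes the session settings discussed in this article as Hive `set` statements.
write_spark_init_file() {
  local out="$1" master="$2" event_dir="$3"
  cat > "$out" <<EOF
set hive.execution.engine=spark;
set spark.master=${master};
set spark.eventLog.enabled=true;
set spark.eventLog.dir=${event_dir};
EOF
}

# Usage:
#   write_spark_init_file spark-init.hql spark://master:7077 hdfs://master:8020/directory
#   hive -i spark-init.hql
```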

Configure Spark-application configs for Hive

These can be set in spark-defaults.conf or hive-site.xml:

spark.master=<Spark Master URL>
spark.eventLog.enabled=true;
spark.executor.memory=512m;
spark.serializer=org.apache.spark.serializer.KryoSerializer;
spark.executor.memory=...  #Amount of memory to use per executor process.
spark.executor.cores=...  #Number of cores per executor.
spark.yarn.executor.memoryOverhead=...
spark.executor.instances=...  #The number of executors assigned to each application.
spark.driver.memory=...  #The amount of memory assigned to the Remote Spark Context (RSC). We recommend 4GB.
spark.yarn.driver.memoryOverhead=...  #We recommend 400 (MB).

For details on these parameters, see the documentation: https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started
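
When the settings go into hive-site.xml rather than spark-defaults.conf, each key=value pair becomes a property element. A rough sketch of that conversion follows, with a hypothetical helper props_to_hive_site; it does not escape XML special characters and expects plain key=value lines without trailing semicolons or inline comments.

```shell
# Hypothetical helper: read key=value lines on stdin, print a hive-site.xml style document.
# Blank lines and lines starting with '#' are skipped. Values are not XML-escaped.
props_to_hive_site() {
  local line name value
  echo "<configuration>"
  while IFS= read -r line; do
    case "$line" in
      ''|'#'*) continue ;;
    esac
    name="${line%%=*}"   # text before the first '='
    value="${line#*=}"   # text after the first '='
    printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$name" "$value"
  done
  echo "</configuration>"
}

# Usage:
#   printf 'spark.executor.memory=512m\n' | props_to_hive_site
```

Merge the printed property elements into your existing hive-site.xml instead of overwriting the file, since it usually holds other settings as well.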

After running a SQL statement, you can view the job and stage information on the monitoring page:

hive (default)> select city_id, count(*) c from city_info group by city_id order by c desc limit 5;
Query ID = spark_20150309173838_444cb5b1-b72e-4fc3-87db-4162e364cb1e
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
state = SENT
state = STARTED
state = STARTED
state = STARTED
state = STARTED
Query Hive on Spark job[0] stages:
1
Status: Running (Hive on Spark job[0])
Job Progress Format
CurrentTime StageId_StageAttemptId: SucceededTasksCount(+RunningTasksCount-FailedTasksCount)/TotalTasksCount [StageCost]
2015-03-09 17:38:11,822 Stage-0_0: 0(+1)/1      Stage-1_0: 0/1  Stage-2_0: 0/1
state = STARTED
state = STARTED
state = STARTED
2015-03-09 17:38:14,845 Stage-0_0: 0(+1)/1      Stage-1_0: 0/1  Stage-2_0: 0/1
state = STARTED
state = STARTED
2015-03-09 17:38:16,861 Stage-0_0: 1/1 Finished Stage-1_0: 0(+1)/1      Stage-2_0: 0/1
state = SUCCEEDED
2015-03-09 17:38:17,867 Stage-0_0: 1/1 Finished Stage-1_0: 1/1 Finished Stage-2_0: 1/1 Finished
Status: Finished successfully in 10.07 seconds
OK
city_id c
-1000   22826
-10     17294
-20     10608
-1      6186
        4158
Time taken: 18.417 seconds, Fetched: 5 row(s)

Thanks for reading! That wraps up this article on how to compile Hive on Spark. I hope it helps, and if you found it useful, feel free to share it so more people can see it.
