This article explains how Spark SQL accesses Hive and MySQL. The steps are simple, quick, and practical; follow along below.
1. Versions
This walkthrough assumes a working Hadoop, Hive, and Spark setup. The versions used here are Hadoop-2.6.4, Hive-2.0.0, and spark-1.6.1-bin-hadoop2.6.
2. Configure spark-env.sh
Add the following to SPARK_HOME/conf/spark-env.sh:
export SCALA_HOME=/mysoftware/scala-2.11.8
export JAVA_HOME=/mysoftware/jdk1.7.0_80
export SPARK_MASTER_IP=master
export SPARK_WORKER_MEMORY=512m
export master=spark://master:7077
In addition, many guides found online add the following two lines as well:
export CLASSPATH=$CLASSPATH:/mysoftware/spark-1.6.1/lib
export SPARK_CLASSPATH=$SPARK_CLASSPATH:/mysoftware/spark-1.6.1/lib/mysql-connector-java-5.1.5-bin.jar
They are not added to spark-env.sh here, because SPARK_CLASSPATH has been deprecated since Spark 1.0+; with it set, spark-shell prints the following message at startup:
SPARK_CLASSPATH was detected (set to ':/mysoftware/spark-1.6.1/lib/mysql-connector-java-5.1.5-bin.jar').
This is deprecated in Spark 1.0+.
Please instead use:
- ./spark-submit with --driver-class-path to augment the driver classpath
- spark.executor.extraClassPath to augment the executor classpath
3. Configure spark-defaults.conf
First copy SPARK_HOME/conf/spark-defaults.conf.template to spark-defaults.conf. As the file itself notes, every setting has a sensible default, so nothing is changed here; adjust it to match your own environment if needed.
Many guides online also add the following to this file:
spark.executor.extraClassPath /mysoftware/spark-1.6.1/lib/mysql-connector-java-5.1.5-bin.jar
spark.driver.extraClassPath /mysoftware/spark-1.6.1/lib/mysql-connector-java-5.1.5-bin.jar
Doing so causes WARN messages ("Setting ..." lines) when spark-shell starts, triggered by those two settings.
4. Add the MySQL driver JAR
Copy mysql-connector-java-5.1.5-bin.jar into the SPARK_HOME/lib/ directory.
5. Add files under SPARK_HOME/conf
Copy hive-site.xml, core-site.xml (for security configuration), and hdfs-site.xml (for HDFS configuration) into the SPARK_HOME/conf directory.
The official documentation describes this as follows:
Configuration of Hive is done by placing your hive-site.xml, core-site.xml (for security configuration), and hdfs-site.xml (for HDFS configuration) file in conf/. Please note when running the query on a YARN cluster (cluster mode), the datanucleus jars under the lib directory and hive-site.xml under the conf/ directory need to be available on the driver and all executors launched by the YARN cluster. The convenient way to do this is adding them through the --jars option and --file option of the spark-submit command.
6. Accessing Hive from Spark SQL
6.1 Option 1: start spark-shell with --driver-class-path:
bin/spark-shell --driver-class-path /mysoftware/spark-1.6.1/lib/mysql-connector-java-5.1.5-bin.jar
hadoop@master:/mysoftware/spark-1.6.1$ bin/spark-shell --driver-class-path /mysoftware/spark-1.6.1/lib/mysql-connector-java-5.1.5-bin.jar
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's repl log4j profile: org/apache/spark/log4j-defaults-repl.properties
To adjust logging level use sc.setLogLevel("INFO")
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.1
      /_/

Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_80)
Type in expressions to have them evaluated.
Type :help for more information.
Spark context available as sc.
16/06/06 18:56:11 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/06/06 18:56:12 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/06/06 18:56:20 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
16/06/06 18:56:20 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
16/06/06 18:56:28 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/06/06 18:56:28 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/06/06 18:56:31 ERROR ObjectStore: Version information found in metastore differs 2.0.0 from expected schema version 1.2.0. Schema verififcation is disabled hive.metastore.schema.verification so setting version.
SQL context available as sqlContext.

scala>
Now run the following commands:
scala> sc
res0: org.apache.spark.SparkContext = org.apache.spark.SparkContext@631a8160
scala> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
sqlContext: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@3a957b9e
scala> sqlContext.sql("CREATE TABLE IF NOT EXISTS sparkhive (key INT, value STRING)")
res1: org.apache.spark.sql.DataFrame = [result: string]
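Before switching over to the Hive CLI, you can also confirm from the same spark-shell session that the table exists; a minimal sketch (SHOW TABLES is plain HiveQL run through the same HiveContext):
scala> sqlContext.sql("SHOW TABLES").show()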
After the third command above runs, the newly created table sparkhive is visible from Hive:
hive> show tables;
OK
hbase_person
hivehbase
hivehbase_person
hivehbase_student
multiplehive
sparkhive
testhive
testsparkhive
Time taken: 1.154 seconds, Fetched: 8 row(s)
Next, load data into the table and query it:
scala> sqlContext.sql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO TABLE sparkhive");
scala> sqlContext.sql("FROM sparkhive SELECT key, value").collect()
The file examples/src/main/resources/kv1.txt ships with the Spark distribution.
scala> sqlContext.sql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO TABLE sparkhive"); res2: org.apache.spark.sql.DataFrame = [result: string] scala> sqlContext.sql("FROM sparkhive SELECT key, value").collect() res3: Array[org.apache.spark.sql.Row] = Array([238,val_238], [86,val_86], [311,val_311], [27,val_27], [165,val_165], [409,val_409], [255,val_255], [278,val_278], [98,val_98], [484,val_484], [265,val_265], [193,val_193], [401,val_401], [150,val_150], [273,val_273], [224,val_224], [369,val_369], [66,val_66], [128,val_128], [213,val_213], [146,val_146], [406,val_406], [429,val_429], [374,val_374], [152,val_152], [469,val_469], [145,val_145], [495,val_495], [37,val_37], [327,val_327], [281,val_281], [277,val_277], [209,val_209], [15,val_15], [82,val_82], [403,val_403], [166,val_166], [417,val_417], [430,val_430], [252,val_252], [292,val_292], [219,val_219], [287,val_287], [153,val_153], [193,val_193], [338,val_338], [446,val_446], [459,val_459], [394,val_394], [237,val_237], [482,val_482], ... scala>
The data can also be viewed from Hive with select * from sparkhive.
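Any other HiveQL supported by HiveContext can be run the same way; as a quick sanity check, a minimal sketch of an aggregate query (the exact count depends on the data that was loaded):
scala> sqlContext.sql("SELECT count(*) FROM sparkhive").collect()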
6.2 Option 2: start spark-shell with SPARK_CLASSPATH:
SPARK_CLASSPATH=$SPARK_CLASSPATH:/mysoftware/spark-1.6.1/lib/mysql-connector-java-5.1.5-bin.jar bin/spark-shell
It works, but prints a number of WARN messages, so option 1 is recommended:
hadoop@master:/mysoftware/spark-1.6.1$ SPARK_CLASSPATH=$SPARK_CLASSPATH:/mysoftware/spark-1.6.1/lib/mysql-connector-java-5.1.5-bin.jar bin/spark-shell
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's repl log4j profile: org/apache/spark/log4j-defaults-repl.properties
To adjust logging level use sc.setLogLevel("INFO")
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.1
      /_/

Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_80)
Type in expressions to have them evaluated.
Type :help for more information.
16/06/06 19:14:10 WARN SparkConf: SPARK_CLASSPATH was detected (set to ':/mysoftware/spark-1.6.1/lib/mysql-connector-java-5.1.5-bin.jar').
This is deprecated in Spark 1.0+.

Please instead use:
 - ./spark-submit with --driver-class-path to augment the driver classpath
 - spark.executor.extraClassPath to augment the executor classpath

16/06/06 19:14:10 WARN SparkConf: Setting 'spark.executor.extraClassPath' to ':/mysoftware/spark-1.6.1/lib/mysql-connector-java-5.1.5-bin.jar' as a work-around.
16/06/06 19:14:10 WARN SparkConf: Setting 'spark.driver.extraClassPath' to ':/mysoftware/spark-1.6.1/lib/mysql-connector-java-5.1.5-bin.jar' as a work-around.
Spark context available as sc.
16/06/06 19:14:26 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/06/06 19:14:27 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/06/06 19:14:35 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
16/06/06 19:14:35 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
16/06/06 19:14:39 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/06/06 19:14:39 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
SQL context available as sqlContext.

scala>
7. Accessing MySQL from Spark SQL
Start spark-shell in the same way as option 1 above.
Pay attention to how the following parameters are written:
"url" -> "jdbc:mysql://192.168.226.129:3306/hive?user=hive&password=xujun",
-- (the JDBC URL of the remote MySQL server, including the user name and password, connecting to the hive database)
"dbtable" -> "hive.TBLS",  (TBLS is a table that already exists in the hive metastore database)
"driver" -> "com.mysql.jdbc.Driver"  (the JDBC driver class)
7.1 Option 1: load the data with sqlContext.read.format("jdbc").options(...) (this goes through an intermediate DataFrameReader object; see the API docs for details):
val jdbcDF = sqlContext.read.format("jdbc").options( Map("url" -> "jdbc:mysql://192.168.226.129:3306/hive?user=hive&password=xujun", "dbtable" -> "hive.TBLS","driver" -> "com.mysql.jdbc.Driver")).load()
The full output:
scala> val jdbcDF = sqlContext.read.format("jdbc").options( Map("url" -> "jdbc:mysql://192.168.226.129:3306/hive?user=hive&password=xujun", "dbtable" -> "hive.TBLS","driver" -> "com.mysql.jdbc.Driver")).load()
jdbcDF: org.apache.spark.sql.DataFrame = [TBL_ID: bigint, CREATE_TIME: int, DB_ID: bigint, LAST_ACCESS_TIME: int, OWNER: string, RETENTION: int, SD_ID: bigint, TBL_NAME: string, TBL_TYPE: string, VIEW_EXPANDED_TEXT: string, VIEW_ORIGINAL_TEXT: string]

scala> jdbcDF.show()
+------+-----------+-----+----------------+------+---------+-----+-----------------+--------------+------------------+------------------+
|TBL_ID|CREATE_TIME|DB_ID|LAST_ACCESS_TIME| OWNER|RETENTION|SD_ID|         TBL_NAME|      TBL_TYPE|VIEW_EXPANDED_TEXT|VIEW_ORIGINAL_TEXT|
+------+-----------+-----+----------------+------+---------+-----+-----------------+--------------+------------------+------------------+
|    11| 1464510462|    1|               0|  hive|        0|   11|         testhive| MANAGED_TABLE|              null|              null|
|    22| 1464513715|    1|               0|hadoop|        0|   22|        hivehbase| MANAGED_TABLE|              null|              null|
|    23| 1464517000|    1|               0|hadoop|        0|   23|     hbase_person|EXTERNAL_TABLE|              null|              null|
|    24| 1464517563|    1|               0|hadoop|        0|   24|hivehbase_student|EXTERNAL_TABLE|              null|              null|
|    29| 1464521014|    1|               0|hadoop|        0|   29|     multiplehive| MANAGED_TABLE|              null|              null|
|    36| 1464522011|    1|               0|hadoop|        0|   36| hivehbase_person| MANAGED_TABLE|              null|              null|
|    41| 1465227955|    1|               0|hadoop|        0|   41|    testsparkhive| MANAGED_TABLE|              null|              null|
|    46| 1465264720|    1|               0|hadoop|        0|   46|        sparkhive| MANAGED_TABLE|              null|              null|
+------+-----------+-----+----------------+------+---------+-----+-----------------+--------------+------------------+------------------+
7.2 Option 2: load the data with sqlContext.load("jdbc", ...):
val jdbcDF = sqlContext.load( "jdbc",Map("url" -> "jdbc:mysql://192.168.226.129:3306/hive?user=hive&password=xujun", "dbtable" -> "hive.TBLS","driver" -> "com.mysql.jdbc.Driver"))
Show the data:
jdbcDF.show()
The output is the same as with option 1 above.
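Once the MySQL table is loaded into a DataFrame, it can be queried with SQL like any other table. A minimal sketch, assuming the TBLS columns shown above (registerTempTable is the Spark 1.6 API, and the temporary table name mysql_tbls is just an example):
scala> jdbcDF.registerTempTable("mysql_tbls")
scala> sqlContext.sql("SELECT TBL_NAME, TBL_TYPE FROM mysql_tbls WHERE TBL_TYPE = 'MANAGED_TABLE'").show()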
That concludes this look at how Spark SQL accesses Hive and MySQL; the best way to cement it is to try the steps yourself. For more related content, visit the relevant channels on the Yisu Cloud site and keep learning!