
<noframes id="l7mb8">


Dependency management with spark-submit

Published: 2020-07-31 05:25:52 | Source: Internet | Views: 5045 | Author: moviebat | Category: Big Data


When you use spark-submit, the application jar and any jars listed with the --jars option are automatically transferred to the cluster.

spark-submit --class --master --jars
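To make the shape of that command concrete, here is a minimal sketch that assembles the argument list in Scala. The `Submission` case class, the `build` helper and all example values are hypothetical illustrations and not part of Spark; the only facts assumed are that `--jars` takes one comma-separated value and that the application jar follows the options.

```scala
object SubmitCommand {
  // Hypothetical description of one submission; these names do not come from Spark.
  final case class Submission(
      mainClass: String,
      master: String,
      appJar: String,
      extraJars: Seq[String] = Nil,
      appArgs: Seq[String] = Nil)

  // Assemble the argument vector; --jars takes a single comma-separated value.
  def build(s: Submission): Seq[String] = {
    val jarsOpt =
      if (s.extraJars.isEmpty) Seq.empty[String]
      else Seq("--jars", s.extraJars.mkString(","))
    Seq("spark-submit", "--class", s.mainClass, "--master", s.master) ++
      jarsOpt ++ Seq(s.appJar) ++ s.appArgs
  }
}
```

Calling `SubmitCommand.build` with two extra jars produces a single `--jars a.jar,b.jar` pair rather than one flag per jar.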

Spark uses the following URL schemes to allow different strategies for distributing jars.

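1. file: scheme

Absolute paths and file:/ URIs are served by the driver's HTTP file server, and every executor pulls the file from the driver's HTTP server.

2. hdfs: and other remote schemes

hdfs:, http:, https:, ftp: files and JARs are pulled down from the given URI.

3. local: scheme

A URI starting with local:/ is expected to exist as a local file on every worker node. This means no network I/O is incurred, and it works well for large files/JARs that are pushed to each worker or shared via NFS, GlusterFS, and so on.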
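The three scheme families above can be summarized as a small classifier. This is only an illustrative sketch using `java.net.URI`; the `Strategy` type and `classify` function are hypothetical names, and treating a path with no scheme as `file` is an assumption made for the example.

```scala
import java.net.URI

object JarDistribution {
  sealed trait Strategy
  case object ServedByDriver extends Strategy // file:/ URIs and bare absolute paths
  case object FetchedFromUri extends Strategy // hdfs:, http:, https:, ftp:
  case object LocalOnWorker extends Strategy  // local:/ URIs, no network I/O
  case object Unknown extends Strategy

  // Map a jar location to the distribution strategy described in the text.
  def classify(path: String): Strategy = {
    // Assumption for this sketch: a path without a scheme is treated as file:.
    val scheme = Option(new URI(path).getScheme).getOrElse("file")
    scheme match {
      case "file"                            => ServedByDriver
      case "hdfs" | "http" | "https" | "ftp" => FetchedFromUri
      case "local"                           => LocalOnWorker
      case _                                 => Unknown
    }
  }
}
```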

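Note: the JARs and files of each SparkContext are copied to the working directory on the executor nodes. Over time this can use up a significant amount of space, and it has to be cleaned up.

On YARN, cleanup is handled automatically. On Spark Standalone, automatic cleanup can be configured with the spark.worker.cleanup.appDataTtl property (it takes effect when spark.worker.cleanup.enabled is turned on). Its default value is 7*24*3600 seconds, that is, 7 days.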
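As a quick check of the arithmetic behind that default, the sketch below computes the TTL and applies it to a timestamp. `isExpired` is a hypothetical helper written for illustration; it is not the worker's actual cleanup code, which this article does not show.

```scala
object WorkerCleanup {
  // Default TTL quoted in the text, in seconds: 7 days.
  val DefaultTtlSeconds: Long = 7L * 24 * 3600

  // Hypothetical helper: true if something last modified at `lastModifiedMs`
  // is older than the TTL when observed at `nowMs` (both in milliseconds).
  def isExpired(lastModifiedMs: Long, nowMs: Long, ttlSeconds: Long = DefaultTtlSeconds): Boolean =
    nowMs - lastModifiedMs > ttlSeconds * 1000L
}
```

`WorkerCleanup.DefaultTtlSeconds` evaluates to 604800.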

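Users can include any other dependencies by supplying a comma-delimited list of Maven coordinates with the --packages option.

Additional repositories (or resolvers in SBT) can be added with the --repositories option, also comma-delimited. These options can be used with pyspark, spark-shell, and spark-submit to include Spark packages.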
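Each entry passed to --packages is a Maven coordinate of the form groupId:artifactId:version. The sketch below splits such a list and validates each entry; the `Coordinate` type, the `parse` function and the example coordinates are hypothetical and only illustrate the format, not how Spark resolves packages.

```scala
object MavenCoordinates {
  final case class Coordinate(groupId: String, artifactId: String, version: String)

  // Parse a comma-delimited --packages value; return the first error found, if any.
  def parse(packages: String): Either[String, Seq[Coordinate]] = {
    val parsed: Seq[Either[String, Coordinate]] =
      packages.split(",").toSeq.map(_.trim).filter(_.nonEmpty).map { entry =>
        entry.split(":") match {
          case Array(g, a, v) if g.nonEmpty && a.nonEmpty && v.nonEmpty =>
            Right(Coordinate(g, a, v))
          case _ =>
            Left(s"Expected groupId:artifactId:version but got '$entry'")
        }
      }
    parsed.collectFirst { case Left(err) => err } match {
      case Some(err) => Left(err)
      case None      => Right(parsed.collect { case Right(c) => c })
    }
  }
}
```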

For Python, the --py-files option can be used to distribute .egg, .zip, and .py libraries to the executors.

Source code walkthrough:

1.

object SparkSubmit

2.

appArgs.action match {
  case SparkSubmitAction.SUBMIT => submit(appArgs)
  case SparkSubmitAction.KILL => kill(appArgs)
  case SparkSubmitAction.REQUEST_STATUS => requestStatus(appArgs)
}

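3.

private def submit(args: SparkSubmitArguments): Unit = {
  val (childArgs, childClasspath, sysProps, childMainClass) = prepareSubmitEnvironment(args)

  def doRunMain(): Unit = {
    if (args.proxyUser != null) {
      val proxyUser = UserGroupInformation.createProxyUser(args.proxyUser,
        UserGroupInformation.getCurrentUser())
      try {
        proxyUser.doAs(new PrivilegedExceptionAction[Unit]() {
          override def run(): Unit = {
            runMain(childArgs, childClasspath, sysProps, childMainClass, args.verbose)
          }
        })
        // ... (excerpt ends here)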

4.

for (jar <- childClasspath) {
  addJarToClasspath(jar, loader)
}

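5.

private def addJarToClasspath(localJar: String, loader: MutableURLClassLoader) {
  val uri = Utils.resolveURI(localJar)
  uri.getScheme match {
    // Only jars already present on the local filesystem are added to the class loader
    case "file" | "local" =>
      val file = new File(uri.getPath)
      if (file.exists()) {
        loader.addURL(file.toURI.toURL)
      } else {
        printWarning(s"Local jar $file does not exist, skipping.")
      }
    // Remote jars are skipped on this path
    case _ =>
      printWarning(s"Skip remote jar $uri.")
  }
}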
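The same idea can be tried outside Spark with only the JDK. This is a self-contained sketch, not Spark's code: `MutableLoader` and `addLocalJar` are hypothetical names, and the function returns a Boolean instead of printing warnings so the outcome is easy to check.

```scala
import java.io.File
import java.net.{URI, URL, URLClassLoader}

object JarClasspath {
  // URLClassLoader.addURL is protected, so a subclass is needed to expose it.
  class MutableLoader(parent: ClassLoader) extends URLClassLoader(Array.empty[URL], parent) {
    override def addURL(url: URL): Unit = super.addURL(url)
  }

  // Returns true if the jar was added to the loader, false if it was skipped.
  def addLocalJar(jar: String, loader: MutableLoader): Boolean = {
    val uri = new URI(jar)
    // Assumption for this sketch: a path without a scheme is treated as file:.
    Option(uri.getScheme).getOrElse("file") match {
      case "file" | "local" =>
        val file = new File(uri.getPath)
        if (file.exists()) { loader.addURL(file.toURI.toURL); true }
        else false // missing local jar: skip
      case _ =>
        false // remote jar: not added on this path
    }
  }
}
```

A jar that exists on disk ends up in `loader.getURLs`, while an hdfs:// location or a missing file leaves the loader unchanged.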

The trail ends here: from this point on it is ordinary Java class loading from the jars.

6. Who does the fetching? The executor.

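private def updateDependencies(newFiles: HashMap[String, Long], newJars: HashMap[String, Long]) {
  lazy val hadoopConf = SparkHadoopUtil.get.newConfiguration(conf)
  synchronized {
    // Fetch files that are missing or newer than the copy already held
    for ((name, timestamp) <- newFiles if currentFiles.getOrElse(name, -1L) < timestamp) {
      logInfo("Fetching " + name + " with timestamp " + timestamp)
      Utils.fetchFile(name, new File(SparkFiles.getRootDirectory()), conf,
        env.securityManager, hadoopConf, timestamp, useCache = !isLocal)
      currentFiles(name) = timestamp
    }
    for ((name, timestamp) <- newJars) {
      val localName = name.split("/").last
      val currentTimeStamp = currentJars.get(name)
        .orElse(currentJars.get(localName))
        .getOrElse(-1L)
      if (currentTimeStamp < timestamp) {
        logInfo("Fetching " + name + " with timestamp " + timestamp)
        Utils.fetchFile(name, new File(SparkFiles.getRootDirectory()), conf,
          env.securityManager, hadoopConf, timestamp, useCache = !isLocal)
        currentJars(name) = timestamp
        // Add the fetched jar to the executor's class loader
        val url = new File(SparkFiles.getRootDirectory(), localName).toURI.toURL
        if (!urlClassLoader.getURLs().contains(url)) {
          logInfo("Adding " + url + " to class loader")
          urlClassLoader.addURL(url)
        }
      }
    }
  }
}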
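The core of this method is a timestamp comparison: a dependency is fetched only when the incoming timestamp is newer than the one already recorded. A minimal stand-alone sketch of that rule follows; `Tracker` and its `fetch` callback are hypothetical stand-ins for the executor state and the real download step.

```scala
import scala.collection.mutable

object DependencyTracker {
  // Hypothetical tracker: `fetch` stands in for the real download step.
  class Tracker(fetch: String => Unit) {
    private val current = mutable.HashMap.empty[String, Long]

    // Fetch every entry whose timestamp is newer than the recorded one
    // (-1 when never seen) and return the names fetched, sorted by name.
    def update(incoming: Map[String, Long]): Seq[String] = synchronized {
      val stale = incoming.toSeq
        .filter { case (name, ts) => current.getOrElse(name, -1L) < ts }
        .sortBy(_._1)
      stale.foreach { case (name, ts) =>
        fetch(name)
        current(name) = ts
      }
      stale.map(_._1)
    }
  }
}
```

Submitting the same name with an unchanged timestamp is a no-op, which is why repeated tasks do not re-download their jars in this model.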

Stepping into the Utils.fetchFile method:

 /**
* Download a file or directory to target directory. Supports fetching the file in a variety of
* ways, including HTTP, Hadoop-compatible filesystems, and files on a standard filesystem, based
* on the URL parameter. Fetching directories is only supported from Hadoop-compatible
* filesystems.
*
* If `useCache` is true, first attempts to fetch the file to a local cache that's shared
* across executors running the same application. `useCache` is used mainly for
* the executors, and not in local mode.
*
* Throws SparkException if the target file already exists and has different contents than
* the requested file.
*/

if (!cachedFile.exists()) {
  doFetchFile(url, localDir, cachedFileName, conf, securityMgr, hadoopConf)
}
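The check above is a plain fetch-if-absent cache. Here is a minimal sketch of the pattern using only the JDK; `fetchIfMissing` and the `download` callback are hypothetical, and any locking or cache-file naming done by the real method is left out because the excerpt does not show it.

```scala
import java.io.File

object CachedFetch {
  // `download` is a hypothetical stand-in for the real fetch; it must create the target file.
  // Returns true if a download happened, false if the cached copy was reused.
  def fetchIfMissing(cacheDir: File, cachedName: String)(download: File => Unit): Boolean = {
    val cached = new File(cacheDir, cachedName)
    if (!cached.exists()) {
      download(cached)
      true
    } else {
      false
    }
  }
}
```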

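As you can see, it supports local files, Hadoop's HDFS, and files served over HTTP.

Fetching directories is currently supported only from HDFS (Hadoop-compatible filesystems).

That's all!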
