How to Configure TensorFlow Serving on Kubernetes

Published: 2021-12-20 10:22:57 | Source: Yisu Cloud | Views: 148 | Author: iii | Column: Cloud Computing

This article covers how to configure TensorFlow Serving on Kubernetes. Many people run into the problems described here in practice, so the sections below walk through how to handle them.

About TensorFlow Serving

[Figure: TensorFlow Serving architecture diagram]

For the basic concepts behind TensorFlow Serving, read the official documentation; no translation is as good as the original.

Here is a summary of the points I consider most important:

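TensorFlow Serving uses the Model Version Policy to serve multiple versions of multiple models at the same time;
By default, only the latest version of a model is loaded;
Supports automatic model discovery and loading based on the file system;
Low request-processing latency;
Stateless, so it scales horizontally;
Different model versions can be A/B tested;
Can scan and load TensorFlow models from the local file system;
Can scan and load TensorFlow models from HDFS;
Provides a gRPC interface for clients to call.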

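TensorFlow Serving Configuration

Even after going through the entire official TensorFlow Serving documentation, I could not find a complete example of how to write a model config, which was frustrating. The project moves so fast that it is normal for the documentation to lag behind, so the only option was to read the code.

In the main function of model_servers, the full list of tensorflow_model_server options and their descriptions is as follows: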

tensorflow_serving/model_servers/main.cc#L314

int main(int argc, char** argv) {
...
    std::vector<tensorflow::Flag> flag_list = {
        tensorflow::Flag("port", &port, "port to listen on"),
        tensorflow::Flag("enable_batching", &enable_batching, "enable batching"),
        tensorflow::Flag("batching_parameters_file", &batching_parameters_file,
                       "If non-empty, read an ascii BatchingParameters "
                       "protobuf from the supplied file name and use the "
                       "contained values instead of the defaults."),
        tensorflow::Flag("model_config_file", &model_config_file,
                       "If non-empty, read an ascii ModelServerConfig "
                       "protobuf from the supplied file name, and serve the "
                       "models in that file. This config file can be used to "
                       "specify multiple models to serve and other advanced "
                       "parameters including non-default version policy. (If "
                       "used, --model_name, --model_base_path are ignored.)"),
        tensorflow::Flag("model_name", &model_name,
                       "name of model (ignored "
                       "if --model_config_file flag is set"),
        tensorflow::Flag("model_base_path", &model_base_path,
                       "path to export (ignored if --model_config_file flag "
                       "is set, otherwise required)"),
        tensorflow::Flag("file_system_poll_wait_seconds",
                       &file_system_poll_wait_seconds,
                       "interval in seconds between each poll of the file "
                       "system for new model version"),
        tensorflow::Flag("tensorflow_session_parallelism",
                       &tensorflow_session_parallelism,
                       "Number of threads to use for running a "
                       "Tensorflow session. Auto-configured by default."
                       "Note that this option is ignored if "
                       "--platform_config_file is non-empty."),
        tensorflow::Flag("platform_config_file", &platform_config_file,
                       "If non-empty, read an ascii PlatformConfigMap protobuf "
                       "from the supplied file name, and use that platform "
                       "config instead of the Tensorflow platform. (If used, "
                       "--enable_batching is ignored.)")};
...
}

So everything related to model version configuration goes into the file passed via --model_config_file. The full structure of a model config is:

tensorflow_serving/config/model_server_config.proto#L55

// Common configuration for loading a model being served.
message ModelConfig {
  // Name of the model.
  string name = 1;

  // Base path to the model, excluding the version directory.
  // E.g. for a model at /foo/bar/my_model/123, where 123 is the version, the
  // base path is /foo/bar/my_model.
  //
  // (This can be changed once a model is in serving, *if* the underlying data
  // remains the same. Otherwise there are no guarantees about whether the old
  // or new data will be used for model versions currently loaded.)
  string base_path = 2;

  // Type of model.
  // TODO(b/31336131): DEPRECATED. Please use 'model_platform' instead.
  ModelType model_type = 3 [deprecated = true];

  // Type of model (e.g. "tensorflow").
  //
  // (This cannot be changed once a model is in serving.)
  string model_platform = 4;

  reserved 5;

  // Version policy for the model indicating how many versions of the model to
  // be served at the same time.
  // The default option is to serve only the latest version of the model.
  //
  // (This can be changed once a model is in serving.)
  FileSystemStoragePathSourceConfig.ServableVersionPolicy model_version_policy =
      7;

  // Configures logging requests and responses, to the model.
  //
  // (This can be changed once a model is in serving.)
  LoggingConfig logging_config = 6;
}

There it is: model_version_policy is the setting we were looking for. It is defined as follows:

tensorflow_serving/sources/storage_path/file_system_storage_path_source.proto

message ServableVersionPolicy {
    // Serve the latest versions (i.e. the ones with the highest version
    // numbers), among those found on disk.
    //
    // This is the default policy, with the default number of versions as 1.
    message Latest {
      // Number of latest versions to serve. (The default is 1.)
      uint32 num_versions = 1;
    }

    // Serve all versions found on disk.
    message All {
    }

    // Serve a specific version (or set of versions).
    //
    // This policy is useful for rolling back to a specific version, or for
    // canarying a specific version while still serving a separate stable
    // version.
    message Specific {
      // The version numbers to serve.
      repeated int64 versions = 1;
    }
}

So model_version_policy currently supports three options:

all: {} loads every model version that is discovered;
latest: { num_versions: n } loads only the n most recent versions; this is the default;
specific: { versions: m } loads only the specified versions; it is typically used for testing.
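To make the three policies concrete, here is a small Python model of which version directories each one would pick. It is only an illustration of the semantics described in the proto comments, not TensorFlow Serving's own code; the function name `select_versions` and its parameters are made up for this sketch, and rejecting `num_versions < 1` is a choice of the sketch rather than documented server behaviour.

```python
from typing import Iterable, List, Optional


def select_versions(available: Iterable[int],
                    policy: str = "latest",
                    num_versions: int = 1,
                    specific: Optional[Iterable[int]] = None) -> List[int]:
    """Return the versions that would be served under the given policy.

    `available` is the set of numeric version directories found on disk.
    """
    found = sorted(set(available))
    if policy == "all":
        # Serve every version found on disk.
        return found
    if policy == "latest":
        # Serve the versions with the highest numbers.
        if num_versions < 1:
            raise ValueError("num_versions must be at least 1 in this sketch")
        return found[-num_versions:]
    if policy == "specific":
        # Serve only versions that are both requested and present on disk.
        wanted = set(specific or [])
        return [v for v in found if v in wanted]
    raise ValueError(f"unknown policy: {policy}")


on_disk = [1, 2, 3, 5]
print(select_versions(on_disk))                              # [5]
print(select_versions(on_disk, "latest", num_versions=2))    # [3, 5]
print(select_versions(on_disk, "all"))                       # [1, 2, 3, 5]
print(select_versions(on_disk, "specific", specific=[2, 4])) # [2]
```

Note how `specific` with a version that does not exist on disk (4 here) simply yields nothing for that version, which is one reason to double-check version numbers when pinning for a rollback or canary.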

When starting the server with tensorflow_model_server --port=9000 --model_config_file=<file>, a complete model_config_file looks like this:

model_config_list: {
	config: {
		name: "mnist",
		base_path: "/tmp/monitored/mnist_model",
		model_platform: "tensorflow",
		model_version_policy: {
		   all: {}
		}
	},
	config: {
		name: "inception",
		base_path: "/tmp/monitored/inception_model",
		model_platform: "tensorflow",
		model_version_policy: {
		   latest: {
		   	num_versions: 2
		   }
		}
	},
	config: {
		name: "mxnet",
		base_path: "/tmp/monitored/mxnet_model",
		model_platform: "tensorflow",
		model_version_policy: {
		   specific: {
		   	versions: 1
		   }
		}
	}
}
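If the list of models changes often, the config file can be generated instead of written by hand. The following is a minimal sketch that renders the same text format from plain Python dictionaries. The helper names (`render_policy`, `render_model_config`) and the dictionary layout are assumptions of this example; it does no escaping of quotes in names or paths, and it always writes model_platform as "tensorflow".

```python
from typing import Dict, List


def render_policy(policy: Dict) -> str:
    """Render one model_version_policy body, e.g. {"latest": 2}."""
    kind, value = next(iter(policy.items()))
    if kind == "all":
        return "all: {}"
    if kind == "latest":
        return f"latest: {{ num_versions: {int(value)} }}"
    if kind == "specific":
        # A repeated field is written once per value in protobuf text format.
        body = " ".join(f"versions: {int(v)}" for v in value)
        return f"specific: {{ {body} }}"
    raise ValueError(f"unknown policy: {kind}")


def render_model_config(models: List[Dict]) -> str:
    """Render a model_config_list in protobuf text format."""
    lines = ["model_config_list: {"]
    for m in models:
        lines += [
            "  config: {",
            f'    name: "{m["name"]}"',
            f'    base_path: "{m["base_path"]}"',
            '    model_platform: "tensorflow"',
            f'    model_version_policy: {{ {render_policy(m["policy"])} }}',
            "  }",
        ]
    lines.append("}")
    return "\n".join(lines)


models = [
    {"name": "inception",
     "base_path": "/tmp/monitored/inception_model",
     "policy": {"latest": 2}},
    {"name": "mxnet",
     "base_path": "/tmp/monitored/mxnet_model",
     "policy": {"specific": [1]}},
]
print(render_model_config(models))
```

The output can be written to a file and passed to --model_config_file. Commas between fields are optional in the text format, so the generated file omits them.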

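Building TensorFlow Serving

Building and installing TensorFlow Serving is already described fairly clearly in the setup documentation on GitHub. There is just one point I want to stress, and it is a very important one, mentioned in that document: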

Optimized build

It's possible to compile using some platform specific instruction sets (e.g. AVX) that can significantly improve performance. Wherever you see 'bazel build' in the documentation, you can add the flags -c opt --copt=-msse4.1 --copt=-msse4.2 --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-O3 (or some subset of these flags). For example:

bazel build -c opt --copt=-msse4.1 --copt=-msse4.2 --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-O3 tensorflow_serving/...
Note: These instruction sets are not available on all machines, especially with older processors, so it may not work with all flags. You can try some subset of them, or revert to just the basic '-c opt' which is guaranteed to work on all machines.

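This matters a lot. At first we compiled without the corresponding copt options, and tests showed that the resulting tensorflow_model_server performed poorly (at least not well enough for our requirements): under concurrent client requests, latency was high, with essentially every request taking more than 100 ms. After adding these copt options and running the same concurrency test against the same model, 99.987% of requests completed within 50 ms. The difference is stark.

As for whether to use --copt=-O2 or --copt=-O3 and what they mean, see the GCC documentation on optimization options; that is not discussed here (because I do not fully understand it either...).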

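So should you always compile with exactly the copt options given in the official documentation? No! It depends on the CPU of the server that will run TensorFlow Serving. The flags listed in /proc/cpuinfo tell you which copt options you can use.

[Figure: /proc/cpuinfo output showing the CPU flags]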
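The check against /proc/cpuinfo can be scripted. The sketch below maps CPU flag names to the copt options quoted from the setup documentation and assembles a bazel command line. It assumes an x86 Linux host, where the relevant line in /proc/cpuinfo is named "flags" and uses the spellings sse4_1, sse4_2, avx, avx2 and fma; the function names and the sample input are made up for this example.

```python
from typing import List

# /proc/cpuinfo flag name -> bazel copt option from the setup documentation.
CPU_FLAG_TO_COPT = {
    "sse4_1": "--copt=-msse4.1",
    "sse4_2": "--copt=-msse4.2",
    "avx": "--copt=-mavx",
    "avx2": "--copt=-mavx2",
    "fma": "--copt=-mfma",
}


def copts_from_cpuinfo(cpuinfo_text: str) -> List[str]:
    """Return the copt options supported by the first CPU listed."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags.update(value.split())
            break  # every core reports the same flags on typical hosts
    return [copt for flag, copt in CPU_FLAG_TO_COPT.items() if flag in flags]


def bazel_command(cpuinfo_text: str) -> str:
    """Build the optimized bazel invocation for this CPU."""
    parts = ["bazel build -c opt", *copts_from_cpuinfo(cpuinfo_text),
             "--copt=-O3", "tensorflow_serving/..."]
    return " ".join(parts)


# Hypothetical sample: a CPU with SSE4 and AVX but without AVX2 or FMA.
sample = "processor\t: 0\nflags\t\t: fpu vme sse sse2 ssse3 sse4_1 sse4_2 avx\n"
print(bazel_command(sample))
```

On the build host, pass the real file contents instead of the sample, for example `bazel_command(open("/proc/cpuinfo").read())`. Remember that the flags must match the machines that run the binary, not only the machine that builds it.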

Usage Notes

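Because TensorFlow Serving can serve multiple versions of multiple models at once, clients should specify the model and version explicitly in their gRPC calls wherever possible. Different versions correspond to different models, and the predictions may differ considerably.
When copying a trained model into the model base path, pack it into a tar archive first, copy the archive to the base path, and extract it there. Models are large and copying takes a while, so the exported model file may already be in place while the corresponding meta file is not. If TensorFlow Serving starts loading the model at that moment and cannot find the meta file, the server fails to load the model and stops trying to load that version again.
If you use protobuf version <= 3.2.0, note that TensorFlow Serving can only load models up to 64 MB in size. Run pip list | grep proto to check your protobuf version. My environment uses 3.5.0 post1 and does not have this problem, but keep it in mind. See issue 582 for details.
Officially, model_config_list can be changed dynamically through the gRPC interface, but in practice you have to develop a custom resource to do so, which means it does not work out of the box. Keep an eye on issue 380.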
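The partial-copy problem from the second note can also be avoided without a tar archive: copy the export into a staging directory outside the base path, then rename it into place so the version directory appears in one step. This is a different technique from the tar approach described above, and the sketch makes several assumptions: a local POSIX file system (not HDFS), staging and base path on the same file system so that the rename is atomic, and a hypothetical helper name `publish_model_version`.

```python
import os
import shutil
import tempfile


def publish_model_version(export_dir: str, base_path: str, version: int) -> str:
    """Copy export_dir to base_path/<version> so it appears all at once."""
    target = os.path.join(base_path, str(version))
    if os.path.exists(target):
        raise FileExistsError(target)
    os.makedirs(base_path, exist_ok=True)

    # Stage next to the base path (same file system), but outside it, so
    # the server never sees a half-copied version directory.
    parent = os.path.dirname(os.path.abspath(base_path))
    staging_root = tempfile.mkdtemp(prefix=".staging-", dir=parent)
    try:
        staged = os.path.join(staging_root, str(version))
        shutil.copytree(export_dir, staged)  # slow part happens here
        os.rename(staged, target)            # atomic on POSIX, same file system
    finally:
        shutil.rmtree(staging_root, ignore_errors=True)
    return target
```

A call such as `publish_model_version("/tmp/export/123", "/tmp/monitored/inception_model", 123)` (paths are illustrative) would make version 123 visible only after every file has been copied.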

TensorFlow Serving on Kubernetes

Deploy TensorFlow Serving to Kubernetes as a Deployment. The corresponding Deployment YAML is:

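# apps/v1 replaces extensions/v1beta1, which was removed in Kubernetes 1.16.
# apps/v1 requires an explicit selector that matches the pod template labels.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tensorflow-serving
spec:
  replicas: 1
  selector:
    matchLabels:
      app: "tensorflow-serving"
  template: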
    metadata:
      labels:
        app: "tensorflow-serving"
    spec:
      restartPolicy: Always
      imagePullSecrets:
      - name: harborsecret
      containers:
      - name: tensorflow-serving
        image: registry.vivo.xyz:4443/bigdata_release/tensorflow_serving1.3.0:v0.5
        command: ["/bin/sh", "-c","export CLASSPATH=.:/usr/lib/jvm/java-1.8.0/lib/tools.jar:$(/usr/lib/hadoop-2.6.1/bin/hadoop classpath --glob); /root/tensorflow_model_server --port=8900 --model_name=test_model --model_base_path=hdfs://xx.xx.xx.xx:zz/data/serving_model"]
        ports:
        - containerPort: 8900
