How Fluid's Custom Autoscaling Works

Published: 2021-12-01 17:10:24 | Source: Yisu Cloud | Views: 109 | Author: 柒染 | Column: Cloud Computing

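**Overview:** Autoscaling is one of the core capabilities of Kubernetes, but it has always centered on stateless application workloads. Fluid adds autoscaling for distributed caches, so a data cache can be grown and shrunk flexibly. Based on the Runtime, it exposes performance metrics such as cache capacity and the share of that capacity already in use, and combines them with its ability to scale Runtime resources to provide on-demand scaling of the data cache.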

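Background

As more and more data-intensive applications such as big data and AI workloads are deployed and run on Kubernetes, the mismatch between the design of data-intensive computing frameworks and the flexible application orchestration of cloud native environments has created data access and computation bottlenecks. Fluid, a cloud native data orchestration engine, uses the Dataset abstraction, distributed caching, and scheduler integration to accelerate data access for applications.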

Cache autoscaling matters a great deal for big data applications in internet-scale scenarios, because most of them are implemented as end-to-end pipelines. Such a pipeline consists of the following stages:

Data extraction: preprocess the raw data with big data technologies such as Spark and MapReduce.
Model training: train machine learning models on the feature data produced in the first stage and generate the resulting models.
Model evaluation: evaluate and test the models produced in the second stage on a test or validation set.
Model inference: push the models validated in the third stage online to serve inference for the business.

An end-to-end pipeline therefore contains several different kinds of computing tasks, and in practice each task is handled by a specialized system (TensorFlow, PyTorch, Spark, Presto). These systems are independent of one another, so an external file system is usually needed to pass data from one stage to the next. Exchanging data through the file system this often incurs heavy I/O overhead, which frequently becomes the bottleneck of the whole workflow.

Fluid fits this scenario well. A user can create a Dataset object that spreads cached data across the Kubernetes compute nodes and serves as the medium for data exchange. This avoids remote writes and reads and makes data use more efficient. The difficulty lies in estimating and reserving resources for the temporary data cache. Before the data is produced and consumed, its volume is hard to estimate precisely: an estimate that is too high wastes reserved resources, and one that is too low makes write failures more likely. Scaling on demand is friendlier to users. The goal is an experience similar to the page cache: the layer is transparent to the end user, yet the acceleration it delivers is real.

We introduced cache autoscaling in Fluid through a custom HPA mechanism. Scale-out is triggered when the amount of cached data reaches a given share of the cache capacity. For example, with the trigger set to a cache usage above 75% and a total cache capacity of 10G, usage crosses the threshold at 7.5G, so by the time the cached data has grown to 8G (80% of capacity) the scale-out mechanism has been triggered.
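The trigger condition described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Fluid or HPA code: the function name `should_scale_up` and its parameters are hypothetical, and the real HPA controller also applies a tolerance and rate limits that are ignored here.

```python
def should_scale_up(used_g: float, total_g: float, threshold_pct: float) -> bool:
    """Return True when the used share of the cache exceeds the threshold."""
    if total_g <= 0:
        raise ValueError("total cache capacity must be positive")
    used_pct = used_g * 100 / total_g
    return used_pct > threshold_pct


# The example from the text: 8G cached out of 10G with a 75% trigger.
print(should_scale_up(8, 10, 75))  # True, because 80% > 75%
print(should_scale_up(7, 10, 75))  # False, because 70% <= 75%
```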

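The following example walks through Fluid's autoscaling capability.

Prerequisites

Kubernetes 1.18 or later is recommended. Before 1.18, HPA scaling policies could not be customized; they were hard-coded. From 1.18 on, users can define their own scaling policies, for example the cooldown period after a scale-out.

Steps

1. Install the jq tool to make JSON easier to parse.

This example uses CentOS as the operating system, so jq can be installed with yum.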

yum install -y jq

2. Download and install the latest version of Fluid.

git clone https://github.com/fluid-cloudnative/fluid.git
cd fluid/charts
kubectl create ns fluid-system
helm install fluid fluid

3. Deploy or configure Prometheus.

Prometheus collects the metrics exposed by the AlluxioRuntime cache engine. If the cluster has no Prometheus:

$ cd fluid
$ kubectl apply -f integration/prometheus/prometheus.yaml

If the cluster already has Prometheus, add the following configuration to the Prometheus configuration file:

scrape_configs:
  - job_name: 'alluxio runtime'
    metrics_path: /metrics/prometheus
    kubernetes_sd_configs:
      - role: endpoints
    relabel_configs:
    - source_labels: [__meta_kubernetes_service_label_monitor]
      regex: alluxio_runtime_metrics
      action: keep
    - source_labels: [__meta_kubernetes_endpoint_port_name]
      regex: web
      action: keep
    - source_labels: [__meta_kubernetes_namespace]
      target_label: namespace
      replacement: $1
      action: replace
    - source_labels: [__meta_kubernetes_service_label_release]
      target_label: fluid_runtime
      replacement: $1
      action: replace
    - source_labels: [__meta_kubernetes_endpoint_address_target_name]
      target_label: pod
      replacement: $1
      action: replace
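To make the effect of the relabel rules above easier to follow, here is a rough Python simulation of what they do to one discovered endpoint. It is a sketch, not Prometheus code: the function `relabel` and the sample label values are hypothetical, and it relies only on two documented behaviors, namely that relabel regexes are fully anchored and that `replace` with the default regex copies the source value into `target_label`.

```python
import re
from typing import Optional

# (source meta label, regex) pairs taken from the two "keep" rules.
KEEP_RULES = [
    ("__meta_kubernetes_service_label_monitor", "alluxio_runtime_metrics"),
    ("__meta_kubernetes_endpoint_port_name", "web"),
]

# (source meta label, target label) pairs taken from the three "replace" rules.
REPLACE_RULES = [
    ("__meta_kubernetes_namespace", "namespace"),
    ("__meta_kubernetes_service_label_release", "fluid_runtime"),
    ("__meta_kubernetes_endpoint_address_target_name", "pod"),
]


def relabel(discovered: dict) -> Optional[dict]:
    """Return the labels attached to a kept target, or None if it is dropped."""
    for source, pattern in KEEP_RULES:
        # Prometheus anchors relabel regexes, hence fullmatch.
        if re.fullmatch(pattern, discovered.get(source, "")) is None:
            return None
    labels = {}
    for source, target in REPLACE_RULES:
        value = discovered.get(source, "")
        if value:  # an empty replacement leaves the target label unset
            labels[target] = value
    return labels


# Hypothetical endpoint as service discovery might report it.
endpoint = {
    "__meta_kubernetes_service_label_monitor": "alluxio_runtime_metrics",
    "__meta_kubernetes_endpoint_port_name": "web",
    "__meta_kubernetes_namespace": "default",
    "__meta_kubernetes_service_label_release": "spark",
    "__meta_kubernetes_endpoint_address_target_name": "spark-master-0",
}
print(relabel(endpoint))
```

The resulting `namespace`, `fluid_runtime`, and `pod` labels are the ones the adapter rule in step 6 later maps to Kubernetes resources.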

4. Verify that Prometheus was installed successfully.

$ kubectl get ep -n kube-system  prometheus-svc
NAME             ENDPOINTS        AGE
prometheus-svc   10.76.0.2:9090   6m49s
$ kubectl get svc -n kube-system prometheus-svc
NAME             TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
prometheus-svc   NodePort   172.16.135.24   <none>        9090:32114/TCP   2m7s

To visualize the monitoring metrics, you can install Grafana and verify the monitoring data; see the documentation for details.

5. Deploy the metrics server.

Check whether the cluster includes metrics-server. If kubectl top node returns output showing memory and CPU usage, the metrics server is configured correctly.

kubectl top node
NAME                       CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
192.168.1.204   93m          2%     1455Mi          10%
192.168.1.205   125m         3%     1925Mi          13%
192.168.1.206   96m          2%     1689Mi          11%

Otherwise, run the following command manually:

kubectl create -f integration/metrics-server

6. Deploy the custom-metrics-api component.

Scaling on custom metrics requires two components:

The first collects metrics from the application and stores them in the Prometheus time series database.
The second, k8s-prometheus-adapter, uses the collected metrics to extend the Kubernetes custom metrics API.

The first component was deployed in step 3; the second is deployed below.

If custom-metrics-api is already configured, add the dataset-related rule to the adapter's ConfigMap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: adapter-config
  namespace: monitoring
data:
  config.yaml: |
    rules:
    - seriesQuery: '{__name__=~"Cluster_(CapacityTotal|CapacityUsed)",fluid_runtime!="",instance!="",job="alluxio runtime",namespace!="",pod!=""}'
      seriesFilters:
      - is: ^Cluster_(CapacityTotal|CapacityUsed)$
      resources:
        overrides:
          namespace:
            resource: namespace
          pod:
            resource: pods
          fluid_runtime:
            resource: datasets
      name:
        matches: "^(.*)"
        as: "capacity_used_rate"
      metricsQuery: ceil(Cluster_CapacityUsed{<<.LabelMatchers>>}*100/(Cluster_CapacityTotal{<<.LabelMatchers>>}))

Otherwise, run the following commands manually:

kubectl create -f integration/custom-metrics-api/namespace.yaml
kubectl create -f integration/custom-metrics-api

Note: custom-metrics-api connects to the Prometheus instance in the cluster, so replace the Prometheus URL with the address of the Prometheus you actually use.

Check the custom metrics:

$ kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "custom.metrics.k8s.io/v1beta1",
  "resources": [
    {
      "name": "pods/capacity_used_rate",
      "singularName": "",
      "namespaced": true,
      "kind": "MetricValueList",
      "verbs": [
        "get"
      ]
    },
    {
      "name": "datasets.data.fluid.io/capacity_used_rate",
      "singularName": "",
      "namespaced": true,
      "kind": "MetricValueList",
      "verbs": [
        "get"
      ]
    },
    {
      "name": "namespaces/capacity_used_rate",
      "singularName": "",
      "namespaced": false,
      "kind": "MetricValueList",
      "verbs": [
        "get"
      ]
    }
  ]
}
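If this check needs to be scripted rather than read by eye, the JSON can be inspected with the Python standard library. The sketch below assumes the output of the `kubectl get --raw` call has been captured as a string; the helper name `registered_metrics` is hypothetical and the sample is a trimmed copy of the response shown above.

```python
import json


def registered_metrics(api_resource_list_json: str) -> set:
    """Return the names of all resources listed by the custom metrics API."""
    doc = json.loads(api_resource_list_json)
    return {resource["name"] for resource in doc.get("resources", [])}


# Trimmed copy of the response above (only the fields used here).
sample = """
{
  "kind": "APIResourceList",
  "groupVersion": "custom.metrics.k8s.io/v1beta1",
  "resources": [
    {"name": "pods/capacity_used_rate", "namespaced": true},
    {"name": "datasets.data.fluid.io/capacity_used_rate", "namespaced": true},
    {"name": "namespaces/capacity_used_rate", "namespaced": false}
  ]
}
"""

names = registered_metrics(sample)
# The HPA in step 10 depends on the Dataset-scoped metric being present.
print("datasets.data.fluid.io/capacity_used_rate" in names)  # True
```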

7. Submit the Dataset used for the test.

$ cat<<EOF >dataset.yaml
apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: spark
spec:
  mounts:
    - mountPoint: https://mirrors.bit.edu.cn/apache/spark/
      name: spark
---
apiVersion: data.fluid.io/v1alpha1
kind: AlluxioRuntime
metadata:
  name: spark
spec:
  replicas: 1
  tieredstore:
    levels:
      - mediumtype: MEM
        path: /dev/shm
        quota: 1Gi
        high: "0.99"
        low: "0.7"
  properties:
    alluxio.user.streaming.data.timeout: 300sec
EOF
$ kubectl create -f dataset.yaml
dataset.data.fluid.io/spark created
alluxioruntime.data.fluid.io/spark created

8. Check whether the Dataset is available.

The dataset holds 2.71GiB of data in total. Fluid currently provides 1 cache node, with a maximum cache capacity of 1GiB, which is not enough to cache the full dataset.

$ kubectl get dataset
NAME    UFS TOTAL SIZE   CACHED   CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
spark   2.71GiB          0.00B    1.00GiB          0.0%                Bound   7m38s
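A quick back-of-the-envelope calculation shows how far the current capacity falls short. The sketch below assumes every cache worker contributes the same quota (1Gi in the AlluxioRuntime above) and ignores the high and low watermarks; `workers_needed` is a hypothetical helper, not part of Fluid.

```python
import math


def workers_needed(ufs_total_gib: float, quota_per_worker_gib: float) -> int:
    """Smallest number of equally sized cache workers that can hold the dataset."""
    if quota_per_worker_gib <= 0:
        raise ValueError("quota per worker must be positive")
    return math.ceil(ufs_total_gib / quota_per_worker_gib)


# UFS TOTAL SIZE is 2.71GiB and each worker offers 1GiB.
print(workers_needed(2.71, 1.0))  # 3
```

Three workers are within the maxReplicas of 4 set in step 10.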

9. Once the Dataset is available, check whether its monitoring metric can be obtained from custom-metrics-api.

kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/datasets.data.fluid.io/*/capacity_used_rate" | jq
{
  "kind": "MetricValueList",
  "apiVersion": "custom.metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/datasets.data.fluid.io/%2A/capacity_used_rate"
  },
  "items": [
    {
      "describedObject": {
        "kind": "Dataset",
        "namespace": "default",
        "name": "spark",
        "apiVersion": "data.fluid.io/v1alpha1"
      },
      "metricName": "capacity_used_rate",
      "timestamp": "2021-04-04T07:24:52Z",
      "value": "0"
    }
  ]
}
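The metric value can also be extracted programmatically. The following is a minimal sketch that assumes the response has been captured as a string. The custom metrics API returns values as Kubernetes quantity strings, so the hypothetical helper `parse_quantity` handles plain numbers and the milli suffix (`m`) only; other suffixes are out of scope here.

```python
import json


def parse_quantity(quantity: str) -> float:
    """Parse a plain or milli-unit ('m' suffix) Kubernetes quantity string."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def dataset_metric_values(metric_value_list_json: str) -> dict:
    """Map each described Dataset name to its numeric metric value."""
    doc = json.loads(metric_value_list_json)
    return {
        item["describedObject"]["name"]: parse_quantity(item["value"])
        for item in doc.get("items", [])
    }


# Trimmed copy of the response above.
sample = """
{
  "kind": "MetricValueList",
  "items": [
    {
      "describedObject": {"kind": "Dataset", "namespace": "default", "name": "spark"},
      "metricName": "capacity_used_rate",
      "value": "0"
    }
  ]
}
"""

print(dataset_metric_values(sample))  # {'spark': 0.0}
```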

10. Create the HPA.

$ cat<<EOF > hpa.yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: spark
spec:
  scaleTargetRef:
    apiVersion: data.fluid.io/v1alpha1
    kind: AlluxioRuntime
    name: spark
  minReplicas: 1
  maxReplicas: 4
  metrics:
  - type: Object
    object:
      metric:
        name: capacity_used_rate
      describedObject:
        apiVersion: data.fluid.io/v1alpha1
        kind: Dataset
        name: spark
      target:
        type: Value
        value: "90"
  behavior:
    scaleUp:
      policies:
      - type: Pods
        value: 2
        periodSeconds: 600
    scaleDown:
      selectPolicy: Disabled
EOF

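First, a walkthrough of the sample configuration. It has two main parts: the scaling rule and the scaling sensitivity.

Rule: scale-out is triggered when the Dataset's cached data reaches 90% of the total cache capacity. The scaling target is the AlluxioRuntime, with a minimum of 1 replica and a maximum of 4. The Dataset and AlluxioRuntime objects must be in the same namespace.
Policy: on Kubernetes 1.18 and later, the stabilization window and the size of each scaling step can be set separately for scale-out and scale-in. In this example one scale-out period is 10 minutes (periodSeconds) and at most 2 replicas are added per period, never exceeding maxReplicas. A cooldown after a scale-out can be set with stabilizationWindowSeconds, for example 20 minutes; the sample above leaves it unset, so the describe output in the next step shows a scale-up stabilization window of 0 seconds. Scale-in can simply be disabled, as it is here.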
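How the rule and the policy interact can be approximated with the replica formula from the Kubernetes HPA documentation, desired = ceil(current × metric / target). The sketch below is a simplified model, not the controller's implementation: the function and parameter names are hypothetical, the 0.1 tolerance is the documented default, and the Pods policy is modeled as "at most N replicas above the count at the start of the period".

```python
import math


def desired_replicas(current: int, metric_value: float, target_value: float, *,
                     min_replicas: int, max_replicas: int,
                     period_start_replicas: int, max_pods_per_period: int,
                     tolerance: float = 0.1) -> int:
    """Simplified HPA decision for an Object metric with scale-in disabled."""
    ratio = metric_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        return current  # close enough to the target: no change
    desired = math.ceil(current * ratio)
    if desired > current:
        # scaleUp policy: type Pods, value 2, periodSeconds 600
        desired = min(desired, period_start_replicas + max_pods_per_period)
    else:
        # scaleDown selectPolicy: Disabled
        desired = current
    return max(min_replicas, min(desired, max_replicas))


limits = dict(min_replicas=1, max_replicas=4,
              period_start_replicas=1, max_pods_per_period=2)

print(desired_replicas(1, 0, 90, **limits))    # 1: empty cache, scale-in disabled
print(desired_replicas(1, 100, 90, **limits))  # 2: ceil(1 * 100 / 90)
print(desired_replicas(2, 100, 90, **limits))  # 3: ceil(2 * 100 / 90)
print(desired_replicas(3, 100, 90, **limits))  # 3: capped at 1 + 2 within the period
print(desired_replicas(3, 85, 90, **limits))   # 3: within the 10% tolerance
```

Under these assumptions a single replica at 100% usage grows to 2 and then 3 replicas within one 10-minute period, and stops there until the period has elapsed.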

11. Check the HPA. The used share of the cache is currently 0, far below the scale-out trigger.

$ kubectl get hpa
NAME    REFERENCE              TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
spark   AlluxioRuntime/spark   0/90      1         4         1          33s
$ kubectl describe hpa
Name:                                                    spark
Namespace:                                               default
Labels:                                                  <none>
Annotations:                                             <none>
CreationTimestamp:                                       Wed, 07 Apr 2021 17:36:39 +0800
Reference:                                               AlluxioRuntime/spark
Metrics:                                                 ( current / target )
  "capacity_used_rate" on Dataset/spark (target value):  0 / 90
Min replicas:                                            1
Max replicas:                                            4
Behavior:
  Scale Up:
    Stabilization Window: 0 seconds
    Select Policy: Max
    Policies:
      - Type: Pods  Value: 2  Period: 600 seconds
  Scale Down:
    Select Policy: Disabled
    Policies:
      - Type: Percent  Value: 100  Period: 15 seconds
AlluxioRuntime pods:   1 current / 1 desired
Conditions:
  Type            Status  Reason               Message
  ----            ------  ------               -------
  AbleToScale     True    ScaleDownStabilized  recent recommendations were higher than current one, applying the highest recent recommendation
  ScalingActive   True    ValidMetricFound     the HPA was able to successfully calculate a replica count from Dataset metric capacity_used_rate
  ScalingLimited  False   DesiredWithinRange   the desired count is within the acceptable range
Events:           <none>

12. Create a data preloading job.

$ cat<<EOF > dataload.yaml
apiVersion: data.fluid.io/v1alpha1
kind: DataLoad
metadata:
  name: spark
spec:
  dataset:
    name: spark
    namespace: default
EOF
$ kubectl create -f dataload.yaml
$ kubectl get dataload
NAME    DATASET   PHASE       AGE   DURATION
spark   spark     Executing   15s   Unfinished

13. The amount of cached data is now close to the cache capacity Fluid can provide (1GiB), which triggers the autoscaling condition.

$  kubectl  get dataset
NAME    UFS TOTAL SIZE   CACHED       CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
spark   2.71GiB          1020.92MiB   1.00GiB          36.8%               Bound   5m15s

The HPA status shows that scale-out of the AlluxioRuntime has started, with a scale-out step of 2.

$ kubectl get hpa
NAME    REFERENCE              TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
spark   AlluxioRuntime/spark   100/90    1         4         2          4m20s
$ kubectl describe hpa
Name:                                                    spark
Namespace:                                               default
Labels:                                                  <none>
Annotations:                                             <none>
CreationTimestamp:                                       Wed, 07 Apr 2021 17:56:31 +0800
Reference:                                               AlluxioRuntime/spark
Metrics:                                                 ( current / target )
  "capacity_used_rate" on Dataset/spark (target value):  100 / 90
Min replicas:                                            1
Max replicas:                                            4
Behavior:
  Scale Up:
    Stabilization Window: 0 seconds
    Select Policy: Max
    Policies:
      - Type: Pods  Value: 2  Period: 600 seconds
  Scale Down:
    Select Policy: Disabled
    Policies:
      - Type: Percent  Value: 100  Period: 15 seconds
AlluxioRuntime pods:   2 current / 3 desired
Conditions:
  Type            Status  Reason              Message
  ----            ------  ------              -------
  AbleToScale     True    SucceededRescale    the HPA controller was able to update the target scale to 3
  ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from Dataset metric capacity_used_rate
  ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
Events:
  Type     Reason                        Age                    From                       Message
  ----     ------                        ----                   ----                       -------
  Normal   SuccessfulRescale             21s                    horizontal-pod-autoscaler  New size: 2; reason: Dataset metric capacity_used_rate above target
  Normal   SuccessfulRescale             6s                     horizontal-pod-autoscaler  New size: 3; reason: Dataset metric capacity_used_rate above target
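The value 100 reported by the HPA follows from the adapter's metricsQuery in step 6, which divides the used capacity by the total capacity, multiplies by 100, and rounds up with `ceil`. A minimal Python reproduction of that arithmetic, using the CACHED and CACHE CAPACITY figures from the `kubectl get dataset` output above as stand-ins for the two Alluxio metrics (an assumption, since the raw metric values are not shown):

```python
import math

MIB = 1024 ** 2
GIB = 1024 ** 3


def capacity_used_rate(capacity_used_bytes: float, capacity_total_bytes: float) -> int:
    """Mirror of ceil(Cluster_CapacityUsed * 100 / Cluster_CapacityTotal)."""
    return math.ceil(capacity_used_bytes * 100 / capacity_total_bytes)


# 1020.92MiB cached in a 1.00GiB cache is about 99.7%, which rounds up to 100.
print(capacity_used_rate(1020.92 * MIB, 1 * GIB))  # 100
```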

14. After a while, the dataset's cache capacity has grown from 1GiB to 3GiB, and data caching is nearly complete.

$ kubectl  get dataset
NAME    UFS TOTAL SIZE   CACHED    CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
spark   2.71GiB          2.59GiB   3.00GiB          95.6%               Bound   12m
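The CACHED PERCENTAGE column is the cached amount relative to the total size of the underlying data (UFS TOTAL SIZE), not relative to the cache capacity. The sketch below recomputes it from the printed columns; `parse_size` and `cached_percentage` are hypothetical helpers that only understand the binary units appearing in this output.

```python
import re

UNITS = {"B": 1, "KiB": 1024, "MiB": 1024 ** 2, "GiB": 1024 ** 3, "TiB": 1024 ** 4}


def parse_size(text: str) -> float:
    """Convert a size such as '1020.92MiB' or '0.00B' to bytes."""
    match = re.fullmatch(r"([0-9.]+)\s*(B|KiB|MiB|GiB|TiB)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return float(match.group(1)) * UNITS[match.group(2)]


def cached_percentage(cached: str, ufs_total: str) -> float:
    """Cached share of the underlying data, rounded to one decimal place."""
    return round(parse_size(cached) * 100 / parse_size(ufs_total), 1)


print(cached_percentage("2.59GiB", "2.71GiB"))     # 95.6, as in step 14
print(cached_percentage("1020.92MiB", "2.71GiB"))  # 36.8, as in step 13
```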

The HPA status shows that the runtime backing the Dataset now has 3 replicas and that the used share of cache capacity, capacity_used_rate, is 85%, which no longer triggers scale-out.

$ kubectl get hpa
NAME    REFERENCE              TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
spark   AlluxioRuntime/spark   85/90     1         4         3          11m

15. Clean up the environment.

kubectl delete hpa spark
kubectl delete dataset spark

Fluid combines Prometheus, Kubernetes HPA, and custom metrics to trigger autoscaling based on the share of cache capacity in use, so that cache capacity is consumed on demand. This helps users apply distributed caching more flexibly to accelerate data access. We plan to add scheduled scaling later, to make scaling more deterministic.
