This article is a detailed walkthrough of highly available Kubernetes monitoring with Prometheus and Thanos. The content is fairly thorough, and we share it here as a reference in the hope that you come away with a solid understanding of the topic.
Kubernetes adoption has grown many times over in the past few years, and it is clear that Kubernetes is the de facto choice for container orchestration. At the same time, Prometheus is widely considered an excellent choice for monitoring both containerized and non-containerized workloads. Monitoring is an essential concern for any infrastructure, and we should make sure our monitoring setup is highly available and highly scalable so it can keep pace with the needs of a growing infrastructure, especially one built on Kubernetes.
Today, therefore, we will deploy a clustered Prometheus setup that not only tolerates node failures but also guarantees proper data archiving for later reference. Our setup is also highly scalable, to the point where multiple Kubernetes clusters can be covered under the same monitoring umbrella.
Most Prometheus deployments today use pods with persistent volumes and scale Prometheus via federation. However, not all data can be aggregated using federation, and as you add servers you often need an extra mechanism just to manage Prometheus configuration.
Thanos aims to solve the problems above. With the help of Thanos, we can not only run multiple replicas of Prometheus and deduplicate data across them, but also archive the data into long-term storage such as GCS or S3.
Image source: https://thanos.io/quick-tutorial.md/
Thanos consists of the following components:
Thanos Sidecar: this is the main component that runs alongside Prometheus. It reads and archives data in object storage. It also manages Prometheus's configuration and lifecycle. To distinguish each Prometheus instance, the sidecar injects external labels into the Prometheus configuration. The sidecar can run queries against the Prometheus server's PromQL interface, and it also listens on Thanos's gRPC protocol, translating queries between gRPC and REST.
Thanos Store: this component implements the Store API on top of historical data in an object storage bucket. It acts mainly as an API gateway, so it does not require large amounts of local disk space. It joins a Thanos cluster on startup and advertises the data it can access. It keeps a small amount of information about all remote blocks on local disk, kept in sync with the bucket. This data is generally safe to delete across restarts, at the cost of increased startup time.
Thanos Query: the query component listens on HTTP and translates queries into the Thanos gRPC format. It aggregates query results from different sources, and can read data from Sidecars and Stores. In an HA setup, it even deduplicates the query results.
Prometheus is stateful and does not allow its database to be replicated. This means that simply running multiple Prometheus replicas is not an easy way to achieve high availability. Plain load balancing will not work: after a crash, for example, a replica may come back up, but querying it will show a gap for the period it was down. A second replica may be running in the meantime, but it could be down at some other moment (for instance during a rolling restart), so load balancing on top of these replicas will not work properly.
Thanos Querier, by contrast, pulls the data from both replicas and deduplicates those signals, filling the gaps for the Querier's consumers.
Thanos Compact: this component applies the compaction procedure of the Prometheus 2.0 storage engine to block data stored in object storage. It is generally not semantically concurrency-safe and must be deployed as a singleton against a bucket. It is also responsible for downsampling the data: 5-minute downsampling after 40 hours and 1-hour downsampling after 10 days.
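We do not deploy the Compact component in this walkthrough, but for reference, a minimal invocation would look roughly like the following sketch (flags based on Thanos v0.8.x; the data directory and retention values are illustrative assumptions, and the bucket matches the one we create below):

# Run the compactor as a long-lived singleton against the bucket.
# --wait keeps it running continuously instead of compacting once and exiting.
thanos compact \
    --data-dir=/var/thanos/compact \
    --objstore.config="{type: GCS, config: {bucket: prometheus-long-term}}" \
    --retention.resolution-raw=30d \
    --retention.resolution-5m=120d \
    --retention.resolution-1h=1y \
    --wait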
Thanos Ruler: does basically the same job as Prometheus rules; the only difference is that it communicates with Thanos components.
To follow this tutorial end to end, you will need the following:
對(duì)Kubernetes和使用kubectl有一定的了解。
運(yùn)行中的Kubernetes集群至少有3個(gè)節(jié)點(diǎn)(在本demo中,使用GKE集群)
實(shí)現(xiàn)Ingress Controller和Ingress對(duì)象(在本demo中使用Nginx Ingress Controller)。雖然這不是強(qiáng)制性的,但為了減少創(chuàng)建外部端點(diǎn)的數(shù)量,強(qiáng)烈建議使用。
Credentials for the Thanos components to access object storage (a GCS bucket in this case):
創(chuàng)建2個(gè)GCS bucket,并將其命名為Prometheus-long-term和thanos-ruler。
創(chuàng)建一個(gè)服務(wù)賬戶,角色為Storage Object Admin。
Download the key file as JSON credentials and name it thanos-gcs-credentials.json.
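If you prefer to script these prerequisites, the equivalent gcloud/gsutil commands look roughly like this (a sketch; PROJECT_ID and the service-account name thanos are placeholders you must adapt):

# Create the two buckets.
gsutil mb gs://prometheus-long-term
gsutil mb gs://thanos-ruler

# Create a service account and grant it Storage Object Admin.
gcloud iam service-accounts create thanos --display-name "thanos"
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member "serviceAccount:thanos@PROJECT_ID.iam.gserviceaccount.com" \
    --role "roles/storage.objectAdmin"

# Download a JSON key for the Thanos components to use.
gcloud iam service-accounts keys create thanos-gcs-credentials.json \
    --iam-account thanos@PROJECT_ID.iam.gserviceaccount.com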
Create a Kubernetes secret using the credentials:
kubectl create secret generic thanos-gcs-credentials --from-file=thanos-gcs-credentials.json
Deploying the Prometheus ServiceAccount, ClusterRole and ClusterRoleBinding
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: monitoring
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: monitoring
  namespace: monitoring
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/proxy
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources:
  - configmaps
  verbs: ["get"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: monitoring
subjects:
  - kind: ServiceAccount
    name: monitoring
    namespace: monitoring
roleRef:
  kind: ClusterRole
  name: monitoring
  apiGroup: rbac.authorization.k8s.io
---
The manifest above creates the monitoring namespace needed by Prometheus, along with the ServiceAccount, ClusterRole and ClusterRoleBinding.
Deploying the Prometheus configuration ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-server-conf
  labels:
    name: prometheus-server-conf
  namespace: monitoring
data:
  prometheus.yaml.tmpl: |-
    global:
      scrape_interval: 5s
      evaluation_interval: 5s
      external_labels:
        cluster: prometheus-ha
        # Each Prometheus has to have unique labels.
        replica: $(POD_NAME)

    rule_files:
      - /etc/prometheus/rules/*rules.yaml

    alerting:
      # We want our alerts to be deduplicated
      # from different replicas.
      alert_relabel_configs:
      - regex: replica
        action: labeldrop

      alertmanagers:
        - scheme: http
          path_prefix: /
          static_configs:
            - targets: ['alertmanager:9093']

    scrape_configs:
    - job_name: kubernetes-nodes-cadvisor
      scrape_interval: 10s
      scrape_timeout: 10s
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
        - role: node
      relabel_configs:
        - action: labelmap
          regex: __meta_kubernetes_node_label_(.+)
        # Only for Kubernetes ^1.7.3.
        # See: https://github.com/prometheus/prometheus/issues/2916
        - target_label: __address__
          replacement: kubernetes.default.svc:443
        - source_labels: [__meta_kubernetes_node_name]
          regex: (.+)
          target_label: __metrics_path__
          replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
      metric_relabel_configs:
        - action: replace
          source_labels: [id]
          regex: '^/machine\.slice/machine-rkt\\x2d([^\\]+)\\.+/([^/]+)\.service$'
          target_label: rkt_container_name
          replacement: '${2}-${1}'
        - action: replace
          source_labels: [id]
          regex: '^/system\.slice/(.+)\.service$'
          target_label: systemd_service_name
          replacement: '${1}'

    - job_name: 'kubernetes-pods'
      kubernetes_sd_configs:
        - role: pod
      relabel_configs:
        - action: labelmap
          regex: __meta_kubernetes_pod_label_(.+)
        - source_labels: [__meta_kubernetes_namespace]
          action: replace
          target_label: kubernetes_namespace
        - source_labels: [__meta_kubernetes_pod_name]
          action: replace
          target_label: kubernetes_pod_name
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
          action: keep
          regex: true
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
          action: replace
          target_label: __scheme__
          regex: (https?)
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
          action: replace
          target_label: __metrics_path__
          regex: (.+)
        - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
          action: replace
          target_label: __address__
          regex: ([^:]+)(?::\d+)?;(\d+)
          replacement: $1:$2

    - job_name: 'kubernetes-apiservers'
      kubernetes_sd_configs:
        - role: endpoints
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
        - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
          action: keep
          regex: default;kubernetes;https

    - job_name: 'kubernetes-service-endpoints'
      kubernetes_sd_configs:
        - role: endpoints
      relabel_configs:
        - action: labelmap
          regex: __meta_kubernetes_service_label_(.+)
        - source_labels: [__meta_kubernetes_namespace]
          action: replace
          target_label: kubernetes_namespace
        - source_labels: [__meta_kubernetes_service_name]
          action: replace
          target_label: kubernetes_name
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
          action: keep
          regex: true
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
          action: replace
          target_label: __scheme__
          regex: (https?)
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
          action: replace
          target_label: __metrics_path__
          regex: (.+)
        - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
          action: replace
          target_label: __address__
          regex: ([^:]+)(?::\d+)?;(\d+)
          replacement: $1:$2
The ConfigMap above creates the Prometheus configuration file template. The template is read by the Thanos sidecar component, which generates the actual configuration file; that file is in turn consumed by the Prometheus container running in the same pod. It is extremely important to add the external_labels section to the configuration, because it is what allows the Querier to deduplicate data across replicas.
The following ConfigMap creates our alerting rules, which are relayed to Alertmanager for delivery:
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-rules
  labels:
    name: prometheus-rules
  namespace: monitoring
data:
  alert-rules.yaml: |-
    groups:
      - name: Deployment
        rules:
        - alert: Deployment at 0 Replicas
          annotations:
            summary: Deployment {{$labels.deployment}} in {{$labels.namespace}} is currently having no pods running
          expr: |
            sum(kube_deployment_status_replicas{pod_template_hash=""}) by (deployment,namespace) < 1
          for: 1m
          labels:
            team: devops
        - alert: HPA Scaling Limited
          annotations:
            summary: HPA named {{$labels.hpa}} in {{$labels.namespace}} namespace has reached scaling limited state
          expr: |
            (sum(kube_hpa_status_condition{condition="ScalingLimited",status="true"}) by (hpa,namespace)) == 1
          for: 1m
          labels:
            team: devops
        - alert: HPA at MaxCapacity
          annotations:
            summary: HPA named {{$labels.hpa}} in {{$labels.namespace}} namespace is running at Max Capacity
          expr: |
            ((sum(kube_hpa_spec_max_replicas) by (hpa,namespace)) - (sum(kube_hpa_status_current_replicas) by (hpa,namespace))) == 0
          for: 1m
          labels:
            team: devops
      - name: Pods
        rules:
        - alert: Container restarted
          annotations:
            summary: Container named {{$labels.container}} in {{$labels.pod}} in {{$labels.namespace}} was restarted
          expr: |
            sum(increase(kube_pod_container_status_restarts_total{namespace!="kube-system",pod_template_hash=""}[1m])) by (pod,namespace,container) > 0
          for: 0m
          labels:
            team: dev
        - alert: High Memory Usage of Container
          annotations:
            summary: Container named {{$labels.container}} in {{$labels.pod}} in {{$labels.namespace}} is using more than 75% of Memory Limit
          expr: |
            ((( sum(container_memory_usage_bytes{image!="",container_name!="POD", namespace!="kube-system"}) by (namespace,container_name,pod_name) / sum(container_spec_memory_limit_bytes{image!="",container_name!="POD",namespace!="kube-system"}) by (namespace,container_name,pod_name) ) * 100 ) < +Inf ) > 75
          for: 5m
          labels:
            team: dev
        - alert: High CPU Usage of Container
          annotations:
            summary: Container named {{$labels.container}} in {{$labels.pod}} in {{$labels.namespace}} is using more than 75% of CPU Limit
          expr: |
            ((sum(irate(container_cpu_usage_seconds_total{image!="",container_name!="POD", namespace!="kube-system"}[30s])) by (namespace,container_name,pod_name) / sum(container_spec_cpu_quota{image!="",container_name!="POD", namespace!="kube-system"} / container_spec_cpu_period{image!="",container_name!="POD", namespace!="kube-system"}) by (namespace,container_name,pod_name) ) * 100) > 75
          for: 5m
          labels:
            team: dev
      - name: Nodes
        rules:
        - alert: High Node Memory Usage
          annotations:
            summary: Node {{$labels.kubernetes_io_hostname}} has more than 80% memory used. Plan Capacity
          expr: |
            (sum (container_memory_working_set_bytes{id="/",container_name!="POD"}) by (kubernetes_io_hostname) / sum (machine_memory_bytes{}) by (kubernetes_io_hostname) * 100) > 80
          for: 5m
          labels:
            team: devops
        - alert: High Node CPU Usage
          annotations:
            summary: Node {{$labels.kubernetes_io_hostname}} has more than 80% allocatable cpu used. Plan Capacity.
          expr: |
            (sum(rate(container_cpu_usage_seconds_total{id="/", container_name!="POD"}[1m])) by (kubernetes_io_hostname) / sum(machine_cpu_cores) by (kubernetes_io_hostname) * 100) > 80
          for: 5m
          labels:
            team: devops
        - alert: High Node Disk Usage
          annotations:
            summary: Node {{$labels.kubernetes_io_hostname}} has more than 85% disk used. Plan Capacity.
          expr: |
            (sum(container_fs_usage_bytes{device=~"^/dev/[sv]d[a-z][1-9]$",id="/",container_name!="POD"}) by (kubernetes_io_hostname) / sum(container_fs_limit_bytes{container_name!="POD",device=~"^/dev/[sv]d[a-z][1-9]$",id="/"}) by (kubernetes_io_hostname)) * 100 > 85
          for: 5m
          labels:
            team: devops
Deploying the Prometheus StatefulSet
apiVersion: storage.k8s.io/v1beta1
kind: StorageClass
metadata:
  name: fast
  namespace: monitoring
provisioner: kubernetes.io/gce-pd
allowVolumeExpansion: true
---
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: prometheus
  namespace: monitoring
spec:
  replicas: 3
  serviceName: prometheus-service
  template:
    metadata:
      labels:
        app: prometheus
        thanos-store-api: "true"
    spec:
      serviceAccountName: monitoring
      containers:
        - name: prometheus
          image: prom/prometheus:v2.4.3
          args:
            - "--config.file=/etc/prometheus-shared/prometheus.yaml"
            - "--storage.tsdb.path=/prometheus/"
            - "--web.enable-lifecycle"
            - "--storage.tsdb.no-lockfile"
            - "--storage.tsdb.min-block-duration=2h"
            - "--storage.tsdb.max-block-duration=2h"
          ports:
            - name: prometheus
              containerPort: 9090
          volumeMounts:
            - name: prometheus-storage
              mountPath: /prometheus/
            - name: prometheus-config-shared
              mountPath: /etc/prometheus-shared/
            - name: prometheus-rules
              mountPath: /etc/prometheus/rules
        - name: thanos
          image: quay.io/thanos/thanos:v0.8.0
          args:
            - "sidecar"
            - "--log.level=debug"
            - "--tsdb.path=/prometheus"
            - "--prometheus.url=http://127.0.0.1:9090"
            - "--objstore.config={type: GCS, config: {bucket: prometheus-long-term}}"
            - "--reloader.config-file=/etc/prometheus/prometheus.yaml.tmpl"
            - "--reloader.config-envsubst-file=/etc/prometheus-shared/prometheus.yaml"
            - "--reloader.rule-dir=/etc/prometheus/rules/"
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: GOOGLE_APPLICATION_CREDENTIALS
              value: /etc/secret/thanos-gcs-credentials.json
          ports:
            - name: http-sidecar
              containerPort: 10902
            - name: grpc
              containerPort: 10901
          livenessProbe:
            httpGet:
              port: 10902
              path: /-/healthy
          readinessProbe:
            httpGet:
              port: 10902
              path: /-/ready
          volumeMounts:
            - name: prometheus-storage
              mountPath: /prometheus
            - name: prometheus-config-shared
              mountPath: /etc/prometheus-shared/
            - name: prometheus-config
              mountPath: /etc/prometheus
            - name: prometheus-rules
              mountPath: /etc/prometheus/rules
            - name: thanos-gcs-credentials
              mountPath: /etc/secret
              readOnly: false
      securityContext:
        fsGroup: 2000
        runAsNonRoot: true
        runAsUser: 1000
      volumes:
        - name: prometheus-config
          configMap:
            defaultMode: 420
            name: prometheus-server-conf
        - name: prometheus-config-shared
          emptyDir: {}
        - name: prometheus-rules
          configMap:
            name: prometheus-rules
        - name: thanos-gcs-credentials
          secret:
            secretName: thanos-gcs-credentials
  volumeClaimTemplates:
  - metadata:
      name: prometheus-storage
      namespace: monitoring
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: fast
      resources:
        requests:
          storage: 20Gi
It is important to understand the following about the manifest above:
Prometheus is deployed as a StatefulSet with 3 replicas, and each replica dynamically provisions its own persistent volume.
The Prometheus configuration is generated by the Thanos sidecar container from the template file we created above (a quick check for this is shown after this list).
Thanos handles data compaction, so we need to set --storage.tsdb.min-block-duration=2h and --storage.tsdb.max-block-duration=2h, which disables Prometheus's own local compaction.
The Prometheus StatefulSet is labelled thanos-store-api: "true", so that each pod is discovered by the headless service we create next. It is this headless service that the Thanos Querier uses to query data across all the Prometheus instances. We also apply the same label to the Thanos Store and Thanos Ruler components, so that they too are discovered by the Querier and can be used for querying metrics.
The GCS bucket credentials path is supplied using the GOOGLE_APPLICATION_CREDENTIALS environment variable, and the credentials file is mounted from the secret we created as part of the prerequisites.
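Once the StatefulSet is up, a quick way to confirm that the sidecar rendered the final configuration from the template (with the replica label substituted) is to read the shared file from one of the pods:

# Inspect the config the sidecar generated for the first replica.
kubectl exec -it prometheus-0 -n monitoring -c prometheus -- \
    cat /etc/prometheus-shared/prometheus.yaml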
Deploying the Prometheus services
apiVersion: v1
kind: Service
metadata:
  name: prometheus-0-service
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9090"
  namespace: monitoring
  labels:
    name: prometheus
spec:
  selector:
    statefulset.kubernetes.io/pod-name: prometheus-0
  ports:
    - name: prometheus
      port: 8080
      targetPort: prometheus
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus-1-service
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9090"
  namespace: monitoring
  labels:
    name: prometheus
spec:
  selector:
    statefulset.kubernetes.io/pod-name: prometheus-1
  ports:
    - name: prometheus
      port: 8080
      targetPort: prometheus
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus-2-service
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9090"
  namespace: monitoring
  labels:
    name: prometheus
spec:
  selector:
    statefulset.kubernetes.io/pod-name: prometheus-2
  ports:
    - name: prometheus
      port: 8080
      targetPort: prometheus
---
#This service creates a srv record for querier to find about store-api's
apiVersion: v1
kind: Service
metadata:
  name: thanos-store-gateway
  namespace: monitoring
spec:
  type: ClusterIP
  clusterIP: None
  ports:
    - name: grpc
      port: 10901
      targetPort: grpc
  selector:
    thanos-store-api: "true"
Beyond the approach above, you can also read this article to learn how to quickly deploy and configure the Prometheus service on Rancher.
We created a separate service for each Prometheus pod in the StatefulSet, even though this is not strictly necessary; these services exist purely for debugging. The purpose of the thanos-store-gateway headless service was explained above. Later on, we will expose the Prometheus services using an Ingress object.
Deploying the Thanos Querier
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: thanos-querier
  namespace: monitoring
  labels:
    app: thanos-querier
spec:
  replicas: 1
  selector:
    matchLabels:
      app: thanos-querier
  template:
    metadata:
      labels:
        app: thanos-querier
    spec:
      containers:
      - name: thanos
        image: quay.io/thanos/thanos:v0.8.0
        args:
        - query
        - --log.level=debug
        - --query.replica-label=replica
        - --store=dnssrv+thanos-store-gateway:10901
        ports:
        - name: http
          containerPort: 10902
        - name: grpc
          containerPort: 10901
        livenessProbe:
          httpGet:
            port: http
            path: /-/healthy
        readinessProbe:
          httpGet:
            port: http
            path: /-/ready
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: thanos-querier
  name: thanos-querier
  namespace: monitoring
spec:
  ports:
  - port: 9090
    protocol: TCP
    targetPort: http
    name: http
  selector:
    app: thanos-querier
This is one of the centerpieces of the Thanos deployment. Note the following:
The container argument --store=dnssrv+thanos-store-gateway:10901 helps discover all the components from which metric data should be queried.
The thanos-querier service provides a web interface for running PromQL queries. It also has the option of deduplicating data across different Prometheus clusters.
This is also the endpoint we will give Grafana as the data source for all dashboards.
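Because the Querier exposes the standard Prometheus HTTP API, you can sanity-check it with a plain query once it is running. A sketch using port-forwarding (the dedup=true parameter mirrors the deduplication checkbox in the web UI):

# Forward the querier service locally, then run a simple query against it.
kubectl port-forward svc/thanos-querier -n monitoring 9090 &
curl "http://localhost:9090/api/v1/query?query=up&dedup=true"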
Deploying the Thanos Store Gateway
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
---
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: thanos-store-gateway
  namespace: monitoring
  labels:
    app: thanos-store-gateway
spec:
  replicas: 1
  selector:
    matchLabels:
      app: thanos-store-gateway
  serviceName: thanos-store-gateway
  template:
    metadata:
      labels:
        app: thanos-store-gateway
        thanos-store-api: "true"
    spec:
      containers:
        - name: thanos
          image: quay.io/thanos/thanos:v0.8.0
          args:
            - "store"
            - "--log.level=debug"
            - "--data-dir=/data"
            - "--objstore.config={type: GCS, config: {bucket: prometheus-long-term}}"
            - "--index-cache-size=500MB"
            - "--chunk-pool-size=500MB"
          env:
            - name: GOOGLE_APPLICATION_CREDENTIALS
              value: /etc/secret/thanos-gcs-credentials.json
          ports:
            - name: http
              containerPort: 10902
            - name: grpc
              containerPort: 10901
          livenessProbe:
            httpGet:
              port: 10902
              path: /-/healthy
          readinessProbe:
            httpGet:
              port: 10902
              path: /-/ready
          volumeMounts:
            - name: thanos-gcs-credentials
              mountPath: /etc/secret
              readOnly: false
      volumes:
        - name: thanos-gcs-credentials
          secret:
            secretName: thanos-gcs-credentials
---
This creates the Store component, which serves metrics from object storage to the Querier.
Deploying the Thanos Ruler
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: thanos-ruler-rules
  namespace: monitoring
data:
  alert_down_services.rules.yaml: |
    groups:
    - name: metamonitoring
      rules:
      - alert: PrometheusReplicaDown
        annotations:
          message: Prometheus replica in cluster {{$labels.cluster}} has disappeared from Prometheus target discovery.
        expr: |
          sum(up{cluster="prometheus-ha", instance=~".*:9090", job="kubernetes-service-endpoints"}) by (job,cluster) < 3
        for: 15s
        labels:
          severity: critical
---
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  labels:
    app: thanos-ruler
  name: thanos-ruler
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: thanos-ruler
  serviceName: thanos-ruler
  template:
    metadata:
      labels:
        app: thanos-ruler
        thanos-store-api: "true"
    spec:
      containers:
        - name: thanos
          image: quay.io/thanos/thanos:v0.8.0
          args:
            - rule
            - --log.level=debug
            - --data-dir=/data
            - --eval-interval=15s
            - --rule-file=/etc/thanos-ruler/*.rules.yaml
            - --alertmanagers.url=http://alertmanager:9093
            - --query=thanos-querier:9090
            - "--objstore.config={type: GCS, config: {bucket: thanos-ruler}}"
            - --label=ruler_cluster="prometheus-ha"
            - --label=replica="$(POD_NAME)"
          env:
            - name: GOOGLE_APPLICATION_CREDENTIALS
              value: /etc/secret/thanos-gcs-credentials.json
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
          ports:
            - name: http
              containerPort: 10902
            - name: grpc
              containerPort: 10901
          livenessProbe:
            httpGet:
              port: http
              path: /-/healthy
          readinessProbe:
            httpGet:
              port: http
              path: /-/ready
          volumeMounts:
            - mountPath: /etc/thanos-ruler
              name: config
            - name: thanos-gcs-credentials
              mountPath: /etc/secret
              readOnly: false
      volumes:
        - configMap:
            name: thanos-ruler-rules
          name: config
        - name: thanos-gcs-credentials
          secret:
            secretName: thanos-gcs-credentials
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: thanos-ruler
  name: thanos-ruler
  namespace: monitoring
spec:
  ports:
    - port: 9090
      protocol: TCP
      targetPort: http
      name: http
  selector:
    app: thanos-ruler
Now, if you start an interactive shell in the same namespace as our workloads and check which pods thanos-store-gateway resolves to, you will see something like this:
root@my-shell-95cb5df57-4q6w8:/# nslookup thanos-store-gateway
Server:    10.63.240.10
Address:   10.63.240.10#53

Name:      thanos-store-gateway.monitoring.svc.cluster.local
Address:   10.60.25.2
Name:      thanos-store-gateway.monitoring.svc.cluster.local
Address:   10.60.25.4
Name:      thanos-store-gateway.monitoring.svc.cluster.local
Address:   10.60.30.2
Name:      thanos-store-gateway.monitoring.svc.cluster.local
Address:   10.60.30.8
Name:      thanos-store-gateway.monitoring.svc.cluster.local
Address:   10.60.31.2

root@my-shell-95cb5df57-4q6w8:/# exit
The IPs returned above correspond to our Prometheus pods, thanos-store and thanos-ruler. This can be verified with:
$ kubectl get pods -o wide -l thanos-store-api="true"
NAME                     READY   STATUS    RESTARTS   AGE    IP           NODE                              NOMINATED NODE   READINESS GATES
prometheus-0             2/2     Running   0          100m   10.60.31.2   gke-demo-1-pool-1-649cbe02-jdnv   <none>           <none>
prometheus-1             2/2     Running   0          14h    10.60.30.2   gke-demo-1-pool-1-7533d618-kxkd   <none>           <none>
prometheus-2             2/2     Running   0          31h    10.60.25.2   gke-demo-1-pool-1-4e9889dd-27gc   <none>           <none>
thanos-ruler-0           1/1     Running   0          100m   10.60.30.8   gke-demo-1-pool-1-7533d618-kxkd   <none>           <none>
thanos-store-gateway-0   1/1     Running   0          14h    10.60.25.4   gke-demo-1-pool-1-4e9889dd-27gc   <none>           <none>
Deploying Alertmanager
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: alertmanager
  namespace: monitoring
data:
  config.yml: |-
    global:
      resolve_timeout: 5m
      slack_api_url: "<your_slack_hook>"
      victorops_api_url: "<your_victorops_hook>"

    templates:
    - '/etc/alertmanager-templates/*.tmpl'

    route:
      group_by: ['alertname', 'cluster', 'service']
      group_wait: 10s
      group_interval: 1m
      repeat_interval: 5m
      receiver: default
      routes:
      - match:
          team: devops
        receiver: devops
        continue: true
      - match:
          team: dev
        receiver: dev
        continue: true

    receivers:
    - name: 'default'

    - name: 'devops'
      victorops_configs:
      - api_key: '<YOUR_API_KEY>'
        routing_key: 'devops'
        message_type: 'CRITICAL'
        entity_display_name: '{{ .CommonLabels.alertname }}'
        state_message: 'Alert: {{ .CommonLabels.alertname }}. Summary:{{ .CommonAnnotations.summary }}. RawData: {{ .CommonLabels }}'
      slack_configs:
      - channel: '#k8-alerts'
        send_resolved: true

    - name: 'dev'
      victorops_configs:
      - api_key: '<YOUR_API_KEY>'
        routing_key: 'dev'
        message_type: 'CRITICAL'
        entity_display_name: '{{ .CommonLabels.alertname }}'
        state_message: 'Alert: {{ .CommonLabels.alertname }}. Summary:{{ .CommonAnnotations.summary }}. RawData: {{ .CommonLabels }}'
      slack_configs:
      - channel: '#k8-alerts'
        send_resolved: true
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: alertmanager
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: alertmanager
  template:
    metadata:
      name: alertmanager
      labels:
        app: alertmanager
    spec:
      containers:
      - name: alertmanager
        image: prom/alertmanager:v0.15.3
        args:
          - '--config.file=/etc/alertmanager/config.yml'
          - '--storage.path=/alertmanager'
        ports:
        - name: alertmanager
          containerPort: 9093
        volumeMounts:
        - name: config-volume
          mountPath: /etc/alertmanager
        - name: alertmanager
          mountPath: /alertmanager
      volumes:
      - name: config-volume
        configMap:
          name: alertmanager
      - name: alertmanager
        emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: 'true'
    prometheus.io/path: '/metrics'
  labels:
    name: alertmanager
  name: alertmanager
  namespace: monitoring
spec:
  selector:
    app: alertmanager
  ports:
  - name: alertmanager
    protocol: TCP
    port: 9093
    targetPort: 9093
This creates our Alertmanager deployment, which delivers all the alerts generated according to the Prometheus rules.
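To verify the routing before any real alert fires, you can push a synthetic alert into Alertmanager's v1 API from inside the cluster (a sketch; the team: devops label matches the devops route above, and curlimages/curl is just a convenient throwaway image):

kubectl run alert-test --rm -it --restart=Never -n monitoring \
    --image=curlimages/curl -- \
    curl -XPOST http://alertmanager:9093/api/v1/alerts -d \
    '[{"labels":{"alertname":"SyntheticTest","team":"devops"},"annotations":{"summary":"Test alert to verify Alertmanager routing"}}]'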
Deploying Kube-state-metrics
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
# kubernetes versions before 1.8.0 should use rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: kube-state-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-state-metrics
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
# kubernetes versions before 1.8.0 should use rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: kube-state-metrics
rules:
- apiGroups: [""]
  resources:
  - configmaps
  - secrets
  - nodes
  - pods
  - services
  - resourcequotas
  - replicationcontrollers
  - limitranges
  - persistentvolumeclaims
  - persistentvolumes
  - namespaces
  - endpoints
  verbs: ["list", "watch"]
- apiGroups: ["extensions"]
  resources:
  - daemonsets
  - deployments
  - replicasets
  verbs: ["list", "watch"]
- apiGroups: ["apps"]
  resources:
  - statefulsets
  verbs: ["list", "watch"]
- apiGroups: ["batch"]
  resources:
  - cronjobs
  - jobs
  verbs: ["list", "watch"]
- apiGroups: ["autoscaling"]
  resources:
  - horizontalpodautoscalers
  verbs: ["list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
# kubernetes versions before 1.8.0 should use rbac.authorization.k8s.io/v1beta1
kind: RoleBinding
metadata:
  name: kube-state-metrics
  namespace: monitoring
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: kube-state-metrics-resizer
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
# kubernetes versions before 1.8.0 should use rbac.authorization.k8s.io/v1beta1
kind: Role
metadata:
  namespace: monitoring
  name: kube-state-metrics-resizer
rules:
- apiGroups: [""]
  resources:
  - pods
  verbs: ["get"]
- apiGroups: ["extensions"]
  resources:
  - deployments
  resourceNames: ["kube-state-metrics"]
  verbs: ["get", "update"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-state-metrics
  namespace: monitoring
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-state-metrics
  namespace: monitoring
spec:
  selector:
    matchLabels:
      k8s-app: kube-state-metrics
  replicas: 1
  template:
    metadata:
      labels:
        k8s-app: kube-state-metrics
    spec:
      serviceAccountName: kube-state-metrics
      containers:
      - name: kube-state-metrics
        image: quay.io/mxinden/kube-state-metrics:v1.4.0-gzip.3
        ports:
        - name: http-metrics
          containerPort: 8080
        - name: telemetry
          containerPort: 8081
        readinessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 5
          timeoutSeconds: 5
      - name: addon-resizer
        image: k8s.gcr.io/addon-resizer:1.8.3
        resources:
          limits:
            cpu: 150m
            memory: 50Mi
          requests:
            cpu: 150m
            memory: 50Mi
        env:
          - name: MY_POD_NAME
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: MY_POD_NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
        command:
          - /pod_nanny
          - --container=kube-state-metrics
          - --cpu=100m
          - --extra-cpu=1m
          - --memory=100Mi
          - --extra-memory=2Mi
          - --threshold=5
          - --deployment=kube-state-metrics
---
apiVersion: v1
kind: Service
metadata:
  name: kube-state-metrics
  namespace: monitoring
  labels:
    k8s-app: kube-state-metrics
  annotations:
    prometheus.io/scrape: 'true'
spec:
  ports:
  - name: http-metrics
    port: 8080
    targetPort: http-metrics
    protocol: TCP
  - name: telemetry
    port: 8081
    targetPort: telemetry
    protocol: TCP
  selector:
    k8s-app: kube-state-metrics
The kube-state-metrics deployment is needed to relay some important container metrics that are not natively exposed by the kubelet and hence are not directly available to Prometheus.
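Once it is running, you can confirm that the metrics our alert rules depend on (kube_deployment_status_replicas, the kube_hpa_* family, and so on) are actually being exported:

# Forward the service and look for one of the metrics used in our rules.
kubectl port-forward svc/kube-state-metrics -n monitoring 8080 &
curl -s http://localhost:8080/metrics | grep kube_deployment_status_replicas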
Deploying the Node-Exporter DaemonSet
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitoring
  labels:
    name: node-exporter
spec:
  template:
    metadata:
      labels:
        name: node-exporter
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9100"
    spec:
      hostPID: true
      hostIPC: true
      hostNetwork: true
      containers:
        - name: node-exporter
          image: prom/node-exporter:v0.16.0
          securityContext:
            privileged: true
          args:
            - --path.procfs=/host/proc
            - --path.sysfs=/host/sys
          ports:
            - containerPort: 9100
              protocol: TCP
          resources:
            limits:
              cpu: 100m
              memory: 100Mi
            requests:
              cpu: 10m
              memory: 100Mi
          volumeMounts:
            - name: dev
              mountPath: /host/dev
            - name: proc
              mountPath: /host/proc
            - name: sys
              mountPath: /host/sys
            - name: rootfs
              mountPath: /rootfs
      volumes:
        - name: proc
          hostPath:
            path: /proc
        - name: dev
          hostPath:
            path: /dev
        - name: sys
          hostPath:
            path: /sys
        - name: rootfs
          hostPath:
            path: /
The node-exporter DaemonSet runs a node-exporter pod on every node and exposes very important node-level metrics that can be pulled by the Prometheus instances.
Deploying Grafana
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
---
apiVersion: storage.k8s.io/v1beta1
kind: StorageClass
metadata:
  name: fast
  namespace: monitoring
provisioner: kubernetes.io/gce-pd
allowVolumeExpansion: true
---
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: grafana
  namespace: monitoring
spec:
  replicas: 1
  serviceName: grafana
  template:
    metadata:
      labels:
        task: monitoring
        k8s-app: grafana
    spec:
      containers:
      - name: grafana
        image: k8s.gcr.io/heapster-grafana-amd64:v5.0.4
        ports:
        - containerPort: 3000
          protocol: TCP
        volumeMounts:
        - mountPath: /etc/ssl/certs
          name: ca-certificates
          readOnly: true
        - mountPath: /var
          name: grafana-storage
        env:
        - name: GF_SERVER_HTTP_PORT
          value: "3000"
          # The following env variables are required to make Grafana accessible via
          # the kubernetes api-server proxy. On production clusters, we recommend
          # removing these env variables, setup auth for grafana, and expose the grafana
          # service using a LoadBalancer or a public IP.
        - name: GF_AUTH_BASIC_ENABLED
          value: "false"
        - name: GF_AUTH_ANONYMOUS_ENABLED
          value: "true"
        - name: GF_AUTH_ANONYMOUS_ORG_ROLE
          value: Admin
        - name: GF_SERVER_ROOT_URL
          # If you're only using the API Server proxy, set this value instead:
          # value: /api/v1/namespaces/kube-system/services/monitoring-grafana/proxy
          value: /
      volumes:
      - name: ca-certificates
        hostPath:
          path: /etc/ssl/certs
  volumeClaimTemplates:
  - metadata:
      name: grafana-storage
      namespace: monitoring
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: fast
      resources:
        requests:
          storage: 5Gi
---
apiVersion: v1
kind: Service
metadata:
  labels:
    kubernetes.io/cluster-service: 'true'
    kubernetes.io/name: grafana
  name: grafana
  namespace: monitoring
spec:
  ports:
  - port: 3000
    targetPort: 3000
  selector:
    k8s-app: grafana
This creates our Grafana deployment and service, which will be exposed using our Ingress object. For Grafana to use our monitoring stack, we should add Thanos-Querier as the data source of our Grafana deployment:
點(diǎn)擊添加數(shù)據(jù)源
Set Name: DS_PROMETHEUS
Set Type: Prometheus
Set URL: http://thanos-querier:9090
保存并測(cè)試?,F(xiàn)在你可以構(gòu)建你的自定義dashboard或從grafana.net簡(jiǎn)單導(dǎo)入dashboard。Dashboard #315和#1471都非常適合入門。
Deploying the Ingress object
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: monitoring-ingress
  namespace: monitoring
  annotations:
    kubernetes.io/ingress.class: "nginx"
spec:
  rules:
  - host: grafana.<yourdomain>.com
    http:
      paths:
      - path: /
        backend:
          serviceName: grafana
          servicePort: 3000
  - host: prometheus-0.<yourdomain>.com
    http:
      paths:
      - path: /
        backend:
          serviceName: prometheus-0-service
          servicePort: 8080
  - host: prometheus-1.<yourdomain>.com
    http:
      paths:
      - path: /
        backend:
          serviceName: prometheus-1-service
          servicePort: 8080
  - host: prometheus-2.<yourdomain>.com
    http:
      paths:
      - path: /
        backend:
          serviceName: prometheus-2-service
          servicePort: 8080
  - host: alertmanager.<yourdomain>.com
    http:
      paths:
      - path: /
        backend:
          serviceName: alertmanager
          servicePort: 9093
  - host: thanos-querier.<yourdomain>.com
    http:
      paths:
      - path: /
        backend:
          serviceName: thanos-querier
          servicePort: 9090
  - host: thanos-ruler.<yourdomain>.com
    http:
      paths:
      - path: /
        backend:
          serviceName: thanos-ruler
          servicePort: 9090
This is the final piece of the puzzle. It exposes all of our services outside the Kubernetes cluster and lets us access them. Make sure you replace <yourdomain> with a domain name you control, and point the Ingress controller's service at it.
You should now be able to access Thanos Querier at http://thanos-querier.<yourdomain>.com. It will look like this:
Make sure deduplication is checked.
如果你點(diǎn)擊Store,可以看到所有由thanos-store-gateway
服務(wù)發(fā)現(xiàn)的活動(dòng)端點(diǎn)。
You can now add Thanos Querier as the data source in Grafana and start creating dashboards:
Kubernetes cluster monitoring dashboard
Kubernetes node monitoring dashboard
Integrating Thanos with Prometheus undoubtedly provides the ability to scale Prometheus horizontally, and since Thanos Querier can pull metrics from other Querier instances, you can effectively pull metrics across clusters and visualize them in a single dashboard.
We are also able to archive metric data in object storage, which gives our monitoring system unlimited storage while serving metrics from the object storage itself. A major part of the cost of this setup comes from the object storage (S3 or GCS), and it can be reduced further by applying appropriate retention policies.
然而,實(shí)現(xiàn)這一切需要你進(jìn)行大量的配置。
That wraps up how to do highly available Kubernetes monitoring with Prometheus and Thanos. We hope the content above is helpful and that you learned something new. If you found this article useful, please share it so more people can see it.
免責(zé)聲明:本站發(fā)布的內(nèi)容(圖片、視頻和文字)以原創(chuàng)、轉(zhuǎn)載和分享為主,文章觀點(diǎn)不代表本網(wǎng)站立場(chǎng),如果涉及侵權(quán)請(qǐng)聯(lián)系站長(zhǎng)郵箱:is@yisu.com進(jìn)行舉報(bào),并提供相關(guān)證據(jù),一經(jīng)查實(shí),將立刻刪除涉嫌侵權(quán)內(nèi)容。