What Alerting Rules Are Available for Prometheus

Published: 2021-11-19 11:11:04 | Source: 億速云 | Views: 245 | Author: iii | Category: Cloud Computing


When setting up system monitoring, have you ever racked your brain and still ended up with incomplete coverage, or not known how to obtain the metrics you need?

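Awesome Prometheus alerts maintains a collection of ready-to-use Prometheus alerting rules, more than 300 in total. It also explains how to obtain the metrics each rule depends on. The rules are generic enough to apply to any Prometheus deployment.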

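The rules cover basic resources such as hosts, hardware, and containers, as well as databases, message brokers, runtimes, reverse proxies, load balancers, and service orchestration, and extend to the network layer and to Prometheus itself and its clusters. Installing and configuring Prometheus is not covered in this article. Below is a quick look at a few commonly used rules.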

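Hosts and hardware

Host and hardware alerts rely on the metrics exposed by node-exporter. For example:

Out of memory

The alert fires when available memory stays below the 10% threshold for 2 minutes.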

  - alert: HostOutOfMemory
    expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 10
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: Host out of memory (instance {{ $labels.instance }})
      description: "Node memory is filling up (< 10% left)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
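
The snippets in this article are list items that belong under the rules key of a Prometheus rule group. As a minimal sketch, assuming a hypothetical file name host-alerts.yml and a hypothetical group name host, the memory rule above could be placed in a rule file like this:

```yaml
# host-alerts.yml (hypothetical file name, chosen for illustration)
groups:
  - name: host            # hypothetical group name
    rules:
      - alert: HostOutOfMemory
        expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 10
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: Host out of memory (instance {{ $labels.instance }})
          description: "Node memory is filling up (< 10% left)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
```

The file is then listed under rule_files in prometheus.yml, and it can be validated with promtool check rules host-alerts.yml before Prometheus is reloaded.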

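Unusual host network throughput

The alert fires when the inbound traffic rate, averaged over the last 2 minutes, stays above 100 MB/s for 5 minutes.

See the Prometheus documentation for the syntax of the rate function.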

  - alert: HostUnusualNetworkThroughputIn
    expr: sum by (instance) (rate(node_network_receive_bytes_total[2m])) / 1024 / 1024 > 100
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: Host unusual network throughput in (instance {{ $labels.instance }})
      description: "Host network interfaces are probably receiving too much data (> 100 MB/s)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
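
node-exporter also exposes a counter for transmitted bytes, node_network_transmit_bytes_total, so the same pattern can be applied to outbound traffic. The following is a sketch only: the 100 MB/s threshold is carried over from the inbound rule as an assumption, not a recommendation.

```yaml
  - alert: HostUnusualNetworkThroughputOut
    # Same shape as the inbound rule, using the transmit counter instead.
    # The threshold of 100 is an assumption copied from the inbound rule.
    expr: sum by (instance) (rate(node_network_transmit_bytes_total[2m])) / 1024 / 1024 > 100
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: Host unusual network throughput out (instance {{ $labels.instance }})
      description: "Host network interfaces are probably sending too much data (> 100 MB/s)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
```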

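MySQL

MySQL alerts rely on the metrics exposed by prometheus/mysqld_exporter.

Too many connections

The alert fires when the number of connections on a MySQL instance over the last minute exceeds 80% of the configured maximum.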

  - alert: MysqlTooManyConnections(>80%)
    expr: avg by (instance) (max_over_time(mysql_global_status_threads_connected[1m])) / avg by (instance) (mysql_global_variables_max_connections) * 100 > 80
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: MySQL too many connections (> 80%) (instance {{ $labels.instance }})
      description: "More than 80% of MySQL connections are in use on {{ $labels.instance }}\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
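
If the connection usage percentage is also needed elsewhere, for example on a dashboard, it can be precomputed with a recording rule. The following is a sketch only: the record name instance:mysql_connections_used:percent is a hypothetical name chosen for illustration, and the expression uses the current value of the connection gauge rather than a windowed one.

```yaml
  # Recording rule (hypothetical name): percentage of max_connections currently in use.
  - record: instance:mysql_connections_used:percent
    expr: avg by (instance) (mysql_global_status_threads_connected) / avg by (instance) (mysql_global_variables_max_connections) * 100
```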

Slow queries

The alert fires when the number of slow queries has grown over the last minute and that condition holds for 2 minutes.

  - alert: MysqlSlowQueries
    expr: increase(mysql_global_status_slow_queries[1m]) > 0
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: MySQL slow queries (instance {{ $labels.instance }})
      description: "MySQL server has some new slow queries.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

JVM runtime

Surprisingly, there is only a single alert for the JVM runtime. It fires when heap usage exceeds 80%.

It relies on the metrics exposed by java-client.

  - alert: JvmMemoryFillingUp
    expr: (sum by (instance)(jvm_memory_used_bytes{area="heap"}) / sum by (instance)(jvm_memory_max_bytes{area="heap"})) * 100 > 80
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: JVM memory filling up (instance {{ $labels.instance }})
      description: "JVM memory is filling up (> 80%)\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

Kubernetes

There are 33 Kubernetes-related alerting rules, a fairly rich set.

Here is a common one: the container OOM alert.

  - alert: KubernetesContainerOomKiller
    expr: (kube_pod_container_status_restarts_total - kube_pod_container_status_restarts_total offset 10m >= 1) and ignoring (reason) min_over_time(kube_pod_container_status_last_terminated_reason{reason="OOMKilled"}[10m]) == 1
    for: 0m
    labels:
      severity: warning
    annotations:
      summary: Kubernetes container oom killer (instance {{ $labels.instance }})
      description: "Container {{ $labels.container }} in pod {{ $labels.namespace }}/{{ $labels.pod }} has been OOMKilled {{ $value }} times in the last 10 minutes.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

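SSL certificate expiry

Using the exported metrics, you can monitor certificate expiry: the alert fires when a certificate will expire within the next 7 days.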

  - alert: SslCertificateExpiry(<7Days)
    expr: ssl_verified_cert_not_after{chain_no="0"} - time() < 86400 * 7
    for: 0m
    labels:
      severity: warning
    annotations:
      summary: SSL certificate expiry (< 7 days) (instance {{ $labels.instance }})
      description: "{{ $labels.instance }} Certificate is expiring in 7 days\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
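
The value of this expression is the number of seconds left before the certificate expires, so an annotation can show it in a more readable form with the humanizeDuration template function. A sketch of an alternative description line for the rule above, with wording that is illustrative only:

```yaml
    annotations:
      # $value is the remaining lifetime in seconds; humanizeDuration renders it as a readable duration.
      description: "Certificate for {{ $labels.instance }} expires in {{ $value | humanizeDuration }}"
```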
