DingTalk alerting with Prometheus + Grafana + Alertmanager on k8s

Published: 2020-05-31 02:12:09  Source: Internet  Views: 12403  Author: 李永峰Billy  Category: Cloud Computing


1. Overview

Alertmanager and Prometheus are two separate components. The Prometheus server evaluates its alerting rules and sends the resulting alerts to Alertmanager, which applies silencing, inhibition, and aggregation and then sends notifications through channels such as email, DingTalk, and HipChat.

Alertmanager handles alerts sent by clients such as the Prometheus server. It deduplicates and groups them, then routes them to the correct receiver, for example email, Slack, or DingTalk. Alertmanager also supports grouping, silencing, and inhibition of alerts.
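To make deduplication and grouping more concrete, here is a minimal Python sketch. It is only an illustration of the idea, not how Alertmanager is implemented; the function name `group_alerts` and the sample alerts are hypothetical.

```python
from collections import defaultdict

def group_alerts(alerts, group_by):
    """Deduplicate alerts by label set, then bucket them by the group_by labels."""
    seen = set()
    groups = defaultdict(list)
    for alert in alerts:
        # Two alerts with identical label sets are treated as the same alert.
        fingerprint = tuple(sorted(alert["labels"].items()))
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        # Alerts sharing the same values for the group_by labels go into one notification.
        key = tuple(alert["labels"].get(name, "") for name in group_by)
        groups[key].append(alert)
    return dict(groups)

# Hypothetical alerts, used only for illustration.
alerts = [
    {"labels": {"alertname": "HighCPU", "job": "node", "instance": "a"}},
    {"labels": {"alertname": "HighCPU", "job": "node", "instance": "a"}},  # duplicate
    {"labels": {"alertname": "HighCPU", "job": "node", "instance": "b"}},
    {"labels": {"alertname": "PodDown", "job": "kubelet", "instance": "c"}},
]
grouped = group_alerts(alerts, group_by=["job"])
for key, members in grouped.items():
    print(key, len(members))
```

With `group_by=["job"]`, the two distinct `node` alerts end up in one group and the `kubelet` alert in another, so the receiver would get two notifications instead of four.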

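DingTalk is an internal messaging tool that nearly everyone runs on both computer and phone, so messages are seen right away. Because alert notifications need to arrive promptly, DingTalk is a good fit for them.

2. Add a DingTalk robot

See the official documentation: Custom robots

After adding the robot, copy its hook URL (it seems robots can only be added inside a DingTalk group); it is needed for the deployment later.

Robot hook: https://oapi.dingtalk.com/robot/send?access_token=xxxxxx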
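Before wiring the hook into Alertmanager, it can help to check that the robot accepts messages at all. The sketch below assumes the plain `text` message type from the DingTalk custom robot documentation; the helper names `build_text_message` and `send_to_robot` are hypothetical, and the actual send is left commented out because it needs a real access token.

```python
import json
import urllib.request

def build_text_message(content):
    """Return the JSON body for a DingTalk custom-robot text message."""
    return json.dumps({"msgtype": "text", "text": {"content": content}}).encode("utf-8")

def send_to_robot(hook_url, body):
    """POST the body to the robot hook and return the raw response text."""
    req = urllib.request.Request(
        hook_url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.read().decode("utf-8")

body = build_text_message("test message from the monitoring setup")
print(body.decode("utf-8"))
# To actually send it, pass the hook URL copied from DingTalk:
# send_to_robot("https://oapi.dingtalk.com/robot/send?access_token=xxxxxx", body)
```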

3. Configure Alertmanager

Official Alertmanager documentation: https://github.com/prometheus/docs/blob/db2a09a8a7e193d6e474f37055908a6d432b88b5/content/docs/alerting/configuration.md#webhook_config

Edit the Alertmanager alerting configuration. The official documentation above already describes every parameter in detail, so they are not explained one by one here.

[root@node-01 prometheus]# vim prometheus-operator/values.yaml 

  config:
    global:
      resolve_timeout: 2m
    route:
      group_by: ['job']
      group_wait: 30s
      group_interval: 2m
      repeat_interval: 12h
      receiver: 'webhook'
      routes:
      - match:
          alertname: DeadMansSwitch
        receiver: 'webhook'
    receivers:
    - name: 'webhook'
      webhook_configs:
      - url: http://webhook-dingtalk/dingtalk/send/
        send_resolved: true
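The `route` section can be read as "try the child routes first, otherwise use the top-level receiver". The following Python sketch illustrates that reading under simplifying assumptions: it only handles exact `match` entries on one level and ignores options such as `match_re`, `continue`, and nested routes. The function `pick_receiver` is hypothetical and is not part of Alertmanager.

```python
def pick_receiver(route, labels):
    """Return the receiver of the first child route whose match labels all equal
    the alert's labels, falling back to the top-level receiver."""
    for child in route.get("routes", []):
        match = child.get("match", {})
        if all(labels.get(k) == v for k, v in match.items()):
            return child["receiver"]
    return route["receiver"]

# The routing tree from the values.yaml fragment above, written as a dict.
route = {
    "receiver": "webhook",
    "routes": [
        {"match": {"alertname": "DeadMansSwitch"}, "receiver": "webhook"},
    ],
}
print(pick_receiver(route, {"alertname": "DeadMansSwitch", "severity": "none"}))
print(pick_receiver(route, {"alertname": "SomethingElse"}))
```

In this configuration both calls resolve to `webhook`, because the child route and the default route name the same receiver; the child route only starts to matter once it points somewhere else.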

Upgrade prometheus-operator

[root@node-01 prometheus]# helm upgrade p ./prometheus-operator

Once the change succeeds, the new configuration is visible on the Alertmanager status page.

4. Convert the Alertmanager data format

Alertmanager sends data in the following JSON format to the endpoint via an HTTP POST request:

{
  "version": "4",
  "groupKey": <string>,    // key identifying the group of alerts (e.g. to deduplicate)
  "status": "<resolved|firing>",
  "receiver": <string>,
  "groupLabels": <object>,
  "commonLabels": <object>,
  "commonAnnotations": <object>,
  "externalURL": <string>,  // backlink to the Alertmanager.
  "alerts": [
    {
      "labels": <object>,
      "annotations": <object>,
      "startsAt": "<rfc3339>",
      "endsAt": "<rfc3339>"
    },
    ...
  ]
}
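As a rough illustration of how a receiver might read this body, the sketch below parses a payload and counts its alerts by status. The function name `summarize` and the sample data are hypothetical; the per-alert `status` field is taken from the test data shown in this article, with a fallback to the group-level `status` in case it is absent.

```python
import json
from collections import Counter

def summarize(payload_bytes):
    """Parse a webhook body and count its alerts by status."""
    payload = json.loads(payload_bytes.decode("utf-8"))
    counts = Counter(
        alert.get("status", payload["status"]) for alert in payload["alerts"]
    )
    return payload["receiver"], payload["status"], dict(counts)

# Hypothetical payload with one firing and one resolved alert.
sample = json.dumps({
    "version": "4",
    "status": "firing",
    "receiver": "webhook",
    "alerts": [
        {"status": "firing", "labels": {"alertname": "A"}, "annotations": {}},
        {"status": "resolved", "labels": {"alertname": "B"}, "annotations": {}},
    ],
}).encode("utf-8")
print(summarize(sample))
```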

Here is an example of the test alert data:

b'{
"receiver":"webhook",
"status":"firing",
"alerts":[{
    "status":"firing",
    "labels":{
        "alertname":"DeadMansSwitch",
        "prometheus":"monitoring/p-prometheus",
        "severity":"none"

    },
    "annotations":{
        "message":"This is a DeadMansSwitch meant to ensure that the entire alerting pipeline is functional."

    },
    "startsAt":"2019-03-08T10:02:28.680317737Z",
    "endsAt":"0001-01-01T00:00:00Z",
    "generatorURL":"http://prom.cnlinux.club/graph?g0.expr=vector%281%29\\u0026g0.tab=1"

}],
"groupLabels":{},
"commonLabels":{
    "alertname":"DeadMansSwitch",
    "prometheus":"monitoring/p-prometheus",
    "severity":"none"

},
"commonAnnotations":{
"message":"This is a DeadMansSwitch meant to ensure that the entire alerting pipeline is functional."

},
"externalURL":"http://alert.cnlinux.club","version":"4",
"groupKey":"{}/{alertname=\\"DeadMansSwitch\\"}:{}"}\n' 

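DingTalk has requirements on the data format (see the official DingTalk documentation above), so the data sent by Alertmanager has to be converted.

Below we use a Python script of our own to do the conversion.

Script notes:

  • In the data sent by Alertmanager, the important part is the labels{} object, but it contains a lot of entries, many of which are not needed in the alert message. The script therefore defines an EXCLUDE_LIST list to filter out the unneeded ones.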
[root@node-01 prometheus]# cat app.py
#!/usr/bin/env python
import io, sys

sys.stdout = io.TextIOWrapper(sys.stdout.detach(), encoding='utf-8')
sys.stderr = io.TextIOWrapper(sys.stderr.detach(), encoding='utf-8')

from flask import Flask, Response
from flask import request
import requests
import logging
import json
import locale
#locale.setlocale(locale.LC_ALL,"en_US.UTF-8")

app = Flask(__name__)

console = logging.StreamHandler()
fmt = '%(asctime)s - %(filename)s:%(lineno)s - %(name)s - %(message)s'
formatter = logging.Formatter(fmt)
console.setFormatter(formatter)
log = logging.getLogger("flask_webhook_dingtalk")
log.addHandler(console)
log.setLevel(logging.DEBUG)

EXCLUDE_LIST = ['prometheus', 'endpoint']

@app.route('/')
def index():
    return 'Webhook Dingtalk by Billy https://blog.51cto.com/billy98'

@app.route('/dingtalk/send/', methods=['POST'])
def handler_session():
    # The DingTalk robot hook URL is passed as the first command-line argument.
    profile_url = sys.argv[1]
    # Alertmanager POSTs a JSON document; only the first alert of the group is forwarded.
    post_data = json.loads(request.get_data().decode('utf-8'))['alerts'][0]
    messa_list = []
    messa_list.append('### Alert status: %s' % post_data['status'].upper())
    messa_list.append('**startsAt:** %s' % post_data['startsAt'])
    for key, value in post_data['labels'].items():
        # Skip labels that only add noise to the notification.
        if key in EXCLUDE_LIST:
            continue
        messa_list.append('**%s:** %s' % (key, value))
    # Not every alerting rule defines a "message" annotation.
    messa_list.append('**Describe:** %s' % post_data['annotations'].get('message', ''))

    messa = ' \n\n > '.join(messa_list)
    status = alert_data(messa, post_data['labels']['alertname'], profile_url)
    log.info(status)
    return status

def alert_data(data, title, profile_url):
    headers = {'Content-Type': 'application/json'}
    # json.dumps escapes quotes and newlines, so label values cannot break the payload.
    send_data = json.dumps({'msgtype': 'markdown', 'markdown': {'title': title, 'text': data}})
    reps = requests.post(url=profile_url, data=send_data.encode('utf-8'), headers=headers)
    return reps.text

if __name__ == '__main__':
    app.debug = False
    app.run(host='0.0.0.0', port=8080)
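The script forwards only the first alert of each group. If a group can contain several alerts, one possible variation is to format every alert and join the results into a single markdown text. The sketch below is an assumption about how that could look, not the author's code; `format_alert`, `format_group`, and the sample payload are hypothetical, while `EXCLUDE_LIST` mirrors the list in the script.

```python
EXCLUDE_LIST = ['prometheus', 'endpoint']

def format_alert(alert):
    """Render one alert as DingTalk-style markdown, skipping excluded labels."""
    lines = [
        '### Alert status: %s' % alert['status'].upper(),
        '**startsAt:** %s' % alert['startsAt'],
    ]
    for key, value in alert['labels'].items():
        if key not in EXCLUDE_LIST:
            lines.append('**%s:** %s' % (key, value))
    # The "message" annotation is optional, so only add it when present.
    message = alert.get('annotations', {}).get('message')
    if message:
        lines.append('**Describe:** %s' % message)
    return ' \n\n > '.join(lines)

def format_group(payload):
    """Render every alert in the webhook payload, separated by blank lines."""
    return '\n\n'.join(format_alert(alert) for alert in payload['alerts'])

# Hypothetical payload with two alerts, the second without a message annotation.
sample = {
    'alerts': [
        {'status': 'firing', 'startsAt': '2019-03-08T10:02:28Z',
         'labels': {'alertname': 'DeadMansSwitch', 'prometheus': 'monitoring/p-prometheus'},
         'annotations': {'message': 'Pipeline check.'}},
        {'status': 'resolved', 'startsAt': '2019-03-08T10:05:00Z',
         'labels': {'alertname': 'Example', 'endpoint': 'http'},
         'annotations': {}},
    ]
}
text = format_group(sample)
print(text)
```

The resulting `text` could be passed to `alert_data` in place of the single-alert message.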

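5. Build the Docker image

Package the Python script above into an image and run it as a service in the k8s cluster, so that it stays highly available.

You can also use the image I have already built; just pull it: docker pull billy98/webhook-dingtalk:latest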

[root@node-01 prometheus]# cat Dockerfile

FROM centos:7 as build
MAINTAINER billy98 5884625@qq.com

RUN curl -o /etc/yum.repos.d/epel.repo http://mirrors.aliyun.com/repo/epel-7.repo && yum install -y python36 python36-pip && pip3.6 install flask requests werkzeug
ADD app.py /usr/local/alert-dingtalk.py

FROM gcr.io/distroless/python3
COPY --from=build /usr/local/alert-dingtalk.py /usr/local/alert-dingtalk.py
COPY --from=build /usr/local/lib64/python3.6/site-packages /usr/local/lib64/python3.6/site-packages
COPY --from=build /usr/local/lib/python3.6/site-packages /usr/local/lib/python3.6/site-packages
ENV PYTHONPATH=/usr/local/lib/python3.6/site-packages:/usr/local/lib64/python3.6/site-packages
EXPOSE 8080
ENTRYPOINT ["python","/usr/local/alert-dingtalk.py"]
[root@node-01 prometheus]# docker build -t billy98/webhook-dingtalk:latest .

The image built this way is only a little over 50 MB. For details on how to use distroless, see:

distroless: https://github.com/GoogleContainerTools/distroless

6. Deploy webhook-dingtalk

[root@node-01 prometheus]# cat webhook-dingtalk.yaml 
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: webhook-dingtalk
  name: webhook-dingtalk
  namespace: monitoring
  # must be in the same namespace as Alertmanager
spec:
  replicas: 1
  selector:
    matchLabels:
      app: webhook-dingtalk
  template:
    metadata:
      labels:
        app: webhook-dingtalk
    spec:
      containers:
      - image: billy98/webhook-dingtalk:latest
        imagePullPolicy: IfNotPresent
        name: webhook-dingtalk
        args:
        - "https://oapi.dingtalk.com/robot/send?access_token=xxxxxx"
        # the DingTalk robot hook created above
        ports:
        - containerPort: 8080
          protocol: TCP
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
          limits:
            cpu: 500m
            memory: 500Mi
        livenessProbe:
          failureThreshold: 3
          initialDelaySeconds: 30
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
          tcpSocket:
            port: 8080
        readinessProbe:
          failureThreshold: 3
          initialDelaySeconds: 30
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
          httpGet:
            port: 8080
            path: /
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: webhook-dingtalk
  name: webhook-dingtalk
  namespace: monitoring
  # must be in the same namespace as Alertmanager
spec:
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 8080
  selector:
    app: webhook-dingtalk
  type: ClusterIP 
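Once the Deployment and Service are running, the conversion service can be exercised without waiting for a real alert. The sketch below builds a small Alertmanager-style body and shows how it could be posted; the helper names and the `SmokeTest` alert are hypothetical. The service URL is the one used in the Alertmanager configuration earlier and is assumed to resolve only from a pod in the same namespace, so the actual call is left commented out.

```python
import json
import urllib.request

def build_test_payload(status="firing"):
    """Build a minimal Alertmanager-style webhook body with a single alert."""
    alert = {
        "status": status,
        "labels": {"alertname": "SmokeTest", "severity": "none"},
        "annotations": {"message": "Manual test of the webhook-dingtalk service."},
        "startsAt": "2019-03-08T10:02:28Z",
        "endsAt": "0001-01-01T00:00:00Z",
    }
    return {"version": "4", "receiver": "webhook", "status": status, "alerts": [alert]}

def post_payload(url, payload):
    """POST the payload as JSON and return the HTTP status and response text."""
    body = json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status, resp.read().decode("utf-8")

payload = build_test_payload()
print(json.dumps(payload, indent=2))
# From a pod in the monitoring namespace:
# post_payload("http://webhook-dingtalk/dingtalk/send/", payload)
```

If the service works, the response text is whatever DingTalk returned to the script, and a message should appear in the group where the robot was added.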

The alert message in DingTalk looks like this:

[Screenshots: firing alert notifications in DingTalk]

The message sent when an alert is resolved:

[Screenshot: resolved alert notification in DingTalk]

That completes all the steps.

If you run into problems, feel free to leave a comment below. Follows and likes are appreciated, thank you!
