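Prometheus Operator is arguably the best practice for running a monitoring system. It builds the whole monitoring stack in one step and lets you configure things such as monitoring data sources through non-intrusive means, with automatic failure recovery, highly available alerting, and so on.
For newcomers there is still a small entry barrier, though. This article uses monitoring Envoy as an example to show the right way to use Prometheus Operator.
How to write alerting rules and Prometheus queries is not the focus here and will be covered in later articles; this article concentrates on how to use Prometheus Operator itself.
The sealyun offline installation package already includes Prometheus Operator, so it can be used right after installation.
How it works: the operator's CRD is used to discover the Service that exposes the monitoring data.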
apiVersion: apps/v1
kind: Deployment
metadata:
  name: envoy
  labels:
    app: envoy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: envoy
  template:
    metadata:
      labels:
        app: envoy
    spec:
      volumes:
      - hostPath:   # the Envoy config file is mounted from the host to make editing easy
          path: /root/envoy
          type: DirectoryOrCreate
        name: envoy
      containers:
      - name: envoy
        volumeMounts:
        - mountPath: /etc/envoy
          name: envoy
          readOnly: true
        image: envoyproxy/envoy:latest
        ports:
        - containerPort: 10000 # data port
        - containerPort: 9901  # admin port; metrics are exposed through it
---
kind: Service
apiVersion: v1
metadata:
  name: envoy
  labels:
    app: envoy  # label the Service; the operator finds the Service by this label
spec:
  selector:
    app: envoy
  ports:
  - protocol: TCP
    port: 80
    targetPort: 10000
    name: user
  - protocol: TCP   # port through which the Service exposes metrics
    port: 81
    targetPort: 9901
    name: metrics   # the name matters: the ServiceMonitor looks the port up by name
Envoy configuration file:
The admin listen address must be changed to 0.0.0.0; otherwise the metrics cannot be fetched through the Service.
/root/envoy/envoy.yaml
admin:
  access_log_path: /tmp/admin_access.log
  address:
    socket_address:
      protocol: TCP
      address: 0.0.0.0   # must be 0.0.0.0, not 127.0.0.1
      port_value: 9901
static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address:
        protocol: TCP
        address: 0.0.0.0
        port_value: 10000
    filter_chains:
    - filters:
      - name: envoy.http_connection_manager
        config:
          stat_prefix: ingress_http
          route_config:
            name: local_route
            virtual_hosts:
            - name: local_service
              domains: ["*"]
              routes:
              - match:
                  prefix: "/"
                route:
                  host_rewrite: sealyun.com
                  cluster: service_sealyun   # must match a cluster name defined under clusters
          http_filters:
          - name: envoy.router
  clusters:
  - name: service_sealyun
    connect_timeout: 0.25s
    type: LOGICAL_DNS
    # Comment out the following line to test on v6 networks
    dns_lookup_family: V4_ONLY
    lb_policy: ROUND_ROBIN
    hosts:
    - socket_address:
        address: sealyun.com
        port_value: 443
    tls_context: { sni: sealyun.com }
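Before wiring up Prometheus, it can help to confirm by hand that the admin port really answers through the Service. The following is only a sketch: metrics_url and count_samples are hypothetical helper names, and the address assumes the Service above (name envoy, namespace default, service port 81) resolved through in-cluster DNS.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking the Envoy metrics endpoint by hand.

# Print the URL that a scrape of the Service port would hit.
# Args: service name, namespace, service port.
metrics_url() {
  printf 'http://%s.%s.svc:%s/stats/prometheus\n' "$1" "$2" "$3"
}

# Count sample lines in Prometheus text-format input on stdin,
# skipping comment lines (# HELP / # TYPE) and blank lines.
count_samples() {
  grep -cv -e '^#' -e '^$'
}

# Example, from a pod inside the cluster that has curl installed:
#   curl -s "$(metrics_url envoy default 81)" | count_samples
```

If the count is zero or curl cannot connect, the admin address in envoy.yaml (127.0.0.1 instead of 0.0.0.0) is the first thing to check.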
envoyServiceMonitor.yaml:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app: envoy
  name: envoy
  namespace: monitoring  # does not have to be the Service's namespace
spec:
  endpoints:
  - interval: 15s
    port: metrics            # port name on the envoy Service
    path: /stats/prometheus  # metrics path
  namespaceSelector:
    matchNames:              # namespace where the envoy Service lives
    - default
  selector:
    matchLabels:
      app: envoy             # selects the envoy Service
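The ServiceMonitor only produces a target when three things line up: the namespace listed under namespaceSelector, the Service label under selector, and the port name under endpoints. A quick way to check the last two is sketched below; has_port_name and envoy_service_port_names are hypothetical helper names, and the kubectl call assumes access to the cluster.

```shell
#!/usr/bin/env bash
# Succeeds if the wanted port name ($1) appears in a
# space-separated list of names read from stdin.
has_port_name() {
  tr ' ' '\n' | grep -qx -- "$1"
}

# List the port names of all Services labelled app=envoy in the
# default namespace (needs cluster access).
envoy_service_port_names() {
  kubectl get svc -n default -l app=envoy \
    -o jsonpath='{.items[*].spec.ports[*].name}'
}

# Example:
#   envoy_service_port_names | has_port_name metrics && echo "port name matches"
```

An empty list means the label selector matches no Service; a list without metrics means the port name in the ServiceMonitor and the Service disagree.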
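Once the create succeeds, the envoy data source (target) shows up in Prometheus.
The metrics are then visible as well.
After that you can set things up in Grafana; how to use Prometheus itself is outside the scope of this article.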
[root@dev-86-201 envoy]# kubectl get secret -n monitoring
NAME TYPE DATA AGE
alertmanager-main Opaque 1 27d
We can see this secret; let's look at what it contains:
[root@dev-86-201 envoy]# kubectl get secret alertmanager-main -o yaml -n monitoring
apiVersion: v1
data:
alertmanager.yaml: Imdsb2JhbCI6IAogICJyZXNvbHZlX3RpbWVvdXQiOiAiNW0iCiJyZWNlaXZlcnMiOiAKLSAibmFtZSI6ICJudWxsIgoicm91dGUiOiAKICAiZ3JvdXBfYnkiOiAKICAtICJqb2IiCiAgImdyb3VwX2ludGVydmFsIjogIjVtIgogICJncm91cF93YWl0IjogIjMwcyIKICAicmVjZWl2ZXIiOiAibnVsbCIKICAicmVwZWF0X2ludGVydmFsIjogIjEyaCIKICAicm91dGVzIjogCiAgLSAibWF0Y2giOiAKICAgICAgImFsZXJ0bmFtZSI6ICJEZWFkTWFuc1N3aXRjaCIKICAgICJyZWNlaXZlciI6ICJudWxsIg==
kind: Secret
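The value under data is base64-encoded. One way to read it without copying the string by hand is sketched below; decode_b64 and show_alertmanager_config are hypothetical helper names, and base64 -d is the GNU coreutils spelling (some other platforms use a different flag).

```shell
#!/usr/bin/env bash
# Decode base64 text read from stdin.
decode_b64() {
  base64 -d
}

# Fetch the alertmanager.yaml key from the secret and decode it
# (needs cluster access). The dot in the key name is escaped for jsonpath.
show_alertmanager_config() {
  kubectl get secret alertmanager-main -n monitoring \
    -o jsonpath='{.data.alertmanager\.yaml}' | decode_b64
}

# Local demonstration on the first 12 characters of the value shown above.
printf 'Imdsb2JhbCI6' | decode_b64   # prints "global":
echo
```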
Base64-decode it:
"global":
  "resolve_timeout": "5m"
"receivers":
- "name": "null"
"route":
  "group_by":
  - "job"
  "group_interval": "5m"
  "group_wait": "30s"
  "receiver": "null"
  "repeat_interval": "12h"
  "routes":
  - "match":
      "alertname": "DeadMansSwitch"
    "receiver": "null"
So configuring Alertmanager is very simple: it only takes creating a secret.
For example, alertmanager.yaml:
global:
  smtp_smarthost: 'smtp.qq.com:465'
  smtp_from: '474785153@qq.com'
  smtp_auth_username: '474785153@qq.com'
  smtp_auth_password: 'xxx'  # authorization code generated after enabling SMTP; see below
  smtp_require_tls: false
route:
  group_by: ['alertmanager','cluster','service']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 3h
  receiver: 'fanux'
  routes:
  - receiver: 'fanux'
receivers:
- name: 'fanux'
  email_configs:
  - to: '474785153@qq.com'
    send_resolved: true
Delete the old secret and recreate it from your own configuration:
kubectl delete secret alertmanager-main -n monitoring
kubectl create secret generic alertmanager-main --from-file=alertmanager.yaml -n monitoring
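As an alternative to delete plus create, the secret can be replaced in a single step so that it never goes missing in between. This is a sketch under assumptions: has_route_and_receivers and replace_alertmanager_secret are hypothetical helper names, the grep check is only a crude sanity test rather than real validation, and --dry-run=client requires a reasonably recent kubectl (older releases spell it --dry-run).

```shell
#!/usr/bin/env bash
# Crude sanity check: succeeds if the file ($1) has top-level
# route: and receivers: keys.
has_route_and_receivers() {
  grep -q '^route:' "$1" && grep -q '^receivers:' "$1"
}

# Render the secret locally and apply it over the existing one
# (needs cluster access). Arg: path to the Alertmanager config file.
replace_alertmanager_secret() {
  has_route_and_receivers "$1" || {
    echo "config looks incomplete: $1" >&2
    return 1
  }
  kubectl create secret generic alertmanager-main -n monitoring \
    --from-file=alertmanager.yaml="$1" --dry-run=client -o yaml |
    kubectl apply -f -
}

# Example:
#   replace_alertmanager_secret alertmanager.yaml
```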
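Enabling the SMTP/POP3 service
Follow the mail provider's prompts; at the end a dialog shows an authorization code, which goes into the configuration file above as smtp_auth_password.
After that, alerts arrive by email.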
Prometheus Operator defines the PrometheusRule CRD to describe alerting rules:
[root@dev-86-202 shell]# kubectl get PrometheusRule -n monitoring
NAME AGE
prometheus-k8s-rules 6m
You can edit this rule directly, or create a PrometheusRule of your own:
kubectl edit PrometheusRule prometheus-k8s-rules -n monitoring
For example, add an alert to the groups:
spec:
  groups:
  - name: ./example.rules
    rules:
    - alert: ExampleAlert
      expr: vector(1)
  - name: k8s.rules
    rules:
Restart the Prometheus pods:
kubectl delete pod prometheus-k8s-0 prometheus-k8s-1 -n monitoring
The newly added rule then shows up in the UI.
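For the other route mentioned above, creating a separate PrometheusRule instead of editing the existing one, a minimal sketch follows. The helper name write_example_rule and the object name example-rules are hypothetical, and the two labels are an assumption: they have to match whatever ruleSelector the Prometheus object in your cluster uses, so check that first.

```shell
#!/usr/bin/env bash
# Write a standalone PrometheusRule manifest to the given path ($1).
write_example_rule() {
  cat > "$1" <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-rules        # hypothetical name
  namespace: monitoring
  labels:                    # assumed; must match the Prometheus ruleSelector
    prometheus: k8s
    role: alert-rules
spec:
  groups:
  - name: ./example.rules
    rules:
    - alert: ExampleAlert
      expr: vector(1)
EOF
}

# Example (needs cluster access):
#   kubectl get prometheus -n monitoring -o jsonpath='{.items[*].spec.ruleSelector}'
#   write_example_rule example-rule.yaml && kubectl apply -f example-rule.yaml
```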
For discussion, join QQ group 98488045.