Actuator + Prometheus + Grafana

Published: 2020-09-27 21:10:57 | Source: Internet | Views: 989 | Author: ZeroOne01 | Category: System Operations


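Preface

About Actuator:

Anyone who has looked into Spring Boot's monitoring capabilities will know the Spring Boot Actuator subproject, which gives applications powerful monitoring support. Since Spring Boot 2.x, Actuator has been built on Micrometer, which makes its monitoring more capable and more flexible. Micrometer is a metrics facade, comparable to SLF4J in the logging world. Through Micrometer, an application can integrate with a variety of monitoring systems, such as the one this article covers: Prometheus.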

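About Prometheus:

Prometheus is an open-source system for monitoring, alerting, and time series storage (TSDB), originally developed at SoundCloud. Most of its components are written in Go, and its design was inspired by Google's Borgmon monitoring system. It is hosted by the CNCF and has graduated from incubation. The Prometheus open-source community is very active, and in terms of performance Prometheus can support clusters on the order of ten thousand machines.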

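Prometheus features:

A multi-dimensional data model in which time series are identified by a metric name and key/value pairs

A flexible query language: PromQL

No reliance on distributed storage; single server nodes are autonomous

Time series collection through a pull model over HTTP

Pushing time series is supported through an intermediary gateway

Targets are discovered through service discovery or static configuration

Support for many kinds of charts and dashboards, for example Grafana

For more, see the official documentation and the GitHub repository.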

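About Grafana:

Grafana is an open-source application with a backend written in Go that provides cross-platform metrics analytics, visualization, and alerting. It queries the collected data, visualizes it, and sends timely notifications. Grafana supports many data sources and display options; in short, it is a powerful and good-looking tool for visualizing monitoring metrics.

For more, see the official documentation and the GitHub repository.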

Creating the project

The main goal of this article is to monitor a microservice. Now that we have a basic idea of the tools above, let's put them into practice. Start by creating a simple Spring Boot project with the following main dependencies:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>

Tip: To integrate with a different monitoring system, you only need to change the registry dependency. For example, to integrate with Influx, use micrometer-registry-influx instead.

Edit the project configuration:

server:
  port: 9562
spring:
  application:
    # Application name
    name: prometheus-demo
management:
  endpoints:
    web:
      exposure:
        # Expose Actuator's /actuator/prometheus endpoint
        include: 'prometheus'
  metrics:
    tags:
      # Add a tag to every metric, set here to the application name. Micrometer tags become Prometheus labels, which allow more flexible filtering
      application: ${spring.application.name}
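
The effect of management.metrics.tags can be pictured as a merge: the common tag is added to whatever tags each metric already has, and the result shows up as labels on every series. The following is a minimal sketch of that idea in plain Java. It does not use Micrometer, and the class and method names (CommonTagSketch, withCommonTags, render) are made up for illustration; the rendered text only approximates the real endpoint output.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Illustration only: mimics how a common tag ends up as a label on every series. */
public class CommonTagSketch {

    /** Merges the common tags with a metric's own tags, sorted by key for stable output. */
    public static Map<String, String> withCommonTags(Map<String, String> common,
                                                     Map<String, String> own) {
        Map<String, String> merged = new TreeMap<>(common);
        merged.putAll(own);
        return merged;
    }

    /** Renders one sample in a Prometheus-like text form: name{k="v",...} value */
    public static String render(String name, Map<String, String> labels, double value) {
        String body = labels.entrySet().stream()
                .map(e -> e.getKey() + "=\"" + e.getValue() + "\"")
                .collect(Collectors.joining(","));
        return name + "{" + body + "} " + value;
    }

    public static void main(String[] args) {
        // Assumed to mirror management.metrics.tags.application in the configuration above.
        Map<String, String> common = Map.of("application", "prometheus-demo");

        System.out.println(render("jvm_threads_live_threads",
                withCommonTags(common, Map.of()), 22.0));
        System.out.println(render("jvm_buffer_count_buffers",
                withCommonTags(common, Map.of("id", "direct")), 2.0));
    }
}
```

Both printed lines carry application="prometheus-demo", which is what later makes it possible to filter by application in Prometheus and Grafana.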

With that done, run a quick test to check that the endpoint returns monitoring data. Start the project and request the /actuator/prometheus endpoint. If everything is working, it returns output like the following:

# HELP process_start_time_seconds Start time of the process since unix epoch.
# TYPE process_start_time_seconds gauge
process_start_time_seconds{application="prometheus-demo",} 1.577697308142E9
# HELP jvm_buffer_memory_used_bytes An estimate of the memory that the Java virtual machine is using for this buffer pool
# TYPE jvm_buffer_memory_used_bytes gauge
jvm_buffer_memory_used_bytes{application="prometheus-demo",id="mapped",} 0.0
jvm_buffer_memory_used_bytes{application="prometheus-demo",id="direct",} 16384.0
# HELP tomcat_sessions_expired_sessions_total  
# TYPE tomcat_sessions_expired_sessions_total counter
tomcat_sessions_expired_sessions_total{application="prometheus-demo",} 0.0
# HELP jvm_gc_pause_seconds Time spent in GC pause
# TYPE jvm_gc_pause_seconds summary
jvm_gc_pause_seconds_count{action="end of minor GC",application="prometheus-demo",cause="Metadata GC Threshold",} 1.0
jvm_gc_pause_seconds_sum{action="end of minor GC",application="prometheus-demo",cause="Metadata GC Threshold",} 0.006
jvm_gc_pause_seconds_count{action="end of major GC",application="prometheus-demo",cause="Metadata GC Threshold",} 1.0
jvm_gc_pause_seconds_sum{action="end of major GC",application="prometheus-demo",cause="Metadata GC Threshold",} 0.032
jvm_gc_pause_seconds_count{action="end of minor GC",application="prometheus-demo",cause="Allocation Failure",} 1.0
jvm_gc_pause_seconds_sum{action="end of minor GC",application="prometheus-demo",cause="Allocation Failure",} 0.008
# HELP jvm_gc_pause_seconds_max Time spent in GC pause
# TYPE jvm_gc_pause_seconds_max gauge
jvm_gc_pause_seconds_max{action="end of minor GC",application="prometheus-demo",cause="Metadata GC Threshold",} 0.006
jvm_gc_pause_seconds_max{action="end of major GC",application="prometheus-demo",cause="Metadata GC Threshold",} 0.032
jvm_gc_pause_seconds_max{action="end of minor GC",application="prometheus-demo",cause="Allocation Failure",} 0.008
# HELP jvm_memory_used_bytes The amount of used memory
# TYPE jvm_memory_used_bytes gauge
jvm_memory_used_bytes{application="prometheus-demo",area="heap",id="PS Survivor Space",} 0.0
jvm_memory_used_bytes{application="prometheus-demo",area="heap",id="PS Old Gen",} 1.3801776E7
jvm_memory_used_bytes{application="prometheus-demo",area="nonheap",id="Metaspace",} 3.522832E7
jvm_memory_used_bytes{application="prometheus-demo",area="nonheap",id="Code Cache",} 6860800.0
jvm_memory_used_bytes{application="prometheus-demo",area="heap",id="PS Eden Space",} 1.9782928E7
jvm_memory_used_bytes{application="prometheus-demo",area="nonheap",id="Compressed Class Space",} 4825568.0
# HELP logback_events_total Number of error level events that made it to the logs
# TYPE logback_events_total counter
logback_events_total{application="prometheus-demo",level="info",} 7.0
logback_events_total{application="prometheus-demo",level="trace",} 0.0
logback_events_total{application="prometheus-demo",level="warn",} 0.0
logback_events_total{application="prometheus-demo",level="debug",} 0.0
logback_events_total{application="prometheus-demo",level="error",} 0.0
# HELP process_uptime_seconds The uptime of the Java virtual machine
# TYPE process_uptime_seconds gauge
process_uptime_seconds{application="prometheus-demo",} 30.499
# HELP jvm_buffer_count_buffers An estimate of the number of buffers in the pool
# TYPE jvm_buffer_count_buffers gauge
jvm_buffer_count_buffers{application="prometheus-demo",id="mapped",} 0.0
jvm_buffer_count_buffers{application="prometheus-demo",id="direct",} 2.0
# HELP system_cpu_count The number of processors available to the Java virtual machine
# TYPE system_cpu_count gauge
system_cpu_count{application="prometheus-demo",} 6.0
# HELP jvm_threads_peak_threads The peak live thread count since the Java virtual machine started or peak was reset
# TYPE jvm_threads_peak_threads gauge
jvm_threads_peak_threads{application="prometheus-demo",} 22.0
# HELP tomcat_sessions_alive_max_seconds  
# TYPE tomcat_sessions_alive_max_seconds gauge
tomcat_sessions_alive_max_seconds{application="prometheus-demo",} 0.0
# HELP jvm_memory_committed_bytes The amount of memory in bytes that is committed for the Java virtual machine to use
# TYPE jvm_memory_committed_bytes gauge
jvm_memory_committed_bytes{application="prometheus-demo",area="heap",id="PS Survivor Space",} 1.5204352E7
jvm_memory_committed_bytes{application="prometheus-demo",area="heap",id="PS Old Gen",} 1.31596288E8
jvm_memory_committed_bytes{application="prometheus-demo",area="nonheap",id="Metaspace",} 3.7879808E7
jvm_memory_committed_bytes{application="prometheus-demo",area="nonheap",id="Code Cache",} 6881280.0
jvm_memory_committed_bytes{application="prometheus-demo",area="heap",id="PS Eden Space",} 1.76685056E8
jvm_memory_committed_bytes{application="prometheus-demo",area="nonheap",id="Compressed Class Space",} 5373952.0
# HELP jvm_buffer_total_capacity_bytes An estimate of the total capacity of the buffers in this pool
# TYPE jvm_buffer_total_capacity_bytes gauge
jvm_buffer_total_capacity_bytes{application="prometheus-demo",id="mapped",} 0.0
jvm_buffer_total_capacity_bytes{application="prometheus-demo",id="direct",} 16384.0
# HELP jvm_gc_live_data_size_bytes Size of old generation memory pool after a full GC
# TYPE jvm_gc_live_data_size_bytes gauge
jvm_gc_live_data_size_bytes{application="prometheus-demo",} 1.3801776E7
# HELP jvm_memory_max_bytes The maximum amount of memory in bytes that can be used for memory management
# TYPE jvm_memory_max_bytes gauge
jvm_memory_max_bytes{application="prometheus-demo",area="heap",id="PS Survivor Space",} 1.5204352E7
jvm_memory_max_bytes{application="prometheus-demo",area="heap",id="PS Old Gen",} 2.841116672E9
jvm_memory_max_bytes{application="prometheus-demo",area="nonheap",id="Metaspace",} -1.0
jvm_memory_max_bytes{application="prometheus-demo",area="nonheap",id="Code Cache",} 2.5165824E8
jvm_memory_max_bytes{application="prometheus-demo",area="heap",id="PS Eden Space",} 1.390411776E9
jvm_memory_max_bytes{application="prometheus-demo",area="nonheap",id="Compressed Class Space",} 1.073741824E9
# HELP jvm_threads_daemon_threads The current number of live daemon threads
# TYPE jvm_threads_daemon_threads gauge
jvm_threads_daemon_threads{application="prometheus-demo",} 18.0
# HELP jvm_threads_states_threads The current number of threads having NEW state
# TYPE jvm_threads_states_threads gauge
jvm_threads_states_threads{application="prometheus-demo",state="runnable",} 8.0
jvm_threads_states_threads{application="prometheus-demo",state="new",} 0.0
jvm_threads_states_threads{application="prometheus-demo",state="timed-waiting",} 2.0
jvm_threads_states_threads{application="prometheus-demo",state="blocked",} 0.0
jvm_threads_states_threads{application="prometheus-demo",state="waiting",} 12.0
jvm_threads_states_threads{application="prometheus-demo",state="terminated",} 0.0
# HELP jvm_gc_memory_promoted_bytes_total Count of positive increases in the size of the old generation memory pool before GC to after GC
# TYPE jvm_gc_memory_promoted_bytes_total counter
jvm_gc_memory_promoted_bytes_total{application="prometheus-demo",} 8296848.0
# HELP tomcat_sessions_active_max_sessions  
# TYPE tomcat_sessions_active_max_sessions gauge
tomcat_sessions_active_max_sessions{application="prometheus-demo",} 0.0
# HELP tomcat_sessions_created_sessions_total  
# TYPE tomcat_sessions_created_sessions_total counter
tomcat_sessions_created_sessions_total{application="prometheus-demo",} 0.0
# HELP jvm_gc_memory_allocated_bytes_total Incremented for an increase in the size of the young generation memory pool after one GC to before the next
# TYPE jvm_gc_memory_allocated_bytes_total counter
jvm_gc_memory_allocated_bytes_total{application="prometheus-demo",} 1.36924824E8
# HELP process_cpu_usage The "recent cpu usage" for the Java Virtual Machine process
# TYPE process_cpu_usage gauge
process_cpu_usage{application="prometheus-demo",} 0.10024585094452443
# HELP system_cpu_usage The "recent cpu usage" for the whole system
# TYPE system_cpu_usage gauge
system_cpu_usage{application="prometheus-demo",} 0.38661791030714154
# HELP tomcat_sessions_active_current_sessions  
# TYPE tomcat_sessions_active_current_sessions gauge
tomcat_sessions_active_current_sessions{application="prometheus-demo",} 0.0
# HELP jvm_classes_loaded_classes The number of classes that are currently loaded in the Java virtual machine
# TYPE jvm_classes_loaded_classes gauge
jvm_classes_loaded_classes{application="prometheus-demo",} 7195.0
# HELP http_server_requests_seconds  
# TYPE http_server_requests_seconds summary
http_server_requests_seconds_count{application="prometheus-demo",exception="None",method="GET",outcome="CLIENT_ERROR",status="404",uri="/**",} 1.0
http_server_requests_seconds_sum{application="prometheus-demo",exception="None",method="GET",outcome="CLIENT_ERROR",status="404",uri="/**",} 0.012429856
# HELP http_server_requests_seconds_max  
# TYPE http_server_requests_seconds_max gauge
http_server_requests_seconds_max{application="prometheus-demo",exception="None",method="GET",outcome="CLIENT_ERROR",status="404",uri="/**",} 0.012429856
# HELP jvm_gc_max_data_size_bytes Max size of old generation memory pool
# TYPE jvm_gc_max_data_size_bytes gauge
jvm_gc_max_data_size_bytes{application="prometheus-demo",} 2.841116672E9
# HELP jvm_threads_live_threads The current number of live threads including both daemon and non-daemon threads
# TYPE jvm_threads_live_threads gauge
jvm_threads_live_threads{application="prometheus-demo",} 22.0
# HELP jvm_classes_unloaded_classes_total The total number of classes unloaded since the Java virtual machine has started execution
# TYPE jvm_classes_unloaded_classes_total counter
jvm_classes_unloaded_classes_total{application="prometheus-demo",} 1.0
# HELP tomcat_sessions_rejected_sessions_total  
# TYPE tomcat_sessions_rejected_sessions_total counter
tomcat_sessions_rejected_sessions_total{application="prometheus-demo",} 0.0

The data returned by this endpoint is what Prometheus consumes. Each metric carries comment lines that explain its meaning, so the output is not hard to read. For example:

# HELP jvm_memory_used_bytes The amount of used memory
# TYPE jvm_memory_used_bytes gauge
jvm_memory_used_bytes{application="prometheus-demo",area="heap",id="PS Survivor Space",} 0.0

This means that in the prometheus-demo application, the PS Survivor Space region of heap memory is using 0.0 bytes.
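
To make the structure of such a line concrete, here is a minimal sketch in plain Java that splits a sample line into metric name, labels, and value. The class SampleLineParser is hypothetical and only handles simple lines like the ones shown above (no timestamps, no escaped quotes inside label values); it is not a full parser for the Prometheus text format.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustration only: parses simple sample lines such as those shown above. */
public class SampleLineParser {

    /** One parsed sample: metric name, labels, and numeric value. */
    public static final class Sample {
        public final String name;
        public final Map<String, String> labels;
        public final double value;

        Sample(String name, Map<String, String> labels, double value) {
            this.name = name;
            this.labels = labels;
            this.value = value;
        }
    }

    // Metric name, an optional {label list}, whitespace, then the value.
    private static final Pattern LINE =
            Pattern.compile("^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\\{(.*)\\})?\\s+(\\S+)$");

    // key="value" pairs; assumes values contain no escaped quotes.
    private static final Pattern LABEL =
            Pattern.compile("([a-zA-Z_][a-zA-Z0-9_]*)=\"([^\"]*)\"");

    public static Sample parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a sample line: " + line);
        }
        Map<String, String> labels = new LinkedHashMap<>();
        if (m.group(2) != null) {
            Matcher l = LABEL.matcher(m.group(2));
            while (l.find()) {
                labels.put(l.group(1), l.group(2));
            }
        }
        return new Sample(m.group(1), labels, Double.parseDouble(m.group(3)));
    }

    public static void main(String[] args) {
        Sample s = parse("jvm_memory_used_bytes{application=\"prometheus-demo\","
                + "area=\"heap\",id=\"PS Survivor Space\",} 0.0");
        System.out.println(s.name + " " + s.labels + " = " + s.value);
    }
}
```

Lines that start with # (the HELP and TYPE comments) are metadata rather than samples, so this sketch rejects them.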

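Installing the Prometheus service

Next, install the Prometheus service on a server so that it can collect monitoring data from the endpoint the microservice exposes. For simplicity I use Docker here; for other installation methods, see the official installation documentation.

First, prepare a configuration file for Prometheus: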

[root@localhost ~]# mkdir /etc/prometheus
[root@localhost ~]# vim /etc/prometheus/prometheus.yml
scrape_configs:
# Any name works; plain English without special characters is recommended
- job_name: 'spring'
  # How often to scrape
  scrape_interval: 15s
  # Timeout for each scrape
  scrape_timeout: 10s
  # Path of the endpoint to scrape
  metrics_path: '/actuator/prometheus'
  # Address of the service to scrape, i.e. the microservice's IP and port
  static_configs:
  - targets: ['192.168.1.252:9562']

This configuration makes Prometheus request http://192.168.1.252:9562/actuator/prometheus automatically every 15 seconds. For more options, see the official Prometheus Configuration documentation.

Finally, start the Prometheus service with Docker:
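
The way the scrape URL is assembled from the configuration can be sketched in a few lines of plain Java. This is an illustration under assumptions, not Prometheus code: the class ScrapeTargetSketch and its methods are hypothetical, and the scheme is assumed to be the default http because the configuration above does not set one. The check mirrors the rule that scrape_timeout may not be longer than scrape_interval.

```java
import java.net.URI;
import java.time.Duration;

/** Illustration only: derives the scrape URL from a target and a metrics path. */
public class ScrapeTargetSketch {

    /** Builds the URL to request for a target, assuming the http scheme. */
    public static URI scrapeUrl(String target, String metricsPath) {
        return URI.create("http://" + target + metricsPath);
    }

    /** Rejects a timeout that is longer than the scrape interval. */
    public static void checkTimeout(Duration interval, Duration timeout) {
        if (timeout.compareTo(interval) > 0) {
            throw new IllegalArgumentException(
                    "scrape_timeout must not exceed scrape_interval");
        }
    }

    public static void main(String[] args) {
        // Values copied from the prometheus.yml shown above.
        Duration interval = Duration.ofSeconds(15);
        Duration timeout = Duration.ofSeconds(10);
        checkTimeout(interval, timeout);

        URI url = scrapeUrl("192.168.1.252:9562", "/actuator/prometheus");
        System.out.println("GET " + url + " every " + interval.getSeconds()
                + "s, timeout " + timeout.getSeconds() + "s");
    }
}
```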

[root@localhost ~]# docker run -d -p 9090:9090 -v /etc/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml prom/prometheus --config.file=/etc/prometheus/prometheus.yml

Once it has started, opening http://{ip}:9090 should show the Prometheus home page:
[Screenshot: Prometheus home page]

Click Insert metric at cursor to choose a metric, click Graph to display it as a chart, then click the Execute button to see a result like the one below:
[Screenshot: a metric graphed in the Prometheus UI]

What each control does:

Insert metric at cursor: choose the metric to display
Graph: display the metric as a graph
Execute: run the query and draw the chart
Add Graph: add a chart for another metric
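
The same kind of query can also be sent to Prometheus over its HTTP API, whose instant-query endpoint is /api/v1/query with a query parameter. The following is a minimal sketch that only builds such a URL. The class InstantQueryUrl is hypothetical, and the base URL http://localhost:9090 is an assumption standing in for wherever your Prometheus server runs.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Illustration only: builds the URL of an instant query against the HTTP API. */
public class InstantQueryUrl {

    /** Appends the URL-encoded PromQL expression to the instant-query endpoint. */
    public static String build(String baseUrl, String promql) {
        return baseUrl + "/api/v1/query?query="
                + URLEncoder.encode(promql, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Assumed address; replace with the host and port of your Prometheus server.
        String base = "http://localhost:9090";
        System.out.println(build(base,
                "jvm_threads_live_threads{application=\"prometheus-demo\"}"));
    }
}
```

Braces, quotes, and the equals sign in the expression are percent-encoded, so the resulting string can be used as-is with an HTTP client or pasted into a browser.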

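Grafana visualization

In the previous section we set up the Prometheus service and took a quick look at its built-in visualization UI. That UI is not very pleasant to use and offers few features, so next we integrate Grafana to build a friendlier, more production-like visualization platform for the monitoring data.

Grafana also has to be installed on a server. For simplicity I again use Docker; for other installation methods, see the official installation documentation.

With Docker, a single command starts Grafana: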

[root@localhost ~]# docker run -d -p 3000:3000 grafana/grafana

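Configuring the data source

Once Grafana has started, log in at http://{ip}:3000/login. The default username and password are both admin:
[Screenshot: Grafana login page]

After logging in, you land on the home page:
[Screenshot: Grafana home page]

First, add the source of the monitoring data. Click Add data source on the home page to see a screen like this:
[Screenshot: list of data source types]

Click Prometheus, then fill in the details of the Prometheus service on the screen that follows:
[Screenshot: Prometheus data source settings]

After a successful save, a confirmation message appears:
[Screenshot: save confirmation message]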

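Creating a monitoring dashboard

Click the + button in the navigation bar, then click Dashboard. A screen like the following appears:
[Screenshot: new dashboard]

Click Add Query to see a screen like this:
[Screenshot: query editor]

In the area marked with a red box, add a metric query. The available metric names are those exposed by the Spring Boot application's /actuator/prometheus endpoint, for example jvm_memory_used_bytes, jvm_threads_states_threads, and jvm_threads_live_threads.

Grafana offers helpful suggestions as you type and supports fairly complex calculations such as aggregation, sums, and averages. To draw more than one line, click the Add Query button. In the screenshot above, I plotted two lines on the chart, one for daemon threads and one for peak threads.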
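
As a worked example of what such an aggregation computes, a query like sum(jvm_memory_used_bytes{area="heap"}) keeps only the series whose area label is heap and adds up their values. The sketch below does the same by hand in plain Java, using the jvm_memory_used_bytes values from the sample endpoint output earlier in this article. The class HeapAggregationSketch and its methods are hypothetical and exist only to show the arithmetic.

```java
import java.util.List;
import java.util.Map;

/** Illustration only: label filtering plus sum and average, computed by hand. */
public class HeapAggregationSketch {

    /** One series: its labels and its current value. */
    public static final class Series {
        public final Map<String, String> labels;
        public final double value;

        public Series(String area, String id, double value) {
            this.labels = Map.of("area", area, "id", id);
            this.value = value;
        }
    }

    /** The jvm_memory_used_bytes values from the sample endpoint output. */
    public static List<Series> sample() {
        return List.of(
                new Series("heap", "PS Survivor Space", 0.0),
                new Series("heap", "PS Old Gen", 1.3801776E7),
                new Series("nonheap", "Metaspace", 3.522832E7),
                new Series("nonheap", "Code Cache", 6860800.0),
                new Series("heap", "PS Eden Space", 1.9782928E7),
                new Series("nonheap", "Compressed Class Space", 4825568.0));
    }

    /** Sums the values of all series whose given label equals the expected value. */
    public static double sumWhere(List<Series> series, String label, String expected) {
        return series.stream()
                .filter(s -> expected.equals(s.labels.get(label)))
                .mapToDouble(s -> s.value)
                .sum();
    }

    /** Averages the matching series; NaN if nothing matches. */
    public static double avgWhere(List<Series> series, String label, String expected) {
        return series.stream()
                .filter(s -> expected.equals(s.labels.get(label)))
                .mapToDouble(s -> s.value)
                .average()
                .orElse(Double.NaN);
    }

    public static void main(String[] args) {
        System.out.println("heap sum = " + sumWhere(sample(), "area", "heap"));
        System.out.println("heap avg = " + avgWhere(sample(), "area", "heap"));
    }
}
```

With the sample data, the three heap series add up to 33584704 bytes (0 + 13801776 + 19782928), which is the single value a summed heap query would plot for that moment.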

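Click the button shown below and fill in Title to set the chart's title:
[Screenshot: panel title setting]

To add another chart to the dashboard, click the button in the top-left corner of the screenshot above:
[Screenshot: dashboard view]

Then follow the steps shown below:
[Screenshot: steps for adding a chart]

To save the dashboard, click the save button in the top-right corner:
[Screenshot: save button]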

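Dashboard marketplace

At this point Grafana and Prometheus are integrated, and we can build fairly rich charts by placing the metrics we care about on a dashboard, which is very flexible. The configuration is not difficult, but it does take time.

Are there ready-made dashboards that are powerful, general-purpose, and usable out of the box? Yes. Go to Grafana Labs - Dashboards and search by keyword:
[Screenshot: dashboard search results]

As the screenshot shows, there are several dashboards that use Prometheus as the data source and support Micrometer. The following briefly demonstrates the JVM (Micrometer) dashboard. Click JVM (Micrometer) to open its detail page:
[Screenshot: JVM (Micrometer) dashboard detail page]

The page describes the dashboard's features and required configuration in detail. The management.metrics.tags.application setting it mentions was already configured earlier, in the project configuration of the Spring Boot application. The number 4701, marked with a red box in the top-right corner of the page, is important: it is the dashboard's ID.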

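Back on the Grafana home page, import this dashboard by following the steps below:
[Screenshot: steps for importing a dashboard by ID]

After entering the ID, a screen like the following appears. Select the data source and click Import:
[Screenshot: import options]

You now see a screen like the following. The dashboard already covers the metrics we usually care about:
[Screenshot: JVM (Micrometer) dashboard]

The selector bar at the top lets you switch between services/applications:
[Screenshot: application selector]

There are other useful dashboards as well, which you can explore on your own: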

JVM (Actuator)
Spring Boot Statistics
