Distributed Application Monitoring: A Quick-Start Guide to SkyWalking

Published: 2020-07-05 17:28:07  Source: web  Views: 9552  Author: wx5d6cccb1cb158  Category: Programming languages

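Distributed applications run into all sorts of problems. To tackle them, the applications need their own monitoring instrumentation, and there should also be external systems that actively probe for and detect issues.

This is exactly the job of an APM (application performance monitoring) tool. SkyWalking is an excellent open-source APM system that originated in China and is now an Apache top-level project.

Today we will try SkyWalking hands-on.

Goal: monitor several existing systems, see clearly how they call one another, and be able to locate performance bottlenecks.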

Steps:

Install and run the SkyWalking server;
Connect the applications;
Check the results in the web UI;
Analyze and troubleshoot problems;
Dig deeper (if in the mood).

Installing the SkyWalking server

Download the release package:

 # Main download page
 http://skywalking.apache.org/downloads/
 # Open a specific download link from that page and download it, for example:
 wget http://mirrors.tuna.tsinghua.edu.cn/apache/skywalking/6.5.0/apache-skywalking-apm-6.5.0.tar.gz

Extract the package:

 tar -xzvf apache-skywalking-apm-6.5.0.tar.gz

With the default ports and the default storage, H2, start the services directly:

  ./bin/startup.sh
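
startup.sh launches the OAP server and the web UI in the background and returns as soon as the processes are started, so the UI may not answer for a little while. Below is a minimal sketch for waiting until it does. It assumes `curl` is installed; `wait_for_url` is a hypothetical helper name, not something shipped with SkyWalking.

```shell
# wait_for_url URL [MAX_ATTEMPTS]
# Tries URL roughly once per second until any HTTP response arrives
# or MAX_ATTEMPTS (default 60) tries have failed.
# Returns 0 on success, 1 otherwise. Hypothetical helper, not part of SkyWalking.
wait_for_url() {
  url=$1
  max_attempts=${2:-60}
  attempts=0
  while [ "$attempts" -lt "$max_attempts" ]; do
    # -s: quiet, -o /dev/null: discard the body, --max-time: cap each attempt
    if curl -s -o /dev/null --max-time 2 "$url"; then
      return 0
    fi
    sleep 1
    attempts=$((attempts + 1))
  done
  return 1
}

# Usage (run after ./bin/startup.sh):
#   wait_for_url http://localhost:8080 120 && echo "UI is up"
```

If the function gives up, the log files under the installation's logs directory are the first place to look.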

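Good products really are this simple!

The server is now running. Open the web UI (port 8080 by default) at http://localhost:8080. It looks like this:
[Screenshot: SkyWalking web UI dashboard]

Of course, the screenshot above shows a deployment that already has applications connected. At this point you will not see any applications, because none has been connected yet.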

Connecting the applications

We only cover the Java agent here.

Simply start the application with the javaagent option:

java -javaagent:/root/skywalking/agent/skywalking-agent.jar -Dskywalking.agent.service_name=app1 -Dskywalking.collector.backend_service=localhost:11800 -jar myapp.jar

Parameters:

 # Parameter reference
 skywalking.agent.service_name: the name of this application as shown in SkyWalking
 skywalking.collector.backend_service: address of the SkyWalking server that receives gRPC reports; the default port is 11800
 # The two settings above can also be supplied as environment variables
 SW_AGENT_COLLECTOR_BACKEND_SERVICES: same meaning as skywalking.collector.backend_service
 SW_AGENT_NAME: same meaning as skywalking.agent.service_name
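
The same two settings can be passed through the environment instead of `-D` flags. Here is a sketch of the equivalent launch, reusing the agent path, service name, and jar from the command above. `agent_launch_cmd` is a hypothetical helper that only prints the command so it can be inspected before it is run.

```shell
# Environment-variable form of the two settings (names taken from the list above).
SW_AGENT_NAME=app1
SW_AGENT_COLLECTOR_BACKEND_SERVICES=localhost:11800
export SW_AGENT_NAME SW_AGENT_COLLECTOR_BACKEND_SERVICES

# agent_launch_cmd AGENT_JAR APP_JAR
# Prints the java command line. The settings come from the exported variables,
# so no -Dskywalking.* flags are needed. Hypothetical helper for illustration.
agent_launch_cmd() {
  printf 'java -javaagent:%s -jar %s\n' "$1" "$2"
}

agent_launch_cmd /root/skywalking/agent/skywalking-agent.jar myapp.jar
# prints: java -javaagent:/root/skywalking/agent/skywalking-agent.jar -jar myapp.jar
```

Run the printed command in the same shell so the exported variables are inherited. According to the agent's setup documentation, when both forms are present the `-D` system properties take precedence over environment variables, which in turn override agent.config.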

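Visit a few endpoints or pages so that the agent has some data to capture.

Back in the web UI, the nodes now show up, as in the screenshot above.

We can now also view the relationships between applications:
[Screenshot: topology map of the applications]

The relationships are clear at a glance, however complex the code is.

We can also trace individual call chains:
[Screenshot: trace view of a single request]

As long as you know when a problem occurred, you can quickly pin down the endpoint and system involved and fix it.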

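SkyWalking configuration files

As shown above, we got the system running without changing a single configuration file. That is convenient, but we should know more than that, and at the very least where the configuration lives.

config/application.yml: configuration of the collector (OAP) server

webapp/webapp.yml: the web UI port, plus the IP and port of the OAP (collector) it queries

agent/config/agent.config: agent settings, such as the address of the SkyWalking OAP (collector) and the service name

Below are SkyWalking's default configurations. They are enough to run a demo without changes; adjust them before going to production.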

config/application.yml

cluster:
  standalone:
  # Please check your ZooKeeper is 3.5+, However, it is also compatible with ZooKeeper 3.4.x. Replace the ZooKeeper 3.5+
  # library the oap-libs folder with your ZooKeeper 3.4.x library.
#  zookeeper:
#    nameSpace: ${SW_NAMESPACE:""}
#    hostPort: ${SW_CLUSTER_ZK_HOST_PORT:localhost:2181}
#    #Retry Policy
#    baseSleepTimeMs: ${SW_CLUSTER_ZK_SLEEP_TIME:1000} # initial amount of time to wait between retries
#    maxRetries: ${SW_CLUSTER_ZK_MAX_RETRIES:3} # max number of times to retry
#    # Enable ACL
#    enableACL: ${SW_ZK_ENABLE_ACL:false} # disable ACL in default
#    schema: ${SW_ZK_SCHEMA:digest} # only support digest schema
#    expression: ${SW_ZK_EXPRESSION:skywalking:skywalking}
#  kubernetes:
#    watchTimeoutSeconds: ${SW_CLUSTER_K8S_WATCH_TIMEOUT:60}
#    namespace: ${SW_CLUSTER_K8S_NAMESPACE:default}
#    labelSelector: ${SW_CLUSTER_K8S_LABEL:app=collector,release=skywalking}
#    uidEnvName: ${SW_CLUSTER_K8S_UID:SKYWALKING_COLLECTOR_UID}
#  consul:
#    serviceName: ${SW_SERVICE_NAME:"SkyWalking_OAP_Cluster"}
#    # Consul cluster nodes, example: 10.0.0.1:8500,10.0.0.2:8500,10.0.0.3:8500
#    hostPort: ${SW_CLUSTER_CONSUL_HOST_PORT:localhost:8500}
#  nacos:
#    serviceName: ${SW_SERVICE_NAME:"SkyWalking_OAP_Cluster"}
#    hostPort: ${SW_CLUSTER_NACOS_HOST_PORT:localhost:8848}
#    # Nacos Configuration namespace
#    namespace: 'public'
#  etcd:
#    serviceName: ${SW_SERVICE_NAME:"SkyWalking_OAP_Cluster"}
#    # etcd cluster nodes, example: 10.0.0.1:2379,10.0.0.2:2379,10.0.0.3:2379
#    hostPort: ${SW_CLUSTER_ETCD_HOST_PORT:localhost:2379}
core:
  default:
    # Mixed: Receive agent data, Level 1 aggregate, Level 2 aggregate
    # Receiver: Receive agent data, Level 1 aggregate
    # Aggregator: Level 2 aggregate
    role: ${SW_CORE_ROLE:Mixed} # Mixed/Receiver/Aggregator
    restHost: ${SW_CORE_REST_HOST:0.0.0.0}
    restPort: ${SW_CORE_REST_PORT:12800}
    restContextPath: ${SW_CORE_REST_CONTEXT_PATH:/}
    gRPCHost: ${SW_CORE_GRPC_HOST:0.0.0.0}
    gRPCPort: ${SW_CORE_GRPC_PORT:11800}
    downsampling:
    - Hour
    - Day
    - Month
    # Set a timeout on metrics data. After the timeout has expired, the metrics data will automatically be deleted.
    enableDataKeeperExecutor: ${SW_CORE_ENABLE_DATA_KEEPER_EXECUTOR:true} # Turn it off then automatically metrics data delete will be close.
    dataKeeperExecutePeriod: ${SW_CORE_DATA_KEEPER_EXECUTE_PERIOD:5} # How often the data keeper executor runs periodically, unit is minute
    recordDataTTL: ${SW_CORE_RECORD_DATA_TTL:90} # Unit is minute
    minuteMetricsDataTTL: ${SW_CORE_MINUTE_METRIC_DATA_TTL:90} # Unit is minute
    hourMetricsDataTTL: ${SW_CORE_HOUR_METRIC_DATA_TTL:36} # Unit is hour
    dayMetricsDataTTL: ${SW_CORE_DAY_METRIC_DATA_TTL:45} # Unit is day
    monthMetricsDataTTL: ${SW_CORE_MONTH_METRIC_DATA_TTL:18} # Unit is month
    # Cache metric data for 1 minute to reduce database queries, and if the OAP cluster changes within that minute,
    # the metrics may not be accurate within that minute.
    enableDatabaseSession: ${SW_CORE_ENABLE_DATABASE_SESSION:true}
storage:
#  elasticsearch:
#    nameSpace: ${SW_NAMESPACE:""}
#    clusterNodes: ${SW_STORAGE_ES_CLUSTER_NODES:localhost:9200}
#    protocol: ${SW_STORAGE_ES_HTTP_PROTOCOL:"http"}
#    trustStorePath: ${SW_SW_STORAGE_ES_SSL_JKS_PATH:"../es_keystore.jks"}
#    trustStorePass: ${SW_SW_STORAGE_ES_SSL_JKS_PASS:""}
#    user: ${SW_ES_USER:""}
#    password: ${SW_ES_PASSWORD:""}
#    indexShardsNumber: ${SW_STORAGE_ES_INDEX_SHARDS_NUMBER:2}
#    indexReplicasNumber: ${SW_STORAGE_ES_INDEX_REPLICAS_NUMBER:0}
#    # Those data TTL settings will override the same settings in core module.
#    recordDataTTL: ${SW_STORAGE_ES_RECORD_DATA_TTL:7} # Unit is day
#    otherMetricsDataTTL: ${SW_STORAGE_ES_OTHER_METRIC_DATA_TTL:45} # Unit is day
#    monthMetricsDataTTL: ${SW_STORAGE_ES_MONTH_METRIC_DATA_TTL:18} # Unit is month
#    # Batch process setting, refer to https://www.elastic.co/guide/en/elasticsearch/client/java-api/5.5/java-docs-bulk-processor.html
#    bulkActions: ${SW_STORAGE_ES_BULK_ACTIONS:1000} # Execute the bulk every 1000 requests
#    flushInterval: ${SW_STORAGE_ES_FLUSH_INTERVAL:10} # flush the bulk every 10 seconds whatever the number of requests
#    concurrentRequests: ${SW_STORAGE_ES_CONCURRENT_REQUESTS:2} # the number of concurrent requests
#    resultWindowMaxSize: ${SW_STORAGE_ES_QUERY_MAX_WINDOW_SIZE:10000}
#    metadataQueryMaxSize: ${SW_STORAGE_ES_QUERY_MAX_SIZE:5000}
#    segmentQueryMaxSize: ${SW_STORAGE_ES_QUERY_SEGMENT_SIZE:200}
  h2:
    driver: ${SW_STORAGE_H2_DRIVER:org.h2.jdbcx.JdbcDataSource}
    url: ${SW_STORAGE_H2_URL:jdbc:h2:mem:skywalking-oap-db}
    user: ${SW_STORAGE_H2_USER:sa}
    metadataQueryMaxSize: ${SW_STORAGE_H2_QUERY_MAX_SIZE:5000}
#  mysql:
#    properties:
#      jdbcUrl: ${SW_JDBC_URL:"jdbc:mysql://localhost:3306/swtest"}
#      dataSource.user: ${SW_DATA_SOURCE_USER:root}
#      dataSource.password: ${SW_DATA_SOURCE_PASSWORD:root@1234}
#      dataSource.cachePrepStmts: ${SW_DATA_SOURCE_CACHE_PREP_STMTS:true}
#      dataSource.prepStmtCacheSize: ${SW_DATA_SOURCE_PREP_STMT_CACHE_SQL_SIZE:250}
#      dataSource.prepStmtCacheSqlLimit: ${SW_DATA_SOURCE_PREP_STMT_CACHE_SQL_LIMIT:2048}
#      dataSource.useServerPrepStmts: ${SW_DATA_SOURCE_USE_SERVER_PREP_STMTS:true}
#    metadataQueryMaxSize: ${SW_STORAGE_MYSQL_QUERY_MAX_SIZE:5000}
receiver-sharing-server:
  default:
receiver-register:
  default:
receiver-trace:
  default:
    bufferPath: ${SW_RECEIVER_BUFFER_PATH:../trace-buffer/} # Path to trace buffer files, suggest to use absolute path
    bufferOffsetMaxFileSize: ${SW_RECEIVER_BUFFER_OFFSET_MAX_FILE_SIZE:100} # Unit is MB
    bufferDataMaxFileSize: ${SW_RECEIVER_BUFFER_DATA_MAX_FILE_SIZE:500} # Unit is MB
    bufferFileCleanWhenRestart: ${SW_RECEIVER_BUFFER_FILE_CLEAN_WHEN_RESTART:false}
    sampleRate: ${SW_TRACE_SAMPLE_RATE:10000} # The sample rate precision is 1/10000. 10000 means 100% sample in default.
    slowDBAccessThreshold: ${SW_SLOW_DB_THRESHOLD:default:200,mongodb:100} # The slow database access thresholds. Unit ms.
receiver-jvm:
  default:
receiver-clr:
  default:
service-mesh:
  default:
    bufferPath: ${SW_SERVICE_MESH_BUFFER_PATH:../mesh-buffer/} # Path to trace buffer files, suggest to use absolute path
    bufferOffsetMaxFileSize: ${SW_SERVICE_MESH_OFFSET_MAX_FILE_SIZE:100} # Unit is MB
    bufferDataMaxFileSize: ${SW_SERVICE_MESH_BUFFER_DATA_MAX_FILE_SIZE:500} # Unit is MB
    bufferFileCleanWhenRestart: ${SW_SERVICE_MESH_BUFFER_FILE_CLEAN_WHEN_RESTART:false}
istio-telemetry:
  default:
envoy-metric:
  default:
#    alsHTTPAnalysis: ${SW_ENVOY_METRIC_ALS_HTTP_ANALYSIS:k8s-mesh}
#receiver_zipkin:
#  default:
#    host: ${SW_RECEIVER_ZIPKIN_HOST:0.0.0.0}
#    port: ${SW_RECEIVER_ZIPKIN_PORT:9411}
#    contextPath: ${SW_RECEIVER_ZIPKIN_CONTEXT_PATH:/}
query:
  graphql:
    path: ${SW_QUERY_GRAPHQL_PATH:/graphql}
alarm:
  default:
telemetry:
  none:
configuration:
  none:
#  apollo:
#    apolloMeta: http://106.12.25.204:8080
#    apolloCluster: default
#    # apolloEnv: # defaults to null
#    appId: skywalking
#    period: 5
#  nacos:
#    # Nacos Server Host
#    serverAddr: 127.0.0.1
#    # Nacos Server Port
#    port: 8848
#    # Nacos Configuration Group
#    group: 'skywalking'
#    # Nacos Configuration namespace
#    namespace: ''
#    # Unit seconds, sync period. Default fetch every 60 seconds.
#    period : 60
#    # the name of current cluster, set the name if you want to upstream system known.
#    clusterName: "default"
#  zookeeper:
#    period : 60 # Unit seconds, sync period. Default fetch every 60 seconds.
#    nameSpace: /default
#    hostPort: localhost:2181
#    #Retry Policy
#    baseSleepTimeMs: 1000 # initial amount of time to wait between retries
#    maxRetries: 3 # max number of times to retry
#  etcd:
#    period : 60 # Unit seconds, sync period. Default fetch every 60 seconds.
#    group : 'skywalking'
#    serverAddr: localhost:2379
#    clusterName: "default"
#  consul:
#    # Consul host and ports, separated by comma, e.g. 1.2.3.4:8500,2.3.4.5:8500
#    hostAndPorts: ${consul.address}
#    # Sync period in seconds. Defaults to 60 seconds.
#    period: 1

#exporter:
#  grpc:
#    targetHost: ${SW_EXPORTER_GRPC_HOST:127.0.0.1}
#    targetPort: ${SW_EXPORTER_GRPC_PORT:9870}
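
Almost every value in application.yml has the form `${ENV_NAME:default}`: the OAP server takes the value from the environment variable when it is set and otherwise falls back to the text after the first colon. The shell sketch below mimics that lookup for illustration only. `resolve_placeholder` is a hypothetical helper, not SkyWalking code, and it treats an empty variable as unset, which is an assumption.

```shell
# resolve_placeholder '${NAME:default}'
# Prints the value of environment variable NAME if it is non-empty,
# otherwise the default (everything after the first colon).
# Hypothetical helper for illustration; not part of SkyWalking.
resolve_placeholder() {
  case $1 in
    '${'*'}') ;;            # must look like ${...}
    *) return 1 ;;
  esac
  inner=${1#??}             # strip the leading "${"
  inner=${inner%?}          # strip the trailing "}"
  name=${inner%%:*}         # text before the first ":"
  default=${inner#*:}       # text after the first ":"
  case $name in
    ''|*[!A-Za-z0-9_]*) return 1 ;;   # refuse anything that is not a plain variable name
  esac
  eval "value=\${$name-}"   # indirect lookup of the variable called $name
  if [ -n "$value" ]; then
    printf '%s\n' "$value"
  else
    printf '%s\n' "$default"
  fi
}

unset SW_CORE_REST_PORT
resolve_placeholder '${SW_CORE_REST_PORT:12800}'   # prints 12800

SW_CORE_REST_PORT=13800
resolve_placeholder '${SW_CORE_REST_PORT:12800}'   # prints 13800
```

In practice this means a setting can be changed by exporting the corresponding variable before running startup.sh, without editing the file.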

webapp/webapp.yml

server:
  port: 8080

collector:
  path: /graphql
  ribbon:
    ReadTimeout: 10000
    # Point to all backend's restHost:restPort, split by ,
    listOfServers: 127.0.0.1:12800

agent/config/agent.config

# The agent namespace
# agent.namespace=${SW_AGENT_NAMESPACE:default-namespace}

# The service name in UI
agent.service_name=${SW_AGENT_NAME:Your_ApplicationName}

# The number of sampled traces per 3 seconds
# Negative number means sample traces as many as possible, most likely 100%
# agent.sample_n_per_3_secs=${SW_AGENT_SAMPLE:-1}

# Authentication active is based on backend setting, see application.yml for more details.
# agent.authentication = ${SW_AGENT_AUTHENTICATION:xxxx}

# The max amount of spans in a single segment.
# Through this config item, skywalking keep your application memory cost estimated.
# agent.span_limit_per_segment=${SW_AGENT_SPAN_LIMIT:300}

# Ignore the segments if their operation names end with these suffix.
# agent.ignore_suffix=${SW_AGENT_IGNORE_SUFFIX:.jpg,.jpeg,.js,.css,.png,.bmp,.gif,.ico,.mp3,.mp4,.html,.svg}

# If true, skywalking agent will save all instrumented classes files in `/debugging` folder.
# Skywalking team may ask for these files in order to resolve compatible problem.
# agent.is_open_debugging_class = ${SW_AGENT_OPEN_DEBUG:true}

# The operationName max length
# agent.operation_name_threshold=${SW_AGENT_OPERATION_NAME_THRESHOLD:500}

# Backend service addresses.
collector.backend_service=${SW_AGENT_COLLECTOR_BACKEND_SERVICES:127.0.0.1:11800}

# Logging file_name
logging.file_name=${SW_LOGGING_FILE_NAME:skywalking-api.log}

# Logging level
logging.level=${SW_LOGGING_LEVEL:DEBUG}

# Logging dir
# logging.dir=${SW_LOGGING_DIR:""}

# Logging max_file_size, default: 300 * 1024 * 1024 = 314572800
# logging.max_file_size=${SW_LOGGING_MAX_FILE_SIZE:314572800}

# The max history log files. When rollover happened, if log files exceed this number,
# then the oldest file will be delete. Negative or zero means off, by default.
# logging.max_history_files=${SW_LOGGING_MAX_HISTORY_FILES:-1}

# mysql plugin configuration
# plugin.mysql.trace_sql_parameters=${SW_MYSQL_TRACE_SQL_PARAMETERS:false}
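
The `agent.ignore_suffix` setting above drops segments whose operation name ends with one of the listed suffixes. The sketch below only illustrates that matching rule in shell. It is not the agent's implementation, `is_ignored` is a hypothetical name, and the comparison here is case-sensitive.

```shell
# Default suffix list copied from agent.config above.
IGNORE_SUFFIX=".jpg,.jpeg,.js,.css,.png,.bmp,.gif,.ico,.mp3,.mp4,.html,.svg"

# is_ignored OPERATION_NAME SUFFIX_LIST
# Returns 0 when OPERATION_NAME ends with any comma-separated suffix, else 1.
# Hypothetical helper for illustration; not part of SkyWalking.
is_ignored() {
  operation=$1
  old_ifs=$IFS
  IFS=,                       # split the suffix list on commas
  for suffix in $2; do
    case $operation in
      *"$suffix")             # quoted, so the suffix is matched literally
        IFS=$old_ifs
        return 0
        ;;
    esac
  done
  IFS=$old_ifs
  return 1
}

for path in /static/app.js /index.html /api/orders; do
  if is_ignored "$path" "$IGNORE_SUFFIX"; then
    echo "$path: ignored"
  else
    echo "$path: traced"
  fi
done
# prints:
#   /static/app.js: ignored
#   /index.html: ignored
#   /api/orders: traced
```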

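SkyWalking architecture

The diagram comes from the official site. Without going into detail, the rough idea is this: agents for the various kinds of clients collect metrics and send them to the APM server over gRPC/HTTP, the analysis engine processes them and stores the results in a backend such as Elasticsearch, H2, or MySQL, and the front end renders them through the query engine.
[Figure: SkyWalking architecture diagram from the official site]

What it can be used for

Finding where a system spends its time, that is, where the bottlenecks are.

Discovering the call relationships between systems.

Monitoring service exceptions.

Troubleshooting system failures.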
