Deploying a SkyWalking cluster with Docker and Nacos: SkyWalking cluster mode

Reposted

mob64ca1410eb61 2024-01-15 15:40:25


Background

One day we received an alert that SkyWalking had stopped collecting trace data, so we analyzed the problem and tuned the setup.

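Optimizations

1. Tune the log configuration

Adjust the number of retained log files to fit the available disk capacity by editing log4j2.xml: <DefaultRolloverStrategy max="10"/>

2. Tune SkyWalking's Elasticsearch storage settings in application.yml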

storage:
  elasticsearch:
    nameSpace: ${SW_NAMESPACE:"CollectorRdpCluster"}
    clusterNodes: ${SW_STORAGE_ES_CLUSTER_NODES:IP1:9200,IP2:9200,IP3:9200}
    protocol: ${SW_STORAGE_ES_HTTP_PROTOCOL:"http"}
    indexShardsNumber: ${SW_STORAGE_ES_INDEX_SHARDS_NUMBER:2}
    indexReplicasNumber: ${SW_STORAGE_ES_INDEX_REPLICAS_NUMBER:1}
    bulkActions: ${SW_STORAGE_ES_BULK_ACTIONS:4000} # execute the bulk every 4000 requests
    flushInterval: ${SW_STORAGE_ES_FLUSH_INTERVAL:30} # flush the bulk every 30 seconds whatever the number of requests
    concurrentRequests: ${SW_STORAGE_ES_CONCURRENT_REQUESTS:4} # the number of concurrent requests
    metadataQueryMaxSize: ${SW_STORAGE_ES_QUERY_MAX_SIZE:8000}
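Each value in this file has the form ${ENV_NAME:default}: the environment variable wins when it is set, otherwise the default after the colon is used. This matters in Docker deployments, where the values are usually injected as container environment variables. The Python sketch below is a simplified model of that lookup, not SkyWalking's actual loader; the function name `resolve` and the quote stripping are assumptions made for illustration.

```python
import re

# Matches ${NAME:default}. The default may itself contain colons
# (for example a host:port list), so only the first colon separates the two.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*):([^}]*)\}")


def resolve(value, env):
    """Replace each ${NAME:default} with env[NAME] if present, else the default.

    Hypothetical helper: surrounding double quotes on the default are
    stripped here as an assumption about how the value ends up as a string.
    """
    def pick(match):
        name, default = match.group(1), match.group(2)
        return env.get(name, default.strip('"'))

    return _PLACEHOLDER.sub(pick, value)


line = "${SW_STORAGE_ES_BULK_ACTIONS:4000}"
print(resolve(line, {}))                                      # falls back to the default
print(resolve(line, {"SW_STORAGE_ES_BULK_ACTIONS": "2000"}))  # environment override
```

In practice this means the defaults edited above only take effect when the matching SW_* variables are not set in the container environment.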

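What each changed value does:

clusterNodes: the Elasticsearch cluster nodes; set these to match your environment (the order can be changed later).

indexReplicasNumber: the number of replica copies per index. This value affects storage efficiency to some degree, as well as how fast indices open and close. Our Elasticsearch cluster is currently not very stable, so we changed it for the sake of high availability.

bulkActions: the batch size for bulk submissions. Raising it reduces the number of round trips between the SkyWalking server and ES and increases throughput.

flushInterval: how often pending requests are submitted, regardless of how many are queued. The goal is again fewer round trips and higher throughput.

concurrentRequests: the number of concurrent bulk requests; we increased it.

metadataQueryMaxSize: the maximum number of results per query. The default is 5000; we raised it to 8000.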
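To make the interplay of bulkActions and flushInterval concrete, here is a toy Python model of a bulk buffer: it flushes when the queue reaches the batch size, or when the interval has elapsed with a partial batch pending. The class `BulkBuffer` and its methods are hypothetical and only illustrate the two triggers; the real batching happens inside the Elasticsearch client used by SkyWalking.

```python
import time


class BulkBuffer:
    """Toy model: flush when `bulk_actions` items are queued, or when
    `flush_interval` seconds have passed since the last flush."""

    def __init__(self, bulk_actions, flush_interval, clock=time.monotonic):
        self.bulk_actions = bulk_actions
        self.flush_interval = flush_interval
        self.clock = clock
        self.pending = []
        self.batches = []  # stands in for bulk requests sent to ES
        self.last_flush = clock()

    def add(self, item):
        self.pending.append(item)
        if len(self.pending) >= self.bulk_actions:
            self.flush()

    def tick(self):
        """Call periodically; flushes a partial batch once the interval has elapsed."""
        if self.pending and self.clock() - self.last_flush >= self.flush_interval:
            self.flush()

    def flush(self):
        if self.pending:
            self.batches.append(self.pending)
            self.pending = []
        self.last_flush = self.clock()


# Demo with a fake clock so the example does not need to sleep.
now = [0.0]
demo = BulkBuffer(bulk_actions=4000, flush_interval=30, clock=lambda: now[0])
for i in range(9000):
    demo.add(i)
print(len(demo.batches), "full batches,", len(demo.pending), "items still pending")
now[0] = 30.0
demo.tick()
print(len(demo.batches), "batches after the interval flush")
```

Larger batches and longer intervals mean fewer requests to ES, at the cost of more data held in memory and a longer delay before traces become searchable.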

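3. Optimize the existing indices in ES

Modify the settings of every index whose name starts with collectorrdpcluster. The main goal is again to ease I/O pressure. The change also carries a risk of losing some data. The steps are as follows.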

(1) Close the indices

POST http://IP1:9200/collectorrdpcluster*/_close

(2) Adjust the index settings

PUT http://IP1:9200/collectorrdpcluster*/_settings?preserve_existing=true

  {
      "index.merge.scheduler.max_thread_count" : "1",
      "index.refresh_interval" : "30s",
      "index.translog.durability" : "async",
      "index.translog.sync_interval" : "120s"
  }

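Notes:

<1> preserve_existing=true controls whether settings already present on an index are overwritten. With true, existing values are left unchanged and only settings that are not yet set are applied; choose the value that fits your situation.

<2> index.merge.scheduler.max_thread_count sets the number of threads used for index merges. The default is computed from the number of CPU cores, but on spinning disks that default actually hurts efficiency.

<3> index.refresh_interval is the refresh interval. The default is 1s; log-like data does not need to become searchable that quickly.

<4> index.translog.durability is the translog persistence mode, set here to async. Asynchronous mode carries a risk of losing some data.

<5> index.translog.sync_interval is the translog sync interval. A longer interval gives a steadier I/O load.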

(3) Open the indices

POST http://IP1:9200/collectorrdpcluster*/_open
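Steps (1) to (3) can be scripted so that they always run in the same order. The following is a minimal Python sketch using only the standard library; `build_requests` and `run` are hypothetical helper names, and IP1 is the placeholder host used throughout this article. It assumes the cluster needs no authentication.

```python
import json
import urllib.request

ES = "http://IP1:9200"            # placeholder host from the article
PATTERN = "collectorrdpcluster*"  # all indices under the SkyWalking namespace

SETTINGS = {
    "index.merge.scheduler.max_thread_count": "1",
    "index.refresh_interval": "30s",
    "index.translog.durability": "async",
    "index.translog.sync_interval": "120s",
}


def build_requests(base=ES, pattern=PATTERN, settings=SETTINGS):
    """Return the three requests in order: close, update settings, open."""
    body = json.dumps(settings).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return [
        urllib.request.Request(f"{base}/{pattern}/_close", method="POST"),
        urllib.request.Request(
            f"{base}/{pattern}/_settings?preserve_existing=true",
            data=body, headers=headers, method="PUT"),
        urllib.request.Request(f"{base}/{pattern}/_open", method="POST"),
    ]


def run(requests, opener=urllib.request.urlopen):
    """Send each request in order; urlopen raises on an HTTP error,
    so a failed step stops the sequence before the next one runs."""
    for req in requests:
        with opener(req, timeout=30) as resp:
            print(req.get_method(), req.full_url, resp.status)


# Inspect what would be sent; call run(build_requests()) against a real cluster.
for req in build_requests():
    print(req.get_method(), req.full_url)
```

While the indices are closed SkyWalking cannot write to them, so it is worth keeping the window between the close and open calls as short as possible.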

(4) Create an index template

PUT http://IP1:9200/_template/collectorrdpcluster_temp

  {
    "index_patterns": ["collectorrdpcluster*"],
    "settings": {
        "index.merge.scheduler.max_thread_count" : "1",
        "index.refresh_interval" : "30s",
        "index.translog.durability" : "async",
        "index.translog.sync_interval" : "120s"
    }
  }
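An index template only affects indices created after the template exists, which is why the existing indices were updated separately in the earlier steps. The Python sketch below builds the template body from a single settings dictionary so that both places stay in sync, and uses `fnmatch` as a rough stand-in for the wildcard matching that ES applies to index_patterns. The helper `build_template` and the sample index names are hypothetical.

```python
import fnmatch
import json

SETTINGS = {
    "index.merge.scheduler.max_thread_count": "1",
    "index.refresh_interval": "30s",
    "index.translog.durability": "async",
    "index.translog.sync_interval": "120s",
}


def build_template(pattern, settings):
    """Body for PUT /_template/<name>: the given settings, applied to
    indices created later whose names match `pattern`."""
    return {"index_patterns": [pattern], "settings": dict(settings)}


template = build_template("collectorrdpcluster*", SETTINGS)
print(json.dumps(template, indent=2))

# Hypothetical index names, used only to show which ones the pattern covers.
for name in ["collectorrdpcluster_segment-20240115", "other_app_logs-20240115"]:
    matched = any(fnmatch.fnmatch(name, p) for p in template["index_patterns"])
    print(name, "->", "template applies" if matched else "not covered")
```

Note that ES lowercases index names, so the pattern uses the lowercase form of the CollectorRdpCluster namespace.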