EFK (Elasticsearch + Fluentd + Kibana) logging system

山西空管技术支持 2022-01-05 22:15:21 Category: AirNet-Linux-B-RedHat7.5

© Copyright belongs to the author. This is an original work by 山西空管技术支持 on the 51CTO blog; contact the author for permission before reposting.

ELK/EFK：

https://cloud.tencent.com/developer/article/1644126

https://blog.csdn.net/weixin_37887248/article/details/82772199

https://logz.io/blog/fluentd-Logstash/

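ELK (Elasticsearch, Logstash, Kibana) has by now largely evolved into EFK (Elasticsearch, Filebeat or Fluentd, Kibana), and Fluentd is the collector most commonly recommended in the industry for container-cloud logging.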

Fluentd's fluent-plugin-elasticsearch plugin can send logs directly to Elasticsearch. If you wish, Fluentd can therefore replace Filebeat, giving the architecture Fluentd => Elasticsearch => Kibana, which is also called EFK.

1. Where Fluentd fits functionally

Configuration file: /etc/td-agent/td-agent.conf

Typical routing scenarios:

Simple: Input -> Filter -> Output

Parsing reference: the @type parameter specifies the type of the parser plugin.

https://docs.fluentd.org/configuration/parse-section

2. Steps for having Fluentd forward rsyslogd logs to ES.

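3. Elasticsearch: a node is a single Elasticsearch instance, and a cluster consists of one or more nodes that share the same cluster.name and work together to share data and load. When a node is added or removed, the cluster notices and rebalances its data. One node in the cluster is elected master; it temporarily manages cluster-level changes such as creating or deleting an index or adding or removing a node. The master does not need to take part in document-level changes or searches, so it does not become a bottleneck as traffic grows. As users, we can talk to any node in the cluster, including the master. Every node knows which node each document lives on and can forward a request to the right node. The node we contact collects the data returned by the other nodes and sends the combined response back to the client. Elasticsearch handles all of this transparently.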

4. Deploy Fluentd (image: registry.cn-beijing.aliyuncs.com/dotbalo/fluentd:v3.1.0) as a DaemonSet, using a volume to mount the ConfigMap as a directory at /etc/fluent/config.d/:

        volumeMounts:
        - name: config-volume
          mountPath: /etc/fluent/config.d
      ......
      volumes:
      - name: config-volume
        configMap:
          name: fluentd-es-config-v0.2.1

5. Ephemeral (debug) containers

# kubectl debug fluentd-es-v3.1.1-wstrj -ti --image=registry.cn-beijing.aliyuncs.com/dotbalo/debug-tools -n logging
If the command hangs, look up the name of the debug container:
# k describe pod fluentd-es-v3.1.1-wstrj -n logging
Events:
  Normal   Started  31s                   kubelet  Started container debugger-6j9xn
Then exec directly into debugger-6j9xn:
# k exec -ti fluentd-es-v3.1.1-9tct6 -c debugger-6j9xn -n logging -- bash

6. Change record for veth_mtu in the calico-config ConfigMap:

k edit cm calico-config  -n kube-system
   veth_mtu: "1520"    // default is "0"

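7. After deploying EFK for log collection by following the video course, Kibana showed logs only from node k8s-node03 (cause: a NodePort-type port conflicted with a port already in use on the host).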

k logs  fluentd-es-v3.1.1-9tct6 -n logging
[warn]: [elasticsearch] failed to flush the buffer. retry_time=230   // timed out
The flush timed out, so check kube-proxy:
# k logs  kube-proxy-m2mdn -n kube-system

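On the problem node k8s-node01, the network showed tcp:...State:FIN_WAIT1, and the FIN_WAIT1 socket still had data in its send queue (Send-Q) that had not reached the peer. In theory FIN_WAIT1 should rarely be observable: the socket moves to FIN_WAIT2 as soon as the peer's ACK arrives, or to CLOSING if the peer's FIN arrives first. https://www.21ic.com/article/896156.html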
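
The symptom described above (a FIN_WAIT1 socket with a non-empty send queue) can also be looked for programmatically. The following is a minimal sketch, assuming a Linux node where /proc/net/tcp is readable; the helper names and the sample rows are invented for illustration and are not output from the cluster in this post.

```python
# Sketch: list FIN_WAIT1 sockets that still have unsent data, given /proc/net/tcp text.
FIN_WAIT1 = "04"  # state code the Linux kernel uses for FIN_WAIT1 in /proc/net/tcp


def decode_addr(hex_addr):
    """Turn '0100007F:5EA0' into '127.0.0.1:24224' (IPv4 only; the IP is little-endian hex)."""
    ip_hex, port_hex = hex_addr.split(":")
    octets = [str(int(ip_hex[i:i + 2], 16)) for i in (6, 4, 2, 0)]
    return ".".join(octets) + ":" + str(int(port_hex, 16))


def stuck_fin_wait1(proc_net_tcp_text):
    """Return (local, remote, tx_queue_bytes) for FIN_WAIT1 sockets with a non-empty send queue."""
    results = []
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 5 or fields[3] != FIN_WAIT1:
            continue
        tx_queue = int(fields[4].split(":")[0], 16)  # column is "tx_queue:rx_queue" in hex
        if tx_queue > 0:
            results.append((decode_addr(fields[1]), decode_addr(fields[2]), tx_queue))
    return results


# Made-up sample rows, only to show the expected input format.
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 0100007F:5EA0 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 1001
   1: 0A00000A:C350 1400000A:23F0 04 00000400:00000000 01:00000014 00000003     0        0 0
"""

if __name__ == "__main__":
    for local, remote, queued in stuck_fin_wait1(SAMPLE):
        print(local, "->", remote, "unsent bytes:", queued)
```

On a real node the text would come from reading /proc/net/tcp instead of SAMPLE.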

A NodePort-type port conflicted with a port already in use on the host.
After deleting fdp2-host.yaml (the pod with hostNetwork: true), logs from k8s-node01/02 appeared:
kind: Pod
metadata:
  name: fdp-pod
spec:
  hostNetwork: true

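If a pod uses the host network (hostNetwork: true), it uses the host's DNS and all other host network settings. It cannot use the Kubernetes built-in DNS service, which means it cannot reach services by the names defined in Service objects. The exception is to change name resolution on the host, that is, to edit /etc/resolv.conf (so that its content differs from the host's original) and let the host use the Kubernetes DNS service.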

https://blog.csdn.net/a8138/article/details/121184631

Reference: https://kubernetes.io/zh/docs/concepts/services-networking/dns-pod-service/

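Set dnsPolicy to ClusterFirstWithHostNet. The official documentation says that a Pod running with hostNetwork should explicitly set its DNS policy to "ClusterFirstWithHostNet". In plain terms, this setting refines what a hostNetwork pod shares with the host: it still shares the host's network (IP, hosts), but not the host's DNS configuration (/etc/resolv.conf).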

$ kubectl edit deployment ht-deployment
···
spec:
  template:
    spec:
      dnsPolicy: ClusterFirstWithHostNet     # adjust the DNS policy

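8. With "hostNetwork: true" in the Pod spec, the application in the pod sees the host's network interfaces directly, and the application and its ports are reachable from every interface on the host's LAN. Pods of the same Deployment started with hostNetwork: true can run only one per node, because the replicas would compete for the same host ports. In other words, the replica count of a host-network Pod cannot exceed the number of "target nodes", meaning the nodes selected when the Pod is started; if none are selected (no nodeSelector), that is every available node in the cluster. When the replica count exceeds the number of target nodes, the extra Pods stay Pending because the scheduler can no longer find a node to place them on.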

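Because only one pod of a given Deployment can run per node, this behavior can to some extent be used to keep pods of the same application off the same host. Pod anti-affinity is the preferred way to achieve that, though.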
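
The replica limit described above can be sketched as a toy model. This is only an illustration of the counting argument, not how the Kubernetes scheduler is implemented; the function name and pod names are hypothetical.

```python
# Toy model: at most one host-network pod of a Deployment per target node.
def place_host_network_pods(replicas, target_nodes):
    """Return (placed, pending): one pod per node, the surplus stays Pending."""
    placed = {node: f"pod-{i}" for i, node in enumerate(target_nodes[:replicas])}
    pending = [f"pod-{i}" for i in range(len(placed), replicas)]
    return placed, pending


if __name__ == "__main__":
    nodes = ["k8s-node01", "k8s-node02", "k8s-node03"]
    placed, pending = place_host_network_pods(4, nodes)
    print("placed:", placed)
    print("pending:", pending)
```

With four replicas and three target nodes, one pod has nowhere to go, which matches the Pending behavior noted in the text.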

9. Add a toleration so that the workload (a DaemonSet in the snippet that follows) can be scheduled onto master nodes. For a toleration to match a taint, their keys and effects must be the same.

kind: DaemonSet
metadata:
  name: fluentd

      tolerations:
        - key: node-role.kubernetes.io/master
          effect: NoSchedule
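
The matching rule between a taint and a toleration can be written down as a small predicate. This is a sketch of the documented Kubernetes rules as understood here (operator defaults to Equal, an empty effect matches any effect, an empty key with Exists matches any key); the function name and the dictionary representation are assumptions made for the example.

```python
# Sketch of taint/toleration matching; plain dicts stand in for the YAML objects.
def tolerates(toleration, taint):
    operator = toleration.get("operator", "Equal")  # Equal is the default operator
    key = toleration.get("key", "")
    if key == "":
        # An empty key is only meaningful with Exists and then matches every key.
        if operator != "Exists":
            return False
    elif key != taint["key"]:
        return False
    effect = toleration.get("effect", "")
    if effect and effect != taint["effect"]:  # empty effect matches every effect
        return False
    if operator == "Exists":
        return True  # Exists ignores the value
    return toleration.get("value", "") == taint.get("value", "")


if __name__ == "__main__":
    master_taint = {"key": "node-role.kubernetes.io/master", "effect": "NoSchedule"}
    fluentd_toleration = {"key": "node-role.kubernetes.io/master", "effect": "NoSchedule"}
    print(tolerates(fluentd_toleration, master_taint))
```

The toleration used in the DaemonSet snippet has the same key and effect as the master taint, so the predicate accepts it.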

10. Installing and using Fluentd on Windows.

PS C:\Users\m> WINGET install td-agent
 Td-agent [TreasureData.TDAgent] version 4.3.0
fluentd --reg-winsvc i
fluentd --reg-winsvc-fluentdopt '-c C:/opt/td-agent/etc/td-agent/td-agent.conf -o C:/opt/td-agent/td-agent.log'
In the Windows configuration file C:/opt/td-agent/etc/td-agent/td-agent.conf:
  @type tail
  path C:\opt\td-agent\CUA2928\fdp1\Log20210307
Do not wrap the path in quotation marks, as in “C:\opt\td-agent\CUA2928\fdp1\Log20210307”, and do not use Chinese or full-width characters.
Otherwise, starting from the command line:
fluentd -c "C:\opt\td-agent\etc\td-agent\td-agent.conf"
fails with: invalid byte sequence in UTF-8 (ArgumentError)
PS> Start-Service fluentdwinsvc
> net start fluentdwinsvc

Alternatively, start the Fluentd Windows Service from Task Manager.
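
The two pitfalls noted above (quoted paths and full-width or other non-ASCII characters) can be checked before starting the service. The following is a rough sketch based only on the observation in this post, not on a rule from the Fluentd documentation; the function name is hypothetical and only the path and pos_file parameters are inspected.

```python
import re

# Matches lines such as "  path C:\opt\td-agent\logexp" and captures the value.
PATH_LINE = re.compile(r"^\s*(path|pos_file)\s+(.*\S)\s*$")
QUOTES = "\"'“”‘’"  # ASCII and full-width quotation marks


def check_paths(conf_text):
    """Return (line_number, problem) pairs for suspicious path values."""
    problems = []
    for number, line in enumerate(conf_text.splitlines(), start=1):
        match = PATH_LINE.match(line)
        if not match:
            continue
        value = match.group(2)
        if value[0] in QUOTES or value[-1] in QUOTES:
            problems.append((number, "quoted path"))
        if not value.isascii():
            problems.append((number, "non-ASCII character"))
    return problems


if __name__ == "__main__":
    conf = "<source>\n  @type tail\n  path “C:\\opt\\td-agent\\logexp”\n</source>\n"
    for number, problem in check_paths(conf):
        print(f"line {number}: {problem}")
```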

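11. fluent-cat is a command-line tool shipped with Fluentd that is especially handy for quick tests of plugin behavior. It is mainly used together with in_forward / in_unix to send log events to those two plugins. The example here sends a JSON message tagged debug.log to the local Fluentd service on the default forward port (24224/tcp).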

# Receive events from 24224/tcp
# This is used by log forwarding and the fluent-cat command
<source>
  @type forward
  port 24224
</source>
C:\opt\td-agent>netstat -ano |findstr "24224"
  TCP    0.0.0.0:24224          0.0.0.0:0              LISTENING       16836
  TCP    127.0.0.1:55464        127.0.0.1:24224        TIME_WAIT       0
  UDP    0.0.0.0:24224          *:*                                    16836
C:\opt\td-agent>tasklist |findstr "16836"
ruby.exe                     16836 Services                   0     69,676 K

echo {"message":"hello"} | fluent-cat debug.log
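
For cases where fluent-cat is not at hand, roughly the same test event can be produced from a script. This is a sketch under an assumption: fluent-cat itself sends MessagePack, while this example relies on in_forward also accepting a JSON-encoded [tag, time, record] entry, which should be verified against the installed Fluentd version. The function names are hypothetical.

```python
import json
import socket
import time


def build_event(tag, record, timestamp=None):
    """Encode one event as a JSON [tag, time, record] entry (assumed to be accepted by in_forward)."""
    if timestamp is None:
        timestamp = int(time.time())
    return json.dumps([tag, timestamp, record]).encode("utf-8")


def send_event(payload, host="127.0.0.1", port=24224):
    """Send the encoded entry to a forward input; requires a running Fluentd."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(payload)


if __name__ == "__main__":
    payload = build_event("debug.log", {"message": "hello"})
    print(payload)
    # send_event(payload)  # uncomment when Fluentd is listening on 24224/tcp
```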

12. The regexp parser plugin parses logs with a given regexp pattern. The regexp must have at least one named capture (?<NAME>PATTERN). If the regexp has a capture named time (the name is configurable via the time_key parameter), it is used as the time of the event. You can specify the time format using the time_format parameter.

<source>
  @type tail
  path C:\opt\td-agent\logexp
  pos_file C:\opt\td-agent\logexp.pos
  tag logexp.access
  <parse>
    @type regexp
    expression /^\<(?<loglevel>[^ ]*)\>:\s*(?<logtime>\d{14})\t(?<log>.*)$/
    time_key logtime
    time_format %Y%m%d%H%M%S
  </parse>
</source>
FDP log:
<Warn>: 20210307000000  The plan[202103061350CSC8426ZBHDZUUU]:区内机场起飞的计划, 候选雷达位置在区域外.
Parsed result (the content may contain Chinese characters):
2021-03-07T00:00:00+08:00 logexp.access {"loglevel":"Warn","log":"The plan[202103061350CSC8426ZBHDZUUU]:区内机场起飞的计划, 候选雷达位置在区域外."}
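
The same expression can be tried outside Fluentd before it goes into the configuration. The sketch below ports the pattern to Python syntax (named groups are written (?P<name>...) there) and mimics time_key / time_format with strptime; it illustrates the parsing logic and is not Fluentd's implementation. The sample line assumes a tab between the timestamp and the message, as the expression requires.

```python
import re
from datetime import datetime

# Python port of the expression used in the <parse> section.
LINE = re.compile(r"^<(?P<loglevel>[^ ]*)>:\s*(?P<logtime>\d{14})\t(?P<log>.*)$")


def parse_line(line):
    """Return (event_time, record) or None when the line does not match."""
    match = LINE.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # Like time_key: the time field is taken out of the record and becomes the event time.
    event_time = datetime.strptime(record.pop("logtime"), "%Y%m%d%H%M%S")
    return event_time, record


SAMPLE = "<Warn>: 20210307000000\tThe plan[202103061350CSC8426ZBHDZUUU]:区内机场起飞的计划, 候选雷达位置在区域外."

if __name__ == "__main__":
    print(parse_line(SAMPLE))
```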

The in_windows_eventlog plugin tails Windows event logs.

A regular expression editor for Fluentd: http://fluentular.herokuapp.com/

13. Life cycle of a Fluentd event

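Every incoming event carries a tag.
Fluentd matches the tag against the configured outputs.
Fluentd sends the event to the matching output.
Fluentd supports multiple data sources and multiple outputs.
With filters in the pipeline, events can be re-emitted.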

Input -> filter 1 -> ... -> filter N -> Output

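The source directive submits events to Fluentd's routing engine. An event consists of three entities: tag, time and record. The tag is a string of parts separated by '.' and is used by the internal routing engine. The time is set by the input plugin and must be in Unix time format. The record is a JSON object.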
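
The event structure and the Input -> filter 1 -> ... -> filter N -> Output chain can be modeled in a few lines. This is a conceptual sketch only; the class, the two filters and the host name are invented for illustration and do not correspond to Fluentd plugins.

```python
from typing import NamedTuple


class Event(NamedTuple):
    tag: str      # dot-separated string used for routing
    time: int     # Unix time
    record: dict  # JSON-like object


def add_hostname(event):
    """Hypothetical filter: enrich the record with a host field."""
    return event._replace(record={**event.record, "host": "k8s-node01"})


def drop_debug(event):
    """Hypothetical filter: discard events whose loglevel is Debug."""
    return None if event.record.get("loglevel") == "Debug" else event


def run_filters(event, filters):
    """Apply filters in order; a filter returning None drops the event."""
    for apply in filters:
        event = apply(event)
        if event is None:
            return None
    return event


if __name__ == "__main__":
    incoming = Event("logexp.access", 1615046400, {"loglevel": "Warn"})
    print(run_filters(incoming, [drop_debug, add_hostname]))
```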

Fluentd tries match directives in order: the match that appears first in the configuration file is tried first.

** matches zero or more tag parts (for example, <match a.**> matches a, a.b and a.b.c).
<match logexp.access>
  @type file
  path C:\opt\td-agent\logfile
</match>
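
The wildcard and ordering rules can be made concrete with a small matcher. It is a sketch of the semantics described in this section (* for exactly one tag part, ** for zero or more parts, first matching pattern wins), not Fluentd's own code; other pattern forms such as {X,Y} are deliberately left out and the function names are hypothetical.

```python
def tag_matches(pattern, tag):
    """True when a match pattern such as 'a.**' or 'a.*' accepts the tag."""
    return _match(pattern.split("."), tag.split("."))


def _match(pattern_parts, tag_parts):
    if not pattern_parts:
        return not tag_parts
    head, rest = pattern_parts[0], pattern_parts[1:]
    if head == "**":
        # ** may consume zero or more tag parts.
        return any(_match(rest, tag_parts[i:]) for i in range(len(tag_parts) + 1))
    if not tag_parts:
        return False
    if head == "*" or head == tag_parts[0]:
        return _match(rest, tag_parts[1:])
    return False


def first_match(patterns, tag):
    """Return the first pattern, in configuration order, that accepts the tag."""
    for pattern in patterns:
        if tag_matches(pattern, tag):
            return pattern
    return None


if __name__ == "__main__":
    for tag in ("a", "a.b", "a.b.c"):
        print(tag, tag_matches("a.**", tag), tag_matches("a.*", tag))
    print(first_match(["**", "logexp.access"], "logexp.access"))
```

The last line shows why order matters: a catch-all ** placed first shadows the more specific logexp.access block after it.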

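The "label" directive reduces the complexity of tag routing (Fluentd's tag-based routing lets complex routes be expressed clearly): it groups filter and match sections for internal routing. Because "label" is a built-in plugin parameter, it is written with an @ prefix (@label).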

The --dry-run option checks the configuration file without starting the plugins.

$ fluentd --dry-run -c fluent.conf

To keep a string that starts with [ or { from being parsed as an array or object, wrap it in ' or ".

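14. Sematext is a solution for log management and application performance monitoring that provides visibility into system state. Sematext exposes the Elasticsearch API, so any tool that works with Elasticsearch, such as Filebeat or Logstash, also works with Sematext. Sematext and Kibana cannot be mixed on a single dashboard.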

Elasticsearch is an open source search engine known for its ease of use. Sematext runs and manages Elasticsearch in the cloud. You also have the option to use Kibana alongside the dashboards in the Sematext UI.

By combining Fluentd and Sematext's managed Elasticsearch + Kibana you get a scalable, flexible, easy to use log management tool and search engine with an intuitive native web UI. You also get Kibana, if you want to use it. This provides a managed Splunk alternative, for a fraction of the cost.

15. Configure td-agent (Fluentd) to interface correctly with Elasticsearch by editing /etc/td-agent/td-agent.conf:

https://docs.fluentd.org/how-to-guides/logs-to-sematext

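16. Fluentd is an advanced open-source log collector originally developed at Treasure Data, Inc. Fluentd is not only a log collector but also a general-purpose stream processing platform, and plugins can be written to handle many kinds of events.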

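However, Fluentd was not designed primarily for stream processing. It must be restarted after its configuration or code is modified, which makes it unsuitable for running short-span (seconds or minutes) and long-span (hours or days) calculations side by side: restarting Fluentd for the sake of a short-span calculation discards the internal state of every short-span and long-span calculation. A large-scale stream processing platform must be able to add and remove code or processes without any such loss.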

17. Combining these three tools (Fluentd + Elasticsearch + Kibana) gives a scalable, flexible, easy-to-use log search engine with a powerful web UI: an open-source alternative to Splunk, all free of charge.

https://docs.fluentd.org/how-to-guides/free-alternative-to-splunk-by-fluentd